JP6518096B2

JP6518096B2 - Speech recognition system and speech recognition method

Info

Publication number: JP6518096B2
Application number: JP2015053376A
Authority: JP
Inventors: 松谷　隆司; 中村　健二; 祥平野本
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2015-03-17
Filing date: 2015-03-17
Publication date: 2019-05-22
Anticipated expiration: 2035-03-17
Also published as: WO2016148157A1; JP2016173464A

Description

The present invention relates to a technology for selecting a speech dictionary used for speech recognition.

In recent years, the term "nagara sumaho" (using a smartphone while doing something else) has become widespread. It refers to a user operating a portable terminal device such as a smartphone while engaged in some other activity, and the practice has been warned against as dangerous. The problem arises because a user must look at a portable terminal device to use it (to view or operate it), which reduces the user's visual attention to the surroundings. With many portable terminal devices, the user must look at the display to receive information and must watch the touch panel or keys to input information (operate the device).

Conventionally, voice guidance, such as playing speech through a speaker, is a known way for a portable terminal device to provide information to a user. A technology is also known in which speech recognition is performed on words (speech) uttered by a user in order to input information into a portable terminal device. Adopting voice guidance and speech recognition in this way makes it possible to use a portable terminal device without requiring the user's gaze. Speech-based user interfaces, and in particular technology that makes speech recognition comfortable to use, are therefore becoming indispensable for today's portable terminal devices.

In the field of speech recognition, technologies have been proposed that improve the recognition rate of input speech by selecting the optimum speech recognition dictionary from among a plurality of dictionaries according to the user's position information. For example, Patent Document 1 describes a technology that calculates a worker's relative position from the outputs of a 3-axis gyro sensor and a 3-axis acceleration sensor. It then selects a speech recognition dictionary according to the calculated relative position.

A known technology for suppressing power consumption is to mount a low-power microcomputer separately from the main CPU. The microcomputer controls sensors and other components that must be monitored continuously, and the main CPU is allowed to rest as needed in the meantime, which reduces power consumption overall. Power-saving techniques are especially important in portable terminal devices, whose power supply capacity is limited.

Patent Document 1: JP 2010-191223 A

However, with the technology described in Patent Document 1, the speech recognition dictionary is switched only when the worker (user) speaks a phrase indicating the task about to be done, such as "I'm taking the checkout." In other words, the user must consciously and reliably perform the trigger for switching the speech recognition dictionary.

To switch the speech recognition dictionary without the user being aware of it, the system must continuously monitor for the event that triggers the switch, and this increases power consumption. The technology of Patent Document 1 focuses solely on improving the recognition rate of input speech. It does not consider how to achieve a higher recognition rate and lower power consumption at the same time.

The present invention has been made in view of the above problems. Its object is to provide a technology that suppresses power consumption without lowering the recognition accuracy of speech recognition.

To solve the above problems, the invention of claim 1 is a speech recognition system that recognizes speech using a speech dictionary. It comprises:

- a first computing device capable of switching its operation mode between a normal operation mode and a power saving mode in which power consumption is suppressed compared with the normal operation mode;
- a first storage device that stores a plurality of events assumed in advance in association with a plurality of speech dictionary candidates, which are candidates for the speech dictionary;
- observation means for acquiring, as observation information, a physical quantity for detecting an event;
- a microphone that acquires the speech as speech information;
- a second storage device that stores the speech dictionary; and
- a second computing device that accesses the second storage device.

The second computing device includes event detection means and speech recognition means. The event detection means detects the currently occurring event from among the plurality of events assumed in advance, based on the observation information acquired by the observation means. The speech recognition means performs speech recognition based on the speech information acquired by the microphone and the speech dictionary stored in the second storage device.

The system further comprises selection means for selecting one speech dictionary candidate from among the plurality stored in the first storage device, according to the event detected by the event detection means as currently occurring. The one speech dictionary candidate selected by the selection means is stored in the second storage device as the speech dictionary. The power consumption when the second computing device operates while the first computing device is in the power saving mode is smaller than the power consumption when the first computing device operates in the normal operation mode.

The invention of claim 2 is the speech recognition system according to claim 1, wherein the observation means acquires, as the observation information, a physical quantity resulting from the user's movement, and the event detection means estimates the user's behavior as the currently occurring event.

The invention of claim 3 is the speech recognition system according to claim 2, wherein the event detection means estimates the user's behavior by estimating the user's posture.

The invention of claim 4 is the speech recognition system according to any one of claims 1 to 3, wherein the vocabulary contained in each of the plurality of speech dictionary candidates is selected according to the event with which that candidate is associated.

The invention of claim 5 is the speech recognition system according to any one of claims 1 to 4, comprising a portable terminal device and a server device. The portable terminal device is carried by a user and includes the first computing device, the second computing device, and the second storage device. The server device is connected to the portable terminal device so that the two can exchange data, and it includes the first storage device and the selection means.

The invention of claim 6 is the speech recognition system according to any one of claims 1 to 5, wherein the second storage device stores past history information, and the event detection means estimates the currently occurring event based on the history information stored in the second storage device.

The invention of claim 7 is a speech recognition method for recognizing speech using a speech dictionary. It comprises the steps of:

- storing, in a first storage device, a plurality of events assumed in advance in association with a plurality of speech dictionary candidates, which are candidates for the speech dictionary;
- switching the operation mode of a first computing device between a normal operation mode and a power saving mode in which power consumption is suppressed compared with the normal operation mode;
- acquiring, with observation means, a physical quantity for detecting an event as observation information;
- detecting, with a second computing device, the currently occurring event from among the plurality of events assumed in advance, based on the observation information acquired by the observation means;
- selecting one speech dictionary candidate from among the plurality stored in the first storage device, according to the event detected by the second computing device as currently occurring;
- storing the selected speech dictionary candidate, as the speech dictionary, in a second storage device accessed by the second computing device;
- acquiring the speech as speech information with a microphone; and
- performing speech recognition with the second computing device, based on the speech information acquired by the microphone and the speech dictionary stored in the second storage device.

The power consumption when the second computing device operates while the first computing device is in the power saving mode is smaller than the power consumption when the first computing device operates in the normal operation mode.

In the inventions of claims 1 to 7, the second computing device performs speech recognition while the first computing device operates in the power saving mode. The power consumption in that state is smaller than when the first computing device operates in the normal operation mode, so power consumption is suppressed. In addition, one speech dictionary candidate is selected from the plurality stored in the first storage device according to the event detected as currently occurring. This reduces the information size of the speech dictionary without lowering the recognition accuracy of speech recognition.
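The power condition shared by claims 1 to 7 can be written as an inequality. The following symbols are introduced here for illustration and do not appear in the source:

- $P_1^{\mathrm{normal}}$ and $P_1^{\mathrm{save}}$ are the power drawn by the first computing device in the normal operation mode and the power saving mode.
- $P_2$ is the power drawn by the second computing device while it performs event detection and speech recognition.

This sketch also assumes that "power consumption" means the combined draw of both devices:

```latex
P_1^{\mathrm{save}} + P_2 < P_1^{\mathrm{normal}}
\quad\Longleftrightarrow\quad
P_2 < P_1^{\mathrm{normal}} - P_1^{\mathrm{save}}
```

Under this reading, the arrangement pays off only if the second computing device draws less power than is saved by putting the first computing device into the power saving mode.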

FIG. 1 is a diagram showing a speech recognition system.
FIG. 2 is a block diagram of a server device.
FIG. 3 is a diagram illustrating the structure of a database.
FIG. 4 is a diagram showing the functional blocks of the server device together with the flow of data.
FIG. 5 is a block diagram showing a portable terminal device.
FIG. 6 is a diagram showing the functional blocks of the portable terminal device together with the flow of data.
FIG. 7 is a flowchart showing the operation of the server device.
FIG. 8 is a flowchart showing the operation of the portable terminal device.
FIG. 9 is a flowchart showing the update request process executed by the portable terminal device.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Unless otherwise noted, descriptions of direction and orientation follow the drawings for convenience of explanation. They do not limit, for example, actual implementations, products, or the scope of rights.

<1. Embodiment>
FIG. 1 is a diagram showing a speech recognition system 1. The speech recognition system 1 includes a server device 2 and a portable terminal device 3 carried by a user. The server device 2 and the portable terminal device 3 are connected by a network (not shown) so that they can exchange data. A mobile telephone network, for example, can be used as this network.

The speech recognition system 1 is not limited to the single server device 2 and single portable terminal device 3 shown in FIG. 1. It may include a plurality of server devices 2 and a plurality of portable terminal devices 3. The configuration and functions of the server device 2 described below may also be distributed across a plurality of devices. Furthermore, the network connecting the server device 2 and the portable terminal device 3 is not limited to a single network and may be a composite network. In the following description, a person who operates the server device 2 is called an "operator," and a person who operates the portable terminal device 3 is called a "user."

FIG. 2 is a block diagram of the server device 2. The server device 2 includes a CPU 20, a storage device 21, an operation unit 22, a display unit 23, and a communication unit 24.

The CPU 20 reads and executes a program 210 stored in the storage device 21, performing operations on various data and generating control signals. In this way, the CPU 20 controls each component of the server device 2 and computes and creates various data. In other words, the server device 2 is configured as a general-purpose computer.

The storage device 21 provides the function of storing various data in the server device 2. In other words, the storage device 21 holds electronically fixed information in the server device 2.

The storage device 21 encompasses:

- RAM and buffers used as temporary working areas of the CPU 20;
- read-only ROM;
- non-volatile memory (for example, NAND memory);
- hard disks for storing relatively large amounts of data; and
- portable storage media (CD-ROM, DVD-ROM, PC card, SD card, USB memory, and so on) mounted in a dedicated reader.

Although FIG. 2 depicts the storage device 21 as a single structure, it usually consists of several of the devices (or media) listed above, adopted as needed. That is, "storage device 21" is a collective term for the group of devices that store data.

In practice, the CPU 20 is an electronic circuit with internal RAM that can be accessed at high speed. For convenience of explanation, however, this storage inside the CPU 20 is also treated as part of the storage device 21. That is, data temporarily held by the CPU 20 itself is described as being stored in the storage device 21. As shown in FIG. 2, the storage device 21 is used to store the program 210, a database 211, selected dictionary information 212, update request information 311 (event information 372), and other data.

FIG. 3 is a diagram illustrating the structure of the database 211. As shown in FIG. 3, the database 211 is table-structured information in which one record is created for each speech dictionary candidate. Each record of the database 211 stores a record number, an event, and a speech dictionary candidate in association with one another.

The record number is an identifier that individually identifies each record of the database 211. In the example shown in FIG. 3, n records are stored in the database 211 (n is a natural number of 2 or more).

Events are assumed in advance. Examples include:

- the user's state type (sex, age, and so on);
- the user's action type (cooking, studying, commuting, and so on); and
- the type of surrounding situation (weather, season, time, outdoors or indoors, scene, and so on).

Several of these types may also be combined into a single event. It is preferable to choose events for which the user's vocabulary can be predicted distinctively. In the example of the database 211 shown in FIG. 3, "shopping," "jogging," and "default" are registered as events.

The speech dictionary candidates are individual speech dictionaries, each prepared for one of the events assumed in advance and registered in the database 211. For example, the first speech dictionary is prepared for and associated with "shopping." It mainly contains vocabulary expected to be used while shopping, such as product names, store names, prices, and uses. Likewise, the second speech dictionary is associated with "jogging." It mainly contains vocabulary expected to be used while jogging, such as pace, pulse, course, advice, and calories burned.
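A table like the database 211 in FIG. 3 can be modeled as a list of records. The sketch below is illustrative only, and makes three assumptions:

- the field names are hypothetical;
- the table has n = 3 rows; and
- each speech dictionary candidate is stood in for by a placeholder string rather than real dictionary data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryRecord:
    """One row of a table modeled on database 211 (field names are hypothetical)."""
    record_no: str   # identifier that individually identifies the record
    event: str       # event assumed in advance, e.g. "shopping"
    candidate: str   # placeholder for the speech dictionary candidate itself


# Illustrative contents following the FIG. 3 example, with n = 3 (assumed).
DATABASE_211 = [
    DictionaryRecord("001", "shopping", "first speech dictionary"),
    DictionaryRecord("002", "jogging", "second speech dictionary"),
    DictionaryRecord("003", "default", "n-th speech dictionary"),
]


def validate(records: list[DictionaryRecord]) -> int:
    """Check the two structural rules stated in the text and return n."""
    if len(records) < 2:
        raise ValueError("the database must hold n >= 2 records")
    numbers = [r.record_no for r in records]
    if len(set(numbers)) != len(numbers):
        raise ValueError("record numbers must identify records individually")
    return len(records)
```

With these contents, `validate(DATABASE_211)` returns 3. A table with a single record, or with duplicated record numbers, is rejected.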

In this way, the vocabulary of each speech dictionary candidate registered in advance in the database 211 is selected according to the event associated with it. This reduces the size of the information used as a speech dictionary without lowering the recognition accuracy for each word. A typical general-purpose speech dictionary is several megabytes in size. By specifying the event and limiting the vocabulary, the speech recognition system 1 can keep the size of a speech dictionary candidate to, for example, about several kilobytes.
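The text does not say how the per-event vocabularies are compiled. One minimal way to picture the curation is sketched below. It assumes a hypothetical general-purpose lexicon in which each word is tagged with the events where it is expected to be used, and it keeps only the words tagged with a given event. The words and tags are invented for illustration.

```python
# Hypothetical general-purpose lexicon: each word maps to the set of events
# in which it is expected to be used (words and tags invented for illustration).
GENERAL_LEXICON: dict[str, set[str]] = {
    "price": {"shopping"},
    "store": {"shopping"},
    "pace": {"jogging"},
    "pulse": {"jogging"},
    "calories": {"jogging"},
    "weather": set(),
}


def build_candidate(event: str, lexicon: dict[str, set[str]]) -> list[str]:
    """Keep only the words associated with `event`, returned in sorted order."""
    return sorted(word for word, events in lexicon.items() if event in events)
```

In this example, `build_candidate("jogging", GENERAL_LEXICON)` returns `["calories", "pace", "pulse"]`, a strict subset of the lexicon. Restricting the vocabulary in this way is what lets a candidate be much smaller than a general-purpose dictionary.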

The "n-th speech dictionary" is associated with the event "default." It is the speech dictionary candidate selected when the currently occurring event (the event used as the search key, described in detail later) matches none of the events in records "001" through "n-1." The n-th speech dictionary shown here contains general-purpose vocabulary but at reduced precision (for example, a lower sampling frequency). So when the server device 2 selects the n-th speech dictionary, the dictionary used is comparable in size to the other candidates, but recognition accuracy is sacrificed.

As already described, each record of the database 211 stores one event and one speech dictionary candidate, and the database 211 contains a plurality (n) of records. By storing the database 211, the storage device 21 therefore stores the events assumed in advance in association with the speech dictionary candidates. The storage device 21 thus corresponds to the first storage device.

The selected dictionary information 212 shown in FIG. 2 contains the one speech dictionary candidate that the CPU 20 selects from among those registered in the database 211 (details are given later). The server device 2 transmits the selected dictionary information 212 to the portable terminal device 3 that transmitted the update request information 311 (event information 372).

The update request information 311 is created by the portable terminal device 3 and received by the server device 2. It contains the event information 372 and an identifier that individually identifies the portable terminal device 3 (for example, a network address). The portable terminal device 3 requests the server device 2 to update the speech dictionary by transmitting the update request information 311 to it (details are given later).
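The text specifies only what the update request information 311 contains: a terminal identifier and the event information 372. It does not specify an encoding. The sketch below represents the request as a small record and assumes JSON as the wire format. The field names and the serialization choice are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UpdateRequest:
    """Sketch of update request information 311 (field names are hypothetical)."""
    terminal_id: str  # identifier of the sending terminal, e.g. a network address
    event: str        # event information 372: the event detected as currently occurring


def encode(request: UpdateRequest) -> bytes:
    """Serialize the request for transmission (JSON is an assumed wire format)."""
    return json.dumps(asdict(request)).encode("utf-8")


def decode(payload: bytes) -> UpdateRequest:
    """Rebuild the request on the receiving side."""
    return UpdateRequest(**json.loads(payload.decode("utf-8")))
```

A request such as `UpdateRequest("192.0.2.10", "jogging")` survives an `encode`/`decode` round trip unchanged. The identifier is kept so that the reply can be addressed to the sender.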

As noted above, the event information 372 is included in the update request information 311. It indicates the event currently occurring at the portable terminal device 3 that transmitted the update request information 311 containing it (details are given later).

The operation unit 22 is hardware that an operator or other person operates to input instructions to the server device 2. Examples include various keys and buttons, switches, a touch panel, a pointing device, and a jog dial. The operation unit 22 is used, for example, when the operator builds the database 211.

The display unit 23 is hardware that outputs various data to the operator or other person by displaying it. Examples include lamps, LEDs, CRTs, liquid crystal displays, and liquid crystal panels.

The communication unit 24 enables the server device 2 to exchange data with the portable terminal device 3. Through it, the server device 2 receives information transmitted from the portable terminal device 3 and transmits information to it. The update request information 311 is an example of information received from the portable terminal device 3. The selected dictionary information 212 is an example of information transmitted to it. When transmitting the selected dictionary information 212, the communication unit 24 refers to the update request information 311 to identify the portable terminal device 3 that sent it.

FIG. 4 is a diagram showing the functional blocks of the server device 2 together with the flow of data. The selection unit 200 shown in FIG. 4 is a functional block implemented by the CPU 20 operating according to the program 210.

The selection unit 200 searches the database 211 in the storage device 21 according to the event information 372 contained in the update request information 311. It then selects one speech dictionary candidate from among those registered. The selection unit 200 also creates the selected dictionary information 212 containing the selected candidate.

The event information 372 is created by the portable terminal device 3 and transmitted from it to the server device 2. When the speech dictionary needs to be updated, the portable terminal device 3 creates update request information 311 containing the event information 372 and sends it to the server device 2. As already described, the event information 372 indicates the event detected as currently occurring at the portable terminal device 3. The selection unit 200 searches the database 211 using that event as the search key and identifies the speech dictionary candidate associated with it.

Suppose, for example, that the event information 372 indicates "shopping" as the currently occurring event. With the database 211 illustrated in FIG. 3, the first speech dictionary, which is associated with "shopping," is selected as the one speech dictionary candidate. In this case, the selected dictionary information 212 created by the selection unit 200 consists of the first speech dictionary.
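The search performed by the selection unit 200 can be sketched as a keyed lookup. This includes the fallback to the "default" record described for the n-th speech dictionary. The sketch assumes the database is held in memory as a mapping from event name to candidate, and all names in it are hypothetical.

```python
# Hypothetical event -> candidate mapping following the FIG. 3 example.
DATABASE_211: dict[str, str] = {
    "shopping": "first speech dictionary",
    "jogging": "second speech dictionary",
    "default": "n-th speech dictionary",
}


def select_candidate(event: str, database: dict[str, str]) -> str:
    """Use the reported event as the search key; fall back to the "default" record."""
    if event in database:
        return database[event]
    return database["default"]
```

With these contents, `select_candidate("shopping", DATABASE_211)` returns the first speech dictionary. An unregistered event such as `"cooking"` falls through to the n-th speech dictionary.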

The communication unit 24 transmits the selected dictionary information 212 thus created to the portable terminal device 3 that issued the update request. It serves as the response to that update request (update request information 311).
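The server-side exchange can be summarized as a single handler. The reply carries the selected candidate and is addressed to the terminal identified in the update request. This is a hedged sketch that leaves out the actual transport, since the text names only a network such as a mobile telephone network. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UpdateRequest:
    terminal_id: str  # identifier of the requesting terminal (hypothetical field)
    event: str        # event information 372


@dataclass
class SelectedDictionaryInfo:
    destination: str  # terminal identified from the update request
    candidate: str    # the one selected speech dictionary candidate


# Hypothetical database contents following the FIG. 3 example.
DATABASE_211: dict[str, str] = {
    "shopping": "first speech dictionary",
    "jogging": "second speech dictionary",
    "default": "n-th speech dictionary",
}


def handle_update_request(request: UpdateRequest) -> SelectedDictionaryInfo:
    """Select a candidate for the reported event and address the reply to the sender."""
    candidate = DATABASE_211.get(request.event, DATABASE_211["default"])
    return SelectedDictionaryInfo(destination=request.terminal_id, candidate=candidate)
```

Keeping the destination inside the reply mirrors the text. The communication unit 24 consults the update request to decide which portable terminal device 3 should receive the selected dictionary information 212.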

FIG. 5 is a block diagram showing the portable terminal device 3. The portable terminal device 3 includes a CPU 30, a storage device 31, an operation unit 32, a display unit 33, a communication unit 34, and a speaker 35.

The CPU 30 reads and executes a program 310 stored in the storage device 31, performing operations on various data and generating control signals. In this way, the CPU 30 controls each component of the portable terminal device 3 and computes and creates various data. In other words, the portable terminal device 3 is configured as a general-purpose computer.

The CPU 30 has two operation modes: a normal operation mode, in which all functions are available, and a power saving mode, in which some or all functions are restricted. The power saving mode is a so-called sleep mode. In exchange for restricting some or all functions, it reduces the power consumed by the CPU 30.

The power saving mode may consist of several modes defined in stages. In the following description, "the CPU 30 operates in the power saving mode" also covers the case where the CPU 30 is completely stopped.

The storage device 31 provides the function of storing various data in the portable terminal device 3. Examples of the storage device 31 include:

- RAM and buffers used as a temporary working area of the CPU 30
- read-only ROM
- non-volatile memory (such as NAND memory)
- a hard disk for storing relatively large amounts of data
- portable storage media (PC cards, SD cards, USB memory, etc.) mounted in a dedicated reader

Although FIG. 5 depicts the storage device 31 as a single structure, in practice it usually consists of several of the devices (or media) listed above, adopted as needed. That is, "storage device 31" is a collective term for the group of devices that store data and are accessed by the CPU 30.

In reality, the CPU 30 is an electronic circuit with built-in RAM that can be accessed at high speed. For convenience of explanation, however, this internal storage of the CPU 30 is treated as part of the storage device 31. Data temporarily held by the CPU 30 itself is likewise described as stored in the storage device 31. As shown in FIG. 5, the storage device 31 is used to store the program 310, the update request information 311, and the like.

The operation unit 32 is hardware that the user operates to input instructions to the portable terminal device 3. Examples include various keys and buttons, switches, a touch panel, a pointing device, and a jog dial.

The display unit 33 is hardware that outputs various data to the user by displaying it. Examples include lamps, LEDs, liquid crystal displays, and liquid crystal panels.

The communication unit 34 enables the portable terminal device 3 to carry out data communication with the server device 2. Through the communication unit 34, the portable terminal device 3 receives information transmitted from the server device 2 and transmits information to the server device 2. The information received from the server device 2 includes, for example, the selected dictionary information 212. The information transmitted to the server device 2 includes, for example, the update request information 311.

The speaker 35 is hardware that outputs sound based on audio information. This audio information is generally distinct from the voice information 374, although the voice information 374 may also be played back. The speaker 35 is used to give the user voice guidance and to provide music, broadcast programs, the audio of phone calls, and the like.

The portable terminal device 3 further includes an MPU 36, a storage device 37, an observation device group 38, and a microphone 39.

The MPU 36 reads and executes a program 370 stored in the storage device 37, performing computations on various data, generating control signals, and so on. In this way, the MPU 36 controls each component of the portable terminal device 3 and computes and creates various data.

The MPU 36 is configured as a processor that consumes little power during operation. The device is designed so that the whole portable terminal device 3 draws less power when the MPU 36 runs with the CPU 30 in the power saving mode than when the CPU 30 runs in the normal operation mode. In other words, the MPU 36 is a so-called embedded LSI. Its processing capability is lower than that of the CPU 30, which is the main processor of the portable terminal device 3.

The storage device 37 also provides the function of storing various data in the portable terminal device 3. In other words, like the storage device 31, the storage device 37 holds information in electronic form in the portable terminal device 3.

Examples of the storage device 37 include RAM and buffers used as a temporary working area of the MPU 36, read-only ROM, and non-volatile memory (such as NAND memory). Although FIG. 5 depicts the storage device 37 as a single structure, in practice it usually consists of several of the devices (or media) listed above, adopted as needed. That is, "storage device 37" is a collective term for the group of devices that store data and are accessed by the MPU 36.

In reality, the MPU 36 is an electronic circuit with built-in RAM that can be accessed at high speed. For convenience of explanation, however, this internal storage of the MPU 36 is treated as part of the storage device 37. Data temporarily held by the MPU 36 itself is likewise described as stored in the storage device 37. As shown in FIG. 5, the storage device 37 is used to store the program 370, observation information 371, event information 372, history information 373, voice information 374, the selected dictionary information 212, and the like.

The observation device group 38 consists of a plurality of detection devices that acquire observation information 371. They do so by detecting information about the surrounding environment, information about the movement of the portable terminal device 3 (that is, of the user carrying it), and the like. The observation device group 38 is envisioned to include the following:

- a temperature sensor, an atmospheric pressure sensor, a humidity sensor, and an illuminance sensor
- a vibration sensor
- GPS for determining position
- an imaging device for capturing images of the surroundings
- a gyro sensor, an acceleration sensor, and a magnetic sensor
- a pulse sensor and a blood pressure sensor

The gyro sensor, acceleration sensor, magnetic sensor, and similar devices acquire physical quantities resulting from the user's movement as observation information 371. Known techniques can be applied as appropriate to estimate the posture or activity of the user carrying the portable terminal device 3 from the information these sensors acquire, so a detailed description is omitted here. The voice information 374 acquired by the microphone 39, described later, may also be regarded as part of the observation information 371.

The observation device group 38 is controlled by the MPU 36. It can acquire observation information 371 not only while the CPU 30 is in the normal operation mode but also while the CPU 30 is in the power saving mode. The MPU 36 may, however, stop some of the devices in the observation device group 38 as necessary.

The microphone 39 is hardware that converts surrounding sound into an electrical signal to acquire voice information 374. Like the observation device group 38, the microphone 39 is controlled by the MPU 36. It can acquire voice information 374 not only while the CPU 30 is in the normal operation mode but also while the CPU 30 is in the power saving mode. Through the microphone 39, speech (language) uttered by the user is converted into voice information 374 and stored in the storage device 37.

FIG. 6 shows the functional blocks of the portable terminal device 3 together with the flow of data. The interface unit 360, event detection unit 361, and speech recognition unit 362 shown in FIG. 6 are functional blocks implemented by the MPU 36 operating according to the program 370.

The interface unit 360 controls the input and output of signals between the CPU 30 and the MPU 36. It has three transfer functions:

- it stores the selected dictionary information 212 transferred from the CPU 30 in the storage device 37;
- it transfers the event information 372 created by the event detection unit 361 to the CPU 30;
- it transfers the recognition result of the speech recognition unit 362 to the CPU 30.

The interface unit 360 can also switch the CPU 30 from the power saving mode to the normal operation mode when necessary. This allows the MPU 36, for example, to return the CPU 30 to the normal operation mode and have it take over when complex processing is required.

The event detection unit 361 detects the currently occurring event from among a plurality of events assumed in advance. It bases this on the observation information 371 acquired by the observation device group 38 and on the history information 373. When it detects a specific event (the currently occurring event), the event detection unit 361 creates event information 372 indicating that event. More specifically, the event detection unit 361 uses the continuously acquired observation information 371 to monitor the currently occurring event and detect changes in it. When it detects a change, it creates event information 372 indicating the event that has newly become the currently occurring event.

The event detection unit 361 also creates the history information 373. The history information 373 records which events were detected in the past and on the basis of which observation information 371. For example, the history information 373 may record that the user went jogging on a Sunday morning, that the user cooked at a certain time in the evening, or that the user went shopping at a certain location (a shop). Using this record, events can be detected from the user's behavior patterns and the like, which improves event detection accuracy.
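The detection behavior described above combines three elements: an estimate based on observation and history, change detection against the stored event, and history accumulation. A minimal sketch follows. The estimator is injected as a callable because the description leaves the estimation technique to known methods. `EventDetector`, `Estimator`, and the toy cadence threshold in the usage lines are hypothetical illustrations, not the patented implementation.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical estimator type: (observation, history) -> event name.
Estimator = Callable[[dict, list], str]


class EventDetector:
    """Sketch of event detection unit 361: estimate the current event from
    each new observation (plus past history) and report it only on change."""

    def __init__(self, estimate: Estimator, initial: str = "default") -> None:
        self._estimate = estimate
        self.current_event = initial  # initial value of event information 372
        self.history: list[tuple[dict, str]] = []  # stand-in for history 373

    def observe(self, observation: dict) -> Optional[str]:
        event = self._estimate(observation, self.history)
        if event == self.current_event:
            return None  # no change, so nothing new to report
        self.current_event = event
        self.history.append((observation, event))  # what was detected, and why
        return event  # new event information 372


# Toy estimator (hypothetical threshold): a fast step cadence means jogging.
detector = EventDetector(
    lambda obs, hist: "jogging" if obs["cadence"] > 140 else "default"
)
print(detector.observe({"cadence": 90}))   # None
print(detector.observe({"cadence": 160}))  # jogging
```

Because the detector reports only changes, downstream steps such as sending an update request run only when the situation actually changes.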

The history information 373 may also include the recognition rate of the speech recognition unit 362 and similar data. The recognition rate of the speech dictionary candidate (selected dictionary information 212) chosen on the basis of the event information 372 can then be fed back, so that more suitable speech dictionary candidates are selected from then on. The history information 373 may also be created from information the user inputs by operating the operation unit 32.

The speech recognition unit 362 performs speech recognition based on the voice information 374 acquired by the microphone 39 and the selected dictionary information 212 stored in the storage device 37. Known techniques can be adopted as appropriate for the specific recognition method, so a detailed description is omitted.

In the following, the recognition result of the speech recognition unit 362 is described as being passed to the CPU 30 via the interface unit 360. However, the recognition result need not be passed to the CPU 30. It may, for example, be information processed only by the MPU 36.

This concludes the description of the configuration and functions of the speech recognition system 1. The speech recognition method is described next.

FIG. 7 is a flowchart showing the operation of the server device 2. The steps shown in FIG. 7 are mainly those that the server device 2 executes in carrying out the speech recognition method according to the present invention. Before the steps in FIG. 7 begin, the database 211 is assumed to have been created in advance and stored in the storage device 21. That is, the step of storing the plurality of events assumed on the portable terminal device 3 side and the plurality of speech dictionary candidates in the storage device 21, in association with each other, has already been completed.

Once operation has started, the server device 2 monitors whether update request information 311 has been received from a portable terminal device 3 (step S1).

When update request information 311 is received (Yes in step S1), the selection unit 200 searches the database 211 based on the event information 372 contained in the received update request information 311.

As already described, the event information 372 contained in the update request information 311 indicates the "currently occurring event" in the portable terminal device 3. The selection unit 200 therefore searches the database 211 using that event as the search key and selects the speech dictionary candidate associated with it (step S2).

Each speech dictionary candidate associated with an event in the database 211 is a speech dictionary that was optimized for that event and registered against it. Consequently, by selecting the candidate associated with the event currently occurring in the portable terminal device 3, the selection unit 200 selects the candidate best suited to that event. For example, when "jogging" is indicated as the currently occurring event, the selection unit 200 can select the second speech dictionary, which was created for a user who is jogging.

After step S2, the selection unit 200 creates selected dictionary information 212 (step S3). This information contains the selected speech dictionary candidate and an identifier for the update request information 311 whose event information 372 was used in the selection.

When new selected dictionary information 212 has been created in this way, the communication unit 24 uses the identifier it contains to identify the corresponding update request information 311. The communication unit 24 then identifies the portable terminal device 3 that sent that update request information 311 and transmits the selected dictionary information 212 to that portable terminal device 3 (step S4). In this way, the server device 2 transmits the selected dictionary information 212 as the response to the update request (update request information 311) from the portable terminal device 3.
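Steps S1 to S4 amount to a request/response cycle in which the reply carries an identifier that leads back to the requesting terminal. The sketch below models that cycle in memory. All class and field names are hypothetical. Networking is replaced by an `outbox` list of (address, reply) pairs. The integer `request_id` is an assumed form of the identifier mentioned in step S3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UpdateRequest:       # update request information 311
    request_id: int        # hypothetical form of the identifier used in S3
    terminal_address: str  # identifier of the terminal, e.g. network address
    event: str             # event information 372


@dataclass
class SelectedDictionary:  # selected dictionary information 212
    request_id: int
    dictionary: str


class Server:
    """Sketch of steps S1-S4 on server device 2 (names are illustrative)."""

    def __init__(self, database: dict[str, str]) -> None:
        self.database = database  # stand-in for database 211
        self.pending: dict[int, UpdateRequest] = {}
        self.outbox: list[tuple[str, SelectedDictionary]] = []

    def on_update_request(self, request: UpdateRequest) -> None:
        # S1 (Yes): an update request has been received.
        self.pending[request.request_id] = request
        # S2: the current event is the search key into database 211.
        candidate = self.database[request.event]
        # S3: bundle the candidate with the identifier of the request.
        self._send(SelectedDictionary(request.request_id, candidate))

    def _send(self, info: SelectedDictionary) -> None:
        # S4: the identifier leads back to the request, and thus its sender.
        request = self.pending.pop(info.request_id)
        self.outbox.append((request.terminal_address, info))


server = Server({"shopping": "first speech dictionary",
                 "jogging": "second speech dictionary"})
server.on_update_request(UpdateRequest(7, "192.0.2.5", "jogging"))
```

In this sketch, an event with no registered candidate raises `KeyError` in step S2. The description does not say how such a case should be handled, so a real system would need its own policy.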

FIG. 8 is a flowchart showing the operation of the portable terminal device 3. The steps shown in FIG. 8 are mainly those that the portable terminal device 3 executes in carrying out the speech recognition method according to the present invention. Before the steps in FIG. 8 begin, the n-th speech dictionary is assumed to have been stored in the storage device 37 as the selected dictionary information 212. That is, the step of storing a default speech dictionary (the n-th speech dictionary) in the storage device 37 of the portable terminal device 3 has already been completed.

FIG. 8 does not show the step of switching the CPU 30 from the normal operation mode to the power saving mode. This switch is made in cases such as the following: no user operation has been detected for a predetermined period; the user directly instructs a switch to the power saving mode; or the application in use determines that the switch should occur. The triggers for switching to the power saving mode are not limited to these, however.

When powered on, the portable terminal device 3 performs predetermined initial settings and then transitions to a state in which speech recognition can be executed. This state is hereinafter called the "operation start state". In the operation start state, the CPU 30 is in either the normal operation mode or the power saving mode. The portable terminal device 3 is also assumed to be carried by the user in the operation start state.

In the operation start state, the portable terminal device 3 creates observation information 371 using the observation device group 38 (step S11). Step S11 is executed periodically and continuously, without any instruction from the user. The observation information 371 created in step S11 is stored in the storage device 37.

When observation information 371 is stored in the storage device 37, the event detection unit 361 detects the currently occurring event based on that observation information 371 and the history information 373. More specifically, the event detection unit 361 analyzes the observation information 371 acquired by the observation device group 38 to grasp the situation. It then estimates the currently occurring event by consulting the user's behavior patterns and similar information recorded in the history information 373. Finally, the event detection unit 361 determines whether the event has changed by comparing the estimate with the event information 372 already stored (step S12).

In the operation start state, therefore, the portable terminal device 3 constantly acquires observation information 371 and monitors whether the currently occurring event has changed. Step S12 is executed in the normal operation mode, as a matter of course, and also in the power saving mode. Step S12 also requires no special instruction from the user, so it runs without the user having to be aware of it. The initial value of the event indicated in the event information 372 is "default".

If the event has changed and the determination in step S12 is Yes, the portable terminal device 3 executes the update request process (step S13).

FIG. 9 is a flowchart showing the update request process executed by the portable terminal device 3. In the update request process, the portable terminal device 3 asks the server device 2 to send a new speech dictionary.

When the update request process starts, the event detection unit 361 creates new event information 372 indicating the detected (newly occurring) event (step S31). The event detection unit 361 also overwrites the event information 372 already stored in the storage device 37 with the newly created event information 372.

The event detection unit 361 then updates the history information 373 (step S32). It uses the newly created event information 372 (the detection result), the observation information 371 referenced in creating it (the information on which the detection result was based), and the like. Past event detection results and related data are thereby accumulated.

Next, the interface unit 360 detects that the event information 372 has been updated and determines whether the CPU 30 is in the power saving mode (step S33). If it is (Yes in step S33), the interface unit 360 switches the CPU 30 to the normal operation mode (step S34). If it is not (No in step S33), the interface unit 360 skips step S34.

In practice, steps S33 and S34 work as follows. When the event information 372 has been updated and must be sent to the CPU 30, the interface unit 360 issues an interrupt signal to the CPU 30. If the CPU 30 is in the power saving mode when the MPU 36 (interface unit 360) issues this interrupt, the interrupt acts as the signal that returns the CPU 30 to the normal operation mode. If the CPU 30 is already in the normal operation mode, the interrupt is not treated as a wake-up signal and is handled as an ordinary interrupt. The actual interface unit 360 therefore does not explicitly determine in step S33 whether the CPU 30 is in the power saving mode.

After steps S33 and S34, the CPU 30 is always in the normal operation mode and ready to receive the event information 372. The interface unit 360 therefore transfers the newly created event information 372 to the CPU 30 (step S35).
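The point about steps S33 and S34 is that the interface unit never inspects the CPU mode. It raises one interrupt, and the CPU's own reaction to that interrupt decides whether a wake-up occurs. The toy model below shows this. `Mode`, `Cpu`, and `notify_and_transfer` are hypothetical names, and a real interrupt controller is reduced to a method call.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    NORMAL = "normal operation mode"
    POWER_SAVING = "power saving mode"


class Cpu:
    """Toy model of CPU 30 reacting to an interrupt from MPU 36."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.received: list[str] = []

    def interrupt(self) -> None:
        if self.mode is Mode.POWER_SAVING:
            self.mode = Mode.NORMAL  # the interrupt doubles as the wake-up
        # otherwise it is simply handled as an ordinary interrupt

    def receive(self, data: str) -> None:
        if self.mode is not Mode.NORMAL:
            raise RuntimeError("CPU 30 must be awake to receive data")
        self.received.append(data)


def notify_and_transfer(cpu: Cpu, event_info: str) -> None:
    """Sketch of interface unit 360 for S33-S35; it never reads cpu.mode."""
    cpu.interrupt()          # S33/S34 collapse into one unconditional interrupt
    cpu.receive(event_info)  # S35: transfer event information 372
```

Modelling the wake-up as a side effect of the interrupt means the same code path works whether the CPU was asleep or awake. This matches the observation above that no explicit mode check is needed.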

On receiving the event information 372 from the MPU 36, the CPU 30 creates update request information 311 based on it (step S36). As already described, the update request information 311 contains the event information 372 and an identifier of the portable terminal device 3 (for example, its network address).

Having created the update request information 311, the CPU 30 directs the communication unit 34 to transmit it. The communication unit 34 then transmits the update request information 311 to the server device 2 (step S37).

Once step S37 has been executed and the communication unit 34 has transmitted the update request information 311, the portable terminal device 3 ends the update request process and returns to the processing shown in FIG. 8.
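Taken together, steps S31 to S37 form a short pipeline. The MPU side records the new event and its history, the CPU is woken, and the CPU packages the event with the terminal's identifier and hands the result to the communication unit. The sketch below runs the whole pipeline on a single in-memory state object. `TerminalState`, `update_request_process`, and the dictionary layout of the request are hypothetical. The split between MPU 36 and CPU 30 is shown only in the comments.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TerminalState:
    address: str                 # identifier of device 3, e.g. network address
    cpu_sleeping: bool = True    # CPU 30 in the power saving mode
    event_info: str = "default"  # event information 372
    history: list = field(default_factory=list)  # history information 373
    outbox: list = field(default_factory=list)   # handed to communication unit 34


def update_request_process(t: TerminalState, new_event: str,
                           observation: dict) -> dict:
    """Sketch of FIG. 9 (S31-S37); the message layout is an assumption."""
    t.event_info = new_event                     # S31 (MPU 36)
    t.history.append({"event": new_event,        # S32 (MPU 36)
                      "observation": observation})
    t.cpu_sleeping = False                       # S33/S34: interrupt wakes CPU 30
    event_for_cpu = t.event_info                 # S35: transfer to CPU 30
    request = {"event": event_for_cpu,           # S36: update request info 311
               "terminal": t.address}
    t.outbox.append(request)                     # S37: transmit to server device 2
    return request
```

Usage: `update_request_process(TerminalState("192.0.2.5"), "jogging", {"cadence": 160})` returns `{"event": "jogging", "terminal": "192.0.2.5"}` and leaves the CPU flagged as awake.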

Returning to FIG. 8, if the determination in step S12 is No, the portable terminal device 3 skips step S13. Thus, unless the currently occurring event changes, the portable terminal device 3 does not execute the update request process (step S13), and no update request information 311 is sent to the server device 2.

In the operation start state, the portable terminal device 3 creates voice information 374 using the microphone 39 (step S14). Step S14 is executed periodically and continuously in the operation start state, without any instruction from the user. The voice information 374 created in step S14 is stored in the storage device 37.

When voice information 374 is stored in the storage device 37, the speech recognition unit 362 executes speech recognition based on that voice information 374 and the selected dictionary information 212 (step S15). It then determines whether recognition succeeded (step S16).

In the portable terminal device 3, steps S14 to S16 are therefore carried out by the MPU 36 and can be executed even while the CPU 30 is in the power saving mode. In other words, the speech recognition system 1 can perform speech recognition at all times while keeping power consumption low.

Moreover, steps S14 to S16 run in the operation start state of the portable terminal device 3 without any special instruction from the user. The user can therefore use speech recognition without being particularly aware of it, which reduces the burden on the user.

If the speech recognition unit 362 succeeds in recognition (Yes in step S16), the MPU 36 executes the recognition result (step S17).

Executing the recognition result in step S17 means that the MPU 36 transfers the recognition result to the CPU 30. Specifically, the speech recognition unit 362 first passes the recognition result to the interface unit 360, which then transfers it to the CPU 30.

If the CPU 30 is in the power saving mode when the recognition result is to be transferred, the interface unit 360 first switches the CPU 30 to the normal operation mode and then transfers the recognition result.
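Steps S15 to S17, including the wake-up of the CPU 30 before delivery, can be sketched as follows. This is a minimal model, not the actual firmware. It assumes that the utterance has already been reduced to a text token and that "recognition" succeeds only when that token is in the selected dictionary. `HostCpu`, `recognize`, and `mpu_handle_utterance` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Mode(Enum):
    NORMAL = "normal"
    POWER_SAVING = "power_saving"


@dataclass
class HostCpu:
    """Hypothetical stand-in for CPU 30."""
    mode: Mode = Mode.POWER_SAVING
    inbox: List[str] = field(default_factory=list)

    def wake(self) -> None:
        self.mode = Mode.NORMAL

    def receive(self, result: str) -> None:
        # A result must never be handed to a CPU that is still asleep.
        assert self.mode is Mode.NORMAL
        self.inbox.append(result)


def recognize(utterance: str, dictionary: Set[str]) -> Optional[str]:
    """Hypothetical recognizer (steps S15/S16): succeeds only if the
    already-transcribed utterance is in the selected dictionary."""
    return utterance if utterance in dictionary else None


def mpu_handle_utterance(utterance: str, dictionary: Set[str], cpu: HostCpu) -> bool:
    result = recognize(utterance, dictionary)   # step S15
    if result is None:                          # step S16: No
        return False                            # CPU 30 is left asleep
    # Step S17: interface unit 360 wakes CPU 30 if needed, then forwards.
    if cpu.mode is Mode.POWER_SAVING:
        cpu.wake()
    cpu.receive(result)
    return True
```

The important property is that a failed recognition never wakes the CPU 30. Only a successful result causes the more power-hungry processor to leave the power saving mode.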

The following example illustrates the processing executed by the CPU 30. Assume that "jogging" has been detected as the currently occurring event and that the second speech dictionary candidate is stored in the storage device 37 as the selected dictionary information 212. In this state, if the user says "pulse", the speech recognition unit 362 performs speech recognition using the second speech dictionary candidate and passes the word "pulse" (text information) to the CPU 30 as the recognition result.

On receiving the recognition result from the MPU 36, the CPU 30 executes the processing that corresponds to it.

In this example, in response to the user's utterance "pulse", the CPU 30 measures the user's pulse rate and controls the speaker 35 to announce it. The speaker 35 then plays back audio such as "120". The user can therefore use the portable terminal device 3 without looking at it or operating it by hand.
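One plausible way for the CPU 30 to map recognition results to actions is a command table. The sketch below assumes such a table. `read_pulse_rate` is a hypothetical sensor read that returns a fixed value, and `speak` only records the text that the speaker 35 would play.

```python
from typing import Callable, Dict, List


def read_pulse_rate() -> int:
    """Hypothetical pulse-sensor read; fixed value for illustration."""
    return 120


def speak(text: str, spoken: List[str]) -> None:
    """Stand-in for driving speaker 35: records what would be spoken."""
    spoken.append(text)


# Hypothetical command table: recognized word -> function producing the reply.
HANDLERS: Dict[str, Callable[[], str]] = {
    "pulse": lambda: str(read_pulse_rate()),
}


def handle_result(result: str, spoken: List[str]) -> bool:
    """Run the handler for a recognition result received from MPU 36."""
    handler = HANDLERS.get(result)
    if handler is None:
        return False
    speak(handler(), spoken)
    return True
```

Calling `handle_result("pulse", spoken)` appends `"120"` to `spoken`. A table keeps each event-specific vocabulary paired with its actions, so new commands can be added without touching the dispatch logic.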

In the operation start state, when the communication unit 34 receives the selected dictionary information 212 transmitted from the server device 2 (Yes in step S18), the portable terminal device 3 determines whether its operation mode is the power saving mode (step S19). If it is (Yes in step S19), the operation mode is switched to the normal operation mode (step S20). Otherwise (No in step S19), the portable terminal device 3 skips step S20.

Steps S18 to S20 are described in more detail below. In the operation start state, the communication unit 34 monitors the network for incoming communications addressed to the portable terminal device 3. When the communication unit 34 detects an incoming communication, it sends an interrupt signal to the CPU 30. In practice, therefore, the communication unit 34 does not itself determine in step S18 whether the received information is the selected dictionary information 212.

If the CPU 30 is in the power saving mode when the interrupt signal arrives from the communication unit 34, that interrupt signal serves as the signal that returns the CPU 30 to the normal operation mode. If the CPU 30 is already in the normal operation mode, the interrupt signal is not treated as a wake-up signal and is handled as an ordinary interrupt. In practice, therefore, the CPU 30 does not explicitly determine in step S19 whether it is in the power saving mode.

If the result of step S19 is No, or after step S20 has been executed, the CPU 30 transfers the selected dictionary information 212 received by the communication unit 34 to the MPU 36 (interface unit 360). The interface unit 360 then stores the transferred selected dictionary information 212 in the storage device 37 (step S21). As a result, the selected dictionary information 212 previously stored in the portable terminal device 3 is replaced with the newly received selected dictionary information 212.
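The interrupt-driven flow of steps S18 to S21 can be modelled as follows. This is a hypothetical sketch. The mode check inside `deliver_interrupt` stands for the interrupt hardware, not for software running on the CPU 30, which is why `on_incoming` never inspects the mode. All class and method names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    POWER_SAVING = "power_saving"


@dataclass
class MpuStorage:
    """Stand-in for storage device 37 holding selected dictionary info 212."""
    selected_dictionary: frozenset = frozenset()


@dataclass
class Cpu:
    """Hypothetical model of CPU 30 receiving interrupts from unit 34."""
    mpu_storage: MpuStorage
    mode: Mode = Mode.POWER_SAVING
    wakeups: int = 0

    def deliver_interrupt(self, payload: frozenset) -> None:
        # Models the interrupt hardware: delivery to a sleeping CPU is itself
        # the wake-up (steps S19/S20); an awake CPU just takes the interrupt.
        if self.mode is Mode.POWER_SAVING:
            self.mode = Mode.NORMAL
            self.wakeups += 1
        self.on_incoming(payload)

    def on_incoming(self, payload: frozenset) -> None:
        # Step S21: forward to MPU 36 (interface unit 360), which overwrites
        # the previously stored dictionary. No mode check happens here.
        self.mpu_storage.selected_dictionary = payload
```

A second interrupt that arrives while the CPU is already awake updates the dictionary again but does not count as another wake-up.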

As already described, in the initial state the n-th speech dictionary is stored in the storage device 37 of the portable terminal device 3 as the selected dictionary information 212. Suppose that, in this state, the second speech dictionary associated with the event "jogging" is received as the selected dictionary information 212. Executing step S21 then replaces the n-th speech dictionary with the second speech dictionary.
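The event-to-dictionary association held in the database 211, and the lookup performed by the selection unit 200, might look like the following minimal sketch. Only "jogging → second speech dictionary" comes from the text. The "walking" entry and all vocabularies are hypothetical. Falling back to the n-th dictionary for unknown events is also an assumption.

```python
from typing import Dict, FrozenSet, Tuple

Candidate = Tuple[str, FrozenSet[str]]  # (dictionary name, vocabulary)

# Hypothetical contents of database 211: event -> speech dictionary candidate.
DATABASE_211: Dict[str, Candidate] = {
    "walking": ("dictionary_1", frozenset({"weather", "route", "time"})),
    "jogging": ("dictionary_2", frozenset({"pulse", "pace", "distance", "time"})),
}

# Assumed general-purpose fallback, as stored in the initial state.
DICTIONARY_N: Candidate = (
    "dictionary_n",
    frozenset({"weather", "route", "time", "pulse", "pace", "distance", "mail", "call"}),
)


def select_dictionary(event: str) -> Candidate:
    """Sketch of selection unit 200: pick one candidate for the detected event."""
    return DATABASE_211.get(event, DICTIONARY_N)
```

Starting from `stored = DICTIONARY_N`, detecting "jogging" and executing `stored = select_dictionary("jogging")` replaces the stored entry with `dictionary_2`, mirroring step S21.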

For example, the words a user speaks while jogging (that is, the words input to the portable terminal device 3) can be expected to be limited to jogging-related vocabulary. Therefore, when "jogging" is detected as the currently occurring event, the system uses the second speech dictionary for speech recognition, whose vocabulary has been curated for jogging. This allows a speech dictionary with a smaller information capacity (size) than a general-purpose speech dictionary to be used without lowering recognition accuracy.
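Matching against a curated, event-specific vocabulary can be illustrated with text matching as a stand-in. This is an illustration only. Real speech recognition matches acoustic features, not spelled words. Here a slightly garbled transcript is snapped to the closest entry with Python's standard `difflib.get_close_matches`, and anything too far from every entry is rejected, which corresponds to "No" in step S16. The vocabulary and the 0.6 cutoff are assumptions.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical curated vocabulary for the event "jogging".
JOGGING_VOCABULARY = ["pulse", "pace", "distance", "time"]


def match_word(heard: str, vocabulary: Sequence[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the closest vocabulary entry, or None if nothing is close enough."""
    matches = difflib.get_close_matches(heard, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With a four-word list, each utterance is compared against only four candidates. An out-of-domain word such as "mail" is rejected instead of being forced onto a similar-sounding general-vocabulary entry.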

Because its speech dictionaries are small, the speech recognition system 1 also has the advantage of good responsiveness. As described above, each time the portable terminal device 3 detects a new event, the corresponding speech dictionary is downloaded from the server device 2 to the portable terminal device 3. A large dictionary would take longer to download, which would lengthen the time until the dictionary is ready and degrade responsiveness. In the speech recognition system 1, however, the downloaded speech dictionary (selected dictionary information 212) is small, so the download is short and responsiveness is not sacrificed.
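The effect of dictionary size on download time follows from the basic relation between size $S$, link bandwidth $B$, and transfer time $t$. Connection setup and latency are ignored. The sizes and bandwidth below are hypothetical figures chosen only to show the arithmetic (decimal units, 1 byte = 8 bits).

$$t = \frac{S}{B}, \qquad t_{\text{general}} = \frac{20\ \text{MB} \times 8}{1\ \text{Mbit/s}} = 160\ \text{s}, \qquad t_{\text{selected}} = \frac{200\ \text{kB} \times 8}{1\ \text{Mbit/s}} = 1.6\ \text{s}$$

Because $t$ scales linearly with $S$, shrinking the dictionary by a factor of 100 shortens the download by the same factor.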

In addition, because the speech dictionary is small, the storage device 37 needs only a small capacity. Speech recognition can also be performed by a processor with relatively low processing capability, such as the MPU 36. The cost of the system as a whole can therefore be kept down.

In the technology described in Japanese Patent Application Laid-Open No. 2010-191223, the speech dictionary is not switched unless a worker (user) speaks a phrase that announces the work about to be performed, such as "starting checkout". In other words, the user must consciously and reliably provide the trigger for switching the speech dictionary. The speech recognition system 1, by contrast, automatically detects the currently occurring event from the observation information 371, which the observation device group 38 acquires continuously without the user being aware of it. It uses this detection as the trigger for an update request (update request processing). The user therefore does not need to consciously provide a trigger for switching the speech dictionary, which reduces the burden on the user.
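The automatic trigger can be sketched as a component that sends an update request only when the detected event changes. This is a hypothetical model. The `sent` list stands in for transmitting the update request information 311 to the server device 2, and the class and method names are illustrative.

```python
from typing import List, Optional


class UpdateRequester:
    """Hypothetical sketch: request a new dictionary only on an event change."""

    def __init__(self) -> None:
        self.current_event: Optional[str] = None
        self.sent: List[str] = []  # stands in for update request information 311

    def on_detection(self, event: str) -> bool:
        """Called for every event estimate; returns True if a request was sent."""
        if event == self.current_event:
            return False           # same event: keep the current dictionary
        self.current_event = event
        self.sent.append(event)    # would be transmitted to server device 2
        return True
```

The user never issues a command here. Repeated detections of the same event are absorbed, so the server is contacted only when the situation actually changes.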

In addition, the prior art selects the speech dictionary based on position information, so the user can trigger a switch only at the location where the desired speech dictionary is selected. Consequently, unless the user understands well how locations correspond to speech dictionaries, the prior art may end up selecting an inappropriate speech dictionary. The speech recognition system 1, by contrast, detects the currently occurring event based on diverse observation information 371 (and history information 373), so it can select the optimal speech dictionary without relying on the user.

Furthermore, because the prior art selects the speech dictionary based only on position information, it cannot optimize the dictionary for events unrelated to position. As a result, it lacks versatility and narrows the vocabulary insufficiently. Because the speech recognition system 1 instead detects the currently occurring event based on diverse observation information 371, it can select a speech dictionary better suited to the situation.

As described above, the speech recognition system 1, which recognizes speech by means of a speech dictionary, includes the following components:
- the CPU 30, which can switch its operation mode between a normal operation mode and a power saving mode in which power consumption is lower than in the normal operation mode;
- the storage device 21, which stores the database 211 associating a plurality of events assumed in advance with a plurality of speech dictionary candidates;
- the observation device group 38, which acquires physical quantities for detecting events as observation information 371;
- the microphone 39, which acquires speech as speech information 374;
- the storage device 37, which stores the selected dictionary information 212;
- the MPU 36, which accesses the storage device 37.

The MPU 36 includes two units. The event detection unit 361 detects the currently occurring event from among the plurality of events assumed in advance, based on the observation information 371 acquired by the observation device group 38. The speech recognition unit 362 performs speech recognition based on the speech information 374 acquired by the microphone 39 and the selected dictionary information 212 stored in the storage device 37. The speech recognition system 1 further includes the selection unit 200. The selection unit 200 selects one speech dictionary candidate from among the plurality of speech dictionary candidates stored in the storage device 21, according to the event that the event detection unit 361 has detected as currently occurring. The candidate selected by the selection unit 200 is stored in the storage device 37 as the speech dictionary (selected dictionary information 212). The speech recognition system 1 is also designed so that running the MPU 36 while the CPU 30 is in the power saving mode consumes less power than running the CPU 30 in the normal operation mode. Having the low-power MPU 36 perform speech recognition therefore keeps power consumption down. Moreover, the selected dictionary information 212 used for speech recognition is small, but it is optimized for the event, so the recognition rate does not drop.
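The power condition described above can be written as an inequality. $P_{30}^{\text{save}}$ and $P_{30}^{\text{normal}}$ denote the power drawn by the CPU 30 in the power saving and normal operation modes, and $P_{36}$ denotes the power drawn by the running MPU 36:

$$P_{30}^{\text{save}} + P_{36} < P_{30}^{\text{normal}}$$

As a purely hypothetical numeric check, suppose $P_{30}^{\text{save}} = 2\ \text{mW}$, $P_{36} = 10\ \text{mW}$, and $P_{30}^{\text{normal}} = 300\ \text{mW}$. Always-on listening then costs $12\ \text{mW}$ instead of $300\ \text{mW}$, because the left-hand side, not the right, is drawn while waiting for speech.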

In addition, the observation device group 38 acquires, as observation information 371, physical quantities arising from the user's movement, and the event detection unit 361 estimates the user's activity as the currently occurring event. What a user says is closely related to what the user is doing. Estimating the user's activity therefore makes it possible to select a more suitable speech dictionary, which improves speech recognition accuracy.

The event detection unit 361 also estimates the user's activity by estimating the user's posture. Because activity is closely related to posture, this improves the accuracy of activity estimation.
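Posture-then-activity estimation from acceleration samples might be sketched as follows. This is one possible approach, not the method used in the embodiment. Posture comes from the mean gravity direction, assuming the device's y axis runs along the user's body when worn. Activity comes from how much the acceleration magnitude fluctuates. All thresholds are hypothetical.

```python
import math
from statistics import pstdev
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, z) acceleration in m/s^2

# Hypothetical thresholds; a real device would tune or learn these.
UPRIGHT_MAX_TILT_DEG = 45.0
STILL_MAX_SPREAD = 0.5
WALK_MAX_SPREAD = 3.0


def estimate_posture(samples: List[Sample]) -> str:
    """Posture from the mean gravity direction relative to the device y axis."""
    if not samples:
        return "unknown"
    n = len(samples)
    gx = sum(s[0] for s in samples) / n
    gy = sum(s[1] for s in samples) / n
    gz = sum(s[2] for s in samples) / n
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0.0:
        return "unknown"
    tilt = math.degrees(math.acos(min(1.0, abs(gy) / norm)))
    return "upright" if tilt < UPRIGHT_MAX_TILT_DEG else "lying"


def estimate_activity(samples: List[Sample]) -> str:
    """Activity: posture first, then the spread of |a| over the window."""
    posture = estimate_posture(samples)
    if posture != "upright":
        return "resting" if posture == "lying" else "unknown"
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    spread = pstdev(magnitudes)
    if spread < STILL_MAX_SPREAD:
        return "standing"
    if spread < WALK_MAX_SPREAD:
        return "walking"
    return "jogging"
```

Checking posture first rules out activities that are implausible for that posture, for example "jogging" while lying down, before motion intensity is considered.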

In addition, the vocabulary recorded in each of the plurality of speech dictionary candidates is curated according to the event associated with that candidate. This keeps the speech dictionary (selected dictionary information 212) small without reducing recognition accuracy.

Moreover, the power-saving effect is even more pronounced when the speech recognition system 1 is applied to the portable terminal device 3, which has only a limited ability to supply its own power.

Further, the storage device 37 stores past history information 373, and the event detection unit 361 estimates the currently occurring event based on this history information 373. This improves the accuracy of event estimation.
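One way the history information 373 could sharpen event estimation is as a prior that breaks ties between observation-based scores. The sketch below assumes a simple weighted blend of the current likelihood and the past frequency of each event. The `weight` parameter and the list-of-events history format are hypothetical, and the embodiment does not specify this formula.

```python
from collections import Counter
from typing import Dict, List


def estimate_with_history(
    likelihoods: Dict[str, float],  # per-event score from current observations
    history: List[str],             # hypothetical format of history info 373
    weight: float = 0.3,            # assumed influence of the history prior
) -> str:
    """Pick the event with the best blend of observation score and history."""
    counts = Counter(history)
    total = sum(counts.values())

    def score(event: str) -> float:
        prior = counts[event] / total if total else 0.0
        return (1.0 - weight) * likelihoods[event] + weight * prior

    return max(likelihoods, key=score)
```

When the observations are ambiguous (for example, walking and jogging scored equally), the event that the user performs more often wins. A strong observation still outweighs a contrary history.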

In the above embodiment, only steps S20 and S34 have been described as steps that switch the CPU 30 from the power saving mode to the normal operation mode. However, the triggers that switch the CPU 30 to the normal operation mode are not limited to the conditions under which these steps are executed.

<2. Modified examples>
An embodiment of the present invention has been described above. However, the present invention is not limited to that embodiment, and various modifications are possible.

For example, the steps shown in the above embodiment are merely illustrative, and neither their order nor their content is limited to what is shown above. The order and content may be changed as appropriate as long as the same effect is obtained. For example, the present invention can also be realized if the step of updating the event information 372 (step S31) and the step of updating the history information 373 (step S32) are performed in the reverse order.

In the above embodiment, the selection unit 200 was described as being implemented in software by the CPU 20 operating according to the program 210. Likewise, the interface unit 360, the event detection unit 361, and the speech recognition unit 362 were described as being implemented in software by the MPU 36 operating according to the program 370. However, some or all of these functional blocks may instead be implemented in hardware as dedicated logic circuits.

The above embodiment also described an example in which the database 211 is stored in the server device 2 and the CPU 20 (selection unit 200) of the server device 2 selects the speech dictionary candidate. Alternatively, information equivalent to the database 211 may be stored in the storage device 31 of the portable terminal device 3, and the CPU 30 may select the speech dictionary candidate and pass it to the MPU 36.

1 speech recognition system
2 server device
20, 30 CPU
200 selection unit
21, 31, 37 storage device
210, 310, 370 program
211 database
212 selected dictionary information
22, 32 operation unit
23, 33 display unit
24, 34 communication unit
3 portable terminal device
311 update request information
35 speaker
36 MPU
360 interface unit
361 event detection unit
362 speech recognition unit
371 observation information
372 event information
373 history information
374 speech information
38 observation device group
39 microphone

Claims

1. A speech recognition system for recognizing speech by means of a speech dictionary, comprising:
a first arithmetic device capable of switching its operation mode between a normal operation mode and a power saving mode in which power consumption is lower than in the normal operation mode;
a first storage device that stores a plurality of events assumed in advance in association with a plurality of speech dictionary candidates, which are candidates for the speech dictionary;
observation means for acquiring, as observation information, a physical quantity for detecting an event;
a microphone for acquiring speech as speech information;
a second storage device that stores the speech dictionary; and
a second arithmetic device that accesses the second storage device,
wherein the second arithmetic device comprises:
event detection means for detecting a currently occurring event from among the plurality of events assumed in advance, based on the observation information acquired by the observation means; and
speech recognition means for performing speech recognition based on the speech information acquired by the microphone and the speech dictionary stored in the second storage device,
the speech recognition system further comprises selection means for selecting one speech dictionary candidate from among the plurality of speech dictionary candidates stored in the first storage device according to the event detected by the event detection means as the currently occurring event,
the one speech dictionary candidate selected by the selection means is stored in the second storage device as the speech dictionary, and
the power consumed when the second arithmetic device is operated while the first arithmetic device is operated in the power saving mode is smaller than the power consumed when the first arithmetic device is operated in the normal operation mode.

2. The speech recognition system according to claim 1, wherein
the observation means acquires, as the observation information, a physical quantity resulting from movement of a user, and
the event detection means estimates the user's behavior as the currently occurring event.

3. The speech recognition system according to claim 2, wherein
the event detection means estimates the user's behavior by estimating the user's posture.

4. The speech recognition system according to any one of claims 1 to 3, wherein
the vocabulary recorded in each of the plurality of speech dictionary candidates is curated according to the event associated with that candidate.

5. The speech recognition system according to any one of claims 1 to 4, comprising:
a portable terminal device that is carried by a user and includes the first arithmetic device, the second arithmetic device, and the second storage device; and
a server device that is connected to the portable terminal device so as to be capable of data communication and includes the first storage device and the selection means.

6. The speech recognition system according to any one of claims 1 to 5, wherein
the second storage device stores past history information, and
the event detection means estimates the currently occurring event based on the history information stored in the second storage device.

7. A speech recognition method for recognizing speech by means of a speech dictionary, comprising:
storing, in a first storage device, a plurality of events assumed in advance in association with a plurality of speech dictionary candidates, which are candidates for the speech dictionary;
switching the operation mode of a first arithmetic device between a normal operation mode and a power saving mode in which power consumption is lower than in the normal operation mode;
acquiring, by observation means, a physical quantity for detecting an event as observation information;
detecting, by a second arithmetic device, a currently occurring event from among the plurality of events assumed in advance, based on the observation information acquired by the observation means;
selecting one speech dictionary candidate from among the plurality of speech dictionary candidates stored in the first storage device according to the event detected by the second arithmetic device as the currently occurring event;
storing the selected speech dictionary candidate as the speech dictionary in a second storage device accessed by the second arithmetic device;
acquiring speech as speech information by a microphone; and
performing speech recognition, by the second arithmetic device, based on the speech information acquired by the microphone and the speech dictionary stored in the second storage device,
wherein the power consumed when the second arithmetic device is operated while the first arithmetic device is operated in the power saving mode is smaller than the power consumed when the first arithmetic device is operated in the normal operation mode.