JP2009075582A

JP2009075582A - Terminal device, language model creation device, and distributed speech recognition system

Info

Publication number: JP2009075582A
Application number: JP2008219820A
Authority: JP
Inventors: Atsushi Sasaki; 淳志佐々木; Toshihiro Shiren; 俊弘枝連; Masanori Nakamura; 正規中村; Yutaka Kondo; 裕近藤
Original assignee: Advanced Media Inc
Current assignee: Advanced Media Inc
Priority date: 2007-08-29
Filing date: 2008-08-28
Publication date: 2009-04-09

Abstract

PROBLEM TO BE SOLVED: To provide a terminal device, a language model creation device and a distributed speech recognition system, which easily improve the accuracy of speech recognition with respect to different notation depending on a context. SOLUTION: A mobile phone 200 has a speech data transmission unit 208 for transmitting speech data to a speech recognizer which performs speech recognition processing by using a language model, and a mail processing unit 205 for transmitting a mail text of normal transmission mail to the language model creation device which creates the language model as the mail for language model creation. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a terminal device that sends mail and also transmits voice data to a speech recognition device that performs speech recognition using a language model, to a language model creation device that creates the language model, and to a distributed speech recognition system using these devices.

In recent years, various terminal devices such as mobile phones have come to create character strings by voice input through a microphone rather than by character input with key switches (see, for example, Patent Document 1 and Patent Document 2).

In the techniques described in Patent Document 1 and Patent Document 2, a speech recognition device that converts voice data into text by speech recognition is placed on a network. The terminal device transmits voice data to the speech recognition device. The speech recognition device creates a character string from the voice data by referring to a speech recognition database that includes an acoustic model, a dictionary, and a language model, and returns the character string to the terminal device. As a result, the terminal device can easily create a character string such as a mail text.

Patent Documents 1 and 2 also describe techniques for updating the speech recognition database.

In the techniques described in Patent Documents 1 and 2, the speech recognition device refers to the dictionary of the speech recognition database, converts the voice data received from the terminal device into a character string, and returns the character string to the terminal device. The terminal device accepts user corrections to the character string created by the speech recognition device and transmits the corrected character string to the speech recognition device. The speech recognition device creates and sends a mail whose text is the received character string, and also corrects the dictionary of the speech recognition database based on the corrected portions. This improves the accuracy with which the written form of a word is recognized for a given kana reading.

Furthermore, Patent Documents 1 and 2 describe a technique for creating a dictionary for each user.

In the techniques described in Patent Documents 1 and 2, the terminal device transmits voice data and character strings to the speech recognition device in association with a caller number. The speech recognition device creates a plurality of dictionaries, each associated with a caller telephone number. The speech recognition device then performs speech recognition using the dictionary that corresponds to the caller telephone number of the voice data's sender, and corrects the dictionary that corresponds to the caller telephone number of the character string's sender. In this way, a per-user dictionary can be created that reflects the word-notation tendencies that differ by user attribute, and the accuracy of speech recognition can be improved.
JP 2002-215615 A
JP 2001-309049 A

The language model included in the speech recognition database is usually created by applying predetermined statistical processing to character strings prepared as training data. The language model records, as data, the appearance probability and connection probability of each word described in the dictionary.

Because the characteristics of context differ from user to user, the appearance probability and connection probability of each word also differ from user to user. To improve the accuracy of speech recognition, it is therefore desirable to take these differences into account when performing speech recognition.

However, the techniques described in Patent Documents 1 and 2 only correct the dictionary from the corrected portions of a character string, so they cannot create a language model that takes per-user differences in context into account. That is, with these techniques it is difficult to improve the accuracy of speech recognition for notations that differ depending on the context.

It is therefore conceivable to have the speech recognition devices of Patent Documents 1 and 2 create a language model for each user from the character strings sent from the terminal device. This would make it easy to obtain from each user a quantity of character strings sufficient to create a language model.

However, using the techniques described in Patent Documents 1 and 2 requires a system configuration in which the speech recognition device lies on the route of outgoing mail, for example by providing the speech recognition device in the mail server, which makes the approach difficult to apply to existing systems. That is, because building such a system takes cost and effort, it is difficult to improve the accuracy of speech recognition for notations that differ depending on the context.

The present invention has been made in view of these points, and its object is to provide a terminal device, a language model creation device, and a distributed speech recognition system that can easily improve the accuracy of speech recognition for notations that differ depending on the context.

The terminal device of the present invention comprises voice data transmitting means for transmitting voice data to a speech recognition device that performs speech recognition processing using a language model, and mail transmitting means for transmitting the mail text of a normal outgoing mail, as language model creation mail, to a language model creation device that creates the language model.

The language model creation device of the present invention creates a language model used for speech recognition processing from language model creation mail received from a terminal device. It comprises mail receiving means for receiving the language model creation mail, which includes ID information and a mail text; mail processing means for extracting the mail text and the ID information from the received language model creation mail; and language model creating means for learning the extracted mail text and creating the language model for each piece of ID information.

The distributed speech recognition system of the present invention comprises a speech recognition device that performs speech recognition processing on voice data using a language model, a terminal device that transmits voice data to the speech recognition device, and a language model creation device that creates the language model by learning character strings. The terminal device edits the destination of a normal outgoing mail to generate language model creation mail and transmits it to the language model creation device. The language model creation device learns the mail text of the received language model creation mail to create the language model. The speech recognition device performs speech recognition processing, using the language model, on the voice data received from the terminal device.

According to the present invention, the mail text of outgoing mail is collected by mail, so a quantity of character strings sufficient to create a per-user language model can be collected from each user without modifying existing systems. This makes it easy to improve the accuracy of speech recognition for notations that differ depending on the context.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

(Embodiment 1)
FIG. 1 is a system configuration diagram showing an example of the configuration of a voice recognition system serving as a distributed speech recognition system according to Embodiment 1 of the present invention. This embodiment is an example in which the present invention is applied to a system that creates a mail text on a mobile phone using speech recognition.

In FIG. 1, the voice recognition system 100 includes mobile phones 200-1 to 200-M, a mail server 300, and a voice recognition server 400. These devices are connected to a communication network 500 wirelessly or by wire. The mobile phones 200-1 to 200-M have the same configuration, and each includes the terminal device of the present invention. The voice recognition server 400 includes the language model creation device of the present invention and a speech recognition database.

The mobile phone 200 has a mobile phone function and a mail transmission/reception function, and further has a voice input function for creating a mail text by voice input using the voice recognition server 400. Specifically, the mobile phone 200 transmits voice data indicating the features of the input voice to the voice recognition server 400 in association with its own ID (identifier) information. The mobile phone 200 then receives character string data (hereinafter referred to as "text data") that is the result of speech recognition by the voice recognition server 400. The mobile phone 200 sends the mail text created with the voice input function in this way by mail to an arbitrary destination via the mail server 300.

In addition, each time the mobile phone 200 sends a mail, it transmits the character string of that mail's text to the voice recognition server 400 by mail, in association with its own ID information. The voice recognition server 400 uses this mail text to create a language model for each user of the mobile phone 200 (hereinafter referred to as a "user-specific language model").

The mail server 300 manages mail transmission and reception for the mobile phone 200.

The voice recognition server 400 has a speech recognition database and performs predetermined speech recognition processing based on this database. The voice recognition server 400 receives voice data in association with the ID information of the mobile phone 200 and, when a corresponding user-specific language model exists, performs speech recognition processing using that user-specific language model. The voice recognition server 400 then returns the text data created as the speech recognition result to the sender of the voice data.

The voice recognition server 400 also receives, by mail, character strings sent from the mobile phone 200 in association with the ID information of the mobile phone 200. The voice recognition server 400 then learns the received character strings and creates a user-specific language model associated with the ID information of the mobile phone 200.

The communication network 500 is, for example, the Internet. The mobile phone 200, the mail server 300, and the voice recognition server 400 communicate with each other via the communication network 500 using a communication protocol such as TCP/IP (transmission control protocol/internet protocol). The mobile phone 200, the mail server 300, and the voice recognition server 400 also send and receive mail using SMTP (simple mail transfer protocol) and POP3 (post office protocol version 3).

According to this voice recognition system 100, each time a mail is sent from the mobile phone 200, a character string with the same content as the text of the sent mail is transmitted to the voice recognition server 400 by mail. As a result, the same character string as the mail text created by the user of the mobile phone 200 is sent to the voice recognition server 400 automatically. That is, a quantity of text data sufficient to create a user-specific language model can be collected at the voice recognition server 400 without requiring any special awareness or operation from the user and without complicated processing. In addition, as the user creates mail texts using the speech recognition function, the system gradually learns the characteristics of that user's mail texts together with context information, which improves speech recognition accuracy.

Next, the configuration of the mobile phone 200 will be described.

FIG. 2 is a block diagram showing the configuration of the mobile phone 200.

As shown in FIG. 2, the mobile phone 200 includes an ID storage unit 201, a wireless unit 202, an antenna unit 203, an operation unit 204, a mail processing unit 205, a microphone 206, a feature amount extraction unit 207, a voice data transmission unit 208, a text data receiving unit 209, a display 210, a speaker 211, and a control unit 212. The mail processing unit 205 includes a BCC (blind carbon copy) generation unit 213.

The ID storage unit 201 stores ID information unique to the mobile phone 200. The ID storage unit 201 is, for example, a SIM (subscriber identity module) of the kind installed in many mobile phones 200. The ID information in this case is, for example, a subscriber identification number (IMSI: international mobile subscriber identity) or an identification number assigned to the SIM (SIMNO).

The wireless unit 202 communicates wirelessly, via the antenna unit 203, with a wireless base station (not shown) located in the communication network 500, and thereby connects to the communication network 500.

The operation unit 204 includes key switches (not shown) and accepts various user operations, including character input operations and an operation instructing the start of mail creation by speech recognition.

The mail processing unit 205 creates a mail text based on the user's character input operations and the text data received from the voice recognition server 400. The mail processing unit 205 then sends the created mail text to the mail server 300 via the wireless unit 202, with an arbitrary mail address specified as the destination.

Each time a mail is sent, the BCC generation unit 213 transmits a character string identical to the text of that mail to the voice recognition server 400 by mail, in association with the ID information stored in the ID storage unit 201. Specifically, the BCC generation unit 213 uses the BCC function to add the voice recognition server 400 to the destinations of the outgoing mail, so that a copy of the outgoing mail is sent to the voice recognition server 400.
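The BCC step can be sketched as follows. This is only an illustration under stated assumptions: the domain `asr.example.com`, the function name `add_lm_bcc`, and the convention that the whole account name is the device ID are hypothetical and are not specified in the text above.

```python
from email.message import EmailMessage

# Hypothetical domain of the voice recognition server; not given in the source.
LM_DOMAIN = "asr.example.com"

def add_lm_bcc(msg: EmailMessage, device_id: str) -> EmailMessage:
    """Add a BCC recipient whose account name carries the device ID (assumed format)."""
    lm_address = f"{device_id}@{LM_DOMAIN}"
    existing = msg.get("Bcc")
    if existing:
        # Keep any BCC recipients the user already set.
        del msg["Bcc"]
        msg["Bcc"] = f"{existing}, {lm_address}"
    else:
        msg["Bcc"] = lm_address
    return msg

# An ordinary outgoing mail; the body is left untouched.
mail = EmailMessage()
mail["From"] = "user@example.com"
mail["To"] = "friend@example.com"
mail["Subject"] = "Tomorrow"
mail.set_content("See you at the station at ten.")
add_lm_bcc(mail, "440101234567890")  # made-up IMSI-like value
```

Because only the recipient list changes, the copy that reaches the server carries the same text as the mail the user actually sent.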

The microphone 206 picks up sound, including the user's speech, and converts it into a voice signal.

The feature amount extraction unit 207 analyzes the voice signal output from the microphone 206 and extracts the feature amounts used for speech recognition at the voice recognition server 400. Specifically, the feature amount extraction unit 207 divides the voice signal into frames, applies predetermined processing including Fourier analysis to each frame, and extracts speech feature amounts such as cepstrum parameters (hereinafter simply referred to as "feature amounts"). The feature amount extraction unit 207 then detects, from the analysis result, the speech sections that contain the user's voice, and generates time-series data consisting only of the feature amounts of those speech sections.
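A minimal sketch of this kind of front end is shown below. It is not the patented implementation: the frame size, hop, energy threshold, number of coefficients, and the use of a plain real cepstrum with an energy-based speech detector are all assumptions chosen to keep the example short and dependency-free.

```python
import cmath
import math

def frames(signal, size, hop):
    """Split a sample sequence into overlapping fixed-length frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum (naive O(n^2) DFT)."""
    n = len(frame)
    spectrum = [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
                for k in range(n)]
    log_mag = [math.log(abs(c) + 1e-10) for c in spectrum]
    return [sum(v * cmath.exp(2j * math.pi * k * t / n) for k, v in enumerate(log_mag)).real / n
            for t in range(n)]

def speech_features(signal, size=64, hop=32, energy_threshold=0.01, n_coeffs=12):
    """Return cepstral vectors only for frames whose mean energy exceeds the threshold."""
    out = []
    for frame in frames(signal, size, hop):
        energy = sum(x * x for x in frame) / size
        if energy > energy_threshold:  # crude speech-section detection (assumption)
            out.append(real_cepstrum(frame)[:n_coeffs])
    return out

# A silent stretch followed by a 440 Hz tone sampled at 8 kHz.
silence = [0.0] * 256
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
features = speech_features(silence + tone)
```

Only the frames that overlap the tone produce feature vectors, which mirrors the idea of sending time-series data for speech sections only.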

When creation of a mail text by speech recognition is started, the voice data transmission unit 208 establishes, via the wireless unit 202, a session with the voice recognition server 400 for sending and receiving voice data and text data, and transmits the ID information stored in the ID storage unit 201 to the voice recognition server 400. The voice data transmission unit 208 then packetizes the voice data output from the feature amount extraction unit 207 and transmits it to the voice recognition server 400.

The text data receiving unit 209 receives, via the wireless unit 202, the text data returned from the voice recognition server 400 as the speech recognition result for the voice data.

The display 210 displays a text document as a candidate mail text, based on the user's character input operations and the text data received from the voice recognition server 400. The display 210 also displays various information related to the operation of the mobile phone 200.

The speaker 211 outputs as sound, for example, the voice data sent from the other party during use of the mobile phone function.

The control unit 212 includes a CPU (central processing unit), a storage medium such as a ROM (read only memory) that stores a control program, and a working memory such as a RAM (random access memory), and controls each unit of the mobile phone 200. Each unit of the mobile phone 200 is implemented with, for example, an ASIC (application specific integrated circuit) or a communication circuit.

Next, the configuration of the voice recognition server 400 will be described.

FIG. 3 is a block diagram showing the configuration of the voice recognition server 400.

As shown in FIG. 3, the voice recognition server 400 includes a network interface (I/F) unit 410, a user-specific language model creation unit 420, a speech recognition database (DB) 430, and a voice recognition unit 440.

The network interface unit 410 connects to the communication network 500 by wire.

The user-specific language model creation unit 420 receives, via the network interface unit 410, character strings that were sent from the mobile phone 200 by way of the mail server 300 and that are associated with the ID information of the mobile phone 200. The user-specific language model creation unit 420 then analyzes the received character strings, creates a user-specific language model in association with the ID information, and updates the speech recognition database 430. The user-specific language model creation unit 420 includes a mail reception unit 421, a database (DB) switching unit 422, and a language model creation unit 423. The mail reception unit 421 includes a document extraction unit 424 and an ID extraction unit 425.

The mail reception unit 421 receives mail addressed to the voice recognition server 400 itself. Specifically, the mail reception unit 421 acquires, from the mail server 300, mail whose destination is the domain name of the voice recognition server 400.

The document extraction unit 424 extracts the text of the received mail as the character string to be learned and outputs it to the language model creation unit 423.

The language model creation unit 423 applies predetermined statistical processing to the mail text extracted by the mail reception unit 421 and creates a user-specific language model. Specifically, when an existing user-specific language model is the processing target, the language model creation unit 423 creates a language model from the statistical processing result of the newly received mail text and revises the user-specific language model 434 accordingly. When creating a new user-specific language model, it registers the created language model as a user-specific language model 434 in association with the ID information extracted from the underlying received mail.
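The register-or-revise behavior can be illustrated with a small store of per-ID statistics. This is a sketch under assumptions: the class name `UserModelStore`, the use of raw bigram counts as the "model", and the pre-tokenized word lists are all hypothetical simplifications (real Japanese mail text would first need word segmentation).

```python
from collections import Counter

class UserModelStore:
    """Bigram counts per device ID; a stand-in for the user-specific language models 434."""

    def __init__(self):
        self.counts = {}  # ID information -> Counter of (previous word, word) pairs

    def learn(self, device_id, words):
        """Fold one mail body (already split into words) into the model for device_id."""
        bigrams = Counter(zip(words, words[1:]))
        if device_id in self.counts:
            # A model already exists for this ID: revise it with the new statistics.
            self.counts[device_id].update(bigrams)
            return "updated"
        # No model yet: register a new one under this ID.
        self.counts[device_id] = bigrams
        return "registered"

store = UserModelStore()
first = store.learn("id-A", ["contact", "Oda", "today"])
second = store.learn("id-A", ["contact", "Oda", "tomorrow"])
other = store.learn("id-B", ["call", "home"])
```

Keeping counts rather than probabilities makes the revision step a simple addition; probabilities can be derived from the counts whenever the model is used.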

Meanwhile, the ID extraction unit 425 extracts the ID information of the sending mobile phone 200 that is associated with the received mail, and outputs it to the database switching unit 422. Specifically, the ID extraction unit 425 extracts the sender's ID information from the account name of the destination address of the received mail. The structure of the destination address used when the mobile phone 200 sends mail to the voice recognition server 400 will be described later.
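As a rough illustration of extracting the ID from the account name, the sketch below assumes that the entire local part of the destination address is the device ID. The actual address structure is described later in the document, so this format, the function name `extract_device_id`, and the example domain are assumptions.

```python
from email.utils import parseaddr

def extract_device_id(to_address: str) -> str:
    """Return the account name (local part) of a destination address as the device ID."""
    _display_name, addr = parseaddr(to_address)
    local, sep, _domain = addr.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not a usable destination address: {to_address!r}")
    return local

device_id = extract_device_id("LM collector <440101234567890@asr.example.com>")
```

Carrying the ID in the address itself means the server can identify the user without parsing the mail body or relying on the sender address.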

The database switching unit 422 switches the user-specific language model that is the target of registration and update processing by the language model creation unit 423. Specifically, the database switching unit 422 selects, from the speech recognition database 430, the user-specific language model corresponding to the ID information that the ID extraction unit 425 extracted from the received mail, as the processing target of the language model creation unit 423.

Here, the language model will be described. A language model models the context patterns in the character strings to be learned according to the number of related words, and includes N-gram models such as trigram, bigram, and unigram models.

An N-gram is a combination of N consecutive words. An N-gram model is obtained by extracting N-grams from the character strings to be learned and calculating the appearance probability of each extracted N-gram. In practice, creating an N-gram model involves restricting the vocabulary treated as learning targets, for example by ignoring function words and proper nouns, and cutting off N-grams with a low appearance frequency. This reduces the amount of computation. In addition, the probabilities are smoothed to account for N-grams that have not appeared. This prevents speech recognition accuracy from dropping in the early stage of learning because of insufficient statistics.
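The counting, cutoff, and smoothing steps can be sketched for the bigram case as follows. The choice of add-one (Laplace) smoothing, the sentence boundary markers, and the function names are assumptions for illustration; the text above does not say which smoothing method is used.

```python
from collections import Counter

def train_bigram(sentences, cutoff=1):
    """Count bigrams over tokenized sentences and drop those seen fewer than `cutoff` times."""
    bigrams = Counter()
    vocab = set()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        vocab.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    kept = Counter({bg: c for bg, c in bigrams.items() if c >= cutoff})
    # Context totals are taken over the kept bigrams so the smoothed
    # probabilities for each context still sum to 1.
    context = Counter()
    for (prev, _word), c in kept.items():
        context[prev] += c
    return kept, context, vocab

def bigram_prob(model, prev, word):
    """Add-one smoothed P(word | prev); unseen bigrams get a small nonzero probability."""
    kept, context, vocab = model
    return (kept[(prev, word)] + 1) / (context[prev] + len(vocab))

model = train_bigram(
    [["contact", "Oda"], ["contact", "Oda"], ["call", "Oda"]],
    cutoff=2,
)
p_seen = bigram_prob(model, "contact", "Oda")   # frequent bigram, kept
p_cut = bigram_prob(model, "call", "Oda")       # seen once, removed by the cutoff
```

With a cutoff of 2, the bigram seen only once is discarded, yet smoothing still assigns it a nonzero probability instead of ruling it out.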

That is, because each user-specific language model is created only from the character strings written by that user, it reflects that user's context patterns more closely. For example, when mail containing the character string "尾田さんに連絡" ("contact Oda-san", with the surname written 尾田) has been sent several times from a certain mobile phone 200, the corresponding user-specific language model 434 gives "尾田さんに連絡" a higher appearance probability than the homophonous strings "小田さんに連絡" and "織田さんに連絡", in which the surname Oda is written with different characters.
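The homophone example can be worked through numerically. The sketch below is a deliberate simplification: it counts whole strings rather than word N-grams, applies add-one smoothing across the candidates, and uses a made-up history of three sent mails; the function name `rank_candidates` is hypothetical.

```python
from collections import Counter

def rank_candidates(candidates, user_texts):
    """Rank same-sounding candidate strings by smoothed frequency in the user's sent mail."""
    counts = Counter(t for t in user_texts if t in candidates)
    total = sum(counts.values()) + len(candidates)
    scored = [(c, (counts[c] + 1) / total) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Three spellings that share the reading "Oda-san ni renraku".
candidates = ["小田さんに連絡", "尾田さんに連絡", "織田さんに連絡"]
sent_history = ["尾田さんに連絡"] * 3  # this user always writes 尾田
ranked = rank_candidates(candidates, sent_history)
```

For this history the spelling 尾田 receives probability 4/6 and each of the other two receives 1/6, so the recognizer would prefer the spelling this user actually writes.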

The speech recognition database 430 stores the various data that the voice recognition unit 440 uses in the predetermined speech recognition processing. The speech recognition database 430 stores an acoustic model 431, a dictionary 432, a common language model 433, and user-specific language models 434. The figure shows a state in which the language model creation unit 423 has already created M user-specific language models 434-1 to 434-M corresponding to the mobile phones 200-1 to 200-M.

The acoustic model 431 records, as data, the probabilistic correspondence between the feature amounts obtained from input voice at the mobile phone 200 and phonetic symbols. The acoustic model 431 is created, for example, from a read-speech database that stores, as data, the character strings of newspaper articles together with recordings of those articles being read aloud.

The dictionary 432 records, as data, the phonetic symbols of the words that are targets of speech recognition. The dictionary 432 is created, for example, from the read-speech database described above.

The common language model 433 is data giving general appearance probabilities and connection probabilities for each word described in the dictionary 432. The common language model 433 is created, for example, from the text data in the read-speech database described above. Here, the acoustic model 431, the dictionary 432, and the common language model 433 are described as being shared by all users, but some or all of them may of course be provided separately for each user.

The user-specific language model 434 is a language model that the user-specific language model creation unit 420 creates from mail received from the mobile phone 200, that is, from the character string of the mail body contained in mail sent by the mobile phone 200. As described above, a user-specific language model 434 is created for each user of a mobile phone 200 and is associated with the ID information of that mobile phone 200.

The speech recognition unit 440 performs predetermined speech recognition processing on the speech data received from the mobile phone 200, using the acoustic model 431, the dictionary 432, and the common language model 433, and returns the text data created as the recognition result to the sender of the speech data. The speech recognition unit 440 receives the speech data in association with the ID information of the mobile phone 200 and, when a corresponding user-specific language model 434 exists, also uses that model. The speech recognition unit 440 has a speech data receiving unit 441, a database switching unit 442, a sentence creation unit 443, and a text data transmission unit 444. The speech data receiving unit 441 has an ID receiving unit 445.

The speech data receiving unit 441 establishes a session with the mobile phone 200 in response to a request from the mobile phone 200 and receives the speech data sent from the mobile phone 200.

ＩＤ受信部４４５は、セッション確立の際に携帯電話機２００から送られてくるＩＤ情報を受信する。 The ID receiving unit 445 receives ID information transmitted from the mobile phone 200 when a session is established.

The database switching unit 442 switches the user-specific language model 434 that the sentence creation unit 443 processes. Specifically, the database switching unit 442 selects, from the speech recognition database 430, the user-specific language model 434 corresponding to the ID information received by the ID receiving unit 445 as the processing target of the sentence creation unit 443.
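The selection by ID information can be pictured as a keyed lookup. The following is a minimal sketch, assuming the user-specific models are held in a dictionary keyed by ID information; the function name `select_language_models` and the string placeholders standing in for models are hypothetical and do not appear in the specification.

```python
def select_language_models(terminal_id, common_model, user_models):
    """Return the models to consult for one recognition request.

    The common model is always used; a user-specific model is added
    only when one is registered for the given ID information.
    """
    selected = [common_model]
    user_model = user_models.get(terminal_id)
    if user_model is not None:
        selected.append(user_model)
    return selected


# Hypothetical registry: ID information -> user-specific model.
user_models = {"01": "user-model-01"}
```

A request from an ID with no registered model simply falls back to the common model alone, which matches the "when a corresponding model exists" condition in the text.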

The sentence creation unit 443 performs predetermined speech recognition processing on the speech data using the acoustic model 431, the dictionary 432, and the common language model 433, and generates text data. Specifically, the sentence creation unit 443 obtains the likelihood of each phonetic symbol from the acoustic model 431, the words corresponding to combinations of phonetic symbols from the dictionary 432, and the N-gram appearance probability of each word in context from the common language model 433 and the user-specific language model 434. The sentence creation unit 443 then, for example, searches the speech data for the word string that maximizes the product of the phonetic-symbol likelihood and the word appearance probability, and creates text data from the word string found.
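The "maximize the product" criterion can be illustrated on a small, already enumerated candidate list. This is only a sketch under stated assumptions: a real recognizer searches a large hypothesis space rather than listing candidates, the numeric values are invented for illustration, and the function name `best_word_sequence` is hypothetical. Logarithms are summed instead of multiplying raw probabilities, which selects the same candidate while avoiding numeric underflow.

```python
import math


def best_word_sequence(candidates):
    """Pick the candidate with the largest acoustic likelihood x LM probability.

    candidates: iterable of (words, acoustic_likelihood, lm_probability),
    where both numbers are strictly positive.
    """
    def log_score(candidate):
        _, acoustic, lm = candidate
        return math.log(acoustic) + math.log(lm)

    return max(candidates, key=log_score)[0]


# Three homophonous hypotheses with identical (illustrative) acoustic scores;
# only the language-model probability separates them.
candidates = [
    (("小田", "さん", "に", "連絡"), 0.20, 0.004),
    (("織田", "さん", "に", "連絡"), 0.20, 0.003),
    (("尾田", "さん", "に", "連絡"), 0.20, 0.001),
]
best = best_word_sequence(candidates)
```

With these illustrative numbers the common spelling 小田 wins, which is the situation a user-specific language model is meant to correct for a user who actually writes 尾田.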

When a user-specific language model 434 corresponding to the received ID information exists, the sentence creation unit 443 also refers to that model. Specifically, the sentence creation unit 443 weights the common language model 433 and the user-specific language model 434, and adopts as the appearance probability of each N-gram the sum of the two models' appearance probabilities, each multiplied by its weight. To allow for the insufficient statistics available while a user-specific language model 434 is still being built up, the sentence creation unit 443 may vary these weights according to, for example, the time elapsed since the user-specific language model 434 was created or the number of times it has been updated.
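The weighted sum described here corresponds to what is commonly called linear interpolation of language models. The sketch below makes two assumptions that the specification does not state: the two weights sum to 1, and the user-model weight grows linearly with the number of updates up to a cap. The function names and the parameters `max_weight` and `saturation` are hypothetical.

```python
def interpolated_probability(p_common, p_user, user_weight):
    """Weighted sum of the common and user-specific N-gram probabilities.

    Assumes the weights are (1 - user_weight) and user_weight.
    """
    if not 0.0 <= user_weight <= 1.0:
        raise ValueError("user_weight must be between 0 and 1")
    return (1.0 - user_weight) * p_common + user_weight * p_user


def user_weight_from_updates(update_count, max_weight=0.5, saturation=20):
    """Hypothetical schedule: trust the user model more as it is updated.

    The weight rises linearly from 0 and is capped at max_weight once
    the model has been updated `saturation` times.
    """
    return max_weight * min(update_count, saturation) / saturation
```

A newly created user model (zero updates) then contributes nothing, so recognition behaves exactly as with the common model until enough mail has been collected.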

Using the user-specific language model 434 in this way makes it possible to perform speech recognition that better reflects the user's contextual patterns, and thus to improve recognition accuracy. In the 「尾田さんに連絡」 example above, speech data pronounced "Oda-san ni renraku" yields the text data 「尾田さんに連絡」 rather than 「小田さんに連絡」 or 「織田さんに連絡」. If the user has a friend whose name is written 尾田 and often mentions that friend as a contact in mail to other people, the utterance "Oda-san ni renraku" is likely to mean 「尾田さんに連絡」. In other words, the character string the user wants can be selected with higher probability, which means that recognition accuracy has improved.

The text data transmission unit 444 packetizes the text data output from the sentence creation unit 443 and transmits it to the mobile phone 200 via the network interface unit 410.

Although not shown, the speech recognition server 400 has a CPU, storage media such as an HDD (hard disk drive) and a ROM that store a control program, a working memory such as a RAM, and so on. The functions of the units described above are realized by the CPU executing the control program.

The operation of the mobile phone 200 and the speech recognition server 400 configured as described above is explained below.

まず、携帯電話機２００の動作について、フローチャートを用いて説明する。ここでは、メール作成に関する動作のみについて説明を行う。 First, the operation of the mobile phone 200 will be described using a flowchart. Here, only operations related to mail creation will be described.

図４は、携帯電話機２００のメール作成に関する動作の流れを示すフローチャートである。 FIG. 4 is a flowchart showing a flow of operations related to mail creation of the mobile phone 200.

In step S1100, the control unit 212 determines whether the start of mail creation by speech recognition has been instructed, for example by an operation of the operation unit 204. If it has not been instructed (S1100: NO), the process proceeds to step S1200; if it has been instructed (S1100: YES), the process proceeds to step S1300.

In step S1200, the control unit 212 determines whether the start of mail creation by ordinary character input on the operation unit 204 has been instructed. If the start of ordinary mail creation has not been instructed (S1200: NO), the process proceeds to step S2300, described later; if it has been instructed (S1200: YES), the process proceeds to step S1400.

In step S1300, the speech data transmission unit 208 starts communication with the speech recognition server 400 using the TCP/IP protocol, establishes a session for speech recognition processing, reads the ID information from the ID storage unit 201, and transmits it to the speech recognition server 400.

In step S1500, the control unit 212 starts speech input through the microphone 206 and feeds the speech signal output from the microphone 206 to the feature extraction unit 207.

In step S1600, the feature extraction unit 207 analyzes the speech signal to extract feature values and outputs speech data, which is time-series data of those feature values.

In step S1700, the speech data transmission unit 208 packetizes the speech data output from the feature extraction unit 207 and transmits it to the speech recognition server 400. For example, the speech data transmission unit 208 accumulates the speech data and transmits it to the speech recognition server 400 in one batch when an operation ending speech input is performed on the operation unit 204.

In step S1800, the text data receiving unit 209 waits to receive a speech recognition result from the speech recognition server 400. If no speech recognition result has been received (S1800: NO), the process proceeds to step S1900.

In step S1900, the text data receiving unit 209 determines whether a predetermined time has elapsed since the speech data transmission unit 208 transmitted the speech data, that is, whether a timeout has occurred. If no timeout has occurred (S1900: NO), the process returns to step S1800; if a timeout has occurred without a speech recognition result being received (S1900: YES), the process proceeds to step S2300, described later. In that case, the mobile phone 200 may use the display 210 to notify the user that mail cannot be created by speech recognition.

If the text data receiving unit 209 receives the speech recognition result before the timeout (S1800: YES), the process proceeds to step S2000.
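The S1800/S1900 loop amounts to polling for a result until a deadline passes. The following is a minimal sketch, assuming a hypothetical `poll` callable that returns the recognition result or `None` when nothing has arrived yet; the injectable `clock` and `sleep` parameters are an illustrative design choice, not part of the specification.

```python
import time


def wait_for_result(poll, timeout_s, interval_s=0.05,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until a result arrives or the timeout expires.

    Returns the result (S1800: YES) or None on timeout (S1900: YES).
    """
    deadline = clock() + timeout_s
    while True:
        result = poll()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

A caller that receives `None` would proceed as in step S2300 and could notify the user that mail creation by speech recognition is unavailable.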

In step S1400, on the other hand, since the start of ordinary mail creation has been instructed, the control unit 212 starts ordinary text data creation by character input on the operation unit 204, and the process proceeds to step S2000.

In step S2000, the control unit 212 displays the text data of the speech recognition result, or the text data entered by key input, as a character string on the display 210. At this time, the control unit 212 accepts editing of the character string through the operation unit 204 as necessary. When the character string is finalized as the mail body, a destination mail address is specified, and mail transmission is instructed, the process proceeds to step S2100.

In step S2100, the mail processing unit 205 prepares to send a language model creation mail. The language model creation mail is a mail for sending the character string contained in the body of the outgoing mail to the speech recognition server 400 as material for creating the user-specific language model 434. It contains the same character string as the outgoing mail, is addressed to the speech recognition server 400, and contains ID information that uniquely identifies the sender.

図５は、通常の送信メールの構成と、この通常の送信メールに対応して生成される言語モデル作成用メールの構成とを示す図である。 FIG. 5 is a diagram showing a configuration of a normal transmission mail and a configuration of a language model creation mail generated corresponding to the normal transmission mail.

As shown in FIG. 5, in a normal outgoing mail 610, the mail address specified by the user is written in TO as the destination, and the finalized text is written as the mail body. The figure shows a case in which nothing is written in CC (carbon copy) or BCC as additional destinations. In the mail 620 that includes the language model creation mail, on the other hand, the BCC generation unit 213 writes the destination address of the language model creation mail (hereinafter the "language model creation address") in BCC as an additional address.

The language model creation address is a mail address whose domain part is the domain name of the speech recognition server 400 and whose account part is the ID information of the mobile phone 200. The example shown is the case in which the ID information stored in the ID storage unit 201 is "01" and the domain name of the speech recognition server 400 is "SRserver.ne.jp". The language model creation address thus makes it possible to send the body of an outgoing mail to the speech recognition server 400 in association with the ID information. Moreover, because the language model creation address is specified in BCC, the terminal at "AAA@bbb.ne.jp", the original destination of the outgoing mail, is not made aware that a language model creation mail has been sent.
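The address construction and the BCC placement can be sketched with Python's standard `email.message.EmailMessage` class. This is an illustration only, not the handset's actual mail software: the sender address `user@example.jp` and the function names are hypothetical, while the ID "01", the domain "SRserver.ne.jp", and the recipient "AAA@bbb.ne.jp" are the example values given in the text.

```python
from email.message import EmailMessage


def language_model_address(terminal_id, server_domain):
    """Account part = ID information, domain part = recognition server domain."""
    return f"{terminal_id}@{server_domain}"


def build_outgoing_mail(sender, recipient, body, terminal_id, server_domain):
    """Build one message whose BCC carries the language model creation address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Bcc"] = language_model_address(terminal_id, server_domain)
    msg.set_content(body)
    return msg


mail = build_outgoing_mail(
    sender="user@example.jp",      # hypothetical sender address
    recipient="AAA@bbb.ne.jp",     # example recipient from the text
    body="尾田さんに連絡",
    terminal_id="01",
    server_domain="SRserver.ne.jp",
)
```

If such a message were handed to `smtplib.SMTP.send_message`, the BCC address would be used as a delivery recipient while the `Bcc` header itself is not transmitted, which is what keeps the original recipient unaware of the copy.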

In step S2200 of FIG. 4, the mail processing unit 205 transmits the normal mail to its original destination and the language model creation mail to the speech recognition server 400, both via the mail server 300.

Then, in step S2300, the mobile phone 200 determines whether termination of the mail creation processing has been instructed, for example by a user operation. If termination has not been instructed (S2300: NO), the mobile phone 200 returns to step S1100; if it has been instructed (S2300: YES), the mobile phone 200 ends the series of processes.

In this way, when speech recognition is performed, the mobile phone 200 transmits the speech data to the speech recognition server 400 in association with the ID information, and when mail is sent, it transmits the mail body contained in the outgoing mail to the speech recognition server 400 in association with the ID information. As a result, character strings created by the user can be used very easily for creating a language model, and the resulting language model can be put to use in speech recognition processing.

In addition, the character string is transmitted by sending a copy of the mail, and the ID information is associated with the character string by writing the ID information in the account part of the destination address. This makes maximum use of existing equipment and application software functions and reduces the cost of building the system.

SIM identification information is easy to read out but normally cannot be changed by the user. A SIM is also a storage medium that holds information unique to the user, such as subscriber identification information, and is used by being inserted into whichever terminal the user is currently using. That is, users often continue to use the same SIM even after changing handset models. Therefore, adopting the SIM identification information as the ID information enables easier and more reliable user identification and improves the reliability of speech recognition accuracy.

Next, the operation of the speech recognition server 400 will be described using a flowchart.

FIG. 6 is a flowchart showing the flow of operation of the speech recognition server 400.

In step S3100, the speech data receiving unit 441 determines whether a session with the mobile phone 200 has been established in response to a request from the mobile phone 200. If no session has been established (S3100: NO), the process proceeds to step S3200. If a session has been established (S3100: YES), the process proceeds to step S3300. When the ID receiving unit 445 receives ID information here, the received ID information is output to the database switching unit 442.

In step S3200, the mail receiving unit 421 issues a mail acquisition request to the mail server 300, receives the mail addressed to its own device, and determines whether a language model creation mail has been received from a mobile phone 200. If one has been received (S3200: YES), the process proceeds to step S3400. If not (S3200: NO), the process proceeds to step S4100, described later. Because user-specific language model creation requires less immediacy than speech recognition, the processing of step S3200 may be executed at predetermined time intervals.

From step S3300 onward, speech recognition processing is executed. First, the speech recognition unit 440 determines the language model that the sentence creation unit 443 should refer to, according to the ID information acquired by the ID receiving unit 445. That is, one of the user-specific language models 434 is selected using the ID information as an index. The processing performed by the database switching unit 442 may instead be part of the processing that the sentence creation unit 443 performs when it refers to the speech recognition database 430.

In step S3500, the speech data receiving unit 441 determines whether speech data has been received from the mobile phone 200. If no speech data has been received (S3500: NO), the process proceeds to step S3600; if speech data has been received (S3500: YES), the process proceeds to step S3700 and input of the received speech data to the sentence creation unit 443 begins.

In step S3600, the speech data receiving unit 441 determines whether a predetermined time has elapsed since the session with the mobile phone 200 started without speech data being received, that is, whether a timeout has occurred. If no timeout has occurred (S3600: NO), the process returns to step S3500; if a timeout has occurred without speech data being received (S3600: YES), the process proceeds to step S4100, described later.

In step S3700, the sentence creation unit 443 performs predetermined speech recognition processing with reference to the speech recognition database 430 and creates text data. The sentence creation unit 443 then outputs the created text data to the text data transmission unit 444. If the database switching unit 442 has determined that one of the user-specific language models 434 is to be used, the sentence creation unit 443 also uses that user-specific language model 434.

In step S3800, the text data transmission unit 444 transmits the input text data, as the speech recognition result, to the mobile phone 200 that sent the speech data. The process then proceeds to step S4100, described later.

From step S3400 onward, on the other hand, language model creation processing is executed. The mail receiving unit 421 extracts the text data of the mail body and the ID information from the received language model creation mail, and outputs the text data to the language model creation unit 423 and the ID information to the database switching unit 422.
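Recovering the ID information from a language model creation address is the inverse of writing it into the account part. The sketch below assumes the server has access to the address the mail was delivered to (for example the envelope recipient), since a BCC address normally does not appear in the delivered headers; the specification does not say how the address is obtained, and the function name `extract_terminal_id` is hypothetical.

```python
def extract_terminal_id(address, server_domain):
    """Return the account part if the address belongs to the server's domain.

    Returns None for addresses in other domains or without an account part.
    """
    account, separator, domain = address.rpartition("@")
    if not separator or not account:
        return None
    if domain.lower() != server_domain.lower():
        return None
    return account
```

For the example values in the text, the address formed from ID "01" and domain "SRserver.ne.jp" yields "01", while the ordinary recipient "AAA@bbb.ne.jp" is rejected.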

In step S3900, the database switching unit 422 uses the input ID information as an index to determine the user-specific language model 434 that the language model creation unit 423 should target for creation. The processing performed by the database switching unit 422 may instead be part of the processing that the language model creation unit 423 performs when it creates the user-specific language model 434.

In step S4000, the language model creation unit 423 performs predetermined statistical processing on the input text data and, based on the result, updates or creates the user-specific language model 434 determined in step S3900.
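The specification leaves the "predetermined statistical processing" open. One simple possibility, shown here purely as an assumption, is to accumulate bigram counts per ID and derive maximum-likelihood probabilities from them. The class `UserBigramModel` and the function `learn_from_mail` are hypothetical, and the mail body is assumed to be already split into words (Japanese text would need a morphological analyzer for that step, which is omitted).

```python
from collections import Counter, defaultdict


class UserBigramModel:
    """Bigram counts collected from one user's mail bodies."""

    def __init__(self):
        self.context_counts = Counter()   # how often each word was a left context
        self.bigram_counts = Counter()    # how often each (previous, word) pair occurred

    def update(self, words):
        padded = ["<s>"] + list(words)    # "<s>" marks the start of a mail body
        for previous, word in zip(padded, padded[1:]):
            self.context_counts[previous] += 1
            self.bigram_counts[(previous, word)] += 1

    def probability(self, previous, word):
        total = self.context_counts[previous]
        return self.bigram_counts[(previous, word)] / total if total else 0.0


# ID information -> model; a model is created on first use and updated afterwards.
user_models = defaultdict(UserBigramModel)


def learn_from_mail(terminal_id, words):
    user_models[terminal_id].update(words)


learn_from_mail("01", ["尾田", "さん", "に", "連絡"])
learn_from_mail("01", ["尾田", "さん", "に", "連絡"])
learn_from_mail("01", ["小田", "さん", "に", "連絡"])
```

After these three mails, the model for ID "01" assigns a higher probability to 尾田 than to 小田 at the start of a message, and zero to spellings the user never wrote; a practical model would add smoothing, which is left out of this sketch.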

Then, in step S4100, the speech recognition server 400 determines whether termination of the speech recognition processing has been instructed, for example by a user operation. If termination has not been instructed (S4100: NO), the speech recognition server 400 returns to step S3100; if it has been instructed (S4100: YES), it ends the series of processes.

In this way, the speech recognition server 400 creates, from the mail body contained in a language model creation mail received from a mobile phone 200, a user-specific language model 434 associated with the sender's ID information. The speech recognition server 400 also performs speech recognition on speech data received from a mobile phone 200 by referring to the user-specific language model 434 indexed by the sender's ID information. Character strings created by a user can thus be collected, making maximum use of existing equipment and application software functions, as material for creating a speech model for that user. A language model for each user can therefore be created while keeping the cost of building the system low.

The flow of processing and communication among the devices in the speech recognition system 100 is described below using an example.

FIG. 7 is a sequence diagram showing an example of the flow of processing and communication among the devices in the speech recognition system 100. For simplicity of explanation, the speech recognition unit 440, the user-specific language model creation unit 420, and the speech recognition database 430 are treated here as separate entities.

Each time the mobile phone 200 creates a mail body (S5100), it transmits a normal mail and a language model creation mail carrying the ID information to the mail server 300 (S5200), and the mail server 300 stores these mails (S5300). In this state, when the user-specific language model creation unit 420 of the speech recognition server 400 issues a mail acquisition request to the mail server 300 (S5400), the mail server 300 returns to the user-specific language model creation unit 420 the language model creation mail whose destination address contains the domain of the speech recognition server 400 (S5500).

The user-specific language model creation unit 420 switches the user-specific language model 434 based on the ID information attached to the received language model creation mail, performs predetermined statistical processing on the character string of the mail body (S5600), and updates the speech recognition database 430 (S5700). As a result, the user-specific language model 434 associated with the ID information of the mobile phone 200 is updated to reflect the context of the mail body created on the mobile phone 200 (S5800).

Thereafter, the mobile phone 200 establishes a session with the speech recognition unit 440 of the speech recognition server 400 (S5900) and transmits its own ID information to the speech recognition unit 440 (S6000). The mobile phone 200 then starts speech input and feature extraction from the speech data (S6100), and the speech recognition unit 440 switches the user-specific language model 434 based on the received ID information (S6200). The speech recognition unit 440 receives the speech data from the mobile phone 200 (S6300), refers to the updated speech recognition database 430 (S6400), and generates text data by predetermined speech recognition processing (S6500). The speech recognition unit 440 then returns the generated text data to the mobile phone 200 (S6600).

The mobile phone 200 displays the character string of the speech recognition result as a candidate mail body and accepts edits to the displayed character string (S6700), then transmits it to the mail server 300 as a normal mail and a language model creation mail (S6800). Steps S6700 and S6800 correspond to steps S5100 and S5200 described above.

In this way, in the speech recognition system 100, the body of mail sent from the mobile phone 200 is also sent, in association with the ID information, to the user-specific language model creation unit 420 and used to create the user-specific language model.

As described above, according to the present embodiment, the mobile phone 200 sends the character string contained in an outgoing mail, in association with its own ID information, by mail to the user-specific language model creation unit 420 of the speech recognition server 400. The mobile phone 200 also transmits speech data, in association with its own ID information, to the speech recognition unit 440 of the speech recognition server. The user-specific language model creation unit 420 learns the received character string and creates a user-specific language model 434 associated with the sender's ID information. The speech recognition unit 440 performs speech recognition on the received speech data using the user-specific language model 434 associated with the sender's ID information. Because the bodies of outgoing mail are collected by mail, character strings created by the user can be used very easily for creating a language model without modifying existing systems, and the resulting language model can be put to use in speech recognition processing. That is, the accuracy of speech recognition for notations that differ depending on context can be improved easily.

The types of user-specific language model and speech recognition processing are not limited to those described above. Any language model created from character strings with content that reflects their context, and any speech recognition processing that uses such a model, can of course be applied.

The ID information need not be associated with the character strings and speech data directly. The association may instead be made through other identification information, such as a mail address, that has been associated with the ID information in advance.

The mobile phone 200 may also send the character strings of mail bodies to the speech recognition server 400 in a batch, either periodically or at a time specified by the user, rather than each time a normal mail is sent. In that case, the mobile phone 200 may, for example, store outgoing mails, accept a selection from the user, and edit the destination of each selected mail to create a language model creation mail. Specifically, for example, it writes the language model creation address as a TO or CC destination and creates a language model creation mail containing the body of the selected outgoing mail. This allows outgoing mails in which the user intentionally changed the context to be excluded from learning, further improving speech recognition accuracy.

Even when it does not store outgoing mails, the mobile phone 200 may copy the mail body and send, separately from the original outgoing mail, a mail whose TO or CC destination is the language model creation address.

Furthermore, when the language model creation address is written as the TO destination, the mobile phone 200 may omit the original destination of the outgoing mail from that mail. This prevents the original destination from leaking to the speech recognition server 400 and protects the privacy of both the user of the mobile phone 200 and the mail recipient.
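The privacy-preserving variation above, in which only the mail body is copied and the original destination is left out, can be sketched as follows. This is a minimal illustration using Python's standard `email` package, not the implementation described in the specification. The address `LM_CREATION_ADDRESS` and the function name `build_lm_creation_mail` are hypothetical.

```python
from email.message import EmailMessage

# Hypothetical placeholder; the real language model creation address
# is defined in Embodiment 1 and is not reproduced here.
LM_CREATION_ADDRESS = "lm-create@example.invalid"


def build_lm_creation_mail(outgoing: EmailMessage, sender: str) -> EmailMessage:
    """Copy only the body of an outgoing mail into a new mail whose sole
    TO destination is the language model creation address."""
    lm_mail = EmailMessage()
    lm_mail["From"] = sender
    lm_mail["To"] = LM_CREATION_ADDRESS
    # The original To/Cc headers are deliberately not copied, so the
    # original recipient never reaches the server.
    lm_mail.set_content(outgoing.get_content())
    return lm_mail
```

Because a fresh message is built instead of editing the headers of the original, no recipient header can be carried over by accident.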

(Embodiment 2)
Next, a terminal device used in a distributed speech recognition system according to Embodiment 2 of the present invention will be described. It differs from Embodiment 1 in that received mails are also used as source material for language model creation mails, in order to further improve speech recognition accuracy.

When mails are exchanged conversationally on the same topic, for example with family members or close friends, mails are received from the other party frequently and are likely to be replied to. In such exchanges, the contexts of the two parties' mail bodies are also often similar. If more character strings with a context similar to that of the user's own writing can be collected, speech recognition accuracy can be improved in a shorter time. The following therefore describes a case in which received mails from other users with a high reception frequency are added as language model creation mails.

The mobile phone 200 according to Embodiment 2 has, for example, the same configuration as the mobile phone 200 shown in FIG. 2 of Embodiment 1. However, in addition to the processing described in Embodiment 1, the mail processing unit 205 executes the received mail transfer process described below.

In the received mail transfer process, each time a mail is received from another device, the mail processing unit 205 records the source address of the received mail. Also each time a mail is received, the mail processing unit 205 determines from the previously recorded data, for example by comparison with a threshold, whether mails are received frequently from the sender of that mail. If the reception frequency is high, the mail processing unit 205 creates a language model creation mail containing the character strings of the body of the received mail from that sender, and specifies as its destination the same language model creation address as in Embodiment 1. A copy of the received mail is thereby forwarded to the speech recognition server 400.
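The received mail transfer process can be sketched as a per-sender counter compared against a threshold. The specification only says "comparison with a threshold or the like", so the counting rule, the threshold value, and the names `ReceivedMailForwarder` and `on_receive` below are assumptions made for illustration.

```python
from collections import Counter

RECEIVE_THRESHOLD = 3  # hypothetical value; the specification gives none


class ReceivedMailForwarder:
    """Tracks how often each sender's mail arrives and decides whether a
    received mail body should be forwarded for language model creation."""

    def __init__(self, threshold: int = RECEIVE_THRESHOLD):
        self.threshold = threshold
        self.received_from = Counter()  # source address -> number of mails

    def on_receive(self, sender: str, body: str):
        """Record the source address, then return the body to forward when
        the sender's reception count has reached the threshold, else None."""
        self.received_from[sender] += 1
        if self.received_from[sender] >= self.threshold:
            return body
        return None
```

A real terminal would more likely count mails within a recent time window. A plain cumulative count is used here only to keep the sketch short.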

Thus, according to the present embodiment, the bodies of received mails with a similar context, and not only those of outgoing mails, are sent to the speech recognition server 400 in association with the ID information. The user-specific language model creation unit 420 of the speech recognition server 400 can therefore collect sufficient statistics for the same topics and the same phrasing in a shorter time than when only outgoing mails are used. That is, speech recognition accuracy can be improved in a shorter time.

Received mails may likewise be selected individually and sent to the speech recognition server 400 in a batch. This allows received mails in which the other party intentionally changed the context, or mails from a frequent sender whose context is exceptionally different, to be excluded from learning, further improving speech recognition accuracy.

To guard against prank mail and the like, sending of the mail body may additionally be conditioned on the sender being a party to whom the user also sends mail frequently.

The mail body may also be sent only for received mails from parties who have given permission for the bodies of their own mails to be sent to the speech recognition server 400 as learning material.

(Embodiment 3)
Next, a terminal device used in a distributed speech recognition system according to Embodiment 3 of the present invention will be described. It differs from Embodiment 1 in that the destinations of outgoing mails are grouped and a user-specific language model is created for each group.

For example, the vocabulary and style used in mails to family, mails to friends, and mails to work contacts usually differ. That is, even among mails sent by a single user, the context of the mail body depends on the recipient. Speech recognition accuracy can therefore be improved further by grouping destinations whose outgoing mails have a similar context and creating a user-specific language model 434 for each group. The following describes a case in which a user-specific language model 434 is created for each such group of destinations.

The mobile phone 200 according to Embodiment 3 has, for example, the same configuration as the mobile phone 200 shown in FIG. 2 of Embodiment 1. However, in addition to the processing described in Embodiment 1, the mail processing unit 205 executes the destination grouping process described below. In addition, the mail processing unit 205 and the speech data transmission unit 208 transmit to the speech recognition server 400, along with the ID information of the mobile phone 200, the group ID set for each group in the destination grouping process.

The mobile phone 200 has a phone book in which telephone numbers and mail addresses are registered for each contact. For convenience of search and management, the contacts registered in the phone book are sorted into groups prepared in advance, such as "Family", "Friends", and "Work".

In the destination grouping process, the mail processing unit 205 treats each phone book group as a group of destinations whose outgoing mails have a similar context, and sets a group ID for each phone book group. Identification information already assigned to each group may be used as the group ID.

When sending a language model creation mail, the mail processing unit 205 adds to it, together with its own ID information, the group ID set for the group to which the destination of the normal mail belongs. For example, the mail processing unit 205 appends the group ID to the account part of the language model creation address described in Embodiment 1.
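As an illustration of appending the group ID to the account part of the address, the sketch below splits the address at "@" and adds the group ID before the domain. The address layout of Embodiment 1 is not shown in this section, so the sample address, the separator, and the function name `add_group_id` are all hypothetical.

```python
def add_group_id(lm_address: str, group_id: str, sep: str = "-") -> str:
    """Append a group ID to the account (local) part of a language model
    creation address. The separator is an assumption for illustration."""
    account, at, domain = lm_address.partition("@")
    if not at or not account or not domain:
        raise ValueError(f"not a mail address: {lm_address!r}")
    return f"{account}{sep}{group_id}@{domain}"
```

For example, `add_group_id("01.user123@srserver.example", "g2")` gives `"01.user123-g2@srserver.example"`, which the server side could split again on the same separator.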

Likewise, when establishing a session with the speech recognition server 400, the speech data transmission unit 208 transmits to the server, together with its own ID information, the group ID set for the group to which the destination of the normal mail belongs. For example, the speech data transmission unit 208 transmits its own ID information with the group ID appended.

In this case, the user-specific language model creation unit 420 of the speech recognition server 400 creates each user-specific language model 434 in association with the combination of ID information and group ID. The speech recognition unit 440 of the speech recognition server 400 looks up the user-specific language model 434 using the combination of ID information and group ID as an index.
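Indexing models by the combination of ID information and group ID can be sketched with a mapping keyed by a tuple. The word-count "model" below is a stand-in chosen for brevity and whitespace tokenization is an assumption. The specification does not fix the model type, and the class name `UserLanguageModels` is hypothetical.

```python
from collections import Counter, defaultdict


class UserLanguageModels:
    """Stores one toy language model (word counts) per (ID, group ID) pair."""

    def __init__(self):
        self._models = defaultdict(Counter)  # (id_info, group_id) -> counts

    def learn(self, id_info: str, group_id: str, text: str) -> None:
        """Update the model for this user and group with the mail text."""
        self._models[(id_info, group_id)].update(text.split())

    def lookup(self, id_info: str, group_id: str):
        """Return the model indexed by (ID, group ID), or None if absent."""
        return self._models.get((id_info, group_id))
```

With this layout, text learned for one user's "family" group does not affect the same user's "work" model, which is the separation the embodiment aims for.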

Thus, according to the present embodiment, a plurality of pieces of ID information are switched according to the destination of an outgoing mail and associated with the character strings contained in that mail. Speech recognition can thus take into account a context that differs for each user and each mail destination, improving speech recognition accuracy when composing each individual mail.

(Embodiment 4)
Next, a language model creation device used in a distributed speech recognition system according to Embodiment 4 of the present invention will be described. It differs from Embodiment 1 in that, when a received language model creation mail contains a word whose reading is unknown (hereinafter, an "unknown word"), the reading of that unknown word is resolved.

Here, a word, including an unknown word, is a concept covering any information that can be inserted into a mail body, such as a character, character string, symbol, symbol string, image, or animation, and that should be made available for speech input by assigning it a reading.

FIG. 8 is a block diagram showing the configuration of a speech recognition server according to Embodiment 4 of the present invention, and corresponds to FIG. 3 of Embodiment 1. Parts identical to those in FIG. 3 are given the same reference numerals, and their description is omitted.

As shown in FIG. 8, the speech recognition server 400a has an unknown word processing unit 450a.

The unknown word processing unit 450a resolves the readings of unknown words contained in language model creation mails. It has an unknown word detection unit 451a, an inquiry mail transmission/reception unit 452a, and a dictionary registration unit 453a.

The unknown word detection unit 451a receives the body of a language model creation mail from the document extraction unit 424 and detects unknown words in it. Specifically, it looks up each word contained in the received mail body in the dictionary 432 of the speech recognition database 430, and detects as unknown words those words that are not in the dictionary 432.
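The dictionary lookup performed by the unknown word detection unit can be sketched as follows. Whitespace tokenization is an assumption made to keep the example self-contained, since Japanese text would need a morphological analyzer to split words. The function name `detect_unknown_words` and the dictionary layout (word mapped to reading) are likewise hypothetical.

```python
def detect_unknown_words(body: str, dictionary: dict) -> list:
    """Return the words in the mail body that are absent from the
    dictionary, in order of first appearance and without duplicates."""
    unknown = []
    for word in body.split():  # stand-in for real word segmentation
        if word not in dictionary and word not in unknown:
            unknown.append(word)
    return unknown
```

Removing duplicates at this stage keeps the later steps from asking the user about the same word twice for a single mail.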

The inquiry mail transmission/reception unit 452a asks, by mail, the user who sent an unknown word detected by the unknown word detection unit 451a for its reading, and the result of the inquiry is registered in the dictionary 432. Specifically, it creates a mail that presents the unknown word and asks for its reading (hereinafter, an "inquiry mail") and sends it to the sender of the language model creation mail that contained the unknown word. When it receives, in response to the inquiry mail, a mail giving the reading of the unknown word (hereinafter, a "response mail"), it extracts the reading from the response mail.

The dictionary registration unit 453a registers the reading extracted by the inquiry mail transmission/reception unit 452a in the dictionary 432 in association with the unknown word.

With this speech recognition server 400a, when a received language model creation mail contains an unknown word, the reading of that word can be resolved. Consequently, when speech data with that reading is received, the appropriate word can be obtained as the speech recognition result.

The operation of the speech recognition server 400a is described below.

FIG. 9 is a flowchart showing the flow of operation of the speech recognition server 400a, and corresponds to FIG. 6 of Embodiment 1. Steps identical to those in FIG. 6 are given the same step numbers, and their description is omitted.

When the document extraction unit 424 has extracted the text data of the mail body from a language model creation mail (S3400), the process proceeds to step S3810a. At this point, the document extraction unit 424 outputs the extracted text data and the sender's mail address to the unknown word detection unit 451a of the unknown word processing unit 450a.

In step S3810a, the unknown word detection unit 451a refers to the dictionary 432 and determines whether the text data received from the document extraction unit 424 contains an unknown word. If it does not (S3810a: NO), the process proceeds to step S3900. If it does (S3810a: YES), the process proceeds to step S3820a. At this point, the unknown word detection unit 451a outputs the unknown word and the mail address of its sender to the inquiry mail transmission/reception unit 452a.

In step S3820a, the inquiry mail transmission/reception unit 452a sends an inquiry mail addressed to the mail address received from the unknown word detection unit 451a. It is desirable that the inquiry mail transmission/reception unit 452a set as the source address an address for unknown word resolution (hereinafter, the "unknown word resolution address") that differs from the language model creation address. This makes it easy to handle language model creation mails and response mails separately. The process then proceeds to step S3900, where processing such as updating the user-specific language model is performed based on the language model creation mail.

On the other hand, if the mail receiving unit 421 has not received a language model creation mail (S3200: NO), the process proceeds to step S3210a.

In step S3210a, the inquiry mail transmission/reception unit 452a determines whether it has received a response mail to a past inquiry mail. If it has not (S3210a: NO), the process proceeds to step S4100. If it has (S3210a: YES), the process proceeds to step S3220a. At this point, the inquiry mail transmission/reception unit 452a extracts the unknown word and its reading from the received response mail and outputs them to the dictionary registration unit 453a.

In step S3220a, the dictionary registration unit 453a registers the reading received from the inquiry mail transmission/reception unit 452a in the dictionary 432, in association with the unknown word received from the same unit.

Through this operation, the speech recognition server 400a can resolve the readings of unknown words.

FIG. 10 shows an example of the content of an inquiry mail and of the response mail generated in reply to it. The example assumes that the word "AMI" has been detected as an unknown word and that the user wants "AMI" to be read as "あみ" (ami).

As shown in FIG. 10, the inquiry mail 630a carries as its source address, for example, the unknown word resolution address "02@SRserver.ne.jp", which differs from the language model creation address. As its subject (SUBJECT), it carries, for example, the instruction "AMIの読みを本文に入力し、返信して下さい" ("Enter the reading of AMI in the body and reply"). The mobile phone 200 displays the content of the received inquiry mail 630a.

When the user follows the instruction in the inquiry mail 630a, the response mail 640a contains "あみ" as its mail body.

On receiving the response mail 640a, the inquiry mail transmission/reception unit 452a extracts the unknown word from the subject, that is, the part immediately before "の読みを" ("the reading of") with "Re:" removed, and takes the text in the mail body as the reading of the unknown word. As a result, the reading "あみ" is registered in the dictionary 432 of the speech recognition server 400a in association with the word "AMI". "AMI" thereby becomes available for language model creation, and when the user says "あみ", the speech recognition result "AMI" is obtained.
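The extraction rule for the response mail (strip "Re:" from the subject, take the part before "の読みを" as the unknown word, and take the body as the reading) can be sketched as below. The function name `parse_response_mail` is hypothetical, and the sketch assumes an ASCII "Re:" prefix, possibly repeated, which may not match every mail client.

```python
import re

MARKER = "の読みを"  # fixed phrase from the inquiry subject in FIG. 10


def parse_response_mail(subject: str, body: str):
    """Return (unknown_word, reading) from a response mail, or None when
    the subject or body does not match the expected form."""
    # Remove one or more leading "Re:" prefixes (assumed ASCII form).
    stripped = re.sub(r"^(\s*re\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    word, found, _ = stripped.partition(MARKER)
    word = word.strip()
    reading = body.strip()
    if not found or not word or not reading:
        return None
    return word, reading
```

For the example in FIG. 10, the subject "Re: AMIの読みを本文に入力し、返信して下さい" with body "あみ" yields the pair ("AMI", "あみ").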

To allow for unknown words that are too long to fit in the subject, the inquiry mail may instead carry an instruction in its body such as "Reply with a body containing 'AMI' followed by the reading of 'AMI'". In that case, the body of the response mail will read "AMIあみ". The unknown word and the reading can be separated, for example, by searching for the unknown word from the start of the mail body. Separation becomes easier if a predetermined character or symbol such as ":" is inserted between the unknown word and the reading.
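Separating the unknown word from its reading in a body such as "AMIあみ" or "AMI:あみ" can be sketched as a prefix match on the unknown word followed by an optional separator. The function name `split_word_and_reading` is hypothetical, and the sketch assumes the server already knows which unknown word it asked about.

```python
def split_word_and_reading(body: str, unknown_word: str, separator: str = ":"):
    """Split a response body of the form '<word><reading>' or
    '<word><separator><reading>'. Returns (word, reading) or None."""
    text = body.strip()
    if not text.startswith(unknown_word):
        return None  # the unknown word is not at the front of the body
    reading = text[len(unknown_word):].lstrip()
    if reading.startswith(separator):
        reading = reading[len(separator):].lstrip()
    return (unknown_word, reading) if reading else None
```

Accepting both forms lets the same routine handle replies from users who do or do not type the separator.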

Alternatively, the destination of each inquiry mail may be associated with the unknown word being asked about, and this association used to determine which unknown word a response mail answers. The unknown word and its reading can then be extracted even when the unknown word is not written correctly in the response mail.
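One way to keep the association between inquiry destinations and unknown words is a per-address queue, sketched below. The specification does not say how several outstanding inquiries to the same user are matched, so the first-in, first-out rule and the class name `PendingInquiries` are assumptions.

```python
from collections import defaultdict, deque


class PendingInquiries:
    """Remembers which unknown word was asked of which mail address."""

    def __init__(self):
        self._pending = defaultdict(deque)  # address -> queued unknown words

    def record(self, destination: str, unknown_word: str) -> None:
        """Note that an inquiry about unknown_word was sent to destination."""
        self._pending[destination].append(unknown_word)

    def resolve(self, responder: str):
        """Return the oldest unanswered unknown word for this responder,
        or None if no inquiry is outstanding for that address."""
        queue = self._pending.get(responder)
        if not queue:
            return None
        return queue.popleft()
```

A more robust variant might embed a per-inquiry token in the subject line so that replies can be matched regardless of order.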

Thus, according to the present embodiment, when the learning target contains an unknown word whose reading is not known, the reading is resolved and registered in the dictionary. This makes speech input possible even for words such as pictograms, emoticons, and animations, which are used heavily in mail but have no established reading. That is, the speech recognition rate can be improved when the user utters a word that has no established reading. Moreover, because unknown words are resolved by individual mail inquiries as they arise, a reading can be resolved soon after the unknown word is detected, and speech recognition accuracy can be improved quickly.

A background dictionary in which notations and readings are registered in pairs may be prepared separately from the speech recognition dictionary 432, and an inquiry made only when a word is registered in neither the speech recognition dictionary 432 nor the background dictionary. The background dictionary may be stored on the speech recognition server 400a or on another server accessible from it.
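The two-dictionary check can be sketched as a filter applied to the detected words before any inquiry is sent. Both dictionaries are modeled as plain mappings from notation to reading, and the function name `words_needing_inquiry` is hypothetical.

```python
def words_needing_inquiry(words, recognition_dict: dict, background_dict: dict) -> list:
    """Keep only the words registered in neither the speech recognition
    dictionary nor the background dictionary."""
    return [
        word
        for word in words
        if word not in recognition_dict and word not in background_dict
    ]
```

What happens to a word found only in the background dictionary, for example whether its reading is copied into the recognition dictionary, is not stated in this passage and is left out of the sketch.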

Part or all of the unknown word processing unit 450a may be placed on another device on the network.

(Embodiment 5)
Next, a language model creation device used in a distributed speech recognition system according to Embodiment 5 of the present invention will be described. This embodiment resolves unknown words as in Embodiment 4, but differs from it in that the readings of unknown words are resolved through a GUI (graphical user interface) provided on the web.

FIG. 11 is a block diagram showing the configuration of a speech recognition server according to Embodiment 5 of the present invention, and corresponds to FIG. 8 of Embodiment 4. Parts identical to those in FIG. 8 are given the same reference numerals, and their description is omitted.

As shown in FIG. 11, the speech recognition server 400b has an unknown word processing unit 450b.

The unknown word processing unit 450b resolves the readings of unknown words contained in language model creation mails. In place of the inquiry mail transmission/reception unit 452a of the unknown word processing unit 450a of Embodiment 4, it has an unknown word storage unit 454b and a GUI processing unit 455b.

The unknown word storage unit 454b stores each unknown word detected by the unknown word detection unit 451a, in association with the ID information of its sender, until the reading of that unknown word is resolved.

The GUI processing unit 455b builds a graphical user interface (hereinafter, the "unknown word registration site") that the user can freely access on the web to register the readings of unknown words. The site is built using, for example, CGI (Common Gateway Interface) and is accessible from the mobile phone 200 over the communication network 500 via HTTP (Hypertext Transfer Protocol). On the unknown word registration site, the GUI processing unit 455b displays, from among the unknown words stored in the unknown word storage unit 454b, those extracted from language model creation mails created by the accessing user, and accepts input of readings for the displayed unknown words.

The dictionary registration unit 453a registers the readings entered on the unknown word registration site in the dictionary 432 in association with the corresponding unknown words.

With this speech recognition server 400b, when a received language model creation mail contains an unknown word, the reading of that word can be resolved. In addition, the user can register the readings of unknown words at whatever time the user chooses.

The operation of the speech recognition server 400b is described below.

The operation of the unknown word registration site is described first, followed by the overall operation of the speech recognition server 400b.

The unknown word registration site first acquires the ID information of the accessing party as the user login process. The ID information may be acquired by prompting the user to enter it on a web screen, or by acquiring information such as a serial number from the mobile phone 200. The site then extracts all unknown words associated with the acquired ID information from the unknown word storage unit 454b and lists them on a web screen so that each can be selected individually. When one of the unknown words is selected, the site transitions to a reading input screen for entering its reading. When a reading is entered on the reading input screen and a confirmation operation such as clicking a confirm button is performed, the site temporarily stores the unknown word paired with the entered reading and returns to the unknown word list screen. When a registration operation such as clicking a registration button is performed, or when the user logs out, the site acquires the entered readings as items to be registered.
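The session flow of the unknown word registration site (log in, list, enter readings, register) can be sketched without any web framework as a small state-holding class. The class and method names are hypothetical. The unknown word storage unit is modeled as a mapping from ID information to a set of words, and the dictionary as a mapping from word to reading.

```python
class UnknownWordRegistrationSession:
    """Models one logged-in user's visit to the unknown word registration site."""

    def __init__(self, id_info: str, unknown_store: dict, dictionary: dict):
        self.id_info = id_info          # acquired during login
        self._store = unknown_store     # id_info -> set of unknown words
        self._dictionary = dictionary   # word -> reading
        self._entered = {}              # readings held until registration

    def list_unknown_words(self) -> list:
        """List screen: unknown words associated with this user's ID."""
        return sorted(self._store.get(self.id_info, set()))

    def enter_reading(self, word: str, reading: str) -> None:
        """Reading input screen: temporarily store the word/reading pair."""
        if word not in self._store.get(self.id_info, set()):
            raise KeyError(word)
        self._entered[word] = reading

    def register(self) -> int:
        """Registration or logout: commit the entered readings to the
        dictionary and drop the resolved words from the store."""
        for word, reading in self._entered.items():
            self._dictionary[word] = reading
            self._store[self.id_info].discard(word)
        count = len(self._entered)
        self._entered.clear()
        return count
```

Holding the readings in `_entered` until `register` is called mirrors the temporary storage described above, so nothing reaches the dictionary before the registration operation or logout.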

The unknown word registration site may also be made accessible from a terminal such as a personal computer rather than from the mobile phone 200. When access from the mobile phone 200 is assumed, it is desirable for the site to switch between the list screen and the reading input screen as described above, in view of visibility on a small display and operability with a limited set of key switches. When access from a terminal such as a personal computer is assumed, however, the site may allow the unknown words to be listed and their readings to be entered on a single screen.

FIG. 12 is a flowchart showing the flow of operation of the speech recognition server 400b and corresponds to FIG. 9 of Embodiment 4. Steps identical to those in FIG. 9 are given the same step numbers, and their description is omitted.

If an unknown word is present in the text data of the mail body extracted by the document extraction unit 424 (S3810a: YES), the process proceeds to step S3830b. At this point, the unknown word detection unit 451a outputs the unknown word and the ID information of its sender to the unknown word storage unit 454b.

In step S3830b, the unknown word storage unit 454b stores the unknown word and the ID information input from the unknown word detection unit 451a in association with each other. The process then proceeds to step S3900.

If, on the other hand, the mail receiving unit 421 has not received a language model creation mail (S3200: NO), the process proceeds to step S3230b.

In step S3230b, the GUI processing unit 455b determines whether the user has performed an operation on the unknown word registration site to register the reading of an unknown word. If no such registration operation has been performed (S3230b: NO), the process proceeds to step S4100.

If, on the other hand, a registration operation has been performed (S3230b: YES), the process proceeds to step S3240b. At this point, the GUI processing unit 455b extracts the unknown word and its reading for which the registration operation was performed on the unknown word registration site, outputs them to the dictionary registration unit 453a, and the process proceeds to step S3220a. The unknown word and its reading are thus registered in the dictionary 432 in association with each other. The GUI processing unit 455b also deletes from the unknown word storage unit 454b the unknown word that it output to the dictionary registration unit 453a. This prevents the unknown word registration site from asking about a word whose reading has already been registered.
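
The two branches described above (storing an unknown word with its sender ID, and later registering a reading and deleting the word from storage) might be sketched in Python as follows. This is only an illustration under stated assumptions: the class name is hypothetical, the mail body is assumed to be already split into words, and plain in-memory containers stand in for the dictionary 432 and the unknown word storage unit 454b.

```python
from collections import defaultdict


class UnknownWordProcessor:
    """Hypothetical stand-in for the unknown word processing unit 450b."""

    def __init__(self, dictionary: dict[str, str]):
        self.dictionary = dictionary   # word -> reading (stands in for dictionary 432)
        self.store = defaultdict(set)  # sender ID -> unknown words (stands in for unit 454b)

    def on_mail_text(self, user_id: str, words: list[str]) -> list[str]:
        """S3810a: YES, then S3830b: store each word lacking a reading, keyed by sender ID."""
        unknown = [w for w in words if w not in self.dictionary]
        for w in unknown:
            self.store[user_id].add(w)
        return unknown

    def on_registration(self, user_id: str, readings: dict[str, str]) -> None:
        """S3230b: YES: register each word with its reading, then delete it from storage."""
        for word, reading in readings.items():
            self.dictionary[word] = reading
            self.store[user_id].discard(word)
```

Deleting the word from `store` at registration time is what keeps the site from listing it again, and because the word is now in `dictionary`, later mails containing it no longer yield it as unknown.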

Through this operation, the speech recognition server 400b can resolve the readings of unknown words.

Thus, according to the present embodiment, when the learning target contains an unknown word whose reading is not known, the reading can be resolved and registered in the dictionary, which improves the speech recognition rate. In addition, the user can enter the readings of multiple unknown words written in his or her own mails all at once, at a time that is convenient. This reduces the effort required of the user to enter unknown words.

Part or all of the unknown word processing unit 450b may be placed in another device on the network. In particular, placing the unknown word storage unit 454b and the GUI processing unit 455b together in another device separates out the functions of the unknown word registration site as a whole, which reduces the load on the speech recognition server 400b and also makes it possible to speed up processing of the unknown word registration site.

A user interface (UI) that displays screens operating in the same way as the unknown word registration site may also be provided within the mail application software of the mobile phone. In this case, the speech recognition server, for example, sends each detected unknown word, either as it is detected or periodically, to the mobile phone that was the source of the unknown word, and the mobile phone accumulates the received unknown words. The application software then displays the unknown words and accepts readings in the same way as the unknown word registration site, sends each entered reading to the speech recognition server in association with its unknown word, and has it registered in the dictionary of the speech recognition server. This distributes the processing load of resolving unknown words.

The mobile phone may also send readings of words entered by the user to the speech recognition server before the server prompts for registration of unknown word readings. Such readings are entered, for example, into the user dictionary for character conversion, the learning information of the kana-kanji conversion system, and the phone book stored in the mobile phone. Accordingly, each time any of these data is updated, the mobile phone sends, for example, either all of the updated data or only the updated portion to the speech recognition server. On receiving the data, the speech recognition server detects unknown words in the received data, obtains their readings from the same data, and registers each unknown word in the dictionary in association with its reading. This makes it possible to resolve the reading of an unknown word before it is detected by the speech recognition server.
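
As an illustration of the "updated portion only" option mentioned above, the following Python sketch computes which entries of a terminal-side user dictionary have changed and registers on the server side only the words that are not yet in the dictionary. The function names and the representation of both dictionaries as word-to-reading mappings are assumptions for this example, not details given in the specification.

```python
def changed_entries(previous: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Terminal side: entries that are new, or whose reading changed, since the last upload."""
    return {w: r for w, r in current.items() if previous.get(w) != r}


def register_new_readings(dictionary: dict[str, str], received: dict[str, str]) -> list[str]:
    """Server side: register only words the dictionary lacks and return those words."""
    added = [w for w in received if w not in dictionary]
    for w in added:
        dictionary[w] = received[w]
    return added
```

Sending only the changed entries keeps each upload small, while the server-side check leaves readings that are already registered untouched.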

Each of the embodiments above describes an example in which the present invention is applied to a system that uses speech recognition to compose mail text on a mobile phone, but the invention is not limited to this. It can of course be applied, for example, to a system that includes a speech recognition device performing speech recognition with a language model and various terminal devices, such as personal computers and PDAs (personal digital assistants), that use the speech recognition device to compose mail text.

Although the examples describe the user-specific language model creation unit, the speech recognition database, and the speech recognition unit as being located in the same server, they may instead be located in separate devices on the network.

The terminal device, language model creation device, and distributed speech recognition system according to the present invention are useful as a terminal device, a language model creation device, and a distributed speech recognition system that can easily improve speech recognition accuracy for notations that differ depending on context.

FIG. 1 is a system configuration diagram showing an example of the configuration of a speech recognition system as a distributed speech recognition system according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the configuration of a mobile phone including the terminal device according to Embodiment 1.
FIG. 3 is a block diagram showing the configuration of a speech recognition server including the language model creation device according to Embodiment 1.
FIG. 4 is a flowchart showing the flow of operation of the mobile phone in Embodiment 1.
FIG. 5 is a diagram showing the structure of a language model creation mail in Embodiment 1.
FIG. 6 is a flowchart showing the flow of operation of the speech recognition server in Embodiment 1.
FIG. 7 is a sequence diagram of the speech recognition system in Embodiment 1.
FIG. 8 is a block diagram showing the configuration of a speech recognition server according to Embodiment 4 of the present invention.
FIG. 9 is a flowchart showing the flow of operation of the speech recognition server in Embodiment 4 of the present invention.
FIG. 10 is a diagram showing an example of the contents of an inquiry mail and a response mail in Embodiment 4 of the present invention.
FIG. 11 is a block diagram showing the configuration of a speech recognition server according to Embodiment 5 of the present invention.
FIG. 12 is a flowchart showing the flow of operation of the speech recognition server in Embodiment 5 of the present invention.

Explanation of symbols

100 Speech recognition system
200 Mobile phone
201 ID storage unit
202 Radio unit
203 Antenna unit
204 Operation unit
205 Mail processing unit
206 Microphone
207 Feature extraction unit
208 Speech data transmission unit
209 Text data reception unit
210 Display
212 Control unit
213 BCC generation unit
300 Mail server
400, 400a, 400b Speech recognition server
410 Network interface unit
420 User-specific language model creation unit
421 Mail receiving unit
422 Database switching unit
423 Language model creation unit
424 Document extraction unit
425 ID extraction unit
430 Speech recognition database
431 Acoustic model
432 Dictionary
433 Common language model
434 User-specific language model
440 Speech recognition unit
441 Speech data reception unit
442 Database switching unit
443 Sentence creation unit
444 Text data transmission unit
445 ID receiving unit
450a, 450b Unknown word processing unit
451a Unknown word detection unit
452a Inquiry mail transmission/reception unit
453a Dictionary registration unit
454b Unknown word storage unit
455b GUI processing unit

Claims

A terminal device comprising:
speech data transmission means for transmitting speech data to a speech recognition device that performs speech recognition processing using a language model; and
mail transmission means for transmitting the mail body of a normal outgoing mail, as a language model creation mail, to a language model creation device that creates the language model.

The terminal device according to claim 1, wherein the mail transmission means
generates the language model creation mail by editing the destination of the normal outgoing mail.

The terminal device according to claim 2, wherein the mail transmission means
generates the language model creation mail by writing, in a part of the mail, ID information that is the same as or corresponds to the ID information used when using the speech recognition device.

The terminal device according to claim 2, wherein the mail transmission means,
each time a mail not addressed to the language model creation device is transmitted, adds the language model creation device to the destinations of the transmitted mail.

The terminal device according to claim 1, wherein the mail transmission means
allows the mails to be sent as language model creation mails to be selected in units of outgoing mails and sends the selected mails in a batch.

The terminal device according to claim 1, wherein the mail transmission means
generates the language model creation mail using a language model creation address in which the domain name of the speech recognition device is written in the domain part and the ID information used when using the speech recognition device is written in the account part.

The terminal device according to claim 1, wherein the mail transmission means
further transmits the mail body of a received mail to the language model creation device as a language model creation mail.

The terminal device according to claim 4, wherein the mail transmission means
switches among a plurality of pieces of ID information according to the destination of the outgoing mail and associates the ID information with the character string.

The terminal device according to claim 1, further comprising:
reading input means for accepting input of the reading of a word; and
reading transmission means for transmitting the input reading to the language model creation device as the reading of the word for use in creating the language model.

The terminal device according to claim 9, further comprising
unknown word acquisition means for acquiring an unknown word, which is a word that is unknown in the language model creation device,
wherein the reading input means displays the acquired unknown word and accepts input of its reading.

A language model creation device that creates a language model used for speech recognition processing by using a language model creation mail received from a terminal device, the language model creation device comprising:
mail receiving means for receiving the language model creation mail, which includes ID information and a mail body;
mail processing means for extracting the mail body and the ID information from the received language model creation mail; and
language model creation means for learning the extracted mail body and creating the language model for each piece of ID information.

The language model creation device according to claim 11, further comprising:
reading acquisition means for acquiring the reading of a word from the terminal device; and
dictionary registration means for registering the acquired reading, in association with the word, in a dictionary for language model creation.

The language model creation device according to claim 12, further comprising
unknown word detection means for detecting, in the extracted mail body, an unknown word, which is a word whose reading is unknown,
wherein the reading acquisition means accepts the reading of the unknown word from the terminal device.

A distributed speech recognition system comprising: a speech recognition device that performs speech recognition processing on speech data using a language model; a terminal device that transmits speech data to the speech recognition device; and a language model creation device that creates the language model by learning character strings, wherein
the terminal device generates a language model creation mail by editing the destination of a normal outgoing mail and transmits it to the language model creation device,
the language model creation device creates the language model by learning the mail body of the received language model creation mail, and
the speech recognition device performs speech recognition processing, using the language model, on the speech data received from the terminal device.