JP2010033340A

JP2010033340A - Voice recognition server, communication system, and voice recognition method

Info

Publication number: JP2010033340A
Application number: JP2008195022A
Authority: JP
Inventors: Yoshiteru Chiba (千葉芳晃); Noriyuki Miura (三浦宣之); Yukio Takayashiki (髙屋敷幸夫); Jun Tazawa (田澤淳); Yoshikazu Akagi (赤木美和); Atsushi Miura (三浦淳); Manabu Toyoda (豊田麻名武)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2008-07-29
Filing date: 2008-07-29
Publication date: 2010-02-12

Abstract

PROBLEM TO BE SOLVED: To provide a technology for improving the efficiency of character string conversion.
SOLUTION: A dictionary reception part 121 receives, from a portable terminal 110, dictionary data that has been registered in the portable terminal 110 and in which each character string is associated with its reading. A voice reception part 123 receives, from the portable terminal 110, voice data indicating the characteristics of a voice. A voice recognition part 124 recognizes only the reading of the voice data received by the voice reception part 123. A conversion candidate generation part 125 uses the dictionary data received by the dictionary reception part 121 to generate conversion candidates for the reading recognized by the voice recognition part 124. A conversion candidate transmission part 126 transmits the conversion candidates generated by the conversion candidate generation part 125 to the portable terminal 110.
COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a speech recognition server that performs speech recognition, a communication system, and a speech recognition method.

Conventionally, in a mobile terminal such as an ordinary cellular phone, dictionary data that associates character strings with their readings is registered in the terminal in advance so that character string conversion can be performed during character input. Mobile terminals also provide a user dictionary registration function that lets the user register arbitrary dictionary data, to cover words with unusual readings that are not registered in the terminal.

A technique has also been disclosed in which characters are entered into the character input area of a mobile terminal by using a speech recognition service provided by a speech recognition server (see, for example, Patent Document 1 below). Using a server-provided service allows speech recognition to be performed more easily and at lower cost than giving the mobile terminal its own speech recognition function. The speech recognition server holds pre-registered dictionary data for converting the reading, obtained by recognizing voice data received from the mobile terminal, into a character string.

Patent Document 1: JP 2007-213109 A

However, in the conventional technology described above, the amount of dictionary data that can be registered in the speech recognition server is limited, and because the server provides its service to a large number of mobile terminals, the dictionary data registered there is general-purpose. Consequently, when a conversion specific to the user is needed, for example a person's name or a place name with an unusual reading, the character string the user intends cannot be output accurately as a conversion candidate.

The disclosed speech recognition server, communication system, and speech recognition method are intended to solve the problem described above and to improve the efficiency of character string conversion.

To solve the problem described above and achieve the object, this speech recognition server is required to include: storage means storing dictionary data in which character strings are associated with readings; dictionary receiving means for receiving, from a mobile terminal, dictionary data that is registered in the mobile terminal and differs from the stored dictionary data; voice receiving means for receiving, from the mobile terminal, voice data indicating the characteristics of a voice; speech recognition means for recognizing the reading of the voice data received by the voice receiving means; generating means for generating conversion candidates for the reading recognized by the speech recognition means by using both the dictionary data stored in the storage means and the dictionary data received by the dictionary receiving means; and transmitting means for transmitting the conversion candidates generated by the generating means to the mobile terminal.

With this configuration, conversion candidates for a reading obtained by speech recognition can be generated using dictionary data that the user has registered in the mobile terminal. As a result, even for conversions specific to the user of the mobile terminal, the character string the user intends can be output accurately as a conversion candidate.

The disclosed speech recognition server, communication system, and speech recognition method have the effect of improving the efficiency of character string conversion.

Preferred embodiments of the speech recognition server, communication system, and speech recognition method are described in detail below with reference to the accompanying drawings. In these embodiments, dictionary data registered in the mobile terminal by the user is received, and conversion candidates for the reading obtained by speech recognition are generated using the received dictionary data. This allows the character string intended by the user to be output accurately as a conversion candidate.

(Embodiment 1)
FIG. 1 is a block diagram showing the functional configuration of the speech recognition system according to Embodiment 1. As shown in FIG. 1, the speech recognition system 100 according to Embodiment 1 includes a mobile terminal 110 and a speech recognition server 120. The mobile terminal 110 and the speech recognition server 120 communicate wirelessly with each other over a mobile communication network.

A program that requires character input by the user, such as a mail application, is installed on the mobile terminal 110. The mobile terminal 110 includes a dictionary registration unit 111, a dictionary storage unit 112, a dictionary transmission unit 113, a voice input unit 114, a feature extraction unit 115, a voice transmission unit 116, a conversion candidate reception unit 117, and a conversion candidate output unit 118.

The dictionary registration unit 111 accepts registration of dictionary data from the user. Dictionary data is information that associates a character string with its reading (the kana spelling of its pronunciation). For example, the dictionary data "貞山堀：ていざんぼり" associates the character string "貞山堀" with its reading "ていざんぼり" (teizanbori). Dictionary data may also be a set of such pairs of character strings and readings. The dictionary registration unit 111 outputs the registered dictionary data to the dictionary storage unit 112.
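To make the notion of dictionary data concrete, here is a minimal Python sketch, assuming entries are written in the "string：reading" notation used in this description. The `DictionaryEntry` type and `parse_entry` function are hypothetical names for illustration, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One item of dictionary data: a character string paired with its reading."""
    surface: str  # the character string, e.g. kanji such as "貞山堀"
    reading: str  # its reading in kana, e.g. "ていざんぼり"


def parse_entry(text: str) -> DictionaryEntry:
    """Parse 'string：reading' notation; the full-width and ASCII colons are both accepted."""
    surface, sep, reading = text.replace("\uff1a", ":").partition(":")
    if not sep or not surface.strip() or not reading.strip():
        raise ValueError(f"expected 'string:reading', got {text!r}")
    return DictionaryEntry(surface.strip(), reading.strip())


# Dictionary data may also be a set of such pairs (frozen dataclasses are hashable).
user_dictionary = {
    parse_entry("貞山堀：ていざんぼり"),
    parse_entry("赤坂：あかさか"),
}
```

Making the entry immutable lets a whole user dictionary be held as a set, which matches the description of dictionary data as a collection of string/reading pairs.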

The dictionary storage unit 112 stores the dictionary data output from the dictionary registration unit 111. The stored dictionary data is used, for example, as a conversion dictionary for converting a reading into a character string such as kanji when the user enters text by key operations. The dictionary storage unit 112 also outputs the dictionary data to the dictionary transmission unit 113.

The dictionary transmission unit 113 transmits the dictionary data output from the dictionary storage unit 112 to the speech recognition server 120. This transmission may be performed automatically, without the user of the mobile terminal 110 being aware of it. The voice input unit 114 accepts voice input from the user and outputs the input voice to the feature extraction unit 115.

The feature extraction unit 115 extracts, from the voice output by the voice input unit 114, voice data indicating the characteristics of the voice. The voice data is, for example, the frequency characteristics of the voice. The feature extraction unit 115 outputs the extracted voice data to the voice transmission unit 116, which transmits it to the speech recognition server 120.
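The text only says that the voice data may be, for example, frequency characteristics. As one hedged illustration, the sketch below splits sampled audio into overlapping frames and computes a magnitude spectrum for each frame with a naive DFT. The frame length, hop size, and function names are assumptions; practical recognizers typically use more elaborate features.

```python
import cmath
import math


def frame_spectrum(frame: list[float]) -> list[float]:
    """Magnitude spectrum of one frame via a naive DFT, for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def extract_features(samples: list[float], frame_len: int = 64, hop: int = 32) -> list[list[float]]:
    """Split the signal into overlapping frames and return one spectrum per frame."""
    return [
        frame_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```

Consider a 1 kHz tone sampled at 8 kHz and analyzed with 64-sample frames. Each bin then spans 8000 / 64 = 125 Hz, so the spectral peak of every frame falls in bin 8. Sending such compact features rather than raw audio is one reason the terminal performs extraction before transmission.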

The conversion candidate reception unit 117 receives the conversion candidates transmitted from the speech recognition server 120 and outputs them to the conversion candidate output unit 118. The conversion candidate output unit 118 presents these conversion candidates to the user. By selecting a character string from the presented candidates, the user can confirm the character input to the mail application or other program.

The speech recognition server 120 includes a dictionary reception unit 121, a dictionary database 122, a voice reception unit 123, a speech recognition unit 124, a conversion candidate generation unit 125, and a conversion candidate transmission unit 126. The dictionary reception unit 121 receives the dictionary data transmitted from the mobile terminal 110 and outputs it to the dictionary database 122.

The dictionary database 122 stores a set of dictionary data in advance. This pre-stored dictionary data is general-purpose and applies to the users of many mobile terminals, including the mobile terminal 110. The dictionary database 122 additionally stores the dictionary data output from the dictionary reception unit 121.

The voice reception unit 123 receives the voice data transmitted from the mobile terminal 110 and outputs it to the speech recognition unit 124. The speech recognition unit 124 analyzes the voice data to recognize its reading, and outputs information indicating the recognized reading to the conversion candidate generation unit 125.

Using the dictionary data stored in the dictionary database 122, the conversion candidate generation unit 125 generates conversion candidates for the reading indicated by the information output from the speech recognition unit 124, and outputs the generated candidates to the conversion candidate transmission unit 126. The conversion candidate transmission unit 126 transmits these conversion candidates to the mobile terminal 110.
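Below is a minimal sketch of how the conversion candidate generation unit 125 might consult the dictionary database 122. It assumes a simple in-memory mapping from reading to character strings. It also assumes that the recognized reading is already a single word; segmenting a sentence such as "ていざんぼりでまちあわせね" is outside the scope of this sketch. `DictionaryDatabase` and its methods are hypothetical.

```python
class DictionaryDatabase:
    """Hypothetical in-memory stand-in for dictionary database 122 (reading -> strings)."""

    def __init__(self, general_entries: list[tuple[str, str]]):
        self._by_reading: dict[str, list[str]] = {}
        # General-purpose dictionary data stored in advance, as (string, reading) pairs.
        for surface, reading in general_entries:
            self.add(surface, reading)

    def add(self, surface: str, reading: str) -> None:
        """Store one entry, e.g. dictionary data received by dictionary reception unit 121."""
        candidates = self._by_reading.setdefault(reading, [])
        if surface not in candidates:
            candidates.append(surface)

    def generate_candidates(self, reading: str) -> list[str]:
        """Conversion candidates for one recognized reading (empty if the reading is unknown)."""
        return list(self._by_reading.get(reading, []))


db = DictionaryDatabase([("赤坂", "あかさか")])
before = db.generate_candidates("ていざんぼり")  # general dictionary only: []
db.add("貞山堀", "ていざんぼり")                 # entry received from mobile terminal 110
after = db.generate_candidates("ていざんぼり")   # ["貞山堀"]
```

The point illustrated is the one the text makes: a user-specific reading produces no candidate from the general dictionary alone, but produces the intended string once the terminal's entry has been added.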

FIG. 2 is a sequence diagram showing an example of the operation of the speech recognition system according to Embodiment 1. It is assumed here that the dictionary registration unit 111 of the mobile terminal 110 has already accepted registration of dictionary data from the user, and that this dictionary data is stored in the dictionary storage unit 112. First, the mobile terminal 110 accepts a speech recognition request, which asks for the speech recognition service of the speech recognition server 120, through a key operation by the user or the like (step S201).

The speech recognition request accepted in step S201 is specified, for example, when the user selects the speech-recognition input function while entering characters in a mail application or the like. Alternatively, a speech recognition request may be deemed to have been made when a user operation puts a mail application or the like into a character input state.

Next, the mobile terminal 110 reads out the stored dictionary data (step S202) and transmits it to the speech recognition server 120 together with the speech recognition request (step S203). Assume that the dictionary data transmitted here includes "貞山堀：ていざんぼり", and that this entry is not among the dictionary data registered in advance in the dictionary database 122 of the speech recognition server 120.
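The description does not specify how the dictionary data is encoded when it travels with the request in step S203. As an assumption only, the sketch below uses UTF-8 JSON. The field names and function names are hypothetical.

```python
import json


def build_recognition_request(terminal_id: str, entries: list[tuple[str, str]]) -> bytes:
    """Encode a speech recognition request carrying the terminal's dictionary data (S203).

    The JSON field names are hypothetical; the text does not define a wire format.
    """
    message = {
        "type": "recognition_request",
        "terminal": terminal_id,
        "dictionary": [{"string": s, "reading": r} for s, r in entries],
    }
    # ensure_ascii=False keeps kanji and kana as UTF-8 instead of \u escapes.
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def parse_recognition_request(payload: bytes) -> list[tuple[str, str]]:
    """Server side (S204): recover the (string, reading) pairs to register."""
    message = json.loads(payload.decode("utf-8"))
    if message.get("type") != "recognition_request":
        raise ValueError("not a recognition request")
    return [(e["string"], e["reading"]) for e in message["dictionary"]]
```

Any self-describing format would serve equally well here. JSON is chosen only because a round trip through it is easy to check.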

Next, the speech recognition server 120 registers the dictionary data transmitted in step S203 in the dictionary database 122 (step S204) and sends a voice transmission instruction to the mobile terminal 110 (step S205). The mobile terminal 110 then accepts voice input from the user (step S206). Here, assume that the user of the mobile terminal 110 says "ていざんぼりでまちあわせね" ("teizanbori de machiawase ne", that is, "Let's meet at Teizanbori").

Next, the mobile terminal 110 extracts voice data indicating the characteristics of the voice input in step S206 (step S207) and transmits the extracted voice data to the speech recognition server 120 (step S208). The speech recognition server 120 then recognizes the reading of the voice data transmitted in step S208 (step S209). Here, the speech recognition unit 124 recognizes the reading "ていざんぼりでまちあわせね".

Next, the speech recognition server 120 generates conversion candidates for the reading recognized in step S209 (step S210) and transmits them to the mobile terminal 110 (step S211). Finally, the mobile terminal 110 presents the conversion candidates transmitted in step S211 to the user (step S212), and the sequence ends.

The case described above has the mobile terminal 110 transmit the dictionary data simultaneously with the speech recognition request. However, the dictionary data need only be transmitted at the time of the speech recognition request from the mobile terminal 110 to the speech recognition server 120, and it does not have to be sent at the same moment as the request itself. "At the time of the speech recognition request" here means the period that begins when the mobile terminal 110 accepts the speech recognition request from the user and ends when the mobile terminal 110 transmits voice data to the speech recognition server 120.
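The timing rule above defines a window: dictionary data is accepted after the request and before the voice data. A minimal server-side sketch of that window, assuming one session object per terminal, might look like this. `RecognitionSession`, its methods, and the "SEND_VOICE" label are hypothetical, and actual speech recognition is stubbed out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecognitionSession:
    """Hypothetical server-side state for one terminal between S203 and S211."""
    requested: bool = False
    voice_received: bool = False
    user_entries: dict[str, list[str]] = field(default_factory=dict)

    def on_request(self, dictionary: Optional[dict[str, list[str]]] = None) -> str:
        # S203/S204: the request may already carry the terminal's dictionary data.
        self.requested = True
        self.voice_received = False
        if dictionary:
            self.on_dictionary(dictionary)
        return "SEND_VOICE"  # S205: voice transmission instruction (label is hypothetical)

    def on_dictionary(self, dictionary: dict[str, list[str]]) -> None:
        # Accepted only inside the window: after the request, before the voice data.
        if not self.requested or self.voice_received:
            raise RuntimeError("dictionary data must arrive between the request and the voice data")
        for reading, surfaces in dictionary.items():
            self.user_entries.setdefault(reading, []).extend(surfaces)

    def on_voice(self, recognized_reading: str) -> list[str]:
        # S209/S210: recognition is stubbed out; the caller supplies the recognized reading.
        if not self.requested:
            raise RuntimeError("voice data arrived without a recognition request")
        self.voice_received = True
        return list(self.user_entries.get(recognized_reading, []))
```

Rejecting late dictionary data is a design choice of this sketch. The text only defines when the terminal should send the data, not how the server reacts if that timing is violated.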

In this way, by receiving the dictionary data transmitted at the time of the speech recognition request, the speech recognition server 120 can perform character string conversion using the latest dictionary data registered in the mobile terminal 110 at that time. The character string intended by the user of the mobile terminal 110 can therefore be output accurately as a conversion candidate.

FIG. 3 shows an example of how conversion candidates are output. In step S212 of FIG. 2, for example, the mobile terminal 110 displays the display screen 300 shown in FIG. 3. At the top of the display screen 300 is a character string 310, "貞山堀で待ち合わせね" ("Let's meet at Teizanbori"). This string is the conversion of the reading "ていざんぼりでまちあわせね" recognized by the speech recognition unit 124.

The character string portion 311, "貞山堀", of the character string 310 is in the selected state. Conversion candidates 320 for the character string portion 311 are listed at the bottom of the display screen 300. The indication "変換候補3" ("conversion candidates: 3") in the center of the display screen 300 shows that there are three conversion candidates for the character string portion 311. Among the conversion candidates 320, "貞山堀" is displayed as the first candidate, "低山堀" (a homophone, also read "teizanbori") as the second, and the unconverted reading "ていざんぼり" as the third.

The user selects one of the conversion candidates 320, "貞山堀", "低山堀", or "ていざんぼり", by a key operation or the like. The character string portion 311 is thereby converted into the selected character string and confirmed. The user then selects one of the conversion candidates for the "待ち合わせ" ("meet") portion of the character string 310 in the same way. The confirmed character string 310 is then entered into a text input box or the like.
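The candidate list in FIG. 3 places the dictionary matches first and the unconverted reading last. The sketch below reproduces that list and the per-segment confirmation step. It assumes that the recognized reading has already been split into segments. That segmentation and both function names are hypothetical.

```python
def build_candidate_list(dictionary_matches: list[str], reading: str) -> list[str]:
    """Dictionary matches in priority order, then the unconverted reading, without duplicates."""
    ordered: list[str] = []
    for candidate in [*dictionary_matches, reading]:
        if candidate not in ordered:
            ordered.append(candidate)
    return ordered


def confirm_segment(segments: list[str], index: int, choice: str) -> list[str]:
    """Return a copy of the segmented text with one segment replaced by the chosen candidate."""
    confirmed = list(segments)
    confirmed[index] = choice
    return confirmed


candidates = build_candidate_list(["貞山堀", "低山堀"], "ていざんぼり")
label = f"変換候補{len(candidates)}"  # candidate count shown mid-screen
# The segmentation below is an assumption made for illustration.
segments = ["ていざんぼり", "で", "まちあわせ", "ね"]
text = confirm_segment(segments, 0, candidates[0])
print(label, "".join(text))  # 変換候補3 貞山堀でまちあわせね
```

Keeping the raw reading as the last candidate means the user can always fall back to kana when no dictionary entry fits. The "まちあわせ" segment would then be confirmed the same way.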

FIG. 4 is a sequence diagram showing an example of the termination operation of the speech recognition system according to Embodiment 1. After the steps shown in FIG. 2, the mobile terminal 110 accepts a speech recognition end request from the user, which asks to end the speech recognition service of the speech recognition server 120 (step S401). Next, the mobile terminal 110 transmits the end request to the speech recognition server 120, together with a deletion request indicating that the dictionary data should be deleted (step S402).

Next, the speech recognition server 120 deletes from the dictionary database 122 the dictionary data registered in step S204 of FIG. 2 (step S403). The speech recognition server 120 then sends the mobile terminal 110 a deletion notification indicating that the dictionary data has been deleted (step S404), and the termination sequence ends.

Thus, the speech recognition server 120 includes deletion means that act on a speech recognition end request from the mobile terminal 110. These means delete, from the server itself, the dictionary data that the dictionary reception unit 121 received and registered in the dictionary database 122. This reduces the amount of dictionary data that must remain registered in the dictionary database 122 at all times, and with it the required storage capacity.
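For the deletion step to remove only what a terminal sent, the server has to know which entries came from where. The sketch below assumes that received entries are tagged with a terminal identifier. The text does not say how this bookkeeping is done, and the class, method names, and the "DELETED" label are hypothetical. Listing the user's entries ahead of the general ones is also an assumption of this sketch.

```python
class SessionScopedDictionary:
    """Hypothetical sketch: general entries are permanent, terminal-supplied entries are not."""

    def __init__(self, general: dict[str, list[str]]):
        self._general = {reading: list(surfaces) for reading, surfaces in general.items()}
        # Entries registered in step S204, keyed by the terminal that sent them.
        self._by_terminal: dict[str, dict[str, list[str]]] = {}

    def register(self, terminal_id: str, surface: str, reading: str) -> None:
        entries = self._by_terminal.setdefault(terminal_id, {})
        entries.setdefault(reading, []).append(surface)

    def lookup(self, terminal_id: str, reading: str) -> list[str]:
        # User entries first (an assumption), then general entries not already listed.
        user = self._by_terminal.get(terminal_id, {}).get(reading, [])
        general = [s for s in self._general.get(reading, []) if s not in user]
        return user + general

    def end_session(self, terminal_id: str) -> str:
        # S403: delete only the dictionary data this terminal registered.
        self._by_terminal.pop(terminal_id, None)
        return "DELETED"  # S404: deletion notification (label is hypothetical)
```

After `end_session`, the server retains only its general dictionary. This is the storage saving the text describes.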

FIG. 5 is a sequence diagram (part 1) showing another example of the operation shown in FIG. 2. First, the mobile terminal 110 accepts registration of dictionary data from the user (step S501) and transmits the dictionary data registered in step S501 to the speech recognition server 120 (step S502). The speech recognition server 120 then registers the dictionary data transmitted in step S502 in the dictionary database 122 (step S503).

Steps S501 to S503 are repeated each time the user of the mobile terminal 110 registers dictionary data through the dictionary registration unit 111. Because the mobile terminal 110 sends dictionary data to the speech recognition server 120 whenever an entry is registered on the terminal, the entries the user registers in the mobile terminal 110 accumulate in the dictionary database 122 of the speech recognition server 120. When the user of the mobile terminal 110 then inputs a speech recognition request to the mobile terminal 110, the operation shown in FIG. 6 is performed.

FIG. 6 is a sequence diagram (part 2) showing another example of the operation shown in FIG. 2. First, the mobile terminal 110 accepts a speech recognition request from the user (step S601) and transmits it to the speech recognition server 120 (step S602). Steps S603 to S610 are the same as steps S205 to S212 (see FIG. 2), so their description is omitted. Note that the dictionary data has already been registered in the speech recognition server 120 through the steps shown in FIG. 5.

Therefore, the mobile terminal 110 need not transmit the dictionary data in step S602 (compare step S203 in FIG. 2). Registering dictionary data in the dictionary database 122 at the time of dictionary registration on the mobile terminal 110 thus reduces two things at the time of the speech recognition request. It reduces the information transmitted from the mobile terminal 110 to the speech recognition server 120 (step S203 in FIG. 2), and it reduces the processing performed by the speech recognition server 120 (step S204 in FIG. 2).
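The two variants differ only in when the terminal ships its entries. The sketch below contrasts pushing each entry at registration time (FIG. 5 and FIG. 6) with sending the whole dictionary with the request (FIG. 2). It assumes in-process method calls in place of the mobile network. The `Server` and `Terminal` classes are hypothetical.

```python
from typing import Optional


class Server:
    """Hypothetical stand-in for speech recognition server 120."""

    def __init__(self) -> None:
        self.entries: dict[str, list[str]] = {}

    def _store(self, surface: str, reading: str) -> None:
        surfaces = self.entries.setdefault(reading, [])
        if surface not in surfaces:
            surfaces.append(surface)

    def receive_entry(self, surface: str, reading: str) -> None:
        # FIG. 5, steps S502/S503: one entry pushed at registration time.
        self._store(surface, reading)

    def handle_request(self, dictionary: Optional[list[tuple[str, str]]]) -> None:
        # FIG. 2 step S203 carries the dictionary; FIG. 6 step S602 does not.
        for surface, reading in dictionary or []:
            self._store(surface, reading)


class Terminal:
    """Hypothetical mobile terminal with a switch between the two variants."""

    def __init__(self, server: Server, push_on_register: bool) -> None:
        self.server = server
        self.push_on_register = push_on_register
        self.local: list[tuple[str, str]] = []  # plays the role of dictionary storage unit 112

    def register(self, surface: str, reading: str) -> None:
        # S501: store locally; in the FIG. 5 variant, also send right away.
        self.local.append((surface, reading))
        if self.push_on_register:
            self.server.receive_entry(surface, reading)

    def request_recognition(self) -> Optional[list[tuple[str, str]]]:
        # The FIG. 5/6 variant sends no dictionary data with the request.
        payload = None if self.push_on_register else list(self.local)
        self.server.handle_request(payload)
        return payload
```

With `push_on_register=True`, the request carries no payload, which is the saving the text attributes to this variant. The cost is one small transmission per registration instead.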

As described above, the speech recognition server 120 according to Embodiment 1 can generate conversion candidates for a reading obtained by speech recognition, using dictionary data registered by the user in the mobile terminal 110. As a result, even for conversions specific to the user of the mobile terminal 110, the character string the user intends can be output accurately as a conversion candidate, which improves the efficiency of character string conversion.

(Embodiment 2)
The functional configuration of the speech recognition system 100 according to Embodiment 2 is the same as that shown in FIG. 1, so its description is omitted. In Embodiment 2, the dictionary reception unit 121 receives dictionary data transmitted from each terminal in a group of mobile terminals that includes the mobile terminal 110.

FIG. 7 is a sequence diagram showing an example of the operation of the speech recognition system according to Embodiment 2. In FIG. 7, the mobile terminals 710 and 720 each have the same functions as the mobile terminal 110. The following describes the case in which registrations of dictionary data are accepted from users in the order mobile terminal 110, 720, 710, and then 110 again.

First, the mobile terminal 110 accepts registration of dictionary data from its user (step S701) and transmits the dictionary data registered in step S701 to the speech recognition server 120 (step S702). The speech recognition server 120 then registers the dictionary data transmitted in step S702 in the dictionary database 122 (step S703).

Next, the mobile terminal 720 accepts registration of dictionary data from its user (step S704) and transmits the dictionary data registered in step S704 to the speech recognition server 120 (step S705). The speech recognition server 120 then registers the dictionary data transmitted in step S705 in the dictionary database 122 (step S706).

Next, the mobile terminal 710 accepts registration of dictionary data from its user (step S707) and transmits the dictionary data registered in step S707 to the speech recognition server 120 (step S708). The speech recognition server 120 then registers the dictionary data transmitted in step S708 in the dictionary database 122 (step S709).

Next, the mobile terminal 110 again accepts registration of dictionary data from its user (step S710) and transmits the dictionary data registered in step S710 to the speech recognition server 120 (step S711). The speech recognition server 120 then registers the dictionary data transmitted in step S711 in the dictionary database 122 (step S712).

In this way, each time one of the mobile terminals 110, 710, and 720 accepts registration of dictionary data from its user, it transmits the registered dictionary data to the speech recognition server 120, which then registers it. The dictionary data registered by the users of the mobile terminals 110, 710, and 720 thus accumulates in the dictionary database 122 of the speech recognition server 120. When a user inputs a speech recognition request to the mobile terminal 110, the same operation as in Embodiment 1 (see FIG. 6) is performed.

FIG. 8 is a sequence diagram showing specific example 1 of the operation shown in FIG. 7. First, the mobile terminal 110 accepts registration of dictionary data from its user (step S801). Here, assume that the mobile terminal 110 accepts "低山堀：ていざんぼり" as dictionary data. The mobile terminal 110 then transmits the dictionary data "低山堀：ていざんぼり" to the speech recognition server 120 (step S802), and the speech recognition server 120 registers it in the dictionary database 122 (step S803).

Next, the mobile terminal 720 accepts registration of dictionary data from its user (step S804). Assume here that the mobile terminal 720 accepts “赤坂: あかさか” (Akasaka) as dictionary data. The mobile terminal 720 then transmits the dictionary data “赤坂: あかさか” to the speech recognition server 120 (step S805), and the speech recognition server 120 registers it in the dictionary database 122 (step S806).

Next, the mobile terminal 710 accepts registration of dictionary data from its user (step S807). Assume here that the mobile terminal 710 accepts “貞山堀: ていざんぼり” as dictionary data; 貞山堀 has the same reading (teizanbori) as 低山堀 but is written with different characters. The mobile terminal 710 then transmits the dictionary data “貞山堀: ていざんぼり” to the speech recognition server 120 (step S808), and the speech recognition server 120 registers it in the dictionary database 122 (step S809).

At this point, the speech recognition server 120 has received multiple dictionary entries that share a reading but differ in character string (“低山堀: ていざんぼり” and “貞山堀: ていざんぼり”). In this case, the conversion candidate generation unit 125 generates conversion candidates that give priority to the character string “貞山堀” of “貞山堀: ていざんぼり”, the entry received last.

That is, when the reading indicated by the information from the speech recognition unit 124 is “ていざんぼり”, the conversion candidate generation unit 125 includes both “貞山堀” and “低山堀” in the conversion candidates and ranks “貞山堀” above “低山堀”. For example, the conversion candidate generation unit 125 generates conversion candidates (see FIG. 3) with “貞山堀” as the first conversion candidate and “低山堀” as the second.

As a result, when multiple dictionary entries with the same reading but different character strings are registered in the dictionary database 122 of the speech recognition server 120, conversion candidates are generated with priority given to the most recently registered character string. Character strings that were not registered in the speech recognition server 120 beforehand, such as newly coined place names or personal names, can therefore be offered preferentially as conversion candidates, which further improves the efficiency of character string conversion.
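Here is a minimal sketch of the last-received-first rule. It assumes the dictionary database is a list of (string, reading) pairs kept in arrival order; the function name and data layout are hypothetical. The function walks the list backwards and keeps the first occurrence of each string, so the most recently received spelling of a reading is ranked first.

```python
def candidates_last_received_first(entries, reading):
    """Return strings registered under `reading`, most recently received first.

    `entries` is a list of (string, reading) pairs in the order the server
    received them (a hypothetical representation of dictionary database 122).
    """
    seen = set()
    ordered = []
    for string, entry_reading in reversed(entries):
        if entry_reading == reading and string not in seen:
            seen.add(string)
            ordered.append(string)
    return ordered


# Arrival order of FIG. 8 (steps S803, S806, S809).
fig8_entries = [("低山堀", "ていざんぼり"), ("赤坂", "あかさか"), ("貞山堀", "ていざんぼり")]
print(candidates_last_received_first(fig8_entries, "ていざんぼり"))  # ['貞山堀', '低山堀']
```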

FIG. 9 is a sequence diagram showing specific example 2 of the operation shown in FIG. 7. First, the mobile terminal 110 accepts registration of dictionary data from its user (step S901). Assume here that the mobile terminal 110 accepts “貞山堀: ていざんぼり” as dictionary data. The mobile terminal 110 then transmits the dictionary data “貞山堀: ていざんぼり” to the speech recognition server 120 (step S902), and the speech recognition server 120 registers it in the dictionary database 122 (step S903).

Next, the mobile terminal 720 accepts registration of dictionary data from its user (step S904). Assume here that the mobile terminal 720 accepts “赤坂: あかさか” as dictionary data. The mobile terminal 720 then transmits the dictionary data “赤坂: あかさか” to the speech recognition server 120 (step S905), and the speech recognition server 120 registers it in the dictionary database 122 (step S906).

Next, the mobile terminal 710 accepts registration of dictionary data from its user (step S907). Assume here that the mobile terminal 710 accepts “低山堀: ていざんぼり” as dictionary data. The mobile terminal 710 then transmits the dictionary data “低山堀: ていざんぼり” to the speech recognition server 120 (step S908), and the speech recognition server 120 registers it in the dictionary database 122 (step S909).

Next, the mobile terminal 720 accepts registration of dictionary data from its user (step S910). Assume here that the mobile terminal 720 accepts “貞山堀: ていざんぼり” as dictionary data. The mobile terminal 720 then transmits the dictionary data “貞山堀: ていざんぼり” to the speech recognition server 120 (step S911), and the speech recognition server 120 registers it in the dictionary database 122 (step S912).

Here, the speech recognition server 120 has received multiple dictionary entries with the same reading and the same character string (“貞山堀: ていざんぼり”). In this case, the conversion candidate generation unit 125 generates conversion candidates that give priority to the character string “貞山堀” of those entries. For example, in step S903, “貞山堀: ていざんぼり” is registered for the first time, so its priority in the dictionary database 122 is set to 1.

Likewise, in step S906, “赤坂: あかさか” is registered for the first time, so its priority is set to 1. In step S909, “低山堀: ていざんぼり” is registered for the first time, so its priority is set to 1. In step S912, “貞山堀: ていざんぼり” is registered for the second time, so its priority is set to 2.

FIG. 10 is a diagram showing an example of the dictionary database in the operation shown in FIG. 9. Table 1010 shows the dictionary information stored in the dictionary database 122. Entries No. 1 to No. 4 in table 1010 are the dictionary data transmitted from the mobile terminals 110, 710, and 720 to the speech recognition server 120 and registered in the dictionary database 122.

Entry No. 1 in table 1010, “貞山堀: ていざんぼり”, is the dictionary data registered in step S903 of FIG. 9. Entry No. 2, “赤坂: あかさか”, was registered in step S906, and entry No. 3, “低山堀: ていざんぼり”, was registered in step S909.

At the time of step S909, the entries “貞山堀: ていざんぼり”, “赤坂: あかさか”, and “低山堀: ていざんぼり” have each been registered once, so each has priority 1. Then, in step S912, “貞山堀: ていざんぼり” is registered as entry No. 4.

Because entry No. 4, “貞山堀: ていざんぼり”, has the same reading and character string as entry No. 1, 1 is added to the priority of entry No. 1, as shown in table 1020. The priority of entry No. 1, “貞山堀: ていざんぼり”, therefore becomes 2.

In this case, when the reading indicated by the information output from the speech recognition unit 124 is “ていざんぼり”, the conversion candidate generation unit 125 includes “貞山堀” and “低山堀” in the conversion candidates and ranks “貞山堀” (priority 2) above “低山堀” (priority 1). For example, the conversion candidate generation unit 125 generates conversion candidates (see FIG. 3) with “貞山堀” as the first conversion candidate and “低山堀” as the second.

Thus, when multiple dictionary entries with the same reading and character string are registered in the dictionary database 122 of the speech recognition server 120, conversion candidates are generated with priority given to that character string. Character strings registered by many users can therefore be offered preferentially as conversion candidates, which further improves the efficiency of character string conversion.
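The counting rule of FIGS. 9 and 10 can be sketched with `collections.Counter`, where each identical (string, reading) registration adds 1 to the priority. The class and method names are hypothetical. The tie-break for equal priorities (first registration wins) is an assumption, because the text does not specify one.

```python
from collections import Counter


class PriorityDictionary:
    def __init__(self):
        # (string, reading) -> priority; Counter preserves first-insertion order.
        self.priority = Counter()

    def register(self, string, reading):
        # The first registration sets priority 1; each identical re-registration adds 1.
        self.priority[(string, reading)] += 1

    def candidates(self, reading):
        matches = [(s, p) for (s, r), p in self.priority.items() if r == reading]
        # sorted() is stable, so equal priorities keep first-registration order.
        return [s for s, _ in sorted(matches, key=lambda m: -m[1])]


db = PriorityDictionary()
# Registration order of FIG. 9 (steps S903, S906, S909, S912).
for string, reading in [("貞山堀", "ていざんぼり"), ("赤坂", "あかさか"),
                        ("低山堀", "ていざんぼり"), ("貞山堀", "ていざんぼり")]:
    db.register(string, reading)
print(db.priority[("貞山堀", "ていざんぼり")])  # 2
print(db.candidates("ていざんぼり"))  # ['貞山堀', '低山堀']
```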

FIG. 11 is a diagram showing an example of the speech recognition server according to the second embodiment. As shown in FIG. 11, the communication system 1100 includes a parent server 1110 and child servers 1121 to 1125. The child servers 1121 to 1125 are located in different regions: Hokkaido, Tohoku, Kanto, Kinki, and Kyushu, respectively. Each of the child servers 1121 to 1125 includes the speech recognition server 120 according to the second embodiment.

Each of the child servers 1121 to 1125 communicates with the mobile terminals in its own region and provides them with the speech recognition service described above. The mobile terminals in each region have the same functional configuration as the mobile terminal 110 described above. Each of the child servers 1121 to 1125 is also connected to the parent server 1110.

Each of the child servers 1121 to 1125 includes a transmission means (not shown) that transmits the dictionary data registered in its dictionary database 122 to the parent server 1110. The child servers 1121 to 1125 may transmit dictionary data to the parent server 1110 periodically or whenever new dictionary data is registered in the dictionary database 122.

FIG. 12 is a diagram showing an example of the dictionary database of each child server shown in FIG. 11. In FIG. 12, tables 1210, 1220, and 1230 show the dictionary data stored in the dictionary databases 122 of child server 1121 (Hokkaido), child server 1122 (Tohoku), and child server 1123 (Kanto), respectively. The child servers 1121 to 1123 transmit tables 1210, 1220, and 1230, respectively, to the parent server 1110.

FIG. 13 is a diagram showing an example of the dictionary database of the parent server shown in FIG. 11. On receiving tables 1210, 1220, and 1230 from the child servers 1121 to 1123, the parent server 1110 generates a table 1310 that combines them. Tables 1210, 1220, and 1230 contain multiple dictionary entries with the same reading and character string (“男爵: だんしゃく”, danshaku, meaning “baron”).

In this case, the parent server 1110 raises the priority of “男爵: だんしゃく”, the entry that was received multiple times with the same reading and character string. Specifically, because tables 1210, 1220, and 1230 together contain three instances of “男爵: だんしゃく”, the priority of “男爵: だんしゃく” in table 1310 is set to 3.

The parent server 1110 transmits the generated table 1310 to each of the child servers 1121 to 1125, which register it as dictionary data in their dictionary databases 122. Each child server's dictionary database 122 thus reflects the dictionary data registered at the other child servers.
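Here is a sketch of the parent-server merge in FIGS. 12 and 13. It assumes each child table is a list of (string, reading) pairs and sets the parent's priority for a pair to the number of child tables that contain it, as in the 男爵 example. The text does not say whether child-side priorities should be summed instead, or whether children replace or merge their databases on receipt. This sketch counts tables and replaces. The function names are hypothetical.

```python
from collections import Counter


def merge_child_tables(child_tables):
    """Parent server 1110: count how many child tables contain each (string, reading) pair."""
    merged = Counter()
    for table in child_tables:
        # set() ensures each child server contributes at most 1 per pair.
        merged.update(set(table))
    return merged


def distribute(merged, child_databases):
    """Each child server replaces its dictionary database with the merged table 1310."""
    for db in child_databases:
        db.clear()
        db.update(merged)


# Only the 男爵 rows reflect the text; the 貞山堀 and 赤坂 rows are made up for illustration.
hokkaido = [("男爵", "だんしゃく")]
tohoku = [("男爵", "だんしゃく"), ("貞山堀", "ていざんぼり")]
kanto = [("男爵", "だんしゃく"), ("赤坂", "あかさか")]

table_1310 = merge_child_tables([hokkaido, tohoku, kanto])
print(table_1310[("男爵", "だんしゃく")])  # 3

child_dbs = [{} for _ in range(5)]  # child servers 1121 to 1125
distribute(table_1310, child_dbs)
```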

The above describes the case in which the parent server 1110 collects the dictionary data of the child servers 1121 to 1125 and sends the combined data back to them. Alternatively, the child servers 1121 to 1125 may be configured not to transmit dictionary data to the parent server 1110. In that case, the dictionary database 122 of each child server holds only the dictionary data registered within its own region.

The intended character string can then be output accurately as a conversion candidate even for conversions specific to the region where the mobile terminal 110 is located, such as local place names or dialect. This further improves the efficiency of character string conversion. In addition, because each speech recognition server 120 stores only the dictionary data specific to its own region and none specific to other regions, the storage capacity required for the dictionary database 122 can be reduced.
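For contrast, here is a hypothetical model of the configuration without parent synchronization. Each regional child server keeps an independent dictionary, so an entry registered in one region is never offered as a candidate in another. The class name and the placement of 貞山堀 in Tohoku are illustrative assumptions.

```python
class RegionalChildServers:
    """Hypothetical model of child servers 1121 to 1125 with parent synchronization turned off."""

    def __init__(self, regions):
        # One independent dictionary database per region; nothing is shared.
        self.databases = {region: [] for region in regions}

    def register(self, region, string, reading):
        # A registration reaches only the child server for the terminal's own region.
        self.databases[region].append((string, reading))

    def candidates(self, region, reading):
        # Conversion candidates come only from that region's entries.
        return [s for s, r in self.databases[region] if r == reading]


servers = RegionalChildServers(["Hokkaido", "Tohoku", "Kanto", "Kinki", "Kyushu"])
servers.register("Tohoku", "貞山堀", "ていざんぼり")
print(servers.candidates("Tohoku", "ていざんぼり"))  # ['貞山堀']
print(servers.candidates("Kanto", "ていざんぼり"))   # []
```

The storage saving mentioned above follows directly from this layout: each region's list grows only with that region's registrations.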

The dictionary data registered in the dictionary database 122 may also be classified by the user characteristics set in the mobile terminal that transmitted it. User characteristics are attributes of the mobile terminal's user, such as gender and age group. In this case, the speech recognition server 120 obtains the user characteristics set in the mobile terminal 110 from the mobile terminal 110.

The conversion candidate generation unit 125 then generates conversion candidates using only those entries in the dictionary database 122 that are classified under the user characteristics obtained from the mobile terminal 110. The intended character string can then be output accurately as a conversion candidate even for conversions specific to the characteristics of the user of the mobile terminal 110, such as gender or age group. This further improves the efficiency of character string conversion.
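One way to filter by user characteristics is sketched below. It assumes each entry carries the traits of the user who registered it, and that an entry matches only if every trait reported by the requesting terminal agrees. The trait keys, values, and function name are hypothetical. The text does not say how partial matches or an empty result should be handled.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    string: str
    reading: str
    # Characteristics of the user whose terminal sent this entry (hypothetical keys).
    traits: dict = field(default_factory=dict)


def candidates_for_user(entries, reading, user_traits):
    """Keep entries with the given reading whose traits match every trait the requesting terminal reports."""
    return [
        e.string
        for e in entries
        if e.reading == reading
        and all(e.traits.get(key) == value for key, value in user_traits.items())
    ]


# Made-up entries and traits for illustration only.
entries = [
    Entry("貞山堀", "ていざんぼり", {"age_group": "50s"}),
    Entry("低山堀", "ていざんぼり", {"age_group": "20s"}),
]
print(candidates_for_user(entries, "ていざんぼり", {"age_group": "20s"}))  # ['低山堀']
```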

As described above, the speech recognition server 120 according to the second embodiment can generate conversion candidates for a recognized reading using dictionary data registered on any terminal in a group of mobile terminals that includes the mobile terminal 110. Dictionary data registered on a mobile terminal other than the mobile terminal 110 can therefore be used for character string conversion on the mobile terminal 110.

In addition, dictionary data actually used by the users of the mobile terminals can be registered in the dictionary database 122. Compared with registering a large amount of dictionary data in the dictionary database 122 in advance, this approach registers the dictionary data that is actually in use automatically and efficiently, which improves the efficiency of character string conversion.

The speech recognition server 120 according to the second embodiment may also be configured to automatically delete any dictionary data in the dictionary database 122 that the conversion candidate generation unit 125 has not used as a conversion candidate for a certain period. This automatically removes dictionary data that has gone unused for a long time or that some mobile terminal registered by mistake. It improves the accuracy of character string conversion by the conversion candidate generation unit 125 and reduces the storage capacity required for the dictionary database 122.
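Here is a minimal sketch of the optional expiry rule. It measures an entry's idle time from its registration or from the last time it was offered as a conversion candidate, whichever is later. The class name, the `max_idle_seconds` parameter, and the explicit `now` argument are all assumptions. The `now` argument stands in for a real clock so the behavior is easy to test.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    string: str
    reading: str
    last_used: float  # set at registration, refreshed whenever offered as a candidate


class ExpiringDictionary:
    def __init__(self, max_idle_seconds: float):
        self.max_idle_seconds = max_idle_seconds
        self.entries = {}  # (string, reading) -> Entry

    def register(self, string: str, reading: str, now: float) -> None:
        key = (string, reading)
        if key not in self.entries:
            self.entries[key] = Entry(string, reading, now)

    def candidates(self, reading: str, now: float) -> list:
        result = []
        for entry in self.entries.values():
            if entry.reading == reading:
                entry.last_used = now  # being offered as a candidate counts as use
                result.append(entry.string)
        return result

    def purge(self, now: float) -> int:
        """Delete entries idle for longer than max_idle_seconds; return how many were removed."""
        stale = [key for key, entry in self.entries.items()
                 if now - entry.last_used > self.max_idle_seconds]
        for key in stale:
            del self.entries[key]
        return len(stale)


d = ExpiringDictionary(max_idle_seconds=100)
d.register("貞山堀", "ていざんぼり", now=0)
d.register("赤坂", "あかさか", now=0)
d.candidates("ていざんぼり", now=50)  # refreshes 貞山堀 only
print(d.purge(now=120))  # 1 (赤坂 has been idle for 120 > 100)
```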

(Hardware configuration of the speech recognition server)
FIG. 14 is a block diagram showing the hardware configuration of the speech recognition server. The speech recognition server 120 shown in FIG. 1 can be implemented by a computer 1400 that includes a CPU 1411, a memory 1412, a network communication interface 1413 (network communication I/F), and a user interface 1414 (user I/F).

The CPU 1411, the memory 1412, the network communication interface 1413, and the user interface 1414 are connected to one another via a bus 1430. The CPU (Central Processing Unit) 1411 controls the computer 1400 as a whole.

The memory 1412 is, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), an HD (Hard Disk), an optical disc, or flash memory. The memory 1412 serves as a work area for the CPU 1411 and stores various programs, which are loaded in response to instructions from the CPU 1411.

The network 1420 is a mobile communication network. The network communication interface 1413 is a module that performs wireless communication via the network 1420, for example an antenna for wireless communication.

The user interface 1414 handles the input and output of information to and from the user. It includes keys, a touch panel, a microphone, or the like for accepting input from the user, and a display screen, a light-emitting unit, a speaker, or the like for presenting information to the user.

The dictionary reception unit 121, the voice reception unit 123, and the conversion candidate transmission unit 126 shown in FIG. 1 can be implemented by the network communication interface 1413. The speech recognition unit 124 and the conversion candidate generation unit 125 can be implemented by the CPU 1411. The dictionary database 122 can be implemented by the memory 1412.

The mobile terminal 110 shown in FIG. 1 can also be implemented by the computer 1400. The dictionary transmission unit 113, the voice transmission unit 116, and the conversion candidate reception unit 117 can be implemented by the network communication interface 1413. The feature extraction unit 115 can be implemented by the CPU 1411. The dictionary registration unit 111, the voice input unit 114, and the conversion candidate output unit 118 can be implemented by the user interface 1414. The dictionary storage unit 112 can be implemented by the memory 1412.

The speech recognition method described in this embodiment can be implemented by executing a prepared program on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, flexible disk, CD-ROM, MO, or DVD, and the computer executes it by reading it from the recording medium. The program may also be distributed via a network such as the Internet.

As described above, the disclosed speech recognition server, communication system, and speech recognition method can improve the efficiency of character string conversion. The following supplementary notes are further disclosed with respect to the embodiments described above.

(Supplementary note 1) A speech recognition server comprising:
storage means for storing dictionary data in which character strings are associated with readings;
dictionary receiving means for receiving, from a mobile terminal, dictionary data that is registered in the mobile terminal and differs from the stored dictionary data;
voice receiving means for receiving, from the mobile terminal, voice data representing features of speech;
speech recognition means for recognizing the reading of the voice data received by the voice receiving means;
generating means for generating conversion candidates for the reading recognized by the speech recognition means, using the dictionary data stored by the storage means and the dictionary data received by the dictionary receiving means; and
transmitting means for transmitting the conversion candidates generated by the generating means to the mobile terminal.

(Supplementary note 2) The speech recognition server according to supplementary note 1, wherein the dictionary receiving means receives the dictionary data that the mobile terminal transmits when it makes a speech recognition request.

(Supplementary note 3) The speech recognition server according to supplementary note 2, further comprising deletion means for deleting the dictionary data received by the dictionary receiving means from the server when the mobile terminal requests the end of speech recognition.

(Supplementary note 4) The speech recognition server according to supplementary note 1, wherein the dictionary receiving means receives the dictionary data that the mobile terminal transmits when dictionary data is registered in it.

(Supplementary note 5) The speech recognition server according to supplementary note 1, wherein the dictionary receiving means receives the dictionary data transmitted from each terminal in a group of mobile terminals that includes the mobile terminal.

(Supplementary note 6) The speech recognition server according to supplementary note 5, wherein, when the dictionary receiving means has received multiple dictionary entries with the same reading but different character strings, the generating means generates conversion candidates that give priority to the character string of the most recently received of those entries.

(Supplementary note 7) The speech recognition server according to supplementary note 5, wherein, when the dictionary receiving means has received multiple dictionary entries with the same character string and reading, the generating means generates conversion candidates that give priority to the character string of those entries.

(Supplementary note 8) A communication system including a parent server and a plurality of child servers, each child server comprising the speech recognition server according to any one of supplementary notes 1 to 7, wherein:
the plurality of child servers are located in different regions and each comprise dictionary transmitting means for transmitting the dictionary data received by the dictionary receiving means to the parent server;
the parent server transmits, to the plurality of child servers, dictionary data that includes the dictionary data transmitted from each of the plurality of speech recognition servers; and
the dictionary receiving means of each of the plurality of child servers receives the dictionary data transmitted by the parent server.

(Supplementary note 9) The communication system according to supplementary note 8, wherein, when the plurality of child servers have transmitted multiple dictionary entries with the same character string and reading, the parent server transmits dictionary data that gives priority to those entries.

(Supplementary note 10) A speech recognition method comprising:
a storing step of storing dictionary data in which character strings are associated with readings;
a dictionary receiving step of receiving, from a mobile terminal, dictionary data that is registered in the mobile terminal and differs from the stored dictionary data;
a voice receiving step of receiving, from the mobile terminal, voice data representing features of speech;
a speech recognition step of recognizing the reading of the voice data received in the voice receiving step;
a generating step of generating conversion candidates for the reading recognized in the speech recognition step, using the dictionary data stored in the storing step and the dictionary data received in the dictionary receiving step; and
a transmitting step of transmitting the conversion candidates generated in the generating step to the mobile terminal.

FIG. 1 is a block diagram showing the functional configuration of the speech recognition system according to the first embodiment.
FIG. 2 is a sequence diagram showing an example of the operation of the speech recognition system according to the first embodiment.
FIG. 3 is a diagram showing an example of the output of conversion candidates.
FIG. 4 is a sequence diagram showing an example of the termination operation of the speech recognition system according to the first embodiment.
FIG. 5 is a sequence diagram (part 1) showing another example of the operation shown in FIG. 2.
FIG. 6 is a sequence diagram (part 2) showing another example of the operation shown in FIG. 2.
FIG. 7 is a sequence diagram showing an example of the operation of the speech recognition system according to the second embodiment.
FIG. 8 is a sequence diagram showing specific example 1 of the operation shown in FIG. 7.
FIG. 9 is a sequence diagram showing specific example 2 of the operation shown in FIG. 7.
FIG. 10 is a diagram showing an example of the dictionary database in the operation shown in FIG. 9.
FIG. 11 is a diagram showing an example of the speech recognition server according to the second embodiment.
FIG. 12 is a diagram showing an example of the dictionary database of each child server shown in FIG. 11.
FIG. 13 is a diagram showing an example of the dictionary database of the parent server shown in FIG. 11.
FIG. 14 is a block diagram showing the hardware configuration of the speech recognition server.

Explanation of symbols

100 Speech recognition system
110, 710, 720 Portable terminal
120 Speech recognition server
300 Display screen
310 Character string
311 Character string portion
320 Conversion candidate
1010, 1020, 1210, 1220, 1230, 1310 Table
1100 Communication system
1110 Parent server
1121 to 1125 Child server

Claims

1. A speech recognition server comprising:
storage means for storing dictionary data in which character strings are associated with their readings;
dictionary receiving means for receiving, from a portable terminal, dictionary data that is registered in the portable terminal and differs from the dictionary data stored by the storage means;
voice receiving means for receiving, from the portable terminal, voice data indicating characteristics of a voice;
voice recognition means for recognizing the reading of the voice data received by the voice receiving means;
generating means for generating conversion candidates for the reading recognized by the voice recognition means, using the dictionary data stored by the storage means and the dictionary data received by the dictionary receiving means; and
transmitting means for transmitting the conversion candidates generated by the generating means to the portable terminal.
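
The generating means of claim 1 can be pictured as a lookup over the union of the server's stored dictionary and the dictionary received from the terminal, keyed by the reading that the recognizer outputs. The sketch below assumes the recognizer has already produced a kana reading; the names `build_index` and `generate_candidates`, the ordering of received entries before stored ones, and the fallback to the bare reading are all illustrative assumptions rather than details from the patent.

```python
from collections import defaultdict
from typing import Iterable

# A dictionary entry is a (character_string, reading) pair.
Entry = tuple[str, str]


def build_index(*dictionaries: Iterable[Entry]) -> dict[str, list[str]]:
    """Map each reading to its character strings, keeping first-seen order."""
    index: dict[str, list[str]] = defaultdict(list)
    for entries in dictionaries:
        for string, reading in entries:
            if string not in index[reading]:
                index[reading].append(string)
    return index


def generate_candidates(
    reading: str, stored: Iterable[Entry], received: Iterable[Entry]
) -> list[str]:
    """Return conversion candidates for a reading output by the recognizer."""
    # Assumption: entries received from the terminal are listed before
    # the server's stored entries; the claim does not fix an order.
    index = build_index(received, stored)
    # Assumption: with no matching entry, the bare reading is returned.
    return index.get(reading, [reading])


stored = [("東京", "とうきょう"), ("特許", "とっきょ")]
received = [("高屋敷", "たかやしき")]  # e.g. a user-registered surname
print(generate_candidates("たかやしき", stored, received))  # ['高屋敷']
```

Because the terminal's entries are merged per request, a word with a special reading that exists only in the user's own dictionary can still be offered as a candidate by the server.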

2. The speech recognition server according to claim 1, wherein the dictionary receiving means receives the dictionary data that the portable terminal transmits when it issues a speech recognition request.
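
In claim 2 the terminal sends its dictionary data together with the speech recognition request, so the dictionary reaches the server at the moment it is needed. The sketch below shows one way such a combined request might be decoded, assuming a JSON encoding; the `RecognitionRequest` class, `parse_request`, and the field names `terminal_id`, `dictionary` and `voice` are hypothetical and not specified by the patent.

```python
import json
from dataclasses import dataclass


@dataclass
class RecognitionRequest:
    """Hypothetical request body: the terminal's dictionary travels with the voice data."""
    terminal_id: str
    dictionary: list[tuple[str, str]]  # (character_string, reading) pairs
    voice_features: list[float]        # voice data indicating voice characteristics


def parse_request(raw: str) -> RecognitionRequest:
    """Decode a JSON recognition request; all field names are assumptions."""
    msg = json.loads(raw)
    return RecognitionRequest(
        terminal_id=msg["terminal_id"],
        # A request without a dictionary field yields an empty dictionary.
        dictionary=[(e["string"], e["reading"]) for e in msg.get("dictionary", [])],
        voice_features=[float(x) for x in msg["voice"]],
    )


raw = json.dumps({
    "terminal_id": "110",
    "dictionary": [{"string": "高屋敷", "reading": "たかやしき"}],
    "voice": [0.12, 0.34],
})
req = parse_request(raw)
print(req.dictionary)  # [('高屋敷', 'たかやしき')]
```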

3. The speech recognition server according to claim 1, wherein the dictionary receiving means receives the dictionary data transmitted from each portable terminal in a portable terminal group that includes the portable terminal.

4. The speech recognition server according to claim 3, wherein, when the dictionary receiving means receives a plurality of pieces of dictionary data having the same character string and reading, the generating means generates conversion candidates that give priority to the character string of those pieces of dictionary data.
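
Claims 3 and 4 leave open how the priority is computed. One plausible reading, assumed here, is to count how many terminals in the group registered the same (character string, reading) pair and to rank candidates by that count. The function `rank_candidates` and the choice to count each pair at most once per terminal are illustrative, not taken from the patent.

```python
from collections import Counter
from typing import Iterable

Entry = tuple[str, str]  # (character_string, reading)


def rank_candidates(reading: str, terminal_dictionaries: Iterable[list[Entry]]) -> list[str]:
    """Order candidates for a reading by how many terminals registered them."""
    counts: Counter = Counter()
    for entries in terminal_dictionaries:
        # Count each pair at most once per terminal; dict.fromkeys
        # de-duplicates while keeping insertion order.
        for pair in dict.fromkeys(entries):
            counts[pair] += 1
    matches = [(string, n) for (string, r), n in counts.items() if r == reading]
    # Higher counts first; the stable sort keeps first-seen order on ties.
    matches.sort(key=lambda m: -m[1])
    return [string for string, _ in matches]


terminals = [
    [("高屋敷", "たかやしき")],
    [("鷹屋敷", "たかやしき"), ("高屋敷", "たかやしき")],
    [("高屋敷", "たかやしき")],
]
print(rank_candidates("たかやしき", terminals))  # ['高屋敷', '鷹屋敷']
```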

5. A communication system comprising a parent server and a plurality of child servers, each child server being the speech recognition server according to any one of claims 1 to 4, wherein:
the plurality of child servers are located in mutually different regions, and each includes dictionary transmitting means for transmitting the dictionary data received by its dictionary receiving means to the parent server;
the parent server transmits, to the plurality of child servers, dictionary data that includes the dictionary data transmitted from each of the plurality of child servers; and
the dictionary receiving means of each of the plurality of child servers receives the dictionary data transmitted by the parent server.
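
In claim 5 the parent server collects the dictionary data uploaded by the regional child servers and sends the combined data back to every child, so a word registered by users in one region can appear as a conversion candidate in another. The sketch below is an in-memory illustration that abstracts away the transport between servers; the functions `merge_at_parent` and `distribute` and region names such as `"kanto"` are hypothetical.

```python
Entry = tuple[str, str]  # (character_string, reading)


def merge_at_parent(uploads: dict[str, list[Entry]]) -> list[Entry]:
    """Combine dictionary data uploaded by child servers, dropping duplicates."""
    merged: list[Entry] = []
    seen: set[Entry] = set()
    for region in sorted(uploads):  # sorted only to make the output deterministic
        for pair in uploads[region]:
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


def distribute(merged: list[Entry], child_regions: list[str]) -> dict[str, list[Entry]]:
    """Give every child server its own copy of the merged dictionary data."""
    return {region: list(merged) for region in child_regions}


uploads = {
    "tohoku": [("高屋敷", "たかやしき"), ("三浦", "みうら")],
    "kanto": [("高屋敷", "たかやしき")],
}
merged = merge_at_parent(uploads)
print(merged)  # [('高屋敷', 'たかやしき'), ('三浦', 'みうら')]
copies = distribute(merged, list(uploads))
```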

6. A speech recognition method comprising:
a storage step of storing dictionary data in which character strings are associated with their readings;
a dictionary receiving step of receiving, from a portable terminal, dictionary data that is registered in the portable terminal and differs from the dictionary data stored in the storage step;
a voice receiving step of receiving, from the portable terminal, voice data indicating characteristics of a voice;
a voice recognition step of recognizing the reading of the voice data received in the voice receiving step;
a generation step of generating conversion candidates for the reading recognized in the voice recognition step, using the dictionary data stored in the storage step and the dictionary data received in the dictionary receiving step; and
a transmission step of transmitting the conversion candidates generated in the generation step to the portable terminal.