JP6833203B2

JP6833203B2 - Voice recognition system, voice recognition server, terminal device, and phrase management method

Info

Publication number: JP6833203B2
Application number: JP2017025725A
Authority: JP
Inventors: 浩明小窪; 松本　卓也; 卓也松本; 則男度會; 睿張; 和憲中山; 本間　健; 健本間
Original assignee: Clarion Co Ltd; Faurecia Clarion Electronics Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2017-02-15
Filing date: 2017-02-15
Publication date: 2021-02-24
Anticipated expiration: 2037-02-15
Also published as: JP2018132626A

Description

The present invention relates to a voice recognition system, a voice recognition server, a terminal device, and a phrase management method.

In recent years, some terminal devices such as smartphones and car navigation systems have offered a voice input function that uses a voice recognition engine running on a server. A server-side voice recognition engine can draw on far more computing resources (for example, processing speed and storage capacity) than a terminal device. As a result, it has the potential to recognize a wide vocabulary and to accurately recognize speech input in a variety of acoustic environments.

However, a server-side voice recognition engine cannot be used in environments where data communication with the terminal device cannot be established, such as sparsely populated areas or tunnels. For this reason, a system can also be configured so that a voice recognition engine is installed on the terminal device as well, and voice recognition processing is allocated to either the terminal device or the server depending on the situation.

Patent Document 1 states: "The distribution determination unit 102 determines whether or not the analyzed input mode is the corresponding item selection mode (step 104). If the current input mode is the corresponding item selection mode, an affirmative determination is made. Next, the voice recognition processing unit 100 built into the in-vehicle device 1 performs voice recognition processing on the user's voice collected by the microphone 22 (step 106)." and "On the other hand, if the current input mode is the text input mode, a negative determination is made in step 104. Next, the voice data transmission unit 56 transmits the voice data, which was input from the microphone 22 and compressed by the compression processing unit 26, to the server 2 via the network 3, and requests voice recognition processing by the voice recognition processing unit 200 in the server 2 (step 110)."

Japanese Unexamined Patent Publication No. 2013-88477

Because of the limited computing resources of a terminal device, the voice recognition engine installed on it can recognize fewer phrases than a server-side voice recognition engine. A server-side engine, on the other hand, can be maintained more freely than a terminal device, so new phrases are easy to add to its voice recognition dictionary. Consequently, there are many phrases that the server-side engine recognizes correctly but the terminal-side engine cannot recognize. In an environment where the server-side engine is unavailable, such phrases are not recognized by the terminal device, and its user is inconvenienced.

One approach is to take the phrases recognized by the server-side voice recognition engine that a given user uses frequently and add them to the voice recognition dictionary on that user's terminal device, which increases the vocabulary the terminal device can recognize. However, phrases that the user has never used, or uses only rarely, are not added to the dictionary, or are added late. As a result, a new phrase such as the name of a newly opened store is not added to the dictionary until the user actually uses it, even if the user is likely to use it, and user convenience is not improved.

The present invention has been made in view of the above problems, and its object is to provide a suitable voice recognition system, voice recognition server, terminal device, and phrase management method.

The present invention includes a plurality of means for solving at least part of the above problems. One example is as follows.

One aspect of the present invention is a voice recognition system comprising a terminal device that performs voice recognition on a user's voice data, and a voice recognition server that communicates with the terminal device and performs voice recognition on the user's voice data. The voice recognition server includes: a server-side communication control unit that communicates with the terminal device; a server-side voice recognition unit that performs voice recognition on the user's voice data transmitted from the terminal device and transmits the recognition result to the terminal device; a phrase management unit that registers, in a phrase list, information about a phrase transmitted from the terminal device and acquired using the server-side communication control unit, together with information about phrases transmitted from other terminal devices; and a phrase distribution unit that distributes information about phrases registered in the phrase list to at least one of the terminal device and the other terminal devices using the server-side communication control unit.
The terminal device includes: a terminal-side communication control unit that communicates with the voice recognition server; a voice recognition dictionary storage unit that stores a voice recognition dictionary used for voice recognition; a terminal-side voice recognition unit that performs voice recognition on the user's voice data using the voice recognition dictionary stored in the voice recognition dictionary storage unit and obtains a recognition result; a voice transmission unit that transmits the user's voice data to the voice recognition server using the terminal-side communication control unit; a recognition result acquisition unit that compares the recognition result from the terminal-side voice recognition unit with the recognition result from the voice recognition server acquired using the terminal-side communication control unit, and selects one of them; a dictionary management unit that determines whether or not the phrase indicated by the selected recognition result exists in the voice recognition dictionary stored in the voice recognition dictionary storage unit, and registers the phrase in the voice recognition dictionary if it does not; and a phrase transmission unit that transmits information about the phrase registered in the voice recognition dictionary by the dictionary management unit to the voice recognition server using the terminal-side communication control unit. The dictionary management unit also registers, in the voice recognition dictionary, phrases distributed from the voice recognition server and acquired using the terminal-side communication control unit.

According to the present invention, a suitable voice recognition system, voice recognition server, terminal device, and phrase management method can be provided.

Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

A diagram showing an example of the system configuration and functional configuration of the voice recognition system according to the first embodiment of the present invention.
A sequence diagram showing an outline of the processing executed by the voice recognition system.
A diagram showing an example of the data structure of the phrase list.
A diagram showing an example of the hardware configuration of a computer that implements the voice recognition server.
A flowchart showing an example of the voice recognition processing and new phrase transmission processing of the terminal device.
A flowchart showing an example of the new phrase registration processing and new phrase distribution processing of the voice recognition server.
A flowchart showing an example of the new phrase registration processing of the terminal device.
A diagram showing an example of the data structure of the phrase list according to the second embodiment of the present invention.
A flowchart showing an example of the new phrase registration processing and new phrase distribution processing of the voice recognition server.

Hereinafter, a plurality of embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]
FIG. 1 is a diagram showing an example of the system configuration and functional configuration of the voice recognition system according to the first embodiment.

The voice recognition system 1 includes a plurality of terminal devices 10 (only one is shown in FIG. 1) and a voice recognition server 20. Each terminal device 10 and the voice recognition server 20 can communicate with each other via a communication network N such as a mobile phone network or the Internet.

The terminal device 10 is an information communication device such as a smartphone, a feature phone, a tablet computer, a PC (Personal Computer), a wearable device, an in-vehicle car navigation device, an in-vehicle audio device, or an in-vehicle ECU (Electronic Control Unit). The voice recognition server 20 is an information communication device such as a server computer.

The terminal device 10 has a voice recognition function. It also has a function of registering new phrases in its voice recognition dictionary. The voice recognition server 20 has a voice recognition function. It also has a function of distributing phrases transmitted from each terminal device 10 to the terminal devices 10. An outline of the processing of the voice recognition system 1 is described with reference to FIG. 2.

FIG. 2 is a sequence diagram showing an outline of the processing executed by the voice recognition system.

A given terminal device 10 accepts input of the user's voice data (step S1) and transmits the voice data to the voice recognition server 20 (step S2). The terminal device 10 executes voice recognition processing on the input voice data and obtains a recognition result (step S3). Meanwhile, the voice recognition server 20 executes voice recognition processing on the voice data transmitted from the terminal device 10, obtains a recognition result (step S4), and transmits the recognition result to the terminal device 10 that sent the voice data (step S5).

Having obtained two recognition results for the user's input voice data, the terminal device 10 determines which recognition result to adopt (step S6). If the phrase indicated by the adopted recognition result is not registered in the terminal device's own voice recognition dictionary, the terminal device 10 registers that phrase in the dictionary (step S7). In this way, a new phrase is added to the voice recognition dictionary of the terminal device 10.

The terminal device 10 that has registered the new phrase in its voice recognition dictionary transmits (notifies) the new phrase to the voice recognition server 20 (step S8). The voice recognition server 20 registers the new phrase notified by the terminal device 10 in its own phrase list (step S9). Steps S1 to S9 are executed between each terminal device 10 and the voice recognition server 20. That is, new phrases notified by a plurality of terminal devices 10 are registered in the phrase list.

The voice recognition server 20 distributes a new phrase registered in the phrase list to one or more terminal devices 10 other than the terminal device 10 that notified it of that phrase (step S10). Here, the voice recognition server 20 selects the phrases to distribute according to a predetermined rule, for example new phrases that are used frequently among a plurality of users.

Each terminal device 10 that receives a distributed new phrase registers it in its own voice recognition dictionary if the phrase is not already registered there (step S11). In this way, the new phrase added to the voice recognition dictionary of one terminal device 10 in step S7 is also added to the voice recognition dictionaries of other terminal devices 10 in step S11.
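The notification and distribution steps (S7 to S11) can be illustrated with a minimal in-memory sketch. The class names `Terminal` and `Server`, the method names, and the sample phrase are hypothetical and chosen only for illustration; network communication and the server's selection rule for distribution are omitted.

```python
class Terminal:
    """Hypothetical stand-in for a terminal device 10."""

    def __init__(self, name, dictionary=()):
        self.name = name
        self.dictionary = set(dictionary)  # voice recognition dictionary

    def register(self, phrase):
        """Steps S7 / S11: add the phrase only if it is not yet registered."""
        if phrase in self.dictionary:
            return False
        self.dictionary.add(phrase)
        return True


class Server:
    """Hypothetical stand-in for the voice recognition server 20."""

    def __init__(self):
        self.phrase_list = {}  # phrase -> names of terminals that reported it

    def notify(self, terminal, phrase):
        """Step S9: record the phrase and which terminal reported it."""
        self.phrase_list.setdefault(phrase, set()).add(terminal.name)

    def distribute(self, phrase, terminals):
        """Steps S10-S11: deliver to every terminal that did not report the phrase."""
        reporters = self.phrase_list.get(phrase, set())
        receivers = [t for t in terminals if t.name not in reporters]
        for t in receivers:
            t.register(phrase)
        return [t.name for t in receivers]


a, b, c = Terminal("A"), Terminal("B"), Terminal("C", {"New Cafe"})
server = Server()
if a.register("New Cafe"):        # S7: new to terminal A
    server.notify(a, "New Cafe")  # S8-S9
delivered = server.distribute("New Cafe", [a, b, c])  # S10-S11
```

In this sketch terminal C already holds the phrase, so the delivery reaches it but its dictionary is left unchanged, which mirrors the check described for step S11.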

The terminal device 10 has fewer computing resources (for example, processing speed and storage capacity) than the voice recognition server 20. The voice recognition function of the terminal device 10 is therefore limited to a low recognition capability, because few phrases are registered in its voice recognition dictionary and its processing speed is slow. In contrast, the voice recognition function of the voice recognition server 20 has a high recognition capability, because many phrases are registered in its voice recognition dictionary and its processing speed is fast. Its drawback is that it cannot be used when communication between the terminal device 10 and the voice recognition server 20 cannot be established.

Therefore, in the voice recognition system 1 of the present embodiment, the voice recognition server 20 collects new phrases from each terminal device 10 and distributes the collected phrases to the terminal devices 10 so that they are registered in each terminal device's voice recognition dictionary. This efficiently adds phrases that each user is likely to use to the voice recognition dictionary of each terminal device 10, and improves user convenience even in environments where communication with the voice recognition server 20 cannot be established.

Returning to FIG. 1, the functions of the terminal device 10 and the voice recognition server 20 are described in more detail.

The terminal device 10 includes a voice transmission unit 11, a voice recognition unit 12, a voice recognition dictionary 13 (which includes a user dictionary 14), a recognition result acquisition unit 15, an interface control unit 16, a dictionary management unit 17, a phrase transmission unit 18, and a communication control unit 19. The terminal device 10 also has a microphone M that is either built in or connected externally.

The voice transmission unit 11 accepts input of the user's voice data via the microphone M. It also transmits the input voice data to the voice recognition server 20 via the communication control unit 19. The voice transmission unit 11 may compress the voice data before transmission to reduce its size.

The voice recognition unit 12 accepts input of the user's voice data via the microphone M, executes voice recognition processing on that data, and outputs as the recognition result, for example, the recognized phrase (character string) and its confidence. Specifically, the voice recognition unit 12 refers to the voice recognition dictionary 13 and estimates, from the phrases registered in it, the phrase most similar to the input voice, or an expression composed of several such phrases. Along with estimating the phrase, the voice recognition unit 12 calculates the confidence of the estimate. Such voice recognition processing can be implemented with existing techniques, so a detailed description is omitted.

The voice recognition dictionary 13 has a standard dictionary (not shown) containing a plurality of pre-registered phrases. It also has a user dictionary 14 for registering new phrases used by the user of the terminal device 10 or by users of other terminal devices 10. In the standard dictionary and the user dictionary 14, the reading data and parameters of a phrase may be registered together with its character string. In the present embodiment, the voice recognition unit 12 executes voice recognition processing using both the standard dictionary and the user dictionary 14.

The recognition result acquisition unit 15 acquires the recognition result output from the voice recognition unit 12. It also acquires, from the voice recognition server 20 via the communication control unit 19, the recognition result for the voice data transmitted by the voice transmission unit 11. The recognition result acquisition unit 15 then selects one of the two acquired recognition results and outputs it to the dictionary management unit 17. For example, it compares the confidence included in each recognition result and selects the result with the higher confidence.
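The confidence comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `RecognitionResult` type, the `source` label, the 0.0 to 1.0 confidence scale, the tie-breaking rule, and the fallback when one result is missing (for example because the server is unreachable) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    phrase: str        # recognized character string
    confidence: float  # confidence of the estimate (0.0-1.0 scale is assumed)
    source: str        # "terminal" or "server" (hypothetical label)


def select_result(terminal: Optional[RecognitionResult],
                  server: Optional[RecognitionResult]) -> Optional[RecognitionResult]:
    """Pick the result with the higher confidence.

    If one result is missing, the other is returned unchanged; this
    fallback is an assumption, not something the text specifies.
    """
    if terminal is None:
        return server
    if server is None:
        return terminal
    # Preferring the server on a tie is an arbitrary choice for this sketch.
    return server if server.confidence >= terminal.confidence else terminal
```

For example, `select_result(RecognitionResult("A", 0.4, "terminal"), RecognitionResult("B", 0.9, "server"))` returns the server result, which would then be passed on to the dictionary management unit 17.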

Note that the recognition result acquisition unit 15 may present the selected recognition result to the user via the interface control unit 16 and accept, via the interface control unit 16, the user's choice to approve or cancel that recognition result.

The interface control unit 16 outputs information to the user via an output device (not shown) of the terminal device 10, such as a display or a speaker. It also accepts input of information from the user via an input device (not shown) of the terminal device 10, such as soft keys or hard keys.

The dictionary management unit 17 manages the contents of the user dictionary 14. Specifically, it refers to the recognition result output from the recognition result acquisition unit 15 and determines whether or not the phrase indicated by that result is registered in the voice recognition dictionary 13 (the standard dictionary and the user dictionary 14). If the recognized phrase is not registered in the voice recognition dictionary 13, the dictionary management unit 17 registers it in the user dictionary 14 as a new phrase.

The dictionary management unit 17 also receives, via the communication control unit 19, information about new phrases distributed from the voice recognition server 20 (including, for example, the phrase, its reading data, and parameters). It determines whether or not each such phrase is registered in the voice recognition dictionary 13 (the standard dictionary and the user dictionary 14). If the new phrase is not registered in the voice recognition dictionary 13, the dictionary management unit 17 registers it in the user dictionary 14.
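The registration check performed by the dictionary management unit 17 can be sketched as below. The class name `DictionaryManager`, its method names, and the sample phrases are hypothetical; the standard dictionary is modeled as a read-only set and the user dictionary 14 as a mapping from phrase to its optional reading data, which is an assumption about storage layout.

```python
from typing import Dict, Iterable, Optional


class DictionaryManager:
    """Hypothetical model of the dictionary management unit 17."""

    def __init__(self, standard_dictionary: Iterable[str]):
        self.standard = frozenset(standard_dictionary)  # pre-registered phrases
        self.user: Dict[str, Optional[str]] = {}        # user dictionary 14

    def contains(self, phrase: str) -> bool:
        """True if the phrase is in the standard or the user dictionary."""
        return phrase in self.standard or phrase in self.user

    def register_if_new(self, phrase: str, reading: Optional[str] = None) -> bool:
        """Register the phrase in the user dictionary only if it is absent.

        Returns True when a new entry was created, so that a caller can
        decide whether the server needs to be notified.
        """
        if self.contains(phrase):
            return False
        self.user[phrase] = reading
        return True


manager = DictionaryManager(["Tokyo Station", "City Hall"])
first = manager.register_if_new("New Cafe", reading="nyuu kafe")
second = manager.register_if_new("New Cafe")
existing = manager.register_if_new("City Hall")
```

The same method serves both paths described above: phrases taken from a locally adopted recognition result and phrases distributed by the server. Only in the first case would a `True` return value lead to a notification to the voice recognition server 20.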

The phrase transmission unit 18 transmits (notifies) to the voice recognition server 20, via the communication control unit 19, information about a phrase that was acquired by the recognition result acquisition unit 15 and newly registered in the user dictionary 14 by the dictionary management unit 17 (including, for example, the phrase, its reading data, and parameters).

The communication control unit 19 communicates with the voice recognition server 20 via a communication device (not shown) of the terminal device 10. It transmits information output from other functions (the voice transmission unit 11, the phrase transmission unit 18, and so on) to the voice recognition server 20, and passes information received from the voice recognition server 20 to other functions (the recognition result acquisition unit 15, the dictionary management unit 17, and so on). It may of course also communicate with devices other than the voice recognition server 20.

The voice recognition server 20 includes a voice recognition unit 21, a voice recognition dictionary 22, a phrase management unit 23, a phrase distribution unit 24, a phrase list 25, and a communication control unit 26. The part comprising the voice recognition unit 21, the voice recognition dictionary 22, and the communication control unit 26 may be built as a voice recognition server, and the part comprising the phrase management unit 23, the phrase distribution unit 24, the phrase list 25, and the communication control unit 26 may be built as a separate phrase management server.

The voice recognition unit 21 receives the voice data of the user of each terminal device 10 via the communication control unit 26, executes voice recognition processing on that data, and outputs as the recognition result, for example, the recognized phrase (character string) and its confidence. Specifically, the voice recognition unit 21 refers to the voice recognition dictionary 22 and estimates, from the phrases registered in it, the phrase most similar to the input voice, or an expression composed of several such phrases. Along with estimating the phrase, the voice recognition unit 21 calculates the confidence of the estimate. Such voice recognition processing can be implemented with existing techniques, so a detailed description is omitted. The voice recognition unit 21 transmits the obtained recognition result, via the communication control unit 26, to the terminal device 10 that sent the corresponding voice data.

The voice recognition dictionary 22 is referred to by the voice recognition unit 21. A plurality of phrases are registered in it in advance. New phrases are added to the voice recognition dictionary 22, for example, by an administrator or automatically by a program. The phrases registered in the voice recognition dictionary 22 are likewise updated, for example, by an administrator or automatically by a program.

The phrase management unit 23 manages the contents of the phrase list 25. Specifically, it receives, via the communication control unit 26, information about a new phrase transmitted (notified) from each terminal device 10 (including, for example, the phrase, its reading data, and parameters). The phrase management unit 23 determines whether or not the received new phrase is registered in the phrase list 25. If it is not, the phrase management unit 23 registers the new phrase in the phrase list 25 and sets its registration count to 1. If the new phrase is already registered in the phrase list 25, the phrase management unit 23 increments its registration count by 1.
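The register-or-increment rule for the phrase list can be sketched in a few lines. The class name `PhraseManager` and the use of `collections.Counter` as the backing store are assumptions for illustration; reading data and parameters are left out to keep the counting logic visible.

```python
from collections import Counter


class PhraseManager:
    """Hypothetical model of the phrase management unit 23."""

    def __init__(self):
        # phrase -> number of times it has been reported as new
        self.registration_counts = Counter()

    def register(self, phrase: str) -> int:
        """Register a reported phrase and return its updated count.

        A Counter yields 0 for unseen keys, so the first report sets the
        count to 1 and every later report increments it by 1.
        """
        self.registration_counts[phrase] += 1
        return self.registration_counts[phrase]


manager = PhraseManager()
count_first = manager.register("New Cafe")   # first report
count_second = manager.register("New Cafe")  # reported again
count_other = manager.register("Harbor Mall")
```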

The phrase distribution unit 24 distributes information about each phrase registered in the phrase list 25 (including, for example, the phrase, its reading data, and parameters) to the terminal devices 10 via the communication control unit 26. Specifically, the phrase distribution unit 24 decides whether or not to distribute a candidate phrase based on its registration count. For example, it determines whether or not the registration count exceeds a predetermined threshold and decides to distribute the phrase if it does. The threshold is a design value: a registration count above it is taken to indicate that the phrase is likely to be used frequently by an unspecified large number of users.
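The threshold decision can be written as a small filter. The function name, the sample counts, and the threshold value of 10 are hypothetical; the text only states that the threshold is a design value and that a phrase is distributed when its count exceeds it, which is interpreted here as a strict comparison.

```python
from typing import Dict, List


def select_phrases_for_distribution(registration_counts: Dict[str, int],
                                    threshold: int) -> List[str]:
    """Return the phrases whose registration count exceeds the threshold."""
    return [phrase for phrase, count in registration_counts.items()
            if count > threshold]  # strictly greater: "exceeds"


# Hypothetical counts, for illustration only.
counts = {"New Cafe": 12, "Harbor Mall": 10, "Old Bridge": 3}
to_distribute = select_phrases_for_distribution(counts, threshold=10)
```

With these sample values only "New Cafe" is selected, because a count equal to the threshold does not exceed it. A lower threshold spreads new phrases faster at the cost of filling terminal dictionaries with phrases few users need.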

The phrase list 25 is a list of the new phrases notified by the terminal devices 10, and is structured, for example, as shown in FIG. 3.

FIG. 3 is a diagram showing an example of the data structure of the phrase list. The phrase list 25 can store, for each phrase, a record that associates a phrase 25a, an attribute 25b, and a total number of registrations 25c. The phrase 25a is information on the phrase and includes, for example, the phrase itself, reading data, and parameters. The attribute 25b is an attribute of the phrase indicated by the phrase 25a. For example, when the phrase is the name of a facility, the attribute is its position information (for example, its address or coordinates on a map); when the phrase is the title of a piece of music, the attribute is an identifier such as the name of an artist associated with that music. The example of FIG. 3 shows records that contain a facility name and its address. The total number of registrations 25c is the total number of times the phrase indicated by the phrase 25a has been registered. It corresponds to the number of users (the number of terminal devices 10) that have notified the phrase as a new phrase.
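The record layout described for FIG. 3 can be sketched as follows. This is only an illustration under assumptions: the class names, field names, the choice of a dictionary keyed by phrase text, and the sample values are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PhraseInfo:
    """Phrase 25a: the phrase text, its reading data, and parameters."""
    text: str
    reading: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class PhraseRecord:
    """One record of the phrase list 25 (field names are illustrative)."""
    phrase: PhraseInfo                # phrase 25a
    attribute: Optional[str] = None   # attribute 25b, e.g. a facility address
    total_registrations: int = 0      # total number of registrations 25c


# The phrase list 25, keyed here by phrase text (an assumption of this sketch).
phrase_list: Dict[str, PhraseRecord] = {}

# Made-up sample record, for illustration only.
record = PhraseRecord(
    phrase=PhraseInfo(text="Example Facility", reading="example facility"),
    attribute="Example City",
    total_registrations=3,
)
phrase_list[record.phrase.text] = record
```

Keying the list by phrase text makes the "is this phrase already registered?" check a single lookup, which is the operation the phrase management unit 23 performs on every notification.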

Returning to the description of FIG. 1, the communication control unit 26 communicates with each terminal device 10 via a communication device (not shown) of the voice recognition server 20. It transmits information output from other functions (the voice recognition unit 21, the phrase distribution unit 24, etc.) to each terminal device 10, and outputs information received from each terminal device 10 to other functions (the voice recognition unit 21, the phrase management unit 23, etc.). Of course, it may also communicate with devices other than the terminal devices 10.

図４は、音声認識サーバを実現するコンピュータのハードウェア構成の一例を示す図である。音声認識サーバ２０は、例えば、図４に示すようなコンピュータ９０により実現することができる。音声認識サーバ２０は、複数のコンピュータ９０により構成されてもよい。 FIG. 4 is a diagram showing an example of a hardware configuration of a computer that realizes a voice recognition server. The voice recognition server 20 can be realized by, for example, a computer 90 as shown in FIG. The voice recognition server 20 may be composed of a plurality of computers 90.

コンピュータ９０は、例えば、演算装置９１と、主記憶装置９２と、外部記憶装置９３と、通信装置９４と、入力装置９５と、出力装置９６とを含む。 The computer 90 includes, for example, an arithmetic unit 91, a main storage device 92, an external storage device 93, a communication device 94, an input device 95, and an output device 96.

演算装置９１は、例えば、ＣＰＵ（Central Processing Unit）などの装置である。主記憶装置９２は、例えば、ＲＡＭ（Random Access Memory）などの記憶装置である。外部記憶装置９３は、例えば、ハードディスクやＳＳＤ（Solid State Drive）、あるいはフラッシュＲＯＭ（Read Only Memory）などの記憶装置である。 The arithmetic unit 91 is, for example, a device such as a CPU (Central Processing Unit). The main storage device 92 is, for example, a storage device such as a RAM (Random Access Memory). The external storage device 93 is, for example, a storage device such as a hard disk, an SSD (Solid State Drive), or a flash ROM (Read Only Memory).

The communication device 94 is a device that transmits and receives information, and includes a communication device that performs wired communication via a network cable and a communication device that performs wireless communication via an antenna. The input device 95 is a device that accepts input information, and includes a keyboard, a pointing device such as a mouse, a touch panel, a microphone, and the like. The output device 96 is a device that outputs output information, and includes a display, a printer, a speaker, and the like.

音声認識サーバ２０の各機能は、例えば、演算装置９１が所定のアプリケーションプログラムを実行することによって実現することができる。このアプリケーションプログラムは、例えば、主記憶装置９２又は外部記憶装置９３内に記憶され、実行にあたって主記憶装置９２上にロードされ、演算装置９１によって実行される。音声認識辞書２２及び語句リスト２５は、例えば、主記憶装置９２及び外部記憶装置９３の少なくとも一方の記憶部によって実現される。音声認識辞書２２及び語句リスト２５の少なくとも一部は、例えば、通信装置９４を介して接続されるネットワーク上の記憶部により実現されてもよい。 Each function of the voice recognition server 20 can be realized, for example, by the arithmetic unit 91 executing a predetermined application program. This application program is stored in, for example, the main storage device 92 or the external storage device 93, loaded on the main storage device 92 for execution, and executed by the arithmetic unit 91. The voice recognition dictionary 22 and the phrase list 25 are realized by, for example, at least one storage unit of the main storage device 92 and the external storage device 93. At least a part of the voice recognition dictionary 22 and the phrase list 25 may be realized by, for example, a storage unit on a network connected via a communication device 94.

各端末装置１０も、例えば、図４に示すようなコンピュータ９０により実現することができる。すなわち、端末装置１０の各機能は、例えば、演算装置９１が所定のアプリケーションプログラムを実行することによって実現することができる。音声認識辞書１３は、例えば、主記憶装置９２及び外部記憶装置９３の少なくとも一方の記憶部によって実現される。 Each terminal device 10 can also be realized by, for example, a computer 90 as shown in FIG. That is, each function of the terminal device 10 can be realized, for example, by the arithmetic unit 91 executing a predetermined application program. The voice recognition dictionary 13 is realized by, for example, at least one storage unit of the main storage device 92 and the external storage device 93.

図５は、端末装置の音声認識処理および新規語句送信処理の一例を示すフローチャートである。本フローチャートは、音声データの入力及びその音声認識処理が実行された後の処理を示している。端末装置１０と音声認識サーバ２０の通信は確立されているものとする。 FIG. 5 is a flowchart showing an example of voice recognition processing and new word transmission processing of the terminal device. This flowchart shows the input of voice data and the processing after the voice recognition processing is executed. It is assumed that the communication between the terminal device 10 and the voice recognition server 20 has been established.

まず、認識結果取得部１５は、入力された音声データの音声認識結果を取得したか否かを判定する（ステップＳ１０１）。具体的には、認識結果取得部１５は、音声認識部１２及び音声認識サーバ２０のそれぞれから認識結果を取得したか否かを判定する。２つの認識結果を取得していないと判定した場合（ステップＳ１０１：ＮＯ）、認識結果取得部１５は、ステップＳ１０１の処理を継続する。 First, the recognition result acquisition unit 15 determines whether or not the voice recognition result of the input voice data has been acquired (step S101). Specifically, the recognition result acquisition unit 15 determines whether or not the recognition result has been acquired from each of the voice recognition unit 12 and the voice recognition server 20. When it is determined that the two recognition results have not been acquired (step S101: NO), the recognition result acquisition unit 15 continues the process of step S101.

When it is determined that the two recognition results have been acquired (step S101: YES), the recognition result acquisition unit 15 selects a recognition result (step S102). Specifically, the recognition result acquisition unit 15 compares the reliability included in each of the two recognition results acquired in step S101, and selects the recognition result with the higher reliability. Consider the case where the reliability ranges from a minimum of 0 to a maximum of 1. For example, if the recognition result obtained from the voice recognition unit 12 is "Tokyo International Airport" (reliability 0.92) and the recognition result obtained from the voice recognition server 20 is "Tokyo International Airport" (reliability 0.97), both results have high reliability, and the one with the higher reliability is selected. As another example, if the recognition result obtained from the voice recognition unit 12 is "Narita International Airport" (reliability 0.32) and the recognition result obtained from the voice recognition server 20 is "Centrair International Airport" (reliability 0.94), the reliabilities belong to different phrases, but the result of the voice recognition unit 12 is likely to be wrong, so the result with the higher reliability is selected.
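The selection in step S102 can be sketched as below, using the two examples given in the text. The class and function names are hypothetical, and the behavior on a tie is an assumption: the text does not say which result wins when both reliabilities are equal, so this sketch arbitrarily prefers the server result.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    phrase: str
    reliability: float  # assumed to lie in the range 0 to 1
    source: str         # "terminal" (unit 12) or "server" (server 20)


def select_result(local: RecognitionResult, remote: RecognitionResult) -> RecognitionResult:
    """Return the result with the higher reliability (step S102).

    Tie-breaking is not specified in the text; this sketch returns the
    server result when the two reliabilities are equal.
    """
    return local if local.reliability > remote.reliability else remote


# First example from the text: same phrase, server slightly more reliable.
same_phrase = select_result(
    RecognitionResult("Tokyo International Airport", 0.92, "terminal"),
    RecognitionResult("Tokyo International Airport", 0.97, "server"),
)

# Second example from the text: different phrases, terminal result unreliable.
different_phrase = select_result(
    RecognitionResult("Narita International Airport", 0.32, "terminal"),
    RecognitionResult("Centrair International Airport", 0.94, "server"),
)
```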

Then, the recognition result acquisition unit 15 determines whether the selected recognition result has been canceled (step S103). Specifically, the interface control unit 16 presents the phrase indicated by the recognition result selected in step S102 (or the operation command corresponding to the phrase) to the user via a display or a speaker, and accepts from the user, via an input device, a selection of whether to permit or cancel the phrase (or operation command). The interface control unit 16 may also accept from the user a correction to the presented phrase (or operation command).

When the interface control unit 16 accepts a selection to cancel, the recognition result acquisition unit 15 determines that the recognition result has been canceled (step S103: YES), returns the process to step S101, and executes the process for the next voice data.

When the interface control unit 16 accepts a selection to permit, the recognition result acquisition unit 15 determines that the recognition result has been permitted (step S103: NO), and advances the process to step S104. At this time, the recognition result acquisition unit 15 outputs the recognition result that was selected in step S102 and permitted in step S103 to the dictionary management unit 17. If the phrase was corrected in step S103, the recognition result acquisition unit 15 outputs the corrected recognition result to the dictionary management unit 17. When the phrase (or operation command) presented to the user is permitted, a processing unit (not shown) of the terminal device 10 may execute the function associated with that phrase (or operation command).

Then, the dictionary management unit 17 determines whether the recognized phrase is already registered in the voice recognition dictionary 13 (step S104). Specifically, the dictionary management unit 17 refers to the recognition result output from the recognition result acquisition unit 15 in step S103, and determines whether the phrase indicated by that recognition result is registered in the voice recognition dictionary 13 (the standard dictionary and the user dictionary 14). When it is determined that the recognized phrase is already registered in the voice recognition dictionary 13 (step S104: YES), the process returns to step S101, and the recognition result acquisition unit 15 executes the process for the next voice data.

When it is determined that the recognized phrase is not registered in the voice recognition dictionary 13 (step S104: NO), the dictionary management unit 17 determines whether to register the recognized phrase in the user dictionary 14 (step S105). Specifically, the dictionary management unit 17 records, for each recognized phrase, the number of times it has been determined in step S104 to be unregistered. The dictionary management unit 17 then decides to register the phrase in the user dictionary 14 when that count exceeds a predetermined threshold. This prevents a phrase that has been recognized only a few times (that is, a phrase presumed to be used infrequently) from being registered in the user dictionary 14 immediately.
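The count-based decision of step S105 can be sketched as follows. The class name, method name, and threshold value are hypothetical; the text only states that registration is decided when the per-phrase count exceeds a predetermined threshold.

```python
from collections import Counter


class RegistrationPolicy:
    """Counts how often each phrase was found unregistered in step S104."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.miss_counts: Counter = Counter()

    def should_register(self, phrase: str) -> bool:
        """Record one more miss, then apply the count threshold (step S105)."""
        self.miss_counts[phrase] += 1
        return self.miss_counts[phrase] > self.threshold


# With a threshold of 2, the third unregistered occurrence triggers registration.
policy = RegistrationPolicy(threshold=2)
decisions = [policy.should_register("new phrase") for _ in range(3)]
```

Counts are kept per phrase, so a different phrase starts again from zero.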

The determination method in step S105 is not limited to the above example. For example, the dictionary management unit 17 may refer to the reliability of the recognized phrase, and decide to register the phrase in the user dictionary 14 when the reliability is greater than a predetermined threshold. This prevents a phrase with low recognition reliability from being registered in the user dictionary 14. Of course, step S105 may be omitted, and the dictionary management unit 17 may register the recognized phrase in the user dictionary 14 unconditionally.

認識語句をユーザ辞書１４に登録しないと判定された場合（ステップＳ１０５：ＮＯ）、処理はステップＳ１０１に戻り、認識結果取得部１５は次の音声データに関する処理を実行する。 When it is determined that the recognition phrase is not registered in the user dictionary 14 (step S105: NO), the process returns to step S101, and the recognition result acquisition unit 15 executes the process related to the next voice data.

When it is determined that the recognized phrase is to be registered in the user dictionary 14 (step S105: YES), the dictionary management unit 17 registers the recognized phrase in the user dictionary 14 as a new phrase (step S106). Specifically, the dictionary management unit 17 registers, in the user dictionary 14, information on the phrase indicated by the recognition result output from the recognition result acquisition unit 15 in step S103 (including, for example, the phrase, reading data, and parameters).

Then, the phrase transmission unit 18 transmits the new phrase to the voice recognition server 20 (step S107). Specifically, the phrase transmission unit 18 transmits (notifies) information on the phrase newly registered in the user dictionary 14 in step S106 (including, for example, the phrase, reading data, and parameters) to the voice recognition server 20 via the communication control unit 19. After step S107, the process returns to step S101, and the recognition result acquisition unit 15 executes the process for the next voice data.
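Steps S104 to S107 of FIG. 5 can be summarized in a short sketch. Everything here is a simplifying assumption: the dictionaries are modeled as sets of phrase strings (the real entries also carry reading data and parameters), and the registration policy and the transmission to the server are passed in as hypothetical callbacks.

```python
from typing import Callable, Set


def handle_permitted_phrase(
    phrase: str,
    standard_dictionary: Set[str],
    user_dictionary: Set[str],
    should_register: Callable[[str], bool],
    send_to_server: Callable[[str], None],
) -> bool:
    """Run steps S104 to S107 for one permitted phrase.

    Returns True if the phrase was newly registered and sent to the server.
    """
    # S104: is the phrase already in the voice recognition dictionary 13
    # (standard dictionary or user dictionary 14)?
    if phrase in standard_dictionary or phrase in user_dictionary:
        return False
    # S105: registration decision (policy supplied by the caller).
    if not should_register(phrase):
        return False
    # S106: register the phrase in the user dictionary 14.
    user_dictionary.add(phrase)
    # S107: notify the voice recognition server 20 of the new phrase.
    send_to_server(phrase)
    return True


sent = []
user_dict: Set[str] = set()
first = handle_permitted_phrase("new", {"old"}, user_dict, lambda p: True, sent.append)
second = handle_permitted_phrase("new", {"old"}, user_dict, lambda p: True, sent.append)
```

In this sketch the second call returns without sending anything, because the phrase is found in the user dictionary at step S104.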

図６は、音声認識サーバの新規語句登録処理および新規語句配信処理の一例を示すフローチャートである。少なくとも１つの端末装置１０と音声認識サーバ２０の通信は確立されているものとする。 FIG. 6 is a flowchart showing an example of a new word registration process and a new word distribution process of the voice recognition server. It is assumed that communication between at least one terminal device 10 and the voice recognition server 20 has been established.

First, the phrase management unit 23 determines whether a new phrase has been received (step S201). Specifically, the phrase management unit 23 determines whether it has received, via the communication control unit 26, information on a new phrase (including, for example, the phrase, reading data, and parameters) transmitted (notified) from any of the terminal devices 10. When it is determined that no new phrase has been received (step S201: NO), the phrase management unit 23 continues the process of step S201.

新規語句を受信したと判定した場合（ステップＳ２０１：ＹＥＳ）、語句管理部２３は、新規語句が語句リスト２５に登録済であるか否かを判定する（ステップＳ２０２）。具体的には、語句管理部２３は、ステップＳ２０１で受信した新規語句が語句リスト２５に登録されているか否かを判定する。 When it is determined that the new phrase has been received (step S201: YES), the phrase management unit 23 determines whether or not the new phrase has been registered in the phrase list 25 (step S202). Specifically, the phrase management unit 23 determines whether or not the new phrase received in step S201 is registered in the phrase list 25.

When it is determined that the new phrase is not registered in the phrase list 25 (step S202: NO), the phrase management unit 23 registers the new phrase in the phrase list 25 (step S203). Specifically, the phrase management unit 23 generates a record corresponding to the new phrase received in step S201 and adds it to the phrase list 25. The phrase management unit 23 sets information on the new phrase (including, for example, the phrase, reading data, and parameters) in the phrase 25a. The phrase management unit 23 determines the attribute of the new phrase (position information, an artist identifier, etc.) and sets it in the attribute 25b. The attribute of a new phrase can be obtained, for example, by searching a database prepared in advance or the Internet using the new phrase as a keyword. The phrase management unit 23 sets the total number of registrations 25c to 0.

When it is determined that the new phrase is already registered in the phrase list 25 (step S202: YES), or after the process of step S203, the phrase management unit 23 counts up the total number of registrations (step S204). Specifically, the phrase management unit 23 increments by one the total number of registrations 25c of the record corresponding to the new phrase (the phrase to be distributed) received in step S201.

それから、語句配信部２４は、総登録回数が所定閾値を超えたか否かを判定する（ステップＳ２０５）。具体的には、語句配信部２４は、ステップＳ２０４でカウントアップした総登録回数２５ｃが、所定閾値を超えたか否かを判定する。総登録回数２５ｃが所定閾値を超えていないと判定された場合（ステップＳ２０５：ＮＯ）、処理はステップＳ２０１に戻り、語句管理部２３は次に受信する新規語句に関する処理を実行する。 Then, the phrase distribution unit 24 determines whether or not the total number of registrations exceeds a predetermined threshold value (step S205). Specifically, the phrase distribution unit 24 determines whether or not the total number of registrations 25c counted up in step S204 exceeds a predetermined threshold value. When it is determined that the total number of registrations 25c does not exceed the predetermined threshold value (step S205: NO), the process returns to step S201, and the phrase management unit 23 executes the process related to the new phrase to be received next.
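Steps S202 to S205 can be sketched as a single function. For brevity, the phrase list 25 is reduced here to a mapping from phrase text to the total number of registrations 25c; the function name and the threshold value are hypothetical.

```python
from typing import Dict


def register_and_check(phrase_counts: Dict[str, int], phrase: str, threshold: int) -> bool:
    """Register or count up a notified phrase and apply the threshold.

    Returns True if the phrase should be distributed (step S205: YES).
    """
    # S202/S203: create a record with a count of 0 if the phrase is new.
    if phrase not in phrase_counts:
        phrase_counts[phrase] = 0
    # S204: count up the total number of registrations 25c.
    phrase_counts[phrase] += 1
    # S205: does the count exceed the predetermined threshold?
    return phrase_counts[phrase] > threshold


counts: Dict[str, int] = {}
outcomes = [register_and_check(counts, "new phrase", threshold=2) for _ in range(4)]
```

Note that, followed literally, this check also succeeds on every notification after the threshold has first been exceeded. The text does not say whether redistribution is suppressed in that case, so this sketch does not suppress it either.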

総登録回数２５ｃが所定閾値を超えていると判定した場合（ステップＳ２０５：ＹＥＳ）、語句配信部２４は、新規語句を配信する（ステップＳ２０６）。具体的には、語句配信部２４は、予め配信先として登録された端末装置１０のうち、配信対象の語句の送信元の端末装置１０以外の端末装置１０を、配信先として決定する。もちろん、配信対象の語句の送信元の端末装置１０を配信先に含めてもよい。 When it is determined that the total number of registrations 25c exceeds a predetermined threshold value (step S205: YES), the phrase distribution unit 24 distributes a new phrase (step S206). Specifically, the phrase distribution unit 24 determines, among the terminal devices 10 registered in advance as the distribution destination, the terminal device 10 other than the terminal device 10 of the transmission source of the word to be distributed as the distribution destination. Of course, the terminal device 10 that is the source of the words and phrases to be distributed may be included in the distribution destination.

The method of determining the distribution destinations is not limited to the above example. The phrase distribution unit 24 may refer to the attribute 25b of the phrase to be distributed. For example, when the phrase is the name of a facility and the attribute 25b is the position information of the facility, the phrase distribution unit 24 identifies, among the terminal devices 10 registered in advance as distribution destinations, users who have a predetermined relationship with the position indicated by the position information of the facility, and determines the terminal devices 10 of those users as the distribution destinations. Users with the predetermined relationship are, for example, users who live within a predetermined range of the facility's position on a map, or users who live in the same administrative division as the one in which the facility is located. Information on users can be obtained by referring to a database prepared in advance. In this way, the phrase to be distributed can be delivered to the terminal devices 10 of users who are presumed to use it frequently or to be likely to use it.
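One way to realize the "within a predetermined range" relationship is a great-circle distance check, sketched below. The haversine formula, the radius, the coordinate representation, and all names are choices made for this illustration; the text does not specify how the range is evaluated.

```python
import math
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def select_nearby_terminals(
    facility: Coordinate,
    user_homes: Dict[str, Coordinate],  # terminal id -> user's home position
    radius_km: float,
) -> List[str]:
    """Return the terminals whose users live within radius_km of the facility."""
    return [
        terminal_id
        for terminal_id, home in user_homes.items()
        if haversine_km(facility, home) <= radius_km
    ]


# Made-up positions: t2 is one degree of latitude (roughly 111 km) away.
homes = {"t1": (35.0, 139.0), "t2": (36.0, 139.0)}
targets = select_nearby_terminals((35.0, 139.0), homes, radius_km=50.0)
```

The administrative-division variant mentioned in the text would replace the distance test with an equality check on a division code stored for each user.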

As another example, when the phrase is the title of a piece of music and the attribute 25b is an identifier such as an artist's name, the phrase distribution unit 24 may identify, among the terminal devices 10 registered in advance as distribution destinations, users who have a predetermined relationship with the artist indicated by the identifier, and determine the terminal devices 10 of those users as the distribution destinations. Users with the predetermined relationship are, for example, users who have registered the artist as a favorite, or users who own music by the artist. Information on users can be obtained by referring to a database prepared in advance. In this way, the phrase to be distributed can be delivered to the terminal devices 10 of users who are presumed to use it frequently or to be likely to use it.
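The artist-based selection can be sketched in the same spirit. The two mappings stand in for the user database mentioned in the text; their shape, the function name, and the sample identifiers are hypothetical.

```python
from typing import Dict, List, Set


def select_terminals_for_artist(
    artist_id: str,
    favorites: Dict[str, Set[str]],  # terminal id -> favorite artist ids
    owned: Dict[str, Set[str]],      # terminal id -> artist ids of owned music
) -> List[str]:
    """Return terminals whose users favor the artist or own music by the artist."""
    targets = []
    for terminal_id in sorted(set(favorites) | set(owned)):
        if (artist_id in favorites.get(terminal_id, set())
                or artist_id in owned.get(terminal_id, set())):
            targets.append(terminal_id)
    return targets


# Made-up user data, for illustration only.
artist_targets = select_terminals_for_artist(
    "artist-1",
    favorites={"t1": {"artist-1"}, "t2": {"artist-2"}},
    owned={"t3": {"artist-1", "artist-2"}},
)
```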

The phrase distribution unit 24 acquires information on the phrase to be distributed (including, for example, the phrase, reading data, and parameters) from the phrase list 25, and distributes it via the communication control unit 26 to the terminal devices 10 determined as distribution destinations as described above. By distributing a phrase when its total number of registrations exceeds the predetermined threshold in this way, phrases that are presumed to be likely to be used by each user are distributed. After that, the process returns to step S201, and the phrase management unit 23 executes the process for the next new phrase it receives.

図７は、端末装置の新規語句登録処理の一例を示すフローチャートである。端末装置１０と音声認識サーバ２０の通信は確立されているものとする。 FIG. 7 is a flowchart showing an example of a new word registration process of the terminal device. It is assumed that the communication between the terminal device 10 and the voice recognition server 20 has been established.

First, the dictionary management unit 17 determines whether a new phrase has been received (step S111). Specifically, the dictionary management unit 17 determines whether it has received, via the communication control unit 19, information on a new phrase (including, for example, the phrase, reading data, and parameters) distributed from the voice recognition server 20. When it is determined that no new phrase has been received (step S111: NO), the dictionary management unit 17 continues the process of step S111.

新規語句を受信したと判定した場合（ステップＳ１１１：ＹＥＳ）、辞書管理部１７は、ステップＳ１１１で受信した新規語句がユーザ辞書１４に登録済であるか否かを判定する（ステップＳ１１２）。新規語句がユーザ辞書１４に登録済であると判定した場合（ステップＳ１１２：ＹＥＳ）、辞書管理部１７は、処理をステップＳ１１１に戻し、次の新規語句に関する処理を実行する。 When it is determined that the new phrase has been received (step S111: YES), the dictionary management unit 17 determines whether or not the new phrase received in step S111 has been registered in the user dictionary 14 (step S112). When it is determined that the new phrase has been registered in the user dictionary 14 (step S112: YES), the dictionary management unit 17 returns the process to step S111 and executes the process related to the next new phrase.

When it is determined that the new phrase is not registered in the user dictionary 14 (step S112: NO), the dictionary management unit 17 determines whether to register the new phrase in the user dictionary 14 (step S113). Specifically, the dictionary management unit 17 determines whether a phrase similar to the new phrase received in step S111 is already registered in the user dictionary 14. Whether two phrases are similar can be determined, for example, by calculating the similarity of their reading data. If no phrase similar to the new phrase is registered in the user dictionary 14, the dictionary management unit 17 decides to register the new phrase in the user dictionary 14. This prevents recognition errors between the new phrase and phrases that are already registered. Of course, step S113 may be omitted, and the dictionary management unit 17 may register the new phrase in the user dictionary 14 unconditionally.
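The similarity check of step S113 might look like the sketch below. The text does not name a similarity measure, so this example substitutes the ratio computed by `difflib.SequenceMatcher` from the Python standard library; the threshold of 0.8 and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable


def reading_similarity(a: str, b: str) -> float:
    """Similarity of two readings in the range 0 to 1 (1 means identical)."""
    return SequenceMatcher(None, a, b).ratio()


def should_register(
    new_reading: str,
    registered_readings: Iterable[str],
    threshold: float = 0.8,  # illustrative value, not from the text
) -> bool:
    """Step S113: register only if no registered reading is too similar."""
    return all(
        reading_similarity(new_reading, registered) < threshold
        for registered in registered_readings
    )
```

A production system would more likely compare phoneme sequences with an edit distance tuned to acoustic confusability, but the decision structure would stay the same.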

新規語句をユーザ辞書１４に登録しないと判定した場合（ステップＳ１１３：ＮＯ）、辞書管理部１７は、処理をステップＳ１１１に戻し、次の新規語句に関する処理を実行する。 When it is determined that the new phrase is not registered in the user dictionary 14 (step S113: NO), the dictionary management unit 17 returns the process to step S111 and executes the process related to the next new phrase.

新規語句をユーザ辞書１４に登録すると判定した場合（ステップＳ１１３：ＹＥＳ）、辞書管理部１７は、新規語句をユーザ辞書１４に登録する（ステップＳ１１４）。具体的には、辞書管理部１７は、ステップＳ１１１で受信した新規語句に関する情報（例えば、当該語句、読みデータ、パラメータを含む）を、ユーザ辞書１４に登録する。そして、辞書管理部１７は、処理をステップＳ１１１に戻し、次の新規語句に関する処理を実行する。 When it is determined that the new phrase is registered in the user dictionary 14 (step S113: YES), the dictionary management unit 17 registers the new phrase in the user dictionary 14 (step S114). Specifically, the dictionary management unit 17 registers information (including, for example, the phrase, reading data, and parameters) about the new phrase received in step S111 in the user dictionary 14. Then, the dictionary management unit 17 returns the process to step S111 and executes the process related to the next new phrase.

以上、本発明の第１実施形態について説明した。本実施形態によれば、端末装置側の音声認識辞書に語句を効率的に追加してユーザの利便性を向上することができる。 The first embodiment of the present invention has been described above. According to the present embodiment, words and phrases can be efficiently added to the voice recognition dictionary on the terminal device side to improve user convenience.

[Second Embodiment]
In the second embodiment, the phrases registered in the phrase list 25 are classified, and the condition for deciding whether to distribute a phrase is determined based on this classification. In the following, configurations that are the same as in the first embodiment are given the same reference numerals and their description is omitted; the description focuses on the configurations that differ from the first embodiment.

Phrases expected as new phrases include newly created phrases (original phrases), such as the name of a newly opened facility or the title of a newly released piece of music, as well as paraphrases such as abbreviations of the original name, nicknames, and other names used in error. An original phrase (which may also be called a formal phrase, in contrast to a paraphrase) is likely to be used by many users, so distributing it from the voice recognition server 20 to each terminal device 10 and having it registered in the user dictionary 14 improves user convenience.

A paraphrase, however, is less likely than an original phrase to be used by many users. Some users may never use it at all. Consequently, if paraphrases are distributed under the same conditions as original phrases, the capacity of the user dictionary 14 is wasted, which may actually harm user convenience. It may also cause an unnecessary drop in voice recognition accuracy. For example, suppose that "Cocoon Theater", a paraphrase of a facility's original name "Theater Cocoon", is registered in the user dictionary 14. In this case, the terminal device 10 becomes more likely to misrecognize the user's utterance "Cocoon City" as the similar "Cocoon Theater".

そこで、第２実施形態の音声認識サーバ２０は、語句リスト２５に登録された語句を、原語句と言い換え語句に分類し、当該語句を配信するか否かを決定する際に、種類に応じた条件を用いる。 Therefore, the voice recognition server 20 of the second embodiment classifies the words / phrases registered in the word / phrase list 25 into the original words / phrases and the paraphrased words / phrases, and determines whether or not to deliver the words / phrases according to the type. Use the condition.

図８は、第２実施形態に係る語句リストのデータ構成の一例を示す図である。語句リスト２５の各レコードは、語句２５ａ、属性２５ｂ、及び総登録回数２５ｃに加え、月別登録回数２５ｄを含む。月別登録回数２５ｄは、語句２５ａが示す語句の月別の登録回数である。月別登録回数２５ｄには、例えば、直近１２ヵ月の各月の登録回数が登録される。もちろん、単位期間は、月に限定されるものでなく、任意の月数、週数、日数などの他の単位期間であってもよい。 FIG. 8 is a diagram showing an example of the data structure of the phrase list according to the second embodiment. Each record in the phrase list 25 includes the phrase 25a, the attribute 25b, and the total number of registrations 25c, as well as the number of monthly registrations 25d. The monthly registration count 25d is the monthly registration count of the phrase indicated by the phrase 25a. In the monthly registration number 25d, for example, the registration number of each month of the last 12 months is registered. Of course, the unit period is not limited to months, and may be any other unit period such as the number of months, the number of weeks, or the number of days.

FIG. 9 is a flowchart showing an example of the new phrase registration process and the new phrase distribution process of the voice recognition server. Steps S201 to S203 of FIG. 9 are the same as steps S201 to S203 of FIG. 6, so their description is omitted.

After step S202 or step S203, the phrase management unit 23 increments the registration counts (step S210). Specifically, the phrase management unit 23 increments by one the total registration count 25c of the record corresponding to the new phrase received in step S201 (the phrase that is a candidate for distribution). The phrase management unit 23 also increments by one, among the monthly registration counts 25d of that record, the count for the month in which the new phrase was received.
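
As a minimal sketch of the record of FIG. 8 and the count-up of step S210, the following Python models the phrase list as a dictionary keyed by phrase. The names `PhraseRecord`, `count_up`, and the "YYYY-MM" month key are assumptions made for illustration and do not appear in the specification; `setdefault` merely stands in for the registration of steps S202 and S203.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PhraseRecord:
    """One record of the phrase list 25 (field names are hypothetical)."""
    phrase: str                 # phrase 25a
    attribute: str = ""         # attribute 25b
    total_count: int = 0        # total registration count 25c
    # monthly registration counts 25d, keyed by "YYYY-MM" (assumed format)
    monthly_counts: dict = field(default_factory=dict)


def count_up(phrase_list: dict, phrase: str, received_on: date) -> PhraseRecord:
    """Step S210: increment the total count and the count for the month of receipt."""
    # Stand-in for steps S202/S203: create the record if it is not yet listed.
    record = phrase_list.setdefault(phrase, PhraseRecord(phrase))
    record.total_count += 1
    month_key = received_on.strftime("%Y-%m")
    record.monthly_counts[month_key] = record.monthly_counts.get(month_key, 0) + 1
    return record
```

A real implementation would also need to discard months older than the retained window (for example, the most recent 12 months); that housekeeping is omitted here.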

Then, the phrase distribution unit 24 classifies the new phrase (step S211). In this embodiment, the phrase distribution unit 24 determines whether the new phrase is an original phrase or a paraphrase of one based on the time-series transition of the per-month registration counts indicated by the monthly registration counts 25d of the new phrase.

For example, if the original phrase is the name of a newly opened facility or a newly released song, its registration count can be expected to start rising from some point in time. A paraphrase of that original phrase, by contrast, is used by few users, so its registration count is not expected to start rising in that way. Based on this property, the phrase distribution unit 24 can determine whether a new phrase is an original phrase, a paraphrased phrase, or some other type of phrase by, for example, statistically analyzing the pattern of the time-series transition of its registration count.
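
The specification does not fix a particular statistical method. Purely as an illustration of the idea, the sketch below labels a phrase as an original phrase when registrations in the later half of the observed months are markedly higher than in the earlier half. The function name `classify_by_trend` and the parameters `min_growth` and `min_recent_total` are hypothetical, and the half-versus-half comparison is a stand-in heuristic, not the method of the embodiment.

```python
def classify_by_trend(monthly_counts, min_growth=2.0, min_recent_total=10):
    """Return "original" if registrations rise markedly in the later months.

    monthly_counts: per-month registration counts, oldest first.
    min_growth, min_recent_total: hypothetical tuning values.
    """
    half = len(monthly_counts) // 2
    if half == 0:
        # Too little history to observe any trend.
        return "paraphrase"
    earlier = sum(monthly_counts[:half])
    recent = sum(monthly_counts[half:])
    # "Rising" means enough recent registrations and clear growth over the earlier half.
    rising = recent >= min_recent_total and recent >= min_growth * max(earlier, 1)
    return "original" if rising else "paraphrase"
```

With these default values, a series such as `[0, 0, 1, 2, 8, 20]` is labeled as an original phrase, while a flat, sparse series such as `[1, 0, 1, 0, 1, 1]` is labeled as a paraphrased phrase.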

The method for determining the type of a new phrase is not limited to the above example. For example, the phrase distribution unit 24 may search a database prepared in advance or the Internet using the new phrase as a keyword to determine whether the new phrase is an original phrase, a paraphrased phrase, or some other type of phrase.

Then, the phrase distribution unit 24 determines whether the type of the new phrase is the first type (step S212). Specifically, the phrase distribution unit 24 determines whether the type assigned to the new phrase in step S211 (original phrase, paraphrased phrase, or other phrase) is the first type (original phrase) or the second type (paraphrased phrase or other phrase).

If the type of the new phrase is determined to be the first type (step S212: YES), the phrase distribution unit 24 determines whether the total registration count exceeds a first threshold (step S213). Specifically, the phrase distribution unit 24 determines whether the total registration count 25c incremented in step S210 exceeds the first threshold, which applies to original phrases. The first threshold is, for example, a design value chosen so that an original phrase whose total registration count exceeds it can be presumed to be a phrase that an unspecified large number of users may use. If the total registration count 25c is determined not to exceed the first threshold (step S213: NO), the process returns to step S201, and the phrase management unit 23 processes the next new phrase it receives.

If the type of the new phrase is determined to be the second type (step S212: NO), the phrase distribution unit 24 determines whether the total registration count exceeds a second threshold (step S214). Specifically, the phrase distribution unit 24 determines whether the total registration count 25c incremented in step S210 exceeds the second threshold, which applies to paraphrased phrases and other types of phrases. The second threshold is, for example, a design value chosen so that a paraphrased phrase or other type of phrase whose total registration count exceeds it can be presumed to be a phrase that an unspecified large number of users may use. If the total registration count 25c is determined not to exceed the second threshold (step S214: NO), the process returns to step S201, and the phrase management unit 23 processes the next new phrase it receives.

If the total registration count 25c is determined to exceed the first threshold (step S213: YES), or to exceed the second threshold (step S214: YES), the phrase distribution unit 24 distributes the new phrase (step S215). Step S215 is the same as step S206 of FIG. 6, so its description is omitted.
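
The branch of steps S212 to S215 can be sketched as follows. The threshold values and the names `FIRST_THRESHOLD`, `SECOND_THRESHOLD`, and `should_distribute` are assumptions for illustration; the specification only calls the thresholds design values. The second threshold is set higher than the first here on the assumption that paraphrased phrases are meant to be harder to distribute.

```python
FIRST_THRESHOLD = 50     # hypothetical design value for original phrases
SECOND_THRESHOLD = 200   # hypothetical design value for paraphrased / other phrases


def should_distribute(phrase_type: str, total_count: int) -> bool:
    """Decide whether a new phrase proceeds to distribution (step S215)."""
    if phrase_type == "original":
        # Step S212: YES, then step S213 compares against the first threshold.
        return total_count > FIRST_THRESHOLD
    # Step S212: NO, then step S214 compares against the second threshold.
    return total_count > SECOND_THRESHOLD
```

When the function returns `False`, the flow corresponds to returning to step S201 and waiting for the next new phrase.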

In the flow of FIG. 9, phrases are classified into two types, but they may be classified into three or more types, for example original phrases, paraphrased phrases, and other phrases. A threshold may also be prepared for each type and used to evaluate the total registration count.

The second embodiment of the present invention has been described above. According to this embodiment, making paraphrased phrases harder to register in the voice recognition dictionary than original phrases controls which phrases are added to the dictionary, which prevents wasted dictionary capacity and reduced voice recognition accuracy.

The present invention is not limited to the above-described embodiments, and various modifications are possible within the scope of the gist of the present invention. The embodiments and the modifications can also be combined as appropriate.

In one modification, in step S215 of FIG. 9, the phrase distribution unit 24 may also distribute the type of the new phrase (original phrase or paraphrased phrase) to each terminal device 10. In this case, in step S113 of FIG. 7, the dictionary management unit 17 of the terminal device 10 that received the new phrase sets the criterion for judging the new phrase to be similar to another phrase more strictly when the received new phrase is a paraphrased phrase than when it is an original phrase. For example, when phrases are judged to be similar if their similarity exceeds a predetermined threshold, the threshold used for paraphrased phrases may be set higher than the threshold used for original phrases. In this way, the terminal device 10 can determine the conditions for registration in the user dictionary 14 according to the type of the new phrase. When each terminal device 10 determines the conditions according to the type of the new phrase, the processing of steps S212 to S214 of FIG. 9 may be replaced with the processing of step S205 of FIG. 6.

In another modification, the timing at which the phrase transmission unit 18 transmits a new phrase to the voice recognition server 20 is not limited to that in the flowchart of FIG. 5 and may be a different timing. In yet another modification, the timing at which the phrase distribution unit 24 distributes a new phrase to the terminal device 10 is not limited to that in the flowcharts of FIGS. 6 and 9 and may be a different timing.

The configurations of the terminal device 10 and the voice recognition server 20 in FIG. 1 are divided according to their main processing content to make the configurations of these devices easier to understand. The present invention is not limited by how the components are divided or by their names. The configurations of the terminal device 10 and the voice recognition server 20 can be divided into a larger number of components according to the processing content, or divided so that one component performs more processing. The processing of each component may be executed by one piece of hardware or by multiple pieces of hardware. The allocation of processing or functions among the components is not limited to that described above as long as the object and effects of the present invention can be achieved. The data structures shown in FIGS. 3 and 8 are examples and are not limited to those illustrated as long as the object of the present invention can be achieved.

The processing units of the flowcharts shown in FIGS. 5 to 7 and FIG. 9 are divided according to their main processing content to make the processing of the terminal device 10 and the voice recognition server 20 easier to understand. The present invention is not limited by how the processing units are divided or by their names. The processing of the terminal device 10 and the voice recognition server 20 can be divided into a larger number of processing units according to the processing content, or divided so that one processing unit includes more processing. Furthermore, the processing order of the above flowcharts is not limited to the illustrated examples as long as the object and effects of the present invention can be achieved.

The above embodiments have been described in detail to explain the present invention in an easy-to-understand manner, and the invention is not necessarily limited to embodiments having all of the described configurations. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment or modification, and the configuration of another embodiment or modification can be added to the configuration of one embodiment. Other configurations can also be added to, deleted from, or substituted for part of the configuration of each embodiment.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized by a program with which a processor implements each function. Information such as programs, tables, and files that implement each function can be placed in a memory, a storage device such as a hard disk or an SSD (Solid State Drive), or a storage medium such as an IC card, an SD (Secure Digital) memory card, or a DVD. The control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines in a product are necessarily shown.

The present invention is not limited to a voice recognition system, a voice recognition server, and a terminal device, and can be provided in various forms such as a phrase management method and a computer-readable program.

1 ... voice recognition system, 10 ... terminal device, 11 ... voice transmission unit, 12 ... voice recognition unit, 13 ... voice recognition dictionary, 14 ... user dictionary, 15 ... recognition result acquisition unit, 16 ... interface control unit, 17 ... dictionary management unit, 18 ... phrase transmission unit, 19 ... communication control unit, 20 ... voice recognition server, 21 ... voice recognition unit, 22 ... voice recognition dictionary, 23 ... phrase management unit, 24 ... phrase distribution unit, 25 ... phrase list, 25a ... phrase, 25b ... attribute, 25c ... total registration count, 25d ... monthly registration counts, 26 ... communication control unit, 90 ... computer, 91 ... arithmetic device, 92 ... main storage device, 93 ... external storage device, 94 ... communication device, 95 ... input device, 96 ... output device, M ... microphone, N ... communication network

Claims

A voice recognition system including a terminal device that performs voice recognition on a user's voice data and a voice recognition server that communicates with the terminal device and performs voice recognition on the user's voice data, wherein
the voice recognition server includes:
a server-side communication control unit that communicates with the terminal device;
a server-side voice recognition unit that performs voice recognition on the user's voice data transmitted from the terminal device and transmits the recognition result to the terminal device;
a phrase management unit that registers, in a phrase list, information about phrases transmitted from the terminal device and information about phrases transmitted from another terminal device, both acquired using the server-side communication control unit; and
a phrase distribution unit that distributes information about a phrase registered in the phrase list to at least one of the terminal device and the other terminal device using the server-side communication control unit,
the terminal device includes:
a terminal-side communication control unit that communicates with the voice recognition server;
a voice recognition dictionary storage unit that stores a voice recognition dictionary for voice recognition;
a terminal-side voice recognition unit that performs voice recognition on the user's voice data using the voice recognition dictionary stored in the voice recognition dictionary storage unit and obtains the recognition result;
a voice transmission unit that transmits the user's voice data to the voice recognition server using the terminal-side communication control unit;
a recognition result acquisition unit that compares the recognition result from the terminal-side voice recognition unit with the recognition result from the voice recognition server acquired using the terminal-side communication control unit, and selects one of the recognition results;
a dictionary management unit that determines whether or not the phrase indicated by the selected recognition result exists in the voice recognition dictionary stored in the voice recognition dictionary storage unit and, if it does not exist, registers the phrase in the voice recognition dictionary; and
a phrase transmission unit that transmits information about the phrase registered in the voice recognition dictionary by the dictionary management unit to the voice recognition server using the terminal-side communication control unit, and
the dictionary management unit registers, in the voice recognition dictionary, phrases distributed from the voice recognition server and acquired using the terminal-side communication control unit.

The voice recognition system according to claim 1, wherein
the phrase management unit records the number of registrations of each phrase in the phrase list, and
the phrase distribution unit determines whether or not to distribute the phrase based on the number of registrations.

The voice recognition system according to claim 1, wherein
the phrase management unit records the number of registrations of each phrase in the phrase list, and
the phrase distribution unit determines the type of the phrase and determines whether or not to distribute the phrase based on the number of registrations of the phrase and a condition corresponding to the determined type.

The voice recognition system according to claim 3, wherein
the phrase management unit records, in the phrase list, the number of registrations of each phrase for each unit period, and
the phrase distribution unit determines the type of the phrase based on the time-series transition of the number of registrations of the phrase for each unit period.

The voice recognition system according to claim 3, wherein
the phrase distribution unit classifies the phrase as a formal phrase or a paraphrased phrase.

The voice recognition system according to claim 1, wherein
the phrase management unit records an attribute of each phrase in the phrase list in association with the phrase, and
the phrase distribution unit determines the terminal device to which the phrase is distributed based on the attribute of the phrase.

The voice recognition system according to claim 6, wherein
the attribute is position information related to the phrase, and
the phrase distribution unit identifies a user who has a predetermined relationship with the position indicated by the position information, and determines the terminal device of the identified user as the distribution destination.

The voice recognition system according to claim 6, wherein
the phrase is the name of a song,
the attribute is an identifier of an artist associated with the song, and
the phrase distribution unit identifies a user who has a predetermined relationship with the artist indicated by the identifier of the artist, and determines the terminal device of the identified user as the distribution destination.

The voice recognition system according to claim 1, wherein
the dictionary management unit calculates the degree of similarity between the distributed phrase and each phrase registered in the voice recognition dictionary, and determines whether or not to register the distributed phrase based on the degree of similarity.

A voice recognition server that communicates with a plurality of terminal devices, each of which performs voice recognition on its user's voice data, the voice recognition server including:
a server-side communication control unit that communicates with each of the terminal devices;
a server-side voice recognition unit that performs voice recognition on the user's voice data transmitted from each terminal device and transmits the recognition result to the terminal device that transmitted the voice data;
a phrase management unit that receives, using the server-side communication control unit, information about phrases registered in the voice recognition dictionary of each terminal device based on the recognition result, and registers the information about the phrases in a phrase list; and
a phrase distribution unit that distributes information about a phrase registered in the phrase list to one or more of the plurality of terminal devices using the server-side communication control unit.

A terminal device that communicates with a voice recognition server that performs voice recognition on a user's voice data, the terminal device including:
a terminal-side communication control unit that communicates with the voice recognition server;
a voice recognition dictionary storage unit that stores a voice recognition dictionary for voice recognition;
a terminal-side voice recognition unit that performs voice recognition on the user's voice data using the voice recognition dictionary stored in the voice recognition dictionary storage unit and obtains the recognition result;
a voice transmission unit that transmits the user's voice data to the voice recognition server using the terminal-side communication control unit;
a recognition result acquisition unit that compares the recognition result from the terminal-side voice recognition unit with the recognition result from the voice recognition server acquired using the terminal-side communication control unit, and selects one of the recognition results;
a dictionary management unit that determines whether or not the phrase indicated by the selected recognition result exists in the voice recognition dictionary stored in the voice recognition dictionary storage unit and, if it does not exist, registers the phrase in the voice recognition dictionary; and
a phrase transmission unit that transmits information about the phrase registered in the voice recognition dictionary by the dictionary management unit to the voice recognition server using the terminal-side communication control unit, wherein
the dictionary management unit registers, in the voice recognition dictionary, phrases distributed from the voice recognition server and acquired using the terminal-side communication control unit.

A phrase management method for a voice recognition system including a terminal device that performs voice recognition on a user's voice data and a voice recognition server that communicates with the terminal device and performs voice recognition on the user's voice data, the method including:
a step in which the terminal device performs voice recognition on the user's voice data using a voice recognition dictionary provided in the terminal device and obtains the recognition result;
a step in which the terminal device transmits the user's voice data to the voice recognition server;
a step in which the voice recognition server performs voice recognition on the user's voice data transmitted from the terminal device and transmits the recognition result to the terminal device;
a step in which the terminal device compares the recognition result from the terminal device with the recognition result from the voice recognition server and selects one of the recognition results;
a step in which the terminal device determines whether or not the phrase indicated by the selected recognition result exists in the voice recognition dictionary and, if it does not exist, registers the phrase in the voice recognition dictionary;
a step in which the terminal device transmits information about the phrase registered in the voice recognition dictionary to the voice recognition server;
a step in which the voice recognition server registers, in a phrase list, information about the phrase transmitted from the terminal device and information about a phrase transmitted from another terminal device;
a step in which the voice recognition server distributes information about a phrase registered in the phrase list to at least one of the terminal device and the other terminal device; and
a step in which the terminal device registers a phrase distributed from the voice recognition server in the voice recognition dictionary.