JP2016212331A

JP2016212331A - Articulation rating server device, articulation rating method, and program

Info

Publication number: JP2016212331A
Application number: JP2015097807A
Authority: JP
Inventors: 博章田川; Hiroaki Tagawa; 玲子山田; Reiko Yamada
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2015-05-13
Filing date: 2015-05-13
Publication date: 2016-12-15
Anticipated expiration: 2035-05-13
Also published as: JP6635427B2

Abstract

PROBLEM TO BE SOLVED: To address the problem that effective pronunciation training has been difficult because of inadequate processing speed.

SOLUTION: High-speed articulation rating can be accomplished with an articulation rating server device equipped with: a receiving unit that receives, from a terminal device, voice-related data, which is the result of the terminal device accepting a voice uttered by a user for a content to be uttered and performing a first partial process for articulation rating on that voice; a rating result acquiring unit that performs on the voice-related data a second partial process for articulation rating, to be carried out after the first partial process, and acquires a rating result; and a transmitting unit that transmits the rating result to the terminal device.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a pronunciation rating system that performs pronunciation rating processing in a distributed environment, to a pronunciation rating server device, and the like.

Conventionally, stand-alone pronunciation rating devices have existed (see, for example, Patent Document 1 and Patent Document 2).

Patent Document 1: Japanese Patent No. 4859125
Patent Document 2: Japanese Patent No. 5007401

However, in the prior art, the entire pronunciation rating program is installed on the pronunciation rating device, which is the user's terminal, and pronunciation training is performed there. It has therefore been difficult to protect the confidentiality of the algorithm that forms the core of the pronunciation rating.

Furthermore, in the prior art, processing is performed on a single computer, and effective pronunciation training has sometimes been difficult because of limited processing speed. When a pronunciation rating device is used for pronunciation training in language learning, a delay between the user's utterance and the output of the rating result dampens the user's motivation to learn.

The pronunciation rating server device of the first invention comprises: a receiving unit that receives voice-related data from a terminal device, the voice-related data being the result of the terminal device accepting a voice uttered by a user for an instructed utterance content and performing on that voice a first partial process, which is a first part of the processing for pronunciation rating; a rating result acquisition unit that performs on the voice-related data a second partial process, which is a second part of the processing for pronunciation rating to be performed after the first partial process, and acquires a rating result; and a transmission unit that transmits the rating result to the terminal device.

With this configuration, the risk of the pronunciation rating algorithm becoming known to outside parties can be reduced. High-speed pronunciation rating processing is also possible.
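The division of roles in the first invention can be pictured with a minimal sketch. The class and field names below (`VoiceRelatedData`, `PronunciationRatingServer`, `handle`) are hypothetical and do not come from the specification; the scoring function is injected so that the rating algorithm itself stays on the server side, which is the point of the configuration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VoiceRelatedData:
    """Result of the first partial process on the terminal (hypothetical shape)."""
    user_id: str
    utterance_content_id: str
    frames: List[List[float]]


class PronunciationRatingServer:
    """Toy model of the receiving, rating result acquisition, and transmission units."""

    def __init__(self, second_partial_process: Callable[[VoiceRelatedData], float]):
        # The rating algorithm lives only on the server and is never shipped to the terminal.
        self._second_partial_process = second_partial_process

    def receive(self, data: VoiceRelatedData) -> VoiceRelatedData:
        # Receiving unit: accepts voice-related data sent by the terminal device.
        return data

    def acquire_rating(self, data: VoiceRelatedData) -> float:
        # Rating result acquisition unit: runs the second partial process.
        return self._second_partial_process(data)

    def transmit(self, rating: float) -> Dict[str, float]:
        # Transmission unit: returns the rating result to the terminal device.
        return {"rating": rating}

    def handle(self, data: VoiceRelatedData) -> Dict[str, float]:
        return self.transmit(self.acquire_rating(self.receive(data)))
```

In this sketch the terminal only ever sees the returned dictionary, never the function passed to the constructor.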

The pronunciation rating server device of the second invention is the device of the first invention, further comprising: an accumulation unit that accumulates one or more pieces of pronunciation rating information, each associating the voice-related data, either utterance content data indicating the instructed utterance content or an utterance content identifier identifying the utterance content, and the rating result; and a processing unit that performs processing on the one or more pieces of pronunciation rating information.

With this configuration, the results of pronunciation rating can be used effectively.

The pronunciation rating server device of the third invention is the device of the second invention, in which the receiving unit also receives an utterance content identifier identifying the utterance content, or utterance content data acquired using such an identifier, and the accumulation unit accumulates the voice-related data, the utterance content data or utterance content identifier, and the rating result in association with one another.

With this configuration, the results of pronunciation rating can be used effectively with a small amount of communication.

The pronunciation rating server device of the fourth invention is the device of any one of the first to third inventions, further comprising a middleware unit. The middleware unit converts the received data, which is the data received by the receiving unit and includes the voice-related data, into data on which the rating result acquisition unit can perform the second partial process, and converts the rating result acquired by the rating result acquisition unit into data that the transmission unit can transmit. The rating result acquisition unit performs the second partial process on the voice-related data as converted by the middleware unit and acquires the rating result, and the transmission unit transmits the rating result as converted by the middleware unit to the terminal device.

With this configuration, it becomes easy to accommodate changes in the communication protocol and improvements to the pronunciation rating algorithm.

The pronunciation rating server device of the fifth invention is the device of the fourth invention, in which the receiving unit receives from the terminal device data produced by a terminal middleware unit of the terminal device. Specifically, the voice-related data obtained by performing the first partial process on the voice is converted into transmittable data, and the receiving unit receives that converted data, which includes the voice-related data.

The pronunciation rating server device of the sixth invention is the device of any one of the first to third inventions, in which the receiving unit receives voice-related data from two or more terminal devices, and the rating result acquisition unit performs the second partial process in parallel on each of the two or more pieces of voice-related data received by the receiving unit and acquires two or more rating results.

With this configuration, a pronunciation rating service can be provided to a plurality of terminal devices while reducing the risk of the pronunciation rating algorithm becoming known to outside parties. A high-speed pronunciation rating service can also be provided to a plurality of terminal devices.
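One way to picture the parallel handling of the sixth invention is a worker pool that runs the second partial process once per received piece of voice-related data. The sketch below is illustrative only: the specification does not say how parallelism is implemented, `second_partial_process` is a placeholder that simply averages its input, and the terminal identifiers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def second_partial_process(voice_related_data: List[float]) -> float:
    # Placeholder scoring (assumption): the real process compares against teacher data.
    return sum(voice_related_data) / len(voice_related_data)


def rate_in_parallel(received: Dict[str, List[float]], max_workers: int = 4) -> Dict[str, float]:
    """Run the second partial process concurrently, one task per terminal device."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            terminal_id: pool.submit(second_partial_process, data)
            for terminal_id, data in received.items()
        }
        # Collect one rating result per terminal device.
        return {terminal_id: future.result() for terminal_id, future in futures.items()}
```

For CPU-bound scoring in Python, a process pool would likely be a better fit than threads; threads are used here only to keep the sketch short.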

The pronunciation rating server device of the seventh invention is the device of the sixth invention, in which the receiving unit receives, from the two or more terminal devices, different types of voice-related data resulting from different first partial processes performed by the respective terminal devices, and the rating result acquisition unit performs a different second partial process on each type of voice-related data and acquires the rating results.

With this configuration, a pronunciation rating service can be provided to a plurality of terminal devices while accommodating terminal devices with different specifications.

The pronunciation rating server device of the eighth invention is the device of the sixth or seventh invention, further comprising a storage unit that stores two or more pieces of teacher data, each being an acoustic model for each of one or more phonemes. The rating result acquisition unit performs the second partial process on the voice-related data using different teacher data depending on a user attribute value, which is an attribute value of the user, and acquires the rating result.

With this configuration, pronunciation rating appropriate to the user attribute value can be performed.
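Selecting teacher data by user attribute value amounts to a keyed lookup. The following sketch assumes, purely for illustration, that teacher data is indexed by an (age group, sex) pair and that a default model is used when no entry matches; the key layout and model names are hypothetical.

```python
from typing import Dict, Tuple

TeacherKey = Tuple[str, str]  # (age_group, sex): an assumed key layout


def select_teacher_data(
    user_attributes: Dict[str, str],
    teacher_data: Dict[TeacherKey, str],
    default: str,
) -> str:
    """Return the teacher data (acoustic model) matching the user's attribute values."""
    key = (user_attributes.get("age_group", ""), user_attributes.get("sex", ""))
    # Fall back to a default model when no attribute-specific model is stored.
    return teacher_data.get(key, default)
```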

The pronunciation rating server device of the ninth invention is the device of any one of the sixth to eighth inventions and is one of two or more pronunciation rating server devices, in which the rating result acquisition unit performs the second partial process on voice-related data transmitted from some of the two or more terminal devices and acquires the rating results.

With this configuration, the load on the pronunciation rating servers can be distributed.

The pronunciation rating server device of the tenth invention is the device of the ninth invention, further comprising a middleware unit that includes adjusting means. To balance the load across the two or more pronunciation rating server devices, the adjusting means determines, for each of the two or more pieces of voice-related data received by the receiving unit, which pronunciation rating server device performs the second partial process.

The pronunciation rating server device of the eleventh invention is the device of the tenth invention, in which the middleware unit further includes sending means for sending the voice-related data to the pronunciation rating server device determined by the adjusting means, and accepting means for accepting the rating result acquired by that pronunciation rating server device. The transmission unit transmits the rating result accepted by the accepting means to the terminal device.

With this configuration, the load on the pronunciation rating servers can be distributed, and it becomes easy to accommodate changes in the communication protocol and improvements to the pronunciation rating algorithm.
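The adjusting means decides which server handles each piece of voice-related data. The passage above does not specify the policy, so the sketch below assumes a simple least-loaded rule with a hypothetical load counter per server; the server names are illustrative.

```python
from typing import Dict, List


def assign_servers(data_ids: List[str], current_load: Dict[str, int]) -> Dict[str, str]:
    """Assign each piece of voice-related data to the currently least-loaded server."""
    load = dict(current_load)  # work on a copy so the caller's view is untouched
    assignment: Dict[str, str] = {}
    for data_id in data_ids:
        # Least-loaded policy (assumption); ties go to the first server in dict order.
        server = min(load, key=load.get)
        assignment[data_id] = server
        load[server] += 1
    return assignment
```

The sending means would then forward each item to its assigned server, and the accepting means would collect the rating result for return to the terminal device.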

The pronunciation rating server device of the twelfth invention is the device of any one of the first to eleventh inventions, in which the first partial process is either (a) a process of digitizing the voice to obtain voice data, or (b) a process of digitizing the voice to obtain voice data and obtaining from the voice data feature data, which is a set of feature vectors. The second partial process is either (a) a process of obtaining feature data from the voice data, calculating from the feature data and teacher data the posterior probability that the feature data corresponds to the phoneme being rated, calculating a rating value for the voice from the posterior probability, and obtaining a rating result from one or more rating values, or (b) a process of calculating from the feature data and teacher data the posterior probability that the feature data corresponds to the phoneme being rated, calculating a rating value for the voice from the posterior probability, and obtaining a rating result from one or more rating values.

According to the pronunciation rating server device of the present invention, the risk of the pronunciation rating algorithm becoming known to outside parties can be reduced. Alternatively, the pronunciation rating server device of the present invention can perform high-speed pronunciation rating processing.

Conceptual diagram of the pronunciation rating system A in Embodiment 1
Block diagram of the pronunciation rating system A
Flowchart explaining the operation of the pronunciation rating server device 1
Flowchart explaining the operation of the terminal device 4
Block diagram of the pronunciation rating system A
Conceptual diagram of the pronunciation rating system B in Embodiment 2
Block diagram of the pronunciation rating system B
Flowchart explaining the operation of the pronunciation rating server device 2
Flowchart explaining the details of the second partial process
Diagram showing the user information management table
Diagram showing the teacher data management table
Diagram showing a list of pronunciation exercises
Block diagram of the pronunciation rating system B
Conceptual diagram of the pronunciation rating system C in Embodiment 3
Block diagram of the pronunciation rating system C
Flowchart explaining the operation of the pronunciation rating server device 3A
Flowchart explaining the operation of the pronunciation rating server device 3B
Diagram showing the load information management table
Overview of the computer system in the above embodiments
Block diagram of the computer system

Embodiments of the pronunciation rating server device and the like are described below with reference to the drawings. Components given the same reference numerals in the embodiments operate in the same way, so repeated descriptions may be omitted.

(Embodiment 1)
This embodiment describes a pronunciation rating system including one terminal device and one pronunciation rating server device.

This embodiment also describes a pronunciation rating system in which the terminal device performs the first partial process, which is one part of the overall pronunciation rating processing, and the pronunciation rating server device performs the second partial process.

This embodiment also describes a pronunciation rating system including a pronunciation rating server device that accumulates voice-related data, utterance content data, and rating results in association with one another and makes use of them. The voice-related data, utterance content data, and rating results are described in detail later.

This embodiment also describes a pronunciation rating system including a terminal device that holds an utterance content identifier identifying the target of pronunciation rating, acquires utterance content data using that identifier as a key, and outputs the utterance content data.

This embodiment also describes a pronunciation rating system including a pronunciation rating server device that has middleware performing processing corresponding to the communication protocol and the like.

Finally, this embodiment describes a pronunciation rating system including a terminal device that has middleware performing processing corresponding to the communication protocol and the like.

FIG. 1 is a conceptual diagram of the pronunciation rating system A in this embodiment. The pronunciation rating system A includes a pronunciation rating server device 1 and a terminal device 4. The pronunciation rating server device 1 is usually a server device and is to be interpreted broadly, including so-called cloud servers. The pronunciation rating server device 1 may be any type of device as long as it has substantial processing capability. The terminal device 4 may be of any type, for example a so-called smartphone, tablet terminal, mobile phone, or notebook computer. The pronunciation rating server device 1 and the terminal device 4 can communicate with each other and are connected, for example, via the Internet. The processing capability of the pronunciation rating server device 1 is usually greater than that of the terminal device 4.

FIG. 2 is a block diagram of the pronunciation rating system A in this embodiment.

The pronunciation rating server device 1 includes a storage unit 11, a receiving unit 12, a middleware unit 13, a rating result acquisition unit 14, a transmission unit 15, an accumulation unit 16, and a processing unit 17.

The terminal device 4 includes a terminal storage unit 41, a terminal accepting unit 42, a terminal first partial processing unit 43, a terminal middleware unit 44, a terminal transmission/reception unit 45, and a terminal output unit 46.

The storage unit 11 of the pronunciation rating server device 1 can store various kinds of information. The various kinds of information are, for example, one or more pieces of pronunciation rating information. The one or more pieces of pronunciation rating information may be associated with a user identifier. Pronunciation rating information is information that associates voice-related data, utterance content data or an utterance content identifier, and a rating result.
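As an illustration of how pronunciation rating information might be held per user, the sketch below defines a record with the three associated items and a small store keyed by user identifier. All names are hypothetical, and the averaging method merely stands in for whatever processing the processing unit applies to accumulated records.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional


@dataclass
class PronunciationRatingInfo:
    """One record: voice-related data, utterance content identifier, rating result."""
    voice_related_data: List[List[float]]
    utterance_content_id: str
    rating_result: float


class RatingInfoStore:
    def __init__(self) -> None:
        # Records are grouped by user identifier (an optional association in the text).
        self._by_user: DefaultDict[str, List[PronunciationRatingInfo]] = defaultdict(list)

    def accumulate(self, user_id: str, info: PronunciationRatingInfo) -> None:
        self._by_user[user_id].append(info)

    def average_rating(self, user_id: str) -> Optional[float]:
        # Example of processing over accumulated records (assumed, not from the source).
        records = self._by_user.get(user_id, [])
        if not records:
            return None
        return sum(r.rating_result for r in records) / len(records)
```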

The voice-related data is the result of the first partial process in the terminal device 4. The voice-related data is, for example, voice data. It may also be, for example, one or more pieces of frame voice data, obtained by dividing the voice into frames, each piece being the voice data of one frame. It may also be, for example, feature data, which is a set of feature vectors obtained from the voice data. The voice data is data of the voice uttered by the user of the terminal device 4 and accepted by the terminal device, and is usually data obtained by digitizing the accepted voice. One or more pieces of frame voice data may also be regarded as voice data. In other words, the voice-related data is the result of accepting a voice uttered by the user for the instructed utterance content and performing on that voice the first partial process, which is a first part of the processing for pronunciation rating.
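Dividing digitized voice into frames can be sketched as a sliding window over the sample sequence. The frame length and hop size below are hypothetical parameters; the passage above does not state the values used.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut a sample sequence into overlapping frames of equal length.

    Trailing samples that do not fill a whole frame are dropped (a simplifying assumption).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        list(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```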

The feature data is a set of feature vectors "O = o_1, o_2, ..., o_T" obtained frame by frame from one or more pieces of frame voice data. Each feature vector (o_t) in the feature data is obtained by spectral analysis of the frame voice data. The feature vector is, for example, an MFCC vector obtained by applying a discrete cosine transform to the output of a 24-channel filter bank using triangular filters. Its static, delta, and delta-delta parameters have 12 dimensions each, and it further includes normalized power, delta power, and delta-delta power, for 39 dimensions in total. It is preferable to apply cepstral mean removal in the spectral analysis. Needless to say, other voice analysis conditions may be used. The process of obtaining the feature data can be realized by known techniques.
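The 39-dimensional layout can be checked with a short sketch: 12 static + 12 delta + 12 delta-delta coefficients, plus power, delta power, and delta-delta power. The code assumes the 12 static coefficients and the normalized power per frame are already available, and it uses a simple central difference for the delta terms; the source does not give the delta formula, so that choice is an assumption.

```python
from typing import List

Vector = List[float]


def cepstral_mean_removal(frames: List[Vector]) -> List[Vector]:
    """Subtract the per-dimension mean over all frames."""
    n = len(frames)
    means = [sum(column) / n for column in zip(*frames)]
    return [[x - m for x, m in zip(frame, means)] for frame in frames]


def delta(frames: List[Vector]) -> List[Vector]:
    """Central difference over time, clamping at the edges (assumed formula)."""
    last = len(frames) - 1
    return [
        [(nxt - prv) / 2.0
         for nxt, prv in zip(frames[min(t + 1, last)], frames[max(t - 1, 0)])]
        for t in range(len(frames))
    ]


def assemble_features(static: List[Vector], power: List[float]) -> List[Vector]:
    """Build [static | delta | delta-delta | power | delta power | delta-delta power]."""
    pw = [[p] for p in power]  # power is assumed to be normalized already
    d_static, d_pw = delta(static), delta(pw)
    dd_static, dd_pw = delta(d_static), delta(d_pw)
    return [
        s + d + dd + p + dp + ddp
        for s, d, dd, p, dp, ddp in zip(static, d_static, dd_static, pw, d_pw, dd_pw)
    ]
```

With 12 static coefficients per frame, each assembled vector has 12 × 3 + 3 = 39 elements, matching the dimension count in the text.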

The first partial process and the second partial process are described in detail later. In general, the first partial process and the second partial process together complete the pronunciation rating processing. The load of executing the first partial process is usually smaller than that of executing the second partial process. The first partial process may also be called pronunciation rating client processing or pronunciation rating front-end processing, and the second partial process may be called pronunciation rating server processing or pronunciation rating back-end processing.

The utterance content data is data indicating the utterance content that is the target of pronunciation rating. The utterance content data is, for example, data acquired using an utterance content identifier that identifies the utterance content. The utterance content data is information on the correct word or sentence that the user is to pronounce, for example information on the phonemes making up that word or sentence. The utterance content data may also be the correct word, word sequence, or the like that the user is to pronounce.

The utterance content identifier is information that identifies utterance content data. It can also be described as information that identifies the utterance content. The utterance content identifier is, for example, an ID.

The rating result is information indicating the result of rating the pronunciation of the utterance content uttered by the user. The rating result may be a rating value itself, or a value derived from the rating value that indicates the user's pronunciation level.

The user identifier is information that identifies the user or the terminal device 4, for example an ID, a telephone number, or a name; its type does not matter.

The various kinds of information also include, for example, an utterance content database holding one or more pairs of an utterance content identifier and utterance content data.
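An utterance content database of identifier and data pairs can be pictured as a keyed mapping. The identifiers, words, and phoneme symbols below are made-up examples, not entries from the specification.

```python
from typing import Dict, List, TypedDict


class UtteranceContent(TypedDict):
    text: str            # the correct word or sentence to pronounce
    phonemes: List[str]  # phonemes making up that word or sentence


# Hypothetical database contents: utterance content identifier -> utterance content data.
UTTERANCE_CONTENT_DB: Dict[str, UtteranceContent] = {
    "u001": {"text": "right", "phonemes": ["r", "ay", "t"]},
    "u002": {"text": "light", "phonemes": ["l", "ay", "t"]},
}


def get_utterance_content(utterance_content_id: str) -> UtteranceContent:
    """Look up utterance content data using the identifier as a key."""
    return UTTERANCE_CONTENT_DB[utterance_content_id]
```

Sending only the short identifier and resolving it against such a database on the receiving side is one way the identifier can stand in for the full utterance content data.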

The various kinds of information also include, for example, one or more pieces of user information. User information is information about a user. It has a user identifier and one or more user attribute values. A user attribute value is, for example, an age, an age group, or a sex.

The various kinds of information also include, for example, one or more pieces of teacher data. Teacher data is an acoustic model used for pronunciation rating. The acoustic model is usually an acoustic model for each of one or more phonemes. The acoustic model is preferably data based on a hidden Markov model (HMM). The acoustic model is, for example, one in which the terminal state of each phoneme HMM is connected to the start state of that phoneme or of every other phoneme. However, the acoustic model need not be connected in this way.
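The connection pattern described above, in which the terminal state of each phoneme HMM leads to the start state of the same or any other phoneme, can be sketched as a list of directed edges. The sketch captures topology only: it assumes a left-to-right chain of three states per phoneme and omits state self-loops, transition probabilities, and output distributions, none of which are specified in this passage.

```python
from typing import List, Tuple

State = Tuple[str, int]  # (phoneme, state index within that phoneme's HMM)


def build_phoneme_loop(phonemes: List[str], states_per_phoneme: int = 3) -> List[Tuple[State, State]]:
    """Return directed edges of a connected phoneme-HMM network (topology only)."""
    edges: List[Tuple[State, State]] = []
    for ph in phonemes:
        # Left-to-right chain inside one phoneme HMM (assumed structure).
        for s in range(states_per_phoneme - 1):
            edges.append(((ph, s), (ph, s + 1)))
    terminal = states_per_phoneme - 1
    for src in phonemes:
        for dst in phonemes:
            # Terminal state of src connects to the start state of every phoneme, itself included.
            edges.append(((src, terminal), (dst, 0)))
    return edges
```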

The teacher data need not be data based on an HMM formed by connecting the HMMs of the individual phonemes. The teacher data may simply be the set of HMMs for all phonemes. Nor does the teacher data need to be based on HMMs at all. It may be data based on other models, such as a single Gaussian distribution model, a probabilistic model (GMM: Gaussian mixture model), or a statistical model. An acoustic model is, for example, a pair of an ID identifying a sound and a feature vector characterizing that sound.

The receiving unit 12 receives voice-related data from the terminal device 4. The receiving unit 12 may receive the voice-related data together with a user identifier from the terminal device 4.

The receiving unit 12 preferably also receives an utterance content identifier or utterance content data.

The receiving unit 12 preferably receives from the terminal device 4 data produced by the terminal middleware unit 44 of the terminal device 4, that is, data that includes the voice-related data and is obtained by converting the voice-related data resulting from the first partial process into transmittable data. The data received by the receiving unit 12 may include a user identifier and an utterance content identifier or utterance content data.

The receiving unit 12 may receive an instruction for processing on the pronunciation rating information. Such an instruction is an instruction for the processing that the processing unit 17, described later, performs on one or more pieces of pronunciation rating information. The instruction may include a user identifier.

The middleware unit 13 converts the received data, which is the data received by the receiving unit 12 and includes the voice-related data, into data on which the rating result acquisition unit 14 can perform the second partial process. This processing is referred to as the first middleware processing. The result of the first middleware processing includes the voice-related data.

The data conversion in the first middleware processing is one or more of the following: decrypting encrypted data, decompressing compressed data, converting the structure of the data received by the receiving unit 12, and removing from that data any information unrelated to pronunciation rating. It usually includes processing corresponding to the communication protocol used for communication with the terminal device 4. The communication protocol is, for example, HTTP or TCP/IP. The middleware unit 13 usually does not perform processing related to the pronunciation rating itself.

また、ミドルウェア部１３は、評定結果取得部１４が取得した評定結果を、送信部１５が送信できるデータに変更する。なお、かかる処理を、ミドルウェアの第二処理という。 Further, the middleware unit 13 changes the evaluation result acquired by the evaluation result acquisition unit 14 to data that can be transmitted by the transmission unit 15. Such processing is referred to as second middleware processing.

なお、ミドルウェアの第二処理におけるデータの変更とは、評定結果の暗号化、評定結果の圧縮処理、評定結果の構造変換などのうちの１以上の処理であり、端末装置４との通信に利用される通信プロトコルに対応する処理である。 The data conversion in the second middleware processing is one or more of encryption of the rating result, compression of the rating result, structural conversion of the rating result, and the like, and is processing corresponding to the communication protocol used for communication with the terminal device 4.
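The first and second middleware processing described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the payload is zlib-compressed JSON and that the field names (user_id, utterance_id, features, client_version) exist; none of these are specified in this description, and encryption is omitted.

```python
import json
import zlib


def middleware_first_processing(received: bytes) -> dict:
    """Turn bytes received from the terminal into data the rating step can use."""
    # Decompress the payload (decryption would precede this step if it were used).
    raw = zlib.decompress(received)
    # Structure conversion: JSON text to a Python dictionary.
    message = json.loads(raw.decode("utf-8"))
    # Drop fields that are unrelated to pronunciation rating.
    wanted = ("user_id", "utterance_id", "features")
    return {key: message[key] for key in wanted if key in message}


def middleware_second_processing(rating_result: dict) -> bytes:
    """Turn a rating result into bytes that the transmission unit can send."""
    # Structure conversion followed by compression.
    return zlib.compress(json.dumps(rating_result).encode("utf-8"))


# Simulate what a terminal might send (hypothetical message layout).
payload = zlib.compress(json.dumps({
    "user_id": "u001",
    "utterance_id": "q01",
    "features": [[0.1, 0.2], [0.3, 0.4]],
    "client_version": "1.0",  # unrelated to rating, removed by the first processing
}).encode("utf-8"))

request = middleware_first_processing(payload)
response = middleware_second_processing({"utterance_id": "q01", "score": 82})
```

Keeping these conversions in a separate layer means the rating step only ever sees plain feature data, which matches the statement that the middleware unit 13 does not perform the rating itself.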

なお、ミドルウェア部１３と後述する評定結果取得部１４とが同じコンピュータ（発音評定サーバ装置１）上で動作する場合は、より高速に通信を行うために、無関係なネットワークパケットの影響のないローカルループバックインターフェースを通して通信を行うことは好適である。 When the middleware unit 13 and the rating result acquisition unit 14, described later, run on the same computer (the pronunciation rating server device 1), it is preferable that they communicate through the local loopback interface, which is unaffected by unrelated network packets, so that communication is faster.

評定結果取得部１４は、第二部分処理を音声関連データに対して行い、評定結果を取得する。第二部分処理とは、第一部分処理の後に行うべき発音評定のための第二の部分処理である。 The rating result acquisition unit 14 performs the second partial process on the voice-related data and acquires the rating result. The second partial process is a second partial process for pronunciation evaluation to be performed after the first partial process.

また、評定結果取得部１４は、ミドルウェア部１３が変更した後の音声関連データに対して第二部分処理を行い、評定結果を取得しても良い。 The rating result acquisition unit 14 may also perform the second partial processing on the voice-related data as converted by the middleware unit 13 and acquire the rating result.

ここで、第二部分処理とは、通常、音声関連データ、発話内容データを元に、教師データと照合し、発音評定を行い、評定値を算出する処理である。また、第二部分処理とは、例えば、音声関連データ、発話内容データを元に、教師データと照合し、発音評定を行い、評定値を算出し、当該算出した１または２以上の評定値を用いて評定結果を取得する処理である。 Here, the second partial processing is usually processing that collates the voice-related data and the utterance content data against the teacher data, performs pronunciation rating, and calculates a rating value. The second partial processing is also, for example, processing that collates the voice-related data and the utterance content data against the teacher data, performs pronunciation rating, calculates a rating value, and acquires a rating result using the one or more calculated rating values.

また、第二部分処理とは、例えば、音声データから特徴データを取得し、特徴データと教師データとを用いて、特徴データが評定対象の音素である事後確率を算出し、事後確率から音声の評定値を算出し、１以上の評定値から評定結果を取得する処理である。 Also, the second partial processing is, for example, acquiring feature data from voice data, calculating the posterior probability that the feature data is the phoneme to be evaluated using the feature data and the teacher data, This is a process of calculating a rating value and acquiring a rating result from one or more rating values.

また、第二部分処理とは、例えば、特徴データと教師データとを用いて、特徴データが評定対象の音素である事後確率を算出し、事後確率から音声の評定値を算出し、１以上の評定値から評定結果を取得する処理である。 Further, the second partial processing is, for example, using feature data and teacher data, calculating a posterior probability that the feature data is a phoneme to be evaluated, calculating a speech rating value from the posterior probability, and calculating one or more This is a process for obtaining a rating result from a rating value.

第二部分処理は、例えば、特徴ベクトル部分系列取得処理、評定値算出処理を含む。特徴ベクトル部分系列取得処理とは、音声関連データである特徴データ（特徴ベクトル系列と言っても良い）から、評定対象の発話内容データである１以上の音素の並びであり、同一の音素が連続する１以上の最適音素系列の集合である最適音素部分系列に対応する１以上の特徴ベクトルの組である特徴ベクトル部分系列を１組以上取得する処理である。評定値算出処理とは、格納部１１から教師データを読み出し、当該教師データを用いて、取得した特徴ベクトル部分系列が評定対象の音素である事後確率を算出し、当該事後確率から音声の評定値を算出する処理である。 The second partial processing includes, for example, feature vector partial sequence acquisition processing and rating value calculation processing. The feature vector partial sequence acquisition processing acquires one or more feature vector partial sequences from the feature data (which may also be called a feature vector sequence) that is the voice-related data. Each feature vector partial sequence is a set of one or more feature vectors corresponding to an optimal phoneme partial sequence, which is taken from the sequence of one or more phonemes that is the utterance content data being rated and is a set of one or more optimal phoneme sequence elements in which the same phoneme continues. The rating value calculation processing reads the teacher data from the storage unit 11, uses the teacher data to calculate the posterior probability that an acquired feature vector partial sequence is the phoneme being rated, and calculates the rating value of the voice from that posterior probability.
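As a rough illustration of the rating value calculation processing, the following Python sketch computes a posterior probability for a target phoneme and turns it into a rating value. It is not the algorithm of Patent Documents 1 and 2; it assumes, hypothetically, that the teacher data is one diagonal-covariance Gaussian per phoneme, that priors are equal, and that the rating value is the mean posterior scaled to the range 0 to 100.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posterior(x, target, teacher):
    """Posterior probability that frame x is the target phoneme (equal priors assumed)."""
    logs = {p: log_gaussian(x, m, v) for p, (m, v) in teacher.items()}
    peak = max(logs.values())  # subtract the maximum for numerical stability
    total = sum(math.exp(value - peak) for value in logs.values())
    return math.exp(logs[target] - peak) / total


def rate_segment(frames, target, teacher):
    """Rating value in [0, 100]: mean posterior of the target phoneme over the frames."""
    return 100.0 * sum(posterior(f, target, teacher) for f in frames) / len(frames)


# Hypothetical teacher data: phoneme -> (mean vector, variance vector).
teacher = {
    "r": ([1.0, 0.0], [0.5, 0.5]),
    "l": ([-1.0, 0.0], [0.5, 0.5]),
}

# Two feature vector partial sequences, both aligned to the phoneme "r".
good = [[0.9, 0.1], [1.1, -0.1]]
poor = [[-0.8, 0.0], [-1.2, 0.2]]

score_good = rate_segment(good, "r", teacher)
score_poor = rate_segment(poor, "r", teacher)
```

In this toy setup, frames that lie near the "r" model receive a high rating and frames that lie near the "l" model receive a low one, which is the behavior the posterior-based rating is meant to capture.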

また、第二部分処理は、例えば、フレーム音声データ取得処理、特徴データ取得処理、特徴ベクトル部分系列取得処理、評定値算出処理を含む。フレーム音声データ取得処理とは、音声関連データである音声データをフレームに区分し、区分されたフレーム毎の音声データである1以上のフレーム音声データを得る処理である。また、特徴データ取得処理とは、1以上のフレーム音声データから、フレーム毎の特徴ベクトルの集合である特徴データを取得する処理である。 The second partial process includes, for example, a frame audio data acquisition process, a feature data acquisition process, a feature vector partial series acquisition process, and a rating value calculation process. The frame audio data acquisition process is a process of dividing audio data that is audio-related data into frames and obtaining one or more frame audio data that are audio data for each of the divided frames. The feature data acquisition process is a process of acquiring feature data that is a set of feature vectors for each frame from one or more frame audio data.
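The frame audio data acquisition processing and the feature data acquisition processing can be sketched as follows. The frame length, hop length, and the two toy features (log energy and zero-crossing count) are assumptions chosen for illustration; this description does not fix the framing parameters or the feature type.

```python
import math


def split_into_frames(samples, frame_length, hop_length):
    """Divide audio samples into fixed-length frames, overlapping when hop < length."""
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append(samples[start:start + frame_length])
    return frames


def frame_features(frame):
    """Toy feature vector for one frame: log energy and zero-crossing count."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-10), float(crossings)]


# 100 samples of a synthetic sine wave standing in for digitized speech.
samples = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
frames = split_into_frames(samples, frame_length=20, hop_length=10)
features = [frame_features(f) for f in frames]
```

Whether this step runs on the terminal (as part of the first partial processing) or on the server (as part of the second partial processing) is a deployment choice that the description leaves open.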

なお、評定結果を取得するアルゴリズムは問わない。評定結果を取得するアルゴリズムの例は、上述した特許文献１、特許文献２等に記載されている。 The algorithm used to acquire the rating result is not limited. Examples of such algorithms are described in Patent Document 1, Patent Document 2, and the like cited above.

送信部１５は、評定結果取得部１４が取得した評定結果を端末装置４に送信する。 The transmission unit 15 transmits the evaluation result acquired by the evaluation result acquisition unit 14 to the terminal device 4.

また、送信部１５は、ミドルウェア部１３がミドルウェアの第二処理を行った後の情報を端末装置４に送信しても良い。かかる情報は、評定結果を含む。 The transmission unit 15 may transmit information after the middleware unit 13 performs the second middleware process to the terminal device 4. Such information includes rating results.

また、送信部１５は、処理部１７が１または２以上の発音評定情報に対する処理を行った結果である処理結果を端末装置４に送信しても良い。 In addition, the transmission unit 15 may transmit a processing result, which is a result of the processing unit 17 performing processing on one or more pronunciation rating information, to the terminal device 4.

蓄積部１６は、音声関連データ、発話内容データまたは発話内容識別子、および評定結果を対応付けた１以上の発音評定情報を蓄積する。発音評定情報は、例えば、音声関連データ、発話内容データまたは発話内容識別子、および評定結果を有する。蓄積部１６は、通常、１以上の発音評定情報を格納部１１に蓄積する。なお、発話内容データは、発音の練習等を行うユーザに示された発話内容を示すデータである。また、蓄積部１６は、図示しない外部の装置の記録媒体等に、１以上の発音評定情報を蓄積しても良い。 The accumulating unit 16 accumulates one or more pronunciation rating information in which voice related data, utterance content data or utterance content identifier, and a rating result are associated with each other. The pronunciation rating information includes, for example, voice related data, utterance content data or utterance content identifier, and a rating result. The storage unit 16 normally stores one or more pronunciation rating information in the storage unit 11. Note that the utterance content data is data indicating the utterance content presented to the user who is practicing pronunciation. Further, the storage unit 16 may store one or more pronunciation rating information in a recording medium of an external device (not shown).

また、蓄積部１６は、１以上の発音評定情報をユーザ識別子に対応付けて蓄積しても良い。また、蓄積部１６は、１以上の発音評定情報をユーザ属性値別に蓄積しても良い。ユーザ属性値とは、例えば、年齢、年齢層、性別、音質を識別する情報等である。ユーザ属性値は、例えば、ユーザ識別子に対応付けて、格納部１１に蓄積されている。 The storage unit 16 may store one or more pronunciation rating information in association with the user identifier. Further, the storage unit 16 may store one or more pronunciation rating information for each user attribute value. The user attribute value is, for example, information for identifying age, age group, sex, and sound quality. The user attribute value is accumulated in the storage unit 11 in association with the user identifier, for example.
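One possible shape for the pronunciation rating information and its accumulation per user identifier is sketched below in Python. The class names, field names, and the in-memory dictionary are hypothetical stand-ins for the accumulating unit 16 and the storage unit 11.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PronunciationRatingInfo:
    """One stored record: voice-related data, what was spoken, and its rating."""
    voice_related_data: bytes
    utterance_id: str
    rating_result: float


class Accumulator:
    """Keeps rating records grouped by user identifier (in-memory stand-in)."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def accumulate(self, user_id, info):
        self._by_user[user_id].append(info)

    def records_for(self, user_id):
        # Return a copy so that callers cannot modify the stored list.
        return list(self._by_user.get(user_id, []))


store = Accumulator()
store.accumulate("u001", PronunciationRatingInfo(b"\x00\x01", "q01", 82.0))
store.accumulate("u001", PronunciationRatingInfo(b"\x02\x03", "q02", 64.5))
store.accumulate("u002", PronunciationRatingInfo(b"\x04\x05", "q01", 91.0))
```

Grouping by user identifier makes it straightforward to later select only the records named in a processing instruction that carries a user identifier.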

ここで、発話内容データまた発話内容識別子とは、どのように取得しても良い。発話内容データまた発話内容識別子は、受信部１２が端末装置４から受信しても良い。また、発話内容データまた発話内容識別子は、格納部１１に予め格納されていても良い。 Here, the utterance content data or the utterance content identifier may be acquired in any way. The reception unit 12 may receive the utterance content data or the utterance content identifier from the terminal device 4. Moreover, the utterance content data or the utterance content identifier may be stored in the storage unit 11 in advance.

処理部１７は、１以上の発音評定情報に対して処理を行う。ここでの処理とは、例えば、表示、外部の装置（端末装置４など）への送信、統計処理などである。 The processing unit 17 processes one or more pronunciation rating information. The processing here is, for example, display, transmission to an external device (terminal device 4 or the like), statistical processing, and the like.

処理部１７は、受信部１２が発音評定情報に対する処理の指示を受信した場合に、１以上の発音評定情報に対して処理を行っても良い。処理部１７が処理を行うトリガーやタイミングは問わない。 The processing unit 17 may perform processing on one or more pronunciation rating information when the receiving unit 12 receives an instruction for processing on the pronunciation rating information. The trigger and timing at which the processing unit 17 performs processing does not matter.

処理部１７は、受信部１２が発音評定情報に対する処理の指示に含まれるユーザ識別子に対応する１以上の発音評定情報に対して処理を行っても良い。 The processing unit 17 may process the one or more pieces of pronunciation rating information corresponding to the user identifier included in the processing instruction for pronunciation rating information received by the receiving unit 12.

統計処理とは、例えば、２以上の評定結果の平均、分散、偏差値等を算出し、出力する処理である。統計処理とは、例えば、２以上の評定結果の平均、分散、偏差値などを、ユーザの年齢別に算出することや、音声データの声質の違いや背景雑音による評定結果の偏りなどを分析すること等である。なお、統計処理とは、２以上の評定結果を用いて、何らかの加工された情報を取得すれば良い。 The statistical process is, for example, a process of calculating and outputting an average, variance, deviation value, etc. of two or more evaluation results. Statistical processing means, for example, calculating the average, variance, deviation value, etc. of two or more evaluation results according to the user's age, and analyzing the difference in voice quality of voice data and the bias of evaluation results due to background noise. Etc. In the statistical process, some processed information may be acquired using two or more evaluation results.

また、統計処理とは、例えば、ユーザの学習進捗度や、発音の成功や失敗の傾向、設問の難易度や学習効果などを、統計的に分析することである。例えば、設問に対応する発音評定結果の合格点を格納部１１に保持しており、処理部１７は、ユーザ毎の設問に対する合否の結果を取得し、ユーザ毎の学習進捗度を取得する。学習進捗度とは、例えば、合格した設問の割合、合格した設問の数、合格した設問に対応するレベル値などである。また、例えば、各設問の特性を格納部１１に保持しており、処理部１７は、一のユーザの評定値が閾値以上（発音の成功）の設問の特性（例えば、「ｔｈ」の発音、「ｌ」または「ｒ」の発音等）を取得したり、一のユーザの評定値が閾値未満（発音の失敗）の設問の特性を取得したりして、発音の成功や失敗の傾向を分析する。また、例えば、設問の難易度と評定値との関係を示す情報を格納部１１に保持しており、処理部１７は、２以上のユーザの、各設問の評定値の平均値を算出し、各設問の難易度を取得する。また、例えば、処理部１７は、ユーザごとに、評定値の上昇の度合いを取得し、学習効果を示す処理結果として出力する。処理結果は、送信部１５が端末装置４に送信することは好適である。 Statistical processing also means, for example, statistically analyzing a user's learning progress, tendencies in pronunciation success and failure, question difficulty, learning effect, and the like. For example, the storage unit 11 holds a passing score for the pronunciation rating result of each question, and the processing unit 17 acquires the pass/fail result of each user for each question and then acquires each user's learning progress. The learning progress is, for example, the ratio of passed questions, the number of passed questions, or a level value corresponding to the passed questions. As another example, the storage unit 11 holds the characteristics of each question, and the processing unit 17 analyzes tendencies in pronunciation success and failure by acquiring the characteristics (for example, pronunciation of “th”, or pronunciation of “l” or “r”) of questions for which one user's rating value is at or above a threshold (pronunciation success), or of questions for which that user's rating value is below the threshold (pronunciation failure). As another example, the storage unit 11 holds information indicating the relationship between question difficulty and rating value, and the processing unit 17 calculates the average of the rating values of two or more users for each question and acquires the difficulty of each question. As another example, the processing unit 17 acquires, for each user, the degree of increase in the rating value and outputs it as a processing result indicating the learning effect. It is preferable that the transmission unit 15 transmits the processing result to the terminal device 4.
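The statistical processing described above can be illustrated with a short Python sketch. It assumes that the deviation value is the usual standard score scaled to a mean of 50 and a standard deviation of 10, and that learning progress is the ratio of passed questions; the function names and sample data are hypothetical.

```python
import statistics


def summarize(scores):
    """Mean, population variance, and deviation values (mean 50, std 10) of scores."""
    mean = statistics.mean(scores)
    variance = statistics.pvariance(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        # All scores are equal, so every deviation value is the midpoint.
        deviation_values = [50.0 for _ in scores]
    else:
        deviation_values = [50.0 + 10.0 * (s - mean) / std for s in scores]
    return mean, variance, deviation_values


def learning_progress(results, passing_scores):
    """Fraction of questions whose rating value reaches the passing score."""
    passed = sum(1 for q, score in results.items() if score >= passing_scores[q])
    return passed / len(results)


scores = [60, 70, 80]
mean, variance, deviation_values = summarize(scores)

# One user's rating values per question and the passing score of each question.
results = {"q01": 85, "q02": 40, "q03": 72, "q04": 90}
passing = {"q01": 70, "q02": 70, "q03": 70, "q04": 70}
progress = learning_progress(results, passing)
```

The same helpers could be applied per age group or per question by first selecting the relevant subset of rating results.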

端末装置４を構成する端末格納部４１は、各種の情報を格納し得る。各種の情報とは、例えば、ユーザ識別子、１以上のユーザ属性値等である。各種の情報とは、例えば、１または２以上の発話内容識別子である。また、各種の情報とは、例えば、１または２以上の発話内容データである。発話内容識別子または発話内容データは、発音評定のための設問に対応する。なお、２以上の発話内容識別子または２以上の発話内容データは、発音評定の設問リストを構成する情報となり得る。 The terminal storage part 41 which comprises the terminal device 4 can store various information. The various information includes, for example, a user identifier, one or more user attribute values, and the like. The various types of information are, for example, one or more utterance content identifiers. The various types of information are, for example, one or more utterance content data. The utterance content identifier or the utterance content data corresponds to a question for pronunciation evaluation. In addition, two or more utterance content identifiers or two or more utterance content data can be information constituting a question list of pronunciation ratings.

端末受付部４２は、ユーザから各種のデータや指示を受け付ける。端末受付部４２は、ユーザから音声を受け付ける。各種のデータや指示とは、例えば、動作指示、発音評定情報に対する処理の指示等である。動作指示とは、発音評定のサービスを開始する指示である。 The terminal reception unit 42 receives various data and instructions from the user. The terminal reception unit 42 receives voice from the user. The various data and instructions are, for example, operation instructions, processing instructions for pronunciation evaluation information, and the like. The operation instruction is an instruction to start a pronunciation rating service.

ここで、受け付けとは、例えば、マイク、キーボードやマウスやタッチパネル等の入力手段からの受け付けである。音声の入力手段は、例えば、マイク、記録媒体からの読み込み等、何でも良い。端末受付部４２は、マイク、キーボード、マウス、タッチパネル等の入力手段のデバイスドライバー等で実現され得る。 Here, reception is reception from input means, such as a microphone, a keyboard, a mouse, and a touch panel, for example. The voice input means may be anything such as a microphone or reading from a recording medium. The terminal reception unit 42 can be realized by a device driver of an input unit such as a microphone, a keyboard, a mouse, and a touch panel.

端末第一部分処理部４３は、端末受付部４２が受け付けた音声に対して第一部分処理を行い、音声関連データを取得する。ここで、第一部分処理とは、例えば、受け付けられた音声をデジタル化し、音声データを取得する処理である。また、第一部分処理とは、例えば、音声をデジタル化し、音声データを取得し、当該音声データから特徴ベクトルの集合である特徴データを取得する処理である。また、第一部分処理とは、例えば、音声をデジタル化し、音声データを取得し、当該音声データをフレームに区分し、区分されたフレーム毎の音声データである１以上のフレーム音声データを取得する処理である。つまり、音声関連データは、例えば、音声データ、特徴データ、１以上のフレーム音声データ等である。 The terminal first partial processing unit 43 performs a first partial process on the voice received by the terminal receiving unit 42 and acquires voice related data. Here, the first partial process is, for example, a process of digitizing the received voice and acquiring voice data. The first partial process is, for example, a process of digitizing sound, acquiring sound data, and acquiring feature data that is a set of feature vectors from the sound data. The first partial processing is, for example, a process of digitizing sound, obtaining sound data, dividing the sound data into frames, and obtaining one or more frame sound data that is sound data for each divided frame. It is. That is, the voice related data is, for example, voice data, feature data, one or more frame voice data, and the like.

端末ミドルウェア部４４は、端末第一部分処理部４３が取得した音声関連データを変更し、送信するデータを取得する。端末ミドルウェア部４４は、例えば、音声関連データを暗号化したり、音声関連データを圧縮したり、送信するためにデータ構造を変換したりする。かかる処理は、通常、発音評定サーバ装置１との通信に利用される通信プロトコルに対応する処理を含む。なお、端末ミドルウェア部４４が音声関連データを圧縮する場合、劣化の少ない方式が望ましい。また、端末ミドルウェア部４４が音声関連データを圧縮する場合、特に、端末ミドルウェア部４４は、音声の周波数帯域の情報を劣化させずに、圧縮することは好適である。さらに、端末ミドルウェア部４４は、音声関連データの圧縮・解凍処理において、高速に実行できる方式を採用することは好適である。なお、かかる端末ミドルウェア部４４の処理を端末ミドルウェアの第一処理という。 The terminal middleware unit 44 converts the voice-related data acquired by the terminal first partial processing unit 43 and obtains the data to be transmitted. For example, the terminal middleware unit 44 encrypts the voice-related data, compresses it, or converts its data structure for transmission. Such processing usually includes processing corresponding to the communication protocol used for communication with the pronunciation rating server device 1. When the terminal middleware unit 44 compresses the voice-related data, a method with little degradation is desirable. In particular, it is preferable that the terminal middleware unit 44 compress the data without degrading the information in the frequency band of the voice. It is further preferable that the terminal middleware unit 44 adopt a method that can be executed at high speed for compressing and decompressing the voice-related data. This processing by the terminal middleware unit 44 is referred to as the first terminal middleware processing.

また、端末ミドルウェア部４４は、端末送受信部４５が受信したデータであり、評定結果を含むデータを変換し、出力できるデータを得る。端末ミドルウェア部４４は、例えば、受信されたデータを復号化したり、解凍したり、データ構造を変換したりする。かかる処理は、通常、発音評定サーバ装置１との通信に利用される通信プロトコルに対応する処理を含む。なお、かかる端末ミドルウェア部４４の処理を端末ミドルウェアの第二処理という。 The terminal middleware unit 44 also converts the data received by the terminal transmitting/receiving unit 45, which includes the rating result, and obtains data that can be output. For example, the terminal middleware unit 44 decrypts the received data, decompresses it, or converts its data structure. Such processing usually includes processing corresponding to the communication protocol used for communication with the pronunciation rating server device 1. This processing by the terminal middleware unit 44 is referred to as the second terminal middleware processing.

なお、端末ミドルウェア部４４は、通常、発音評定そのものに関する処理は行わない。 Note that the terminal middleware unit 44 normally does not perform processing related to the pronunciation evaluation itself.

端末送受信部４５は、端末第一部分処理部４３が取得した音声関連データを発音評定サーバ装置１に送信する。また、端末送受信部４５は、端末ミドルウェア部４４が変更したデータであり、音声関連データを含むデータを発音評定サーバ装置１に送信しても良い。 The terminal transmitting/receiving unit 45 transmits the voice-related data acquired by the terminal first partial processing unit 43 to the pronunciation rating server device 1. The terminal transmitting/receiving unit 45 may also transmit, to the pronunciation rating server device 1, the data converted by the terminal middleware unit 44, which includes the voice-related data.

また、端末送受信部４５は、格納部１１のユーザ識別子をも発音評定サーバ装置１に送信しても良い。 The terminal transmission / reception unit 45 may also transmit the user identifier in the storage unit 11 to the pronunciation rating server device 1.

また、端末送受信部４５は、発話内容データまたは発話内容識別子をも発音評定サーバ装置１に送信しても良い。 The terminal transmission / reception unit 45 may also transmit the utterance content data or the utterance content identifier to the pronunciation rating server device 1.

また、端末送受信部４５は、端末受付部４２が受け付けた発音評定情報に対する処理の指示を発音評定サーバ装置１に送信しても良い。 In addition, the terminal transmission / reception unit 45 may transmit an instruction to process the pronunciation rating information received by the terminal reception unit 42 to the pronunciation rating server apparatus 1.

また、端末送受信部４５は、発音評定サーバ装置１から評定結果を含むデータを受信する。 Further, the terminal transmission / reception unit 45 receives data including a rating result from the pronunciation rating server device 1.

端末出力部４６は、端末格納部４１の発話内容識別子を用いて、当該発話内容識別子に対応する発話内容データを取得し、出力することは好適である。端末出力部４６は、端末格納部４１の発話内容データを出力しても良い。 The terminal output unit 46 preferably uses the utterance content identifier of the terminal storage unit 41 to acquire and output the utterance content data corresponding to the utterance content identifier. The terminal output unit 46 may output the utterance content data in the terminal storage unit 41.

また、端末出力部４６は、端末送受信部４５が受信した評定結果を出力する。また、端末出力部４６は、端末ミドルウェア部４４が変換して得たデータであり、評定結果を含むデータから、評定結果を出力する。 The terminal output unit 46 also outputs the rating result received by the terminal transmitting/receiving unit 45. The terminal output unit 46 also outputs the rating result from the data obtained through conversion by the terminal middleware unit 44, which includes the rating result.

なお、端末出力部４６は、端末格納部４１の２以上の発話内容識別子を取得し、２以上の発話内容データを取得し、出力しても良い。かかる場合、端末出力部４６は、一の評定結果を出力した後、当該評定結果に対応する発話内容識別子の次の発話内容識別子で識別される発話内容データを出力することは好適である。つまり、端末出力部４６は、発話内容データの出力と、当該発話内容データに対応する評定結果の出力とを順に行い、かつ２以上の発話内容データの出力と評定結果の出力とを、順次、連続して行うことは好適である。 Note that the terminal output unit 46 may acquire two or more utterance content identifiers in the terminal storage unit 41, acquire two or more utterance content data, and output the data. In such a case, it is preferable that the terminal output unit 46 outputs the utterance content data identified by the utterance content identifier next to the utterance content identifier corresponding to the evaluation result after outputting the one evaluation result. That is, the terminal output unit 46 sequentially outputs the utterance content data and the rating result corresponding to the utterance content data, and sequentially outputs two or more utterance content data and the rating result. It is preferable to carry out continuously.

また、端末出力部４６は、端末格納部４１の２以上の発話内容データを取得し、出力しても良い。かかる場合、端末出力部４６は、一の評定結果を出力した後、当該評定結果に対応する発話内容データの次の発話内容データを出力することは好適である。つまり、端末出力部４６は、発話内容データの出力と、当該発話内容データに対応する評定結果の出力とを順に行い、かつ２以上の発話内容データの出力と評定結果の出力とを、順次、連続して行うことは好適である。 Further, the terminal output unit 46 may acquire and output two or more utterance content data in the terminal storage unit 41. In such a case, it is preferable that the terminal output unit 46 outputs the utterance content data next to the utterance content data corresponding to the evaluation result after outputting the one evaluation result. That is, the terminal output unit 46 sequentially outputs the utterance content data and the rating result corresponding to the utterance content data, and sequentially outputs two or more utterance content data and the rating result. It is preferable to carry out continuously.

ここで、出力とは、ディスプレイへの表示、プロジェクターを用いた投影、プリンタでの印字、音出力、外部の装置への送信、記録媒体への蓄積、他の処理装置や他のプログラムなどへの処理結果の引渡しなどを含む概念である。 Here, output refers to display on a display, projection using a projector, printing with a printer, sound output, transmission to an external device, storage in a recording medium, and output to other processing devices or other programs. It is a concept that includes delivery of processing results.

格納部１１、端末格納部４１は、不揮発性の記録媒体が好適であるが、揮発性の記録媒体でも実現可能である。格納部１１等に情報が記憶される過程は問わない。例えば、記録媒体を介して、情報が格納部１１等で記憶されるようになってもよく、通信回線等を介して送信された情報が格納部１１等で記憶されるようになってもよく、あるいは、入力デバイスを介して入力された情報が格納部１１で記憶されるようになってもよい。 The storage unit 11 and the terminal storage unit 41 are preferably non-volatile recording media, but can also be realized by volatile recording media. The process in which information is stored in the storage unit 11 or the like is not limited. For example, information may be stored in the storage unit 11 or the like via a recording medium, and information transmitted via a communication line or the like may be stored in the storage unit 11 or the like. Alternatively, information input via the input device may be stored in the storage unit 11.

受信部１２は、通常、無線または有線の通信手段で実現されるが、放送を受信する手段で実現されても良い。 The receiving unit 12 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.

ミドルウェア部１３、評定結果取得部１４、蓄積部１６、処理部１７、端末第一部分処理部４３、端末ミドルウェア部４４は、通常、ＭＰＵやメモリ等から実現され得る。ミドルウェア部１３等の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The middleware unit 13, the evaluation result acquisition unit 14, the storage unit 16, the processing unit 17, the terminal first partial processing unit 43, and the terminal middleware unit 44 can be usually realized by an MPU, a memory, or the like. The processing procedure of the middleware unit 13 and the like is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).

送信部１５は、通常、無線または有線の通信手段で実現されるが、放送手段で実現されても良い。 The transmission unit 15 is usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.

端末送受信部４５は、通常、無線または有線の通信手段で実現される。 The terminal transmission / reception unit 45 is usually realized by a wireless or wired communication means.

端末出力部４６は、ディスプレイやスピーカー等の出力デバイスを含むと考えても含まないと考えても良い。端末出力部４６は、出力デバイスのドライバーソフトまたは、出力デバイスのドライバーソフトと出力デバイス等で実現され得る。 The terminal output unit 46 may be considered as including or not including an output device such as a display or a speaker. The terminal output unit 46 can be implemented by output device driver software, or output device driver software and an output device.

次に、発音評定システムＡの動作について説明する。まず、発音評定サーバ装置１の動作について、図３のフローチャートを用いて説明する。 Next, the operation of the pronunciation rating system A will be described. First, the operation of the pronunciation rating server device 1 will be described using the flowchart of FIG.

（ステップＳ３０１）受信部１２は、音声関連データ等を端末装置４から受信したか否かを判断する。音声関連データ等を受信した場合はステップＳ３０２に行き、受信しない場合はステップＳ３０８に行く。なお、音声関連データ等とは、例えば、音声関連データ、およびユーザ識別子と発話内容識別子と発話内容データのうちの１以上のデータである。 (Step S301) The receiving unit 12 determines whether voice related data or the like has been received from the terminal device 4. When the voice related data or the like is received, the process goes to step S302, and when not received, the process goes to step S308. Note that the voice-related data and the like are, for example, voice-related data and at least one of a user identifier, an utterance content identifier, and an utterance content data.

（ステップＳ３０２）ミドルウェア部１３は、ステップＳ３０１で受信された音声関連データ等に対して、ミドルウェアの第一処理を行い、評定結果取得部１４が第二部分処理を行えるデータを取得する。 (Step S302) The middleware unit 13 performs a first middleware process on the voice-related data received in step S301, and acquires data for which the evaluation result acquisition unit 14 can perform the second partial process.

（ステップＳ３０３）評定結果取得部１４は、ステップＳ３０２で取得されたデータに対して、第二部分処理を行い、評定結果を取得する。評定結果取得部１４は、通常、発話内容データと教師データとを用いて、ステップＳ３０２で取得されたデータに対して、発音評定処理を行い、評定結果を取得する。 (Step S303) The rating result acquisition unit 14 performs a second partial process on the data acquired in step S302, and acquires the rating result. The rating result acquisition unit 14 usually performs pronunciation rating processing on the data acquired in step S302 using the utterance content data and the teacher data, and acquires the rating result.

なお、発話内容データは、例えば、受信部１２が受信した発話内容識別子に対応する発話内容データである。また、教師データは、格納部１１の教師データである。つまり、評定結果取得部１４は、受信部１２が受信した発話内容識別子に対応する発話内容データを図示しないデータベースや外部の装置等から取得する。また、評定結果取得部１４は、格納部１１の教師データを取得する。そして、評定結果取得部１４は、音声関連データと発話内容データとを用いて、教師データを照合し、発音評定を行い、評定値を算出する。なお、発音評定を行い、評定値を算出する処理は公知技術であるので詳細な説明を省略する。 Note that the utterance content data is, for example, utterance content data corresponding to the utterance content identifier received by the receiving unit 12. The teacher data is teacher data in the storage unit 11. That is, the rating result acquisition unit 14 acquires utterance content data corresponding to the utterance content identifier received by the reception unit 12 from a database (not shown), an external device, or the like. Further, the evaluation result acquisition unit 14 acquires teacher data in the storage unit 11. Then, the rating result acquisition unit 14 collates the teacher data using the voice-related data and the utterance content data, performs a pronunciation rating, and calculates a rating value. In addition, since the process of performing pronunciation evaluation and calculating the evaluation value is a known technique, detailed description thereof is omitted.

（ステップＳ３０４）ミドルウェア部１３は、ステップＳ３０３で取得された評定結果に対して、ミドルウェアの第二処理を行い、送信部１５が送信できるデータを構成する。 (Step S304) The middleware unit 13 performs second processing of middleware on the evaluation result acquired in step S303, and configures data that can be transmitted by the transmission unit 15.

（ステップＳ３０５）送信部１５は、ステップＳ３０４で構成された評定結果を含むデータを端末装置４に送信する。 (Step S305) The transmission unit 15 transmits data including the evaluation result configured in step S304 to the terminal device 4.

（ステップＳ３０６）蓄積部１６は、音声関連データ、発話内容データまたは発話内容識別子、および評定結果を対応付けた発音評定情報を構成する。 (Step S306) The storage unit 16 configures pronunciation rating information in which voice related data, utterance content data or utterance content identifier, and a rating result are associated with each other.

（ステップＳ３０７）蓄積部１６は、ステップＳ３０６で構成した発音評定情報を格納部１１に蓄積する。ステップＳ３０１に戻る。 (Step S307) The accumulating unit 16 accumulates the pronunciation rating information configured in step S306 in the storage unit 11. The process returns to step S301.

（ステップＳ３０８）受信部１２は、発音評定情報に対する処理の指示を受信したか否かを判断する。処理の指示を受信した場合はステップＳ３０９に行き、処理の指示を受信しない場合はステップＳ３０１に戻る。 (Step S308) The receiving unit 12 determines whether or not an instruction for processing the pronunciation rating information has been received. If a processing instruction is received, the process proceeds to step S309. If a processing instruction is not received, the process returns to step S301.

（ステップＳ３０９）処理部１７は、１以上の発音評定情報を格納部１１から取得する。なお、処理部１７は、処理の指示に含まれるユーザ識別子に対応する１以上の発音評定情報を格納部１１から取得しても良い。 (Step S309) The processing unit 17 acquires one or more pronunciation rating information from the storage unit 11. The processing unit 17 may obtain one or more pronunciation rating information corresponding to the user identifier included in the processing instruction from the storage unit 11.

（ステップＳ３１０）処理部１７は、ステップＳ３０９で取得した１以上の発音評定情報に対する処理を行い、処理結果を取得する。 (Step S310) The processing unit 17 performs processing on one or more pronunciation rating information acquired in step S309, and acquires a processing result.

（ステップＳ３１１）送信部１５は、ステップＳ３１０で取得した処理結果を端末装置４に送信する。ステップＳ３０１に戻る。 (Step S311) The transmission unit 15 transmits the processing result acquired in Step S310 to the terminal device 4. The process returns to step S301.

なお、図３のフローチャートにおいて、ミドルウェア部１３が行うステップは存在しなくても良い。 In the flowchart of FIG. 3, the steps performed by the middleware unit 13 may not exist.

また、図３のフローチャートにおいて、ステップＳ３０３の発音評定処理のアルゴリズムは問わないことは言うまでもない。 In the flowchart of FIG. 3, it goes without saying that the algorithm of the pronunciation rating process in step S303 is not questioned.

また、図３のフローチャートにおいて、ステップＳ３０８における発音評定情報に対する処理の指示、およびステップＳ３１０における処理結果に対して、ミドルウェア部１３が処理を行っても良い。 In the flowchart of FIG. 3, the middleware unit 13 may also process the processing instruction for pronunciation rating information in step S308 and the processing result in step S310.

さらに、図３のフローチャートにおいて、電源オフや処理終了の割り込みにより処理は終了する。 Further, in the flowchart of FIG. 3, the process is ended by power-off or a process end interrupt.
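The server-side flow of FIG. 3 (steps S301 to S311) can be summarized in a compact Python sketch. The message layout, the "type" field, and the helper functions dummy_rate and average are hypothetical; the actual rating algorithm and network transport are outside the scope of this sketch.

```python
def handle_message(message, storage, rate, process):
    """Handle one received message following steps S301 to S311 of the flowchart."""
    if message.get("type") == "voice":  # S301: voice-related data received
        data = message["voice_related_data"]  # S302 (format conversion omitted)
        result = rate(data, message["utterance_id"])  # S303: second partial processing
        reply = {"rating_result": result}  # S304 and S305: build and return the reply
        storage.append({  # S306 and S307: build and accumulate the record
            "user_id": message.get("user_id"),
            "utterance_id": message["utterance_id"],
            "voice_related_data": data,
            "rating_result": result,
        })
        return reply
    if message.get("type") == "process":  # S308: processing instruction received
        user_id = message.get("user_id")
        records = [r for r in storage  # S309: select records, optionally per user
                   if user_id is None or r["user_id"] == user_id]
        return {"processing_result": process(records)}  # S310 and S311
    return None  # nothing to do; the caller waits for the next message


# Hypothetical stand-ins for the rating algorithm and the statistical processing.
def dummy_rate(data, utterance_id):
    return float(len(data))


def average(records):
    return sum(r["rating_result"] for r in records) / len(records) if records else None


storage = []
handle_message({"type": "voice", "user_id": "u001", "utterance_id": "q01",
                "voice_related_data": [0.1, 0.2, 0.3]}, storage, dummy_rate, average)
handle_message({"type": "voice", "user_id": "u001", "utterance_id": "q02",
                "voice_related_data": [0.1]}, storage, dummy_rate, average)
summary = handle_message({"type": "process", "user_id": "u001"},
                         storage, dummy_rate, average)
```

Passing the rating and processing functions as parameters reflects the statement that the rating algorithm itself is not limited.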

次に、端末装置４の動作について、図４のフローチャートを用いて説明する。 Next, the operation of the terminal device 4 will be described using the flowchart of FIG. 4.

(Step S401) The terminal reception unit 42 determines whether an operation instruction has been received from the user. If an operation instruction has been received, the process proceeds to step S402; otherwise, the process proceeds to step S414.

(Step S402) The terminal output unit 46 assigns 1 to the counter i.

(Step S403) The terminal output unit 46 determines whether an i-th question exists. If the i-th question exists, the process proceeds to step S404; if it does not exist, the process returns to step S401. The terminal output unit 46 determines, for example, whether the i-th utterance content identifier, which corresponds to the i-th question, is stored in the storage unit 11.

(Step S404) Using the utterance content identifier corresponding to the i-th question, the terminal output unit 46 acquires the utterance content data corresponding to that identifier from the storage unit 11 or an external device. Alternatively, the terminal output unit 46 may acquire the utterance content data corresponding to the i-th question from the storage unit 11 or an external device.

(Step S405) The terminal output unit 46 outputs the utterance content data acquired in step S404.

(Step S406) The terminal reception unit 42 determines whether voice has been received from the user. If voice has been received, the process proceeds to step S407; otherwise, the process returns to step S406. The received voice is the voice that the user uttered after viewing the utterance content data.

(Step S407) The terminal first partial processing unit 43 performs the first partial processing on the voice received in step S406 and acquires voice-related data and the like.

(Step S408) The terminal middleware unit 44 executes the first process of the terminal middleware on the voice-related data acquired in step S407 and obtains the voice-related data and the like to be transmitted. The voice-related data and the like include the voice-related data and also include, for example, one or more of a user identifier, utterance content data, and an utterance content identifier.

(Step S409) The terminal transmission/reception unit 45 transmits the voice-related data and the like obtained in step S408 to the pronunciation rating server device 1.

(Step S410) The terminal transmission/reception unit 45 determines whether data including a rating result has been received from the pronunciation rating server device 1. If such data has been received, the process proceeds to step S411; otherwise, the process returns to step S410.

(Step S411) The terminal middleware unit 44 performs the second process of the terminal middleware on the data received in step S410 and obtains data including the rating result.

(Step S412) The terminal output unit 46 outputs the rating result obtained in step S411.

(Step S413) The terminal output unit 46 increments the counter i by 1. The process returns to step S403.

(Step S414) The terminal reception unit 42 determines whether an instruction for processing the pronunciation rating information has been received from the user. If such an instruction has been received, the process proceeds to step S415; otherwise, the process returns to step S401.

(Step S415) The terminal transmission/reception unit 45 transmits the instruction for processing the pronunciation rating information, received in step S414, to the pronunciation rating server device 1. Here, the terminal transmission/reception unit 45 preferably transmits an instruction that includes the user identifier to the pronunciation rating server device 1.

(Step S416) The terminal transmission/reception unit 45 determines whether data including a processing result has been received from the pronunciation rating server device 1. If the data has been received, the process proceeds to step S417; otherwise, the process returns to step S416.

(Step S417) The terminal output unit 46 outputs the processing result received in step S416. The process returns to step S401.

In the flowchart of FIG. 4, the terminal middleware unit 44 may also perform terminal middleware processing on the instruction for processing the pronunciation rating information and on the received data including the rating result.

Also, in the flowchart of FIG. 4, the steps performed by the terminal middleware unit 44 may be omitted.

Furthermore, in the flowchart of FIG. 4, the processing ends when the power is turned off or a processing-end interrupt occurs.
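As a purely illustrative sketch that is not part of the disclosed embodiment, the question loop of steps S402 to S413 could be written in Python roughly as follows. The class name `Terminal`, its callable fields, and the dictionary keys are hypothetical stand-ins for the units of the terminal device 4; the terminal middleware of steps S408 and S411 is reduced to building and passing through a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Terminal:
    """Hypothetical stand-in for the terminal device 4 (steps S402 to S413)."""
    questions: List[str]                     # utterance content identifiers
    load_content: Callable[[str], str]       # S404: identifier -> utterance content data
    record_voice: Callable[[str], bytes]     # S406: returns once the user has spoken
    first_partial: Callable[[bytes], bytes]  # S407: first partial processing
    send_and_wait: Callable[[dict], dict]    # S409-S410: round trip to the server
    shown: List[str] = field(default_factory=list)

    def run_session(self, user_id: str) -> List[dict]:
        results = []
        i = 1                                        # S402
        while i <= len(self.questions):              # S403: does question i exist?
            content_id = self.questions[i - 1]
            content = self.load_content(content_id)  # S404
            self.shown.append(content)               # S405: output to the user
            voice = self.record_voice(content)       # S406
            related = self.first_partial(voice)      # S407
            message = {                              # S408 (simplified middleware)
                "user_id": user_id,
                "content_id": content_id,
                "voice_related_data": related,
            }
            reply = self.send_and_wait(message)      # S409-S410
            results.append(reply)                    # S411-S412 (collected, not displayed)
            i += 1                                   # S413
        return results
```

In this sketch the loop simply ends when no further question exists, which corresponds to the return to step S401 in the flowchart.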

As described above, according to the present embodiment, the pronunciation rating service can be provided to the user without installing on the terminal device a program that implements the algorithm forming the core of pronunciation rating, which reduces the risk of that algorithm becoming known to outside parties. Alternatively, the pronunciation rating server device according to the present invention enables high-speed pronunciation rating processing without relying on a terminal device with low processing capability.

In the pronunciation rating server device 1 of the present embodiment, the device on which the middleware unit 13 operates is preferably separate from the device on which the rating result acquisition unit 14 operates. The device that includes the rating result acquisition unit 14 is also preferably not connected to any network, such as the Internet, that external devices can access. This prevents leakage of the algorithm that forms the core of pronunciation rating.

In such a case, the block diagram of the pronunciation rating system A is as shown in FIG. 5. The pronunciation rating system A includes a middleware device A1, a pronunciation rating device A2, and the terminal device 4. The middleware device A1 includes the receiving unit 12, the middleware unit 13, and the transmission unit 15. The pronunciation rating device A2 includes the storage unit 11, the rating result acquisition unit 14, the accumulation unit 16, and the processing unit 17. The middleware device A1 and the pronunciation rating device A2 are usually connected via a LAN or the like, and the pronunciation rating device A2 cannot be accessed from outside.

The pronunciation rating system A of FIG. 5 is one example of how the middleware unit 13 and the rating result acquisition unit 14 may be separated. It suffices that the middleware unit 13 and the rating result acquisition unit 14 reside on different devices. That is, the middleware device A1 may include only the receiving unit 12 and the middleware unit 13, with the pronunciation rating device A2 including the storage unit 11, the rating result acquisition unit 14, the transmission unit 15, the accumulation unit 16, and the processing unit 17. Alternatively, the middleware device A1 may include the storage unit 11, the receiving unit 12, the middleware unit 13, the transmission unit 15, the accumulation unit 16, and the processing unit 17, with the pronunciation rating device A2 including only the rating result acquisition unit 14.

Because the middleware unit 13 and the rating result acquisition unit 14 are separated and the rating result acquisition unit 14 cannot be accessed from external devices, leakage of the rating algorithm can be avoided.
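The separation described above can be pictured with the following minimal Python sketch, given only as an illustration under stated assumptions. The class names `MiddlewareDevice` and `RatingDevice` are hypothetical, the two devices are modeled as objects in one process rather than as machines on a LAN, and the scoring inside `rate` is a placeholder, not the rating algorithm of the embodiment.

```python
from typing import List


class RatingDevice:
    """Hypothetical pronunciation rating device A2; holds the core algorithm."""

    def rate(self, voice_related_data: List[float]) -> float:
        # Placeholder scoring (mean of the inputs), NOT the patent's algorithm.
        return sum(voice_related_data) / len(voice_related_data)


class MiddlewareDevice:
    """Hypothetical middleware device A1; the only externally reachable part."""

    def __init__(self, rating_device: RatingDevice) -> None:
        # In a real deployment this reference would be an internal LAN connection.
        self._rating_device = rating_device

    def handle(self, request: dict) -> dict:
        # Middleware first process: convert received data into a ratable form.
        data = [float(x) for x in request["voice_related_data"]]
        # Second partial processing happens only on the rating device.
        score = self._rating_device.rate(data)
        # Middleware second process: convert the result into a transmittable form.
        return {"user_id": request["user_id"], "rating": round(score, 2)}
```

A terminal would only ever call `MiddlewareDevice.handle`; nothing in the reply exposes how `rate` computes its value.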

Furthermore, the processing in the present embodiment may be implemented in software. This software may be distributed by software download or the like, or may be recorded on a recording medium such as a CD-ROM and distributed. The same applies to the other embodiments in this specification. The software that implements the pronunciation rating server device 1 of the present embodiment is the following program. That is, this program causes a computer to function as: a receiving unit that receives, from a terminal device, voice-related data that is the result of accepting voice uttered by a user for an instructed utterance content and performing on that voice first partial processing, which is a first part of the processing for pronunciation rating; a rating result acquisition unit that performs on the voice-related data second partial processing, which is a second part of the processing for pronunciation rating to be performed after the first partial processing, and acquires a rating result; and a transmission unit that transmits the rating result to the terminal device.

The above program preferably further causes the computer to function as: an accumulation unit that accumulates one or more pieces of pronunciation rating information, each associating the voice-related data, the utterance content data indicating the instructed utterance content or the utterance content identifier identifying the utterance content, and the rating result; and a processing unit that performs processing on the one or more pieces of pronunciation rating information.

In the above program, it is preferable that the receiving unit also receives an utterance content identifier identifying the utterance content, or utterance content data acquired using that identifier, and that the accumulation unit accumulates the voice-related data, the utterance content data or the utterance content identifier, and the rating result in association with one another.

The above program preferably further causes the computer to function as a middleware unit that converts the received data, which is the data received by the receiving unit and includes the voice-related data, into data on which the rating result acquisition unit can perform the second partial processing, and that converts the rating result acquired by the rating result acquisition unit into data that the transmission unit can transmit. In this case, the rating result acquisition unit performs the second partial processing on the voice-related data as converted by the middleware unit and acquires the rating result, and the transmission unit transmits the rating result as converted by the middleware unit to the terminal device.

In the above program, it is preferable that the receiving unit receives, from the terminal device, data that results from processing by the terminal middleware unit of the terminal device, namely data that includes the voice-related data obtained by performing the first partial processing on the voice and that has been converted into a transmittable form.

(Embodiment 2)
The present embodiment differs from Embodiment 1 in that the pronunciation rating system includes a plurality of terminal devices. That is, the present embodiment is a multi-client configuration.

The present embodiment also describes a pronunciation rating system in which the first partial processing may differ from one terminal device to another. This is the case where, for example, one terminal device stores an old version of the pronunciation rating client program while another terminal device stores a new version of it.

Furthermore, the present embodiment describes a pronunciation rating system including a pronunciation rating server device that uses different teacher data for pronunciation rating depending on user attribute values.

FIG. 6 is a conceptual diagram of the pronunciation rating system B in the present embodiment. The pronunciation rating system B includes a pronunciation rating server device 2 and two or more terminal devices 4.

FIG. 7 is a block diagram of the pronunciation rating system B in the present embodiment. The pronunciation rating server device 2 includes the storage unit 11, a receiving unit 22, a middleware unit 23, a rating result acquisition unit 24, the transmission unit 15, the accumulation unit 16, and the processing unit 17.

The receiving unit 22 of the pronunciation rating server device 2 receives voice-related data from two or more terminal devices 4. The receiving unit 22 usually receives the voice-related data together with a user identifier from each of the two or more terminal devices 4.

The receiving unit 22 may receive different types of voice-related data from the two or more terminal devices 4. Different types of voice-related data result from the terminal devices 4 performing different first partial processing. Such differences arise from, for example, differences in the model of the terminal device 4 or in the version of the pronunciation rating terminal program installed on the terminal device 4.

That is, for example, a first type of terminal device 4 acquires voice data from the received voice and transmits that voice data to the pronunciation rating server device 2. The first partial processing performed by the first type of terminal device 4 is the voice data acquisition processing. A second type of terminal device 4 acquires voice data from the received voice, acquires feature data from that voice data, and transmits the feature data to the pronunciation rating server device 2. The first partial processing performed by the second type of terminal device 4 is the voice data acquisition processing and the feature data acquisition processing.

As another example, the first type of terminal device 4 performs terminal middleware processing and transmits the result to the pronunciation rating server device 2. The result is, for example, voice data or feature data. The second type of terminal device 4 transmits voice data, feature data, or the like to the pronunciation rating server device 2 without performing terminal middleware processing.

The receiving unit 22 preferably also receives an utterance content identifier or utterance content data from each terminal device 4.

The receiving unit 22 preferably receives, from each of the two or more terminal devices 4, data that results from processing by the terminal middleware unit 44 of the terminal device 4 and that includes the voice-related data.

The middleware unit 23 converts the received data, which is the data that the receiving unit 22 received from the two or more terminal devices 4 and which includes the voice-related data, into data on which the rating result acquisition unit 24 can perform the second partial processing. For example, when the receiving unit 22 receives voice-related data, the middleware unit 23 performs the first process of the middleware concurrently by forking a process, starting a thread, or the like.

The middleware unit 23 also converts the rating result acquired by the rating result acquisition unit 24 into data that the transmission unit 15 can transmit. This processing is the second process of the middleware.

The rating result acquisition unit 24 performs the second partial processing on the voice-related data and acquires the rating result. The rating result acquisition unit 24 may perform the second partial processing on the voice-related data as converted by the middleware unit 23 and acquire the rating result.

The rating result acquisition unit 24 performs the second partial processing concurrently on each of the two or more pieces of voice-related data that the receiving unit 22 received from the two or more terminal devices 4, and acquires two or more rating results. The two or more rating results are preferably transmitted to each terminal device 4 immediately. The concurrent processing may be parallel processing. For example, when the receiving unit 22 receives voice-related data, the rating result acquisition unit 24 may perform the second partial processing concurrently by forking a process, starting a thread, or the like.
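One way to picture the concurrent handling of several terminals is the Python sketch below, which uses a thread pool from the standard library in place of explicitly forking a process or starting a thread per request. It is an illustration only: the function names are hypothetical, and the scoring inside `second_partial` is a placeholder rather than the rating computation of the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def second_partial(item: Tuple[str, List[float]]) -> Tuple[str, float]:
    """Placeholder second partial processing for one terminal's data."""
    user_id, features = item
    # Stand-in score: the mean of the received feature values.
    return user_id, sum(features) / len(features)


def rate_concurrently(
    received: List[Tuple[str, List[float]]], max_workers: int = 4
) -> Dict[str, float]:
    """Rate the data from several terminals concurrently, one task per item."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs second_partial on worker threads and yields the
        # results in the order of the inputs.
        return dict(pool.map(second_partial, received))
```

For CPU-bound scoring in CPython, a process pool would be closer to the "parallel processing" mentioned above; the thread pool is used here only to keep the sketch short.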

The rating result acquisition unit 24 may perform different second partial processing on each different type of voice-related data and acquire the rating result. The different types of voice-related data are, for example, voice data and feature data. That is, when the received voice-related data is voice data, the rating result acquisition unit 24, for example, extracts feature data from the voice data, collates it against the teacher data on the basis of the feature data and the utterance content data, performs pronunciation rating, and calculates a rating value. When the received voice-related data is feature data, the rating result acquisition unit 24, for example, collates against the teacher data on the basis of the received feature data and the utterance content data, performs pronunciation rating, and calculates a rating value. In other words, when the received voice-related data is feature data, the processing of extracting feature data from voice data is unnecessary. When the received voice-related data is one or more pieces of voice frame data, the rating result acquisition unit 24, for example, extracts feature data from the one or more pieces of voice frame data, collates against the teacher data on the basis of the feature data and the utterance content data, performs pronunciation rating, and calculates a rating value.

The rating result acquisition unit 24 preferably performs the second partial processing on the voice-related data using teacher data that differs according to the user attribute value, which is an attribute value of the user, and acquires the rating result. The teacher data is stored in the storage unit 11 in association with user attribute values, for example. Here, the user attribute value is, for example, one or more of age group and sex.

The receiving unit 22 is usually implemented by wireless or wired communication means, but may be implemented by means for receiving broadcasts.

The middleware unit 23 and the rating result acquisition unit 24 can usually be implemented with an MPU, memory, and the like. The processing procedures of the middleware unit 23 and the like are usually implemented in software, and the software is recorded on a recording medium such as a ROM. However, they may be implemented in hardware (dedicated circuits).

Next, the operation of the pronunciation rating system B will be described. First, the operation of the pronunciation rating server device 2 will be described with reference to the flowchart of FIG. 8. In the flowchart of FIG. 8, description of the steps that are the same as in the flowchart of FIG. 3 is omitted. When the receiving unit 22 receives voice-related data in step S301, the middleware unit 23 performs the first process of the middleware (the processing of S302) concurrently by forking a process, starting a thread, or the like. If the first process of the middleware is not performed, then when the receiving unit 22 receives voice-related data in step S301, the rating result acquisition unit 24 performs the second partial processing of step S801 concurrently by forking a process, starting a thread, or the like.

(Step S801) The rating result acquisition unit 24 performs the second partial processing on the data obtained in step S302 and acquires the rating result. This pronunciation rating processing differs from the second partial processing in Embodiment 1. The details of the second partial processing in step S801 will be described with reference to the flowchart of FIG. 9.

In the flowchart of FIG. 8, the processing ends when the power is turned off or a processing-end interrupt occurs.

Next, the details of the second partial processing in step S801 will be described with reference to the flowchart of FIG. 9.

(Step S901) The rating result acquisition unit 24 acquires one or more user attribute values. The rating result acquisition unit 24 acquires from the storage unit 11, for example, the one or more user attribute values paired with the user identifier received in step S301. The rating result acquisition unit 24 may instead acquire, for example, one or more user attribute values received in step S301.

(Step S902) The rating result acquisition unit 24 acquires from the storage unit 11 the teacher data corresponding to the one or more user attribute values acquired in step S901.

(Step S903) The rating result acquisition unit 24 determines the type of the received voice-related data. The rating result acquisition unit 24 may determine the type from the content of the voice-related data received in step S301, such as its format or structure, or from a type identifier, received in step S301, that identifies the type of the voice-related data. Determining the type means, for example, acquiring the type identifier of the voice-related data.

(Step S904) The rating result acquisition unit 24 performs the second partial processing according to the type of voice-related data determined in step S903. The process returns to the higher-level processing.

The second partial processing according to the type of voice-related data is, for example, as follows. When the type of voice-related data is feature data, the second partial processing is, for example, processing that uses the feature data and the teacher data to calculate the posterior probability that the feature data corresponds to the phoneme being rated, calculates a rating value of the voice from the posterior probability, and obtains a rating result from one or more rating values. When the type of voice-related data is voice data, the second partial processing is, for example, processing that obtains feature data from the voice data, uses that feature data and the teacher data to calculate the posterior probability that the feature data corresponds to the phoneme being rated, calculates a rating value of the voice from the posterior probability, and obtains a rating result from one or more rating values.
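The type-dependent branching of steps S903 and S904 can be sketched in Python as follows. This is a toy model under loudly stated assumptions, not the rating method of the embodiment: each feature is a single number, the teacher data is a hypothetical table mapping each phoneme to a mean value, the posterior is computed from unit-variance Gaussian likelihoods with a uniform prior, one feature is assumed per target phoneme, and the rating result is taken as the average of the per-phoneme rating values.

```python
import math
from typing import Dict, List


def extract_features(samples: List[float], frame: int = 4) -> List[float]:
    """Placeholder feature extraction: mean absolute amplitude per frame."""
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    return [sum(abs(s) for s in f) / len(f) for f in frames]


def posterior(feature: float, target: str, teacher: Dict[str, float]) -> float:
    """P(target | feature) for unit-variance Gaussians and a uniform prior."""
    likelihood = {p: math.exp(-0.5 * (feature - mu) ** 2) for p, mu in teacher.items()}
    return likelihood[target] / sum(likelihood.values())


def second_partial(
    kind: str, data: List[float], targets: List[str], teacher: Dict[str, float]
) -> float:
    """Choose the processing by data type (S903), then rate (S904)."""
    if kind == "voice":
        features = extract_features(data)   # voice data: extract features first
    elif kind == "feature":
        features = data                     # feature data: extraction is skipped
    else:
        raise ValueError(f"unknown voice-related data type: {kind}")
    # One rating value per phoneme, scaled to 0-100, then averaged.
    values = [100.0 * posterior(f, t, teacher) for f, t in zip(features, targets)]
    return sum(values) / len(values)
```

Both branches end in the same scoring code, so a terminal that already sends feature data only saves the server the extraction step.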

A specific operation of the pronunciation rating system B in the present embodiment will now be described. The conceptual diagram of the pronunciation rating system B is FIG. 6.

Assume that the storage unit 11 holds the user information management table shown in FIG. 10. The user information management table has the fields "ID", "user identifier", "name", and "user attribute value". The "user attribute value" has "age" and "sex". "ID" is information that identifies a record.

The storage unit 11 also holds the teacher data management table shown in FIG. 11. The teacher data management table has the fields "ID", "user attribute value condition", "teacher data identifier", and "teacher data". The "user attribute value condition" is a condition for selecting teacher data and is a condition on user attribute values. Here, the "user attribute value condition" is a condition corresponding to the user's age group and sex.
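The lookup of steps S901 and S902 against these two tables might look like the Python sketch below. Every row shown is a hypothetical placeholder invented for illustration; the actual contents of FIG. 10 and FIG. 11 are not reproduced here, and the names `USERS`, `TEACHER_TABLE`, and `select_teacher_data` are assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical user rows (user identifier -> user attribute values).
USERS: Dict[str, dict] = {
    "1001": {"age": 12, "sex": "male"},
    "1002": {"age": 34, "sex": "female"},
}

# Hypothetical teacher data rows: (user attribute value condition, teacher data identifier).
TEACHER_TABLE: List[Tuple[Callable[[dict], bool], str]] = [
    (lambda u: u["age"] < 18 and u["sex"] == "male", "T-child-male"),
    (lambda u: u["age"] < 18 and u["sex"] == "female", "T-child-female"),
    (lambda u: u["age"] >= 18 and u["sex"] == "male", "T-adult-male"),
    (lambda u: u["age"] >= 18 and u["sex"] == "female", "T-adult-female"),
]


def select_teacher_data(user_id: str) -> str:
    """Return the teacher data identifier whose condition matches the user."""
    attributes = USERS[user_id]                  # S901: user attribute values
    for condition, teacher_id in TEACHER_TABLE:  # S902: first matching condition
        if condition(attributes):
            return teacher_id
    raise LookupError(f"no teacher data matches user {user_id}")
```

Storing the conditions as predicates keeps the selection rule next to the teacher data it selects, which mirrors the "user attribute value condition" column of the table.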

Assume also that version 1 of the pronunciation rating client program is installed on the terminal device 4 of user A, "Taro Yamada", and that version 2 of the pronunciation rating client program is installed on the terminal device 4 of user B, "Hanako Ota". In version 1 of the pronunciation rating client program, the first partial processing digitizes the received voice and acquires voice data. In version 2 of the pronunciation rating client program, the first partial processing digitizes the received voice, acquires voice data, and acquires feature data from that voice data.

Assume further that the terminal device 4 of user A and the terminal device 4 of user B each store the pronunciation practice list shown in FIG. 12. The pronunciation practice list is a set of utterance content identifiers that identify utterance content data.

In this situation, assume that user A inputs an operation instruction to his or her terminal device 4. The terminal reception unit 42 then receives the operation instruction from user A.

Next, the terminal output unit 46 of the terminal device 4 of user A acquires the utterance content identifier "53" of the first question from the pronunciation practice list. Using the utterance content identifier "53", the terminal output unit 46 then acquires the utterance content data corresponding to that identifier from an external device.

Next, the terminal output unit 46 displays the acquired utterance content data. Assume that user A looks at the displayed utterance content data and pronounces it accordingly.

The terminal reception unit 42 then receives the voice from the user. Next, the terminal first partial processing unit 43 digitizes the received voice and acquires voice data, which is the voice-related data. Next, the terminal middleware unit 44 executes the first process of the terminal middleware on the voice data. Here, assume that the terminal middleware unit 44 adds the user identifier "1001" and the utterance content identifier "53" to the voice data, encrypts and compresses the result, converts it into a data structure suited to the HTTP communication protocol, and thereby obtains the voice-related data and the like to be transmitted.
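The packaging performed by the terminal middleware in this example can be sketched in Python as follows, together with the matching unpacking on the server side. This is only an illustration under assumptions: the JSON envelope, the field names, and the function names are hypothetical, and encryption is left as a pluggable callable that defaults to doing nothing, because the embodiment does not specify a cipher. The sketch compresses before encrypting, which is a choice made here since encrypted data generally compresses poorly.

```python
import base64
import json
import zlib
from typing import Callable


def terminal_middleware_first(
    voice_data: bytes,
    user_id: str,
    content_id: str,
    encrypt: Callable[[bytes], bytes] = lambda b: b,  # placeholder: no real cipher
) -> bytes:
    """Attach identifiers, compress, then encrypt; returns an HTTP-ready body."""
    envelope = json.dumps({
        "user_id": user_id,
        "content_id": content_id,
        # Binary voice data is base64-encoded so it can travel inside JSON.
        "voice": base64.b64encode(voice_data).decode("ascii"),
    }).encode("utf-8")
    return encrypt(zlib.compress(envelope))


def server_middleware_first(
    body: bytes,
    decrypt: Callable[[bytes], bytes] = lambda b: b,  # must invert `encrypt`
) -> dict:
    """Reverse the packaging so the second partial processing can use the data."""
    envelope = json.loads(zlib.decompress(decrypt(body)).decode("utf-8"))
    envelope["voice"] = base64.b64decode(envelope["voice"])
    return envelope
```

The resulting bytes could be sent as the body of an HTTP POST request; the transport itself is omitted from the sketch.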

次に、ユーザＡの端末装置４の端末送受信部４５は、取得された音声関連データ等を発音評定サーバ装置１に送信する。 Next, the terminal transmission / reception unit 45 of the terminal device 4 of the user A transmits the acquired voice related data and the like to the pronunciation rating server device 1.
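The first process of the terminal middleware described above, and its inverse on the server side, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names `pack_request` and `unpack_request`, the length-prefixed JSON header, and the XOR routine are hypothetical and do not appear in the specification. The XOR routine is a stand-in for whatever cipher an implementation would actually use and provides no real security. The order (encrypt, then compress) follows the wording of the text.

```python
import json
import zlib


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder symmetric cipher (assumption); NOT secure, for illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def pack_request(voice: bytes, user_id: str, utterance_id: str, key: bytes) -> bytes:
    """Attach the identifiers to the voice data, encrypt, then compress."""
    header = json.dumps({"user_id": user_id, "utterance_id": utterance_id}).encode("utf-8")
    # 4-byte big-endian header length, then the header, then the raw payload.
    framed = len(header).to_bytes(4, "big") + header + voice
    return zlib.compress(xor_cipher(framed, key))


def unpack_request(body: bytes, key: bytes):
    """Server-side inverse: decompress, decrypt, then split header and payload."""
    framed = xor_cipher(zlib.decompress(body), key)
    header_len = int.from_bytes(framed[:4], "big")
    header = json.loads(framed[4:4 + header_len].decode("utf-8"))
    voice = framed[4 + header_len:]
    return voice, header["user_id"], header["utterance_id"]
```

The bytes returned by `pack_request` would be carried as the body of an HTTP request. The HTTP transport itself is omitted here.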

そして、ユーザＡから少し遅れて、ユーザＢは、自分の端末装置４に対して、動作指示を入力した、とする。次に、ユーザＢの端末装置４の端末受付部４２は、ユーザＢからの動作指示を受け付ける。 Then, it is assumed that the user B inputs an operation instruction to his / her terminal device 4 with a slight delay from the user A. Next, the terminal reception unit 42 of the terminal device 4 of the user B receives an operation instruction from the user B.

次に、端末出力部４６は、１番目の設問の発話内容識別子「５３」を発音練習のリストから取得する。そして、端末出力部４６は、発話内容識別子「５３」を用いて、当該発話内容識別子に対応する発話内容データを端末格納部４１から取得する。なお、ユーザＢの端末装置４の端末格納部４１には、発話内容識別子に対応付けて発話内容データが格納されている、とする。 Next, the terminal output unit 46 acquires the utterance content identifier “53” of the first question from the pronunciation practice list. Then, the terminal output unit 46 acquires the utterance content data corresponding to the utterance content identifier from the terminal storage unit 41 using the utterance content identifier “53”. It is assumed that utterance content data is stored in the terminal storage unit 41 of the terminal device 4 of the user B in association with the utterance content identifier.

次に、ユーザＢの端末装置４の端末出力部４６は、取得した発話内容データを表示する。そして、ユーザＢは、表示された発話内容データを見て、発話内容データに合うように発音した、とする。 Next, the terminal output unit 46 of the terminal device 4 of the user B displays the acquired utterance content data. Then, it is assumed that the user B looks at the displayed utterance content data and pronounces it so as to match the utterance content data.

すると、ユーザＢの端末装置４の端末受付部４２は、ユーザから音声を受け付ける。次に、端末第一部分処理部４３は、受け付けられた音声をデジタル化し、音声関連データである音声データを取得する。そして、端末第一部分処理部４３は、取得した音声データから特徴データを取得する。 Then, the terminal reception unit 42 of the terminal device 4 of the user B receives the voice from the user. Next, the terminal first partial processing unit 43 digitizes the received voice and acquires voice data, which is voice-related data. The terminal first partial processing unit 43 then acquires feature data from the acquired voice data.

次に、端末ミドルウェア部４４は、特徴データに対して端末ミドルウェアの第一処理を実行する。ここでは、ユーザＢの端末装置４の端末ミドルウェア部４４は、音声データにユーザ識別子「１００２」と発話内容識別子「５３」とを付加し、暗号化、および圧縮し、ＨＴＴＰの通信プロトコルに合わせたデータ構造に変更し、送信する音声関連データ等を取得した、とする。 Next, the terminal middleware unit 44 executes the first process of the terminal middleware on the feature data. Here, it is assumed that the terminal middleware unit 44 of the terminal device 4 of the user B adds the user identifier “1002” and the utterance content identifier “53” to the voice data, encrypts and compresses the result, converts it into a data structure conforming to the HTTP communication protocol, and thereby acquires the voice-related data and the like to be transmitted.

次に、ユーザＢの端末装置４の端末送受信部４５は、取得された音声関連データ等を発音評定サーバ装置１に送信する。 Next, the terminal transmission / reception unit 45 of the terminal device 4 of the user B transmits the acquired voice related data and the like to the pronunciation rating server device 1.

次に、発音評定サーバ装置１の受信部１２は、音声関連データ等をユーザＡの端末装置４から受信する。 Next, the receiving unit 12 of the pronunciation rating server device 1 receives voice-related data and the like from the terminal device 4 of the user A.

そして、ミドルウェア部２３は、プロセスをフォークし、当該プロセス（プロセス１とする）内でミドルウェアの第一処理を行う。また、評定結果取得部２４は、プロセス１内で第二部分処理を行う。 Then, the middleware unit 23 forks the process and performs the first middleware processing within the process (referred to as process 1). The evaluation result acquisition unit 24 performs the second partial process in the process 1.

また、発音評定サーバ装置１の受信部１２は、音声関連データ等をユーザＢの端末装置４から受信する。 In addition, the receiving unit 12 of the pronunciation rating server device 1 receives voice related data and the like from the terminal device 4 of the user B.

そして、ミドルウェア部２３は、プロセスをフォークし、当該プロセス（プロセス２とする）内でミドルウェアの第一処理を行う。また、評定結果取得部２４は、プロセス２内で第二部分処理を行う。 Then, the middleware unit 23 forks the process and performs the first middleware processing within the process (referred to as process 2). The evaluation result acquisition unit 24 performs the second partial process in the process 2.

以下、プロセス１、プロセス２の詳細について説明する。 Hereinafter, details of process 1 and process 2 will be described.
（プロセス１） (Process 1)

ミドルウェア部２３は、ミドルウェアの第一処理を行い、評定結果取得部２４が第二部分処理を行えるデータを取得する。つまり、ミドルウェア部２３は、受信された音声関連データ等を解凍し、復号化し、第二部分処理を行えるデータを取得する。なお、第二部分処理を行えるデータは、ここでは、音声データ、ユーザ識別子「１００１」、および発話内容識別子「５３」を含む。 The middleware unit 23 performs the first middleware process and acquires data on which the rating result acquisition unit 24 can perform the second partial process. That is, the middleware unit 23 decompresses and decrypts the received voice-related data and the like, and acquires data on which the second partial process can be performed. Here, the data on which the second partial process can be performed includes the voice data, the user identifier “1001”, and the utterance content identifier “53”.

次に、評定結果取得部２４は、取得されたデータに対して、以下の第二部分処理を行い、評定結果を取得する。つまり、評定結果取得部２４は、ユーザ識別子「１００１」と対になる１以上のユーザ属性値「年齢：３５」「性別：男」を、図１０のユーザ情報管理表から取得する。次に、評定結果取得部２４は、ユーザ属性値「年齢：３５」「性別：男」に対応する教師データ識別子「教師男＿１．ｄａｔ」で識別される教師データを図１１の教師データ管理表から取得する。 Next, the rating result acquisition unit 24 performs the following second partial process on the acquired data and acquires the rating result. That is, the rating result acquisition unit 24 acquires the one or more user attribute values “age: 35” and “gender: male” paired with the user identifier “1001” from the user information management table of FIG. 10. Next, the rating result acquisition unit 24 acquires, from the teacher data management table of FIG. 11, the teacher data identified by the teacher data identifier “teacher male_1.dat” corresponding to the user attribute values “age: 35” and “gender: male”.
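The two table lookups described above can be sketched as follows. The contents of the user information management table (FIG. 10) and the teacher data management table (FIG. 11) are not reproduced in this passage, so the table layouts below are assumptions: only the attribute values of users “1001” and “1002” and the two teacher data identifiers come from the text, while the age ranges, the key names, and the function name `select_teacher_id` are hypothetical.

```python
# User information management table (attribute values taken from the text).
USER_INFO = {
    "1001": {"age": 35, "gender": "male"},
    "1002": {"age": 23, "gender": "female"},
}

# Teacher data management table. The age ranges are hypothetical;
# the specification only states which identifier each example user maps to.
TEACHER_TABLE = [
    {"gender": "male", "age_range": (18, 64), "teacher_id": "teacher_male_1.dat"},
    {"gender": "female", "age_range": (18, 64), "teacher_id": "teacher_female_n.dat"},
]


def select_teacher_id(user_id):
    """Return the teacher data identifier matching the user's attribute values."""
    attrs = USER_INFO[user_id]
    for row in TEACHER_TABLE:
        low, high = row["age_range"]
        if row["gender"] == attrs["gender"] and low <= attrs["age"] <= high:
            return row["teacher_id"]
    return None  # no matching teacher data registered
```

Keeping the mapping in a table rather than in code is one way to let teacher data be added per attribute group without changing the rating logic.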

次に、評定結果取得部２４は、発話内容識別子「５３」に対応する発話内容データを取得する。なお、評定結果取得部２４は、発話内容識別子「５３」に対応する発話内容データを格納部１１から取得しても良いし、外部の装置から取得しても良い。 Next, the evaluation result acquisition unit 24 acquires utterance content data corresponding to the utterance content identifier “53”. The rating result acquisition unit 24 may acquire the utterance content data corresponding to the utterance content identifier “53” from the storage unit 11 or may be acquired from an external device.

次に、評定結果取得部２４は、受信された音声関連データのデータ内容から、その種類を決定し、種類識別子「１」を取得する。なお、種類識別子「１」は「音声データ」、種類識別子「２」は「特徴データ」である、とする。 Next, the rating result acquisition unit 24 determines the type from the data content of the received voice-related data, and acquires the type identifier “1”. The type identifier “1” is “voice data”, and the type identifier “2” is “feature data”.

次に、評定結果取得部２４は、教師データ識別子「教師男＿１．ｄａｔ」で識別される教師データと発話内容データとを用いて、音声関連データに対して、音声関連データの種類識別子「１」に応じた第二部分処理を行う。つまり、評定結果取得部２４は、音声データから特徴データを抽出する。そして、評定結果取得部２４は、抽出した特徴データと発話内容データとを教師データ「教師男＿１．ｄａｔ」に適用し、評定値を算出する。なお、ここでは、評定値は評定結果である、とする。 Next, using the teacher data identified by the teacher data identifier “teacher male_1.dat” and the utterance content data, the rating result acquisition unit 24 performs on the voice-related data the second partial process corresponding to the type identifier “1” of the voice-related data. That is, the rating result acquisition unit 24 extracts feature data from the voice data. The rating result acquisition unit 24 then applies the extracted feature data and the utterance content data to the teacher data “teacher male_1.dat” and calculates a rating value. Here, it is assumed that the rating value is the rating result.

次に、ミドルウェア部２３は、取得された評定結果に対して、ミドルウェアの第二処理を行い、送信部１５が送信できるデータを構成する。ここでは、ミドルウェア部２３は、評定結果を圧縮し、ＨＴＴＰの通信プロトコルで送信できるデータ構造に変換する、とする。 Next, the middleware unit 23 performs the second middleware process on the acquired rating result and constructs data that the transmission unit 15 can transmit. Here, it is assumed that the middleware unit 23 compresses the rating result and converts it into a data structure that can be transmitted using the HTTP communication protocol.

次に、送信部１５は、構成された評定結果を含むデータを、ユーザＡの端末装置４に送信する。 Next, the transmission unit 15 transmits data including the configured evaluation result to the terminal device 4 of the user A.

次に、蓄積部１６は、音声関連データ、発話内容データまたは発話内容識別子、および評定結果を対応付けた発音評定情報を構成する。 Next, the storage unit 16 configures pronunciation rating information in which voice related data, utterance content data or utterance content identifier, and a rating result are associated with each other.

次に、蓄積部１６は、ユーザ識別子「１００１」に対応付けて、構成した発音評定情報を格納部１１に蓄積する。 Next, the storage unit 16 stores the configured pronunciation rating information in the storage unit 11 in association with the user identifier “1001”.

次に、ユーザＡの端末装置４の端末送受信部４５は、評定結果を含むデータを受信する。そして、端末ミドルウェア部４４は、受信されたデータに対して、端末ミドルウェアの第二処理を行い、出力できるデータを得る。そして、端末出力部４６は、端末ミドルウェア部４４が変換して得た評定結果を含むデータを出力する。 Next, the terminal transmission / reception unit 45 of the terminal device 4 of the user A receives the data including the rating result. The terminal middleware unit 44 then performs the second process of the terminal middleware on the received data and obtains data that can be output. The terminal output unit 46 then outputs the data including the rating result obtained through the conversion by the terminal middleware unit 44.

そして、以後、ユーザＡの端末装置４において、同様に、発話内容識別子「１０８」「１０９」等で識別される発話内容データに対応する処理が行われる。 Thereafter, in the terminal device 4 of the user A, processing corresponding to the utterance content data identified by the utterance content identifiers “108”, “109”, and so on is performed in the same manner.
（プロセス２） (Process 2)

ミドルウェア部２３は、ミドルウェアの第一処理を行い、評定結果取得部２４が第二部分処理を行えるデータを取得する。つまり、ミドルウェア部２３は、受信された音声関連データ等を解凍し、復号化し、第二部分処理を行えるデータを取得する。なお、第二部分処理を行えるデータは、ここでは、音声データ、ユーザ識別子「１００２」、および発話内容識別子「５３」を含む。 The middleware unit 23 performs the first middleware process and acquires data on which the rating result acquisition unit 24 can perform the second partial process. That is, the middleware unit 23 decompresses and decrypts the received voice-related data and the like, and acquires data on which the second partial process can be performed. Here, the data on which the second partial process can be performed includes the voice data, the user identifier “1002”, and the utterance content identifier “53”.

次に、評定結果取得部２４は、取得されたデータに対して、以下の第二部分処理を行い、評定結果を取得する。つまり、評定結果取得部２４は、ユーザ識別子「１００２」と対になる１以上のユーザ属性値「年齢：２３」「性別：女」を、ユーザ情報管理表から取得する。次に、評定結果取得部２４は、ユーザ属性値「年齢：２３」「性別：女」に対応する教師データ識別子「教師女＿ｎ．ｄａｔ」で識別される教師データを教師データ管理表から取得する。 Next, the rating result acquisition unit 24 performs the following second partial process on the acquired data and acquires the rating result. That is, the rating result acquisition unit 24 acquires the one or more user attribute values “age: 23” and “gender: female” paired with the user identifier “1002” from the user information management table. Next, the rating result acquisition unit 24 acquires, from the teacher data management table, the teacher data identified by the teacher data identifier “teacher female_n.dat” corresponding to the user attribute values “age: 23” and “gender: female”.

次に、評定結果取得部２４は、受信された音声関連データのデータ内容から、その種類を決定し、種類識別子「２」を取得する。 Next, the rating result acquisition unit 24 determines the type from the data content of the received voice-related data, and acquires the type identifier “2”.

次に、評定結果取得部２４は、教師データ識別子「教師女＿ｎ．ｄａｔ」で識別される教師データと発話内容データとを用いて、音声関連データに対して、音声関連データの種類識別子「２」に応じた第二部分処理を行う。つまり、評定結果取得部２４は、特徴データと発話内容データとを教師データ「教師女＿ｎ．ｄａｔ」に適用し、評定値を算出する。なお、ここでは、評定値は評定結果である、とする。 Next, using the teacher data identified by the teacher data identifier “teacher female_n.dat” and the utterance content data, the rating result acquisition unit 24 performs on the voice-related data the second partial process corresponding to the type identifier “2” of the voice-related data. That is, the rating result acquisition unit 24 applies the feature data and the utterance content data to the teacher data “teacher female_n.dat” and calculates a rating value. Here, it is assumed that the rating value is the rating result.
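The difference between the second partial process for type identifier “1” (voice data) and type identifier “2” (feature data) can be sketched as a simple dispatch. This is a minimal sketch, not the rating algorithm itself: `extract_features` and `score` are hypothetical callables supplied by the caller, and `score` is assumed to already hold the selected teacher data and utterance content data.

```python
VOICE_DATA = 1    # type identifier "1": the terminal sent digitized voice data
FEATURE_DATA = 2  # type identifier "2": the terminal already extracted feature data


def second_partial_process(type_id, payload, extract_features, score):
    """Run the second partial process appropriate to the type of voice-related data."""
    if type_id == VOICE_DATA:
        # Feature extraction was not done on the terminal, so do it here.
        features = extract_features(payload)
    elif type_id == FEATURE_DATA:
        # The terminal's first partial process already produced the features.
        features = payload
    else:
        raise ValueError(f"unknown type identifier: {type_id}")
    return score(features)
```

With this shape, a terminal with little processing power can send raw voice data while a more capable terminal sends features, and both reach the same scoring step on the server.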

次に、送信部１５は、構成された評定結果を含むデータを、ユーザＢの端末装置４に送信する。 Next, the transmission unit 15 transmits data including the configured evaluation result to the terminal device 4 of the user B.

次に、蓄積部１６は、ユーザ識別子「１００２」に対応付けて、構成した発音評定情報を格納部１１に蓄積する。 Next, the storage unit 16 stores the configured pronunciation rating information in the storage unit 11 in association with the user identifier “1002”.

次に、ユーザＢの端末装置４の端末送受信部４５は、評定結果を含むデータを受信する。そして、端末ミドルウェア部４４は、受信されたデータに対して、端末ミドルウェアの第二処理を行い、出力できるデータを得る。そして、端末出力部４６は、端末ミドルウェア部４４が変換して得た評定結果を含むデータを出力する。 Next, the terminal transmission / reception unit 45 of the terminal device 4 of the user B receives the data including the rating result. The terminal middleware unit 44 then performs the second process of the terminal middleware on the received data and obtains data that can be output. The terminal output unit 46 then outputs the data including the rating result obtained through the conversion by the terminal middleware unit 44.

そして、以後、ユーザＢの端末装置４において、同様に、発話内容識別子「１０８」「１０９」等で識別される発話内容データに対応する処理が行われる。 Thereafter, in the terminal device 4 of the user B, similarly, processing corresponding to the utterance content data identified by the utterance content identifiers “108” and “109” is performed.

以上、本実施の形態によれば、マルチクライアントの発音評定システムにおいて、発音評定のコアを形成するアルゴリズムを実現したプログラムを端末装置４にインストールせずに、発音評定サービスをユーザに提供できるため、発音評定のコアを形成するアルゴリズムを外部に知られるリスクを低減できる。または、本発明による発音評定サーバ装置２によれば、処理能力の低い端末装置４に頼ることなく、高速な発音評定処理を行える。 As described above, according to the present embodiment, a multi-client pronunciation rating system can provide a pronunciation rating service to users without installing, on the terminal device 4, a program that implements the algorithm forming the core of pronunciation rating. This reduces the risk that the algorithm forming the core of pronunciation rating becomes known to outside parties. Alternatively, the pronunciation rating server device 2 according to the present invention can perform high-speed pronunciation rating processing without relying on a terminal device 4 with low processing capability.

また、本実施の形態によれば、異なる端末装置４の仕様に対応しつつ、複数の端末装置４に対して、発音評定サービスを提供できる。 In addition, according to the present embodiment, it is possible to provide a pronunciation rating service to a plurality of terminal devices 4 while corresponding to specifications of different terminal devices 4.

また、本実施の形態によれば、ユーザ属性値に応じた、精度の高い発音評定を行える。 Further, according to the present embodiment, it is possible to perform pronunciation evaluation with high accuracy according to the user attribute value.

なお、本実施の形態の発音評定サーバ装置１は、ミドルウェア部２３が動作する装置と、評定結果取得部２４が動作する装置とが分離されていることは好適である。かかる場合の発音評定システムＢのブロック図は、図１３である。発音評定システムＢは、ミドルウェア装置Ｂ１、発音評定装置Ｂ２、および端末装置４を具備する。ミドルウェア装置Ｂ１は、例えば、受信部１２、ミドルウェア部２３、および送信部１５を具備する。発音評定装置Ｂ２は、例えば、格納部１１、評定結果取得部２４、蓄積部１６、および処理部１７を具備する。そして、ミドルウェア装置Ｂ１と発音評定装置Ｂ２とは、通常、ＬＡＮ等で接続されており、発音評定装置Ｂ２は外部からアクセスできない状況である。 In the pronunciation rating server device 1 of the present embodiment, it is preferable that the device on which the middleware unit 23 operates and the device on which the rating result acquisition unit 24 operates are separate. FIG. 13 is a block diagram of the pronunciation rating system B in such a case. The pronunciation rating system B includes a middleware device B1, a pronunciation rating device B2, and a terminal device 4. The middleware device B1 includes, for example, a receiving unit 12, a middleware unit 23, and a transmission unit 15. The pronunciation rating device B2 includes, for example, a storage unit 11, a rating result acquisition unit 24, a storage unit 16, and a processing unit 17. The middleware device B1 and the pronunciation rating device B2 are normally connected via a LAN or the like, and the pronunciation rating device B2 is not accessible from outside.

なお、図１３の発音評定システムＢは、ミドルウェア部２３と評定結果取得部２４との分離態様の一例である。ミドルウェア部２３と評定結果取得部２４とが異なる装置に存在すれば良い。つまり、ミドルウェア装置Ｂ１は、受信部１２とミドルウェア部２３のみを具備し、発音評定装置Ｂ２は、格納部１１、評定結果取得部２４、送信部１５、蓄積部１６、および処理部１７を具備しても良い。また、ミドルウェア装置Ｂ１は、格納部１１、受信部１２、ミドルウェア部２３、送信部１５、蓄積部１６、処理部１７を具備し、発音評定装置Ｂ２は、評定結果取得部２４のみを具備しても良い。 Note that the pronunciation rating system B in FIG. 13 is one example of how the middleware unit 23 and the rating result acquisition unit 24 can be separated. It suffices that the middleware unit 23 and the rating result acquisition unit 24 reside in different devices. That is, the middleware device B1 may include only the receiving unit 12 and the middleware unit 23, and the pronunciation rating device B2 may include the storage unit 11, the rating result acquisition unit 24, the transmission unit 15, the storage unit 16, and the processing unit 17. Alternatively, the middleware device B1 may include the storage unit 11, the receiving unit 12, the middleware unit 23, the transmission unit 15, the storage unit 16, and the processing unit 17, and the pronunciation rating device B2 may include only the rating result acquisition unit 24.

さらに、本実施の形態における発音評定サーバ装置２を実現するソフトウェアは、以下のようなプログラムである。つまり、このプログラムは、コンピュータを指示された発話内容に対してユーザが発声した音声を受け付け、発音評定のための第一の部分処理である第一部分処理を前記音声に対して行った結果である音声関連データを端末装置から受信する受信部と、前記第一部分処理の後に行うべき発音評定のための第二の部分処理である第二部分処理を前記音声関連データに対して行い、評定結果を取得する評定結果取得部と、前記評定結果を前記端末装置に送信する送信部として機能させるためのプログラムである。 Furthermore, the software that implements the pronunciation rating server device 2 in the present embodiment is the following program. That is, this program causes a computer to function as: a receiving unit that receives, from a terminal device, voice-related data that is the result of accepting a voice uttered by a user for designated utterance content and performing on the voice a first partial process, which is a first part of the processing for pronunciation rating; a rating result acquisition unit that performs on the voice-related data a second partial process, which is a second part of the processing for pronunciation rating to be performed after the first partial process, and acquires a rating result; and a transmission unit that transmits the rating result to the terminal device.

上記プログラムにおいて、前記受信部は、２以上の端末装置から音声関連データを受信し、前記評定結果取得部は、前記受信部が受信した２以上の各音声関連データに対して、並行して第二部分処理を行い、２以上の評定結果を取得するものとして、コンピュータをさらに機能させるプログラムであることは好適である。 In the above program, it is preferable that the program further causes the computer to function such that the receiving unit receives voice-related data from two or more terminal devices, and the rating result acquisition unit performs the second partial process in parallel on each of the two or more pieces of voice-related data received by the receiving unit and acquires two or more rating results.

上記プログラムにおいて、前記受信部は、２以上の各端末装置が異なる第一部分処理を行った結果である、異なる種類の音声関連データを、２以上の端末装置から受信し、前記評定結果取得部は、前記異なる種類の各音声関連データに対して、異なる第二部分処理を行い、評定結果を取得するものとして、コンピュータをさらに機能させるプログラムであることは好適である。 In the above program, the receiving unit receives different types of audio-related data from two or more terminal devices, which is a result of different first partial processes performed by two or more terminal devices, and the rating result acquisition unit includes: It is preferable that the program further causes the computer to function as performing different second partial processing on each of the different types of audio-related data and obtaining the evaluation result.

コンピュータがアクセス可能な記憶媒体は、１以上の音素毎の音響モデルである２以上の教師データを格納している格納部を具備し、上記プログラムにおいて、前記評定結果取得部は、前記ユーザの属性値であるユーザ属性値に応じて、異なる教師データを用いて、前記第一部分処理の後に行うべき発音評定のための第二の部分処理である第二部分処理を前記音声関連データに対して行い、評定結果を取得するものとして、コンピュータをさらに機能させるプログラムであることは好適である。 A storage medium accessible by the computer includes a storage unit that stores two or more pieces of teacher data, each of which is an acoustic model for each of one or more phonemes. In the above program, it is preferable that the program further causes the computer to function such that the rating result acquisition unit, using different teacher data according to a user attribute value that is an attribute value of the user, performs on the voice-related data the second partial process, which is a second part of the processing for pronunciation rating to be performed after the first partial process, and acquires a rating result.

（実施の形態３） (Embodiment 3)
本実施の形態において、実施の形態２との相違点は、発音評定システムが複数の発音評定サーバ装置を具備する点である。つまり、本実施の形態において、マルチクライアント、マルチサーバである。 The present embodiment differs from the second embodiment in that the pronunciation rating system includes a plurality of pronunciation rating server devices. That is, the present embodiment is multi-client and multi-server.

本実施の形態において、複数の発音評定サーバ装置の負荷が分散されるような調整機能を有する発音評定システムについて説明する。 In this embodiment, a pronunciation rating system having an adjustment function that distributes the load of a plurality of pronunciation rating server devices will be described.

図１４は、本実施の形態における発音評定システムＣの概念図である。発音評定システムＣは、２以上の発音評定サーバ装置３、２以上の端末装置４を備える。 FIG. 14 is a conceptual diagram of the pronunciation rating system C in the present embodiment. The pronunciation rating system C includes two or more pronunciation rating server devices 3 and two or more terminal devices 4.

なお、２以上の発音評定サーバ装置３のうち、一の発音評定サーバ装置３のみがミドルウェア処理を行えることは好適である。かかる場合、ミドルウェア処理を行える発音評定サーバ装置は３Ａ、ミドルウェア処理を行えない発音評定サーバ装置は３Ｂとする。かかる場合、発音評定サーバ装置３Ｂは、インターネット等のネットワークに接続されておらず、外部の装置からアクセスできないことは好適である。 Of the two or more pronunciation rating server devices 3, it is preferable that only one pronunciation rating server device 3 can perform middleware processing. In this case, it is assumed that the pronunciation rating server apparatus that can perform middleware processing is 3A, and the pronunciation rating server apparatus that cannot perform middleware processing is 3B. In such a case, it is preferable that the pronunciation rating server device 3B is not connected to a network such as the Internet and cannot be accessed from an external device.

図１５は、本実施の形態における発音評定システムＣのブロック図である。２以上の発音評定サーバ装置３のうち、一の発音評定サーバ装置３のみがミドルウェア部３３を備えることは好適である。また、ミドルウェア部３３を備える発音評定サーバ装置３Ａは、インターネット等の通信網により、２以上の端末装置４と通信可能であり、他の発音評定サーバ装置３Ｂは、２以上の端末装置４と、直接的に通信可能でないことは好適である。また、ミドルウェア部３３を備える一の発音評定サーバ装置３Ａと、他の一の発音評定サーバ装置３Ｂとは、ＬＡＮ等により、相互に通信可能であることは好適である。 FIG. 15 is a block diagram of the pronunciation rating system C in the present embodiment. Of the two or more pronunciation rating server devices 3, it is preferable that only one pronunciation rating server device 3 includes the middleware unit 33. It is also preferable that the pronunciation rating server device 3A including the middleware unit 33 can communicate with the two or more terminal devices 4 via a communication network such as the Internet, while the other pronunciation rating server device 3B cannot communicate directly with the two or more terminal devices 4. It is further preferable that the pronunciation rating server device 3A including the middleware unit 33 and the other pronunciation rating server device 3B can communicate with each other via a LAN or the like.

発音評定サーバ装置３Ａは、格納部３１、受信部２２、ミドルウェア部３３、評定結果取得部２４、送信部１５、蓄積部１６、および処理部１７を備える。ミドルウェア部３３は、データ変更手段３３１、調整手段３３２、送付手段３３３、および受付手段３３４を備える。 The pronunciation rating server device 3A includes a storage unit 31, a reception unit 22, a middleware unit 33, a rating result acquisition unit 24, a transmission unit 15, a storage unit 16, and a processing unit 17. The middleware unit 33 includes a data changing unit 331, an adjusting unit 332, a sending unit 333, and a receiving unit 334.

発音評定サーバ装置３Ｂは、格納部１１、受信部３２、評定結果取得部２４、送信部３５、蓄積部１６、および処理部１７を備える。なお、発音評定サーバ装置３Ｂは、格納部１１、受信部３２、評定結果取得部２４、および送信部３５のみで構成されても良い。 The pronunciation rating server device 3B includes a storage unit 11, a receiving unit 32, a rating result acquisition unit 24, a transmission unit 35, a storage unit 16, and a processing unit 17. Note that the pronunciation rating server device 3B may include only the storage unit 11, the receiving unit 32, the rating result acquiring unit 24, and the transmitting unit 35.

発音評定サーバ装置３Ａを構成する格納部３１は、各種の情報を格納し得る。各種の情報とは、例えば、１以上の負荷情報である。負荷情報は、装置ごとの負荷に関する情報であり、例えば、発音評定サーバ装置の装置識別子、処理中の発音評定のプロセス数を有する。また、負荷情報は、例えば、発音評定サーバ装置の装置識別子、処理中の発音評定のプロセス数、並行して処理可能なプロセス数を有する。 The storage unit 31 constituting the pronunciation rating server device 3A can store various types of information. The various information is, for example, one or more pieces of load information. The load information is information regarding the load for each device, and includes, for example, the device identifier of the pronunciation rating server device and the number of pronunciation rating processes being processed. The load information includes, for example, the device identifier of the pronunciation rating server device, the number of processes of pronunciation evaluation being processed, and the number of processes that can be processed in parallel.

また、各種の情報とは、例えば、１以上の発音評定情報、１以上のユーザ情報、１以上の発話内容データ、１以上の発話内容識別子等である。 The various information includes, for example, one or more pronunciation rating information, one or more user information, one or more utterance content data, one or more utterance content identifiers, and the like.

ミドルウェア部３３は、データ変更手段３３１、調整手段３３２、送付手段３３３、および受付手段３３４が行う処理を実施する。 The middleware unit 33 performs processing performed by the data changing unit 331, the adjusting unit 332, the sending unit 333, and the receiving unit 334.

データ変更手段３３１は、受信部２２が受信したデータであり、音声関連データを含む受信データを、評定結果取得部２４が第二部分処理を行えるデータに変更する。なお、かかる処理を、ミドルウェアの第一処理という。また、ミドルウェアの第一処理は、例えば、暗号化されたデータの復号化、圧縮されたデータの解凍、受信部２２が受信したデータの構造変換、受信部１２が受信したデータから発音評定に無関係な情報の除去などであり、通常、端末装置４との通信に利用される通信プロトコルに対応する処理を含む。また、通信プロトコルとは、例えば、ＨＴＴＰ、ＴＣＰ／ＩＰ等である。また、ミドルウェア部３３は、通常、発音評定そのものに関する処理は行わない。 The data changing unit 331 changes the received data, which is the data received by the receiving unit 22 and includes the voice-related data, into data on which the rating result acquisition unit 24 can perform the second partial process. This processing is referred to as the first middleware process. The first middleware process is, for example, decryption of encrypted data, decompression of compressed data, structural conversion of the data received by the receiving unit 22, or removal of information irrelevant to pronunciation rating from the data received by the receiving unit 12, and usually includes processing corresponding to the communication protocol used for communication with the terminal device 4. The communication protocol is, for example, HTTP or TCP/IP. The middleware unit 33 normally does not perform processing related to the pronunciation rating itself.

データ変更手段３３１は、評定結果取得部２４が取得した評定結果を、送信部１５が送信できるデータに変更する。なお、かかる処理を、ミドルウェアの第二処理という。また、ミドルウェアの第二処理は、例えば、データの暗号化、データの圧縮、データの構造変換などであり、端末装置４との通信に利用される通信プロトコルに対応する処理である。 The data changing unit 331 changes the evaluation result acquired by the evaluation result acquisition unit 24 to data that can be transmitted by the transmission unit 15. Such processing is referred to as second middleware processing. Further, the second process of the middleware is, for example, data encryption, data compression, data structure conversion, and the like, and is a process corresponding to a communication protocol used for communication with the terminal device 4.

調整手段３３２は、受信部２２が受信した２以上の各音声関連データに対して、第二部分処理を行う発音評定サーバ装置３を決定する。かかる処理は、２以上の発音評定サーバ装置３の負荷を調整するための処理である。 The adjustment unit 332 determines the pronunciation rating server device 3 that performs the second partial processing on each of the two or more audio-related data received by the receiving unit 22. Such a process is a process for adjusting the load of two or more pronunciation rating server devices 3.

調整手段３３２は、格納部１１に格納されている１以上の負荷情報を参照し、第二部分処理を行う発音評定サーバ装置３を決定する。調整手段３３２は、例えば、格納部１１の１以上の負荷情報を参照し、最もプロセス数が少ない装置の装置識別子を取得する。また、調整手段３３２は、例えば、格納部１１の１以上の負荷情報を順に参照し、一つのプロセスでも処理可能な装置の装置識別子を取得する。また、調整手段３３２は、例えば、格納部１１の１以上の負荷情報を順に参照し、「処理中の発音評定のプロセス数／並行して処理可能なプロセス数」の値が最小の装置の装置識別子を取得する。 The adjusting unit 332 refers to the one or more pieces of load information stored in the storage unit 11 and determines the pronunciation rating server device 3 that is to perform the second partial process. For example, the adjusting unit 332 refers to the one or more pieces of load information in the storage unit 11 and acquires the device identifier of the device with the fewest processes. Alternatively, the adjusting unit 332 refers to the one or more pieces of load information in the storage unit 11 in order and acquires the device identifier of a device that can handle at least one more process. Alternatively, the adjusting unit 332 refers to the one or more pieces of load information in the storage unit 11 in order and acquires the device identifier of the device for which the value of “number of pronunciation rating processes being processed / number of processes that can be processed in parallel” is smallest.
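The three selection rules described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `LoadInfo` record, its field names, and the function names are hypothetical, chosen to mirror the load information fields named in the text (device identifier, number of rating processes in progress, number of processes that can run in parallel). The capacity is assumed to be greater than zero.

```python
from dataclasses import dataclass


@dataclass
class LoadInfo:
    device_id: str  # device identifier of a pronunciation rating server device
    active: int     # pronunciation rating processes currently in progress
    capacity: int   # processes the device can run in parallel (assumed > 0)


def fewest_processes(loads):
    """Rule 1: the device with the smallest number of processes in progress."""
    return min(loads, key=lambda info: info.active).device_id


def first_with_capacity(loads):
    """Rule 2: scan in order and take the first device that can accept one more process."""
    for info in loads:
        if info.active < info.capacity:
            return info.device_id
    return None  # every device is fully loaded


def lowest_utilization(loads):
    """Rule 3: the device with the smallest ratio active / capacity."""
    return min(loads, key=lambda info: info.active / info.capacity).device_id
```

The rules can disagree: a device with few running processes may still be at full capacity, which is why the ratio-based rule is useful when the servers differ in size.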

送付手段３３３は、調整手段３３２が決定した発音評定サーバ装置３に対して、音声関連データを送付する。調整手段３３２が決定した発音評定サーバ装置３とは、例えば、調整手段３３２が取得した装置識別子に対応する発音評定サーバ装置３である。 The sending means 333 sends voice related data to the pronunciation rating server device 3 determined by the adjusting means 332. The pronunciation rating server device 3 determined by the adjusting unit 332 is, for example, the pronunciation rating server device 3 corresponding to the device identifier acquired by the adjusting unit 332.

受付手段３３４は、調整手段３３２が決定した発音評定サーバ装置３が取得した評定結果を受け付ける。 The accepting unit 334 accepts the rating result acquired by the pronunciation rating server device 3 determined by the adjusting unit 332.

発音評定サーバ装置３Ｂを構成する受信部３２は、ミドルウェア部３３を備える一の発音評定サーバ装置３Ａから音声関連データ等を受信する。 The receiving unit 32 constituting the pronunciation rating server device 3B receives voice-related data and the like from one pronunciation rating server device 3A including the middleware unit 33.

評定結果取得部３４は、発音評定サーバ装置３Ａのミドルウェア部３３が変更した後の音声関連データに対して第二部分処理を行い、評定結果を取得する。 The rating result acquisition unit 34 performs second partial processing on the voice-related data after the middleware unit 33 of the pronunciation rating server device 3A has changed, and acquires the rating result.

評定結果取得部３４は、受信部３２が受信した２以上の各音声関連データに対して、並行して第二部分処理を行い、２以上の評定結果を取得する。例えば、受信部３２が音声関連データを受信した場合に、評定結果取得部３４は、プロセスをフォークする、スレッドを立ち上げる等の処理により、並行して第二部分処理を行っても良い。かかる処理は、評定結果取得部２４と同様である。 The rating result acquisition unit 34 performs second partial processing on the two or more audio-related data received by the receiving unit 32 in parallel, and acquires two or more rating results. For example, when the reception unit 32 receives voice-related data, the evaluation result acquisition unit 34 may perform the second partial processing in parallel by processing such as forking a process or starting up a thread. This process is the same as that of the evaluation result acquisition unit 24.
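Running the second partial process in parallel for several pieces of voice-related data can be sketched with a thread pool. This is only an illustrative sketch: the text mentions forking a process or starting a thread, and the thread-based variant is shown here. The names `rate_all` and `rate` are hypothetical, and `rate` stands for whatever function performs the second partial process on one piece of voice-related data.

```python
from concurrent.futures import ThreadPoolExecutor


def rate_all(requests, rate, max_workers=4):
    """Apply `rate` to every request concurrently and return results in request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # even if the underlying calls finish out of order.
        return list(pool.map(rate, requests))
```

For CPU-bound scoring in CPython, separate processes (for example `concurrent.futures.ProcessPoolExecutor`) may be the better fit, which corresponds to the fork-based variant mentioned in the text.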

評定結果取得部３４は、異なる種類の各音声関連データに対して、異なる第二部分処理を行い、評定結果を取得しても良い。かかる処理は、評定結果取得部２４と同様である。 The rating result acquisition unit 34 may perform different second partial processing on different types of audio-related data and acquire the rating result. This process is the same as that of the evaluation result acquisition unit 24.

評定結果取得部３４は、ユーザの属性値であるユーザ属性値に応じて、異なる教師データを用いて、第二部分処理を音声関連データに対して行い、評定結果を取得しても良い。かかる処理は、評定結果取得部２４と同様である。 The rating result acquisition unit 34 may perform the second partial process on the voice-related data using different teacher data according to a user attribute value, which is an attribute value of the user, and acquire the rating result. This processing is the same as that of the rating result acquisition unit 24.

送信部３５は、評定結果を発音評定サーバ装置３Ａに送信する。なお、送信部１５は、受付手段３３４が受信した評定結果を端末装置４に送信する。 The transmission unit 35 transmits the rating result to the pronunciation rating server device 3A. The transmission unit 15 transmits the evaluation result received by the receiving unit 334 to the terminal device 4.

データ変更手段３３１、調整手段３３２、評定結果取得部３４は、通常、ＭＰＵやメモリ等から実現され得る。ミドルウェア部３３等の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The data changing unit 331, the adjusting unit 332, and the evaluation result acquisition unit 34 can be usually realized by an MPU, a memory, or the like. The processing procedure of the middleware unit 33 and the like is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).

送信部３５、送付手段３３３は、通常、無線または有線の通信手段で実現されるが、放送手段で実現されても良い。 The transmission unit 35 and the sending unit 333 are usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.

受付手段３３４は、通常、無線または有線の通信手段で実現されるが、放送を受信する手段で実現されても良い。 The accepting unit 334 is usually realized by wireless or wired communication means, but may be realized by means for receiving a broadcast.

Next, the operation of the pronunciation rating system C will be described. First, the operation of the pronunciation rating server device 3A will be described using the flowchart of FIG. 16. In the flowchart of FIG. 16, description of steps identical to those in the flowchart of FIG. 3 is omitted.

(Step S1601) The adjusting means 332 refers to the load information in the storage unit 11 and determines the pronunciation rating server device 3 that is to perform the second partial processing. Various algorithms are possible for this determination.

(Step S1602) The adjusting means 332 updates the load information corresponding to the device determined in step S1601. For example, the adjusting means 332 increments by 1 the number of processes paired with the device identifier acquired in step S1601.

(Step S1603) The sending means 333 determines whether the device determined in step S1601 is its own device (the pronunciation rating server device 3A). If it is, the process goes to step S303; if it is not, the process goes to step S1605.

(Step S1604) The adjusting means 332 updates the load information corresponding to the device that performed the second partial processing. For example, the adjusting means 332 decrements by 1 the number of processes paired with the device identifier of the device that performed the second partial processing.

(Step S1605) The sending means 333 transmits the voice-related data and the like processed by the middleware to the device determined in step S1601.

(Step S1606) The accepting means 334 determines whether the rating result acquired by the pronunciation rating server device 3B determined by the adjusting means 332 has been received from that pronunciation rating server device 3B. If it has been received, the process goes to step S1604; if not, the process returns to step S1606.

In the flowchart of FIG. 16, the processing ends upon power-off or an interrupt for terminating the processing.
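
The control flow of steps S1601 to S1606 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `handle_request`, `choose`, `rate_locally`, and `rate_remotely` are hypothetical names, the load information is reduced to a dictionary from device identifier to current process count, and the decrement of step S1604 is assumed to apply after local processing as well as after a remote result arrives.

```python
from typing import Callable, Dict

OWN_DEVICE = "d01"  # assumed device identifier of the server device 3A


def handle_request(
    data: bytes,
    load: Dict[str, int],
    choose: Callable[[Dict[str, int]], str],
    rate_locally: Callable[[bytes], float],
    rate_remotely: Callable[[str, bytes], float],
) -> float:
    """Dispatch one piece of voice-related data and return its rating result."""
    device = choose(load)      # S1601: decide which device does the rating
    load[device] += 1          # S1602: increment that device's process count
    try:
        if device == OWN_DEVICE:                  # S1603: own device?
            result = rate_locally(data)           # local second partial processing
        else:
            result = rate_remotely(device, data)  # S1605 send, S1606 wait for result
    finally:
        load[device] -= 1      # S1604: decrement once the processing is over
    return result
```

The `try`/`finally` is an added safeguard, not part of the flowchart: it keeps the process count from drifting upward if a rating attempt raises an error.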

Next, the operation of the pronunciation rating server device 3B will be described using the flowchart of FIG. 17. In the flowchart of FIG. 17, description of steps identical to those in the flowchart of FIG. 3 is omitted.

(Step S1701) The receiving unit 32 determines whether voice-related data and the like have been received from the pronunciation rating server device 3A. If they have been received, the process goes to step S1702; if not, the process returns to step S1701. Here, "voice-related data and the like" means, for example, the voice-related data and a user identifier.

(Step S1702) The transmission unit 35 transmits the rating result and the like to the pronunciation rating server device 3A. Here, "the rating result and the like" means, for example, the rating result, a user identifier, and a device identifier.

In the flowchart of FIG. 17, the processing ends upon power-off or an interrupt for terminating the processing.

The division of processing between the pronunciation rating server device 3A and the pronunciation rating server device 3B may follow other schemes. For example, in the flowcharts of FIGS. 16 and 17, the one or more pieces of pronunciation rating information are centrally managed by the pronunciation rating server device 3A. However, they may instead be managed in a distributed manner across the pronunciation rating server device 3A and one or more pronunciation rating server devices 3B. In addition, the pronunciation rating server device 3A need not perform the second partial processing.

A specific operation of the pronunciation rating system C in the present embodiment will now be described. A conceptual diagram of the pronunciation rating system C is shown in FIG. 14.

Suppose now that the pronunciation rating server device 3A includes the storage unit 11, the receiving unit 22, the middleware unit 33, the rating result acquisition unit 24, the transmission unit 15, the accumulation unit 16, and the processing unit 17. Suppose also that the pronunciation rating server device 3B includes only the receiving unit 32, the rating result acquisition unit 24, and the transmission unit 35. Further, suppose that the pronunciation rating system C includes one pronunciation rating server device 3A (the device identified by the device identifier "d01") and two pronunciation rating server devices 3B (the devices identified by the device identifiers "d02" and "d03"), and that a large number of terminal devices 4 receive the pronunciation rating service.

Suppose that the storage unit 11 of the pronunciation rating server device 3A holds the load information management table shown in FIG. 18. The load information management table manages one or more pieces of load information. Each piece of load information has an "ID", a "device identifier", a "current process count", and a "maximum process count". The "ID" is information that identifies the record. The "device identifier" is information that identifies a pronunciation rating server device 3. The "current process count" is the number of pronunciation rating processes currently being executed on that pronunciation rating server device 3. The "maximum process count" is the maximum number of pronunciation rating processes that the pronunciation rating server device 3 can execute in parallel. The load information need not include the "maximum process count".
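
One possible in-memory representation of such a load information record is sketched below. The class and field names are hypothetical, and the maximum process counts of 10 are assumed purely for illustration; the current process counts follow the example described in this embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadInfo:
    """One record of the load information management table (names are illustrative)."""
    record_id: int                       # "ID": identifies the record
    device_id: str                       # "device identifier"
    current_processes: int               # "current process count"
    max_processes: Optional[int] = None  # "maximum process count"; may be absent


# Example table; the maximum process counts (10) are assumed values.
load_table: List[LoadInfo] = [
    LoadInfo(1, "d01", 5, 10),
    LoadInfo(2, "d02", 2, 10),
    LoadInfo(3, "d03", 2, 10),
]
```

Making `max_processes` optional reflects the remark that the load information need not include the maximum process count.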

Suppose also that the storage unit 11 holds the user information management table shown in FIG. 10.

Further, suppose that the terminal device 4 of user A and the terminal device 4 of user B each store the pronunciation practice list (a list for advanced learners) shown in FIG. 12. A pronunciation practice list is a set of utterance content identifiers, each identifying utterance content data.

In this situation, as the load information management table of FIG. 18 shows, five pronunciation rating processes are currently running on the pronunciation rating server device 3A identified by the device identifier "d01". Two pronunciation rating processes are currently running on each of the two pronunciation rating server devices 3B identified by the device identifiers "d02" and "d03".

In this situation, suppose that user A inputs an operation instruction to his or her terminal device 4. The terminal accepting unit 42 then accepts the operation instruction from user A.

The terminal accepting unit 42 then accepts a voice from the user. Next, the terminal first partial processing unit 43 digitizes the accepted voice and acquires voice data, which is the voice-related data. Next, the terminal middleware unit 44 executes the first processing of the terminal middleware on the voice data. Here, suppose that the terminal middleware unit 44 adds the user identifier "1001" and the utterance content identifier "53" to the voice data, applies encryption, compression, and the like, and obtains the voice-related data and the like to be transmitted.
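
The packaging step performed by the terminal middleware can be sketched as follows. This is an illustration under stated assumptions only: the JSON envelope, the field names, and the functions `pack` and `unpack` are hypothetical, and encryption is deliberately left out because the cipher used is not specified here.

```python
import base64
import json
import zlib
from typing import Tuple


def pack(voice_data: bytes, user_id: str, utterance_id: str) -> bytes:
    """Attach the identifiers to the voice data and compress the result.

    Encryption is omitted from this sketch; the envelope layout is assumed.
    """
    envelope = {
        "user_id": user_id,
        "utterance_id": utterance_id,
        "voice": base64.b64encode(voice_data).decode("ascii"),
    }
    return zlib.compress(json.dumps(envelope).encode("utf-8"))


def unpack(payload: bytes) -> Tuple[bytes, str, str]:
    """Reverse of pack(): decompress and split into voice data and identifiers."""
    envelope = json.loads(zlib.decompress(payload).decode("utf-8"))
    voice = base64.b64decode(envelope["voice"])
    return voice, envelope["user_id"], envelope["utterance_id"]
```

For example, `unpack(pack(b"\x00\x01", "1001", "53"))` yields the original voice bytes together with the two identifiers.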

Next, the terminal transmission/reception unit 45 of user A's terminal device 4 transmits the obtained voice-related data and the like to the pronunciation rating server device 3A.

Next, the receiving unit 22 of the pronunciation rating server device 3A receives the voice-related data and the like from user A's terminal device 4.

The data changing means 331 of the middleware unit 33 then converts the received voice-related data and the like into a data structure on which pronunciation rating can be performed.

Next, the adjusting means 332 determines the pronunciation rating server device 3 that is to perform the second partial processing on the converted voice-related data and the like. Specifically, the adjusting means 332 refers to the load information management table shown in FIG. 18 and acquires the device identifier "d02" of the device with the smallest value of "current process count / maximum process count". Here, two devices, "d02" and "d03", share the smallest value, and the first of them, "d02", is acquired.
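
This selection rule can be sketched as follows. The function name `choose_device` and the tuple layout are hypothetical, and the maximum process counts of 10 in the usage note are assumed values; only the rule itself (smallest ratio, first entry on a tie) comes from the description above.

```python
from typing import List, Tuple

# Each row is (device identifier, current process count, maximum process count).
LoadRow = Tuple[str, int, int]


def choose_device(table: List[LoadRow]) -> str:
    """Return the identifier of the device with the smallest current/max ratio.

    Python's min() returns the first minimal element, so when several
    devices tie, the one listed first in the table is chosen.
    """
    if not table:
        raise ValueError("load table is empty")
    best = min(table, key=lambda row: row[1] / row[2])
    return best[0]
```

With the rows `("d01", 5, 10)`, `("d02", 2, 10)`, and `("d03", 2, 10)`, the function returns `"d02"`, matching the example in which the first of the two tied devices is taken.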

Next, the sending means 333 sends the converted voice-related data and the like to the pronunciation rating server device 3B corresponding to the device identifier "d02" acquired by the adjusting means 332.

Next, the adjusting means 332 updates the load information. Specifically, it increments the current process count corresponding to the device identifier "d02" by 1, making it "3".

Next, the receiving unit 32 of the pronunciation rating server device 3B corresponding to the device identifier "d02" receives the voice-related data and the like.

The rating result acquisition unit 34 of the pronunciation rating server device 3B then applies the received voice-related data and the like to the teacher data and the like in the storage unit 11, and acquires a rating result. The method of acquiring the rating result is as described above.

Next, the transmission unit 35 of the pronunciation rating server device 3B transmits the rating result and the like to the pronunciation rating server device 3A.

Next, the accepting means 334 of the pronunciation rating server device 3A receives the rating result and the like.

The adjusting means 332 then updates the load information. Specifically, it decrements the current process count corresponding to the device identifier "d02" by 1, making it "2".

Next, the data changing means 331 performs the second processing of the middleware on the received rating result and the like.

Next, the transmission unit 15 transmits to the terminal device 4 the data that results from the second processing of the middleware performed by the data changing means 331 and that includes the rating result and the like.

Next, the accumulation unit 16 constructs pronunciation rating information and accumulates it in the storage unit 11. The details of this processing are as described above.

Next, the terminal transmission/reception unit 45 of the terminal device 4 receives the data including the rating result. The terminal middleware unit 44 then performs the second processing of the terminal middleware on the received data to obtain data that can be output. The terminal output unit 46 then outputs the data, including the rating result, obtained through the conversion by the terminal middleware unit 44.

The above describes one specific example of the operation of the pronunciation rating system C when a single user uses the pronunciation rating service.

As described above, according to the present embodiment, the load on the pronunciation rating server devices 3 can be distributed.

According to the present embodiment, the risk that the algorithm forming the core of pronunciation rating becomes known to outside parties can also be reduced. Alternatively, the pronunciation rating server device according to the present invention enables high-speed pronunciation rating processing.

It is preferable that the pronunciation rating server device 3A of the present embodiment does not have the rating result acquisition unit 24. That is, it is preferable that the pronunciation rating server device 3A does not perform the second partial processing and instead performs processing such as the middleware processing and the management of pronunciation rating information. In that case, the pronunciation rating server device 3A includes the storage unit 31, the receiving unit 22, the middleware unit 33, the transmission unit 15, the accumulation unit 16, and the processing unit 17.

The software that realizes the pronunciation rating server device 3 in the present embodiment is the following program. That is, this program causes a computer to function as: a receiving unit that receives, from a terminal device, voice-related data that is the result of accepting a voice uttered by a user for an instructed utterance content and performing, on the voice, a first partial processing that is a first part of the processing for pronunciation rating; a rating result acquisition unit that performs, on the voice-related data, a second partial processing that is a second part of the processing for pronunciation rating to be performed after the first partial processing, and acquires a rating result; and a transmission unit that transmits the rating result to the terminal device.

In the above program, it is preferable that the program causes the computer to function such that the rating result acquisition unit performs the second partial processing on voice-related data transmitted from some of two or more terminal devices and acquires a rating result.

In the above program, it is preferable that the program further causes the computer to function as a middleware unit that includes adjusting means for determining, in order to adjust the load on two or more pronunciation rating server devices, the pronunciation rating server device that is to perform the second partial processing on each of the two or more pieces of voice-related data received by the receiving unit.

In the above program, it is preferable that the program causes the computer to function such that the middleware unit further includes sending means for sending the voice-related data to the pronunciation rating server device determined by the adjusting means, and accepting means for accepting the rating result acquired by the pronunciation rating server device determined by the adjusting means, and such that the transmission unit transmits the rating result accepted by the accepting means to the terminal device.

FIG. 19 shows the external appearance of a computer that executes the program described in this specification to realize the pronunciation rating server device and the like of the embodiments described above. The embodiments described above can be realized by computer hardware and a computer program executed on it. FIG. 19 is an overview of this computer system 300, and FIG. 20 is a block diagram of the computer system 300.

In FIG. 19, the computer system 300 includes a computer 301 that includes a CD-ROM drive 3012, a keyboard 302, a mouse 303, a monitor 304, and a microphone 305.

In FIG. 20, the computer 301 includes, in addition to the CD-ROM drive 3012, an MPU 3013, a bus 3014 connected to the CD-ROM drive 3012, a ROM 3015 for storing programs such as a boot-up program, a RAM 3016 that is connected to these and that temporarily stores instructions of application programs and provides temporary storage space, and a hard disk 3017 for storing application programs, system programs, and data. Although not shown here, the computer 301 may further include a network card that provides a connection to a LAN.

The program that causes the computer system 300 to execute the functions of the pronunciation rating server device and the like of the embodiments described above may be stored on a CD-ROM 3101, which is inserted into the CD-ROM drive 3012, and then transferred to the hard disk 3017. Alternatively, the program may be transmitted to the computer 301 via a network (not shown) and stored on the hard disk 3017. The program is loaded into the RAM 3016 at execution time. The program may also be loaded directly from the CD-ROM 3101 or from the network.

The program need not necessarily include an operating system (OS), a third-party program, or the like that causes the computer 301 to execute the functions of the pronunciation rating server device and the like of the embodiments described above. The program only needs to include those instructions that call the appropriate functions (modules) in a controlled manner so that the desired results are obtained. How the computer system 300 operates is well known, and a detailed description is omitted.

In the above program, the step of transmitting information, the step of receiving information, and the like do not include processing performed by hardware, for example, processing performed by a modem or an interface card in the transmitting step (processing that can only be performed by hardware).

The program may be executed by a single computer or by multiple computers. That is, either centralized processing or distributed processing may be performed.

In each of the above embodiments, it goes without saying that two or more communication means present in one device may be physically realized by a single medium.

In each of the above embodiments, each processing may be realized by centralized processing on a single device, or by distributed processing across multiple devices.

The present invention is not limited to the above embodiments. Various modifications are possible, and it goes without saying that they are also included within the scope of the present invention.

As described above, the pronunciation rating server device according to the present invention has the effect of reducing the risk that the pronunciation rating algorithm becomes known to outside parties, and is useful as a pronunciation rating server device and the like.

1, 2, 3 Pronunciation rating server device
A1, B1 Middleware device
A2, B2 Pronunciation rating device
4 Terminal device
11, 31 Storage unit
12, 22, 32 Receiving unit
13, 23, 33 Middleware unit
14, 24, 34 Rating result acquisition unit
15, 35 Transmission unit
16 Accumulation unit
17 Processing unit
41 Terminal storage unit
42 Terminal accepting unit
43 Terminal first partial processing unit
44 Terminal middleware unit
45 Terminal transmission/reception unit
46 Terminal output unit
331 Data changing means
332 Adjusting means
333 Sending means
334 Accepting means

Claims

A pronunciation rating server device comprising:
a receiving unit that receives, from a terminal device, voice-related data that is the result of accepting a voice uttered by a user for an instructed utterance content and performing, on the voice, a first partial processing that is a first part of the processing for pronunciation rating;
a rating result acquisition unit that performs, on the voice-related data, a second partial processing that is a second part of the processing for pronunciation rating to be performed after the first partial processing, and acquires a rating result; and
a transmission unit that transmits the rating result to the terminal device.

The pronunciation rating server device according to claim 1, further comprising:
an accumulation unit that accumulates one or more pieces of pronunciation rating information, each associating the voice-related data, utterance content data indicating the instructed utterance content or an utterance content identifier identifying the utterance content, and the rating result; and
a processing unit that performs processing on the one or more pieces of pronunciation rating information.

The pronunciation rating server device according to claim 2, wherein
the receiving unit also receives an utterance content identifier identifying the utterance content, or utterance content data acquired using the utterance content identifier, and
the accumulation unit accumulates the voice-related data, the utterance content data or the utterance content identifier, and the rating result in association with one another.

The pronunciation rating server device according to any one of claims 1 to 3, further comprising a middleware unit that changes the data received by the receiving unit, which includes the voice-related data, into data on which the rating result acquisition unit can perform the second partial processing, and changes the rating result acquired by the rating result acquisition unit into data that the transmission unit can transmit, wherein
the rating result acquisition unit performs the second partial processing on the voice-related data as changed by the middleware unit and acquires the rating result, and
the transmission unit transmits the rating result as changed by the middleware unit to the terminal device.

The pronunciation rating server device according to claim 4, wherein
the receiving unit receives, from the terminal device, data that includes the voice-related data and that is obtained by a terminal middleware unit of the terminal device changing the voice-related data, which is the result of the first partial processing performed on the voice, into data that can be transmitted.

The pronunciation rating server device according to any one of claims 1 to 3, wherein
the receiving unit receives voice-related data from two or more terminal devices, and
the rating result acquisition unit performs the second partial processing in parallel on each of the two or more pieces of voice-related data received by the receiving unit, and acquires two or more rating results.

The pronunciation rating server device according to claim 6, wherein
the receiving unit receives, from two or more terminal devices, different types of voice-related data that are the results of different first partial processings performed by the two or more terminal devices, and
the rating result acquisition unit performs a different second partial processing on each of the different types of voice-related data, and acquires a rating result.

The pronunciation rating server device according to claim 6, further comprising a storage unit that stores two or more pieces of teacher data, each being an acoustic model of one or more phonemes, wherein
the rating result acquisition unit performs, on the voice-related data, the second partial processing, which is the second part of the processing for pronunciation rating to be performed after the first partial processing, using different teacher data depending on a user attribute value that is an attribute value of the user, and acquires the rating result.

The pronunciation rating server device according to any one of claims 6 to 8, being one of two or more pronunciation rating server devices, wherein
the rating result acquisition unit performs the second partial processing on voice-related data transmitted from some of the two or more terminal devices, and acquires a rating result.

The pronunciation rating server device according to claim 9, further comprising a middleware unit that includes adjusting means for determining, in order to adjust the load on the two or more pronunciation rating server devices, the pronunciation rating server device that is to perform the second partial processing on each of the two or more pieces of voice-related data received by the receiving unit.

The pronunciation rating server device according to claim 10, wherein
the middleware unit further includes:
sending means for sending the voice-related data to the pronunciation rating server device determined by the adjusting means; and
accepting means for accepting the rating result acquired by the pronunciation rating server device determined by the adjusting means, and
the transmission unit transmits the rating result accepted by the accepting means to the terminal device.

The pronunciation rating server device according to any one of claims 1 to 11, wherein
the first partial processing is either a processing of digitizing the voice to obtain voice data, or a processing of digitizing the voice to obtain voice data and obtaining, from the voice data, feature data that is a set of feature vectors, and
the second partial processing is either a processing of obtaining feature data from the voice data, calculating, using the feature data and teacher data, a posterior probability that the feature data corresponds to the phoneme to be rated, calculating a rating value of the voice from the posterior probability, and obtaining a rating result from one or more rating values, or
a processing of calculating, using the feature data and teacher data, a posterior probability that the feature data corresponds to the phoneme to be rated, calculating a rating value of the voice from the posterior probability, and obtaining a rating result from one or more rating values.

A pronunciation rating method realized by a receiving unit, a rating result acquisition unit, and a transmission unit, the method comprising:
a receiving step in which the receiving unit receives, from a terminal device, voice-related data that is the result of accepting a voice uttered by a user for an instructed utterance content and performing, on the voice, a first partial processing that is a first part of the processing for pronunciation rating;
a rating result acquisition step in which the rating result acquisition unit performs, on the voice-related data, a second partial processing that is a second part of the processing for pronunciation rating to be performed after the first partial processing, and acquires a rating result; and
a transmitting step in which the transmission unit transmits the rating result to the terminal device.

A program for causing a computer to function as:
a receiving unit that receives, from a terminal device, voice-related data that is the result of accepting a voice uttered by a user for an instructed utterance content and performing, on the voice, a first partial processing that is a first part of the processing for pronunciation rating;
a rating result acquisition unit that performs, on the voice-related data, a second partial processing that is a second part of the processing for pronunciation rating to be performed after the first partial processing, and acquires a rating result; and
a transmission unit that transmits the rating result to the terminal device.