JP6314884B2 - Reading aloud evaluation device, reading aloud evaluation method, and program

Info

Publication number: JP6314884B2
Application number: JP2015062769A
Authority: JP
Inventors: 伸行浅野
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2015-03-25
Filing date: 2015-03-25
Publication date: 2018-04-25
Anticipated expiration: 2035-03-25
Also published as: JP2016183992A

Description

The present invention relates to the technical field of systems and the like that evaluate the reading aloud of a sentence based on the speech uttered when a speaker reads the sentence aloud.

In recent years, various systems have been proposed to support announcement evaluation, singing evaluation, and the like. For example, Patent Document 1 discloses a technique that scores singing section by section of a song, based on the intonation (pitch), volume, and other features extracted from the singer's singing voice signal, so that singing skill is scored and evaluated correctly. Evaluation items for announcements, on the other hand, include intonation, voice volume, articulation, and speed. In an announcement, what matters is conveying the information that needs to be conveyed: even if part of a sentence cannot be heard, the announcement can be said to serve its purpose as long as the necessary information reaches the listener.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H10-78749 (JP-A-10-78749)

However, conventional systems do not consider contextual importance when, for example, scoring an announcement. When a mistake occurs at one location, they can only produce the same score regardless of whether that location is contextually important. As a result, the displayed score is unconvincing to the person practicing, and obtaining a high score requires practicing the entire announcement evenly, regardless of actual importance.

The present invention has been made in view of the above, and provides a reading aloud evaluation device, a reading aloud evaluation method, and a program capable of performing an evaluation that depends on whether a part is contextually important.

To solve the above problem, the invention according to claim 1 comprises: acquisition means for acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of one or more clauses spoken in one breath; first determination means for determining a weight for each phrase; second determination means for determining, for each phrase, a scoring ratio corresponding to the weight of that phrase determined by the first determination means; and calculation means for adjusting, for each phrase, the score acquired by the acquisition means by the scoring ratio determined by the second determination means, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted per-phrase scores.

The invention according to claim 2 comprises: first acquisition means for acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of one or more clauses spoken in one breath; first determination means for determining a weight for each phrase; second determination means for determining, for each phrase, a scoring ratio corresponding to the weight of that phrase determined by the first determination means; second acquisition means for acquiring a pause score calculated for an interval from the end timing of any one of the plurality of phrases to the start timing of the next phrase; and calculation means for adjusting, for each phrase, the score acquired by the first acquisition means by the scoring ratio determined by the second determination means, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted per-phrase scores and the pause score acquired by the second acquisition means.
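
As a rough illustration of the scoring described in claims 1 and 2, the following Python sketch turns per-phrase weights into scoring ratios and uses them to adjust per-phrase scores. The function names, the 0 to 100 score scale, the normalization of weights by their sum, and the `pause_share` blending of the pause score are all assumptions made for illustration; the claims do not fix any of them.

```python
def scoring_ratios(weights):
    """Normalize per-phrase weights so that the ratios sum to 1.

    Dividing by the sum is an assumed normalization, not taken from the claims.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def total_score(phrase_scores, weights, pause_score=None, pause_share=0.2):
    """Combine per-phrase scores (each assumed to be 0-100) into a total score.

    pause_score and pause_share are hypothetical: claim 2 only states that a
    pause score is taken into account, not how it is combined.
    """
    if len(phrase_scores) != len(weights):
        raise ValueError("one weight per phrase is required")
    ratios = scoring_ratios(weights)
    # Adjust each phrase score by its scoring ratio, then sum the results.
    adjusted = [score * ratio for score, ratio in zip(phrase_scores, ratios)]
    weighted = sum(adjusted)
    if pause_score is None:
        return weighted
    return (1 - pause_share) * weighted + pause_share * pause_score
```

With phrase scores of 80, 50, and 100 and weights of 1, 2, and 1, the ratios are 0.25, 0.5, and 0.25, so the low score on the heavily weighted middle phrase pulls the total down to 70.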

The invention according to claim 3 is the reading aloud evaluation device according to claim 1 or 2, further comprising display control means for dividing the sentence into the plurality of phrases and displaying it on a screen, and for displaying a phrase that is weighted relatively highly among the plurality of phrases in a display mode different from that of the other phrases.

The invention according to claim 4 is the reading aloud evaluation device according to claim 3, wherein the display control means displays on the screen, in association with each phrase, at least one of the score acquired for each phrase and the score adjusted for each phrase.

The invention according to claim 5 is the reading aloud evaluation device according to any one of claims 1 to 4, further comprising: first input means for inputting text data of the sentence; decomposition means for decomposing the sentence indicated by the text data input by the first input means into a plurality of words; and first specifying means for specifying the importance of each decomposed word by referring to reference information that defines word importance, wherein the first determination means determines, for each phrase, the weight of the phrase composed of the words based on the word importance specified by the first specifying means.

The invention according to claim 6 is the reading aloud evaluation device according to claim 5, further comprising: second input means for inputting reference speech waveform data indicating the waveform of speech that serves as the reference for calculating the score for reading the sentence aloud; and second specifying means for specifying, for each phrase, the time length of the phrase as determined from the speech waveform indicated by the reference speech waveform data input by the second input means, wherein the first determination means determines, for each phrase, the weight of the phrase composed of the words based on the word importance specified by the first specifying means and the phrase time length specified by the second specifying means.

The invention according to claim 7 comprises: first acquisition means for acquiring a score calculated for each clause based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of clauses; first determination means for determining a weight for each clause; second determination means for determining, for each clause, a scoring ratio corresponding to the weight of that clause determined by the first determination means; second acquisition means for acquiring a pause score calculated for an interval from the end timing of any one of the plurality of clauses to the start timing of the next clause; and calculation means for adjusting, for each clause, the score acquired by the first acquisition means by the scoring ratio determined by the second determination means, and calculating a total score for the reading aloud of the entire sentence including the plurality of clauses based on the adjusted per-clause scores and the pause score acquired by the second acquisition means.

The invention according to claim 8 is the reading aloud evaluation device according to claim 7, further comprising display control means for dividing the sentence into the plurality of clauses and displaying it on a screen, and for displaying a clause that is weighted relatively highly among the plurality of clauses in a display mode different from that of the other clauses.

The invention according to claim 9 is the reading aloud evaluation device according to claim 8, wherein the display control means displays on the screen, in association with each clause, at least one of the score acquired for each clause and the score adjusted for each clause.

The invention according to claim 10 is the reading aloud evaluation device according to any one of claims 7 to 9, further comprising: first input means for inputting text data of the sentence; decomposition means for decomposing the sentence indicated by the text data input by the first input means into a plurality of words; and first specifying means for specifying the importance of each decomposed word by referring to reference information that defines word importance, wherein the first determination means determines, for each clause, the weight of the clause composed of the words based on the word importance specified by the first specifying means.

The invention according to claim 11 is the reading aloud evaluation device according to claim 10, further comprising: second input means for inputting reference speech waveform data indicating the waveform of speech that serves as the reference for calculating the score for reading the sentence aloud; and second specifying means for specifying, for each clause, the time length of the clause as determined from the speech waveform indicated by the reference speech waveform data input by the second input means, wherein the first determination means determines, for each clause, the weight of the clause composed of the words based on the word importance specified by the first specifying means and the clause time length specified by the second specifying means.

The invention according to claim 12 is a reading aloud evaluation method executed by one or more computers, comprising: an acquisition step of acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of one or more clauses spoken in one breath; a first determination step of determining a weight for each phrase; a second determination step of determining, for each phrase, a scoring ratio corresponding to the weight of that phrase determined in the first determination step; and a calculation step of adjusting, for each phrase, the score acquired in the acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted per-phrase scores.

The invention according to claim 13 is a reading aloud evaluation method executed by one or more computers, comprising: a first acquisition step of acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of one or more clauses spoken in one breath; a first determination step of determining a weight for each phrase; a second determination step of determining, for each phrase, a scoring ratio corresponding to the weight of that phrase determined in the first determination step; a second acquisition step of acquiring a pause score calculated for an interval from the end timing of any one of the plurality of phrases to the start timing of the next phrase; and a calculation step of adjusting, for each phrase, the score acquired in the first acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted per-phrase scores and the pause score acquired in the second acquisition step.

The invention according to claim 14 causes a computer to execute: an acquisition step of acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of one or more clauses spoken in one breath; a first determination step of determining a weight for each phrase; a second determination step of determining, for each phrase, a scoring ratio corresponding to the weight of that phrase determined in the first determination step; and a calculation step of adjusting, for each phrase, the score acquired in the acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted per-phrase scores.

The invention according to claim 15 causes a computer to execute: a first acquisition step of acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of one or more clauses spoken in one breath; a first determination step of determining a weight for each phrase; a second determination step of determining, for each phrase, a scoring ratio corresponding to the weight of that phrase determined in the first determination step; a second acquisition step of acquiring a pause score calculated for an interval from the end timing of any one of the plurality of phrases to the start timing of the next phrase; and a calculation step of adjusting, for each phrase, the score acquired in the first acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted per-phrase scores and the pause score acquired in the second acquisition step.

The invention according to claim 16 is a reading aloud evaluation method executed by one or more computers, comprising: a first acquisition step of acquiring a score calculated for each clause based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of clauses; a first determination step of determining a weight for each clause; a second determination step of determining, for each clause, a scoring ratio corresponding to the weight of that clause determined in the first determination step; a second acquisition step of acquiring a pause score calculated for an interval from the end timing of any one of the plurality of clauses to the start timing of the next clause; and a calculation step of adjusting, for each clause, the score acquired in the first acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of clauses based on the adjusted per-clause scores and the pause score acquired in the second acquisition step.

The invention according to claim 17 causes a computer to execute: a first acquisition step of acquiring a score calculated for each clause based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of clauses; a first determination step of determining a weight for each clause; a second determination step of determining, for each clause, a scoring ratio corresponding to the weight of that clause determined in the first determination step; a second acquisition step of acquiring a pause score calculated for an interval from the end timing of any one of the plurality of clauses to the start timing of the next clause; and a calculation step of adjusting, for each clause, the score acquired in the first acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of clauses based on the adjusted per-clause scores and the pause score acquired in the second acquisition step.

According to the inventions of claims 1, 2, 7, and 12 to 17, an evaluation can be performed according to whether a part is contextually important.

According to the inventions of claims 3 and 8, the user can see at a glance the phrase that has been weighted relatively highly among the plurality of phrases, clauses, or words.

According to the inventions of claims 4 and 9, the user can see at a glance the score for each phrase, clause, or word.

According to the inventions of claims 5 and 10, the importance of a phrase or clause can be reflected in its weight on a word-by-word basis.

According to the inventions of claims 6 and 11, the appropriateness of the weight of a phrase or clause can be improved.

A diagram showing an example of the schematic configuration of the reading aloud evaluation device S according to the present embodiment.
A conceptual diagram showing an example in which the per-phrase scores for a given evaluation item are adjusted by the scoring ratios to calculate the total score for the reading aloud of the entire sentence.
A conceptual diagram showing an example in which, within the weighting engine shown in FIG. 2, the importance of each phrase is determined based on the importance of each of the words composing the phrase.
A diagram showing an example of a screen that displays information indicating the evaluation of the speaker's reading aloud.
A flowchart showing an example of the reading aloud evaluation process of the control unit 3 in Example 1.
(A) is a conceptual diagram showing an example in which the importance of a phrase is determined based on word importance and word length. (B) is a conceptual diagram showing an example in which the weight and scoring ratio of a phrase are determined based on phrase importance and phrase length.
A flowchart showing an example of the reading aloud evaluation process of the control unit 3 in Example 2.
A diagram showing an example in which the per-word scores are adjusted by the scoring ratios to calculate the total score for the reading aloud of the entire sentence.

Embodiments of the present invention are described below with reference to the drawings.

[1. Configuration and functions of the reading aloud evaluation device S]
First, the configuration and functions of the reading aloud evaluation device S according to one embodiment of the present invention are described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the schematic configuration of the reading aloud evaluation device S according to the present embodiment. Examples of the reading aloud evaluation device include a personal computer and a portable information terminal (such as a smartphone). As shown in FIG. 1, the reading aloud evaluation device S includes a communication unit 1, a storage unit 2, a control unit 3, an operation unit 4, an interface (IF) unit 5, and the like, and these components are connected to a bus 6. The operation unit 4 accepts operation instructions from the user and outputs a signal corresponding to the accepted operation to the control unit 3. A microphone M, a display D, and the like are connected to the interface unit 5.

The microphone M picks up the speech uttered when a speaker engaged in language learning, or in vocal and speech training such as announcing or recitation, reads a sentence (text) aloud. The sentence includes a plurality of phrases or a plurality of clauses. A phrase consists of one or more clauses and is a group of clauses generally spoken in one breath. A phrase and a clause each include one or more words, so the sentence also includes one or more words. Words include independent words (parts of speech that can form a clause on their own) such as nouns, verbs, adjectives, adverbs, and conjunctions, and ancillary words (parts of speech that cannot form a clause on their own) such as auxiliary verbs and particles. Examples of sentences to be read aloud include text used in language learning, announcement training, or recitation training, and lyrics used for singing.

The display D displays on its screen, in accordance with display commands from the control unit 3, information (for example, a score) indicating the evaluation of the speaker's reading aloud. The evaluation includes evaluation of voice volume, intonation (also called tone height or pitch), articulation, speed (reading speed), and pauses (pause timing), as well as an overall evaluation. The microphone M and the display D may be integrated with the reading aloud evaluation device S or may be separate from it.

The communication unit 1 connects to a network (not shown) by wire or wirelessly and communicates with a server or the like. The storage unit 2 consists of, for example, a hard disk drive, and stores an OS (operating system), a reading aloud evaluation processing program (an example of the program of the present invention), and the like. The reading aloud evaluation processing program causes the control unit 3, acting as a computer, to execute the reading aloud evaluation process described later. The program may be downloaded as an application from a predetermined server, or may be provided stored on a recording medium such as a CD or DVD. The storage unit 2 also stores the text data of the sentence described above and speech waveform data (hereinafter, "reference speech waveform data") indicating the waveform of the speech that serves as the reference for calculating the score for reading the sentence aloud (for example, speech that serves as a model for reading aloud). The text data includes, for example, the pronunciation timing of each character (for example, the elapsed time from the start of pronunciation), associated character by character or word by word. The reference speech waveform data is stored in a predetermined audio file format.

A word importance database (DB), in which reference information defining the importance of words is registered, is also built in the storage unit 2. The importance of a word is a measure of how important the word is in context when it appears in a sentence. For example, proper nouns and words indicating numerical values can be regarded as important information that an announcement or the like must convey to the listener, so such words are given a high importance (an importance that is relatively high compared with other words). In particular, numerals, which express quantity or order by number, are important information and should preferably be given a high importance unconditionally. Word importance may be expressed in words, such as "small (low)", "medium", and "large (high)", or numerically, such as "1", "2", "3", "4", and "5". This distinguishes, for example, important words from unimportant ones. The reference information may define an importance for every word contained in the sentence, or may define only words of high importance (or only words of low importance).
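
The lookup against the word importance database can be pictured with the following Python sketch. It assumes a numeric 1 to 5 scale, a plain dictionary as the database, a default importance for unlisted words, and a digit test standing in for numeral detection; the table entries and all names are hypothetical and are not taken from the specification.

```python
# Hypothetical reference information: only some words are listed explicitly.
WORD_IMPORTANCE = {
    "Tokyo": 5,      # proper noun, assumed to be registered as highly important
    "departure": 3,
}

HIGH_IMPORTANCE = 5     # top of the assumed 1-5 scale
DEFAULT_IMPORTANCE = 1  # assumed value for words missing from the table


def is_numeral(word):
    """Crude stand-in for numeral detection: true for all-digit strings."""
    return word.isdigit()


def word_importance(word, table=WORD_IMPORTANCE):
    """Return the importance of a word on the assumed 1-5 scale."""
    # Numerals are treated as highly important unconditionally.
    if is_numeral(word):
        return HIGH_IMPORTANCE
    # Unlisted words fall back to the default, so the table may list only
    # the words that need a non-default importance.
    return table.get(word, DEFAULT_IMPORTANCE)
```

Falling back to a default value reflects the option of registering only high-importance words rather than every word in the sentence.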

The control unit 3 comprises a CPU (Central Processing Unit) serving as a computer, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. By executing the reading aloud evaluation processing program, the control unit 3 functions as a speech processing unit 31, a reading aloud evaluation unit 32, a score adjustment unit 33, and a display control unit 34. The speech processing unit 31, the reading aloud evaluation unit 32, and the score adjustment unit 33 are examples of the acquisition means, first input means, second input means, input means, first determination means, second determination means, determination means, calculation means, decomposition means, first specifying means, second specifying means, and specifying means of the present invention. The display control unit 34 is an example of the display control means of the present invention.

The control unit 3 inputs the reference speech waveform data, stored in a predetermined audio file format, from the storage unit 2 to the speech processing unit 31. The control unit 3 also inputs to the speech processing unit 31 speech waveform data (hereinafter, "speaker speech waveform data") representing the waveform of the speech that the speaker uttered when reading the sentence aloud and that was picked up by the microphone M. The reference speech waveform data and the speaker speech waveform data are collectively referred to as speech waveform data. The speech waveform data is discretized time-series sound pressure waveform data, for example monaural waveform data with a sampling rate of 44.1 kHz and 16-bit quantization. Sound pressure refers to the change in air pressure (Pa) caused by a sound wave. In this embodiment, the sound pressure level (dB) is used as the sound pressure; it expresses the magnitude of the effective sound pressure (Pa), which is the root mean square (RMS) of the instantaneous sound pressure (Pa), as a value that is easy to handle in calculation. The sound pressure level (dB) is also called volume in a broad sense, and is used to evaluate voice volume.

The speech processing unit 31 calculates, at predetermined time intervals, the sound pressure level (dB) as the voice volume from data cut out of the speech waveform data every predetermined time (for example, every 10 ms). The speech processing unit 31 likewise calculates the fundamental frequency (Hz) from data cut out of the speech waveform data every predetermined time, and uses the calculated fundamental frequency (Hz) as the intonation at each interval. Known techniques such as the zero-crossing method or vector autocorrelation can be applied to calculate the intonation. The speech processing unit 31 also calculates, for each word, a feature quantity (acoustic characteristic) representing the vocal tract characteristics used to evaluate articulation (滑舌, clarity of enunciation). For example, the speech processing unit 31 cuts the speech waveform data into words (word sections), for example based on the text data of the sentence read aloud, divides the data of each word section into windowed frames (for example, one frame every 25 ms), and obtains the amplitude spectrum by Fourier analysis (FFT). The speech processing unit 31 then applies a mel filter bank to the amplitude spectrum and applies a discrete cosine transform (DCT) to the logarithm of the filter bank output, thereby calculating MFCCs (mel-frequency cepstral coefficients) for each word as the feature quantity representing the vocal tract characteristics.
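The frame-wise volume and fundamental-frequency calculations can be sketched as follows. This is only an illustration under stated assumptions: the level is computed relative to an arbitrary reference `ref` (full scale), since a true sound pressure level would require a calibrated microphone; the pitch estimate uses a plain autocorrelation peak search rather than the zero-crossing or vector autocorrelation methods named in the text; and the function names, the 75 to 400 Hz search range, and the use of an analysis frame longer than 10 ms for pitch are all hypothetical choices.

```python
import math


def frame_levels_db(samples, sample_rate, frame_ms=10, ref=1.0, floor_db=-120.0):
    """RMS level in dB (relative to `ref`) for consecutive non-overlapping frames."""
    n = int(sample_rate * frame_ms / 1000)
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        rms = math.sqrt(sum(x * x for x in frame) / n)
        # Silent frames are clamped to a floor to avoid log10(0).
        levels.append(20 * math.log10(rms / ref) if rms > 0 else floor_db)
    return levels


def estimate_f0(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Picks the lag with the largest autocorrelation inside the assumed pitch
    range; returns 0.0 if no positive peak is found (treated as unvoiced).
    """
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best = 0, 0.0
    for lag in range(lo, hi + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best:
            best_lag, best = lag, r
    return sample_rate / best_lag if best_lag else 0.0
```

The MFCC step is omitted here; in practice it would normally be delegated to an audio-analysis library rather than written by hand.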

The speech processing unit 31 also identifies, for each phrase and based on the reference speech waveform data, the phrase section from the start timing to the end timing of the phrase (hereinafter, "reference phrase section"). The start timing and the end timing may each be recognized from the speech waveform or from the sound pressure level (dB) calculated as described above. For example, the speech processing unit 31 recognizes as the start timing the point at which the amplitude of the speech waveform reaches or exceeds a predetermined value, or the point at which the sound pressure level (dB) reaches or exceeds a predetermined value. Similarly, the speech processing unit 31 recognizes as the end timing the point at which the amplitude of the speech waveform falls below a predetermined value, or the point at which the sound pressure level (dB) falls below a predetermined value. The speech processing unit 31 also identifies the interval section from the end timing of any one of the phrases to the start timing of the next phrase (hereinafter, "reference interval section"). The speech processing unit 31 may also identify a reference interval section from the end timing of any one of a plurality of clauses (文節) to the start timing of the next clause. The speech processing unit 31 may also identify, for each word, the word section of the word (hereinafter, "reference word section") based on, for example, the pronunciation timing indicated by the text data of the sentence read aloud. Furthermore, the speech processing unit 31 may identify, for each clause, the clause section from the start timing to the end timing of the clause (hereinafter, "reference clause section") based on, for example, the pronunciation timing indicated by the text data of the sentence read aloud.
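The threshold-based recognition of start and end timings can be sketched as below. The sketch assumes the input is a list of frame levels in dB taken at a fixed frame period; the function name `find_sections`, the threshold value used in the usage example, and the millisecond representation of sections are illustrative assumptions only.

```python
def find_sections(levels_db, threshold_db, frame_ms=10):
    """Split a level contour into phrase sections and the intervals between them.

    A phrase starts at the first frame whose level is at or above the
    threshold and ends at the first later frame whose level is below it.
    Returns (phrases, intervals), each a list of (start_ms, end_ms) tuples.
    """
    phrases, start = [], None
    for i, level in enumerate(levels_db):
        if level >= threshold_db and start is None:
            start = i * frame_ms                      # start timing
        elif level < threshold_db and start is not None:
            phrases.append((start, i * frame_ms))     # end timing
            start = None
    if start is not None:                             # speech runs to the end
        phrases.append((start, len(levels_db) * frame_ms))
    # An interval spans from one phrase's end to the next phrase's start.
    intervals = [(prev_end, next_start)
                 for (_, prev_end), (next_start, _) in zip(phrases, phrases[1:])]
    return phrases, intervals
```

For instance, `find_sections([-60, -20, -18, -60, -60, -15, -15, -15, -60], -30)` yields two phrase sections separated by one interval section. A practical detector would probably also require a minimum silence duration so that short dips inside a phrase do not split it.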

The speech processing unit 31 also identifies, for each phrase and based on the speaker speech waveform data, the phrase section from the start timing to the end timing of the phrase (hereinafter, "speaker phrase section"). The speech processing unit 31 also identifies the interval section from the end timing of any one of the phrases to the start timing of the next phrase (hereinafter, "speaker interval section"). The speech processing unit 31 may also identify a speaker interval section from the end timing of any one of a plurality of clauses to the start timing of the next clause. The speech processing unit 31 may also identify, for each word, the word section of the word (hereinafter, "speaker word section") based on, for example, the pronunciation timing indicated by the text data of the sentence read aloud. Furthermore, the speech processing unit 31 may identify, for each clause, the clause section from the start timing to the end timing of the clause (hereinafter, "speaker clause section") based on, for example, the pronunciation timing indicated by the text data of the sentence read aloud.

The data of the phrase sections (or clause sections), word sections, and interval sections identified as described above are stored in the storage unit 2 in association with, for example, the audio file of the corresponding speech waveform data. Each identified section (phrase section (or clause section), word section, or interval section) is expressed, for example, as a time range measured from the start of the waveform (for example, 01:00-03:00). Each identified section is also assigned a serial number, for example in order from the beginning.

Next, the reading aloud evaluation unit 32 compares the voice volume of each reference phrase section with the voice volume of the corresponding speaker phrase section (for example, the one with the same serial number) and evaluates the voice volume phrase by phrase. The sound pressure level (dB) described above is used, for example, as the voice volume to be compared. As the comparison result, the reading aloud evaluation unit 32 calculates, for example, the difference between the voice volume of the reference phrase section and that of the speaker phrase section, and evaluates the voice volume by calculating a score (evaluation points) from this difference. The score is calculated so that, for example, the closer the difference is to 0, the higher the evaluation (that is, the higher the score). In other words, the louder or quieter the speaker's voice is relative to the reference, the larger the difference and the lower the evaluation; the closer the speaker's voice volume is to the reference, the smaller the difference and the higher the evaluation. The voice volume is thus evaluated (a score is calculated) for each phrase section. When the voice volumes calculated at predetermined time intervals within each phrase section are compared, the evaluation is preferably performed after aligning the start positions of the reference phrase section and the speaker phrase section and stretching or compressing the phrase section so that the two have the same time length. The length may be matched by simple stretching or compression, or a technique such as DP matching may be used to align the evaluated positions dynamically within the phrase section. Alternatively, the voice volume to be compared may be the average of the voice volumes calculated at predetermined time intervals within each phrase section. Using the same evaluation method as for phrases, the reading aloud evaluation unit 32 may also compare the voice volume of each reference clause section with that of the corresponding speaker clause section and evaluate the voice volume clause by clause (a score is calculated for each clause), or compare the voice volume of each reference word section with that of the corresponding speaker word section and evaluate the voice volume word by word (a score is calculated for each word).
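One way to picture the comparison after simple stretching or compression is the sketch below. It linearly resamples the speaker's contour to the length of the reference contour and maps the mean absolute difference to a score. The mapping itself (a linear fall-off reaching zero at `max_diff`, with full marks of 10) and the function names are assumptions; the description only requires that the score rise as the difference approaches 0. DP matching is not shown.

```python
def resample(values, length):
    """Linearly stretch or compress a sequence to `length` points."""
    if length == 1 or len(values) == 1:
        return [values[0]] * length
    out = []
    for k in range(length):
        pos = k * (len(values) - 1) / (length - 1)   # position in the source
        i = int(pos)
        frac = pos - i
        nxt = values[min(i + 1, len(values) - 1)]
        out.append(values[i] * (1 - frac) + nxt * frac)
    return out


def closeness_score(reference, speaker, max_diff=20.0, full_marks=10.0):
    """Score a speaker contour against a reference contour of a phrase section.

    The speaker contour is first stretched to the reference length; the score
    falls linearly from `full_marks` (identical contours) to 0 (mean absolute
    difference of `max_diff` or more). Both constants are hypothetical.
    """
    stretched = resample(speaker, len(reference))
    mean_diff = sum(abs(r - s) for r, s in zip(reference, stretched)) / len(reference)
    return full_marks * max(0.0, 1.0 - mean_diff / max_diff)
```

Because the function only compares two numeric contours, the same sketch could be applied to any per-frame quantity, not just levels in dB.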

The reading aloud evaluation unit 32 also compares the intonation (pitch) of each reference phrase section with the intonation of the corresponding speaker phrase section and evaluates the intonation phrase by phrase. As the comparison result, the reading aloud evaluation unit 32 calculates, for example, the difference between the intonation of the reference phrase section and that of the speaker phrase section, and evaluates the intonation by calculating a score from this difference. The score is calculated so that, for example, the closer the difference is to 0, the higher the evaluation (that is, the higher the score). In other words, the higher or lower the speaker's intonation is relative to the reference, the larger the difference and the lower the evaluation; the closer the speaker's intonation is to the reference, the smaller the difference and the higher the evaluation. The intonation is thus evaluated (a score is calculated) for each phrase section. As with voice volume, the intonation evaluation is preferably performed after aligning the start positions of the reference phrase section and the speaker phrase section and stretching or compressing the phrase section so that the two have the same time length. Alternatively, the intonation to be compared may be the average of the intonation values calculated at predetermined time intervals within each phrase section. Using the same evaluation method as for phrases, the reading aloud evaluation unit 32 may also compare the intonation of each reference clause section with that of the corresponding speaker clause section and evaluate the intonation clause by clause (a score is calculated for each clause), or compare the intonation of each reference word section with that of the corresponding speaker word section and evaluate the intonation word by word (a score is calculated for each word).

The reading aloud evaluation unit 32 also compares the articulation of each reference word section with the articulation of the corresponding speaker word section and evaluates the articulation word by word. The articulation evaluation uses, for example, the feature quantity (MFCC) representing the vocal tract characteristics calculated for each word. As the comparison result, the reading aloud evaluation unit 32 calculates, for example, the similarity between the feature quantity of the reference word section and that of the speaker word section, and evaluates the articulation by calculating a score from this similarity. The score is calculated so that, for example, the higher the similarity, the higher the evaluation (that is, the higher the score). The articulation is thus evaluated (a score is calculated) for each word section. Furthermore, the reading aloud evaluation unit 32 calculates the feature quantity of each reference phrase section from the feature quantities of the reference word sections it contains (for example, by averaging them). Likewise, it calculates the feature quantity of each speaker phrase section from the feature quantities of the speaker word sections it contains (for example, by averaging them). The reading aloud evaluation unit 32 then calculates, for example, the similarity between the feature quantity of the reference phrase section and that of the speaker phrase section, and evaluates the articulation phrase by phrase by calculating a score from this similarity. Using the same evaluation method as for phrases, the reading aloud evaluation unit 32 may also compare the articulation of each reference clause section with that of the corresponding speaker clause section and evaluate the articulation clause by clause (a score is calculated for each clause).
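The similarity-based articulation (滑舌) scoring can be illustrated with the sketch below. The description does not say which similarity measure is used, so cosine similarity is an assumption here, as are the function names, the clamping of negative similarity to zero, and the full marks of 10. Each feature quantity is treated as a plain list of MFCC values.

```python
import math


def mean_vector(vectors):
    """Average several per-word feature vectors into one phrase-level vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def articulation_score(ref_features, spk_features, full_marks=10.0):
    """Map similarity to a score: higher similarity gives a higher score."""
    return full_marks * max(0.0, cosine_similarity(ref_features, spk_features))
```

For a phrase-level score, `mean_vector` would be applied to the word vectors of the reference phrase and of the speaker phrase before calling `articulation_score`.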

The reading aloud evaluation unit 32 also compares the time length (duration) of each reference phrase section with the time length of the speaker phrase section and evaluates, phrase by phrase (that is, for each phrase section), the speed at which the sentence is read aloud (reading speed). As the comparison result, the reading aloud evaluation unit 32 calculates, for example, for each phrase, the time difference between the time length of the reference phrase section and that of the speaker phrase section, and evaluates the speed by calculating a score from the absolute value of this time difference. The score is calculated so that, for example, the closer the absolute value of the time difference is to 0, the higher the evaluation (that is, the higher the score). In other words, the faster or slower the speaker is relative to the reference speed, the larger the absolute value of the time difference and the lower the evaluation; the closer the speaker's speed is to the reference speed, the smaller the absolute value of the time difference and the higher the evaluation. The speed is thus evaluated (a score is calculated) for each phrase. Using the same evaluation method as for phrases, the reading aloud evaluation unit 32 may also compare the time length of each reference clause section with that of the corresponding speaker clause section and evaluate the speed clause by clause (a score is calculated for each clause), or compare the time length of each reference word section with that of the corresponding speaker word section and evaluate the speed word by word (a score is calculated for each word).
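A minimal sketch of scoring from the absolute time difference follows. The linear mapping, the `tolerance_s` parameter (the difference at which the score reaches zero), and the full marks of 10 are hypothetical; the description only states that the score rises as the absolute time difference approaches 0.

```python
def speed_score(ref_len_s, spk_len_s, tolerance_s=2.0, full_marks=10.0):
    """Score reading speed from the durations of corresponding sections.

    Reading faster or slower than the reference is penalised equally,
    because only the absolute value of the time difference is used.
    """
    diff = abs(ref_len_s - spk_len_s)
    return full_marks * max(0.0, 1.0 - diff / tolerance_s)
```

With these assumed constants, a phrase read one second too fast and a phrase read one second too slowly both receive half marks.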

The score for each phrase calculated in each of the voice volume, intonation, and speed evaluations described above may instead be calculated from the scores of the words that make up the phrase. For example, the score of a phrase is calculated as the average of the scores of its words. Alternatively, the score of a phrase is calculated from the score and the importance of each of its words (the importance assigned to each word as described above). Suppose, for example, that the scores of the words making up a phrase are 5, 5, 6, and 7, and that their importance levels are small, small, large, and small, respectively. If the weight ratio is "large : small = 2 : 1", the weights of the words are set to 1, 1, 2, and 1, and each word's share is its weight divided by the total weight (1 + 1 + 2 + 1 = 5). The score of the phrase is then calculated as 5 × 1/5 + 5 × 1/5 + 6 × 2/5 + 7 × 1/5 = 5.8. Similarly, the score for each clause calculated in each of the voice volume, intonation, and speed evaluations may be calculated from the scores of the words that make up the clause.
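The worked example above (word scores 5, 5, 6, 7 with a 2 : 1 weight ratio) can be reproduced with a short sketch. The dictionary `WEIGHT` and the function name are illustrative assumptions; the labels "small" and "large" stand for the importance levels 小 and 大.

```python
# Weight ratio "large : small = 2 : 1", as in the example in the text.
WEIGHT = {"small": 1, "large": 2}


def weighted_phrase_score(word_scores, word_importance, weight=WEIGHT):
    """Importance-weighted average of the word scores in one phrase."""
    weights = [weight[level] for level in word_importance]
    total = sum(weights)
    return sum(score * w for score, w in zip(word_scores, weights)) / total


score = weighted_phrase_score([5, 5, 6, 7], ["small", "small", "large", "small"])
print(round(score, 2))  # 5.8
```

With equal importance for every word, the same function reduces to the plain average mentioned first in the text.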

The reading aloud evaluation unit 32 also compares the time length of each reference interval section with the time length of the speaker interval section and evaluates the pauses (間) taken when the sentence was read aloud. As the comparison result, the reading aloud evaluation unit 32 calculates, for example, the time difference between the time length of the reference interval section and that of the speaker interval section, and evaluates the pause by calculating a score from the absolute value of this time difference. The reading aloud evaluation unit 32 also evaluates the pauses over all interval sections based on the evaluation of each interval section. In this overall evaluation, for example, the average of the pause scores calculated for the individual interval sections is calculated as the total pause score for all interval sections.

The score adjustment unit 33 acquires, for each evaluation item set as an adjustment target, the scores that the reading aloud evaluation unit 32 calculated for each phrase based on the speech waveform data. For example, the score of each phrase is acquired for each of voice volume, intonation, speed, and articulation. The score adjustment unit 33 then determines a weight for each phrase (that is, each phrase is weighted) and determines, for each phrase, a scoring ratio corresponding to its weight. The scoring ratio skews the allocation of points among the phrases. The weight of a phrase is determined, for example, by using the word importance database, based on the importance of each of the words that make up the phrase. This allows the importance of a phrase to be reflected in its weight on a word-by-word basis. If the weight of a phrase is determined from both the importance of each of its words and the time length of the phrase (for example, of the reference phrase section), the weight can be made more appropriate. The score adjustment unit 33 then adjusts the acquired score of each phrase by the determined scoring ratio and, based on the adjusted phrase scores, calculates the total score for the reading aloud of the entire sentence containing the phrases. For example, the total score for the reading aloud of the entire sentence is calculated from the adjusted phrase scores for each of voice volume, intonation, speed, and articulation. Note that the adjusted score does not necessarily differ from the score before adjustment; the two may be the same.

FIG. 2 is a conceptual diagram showing an example in which the score of each phrase for a given evaluation item is adjusted by the scoring ratio and the total score for the reading aloud of the entire sentence is calculated. In the example of FIG. 2, the sentence read aloud consists of phrases F1 to F4, and a weighting engine (a software module) provided in, for example, the score adjustment unit 33 has determined the importance of phrases F1 to F4 to be "small", "large", "small", and "large", respectively. When the weight ratio is set to "large : small = 2 : 1", the weights of phrases F1 to F4 are determined to be "1", "2", "1", and "2", respectively, as shown in FIG. 2. The scoring ratios corresponding to these weights are therefore 1/6 [1/(1+2+1+2)], 1/3 [2/(1+2+1+2)], 1/6 [1/(1+2+1+2)], and 1/3 [2/(1+2+1+2)], respectively. The scores of phrases F1 to F4 (10, 9, 8, and 7 points) are each multiplied by the corresponding scoring ratio to adjust them, and the sum of the adjusted phrase scores is calculated as the total score: 10 × 1/6 + 9 × 1/3 + 8 × 1/6 + 7 × 1/3 ≈ 8.33 points. In the example of FIG. 2, the scores of phrases F1 to F4 (10, 9, 8, and 7 points) are scores out of an allocation (full marks) of 10 points. As another example, if the importance of phrases F1 to F4 were "medium", "large", "small", and "large", respectively, and the weight ratio were set to "large : medium : small = 3 : 2 : 1", the weights of phrases F1 to F4 would be determined to be "2", "3", "1", and "3", respectively. In this case, the scoring ratios corresponding to the weights of phrases F1 to F4 would be 2/9 [2/(2+3+1+3)], 1/3 [3/(2+3+1+3)], 1/9 [1/(2+3+1+3)], and 1/3 [3/(2+3+1+3)], respectively.
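The FIG. 2 calculation can be checked with the following sketch, which derives the scoring ratios from the phrase weights and sums the adjusted phrase scores. The function names are illustrative assumptions; the numbers are those given in the text.

```python
def scoring_ratios(weights):
    """Scoring ratio of each phrase: its weight divided by the sum of all weights."""
    total = sum(weights)
    return [w / total for w in weights]


def total_score(phrase_scores, weights):
    """Sum of the phrase scores after each is multiplied by its scoring ratio."""
    ratios = scoring_ratios(weights)
    return sum(score * ratio for score, ratio in zip(phrase_scores, ratios))


# FIG. 2 example: scores 10, 9, 8, 7 with weights 1, 2, 1, 2.
print(round(total_score([10, 9, 8, 7], [1, 2, 1, 2]), 2))  # 8.33
```

For comparison, equal weights would give the plain average of 8.5, so in this example the weighting lowers the total because the lower-scoring phrases F2 and F4 are the important ones.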

FIG. 3 is a conceptual diagram showing an example in which, within the weighting engine shown in FIG. 2, the importance of each phrase is determined from the importance of each of the words that make up the phrase. In the example of FIG. 3, the word importance database gives the words making up phrase F1 the importance levels "small" and "small". The weights of the words making up phrase F1 are therefore determined to be "1" and "1". The weights of the words making up phrases F2 to F4 are determined in the same way. The average weight of the words making up phrase F1 is "1", that of phrase F2 is "1.2" (digits from the second decimal place onward are truncated for convenience), that of phrase F3 is "1", and that of phrase F4 is "1.2". As a result, the importance of phrases F1 to F4 is determined to be "small", "large", "small", and "large", respectively. The numerical values chosen for the word weights are only an example; any values may be used as long as they differentiate words according to their importance.
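The step from word weights to a phrase importance level might look like the sketch below. The text gives the averages (1 and 1.2) and the resulting levels, but not the rule that separates "small" from "large", so the cut-off `threshold=1.1` is purely an assumption, as are the function name and the example word weights for a phrase whose average is 1.2.

```python
def phrase_importance(word_weights, threshold=1.1):
    """Classify a phrase by the average weight of its words.

    `threshold` is a hypothetical cut-off: an average at or above it yields
    "large", anything below yields "small".
    """
    mean = sum(word_weights) / len(word_weights)
    return "large" if mean >= threshold else "small"


print(phrase_importance([1, 1]))           # small (average 1)
print(phrase_importance([1, 1, 1, 1, 2]))  # large (average 1.2)
```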

The score adjustment unit 33 may also acquire, for a predetermined evaluation item, the scores that the reading aloud evaluation unit 32 calculated for each clause based on the speech waveform data. In this case, as with phrases, the score adjustment unit 33 determines a weight for each clause (that is, each clause is weighted) and determines, for each clause, a scoring ratio corresponding to its weight. Then, as with phrases, the score adjustment unit 33 adjusts the acquired score of each clause by the determined scoring ratio and, based on the adjusted clause scores, calculates the total score for the reading aloud of the entire sentence containing the clauses. The score adjustment unit 33 may also acquire, for a predetermined evaluation item, the scores that the reading aloud evaluation unit 32 calculated for each word based on the speech waveform data. In this case, the score adjustment unit 33 looks up the importance of each word in the word importance database and determines, for each word, a scoring ratio corresponding to a weight that reflects that importance. The score adjustment unit 33 then adjusts the acquired score of each word by the determined scoring ratio and, based on the adjusted word scores, calculates the total score for the reading aloud of the entire sentence containing the words.

The reading-aloud evaluation unit 32 performs an overall evaluation of the reading of the entire sentence based on the total scores calculated for the individual evaluation items (for example, voice volume, intonation, articulation, speed, and pauses). In this overall evaluation, for example, the sum of the total scores calculated for the evaluation items is calculated as the overall score for the reading of the entire sentence.

The display control unit 34 displays the sentence on the screen divided into a plurality of phrases, and displays the phrases that have been given a relatively high weight in a display mode different from that of the other phrases. This lets the user see at a glance which phrases carry a relatively high weight. At this time, the display control unit 34 preferably displays, in association with each phrase, at least one of the score acquired (calculated) for that phrase (that is, the score before adjustment by the score adjustment unit 33) and the score adjusted for that phrase. This also lets the user see at a glance the score of each relatively heavily weighted phrase.

FIGS. 4(A) and 4(B) show examples of screens that display information indicating the evaluation of a speaker's reading. The screen shown in FIG. 4(A) has a graph display section 51, a phrase display section 52, a phrase score display section 53, and a total score display section 54. The graph display section 51 shows the following graphs, divided by phrase: graph 51a, the time-series change of the intonation calculated at predetermined time intervals from the reference speech waveform data; graph 51b, the time-series change of the intonation calculated at predetermined time intervals from the speaker speech waveform data; graph 51c, the time-series change of the voice volume calculated at predetermined time intervals from the reference speech waveform data; and graph 51d, the time-series change of the voice volume calculated at predetermined time intervals from the speaker speech waveform data.

The phrase display section 52 shows the sentence divided into a plurality of phrases. The phrase score display section 53 has display fields 53a to 53d, one per phrase shown in the phrase display section 52, each showing the score of an evaluation item (for example, articulation) for the corresponding phrase. Among display fields 53a to 53d, the display control unit 34 highlights fields 53b and 53d, which correspond to the relatively heavily weighted phrases, by giving them a fill color (that is, a background color behind the score) different from that of fields 53a and 53c, which correspond to the relatively lightly weighted phrases. This allows the user to identify which of the phrases in the sentence carry a relatively high weight. Alternatively, fields 53b and 53d may be given a fill pattern different from that of fields 53a and 53c. Each score shown in display fields 53a to 53d (10, 9, 8, and 7 points) is multiplied by the corresponding point-allocation ratio to adjust it, and the sum of the adjusted per-phrase scores is calculated as the total score (8.33 points out of 10).
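The total of 8.33 points in FIG. 4(A) can be reproduced with a short sketch. It assumes that the small and large phrases carry the weights 1 and 2 (the value "2" is the weight mentioned for FIG. 4(B)) and that each point-allocation ratio is the phrase weight divided by the sum of all weights; the function names are illustrative only.

```python
def allocation_ratios(weights):
    """Point-allocation ratio of each phrase: its weight over the sum of weights."""
    total_weight = sum(weights)
    return [w / total_weight for w in weights]

def adjusted_total(scores, weights):
    """Multiply each phrase score by its ratio and sum the adjusted scores."""
    ratios = allocation_ratios(weights)
    adjusted = [s * r for s, r in zip(scores, ratios)]
    return adjusted, sum(adjusted)

scores = [10, 9, 8, 7]    # display fields 53a to 53d
weights = [1, 2, 1, 2]    # assumed: small = 1, large = 2
adjusted, total = adjusted_total(scores, weights)
print(round(total, 2))    # 8.33
```

With equal weights the same four scores would give 8.5, so the lower scores on the two heavily weighted phrases pull the total down.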

Although display fields 53a to 53d show the scores before adjustment by the score adjustment unit 33, they may instead show the scores after adjustment. Alternatively, each of display fields 53a to 53d may show both the score before adjustment and the score after adjustment. In addition, the display control unit 34 displays the phrase score display section 53 for the evaluation item selected from among those shown in the total score display section 54 (intonation in this example); that is, the display is switched according to the selection. As another example, the screen may include a phrase score display section 53 for every evaluation item that was evaluated.

The total score display section 54 shows the total score calculated for every evaluation item that was evaluated, together with the overall score calculated for the overall evaluation. Among the displayed total scores, those of the evaluation items subject to adjustment (for example, voice volume, intonation, articulation, and speed) are, as described above, calculated from the per-phrase scores adjusted by the score adjustment unit 33. In the total score display section 54, the maximum points for each total score and for the overall score are shown to its right (to the right of the "/"). In this example, each of the five evaluation items is worth 20 points and the overall evaluation is worth 100 points. For this reason, in the example of FIG. 4(A), the total score calculated for articulation (8.33 points out of 10) is displayed doubled (16.7 points out of 20).
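The rescaling from a 10-point to a 20-point allocation, and the summation into a 100-point overall score, might look like the sketch below. Only the articulation (滑舌) figure comes from FIG. 4(A); the totals for the other four items are hypothetical placeholders, and `rescale` is a name introduced here.

```python
def rescale(score: float, from_max: float, to_max: float) -> float:
    """Express a score earned out of from_max on a scale of to_max."""
    return score * to_max / from_max

# Articulation total from FIG. 4(A): 50/6 (about 8.33) out of 10.
articulation = rescale(50 / 6, from_max=10, to_max=20)
print(f"{articulation:.1f} / 20")   # prints "16.7 / 20"

# Hypothetical totals for the other four items, each already out of 20.
item_totals = {
    "voice volume": 15.0,
    "intonation": 14.0,
    "articulation": articulation,
    "speed": 18.0,
    "pauses": 17.0,
}
overall = sum(item_totals.values())  # overall score out of 100
print(f"{overall:.1f} / 100")
```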

The screen shown in FIG. 4(B), on the other hand, has basically the same layout as the screen shown in FIG. 4(A), except that display fields 55a to 55d of the phrase score display section 55 show values in the form "score / maximum points". In particular, in display fields 55b and 55d, which correspond to the relatively heavily weighted phrases, both the score and the maximum points are displayed multiplied by the weight "2". This explicitly shows the user which phrases have a high point allocation (that is, which are important).
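A sketch of the "score / maximum points" formatting of FIG. 4(B) follows. The per-phrase maximum of 10 points and the reuse of the scores 10, 9, 8, and 7 from FIG. 4(A) are assumptions made for illustration; the text only states that the score and the maximum points of the heavily weighted phrases are multiplied by the weight "2".

```python
def format_field(score: int, max_points: int, weight: int) -> str:
    """Render one display field with score and maximum points scaled by the weight."""
    return f"{score * weight}/{max_points * weight}"

scores = [10, 9, 8, 7]    # assumed to match FIG. 4(A)
weights = [1, 2, 1, 2]    # fields 55b and 55d carry the weight 2
fields = [format_field(s, 10, w) for s, w in zip(scores, weights)]
print(fields)             # ['10/10', '18/20', '8/10', '14/20']
```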

The display control unit 34 may instead display the sentence divided into a plurality of clauses and display the clauses given a relatively high weight in a display mode different from that of the other clauses. In that case, the display control unit 34 preferably displays, in association with each clause, at least one of the score acquired for that clause and the score adjusted for that clause. The display control unit 34 may also display the sentence divided into a plurality of words and display the words given a relatively high importance in a display mode different from that of the other words. In that case, the display control unit 34 preferably displays, in association with each word, at least one of the score acquired for that word and the score adjusted for that word.

[2. Example of operation of the reading-aloud evaluation device S]
Next, an example of the operation of the reading-aloud evaluation device S is described in two parts, Example 1 and Example 2. Example 1 is a case in which the total score for the reading of the entire sentence, for a predetermined evaluation item, is calculated from per-phrase or per-clause scores. Example 2 is a case in which that total score is calculated from per-word scores.

(Example 1)
First, the reading-aloud evaluation process of the control unit 3 in Example 1 is described with reference to FIG. 5 and other figures. FIG. 5 is a flowchart showing an example of the reading-aloud evaluation process of the control unit 3 in Example 1. The process is described below using a sentence containing a plurality of phrases as an example, but it applies equally to a sentence containing a plurality of clauses. As a premise of the process, the following data are assumed to be stored in the storage unit 2 in association with, for example, the audio file of the reference speech waveform data: the reference phrase sections, reference interval sections, and reference word sections identified from the reference speech waveform data; the voice volume and intonation calculated at predetermined time intervals from the reference speech waveform data; and the vocal tract feature values (MFCC) calculated for each reference phrase section from the reference speech waveform data.

The process shown in FIG. 5 starts, for example, when the speaker uses the operation unit 4 to designate the desired audio file that serves as the reference (model) for scoring the reading and then gives a start instruction. When the process starts, the control unit 3 turns on the microphone input and reads from the storage unit 2 the data associated with the designated audio file: the text data of the sentence, the reference phrase sections, the reference interval sections, the reference word sections, the voice volume, the intonation, and the vocal tract feature values (MFCC) (step S1). The data read are stored in the RAM. Serial numbers are assigned to the reference phrase sections and to the reference interval sections, and each reference word section is associated with the reference phrase section that contains it. When the speaker begins reading the sentence aloud, the speech uttered during the reading is picked up by the microphone M, and speaker speech waveform data representing the waveform of the picked-up speech is input to the reading-aloud evaluation device S via the interface unit 5.

While storing (recording) the input speaker speech waveform data in the storage unit 2, the control unit 3 of the reading-aloud evaluation device S sequentially identifies the speaker phrase sections and the speaker interval sections from that data, as described above (step S2). The identified speaker phrase sections and speaker interval sections are each assigned serial numbers and stored in the RAM. The stored data for each speaker phrase section and each speaker interval section are used in the evaluations described below.

Next, based on the input speaker speech waveform data, the control unit 3 calculates the voice volume and intonation at predetermined time intervals and calculates the vocal tract feature values (MFCC) for each speaker phrase section, as described above (step S3). The calculated voice volume, intonation, and vocal tract feature value (MFCC) data are stored in the RAM and used in the evaluations described below. Next, the control unit 3 evaluates intonation by comparing the intonation of the reference phrase sections with that of the speaker phrase sections in serial-number order (step S4). As described above, this evaluation yields an intonation score for each phrase, which is stored in the storage unit 2 in association with, for example, the corresponding phrase indicated by the text data of the sentence.

Next, the control unit 3 evaluates voice volume by comparing the voice volume of the reference phrase sections with that of the speaker phrase sections in serial-number order (step S5). As described above, this evaluation yields a voice volume score for each phrase, which is stored in the storage unit 2 in association with, for example, the corresponding phrase indicated by the text data of the sentence. Next, the control unit 3 evaluates articulation by comparing the feature values (MFCC) representing the vocal tract characteristics of the reference phrase sections with those of the speaker phrase sections in serial-number order (step S6). As described above, this evaluation yields an articulation score for each phrase, which is stored in the storage unit 2 in association with, for example, the corresponding phrase indicated by the text data of the sentence.

Next, the control unit 3 evaluates speed by comparing the duration of the reference phrase sections with that of the speaker phrase sections in serial-number order (step S7). As described above, this evaluation yields a speed score for each phrase, which is stored in the storage unit 2 in association with, for example, the corresponding phrase indicated by the text data of the sentence. Next, the control unit 3 evaluates pauses by comparing the duration of the reference interval sections with that of the speaker interval sections in serial-number order (step S8). As described above, this evaluation yields a total pause score over all interval sections, which is stored in the storage unit 2.
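The serial-number-ordered comparison in step S7 can be pictured with the sketch below. The scoring rule itself is not given in this passage, so the rule used here (10 points scaled by the ratio of the shorter duration to the longer one) is purely hypothetical, as are the durations.

```python
def speed_scores(ref_durations, spk_durations, max_score=10.0):
    """Score each phrase by pairing reference and speaker sections in order.

    Hypothetical rule: full marks when the durations match, scaled down by
    the ratio of the shorter duration to the longer one otherwise.
    """
    scores = []
    for ref, spk in zip(ref_durations, spk_durations):  # same serial number
        ratio = min(ref, spk) / max(ref, spk)
        scores.append(round(max_score * ratio, 1))
    return scores

# Hypothetical durations in seconds for three phrase sections.
print(speed_scores([2.0, 1.0, 2.0], [2.0, 1.25, 1.6]))  # [10.0, 8.0, 8.0]
```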

Next, the control unit 3 breaks the sentence indicated by the text data read in step S1 into words, phrase by phrase (step S9). Next, the control unit 3 identifies the importance of each word obtained in step S9 by referring to the reference information, registered in the word importance database, that defines word importance (step S10). Next, based on the word importance identified in step S10, the control unit 3 determines the importance of each phrase, for example as shown in FIG. 3 (step S11). The importance of a phrase may instead be determined from both the importance of its words and the duration of each word's reference word section (hereinafter, "word length").

FIG. 6(A) is a conceptual diagram showing an example in which the importance of a phrase is determined from word importance and word length. In the example of FIG. 6(A), the words constituting phrase F11 have the importance levels "small", "small", and "small", so their weights are set to "1", "1", and "1", respectively. Their word lengths are "0.5 seconds", "0.75 seconds", and "0.75 seconds", respectively. Based on these word weights and word lengths, the importance corresponding to the value calculated by formula (1) shown in FIG. 6(A) is determined as the importance of phrase F11. Similarly, the importance corresponding to the value calculated by formula (2) shown in FIG. 6(A) is determined as the importance of phrase F12.
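Formulas (1) and (2) appear only in FIG. 6(A) and are not reproduced in the text. As one plausible reading, and strictly as an assumption, the sketch below combines word weights and word lengths as a duration-weighted mean; the actual formulas may differ.

```python
def duration_weighted_importance(word_weights, word_lengths):
    """Hypothetical stand-in for formula (1): mean word weight, weighted by word length."""
    weighted_sum = sum(w * t for w, t in zip(word_weights, word_lengths))
    return weighted_sum / sum(word_lengths)

# Phrase F11: three "small" words (weight 1) lasting 0.5 s, 0.75 s, and 0.75 s.
value_f11 = duration_weighted_importance([1, 1, 1], [0.5, 0.75, 0.75])
print(value_f11)  # 1.0
```

Under this reading, a long important word raises the phrase value more than a short one of the same importance.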

Next, from the reference phrase section data read in step S1, the control unit 3 identifies, for each phrase, the duration of the reference phrase section (the duration of the phrase identified from the speech waveform indicated by the reference speech waveform data; hereinafter, "phrase length") (step S12). Next, the control unit 3 determines the weight of each phrase based on the phrase importance determined in step S11 and the phrase length identified in step S12 (step S13). Next, the control unit 3 determines, for each phrase, a point-allocation ratio corresponding to the weight determined in step S13 (step S14).

FIG. 6(B) is a conceptual diagram showing an example in which phrase weights and point-allocation ratios are determined from phrase importance and phrase length. The importance levels of phrases F11 to F13 are "small", "large", and "large", respectively, and their phrase lengths are "2 seconds", "1 second", and "2 seconds", respectively. When the weight ratios are set to "large : small = 2 : 1" and "2 seconds : 1 second = 2 : 1", the overall weights of phrases F11 to F13 are determined to be "2", "2", and "4", respectively, as shown in FIG. 6(B). The point-allocation ratios corresponding to these weights are therefore determined to be "1/4", "1/4", and "1/2", respectively.
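The numbers in FIG. 6(B) are consistent with multiplying the importance weight by a length factor and then normalizing, as the sketch below shows. Treating the overall weight as this product, and using one second as the unit length, are assumptions inferred from the stated ratios rather than something the text spells out.

```python
from fractions import Fraction

IMPORTANCE_WEIGHT = {"small": 1, "large": 2}   # large : small = 2 : 1

def overall_weight(importance: str, length_s: int, unit_s: int = 1) -> Fraction:
    """Importance weight times length factor (2 s : 1 s = 2 : 1)."""
    return IMPORTANCE_WEIGHT[importance] * Fraction(length_s) / Fraction(unit_s)

phrases = [("small", 2), ("large", 1), ("large", 2)]   # F11, F12, F13
weights = [overall_weight(imp, length) for imp, length in phrases]
ratios = [w / sum(weights) for w in weights]

print([int(w) for w in weights])   # [2, 2, 4]
print([str(r) for r in ratios])    # ['1/4', '1/4', '1/2']
```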

The control unit 3 may instead be configured to determine the weight of each phrase based solely on the phrase importance determined in step S11.

Next, the control unit 3 acquires the per-phrase scores calculated in steps S4 to S7 for each evaluation item (that is, the scores for intonation, voice volume, articulation, and speed) (step S15). Next, the control unit 3 adjusts the score of each phrase by multiplying the score acquired in step S15 by the point-allocation ratio determined in step S14 (step S16). This adjustment is performed for each evaluation item.

Next, based on the per-phrase scores adjusted in step S16, the control unit 3 calculates the total score for the reading of the entire sentence (step S17). For example, the sum of the adjusted per-phrase scores is calculated as the total score. This calculation is performed for each evaluation item. Next, the control unit 3 calculates the overall score for the reading of the entire sentence based on the total pause score calculated in step S8 and the total scores of the evaluation items calculated in step S17 (step S18). Next, based on the scores and other information obtained in steps S1 to S18, the control unit 3 causes the display D to show a screen that presents the evaluation of the speaker's reading, as shown in FIG. 4(A) or 4(B) (step S19).
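Steps S15 to S18 can be summarized in a short sketch. The point-allocation ratios are those of FIG. 6(B); every score below, including the pause total, is a hypothetical value, and summing the item totals into the overall score follows the "for example" given for the overall evaluation.

```python
def evaluate(per_phrase_scores, ratios, pause_total):
    """Adjust per-phrase scores, total them per item, then form the overall score."""
    item_totals = {}
    for item, scores in per_phrase_scores.items():            # S15: acquire scores
        adjusted = [s * r for s, r in zip(scores, ratios)]    # S16: apply ratios
        item_totals[item] = sum(adjusted)                     # S17: total per item
    item_totals["pauses"] = pause_total                       # S8: not phrase-weighted
    overall = sum(item_totals.values())                       # S18: overall score
    return item_totals, overall

ratios = [0.25, 0.25, 0.5]   # phrases F11 to F13 in FIG. 6(B)
per_phrase_scores = {        # hypothetical scores out of 10
    "intonation": [8, 6, 10],
    "voice volume": [10, 10, 8],
    "articulation": [6, 8, 8],
    "speed": [8, 8, 8],
}
item_totals, overall = evaluate(per_phrase_scores, ratios, pause_total=7.0)
print(item_totals, overall)
```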

As described above, according to Example 1, the reading-aloud evaluation device S determines a weight for each phrase (or clause), determines a point-allocation ratio for each phrase (or clause) according to its weight, adjusts the score evaluated for each phrase (or clause) on a predetermined evaluation item by that ratio, and calculates the total score for the reading of the entire sentence containing the phrases (or clauses) from the adjusted scores. The evaluation can therefore reflect whether a given part is important in context. Because the speaker (the person practicing) can see which positions (phrases and so on) are important in context, the speaker knows where to concentrate and finds the scoring more convincing. As a result, instead of practicing the entire announcement uniformly, for example, the speaker can focus practice on the contextually important positions.

(Example 2)
Next, the reading-aloud evaluation process of the control unit 3 in Example 2 is described with reference to FIG. 7 and other figures. FIG. 7 is a flowchart showing an example of the reading-aloud evaluation process of the control unit 3 in Example 2. As a premise of the process, the following data are assumed to be stored in the storage unit 2 in association with, for example, the audio file of the reference speech waveform data: the reference word sections and reference interval sections identified from the reference speech waveform data; the voice volume and intonation calculated at predetermined time intervals from the reference speech waveform data; and the vocal tract feature values (MFCC) calculated for each word (speaker word section) from the reference speech waveform data.

The process shown in FIG. 7 starts in the same way as the process shown in FIG. 5. When the process starts, the control unit 3 turns on the microphone input and reads from the storage unit 2 the data associated with the designated audio file: the text data of the sentence, the reference word sections, the reference interval sections, the voice volume, the intonation, and the vocal tract feature values (MFCC) (step S21). Serial numbers are assigned to the reference word sections and to the reference interval sections. When the speaker begins reading the sentence aloud, the speech uttered during the reading is picked up by the microphone M, and speaker speech waveform data representing the waveform of the picked-up speech is input to the reading-aloud evaluation device S via the interface unit 5.

While storing the input speaker speech waveform data in the storage unit 2, the control unit 3 of the reading-aloud evaluation device S sequentially identifies the speaker word sections and the speaker interval sections from that data, as described above (step S22). The identified speaker word sections and speaker interval sections are each assigned serial numbers and stored in the RAM.

Next, based on the input speaker speech waveform data, the control unit 3 calculates the voice volume and intonation at predetermined time intervals and calculates the vocal tract feature values (MFCC) for each speaker word section, as described above (step S23). The calculated voice volume, intonation, and vocal tract feature value (MFCC) data are stored in the RAM. Next, the control unit 3 evaluates intonation by comparing the intonation of the reference word sections with that of the speaker word sections in serial-number order (step S24). As described above, this evaluation yields an intonation score for each word, which is stored in the storage unit 2.

Next, the control unit 3 evaluates voice volume by comparing the voice volume of the reference word sections with that of the speaker word sections in serial-number order (step S25). As described above, this evaluation yields a voice volume score for each word, which is stored in the storage unit 2. Next, the control unit 3 evaluates articulation by comparing the feature values (MFCC) representing the vocal tract characteristics of the reference word sections with those of the speaker word sections in serial-number order (step S26). As described above, this evaluation yields an articulation score for each word, which is stored in the storage unit 2.

Next, the control unit 3 evaluates speed by comparing the duration of the reference word sections with that of the speaker word sections in serial-number order (step S27). As described above, this evaluation yields a speed score for each word, which is stored in the storage unit 2. Next, the control unit 3 evaluates pauses by comparing the duration of the reference interval sections with that of the speaker interval sections in serial-number order (step S28). As described above, this evaluation yields a total pause score over all interval sections, which is stored in the storage unit 2.

Next, the control unit 3 breaks the sentence indicated by the text data read in step S21 into words (step S29). Next, the control unit 3 identifies the importance of each word obtained in step S29 by referring to the reference information, registered in the word importance database, that defines word importance (step S30). Next, from the reference word section data read in step S21, the control unit 3 identifies the word length of each word (step S31).

Next, the control unit 3 determines the weight of each word based on the word importance determined in step S30 and the word length identified in step S31 (step S32). Next, the control unit 3 determines, for each word, a point-allocation ratio corresponding to the weight determined in step S32 (step S33). The control unit 3 may instead be configured to determine the weight of each word based solely on the word importance determined in step S30.

Next, the control unit 3 acquires the per-word scores for each evaluation item calculated in steps S24 to S27 (that is, the scores for intonation, volume, articulation, and speed) (step S34). The control unit 3 then adjusts the score of each word by multiplying the score acquired in step S34 by the scoring ratio determined in step S33 (step S35). This score adjustment is performed for each evaluation item.

Next, the control unit 3 calculates a total score for the reading of the whole sentence based on the per-word scores adjusted in step S35 (step S36). For example, the sum of the adjusted per-word scores is calculated as the total score. This calculation of the total score is performed for each evaluation item.
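Steps S34 to S36 can be sketched as follows, assuming (as an illustration only) that the scoring ratio of each word is its weight divided by the sum of the weights, so that the total stays on the same 10-point scale as the per-word scores. The weights, scores, and names used here are hypothetical and are not taken from the figures.

```python
def adjusted_scores(scores, ratios):
    """Step S35 (sketch): multiply each per-word score by its scoring ratio."""
    return [s * r for s, r in zip(scores, ratios)]


def total_score(scores, weights):
    """Step S36 (sketch): sum of the adjusted per-word scores."""
    total_weight = sum(weights)
    # Assumed normalization: ratio = weight / sum of weights.
    ratios = [w / total_weight for w in weights]
    return sum(adjusted_scores(scores, ratios))


# Hypothetical weights for a three-word sentence (first word is important).
weights = [2.0, 1.0, 1.0]

# Hypothetical per-word scores (out of 10) for two evaluation items.
item_scores = {
    "intonation": [8.0, 4.0, 6.0],
    "volume": [10.0, 10.0, 4.0],
}

# The adjustment and the total are computed separately for each evaluation item.
item_totals = {item: total_score(scores, weights) for item, scores in item_scores.items()}
```

For the intonation scores, an unweighted mean would be 6.0, whereas the weighted total is 6.5, because the good score on the heavily weighted first word counts for more. A mistake on that word would correspondingly cost more.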

FIG. 8 is a conceptual diagram showing an example in which the per-word scores are adjusted by the scoring ratios and the total score for the reading of the whole sentence is calculated. In the example of FIG. 8, the weight ratios are set to "high : low = 2 : 1" and "1.0 second : 0.5 second = 2 : 1", whereby the weights of words W1 to W7 are determined to be 2, 2, 1, 1, 1, 2, and 1, respectively. The scores of words W1 to W7 (6, 5, 4, 8, 7, 10, and 6 points out of 10) are each multiplied by the scoring ratio of the corresponding word, and the sum of the adjusted per-word scores is calculated as the total score (7.23 out of 10 points).

Next, the control unit 3 calculates an overall score for the reading of the whole sentence based on the total pause score calculated in step S28 and the total score of each evaluation item calculated in step S36 (step S37). The control unit 3 then causes the display D to show a screen presenting information that indicates the evaluation of the speaker's reading, based on the scores and other information obtained in steps S21 to S37 (step S38).
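Step S37 can be sketched as below. The passage does not specify how the total pause score and the per-item total scores are combined, so a plain arithmetic mean is assumed here purely for illustration; the function name `overall_score` and all numbers are hypothetical.

```python
def overall_score(item_totals, pause_total):
    """Step S37 (sketch): combine per-item totals and the total pause score.

    Assumption: every evaluation item and the pause score count equally,
    so the overall score is their arithmetic mean.
    """
    values = list(item_totals.values()) + [pause_total]
    return sum(values) / len(values)


# Hypothetical totals (out of 10) for the four weighted evaluation items.
item_totals = {
    "intonation": 6.5,
    "volume": 8.5,
    "articulation": 7.0,
    "speed": 6.0,
}
pause_total = 7.0  # hypothetical total pause score from step S28

overall = overall_score(item_totals, pause_total)
```

Note that only the per-word items pass through the weighting of steps S32 to S35; the pause score enters the overall score directly, as in the text.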

As described above, according to the second embodiment, the reading aloud evaluation device S specifies the importance of each word from the word importance database, determines for each word a scoring ratio corresponding to a weight that reflects the specified importance, adjusts the per-word scores evaluated for predetermined evaluation items of the reading by the determined scoring ratios, and calculates a total score for the reading of the whole sentence including the plurality of words based on the adjusted per-word scores. The importance of each word can therefore be specified quickly from the word importance database, and the evaluation can take into account whether each part is important in context. As a result, the speaker learns which positions (word portions) are important in context, knows where to focus, and finds the calculated score more convincing. For example, instead of practicing the entire announcement uniformly, the speaker can concentrate on the positions that are important in context.

Description of symbols
1 Communication unit
2 Storage unit
3 Control unit
4 Operation unit
5 Interface unit
6 Bus
S Reading aloud evaluation device

Claims

A reading aloud evaluation apparatus comprising:
acquisition means for acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of clauses that consists of one or more clauses and is spoken in one breath;
first determining means for determining a weight for each phrase;
second determining means for determining, for each phrase, a scoring ratio according to the weight of each phrase determined by the first determining means; and
calculation means for adjusting, for each phrase, the score acquired by the acquisition means by the scoring ratio determined by the second determining means, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted score of each phrase.

A reading aloud evaluation apparatus comprising:
first acquisition means for acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of clauses that consists of one or more clauses and is spoken in one breath;
first determining means for determining a weight for each phrase;
second determining means for determining, for each phrase, a scoring ratio according to the weight of each phrase determined by the first determining means;
second acquisition means for acquiring a pause score calculated for an interval section from the end timing of any one of the phrases to the start timing of the next phrase; and
calculation means for adjusting, for each phrase, the score acquired by the first acquisition means by the scoring ratio determined by the second determining means, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted score of each phrase and the pause score acquired by the second acquisition means.

The reading aloud evaluation apparatus according to claim 1 or 2, further comprising display control means for dividing the sentence into the plurality of phrases and displaying them on a screen, and for displaying a phrase having a relatively high weight among the plurality of phrases in a display mode different from that of the other phrases.

The reading aloud evaluation apparatus according to claim 3, wherein the display control means displays, on the screen and in association with each phrase, at least one of the score acquired for each phrase and the adjusted score of each phrase.

The reading aloud evaluation apparatus according to any one of claims 1 to 4, further comprising:
first input means for inputting text data of the sentence;
decomposing means for decomposing the sentence indicated by the text data input by the first input means into a plurality of words; and
first specifying means for specifying the importance of each of the decomposed words by referring to reference information that defines the importance of words,
wherein the first determining means determines, for each phrase, the weight of the phrase composed of the words based on the importance of the words specified by the first specifying means.

The reading aloud evaluation apparatus according to claim 5, further comprising:
second input means for inputting reference speech waveform data indicating the waveform of speech that serves as the reference for score calculation for reading the sentence aloud; and
second specifying means for specifying, for each phrase, the time length of the phrase identified based on the speech waveform indicated by the reference speech waveform data input by the second input means,
wherein the first determining means determines, for each phrase, the weight of the phrase composed of the words based on the importance of the words specified by the first specifying means and the time length of the phrase specified by the second specifying means.

A reading aloud evaluation apparatus comprising:
first acquisition means for acquiring a score calculated for each clause based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of clauses;
first determining means for determining a weight for each clause;
second determining means for determining, for each clause, a scoring ratio according to the weight of each clause determined by the first determining means;
second acquisition means for acquiring a pause score calculated for an interval section from the end timing of any one of the plurality of clauses to the start timing of the next clause; and
calculation means for adjusting, for each clause, the score acquired by the first acquisition means by the scoring ratio determined by the second determining means, and calculating a total score for the reading aloud of the entire sentence including the plurality of clauses based on the adjusted score of each clause and the pause score acquired by the second acquisition means.

The reading aloud evaluation apparatus according to claim 7, further comprising display control means for dividing the sentence into the plurality of clauses and displaying them on a screen, and for displaying a clause having a relatively high weight among the plurality of clauses in a display mode different from that of the other clauses.

The reading aloud evaluation apparatus according to claim 8, wherein the display control means displays, on the screen and in association with each clause, at least one of the score acquired for each clause and the adjusted score of each clause.

The reading aloud evaluation apparatus according to any one of claims 7 to 9, further comprising:
first input means for inputting text data of the sentence;
decomposing means for decomposing the sentence indicated by the text data input by the first input means into a plurality of words; and
first specifying means for specifying the importance of each of the decomposed words by referring to reference information that defines the importance of words,
wherein the first determining means determines, for each clause, the weight of the clause composed of the words based on the importance of the words specified by the first specifying means.

The reading aloud evaluation apparatus according to claim 10, further comprising:
second input means for inputting reference speech waveform data indicating the waveform of speech that serves as the reference for score calculation for reading the sentence aloud; and
second specifying means for specifying, for each clause, the time length of the clause identified based on the speech waveform indicated by the reference speech waveform data input by the second input means,
wherein the first determining means determines, for each clause, the weight of the clause composed of the words based on the importance of the words specified by the first specifying means and the time length of the clause specified by the second specifying means.

A reading aloud evaluation method executed by one or more computers, the method comprising:
an acquisition step of acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of clauses that consists of one or more clauses and is spoken in one breath;
a first determination step of determining a weight for each phrase;
a second determination step of determining, for each phrase, a scoring ratio according to the weight of each phrase determined in the first determination step; and
a calculation step of adjusting, for each phrase, the score acquired in the acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted score of each phrase.

A reading aloud evaluation method executed by one or more computers, the method comprising:
a first acquisition step of acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of clauses that consists of one or more clauses and is spoken in one breath;
a first determination step of determining a weight for each phrase;
a second determination step of determining, for each phrase, a scoring ratio according to the weight of each phrase determined in the first determination step;
a second acquisition step of acquiring a pause score calculated for an interval section from the end timing of any one of the phrases to the start timing of the next phrase; and
a calculation step of adjusting, for each phrase, the score acquired in the first acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted score of each phrase and the pause score acquired in the second acquisition step.

A program that causes a computer to execute:
an acquisition step of acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of clauses that consists of one or more clauses and is spoken in one breath;
a first determination step of determining a weight for each phrase;
a second determination step of determining, for each phrase, a scoring ratio according to the weight of each phrase determined in the first determination step; and
a calculation step of adjusting, for each phrase, the score acquired in the acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted score of each phrase.

A program that causes a computer to execute:
a first acquisition step of acquiring a score calculated for each phrase based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of phrases, each phrase being a group of clauses that consists of one or more clauses and is spoken in one breath;
a first determination step of determining a weight for each phrase;
a second determination step of determining, for each phrase, a scoring ratio according to the weight of each phrase determined in the first determination step;
a second acquisition step of acquiring a pause score calculated for an interval section from the end timing of any one of the phrases to the start timing of the next phrase; and
a calculation step of adjusting, for each phrase, the score acquired in the first acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of phrases based on the adjusted score of each phrase and the pause score acquired in the second acquisition step.

A reading aloud evaluation method executed by one or more computers, the method comprising:
a first acquisition step of acquiring a score calculated for each clause based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of clauses;
a first determination step of determining a weight for each clause;
a second determination step of determining, for each clause, a scoring ratio according to the weight of each clause determined in the first determination step;
a second acquisition step of acquiring a pause score calculated for an interval section from the end timing of any one of the plurality of clauses to the start timing of the next clause; and
a calculation step of adjusting, for each clause, the score acquired in the first acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of clauses based on the adjusted score of each clause and the pause score acquired in the second acquisition step.

A program that causes a computer to execute:
a first acquisition step of acquiring a score calculated for each clause based on speech waveform data indicating the waveform of speech uttered when a speaker reads aloud a sentence including a plurality of clauses;
a first determination step of determining a weight for each clause;
a second determination step of determining, for each clause, a scoring ratio according to the weight of each clause determined in the first determination step;
a second acquisition step of acquiring a pause score calculated for an interval section from the end timing of any one of the plurality of clauses to the start timing of the next clause; and
a calculation step of adjusting, for each clause, the score acquired in the first acquisition step by the scoring ratio determined in the second determination step, and calculating a total score for the reading aloud of the entire sentence including the plurality of clauses based on the adjusted score of each clause and the pause score acquired in the second acquisition step.