JP6531654B2

JP6531654B2 - Speech reading evaluation device, display control method, and program

Info

Publication number: JP6531654B2
Application number: JP2016002550A
Authority: JP
Inventors: 林　宏一
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2016-01-08
Filing date: 2016-01-08
Publication date: 2019-06-19
Anticipated expiration: 2036-01-08
Also published as: JP2017122880A

Description

The present invention relates to the technical field of systems and the like that evaluate the reading aloud of a sentence based on the speech uttered by a speaker when reading the sentence aloud.

In recent years, systems have become known that, for the purpose of supporting practice in language learning, announcing, singing, and the like, calculate evaluation points for intonation, volume, and so on for each of a plurality of sentence elements (for example, phrases or words), and then calculate and display an overall evaluation point for the reading aloud based on those results. For example, the technique disclosed in Patent Document 1 scores the singing for each section of a song based on the intonation, volume, and the like extracted from the singer's singing voice signal, and displays a total score obtained from the scores of the sections (see FIG. 3 of Patent Document 1).

Patent Document 1: Japanese Patent Application Laid-Open No. 10-078749

However, with the conventional technology, while the overall evaluation point for a speaker's reading aloud is displayed, it has been difficult to display the details of the evaluation points for intonation, volume, and the like calculated in a particular sentence element section in an easier-to-view display mode, in association with the text or the like representing the sentence element in that section. The section in question is, for example, one that the speaker especially wants to check among the sentence element sections corresponding to the sentence elements constituting the sentence read aloud.

The present invention has been made in view of the above, and provides a reading-aloud evaluation device, a display control method, and a program capable of displaying, in an easier-to-view display mode, the details of the evaluation points calculated in, for example, any section that the speaker especially wants to check.

In order to solve the above problem, the invention according to claim 1 comprises: evaluation point calculation means for calculating, based on speech waveform data representing the waveform of the speech uttered by a speaker when reading a sentence aloud, an evaluation point for the reading aloud for each section of at least one of a sentence element section, running from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, running from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element; overall evaluation point calculation means for calculating an overall evaluation point for the reading aloud based on the evaluation points for the respective sections calculated by the evaluation point calculation means; first display control means for displaying at least part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of the respective sentence element section; second display control means for displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element section; third display control means for displaying the overall evaluation point calculated by the overall evaluation point calculation means in a third display area; and fourth display control means for, when any one region among the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, popping up a first window that displays the evaluation point calculated in the section corresponding to the designated region, and, when the first display area is designated by a user operation, popping up a third window that displays the entire text of which at least part is displayed in the designated first display area.
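The layout described in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Element`, `Region`, `layout`, and `hit_test`, and the `px_per_sec` scale factor, are all hypothetical. The sketch assumes each sentence element has known start and end times, gives each first display area a width proportional to the element's duration, treats any gap before the next element as a space area, and resolves a pointer position to the region it falls in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A sentence element with its start and end timing in seconds."""
    text: str
    start: float
    end: float


@dataclass(frozen=True)
class Region:
    """A horizontal span on the time axis, in pixels."""
    kind: str    # "element" (first display area) or "space" (space area)
    index: int   # element index, or index of the element preceding the space
    x0: float
    x1: float


def layout(elements, px_per_sec=100.0):
    """Place element regions and the space regions between them along the time axis."""
    regions = []
    for i, e in enumerate(elements):
        # Width is proportional to the time length of the sentence element section.
        regions.append(Region("element", i, e.start * px_per_sec, e.end * px_per_sec))
        if i + 1 < len(elements):
            nxt = elements[i + 1]
            if nxt.start > e.end:
                # The interval section becomes a space area between two first display areas.
                regions.append(Region("space", i, e.end * px_per_sec, nxt.start * px_per_sec))
    return regions


def hit_test(regions, x):
    """Return the region containing pixel position x, or None if there is none."""
    for r in regions:
        if r.x0 <= x < r.x1:
            return r
    return None
```

A caller would pass the region returned by `hit_test` to whatever code looks up the evaluation point for that section and opens the pop-up window.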

The invention according to claim 2 comprises: evaluation point calculation means for calculating, based on speech waveform data representing the waveform of the speech uttered by a speaker when reading a sentence aloud, an evaluation point for the reading aloud for each section of at least one of a sentence element section, running from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, running from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element; overall evaluation point calculation means for calculating an overall evaluation point for the reading aloud based on the evaluation points for the respective sections calculated by the evaluation point calculation means; first display control means for displaying at least part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of the respective sentence element section; second display control means for displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element section; third display control means for displaying the overall evaluation point calculated by the overall evaluation point calculation means in a third display area; fourth display control means for, when any one region among the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, popping up a first window that displays the evaluation point calculated in the section corresponding to the designated region; section evaluation point calculation means for calculating, for each sentence element section, a section evaluation point for the reading aloud based on the evaluation points calculated for that sentence element section, those evaluation points corresponding respectively to at least one of intonation, volume, articulation, and speed preset as evaluation items for the reading aloud; and fifth display control means for displaying an icon representing each section evaluation point calculated by the section evaluation point calculation means, for each sentence element section along the time axis, in a fourth display area arranged in correspondence with each first display area; wherein, when the icon is designated by a user operation, the fourth display control means pops up a fourth window that displays the evaluation points calculated in the sentence element section corresponding to the designated icon, namely the evaluation points for each of at least one of intonation, volume, articulation, and speed preset as evaluation items for the reading aloud.
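Claim 2 does not state how the section evaluation point is derived from the per-item evaluation points, nor how it maps to an icon. The following Python sketch is therefore only one possible reading. It assumes the section evaluation point is the arithmetic mean of whichever item scores are present, and that icons are chosen by fixed thresholds. The item keys, the 80 and 50 thresholds, and the icon names are hypothetical.

```python
from statistics import mean

# Evaluation items named in the claim: intonation, volume, articulation (滑舌), speed.
ITEMS = ("intonation", "volume", "articulation", "speed")


def section_score(item_scores):
    """Combine the available item scores of one sentence element section (assumed: mean)."""
    present = [item_scores[name] for name in ITEMS if name in item_scores]
    if not present:
        raise ValueError("at least one evaluation item is required")
    return mean(present)


def icon_for(score):
    """Map a section evaluation point to an icon name (thresholds are illustrative)."""
    if score >= 80:
        return "good"
    if score >= 50:
        return "fair"
    return "poor"
```

For example, a section with intonation 90 and volume 70 would receive a section evaluation point of 80 under this assumption and be drawn with the "good" icon in the fourth display area.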

The invention according to claim 3 comprises: evaluation point calculation means for calculating, based on speech waveform data representing the waveform of the speech uttered by a speaker when reading a sentence aloud, an evaluation point for the reading aloud for each section of at least one of a sentence element section, running from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, running from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element; overall evaluation point calculation means for calculating an overall evaluation point for the reading aloud based on the evaluation points for the respective sections calculated by the evaluation point calculation means; first display control means for displaying at least part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of the respective sentence element section; second display control means for displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element section; third display control means for displaying the overall evaluation point calculated by the overall evaluation point calculation means in a third display area; and fourth display control means for, when any one region among the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, popping up a first window that displays the evaluation point calculated in the section corresponding to the designated region; wherein, in response to the user operation, the fourth display control means pops up the first window when a mouse pointer is placed over the one region, erases the first window when the mouse pointer leaves the one region, continues displaying the first window when an operation button of the mouse is clicked while the mouse pointer is over the one region, pops up, while display of the first window continues, a second window displaying the evaluation point calculated in the section corresponding to another region different from the one region when the mouse pointer is placed over that other region, erases the second window when the mouse pointer leaves the other region, and continues displaying the second window when the operation button of the mouse is clicked while the mouse pointer is over the other region.

The invention according to claim 4 is the reading-aloud evaluation device according to claim 3, wherein, while display of the first window continues, the fourth display control means erases the first window when the operation button of the mouse is clicked again while the mouse pointer is over the one region.
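The hover and click behaviour of claims 3 and 4 can be sketched as a small state machine. This is an illustrative Python sketch only: the class `PopupController` and its method names are hypothetical, regions are identified by arbitrary hashable labels, and no real GUI toolkit is involved. One assumption goes beyond the claim text: after a second click erases a pinned window, the window stays hidden until the pointer leaves and re-enters the region.

```python
class PopupController:
    """Tracks which score pop-up windows are visible for hover and click events."""

    def __init__(self):
        self.hovered = None     # region currently under the mouse pointer
        self.pinned = set()     # regions whose window stays after the pointer leaves
        self.suppressed = None  # region just unpinned; hidden until the pointer leaves

    def pointer_enter(self, region):
        if region != self.hovered:
            self.suppressed = None
        self.hovered = region

    def pointer_leave(self, region):
        if self.hovered == region:
            self.hovered = None
        if self.suppressed == region:
            self.suppressed = None

    def click(self):
        region = self.hovered
        if region is None:
            return
        if region in self.pinned:
            # Claim 4: clicking again over a pinned region erases its window.
            self.pinned.remove(region)
            self.suppressed = region
        else:
            # Claim 3: clicking while hovering keeps the window displayed.
            self.pinned.add(region)
            self.suppressed = None

    def visible(self):
        """Return the set of regions whose pop-up window is currently shown."""
        shown = set(self.pinned)
        if self.hovered is not None and self.hovered != self.suppressed:
            shown.add(self.hovered)
        return shown
```

Keeping the pinned windows in a set lets the first and second windows remain on screen together, which is what allows the scores of two sections to be compared side by side.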

The invention according to claim 5 is the reading-aloud evaluation device according to any one of claims 1 to 4, wherein, when the first display area or the second display area is designated by a user operation, the fourth display control means pops up a first window that displays the evaluation point calculated in the section corresponding to the designated first display area or second display area, namely the evaluation point for at least one of intonation, volume, articulation, and speed preset as evaluation items for the reading aloud.

The invention according to claim 6 is the reading-aloud evaluation device according to any one of claims 1 to 4, wherein, when the space area is designated by a user operation, the fourth display control means pops up a first window that displays the evaluation point calculated in the section corresponding to the designated space area, namely the evaluation point for pause timing preset as an evaluation item for the reading aloud.
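Claims 5 and 6 together determine which evaluation items the first window shows for each kind of designated region. A minimal Python sketch of that selection follows. The function name `popup_lines`, the region kind labels, and the score dictionary keys (including `"pause"` for 間合い and `"articulation"` for 滑舌) are hypothetical choices made for illustration.

```python
# Items shown for a first or second display area (claim 5).
ELEMENT_ITEMS = ("intonation", "volume", "articulation", "speed")
# Item shown for a space area between first display areas (claim 6).
SPACE_ITEMS = ("pause",)


def popup_lines(kind, scores):
    """Return the text lines of the first window for a designated region."""
    if kind in ("text", "graph"):
        wanted = ELEMENT_ITEMS
    elif kind == "space":
        wanted = SPACE_ITEMS
    else:
        raise ValueError(f"unknown region kind: {kind}")
    # Only items that were actually scored for this section are listed.
    return [f"{name}: {scores[name]}" for name in wanted if name in scores]
```

Under these assumptions, `popup_lines("space", {"pause": 65})` yields a single line for the pause score, while a text or graph region lists whichever of the four element items were scored.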

The invention according to claim 7 is a display control method executed by one or more computers, comprising: an evaluation point calculation step of calculating, based on speech waveform data representing the waveform of the speech uttered by a speaker when reading a sentence aloud, an evaluation point for the reading aloud for each section of at least one of a sentence element section, running from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, running from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element; an overall evaluation point calculation step of calculating an overall evaluation point for the reading aloud based on the evaluation points for the respective sections calculated in the evaluation point calculation step; a first display control step of displaying at least part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of the respective sentence element section; a second display control step of displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element section; a third display control step of displaying the overall evaluation point calculated in the overall evaluation point calculation step in a third display area; and a fourth display control step of, when any one region among the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, popping up a first window that displays the evaluation point calculated in the section corresponding to the designated region, and, when the first display area is designated by a user operation, popping up a third window that displays the entire text of which at least part is displayed in the designated first display area.

The invention according to claim 8 is a display control method executed by one or more computers, comprising: an evaluation point calculation step of calculating, based on speech waveform data representing the waveform of the speech uttered by a speaker when reading a sentence aloud, an evaluation point for the reading aloud for each section of at least one of a sentence element section, running from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, running from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element; an overall evaluation point calculation step of calculating an overall evaluation point for the reading aloud based on the evaluation points for the respective sections calculated in the evaluation point calculation step; a first display control step of displaying at least part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of the respective sentence element section; a second display control step of displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element section; a third display control step of displaying the overall evaluation point calculated in the overall evaluation point calculation step in a third display area; a fourth display control step of, when any one region among the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, popping up a first window that displays the evaluation point calculated in the section corresponding to the designated region; a section evaluation point calculation step of calculating, for each sentence element section, a section evaluation point for the reading aloud based on the evaluation points calculated for that sentence element section, those evaluation points corresponding respectively to at least one of intonation, volume, articulation, and speed preset as evaluation items for the reading aloud; and a fifth display control step of displaying an icon representing each section evaluation point calculated in the section evaluation point calculation step, for each sentence element section along the time axis, in a fourth display area arranged in correspondence with each first display area; wherein, in the fourth display control step, when the icon is designated by a user operation, a fourth window is popped up that displays the evaluation points calculated in the sentence element section corresponding to the designated icon, namely the evaluation points for each of at least one of intonation, volume, articulation, and speed preset as evaluation items for the reading aloud.

The invention according to claim 9 is a display control method executed by one or more computers, comprising: an evaluation point calculation step of calculating, based on speech waveform data representing the waveform of the speech uttered by a speaker when reading a sentence aloud, an evaluation point for the reading aloud for each section of at least one of a sentence element section, running from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, running from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element; an overall evaluation point calculation step of calculating an overall evaluation point for the reading aloud based on the evaluation points for the respective sections calculated in the evaluation point calculation step; a first display control step of displaying at least part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of the respective sentence element section; a second display control step of displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element section; a third display control step of displaying the overall evaluation point calculated in the overall evaluation point calculation step in a third display area; and a fourth display control step of, when any one region among the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, popping up a first window that displays the evaluation point calculated in the section corresponding to the designated region; wherein, in the fourth display control step, in response to the user operation, the first window is popped up when a mouse pointer is placed over the one region, the first window is erased when the mouse pointer leaves the one region, display of the first window is continued when an operation button of the mouse is clicked while the mouse pointer is over the one region, a second window displaying the evaluation point calculated in the section corresponding to another region different from the one region is popped up when, while display of the first window continues, the mouse pointer is placed over that other region, the second window is erased when the mouse pointer leaves the other region, and display of the second window is continued when the operation button of the mouse is clicked while the mouse pointer is over the other region.

請求項１０に記載の発明は、話者が文を音読したときに発した音声の波形を示す音声波形データに基づいて、前記音読された文を構成する各文要素の開始タイミングから終了タイミングまでの文要素区間と、複数の前記文要素のうち何れかの前記文要素の終了タイミングから次の前記文要素の開始タイミングまでのインターバル区間との少なくとも何れか一方の区間毎に前記音読に対する評価点を算出する評価点算出ステップと、前記評価点算出ステップにより算出された前記区間毎の評価点に基づいて、前記音読に対する総合評価点を算出する総合評価点算出ステップと、前記音読された文を構成する各文要素を表すテキストの少なくとも一部を、それぞれの前記文要素区間の時間長に応じた長さの第１の表示領域に時間軸に沿って前記文要素区間毎に表示させる第１表示制御ステップと、前記文要素区間の時間長より短い所定時間間隔毎に前記音声波形データに基づいて特定された音高と音圧との少なくとも何れか一方の音要素の時系列的な変化を表すグラフを第２の表示領域に前記時間軸に沿って表示させる第２表示制御ステップと、前記総合評価点算出ステップにより算出された総合評価点を第３の表示領域に表示させる第３表示制御ステップと、前記第１の表示領域、前記第２の表示領域、及び複数の前記第１の表示領域間に前記時間軸に沿って位置するスペース領域のうち、何れか一の領域がユーザ操作により指定された場合、指定された前記何れか一の領域に対応する前記区間において算出された前記評価点を表示する第１のウインドウをポップアップ表示させ、前記第１の表示領域がユーザ操作により指定された場合、指定された前記第１の表示領域に少なくとも一部が表示された前記テキストの全部を表示する第３のウインドウをポップアップ表示させる第４表示制御ステップと、をコンピュータに実行させることを特徴とする。請求項１１に記載の発明は、話者が文を音読したときに発した音声の波形を示す音声波形データに基づいて、前記音読された文を構成する各文要素の開始タイミングから終了タイミングまでの文要素区間と、複数の前記文要素のうち何れかの前記文要素の終了タイミングから次の前記文要素の開始タイミングまでのインターバル区間との少なくとも何れか一方の区間毎に前記音読に対する評価点を算出する評価点算出ステップと、前記評価点算出ステップにより算出された前記区間毎の評価点に基づいて、前記音読に対する総合評価点を算出する総合評価点算出ステップと、前記音読された文を構成する各文要素を表すテキストの少なくとも一部を、それぞれの前記文要素区間の時間長に応じた長さの第１の表示領域に時間軸に沿って前記文要素区間毎に表示させる第１表示制御ステップと、前記文要素区間の時間長より短い所定時間間隔毎に前記音声波形データに基づいて特定された音高と音圧との少なくとも何れか一方の音要素の時系列的な変化を表すグラフを第２の表示領域に前記時間軸に沿って表示させる第２表示制御ステップと、前記総合評価点算出ステップにより算出された総合評価点を第３の表示領域に表示させる第３表示制御ステップと、前記第１の表示領域、前記第２の表示領域、及び複数の前記第１の表示領域間に前記時間軸に沿って位置するスペース領域のうち、何れか一の領域がユーザ操作により指定された場合、指定された前記何れか一の領域に対応する前記区間において算出された前記評価点を表示する第１のウインドウをポップアップ表示させる第４表示制御ステップと、前記音読に対する評価項目として予め設定された抑揚、音量、滑舌、及び速度のうち少なくとも何れか一つのそれぞれに対応する前記評価点であって、前記文要素区間毎に算出された前記評価点に基づいて、前記音読に対する区間評価点を前記文要素区間毎に算出する区間評価点算出ステップと、前記区間評価点算出ステップにより算出された各区間評価点を表すアイコンを、前記第１の表示領域毎に対応して配置された第４の表示領域に前記時間軸に沿って前記文要素区間毎に表示させる第５表示制御ステップと、をコンピュータに実行させるプログラムであって、前記第４表示制御ステップにおいては、前記アイコンがユーザ操作により指定された場合、前記指定された前記アイコンに対応する前記文要素区間において算出された評価点であって、前記音読に対する評価項目として予め設定された抑揚、音量、滑舌、及び速度のうち少なくとも何れか一つのそれぞれに対する前記評価点を表示する第４のウインドウをポップアップ表示させることを特徴とする。請求項１２に記載の発明は、話者が文を音読したときに発した音声の波形を示す音声波形データに基づいて、前記音読された文を構成する各文要素の開始タイミングから終了タイミングまでの文要素区間と、複数の前記文要素の
うち何れかの前記文要素の終了タイミングから次の前記文要素の開始タイミングまでのインターバル区間との少なくとも何れか一方の区間毎に前記音読に対する評価点を算出する評価点算出ステップと、前記評価点算出ステップにより算出された前記区間毎の評価点に基づいて、前記音読に対する総合評価点を算出する総合評価点算出ステップと、前記音読された文を構成する各文要素を表すテキストの少なくとも一部を、それぞれの前記文要素区間の時間長に応じた長さの第１の表示領域に時間軸に沿って前記文要素区間毎に表示させる第１表示制御ステップと、前記文要素区間の時間長より短い所定時間間隔毎に前記音声波形データに基づいて特定された音高と音圧との少なくとも何れか一方の音要素の時系列的な変化を表すグラフを第２の表示領域に前記時間軸に沿って表示させる第２表示制御ステップと、前記総合評価点算出ステップにより算出された総合評価点を第３の表示領域に表示させる第３表示制御ステップと、前記第１の表示領域、前記第２の表示領域、及び複数の前記第１の表示領域間に前記時間軸に沿って位置するスペース領域のうち、何れか一の領域がユーザ操作により指定された場合、指定された前記何れか一の領域に対応する前記区間において算出された前記評価点を表示する第１のウインドウをポップアップ表示させる第４表示制御ステップと、をコンピュータに実行させるプログラムであって、前記第４表示制御ステップにおいては、前記ユーザ操作に応じて、前記一の領域にマウスのポインタが重畳されることにより前記第１のウインドウをポップアップ表示させ、前記マウスのポインタが前記一の領域から離れることにより前記第１のウインドウを消去させ、前記一の領域に前記マウスのポインタが重畳されている状態で前記マウスの操作ボタンがクリックされることにより前記第１のウインドウの表示を継続させ、前記第１のウインドウの表示が継続している間に前記何れか一の領域とは異なる他の領域にマウスのポインタが重畳されることにより前記他の領域に対応する前記区間において算出された前記評価点を表示する第２のウインドウをポップアップ表示させ、前記マウスのポインタが前記他の領域から離れることにより前記第２のウインドウを消去させ、前記他の領域に前記マウスのポインタが重畳されている状態で前記マウスの操作ボタンがクリックされることにより前記第２のウインドウの表示を継続させることを特徴とする。 The invention according to claim 10 is characterized in that, from the start timing to the end timing of each sentence element constituting the sentence read aloud based on speech waveform data indicating the waveform of the speech emitted when the speaker read the sentence aloud Score for the reading aloud at least one of the sentence element section and the interval section from the end timing of the sentence element of any of the plurality of sentence elements to the start timing of the next sentence element Evaluation point calculating step for calculating a total evaluation point calculating step for calculating the comprehensive evaluation point for the reading on the basis of the evaluation point for each section calculated by the evaluation point calculating step; At least a portion of the text representing each of the constituent sentence elements in the first display area of a length corresponding to the time length of each of the sentence element 
sections along the time axis A first display control step of displaying each raw segment, and at least one of a pitch and a sound pressure specified based on the voice waveform data at predetermined time intervals shorter than the time length of the sentence element section A second display control step of displaying a graph representing a time-series change of an element in a second display area along the time axis; and a third display of the comprehensive evaluation point calculated in the comprehensive evaluation point calculating step Any of a third display control step of displaying in an area, the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas or if one area is designated by the user operation, to pop up a first window displaying the evaluation points calculated in the section corresponding to the specified the any one of the regions, before If the first display area is designated by a user operation, the fourth display control to pop up a third window that displays all of the text at least part of which is displayed in the specified the first display regions And causing the computer to execute the steps. 
The invention according to claim 11 is a program that causes a computer to execute: an evaluation point calculating step of calculating, based on speech waveform data representing the waveform of the speech uttered when a speaker read a sentence aloud, an evaluation point for the reading for each of at least one of the sentence element sections, each running from the start timing to the end timing of a sentence element constituting the sentence read aloud, and the interval sections, each running from the end timing of one of the sentence elements to the start timing of the next sentence element; a comprehensive evaluation point calculating step of calculating a comprehensive evaluation point for the reading based on the evaluation points calculated for the respective sections in the evaluation point calculating step; a first display control step of displaying at least a part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of that sentence element section; a second display control step of displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure specified from the speech waveform data at predetermined time intervals shorter than the time length of a sentence element section; a third display control step of displaying the comprehensive evaluation point calculated in the comprehensive evaluation point calculating step in a third display area; a fourth display control step of, when any one of the first display area, the second display area, and a space area located along the time axis between the first display areas is designated by a user operation, popping up a first window that displays the evaluation points calculated in the section corresponding to the designated area; a section evaluation point calculating step of calculating a section evaluation point for the reading for each sentence element section, based on the evaluation points that are calculated for each sentence element section and that correspond to each of at least one of intonation, volume, articulation, and speed preset as evaluation items for the reading; and a fifth display control step of displaying an icon representing each section evaluation point calculated in the section evaluation point calculating step, for each sentence element section along the time axis, in a fourth display area arranged in correspondence with each first display area. In the fourth display control step, when an icon is designated by a user operation, a fourth window pops up that displays the evaluation points, in the sentence element section corresponding to the designated icon, for each of at least one of intonation, volume, articulation, and speed preset as evaluation items for the reading.
The invention according to claim 12 is a program that causes a computer to execute: an evaluation point calculating step of calculating, based on speech waveform data representing the waveform of the speech uttered when a speaker read a sentence aloud, an evaluation point for the reading for each of at least one of the sentence element sections, each running from the start timing to the end timing of a sentence element constituting the sentence read aloud, and the interval sections, each running from the end timing of one of the sentence elements to the start timing of the next sentence element; a comprehensive evaluation point calculating step of calculating a comprehensive evaluation point for the reading based on the evaluation points calculated for the respective sections in the evaluation point calculating step; a first display control step of displaying at least a part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of that sentence element section; a second display control step of displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure specified from the speech waveform data at predetermined time intervals shorter than the time length of a sentence element section; a third display control step of displaying the comprehensive evaluation point calculated in the comprehensive evaluation point calculating step in a third display area; and a fourth display control step of, when any one of the first display area, the second display area, and a space area located along the time axis between the first display areas is designated by a user operation, popping up a first window that displays the evaluation points calculated in the section corresponding to the designated area. In the fourth display control step, in response to the user operation: the first window pops up when the mouse pointer is placed over the one area; the first window is erased when the mouse pointer leaves the one area; the display of the first window is continued when a mouse button is clicked while the mouse pointer is over the one area; while the first window remains displayed, a second window displaying the evaluation points calculated in the section corresponding to another area, different from the one area, pops up when the mouse pointer is placed over that other area; the second window is erased when the mouse pointer leaves the other area; and the display of the second window is continued when the mouse button is clicked while the mouse pointer is over the other area.
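
The pointer behaviour recited for claim 12 can be pictured as a small state machine. The following is only a minimal sketch under stated assumptions: the class name `PopupController`, its method names, and the string area identifiers are hypothetical and do not come from the patent, and no real GUI toolkit is involved.

```python
class PopupController:
    """Tracks which evaluation-point windows are shown (hypothetical helper)."""

    def __init__(self):
        self.hovered = None   # area currently under the mouse pointer, if any
        self.pinned = set()   # areas whose window was kept open by a click

    def pointer_enter(self, area):
        # Pointer placed over an area: its window pops up.
        self.hovered = area

    def pointer_leave(self, area):
        # Pointer leaves the area: its window is erased unless it was pinned.
        if self.hovered == area:
            self.hovered = None

    def click(self):
        # Click while hovering: the window for that area stays displayed.
        if self.hovered is not None:
            self.pinned.add(self.hovered)

    def visible(self):
        # Windows shown now: every pinned one plus the hovered one.
        shown = set(self.pinned)
        if self.hovered is not None:
            shown.add(self.hovered)
        return shown
```

Pinning one window and then hovering over another area leaves both windows visible, which is what lets the evaluation points of two sections be compared side by side.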

According to the inventions of claims 1, 7, and 10, even when the text does not fit within the display area, the speaker or other user can grasp the content of the sentence element that the text represents.

According to the inventions of claims 2, 8, and 11, the speaker or other user can effectively grasp the breakdown of the section evaluation point indicated by the design of the icon.

According to the inventions of claims 3, 4, 9, and 12, the speaker or other user can compare the evaluation points of the respective sections and can see at a glance which sections were evaluated well or poorly.

According to the invention of claim 5, the details of the evaluation points calculated in whichever sentence element section the speaker or other user particularly wants to check can be displayed in an easier-to-read manner, in association with the text or the like representing the sentence element in that section.

According to the invention of claim 6, the details of the evaluation points calculated in whichever interval section the speaker or other user particularly wants to check can be displayed in an easier-to-read manner.

A diagram showing an example of the schematic configuration of the reading-aloud evaluation device S according to this embodiment.
A diagram showing a display example of a display screen on which the model display area 51, the speaker display area 52, the comprehensive evaluation point display area 53, and the like are arranged.
A diagram showing a display example of a display screen on which the model display area 51, the speaker display area 52, the comprehensive evaluation point display area 53, and the like are arranged.
A diagram showing a display example of a display screen on which the model display area 51, the speaker display area 52, the comprehensive evaluation point display area 53, and the like are arranged.
A flowchart showing the reading-aloud evaluation display processing of the control unit 3 in Example 1.
A flowchart showing the reading-aloud evaluation display processing of the control unit 3 in Example 2.

Hereinafter, embodiments of the present invention are described with reference to the drawings.

[1. Configuration and functions of the reading-aloud evaluation device S]
First, the configuration and functions of the reading-aloud evaluation device S according to an embodiment of the present invention are described with reference to FIG. 1. FIG. 1 shows an example of the schematic configuration of the reading-aloud evaluation device S according to this embodiment. Examples of the reading-aloud evaluation device include a personal computer and a portable information terminal (such as a smartphone). As shown in FIG. 1, the reading-aloud evaluation device S includes a communication unit 1, a storage unit 2, a control unit 3, an operation unit 4, an interface (IF) unit 5, and the like, and these components are connected to a bus 6. The operation unit 4 accepts an operation from the user (a user operation) and outputs a signal corresponding to that operation to the control unit 3. A mouse operation is one example of a user operation. When the display D is a touch-panel display, the user operation may be a touch operation with the user's finger, a pen, or the like. A microphone M, the display D, and the like are connected to the interface unit 5. The microphone M picks up the voice uttered when a speaker, that is, a person practicing speech for language learning, announcing, recitation, or the like, reads aloud a sentence (text) containing a plurality of sentence elements. A sentence element is a unit that makes up a sentence. Examples of sentence elements include phrases, clauses, and words, as well as combined phrases in which a plurality of phrases are joined, as described later. Here, a phrase is generally the unit read in a single breath when reading a text. A phrase consists of one or more clauses; that is, one phrase may consist of a single clause or of several clauses. A clause is, for example, a group of one or more words. Words include independent words such as nouns, verbs, adjectives, adverbs, and conjunctions (parts of speech that can form a clause on their own) and ancillary words such as auxiliary verbs and particles (parts of speech that cannot form a clause on their own). Examples of sentences to be read aloud include texts used in language learning, announcing, and recitation. In accordance with display commands from the control unit 3, the display D displays a display screen on which the display areas described later are arranged. The microphone M and the display D may be integrated with the reading-aloud evaluation device S or may be separate from it.

The communication unit 1 connects to a network (not shown) by wire or wirelessly and communicates with a server or the like. The storage unit 2 consists of, for example, a hard disk drive, and stores an OS (operating system), a reading-aloud evaluation display processing program (an example of the program of the present invention), and the like. The reading-aloud evaluation display processing program causes the control unit 3, acting as a computer, to execute the reading-aloud evaluation display processing described later. The program may be downloaded from a predetermined server as an application, or may be provided stored on a recording medium such as a CD or DVD. The storage unit 2 also stores text data of a sentence containing a plurality of sentence elements, and model speech waveform data representing the waveform of speech that serves as a model for reading that sentence aloud. In the text data, the text (characters) representing each sentence element of the sentence to be read aloud is, for example, delimited per sentence element. For example, the sentence elements are delimited by punctuation marks inserted between them. Alternatively, serial numbers may be assigned to the texts representing the sentence elements in order from the beginning. The model speech waveform data is stored in a predetermined audio file format.
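
As an illustration of text data delimited per sentence element, the sketch below splits a sentence at punctuation marks and numbers the resulting elements from the beginning. The function name `split_sentence_elements` and the default delimiter set are assumptions made for this example; the patent does not specify a data format.

```python
import re


def split_sentence_elements(text, delimiters="、。,."):
    """Split text at punctuation and return (serial_number, element) pairs.

    The delimiter set is an assumption; the source only says that
    punctuation marks between sentence elements may act as separators.
    """
    pattern = "[" + re.escape(delimiters) + "]"
    elements = [part.strip() for part in re.split(pattern, text)]
    elements = [part for part in elements if part]  # drop empty pieces
    return list(enumerate(elements, start=1))
```

Each returned pair combines the two options the paragraph mentions: the punctuation-based split and a serial number assigned in order from the beginning.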

The control unit 3 consists of a CPU (Central Processing Unit) serving as a computer, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. Under the reading-aloud evaluation display processing program, the control unit 3 functions as a voice processing unit 31, a reading aloud evaluation unit 32, and a display processing unit 33. The reading aloud evaluation unit 32 is an example of the evaluation point calculation means, the section evaluation point calculation means, and the comprehensive evaluation point calculation means of the present invention. The display processing unit 33 is an example of the first, second, third, fourth, and fifth display control means of the present invention.

The voice processing unit 31 reads, from the storage unit 2, the model speech waveform data stored in a predetermined audio file format as the data to be processed. The model speech waveform data thus read is stored in the RAM. The voice processing unit 31 also receives speaker speech waveform data representing the waveform of the speech that the speaker uttered when reading the above sentence aloud and that was picked up by the microphone M. The speaker speech waveform data thus received is stored in the RAM. The speech waveform data is discretized time-series sound pressure waveform data, for example monaural waveform data with a sampling rate of 44.1 kHz and 16-bit quantization. Sound pressure is the change (Pa) in air pressure caused by a sound wave. In this embodiment, the sound pressure used is the sound pressure level (dB), which expresses the magnitude of the effective sound pressure (Pa), that is, the root mean square (RMS) of the instantaneous sound pressure (Pa), as a value that is easy to handle in calculation. The sound pressure level (dB) is also called volume in a broad sense.
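
The relationship between instantaneous sound pressure, effective (RMS) sound pressure, and sound pressure level can be sketched as follows. This is an illustration only: the function name is hypothetical, the samples are assumed to be already calibrated in pascals, and the reference pressure of 20 µPa is the conventional value for sound in air rather than something stated in the patent.

```python
import math


def sound_pressure_level_db(samples_pa, p_ref=20e-6):
    """Return the sound pressure level (dB) of one block of samples.

    samples_pa: instantaneous sound pressure values in Pa (assumed calibrated).
    p_ref: reference pressure in Pa (assumption: 20 micropascals).
    """
    if not samples_pa:
        raise ValueError("need at least one sample")
    # Effective sound pressure = root mean square of the instantaneous values.
    rms = math.sqrt(sum(p * p for p in samples_pa) / len(samples_pa))
    if rms == 0.0:
        return float("-inf")  # complete silence has no finite level
    return 20.0 * math.log10(rms / p_ref)
```

Raw 16-bit PCM values are not pascals, so a real system would either apply a microphone calibration factor or work with a level relative to full scale; either choice is outside what the source specifies.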

Based on the model speech waveform data, the voice processing unit 31 specifies, for each sentence element, a model sentence element section running from the start timing to the end timing of that sentence element. It then stores in the RAM model sentence element section data indicating the model sentence element section specified for each sentence element, in association with the text representing that sentence element. The text representing these sentence elements is extracted, for example, from the text data associated with the model speech waveform data being processed. The model sentence element section data consists of, for example, a serial number indicating the position of the specified model sentence element section counted from the beginning, and the time range of that section (for example, 01:00-03:00). Similarly, based on the speaker speech waveform data, the voice processing unit 31 specifies, for each sentence element, a speaker sentence element section running from the start timing to the end timing of that sentence element. It then stores in the RAM speaker sentence element section data indicating the speaker sentence element section specified for each sentence element, in association with the text representing that sentence element. The text representing these sentence elements is extracted, for example, from the phonemes identified from the speech waveform represented by the speaker speech waveform data. Three examples of phonemes are a vowel alone, a consonant alone, and a combination of a consonant and a vowel. Methods of identifying phonemes, such as labeling techniques, are well known, so a detailed description is omitted. The speaker sentence element section data consists of, for example, a serial number indicating the position of the specified speaker sentence element section counted from the beginning, and the time range of that section.

Here, the start timing and the end timing may each be recognized from the speech waveform, or from the sound pressure level (dB) calculated as described above. For example, the voice processing unit 31 recognizes the point at which the amplitude of the speech waveform becomes equal to or greater than a predetermined value as a start timing. Alternatively, it recognizes the point at which the sound pressure level (dB) becomes equal to or greater than a predetermined value as a start timing. Likewise, the voice processing unit 31 recognizes, for example, the point at which the amplitude of the speech waveform falls below a predetermined value as an end timing. Alternatively, it recognizes the point at which the sound pressure level (dB) falls below a predetermined value as an end timing. Preferably, the point at which the sound pressure level (dB) falls below the predetermined value is recognized as an end timing, and the point at which it next becomes equal to or greater than the predetermined value is recognized as a start timing, only when the time between those two points (the silent time) is equal to or greater than a threshold (the same applies to the amplitude of the speech waveform). The purpose is to avoid dividing sentence elements at a silent stretch that is shorter than the threshold.
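
The threshold-based recognition of start and end timings, including the rule that short silences do not split a sentence element, can be sketched as below. The function name `find_sections`, the frame-index representation, and the handling of a section still open at the end of the data are assumptions made for this illustration.

```python
def find_sections(levels_db, threshold_db, min_silence_frames):
    """Return (start, end) frame-index pairs of voiced sections; end is exclusive.

    levels_db: one sound pressure level per analysis frame.
    A run of below-threshold frames ends a section only if it lasts at
    least min_silence_frames frames; shorter gaps stay inside the section.
    """
    sections = []
    start = None          # first frame of the section being built
    silence_start = None  # first frame of the current below-threshold run
    for i, level in enumerate(levels_db):
        if level >= threshold_db:
            if start is None:
                start = i                 # start timing
            silence_start = None          # any short gap is absorbed
        elif start is not None:
            if silence_start is None:
                silence_start = i
            if i - silence_start + 1 >= min_silence_frames:
                sections.append((start, silence_start))  # end timing
                start = None
                silence_start = None
    if start is not None:
        # Assumption: a section still open at the end of the data is closed
        # at its last voiced frame.
        end = silence_start if silence_start is not None else len(levels_db)
        sections.append((start, end))
    return sections
```

With frames taken every 10 ms, a `min_silence_frames` of 30 would correspond to a silence threshold of 0.3 s; the actual threshold value is not given in the source.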

Suppose there is model speech waveform data in which the sentence is read slowly with pauses, as in "in the car (pause) mobile phones (pause) set to silent mode (pause) please refrain from calls". Recognizing the start and end timings by the above method then yields model sentence element sections corresponding to four phrases: "in the car", "mobile phones", "set to silent mode", and "please refrain from calls". If the speaker reads the same sentence aloud with the same pauses as the model, recognizing the start and end timings by the above method likewise yields speaker sentence element sections corresponding to the same four phrases. In contrast, if the speaker reads, for example, the "set to silent mode" and "please refrain from calls" parts quickly in a single breath as "set to silent mode please refrain from calls", that part becomes one phrase read in a single breath, and recognizing the start and end timings by the above method specifies a speaker sentence element section in which that part is not divided. When a plurality of phrases read aloud by the model thus correspond to one phrase read aloud by the speaker, it becomes difficult to compare the model sentence element sections of the phrases read by the model with the speaker sentence element section of the phrase read by the speaker. In such a case, the voice processing unit 31 should therefore divide the phrase read aloud by the speaker ("set to silent mode please refrain from calls") into a plurality of clauses or words so as to match the phrases read aloud by the model, and specify the speaker sentence element sections accordingly.

Conversely, suppose there is model speech waveform data in which part of the sentence is read quickly, as in "in the car (pause) mobile phones set to silent mode please refrain from calls". Recognizing the start and end timings by the above method then yields model sentence element sections corresponding to two phrases: "in the car" and "mobile phones set to silent mode please refrain from calls". If the speaker instead reads the sentence slowly with pauses, as in "in the car (pause) mobile phones (pause) set to silent mode (pause) please refrain from calls", recognizing the start and end timings by the above method yields speaker sentence element sections corresponding to four phrases: "in the car", "mobile phones", "set to silent mode", and "please refrain from calls". When one phrase read aloud by the model thus corresponds to a plurality of phrases read aloud by the speaker, it again becomes difficult to compare the model sentence element section of the phrase read by the model with the speaker sentence element sections of the phrases read by the speaker. In such a case, the voice processing unit 31 should, for example, specify a speaker sentence element section corresponding to a combined phrase containing the three phrases "mobile phones", "set to silent mode", and "please refrain from calls", so as to match the phrase read aloud by the model.
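
The combined-phrase case can be sketched as a merge of consecutive speaker sections until their joined text matches one model phrase. This is an illustration under assumptions: the function name `merge_to_model`, the tuple layout `(text, start_sec, end_sec)`, joining texts with a space, and letting the merged section run from the first start to the last end (pauses included) are all choices made for this example, not details from the patent.

```python
def merge_to_model(model_phrases, speaker_sections):
    """Merge consecutive speaker sections so each result matches one model phrase.

    model_phrases: list of phrase texts as read by the model.
    speaker_sections: list of (text, start_sec, end_sec) in reading order.
    """
    merged = []
    i = 0
    for target in model_phrases:
        if i >= len(speaker_sections):
            raise ValueError("speaker sections ran out before the model phrases")
        text, start, end = speaker_sections[i]
        i += 1
        # Absorb following speaker sections while they extend toward the target.
        while text != target and i < len(speaker_sections):
            next_text, _, next_end = speaker_sections[i]
            candidate = text + " " + next_text
            if not target.startswith(candidate):
                break
            text, end = candidate, next_end
            i += 1
        if text != target:
            raise ValueError("cannot align speaker text to model phrase: " + target)
        merged.append((text, start, end))
    return merged
```

The opposite case, where one speaker phrase must be divided to match several model phrases, needs timing information inside the phrase (for example from phoneme labels) and is not covered by this sketch.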

Based on the model speech waveform data, the voice processing unit 31 also specifies model interval sections, each running from the end timing of one of the sentence elements to the start timing of the next sentence element. It then stores model interval section data indicating the specified model interval sections in the RAM. The specified model interval sections are assigned serial numbers, for example in order from the beginning. Similarly, based on the speaker speech waveform data, the voice processing unit 31 specifies speaker interval sections, each running from the end timing of one of the sentence elements to the start timing of the next sentence element. It then stores speaker interval section data indicating the specified speaker interval sections in the RAM. The specified speaker interval sections are assigned serial numbers, for example in order from the beginning.
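
Given the sentence element sections, the interval sections follow directly, as the short sketch below shows. The function name and the `(start_sec, end_sec)` tuple layout are assumptions for illustration.

```python
def interval_sections(element_sections):
    """Derive interval sections from consecutive sentence element sections.

    element_sections: list of (start_sec, end_sec) in reading order.
    Returns (serial_number, interval_start, interval_end) tuples, where each
    interval runs from the end of one element to the start of the next.
    """
    pairs = zip(element_sections, element_sections[1:])
    return [
        (number, prev_end, next_start)
        for number, ((_, prev_end), (next_start, _)) in enumerate(pairs, start=1)
    ]
```

A sentence with N sentence element sections therefore has N - 1 interval sections, and the same function applies to both the model and the speaker data.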

From the data cut out of the model speech waveform data at every predetermined time, the voice processing unit 31 also specifies the sound pressure level (dB) as the model sound pressure at each predetermined time interval. It then stores model sound pressure data indicating the model sound pressure specified at each predetermined time interval in the RAM. Similarly, from the data cut out of the speaker speech waveform data at every predetermined time, the voice processing unit 31 specifies the sound pressure level (dB) as the speaker sound pressure at each predetermined time interval, and stores speaker sound pressure data indicating the speaker sound pressure specified at each predetermined time interval in the RAM. The voice processing unit 31 further calculates the fundamental frequency (Hz) from the data cut out of the model speech waveform data at every predetermined time, and specifies the calculated fundamental frequency (Hz) as the model pitch at each predetermined time interval. Known techniques such as the zero-crossing method or vector autocorrelation can be applied to specify the pitch (also called intonation). The voice processing unit 31 then stores model pitch data indicating the model pitch specified at each predetermined time interval in the RAM. Similarly, it calculates the fundamental frequency (Hz) from the data cut out of the speaker speech waveform data at every predetermined time, specifies the calculated fundamental frequency (Hz) as the speaker pitch at each predetermined time interval, and stores speaker pitch data indicating the speaker pitch specified at each predetermined time interval in the RAM. The predetermined time used for specifying the sound pressure and the pitch is shorter than the time length of a sentence element section and is set to, for example, about 10 ms.
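
One of the known techniques named above, the zero-crossing method, can be sketched as follows. This is a simplified illustration: the function name is hypothetical, the estimate is reliable only for clean, nearly periodic signals, and the 50 ms analysis window used in the usage lines is an assumption (a window must span at least two periods, so it may need to be longer than the roughly 10 ms step at which values are produced).

```python
import math


def zero_cross_f0(frame, sample_rate):
    """Estimate the fundamental frequency (Hz) from upward zero crossings.

    Returns None when the frame holds fewer than two upward crossings.
    """
    # Indices where the waveform goes from negative to non-negative.
    crossings = [k for k in range(1, len(frame)) if frame[k - 1] < 0 <= frame[k]]
    if len(crossings) < 2:
        return None
    periods = len(crossings) - 1
    # Average period length in samples, converted to a frequency.
    return periods * sample_rate / (crossings[-1] - crossings[0])


# Usage with a synthetic 200 Hz tone lasting 50 ms at 44.1 kHz.
sample_rate = 44100
tone = [math.sin(2.0 * math.pi * 200.0 * k / sample_rate + 0.3) for k in range(2205)]
estimate = zero_cross_f0(tone, sample_rate)
```

Real speech contains harmonics and noise that add spurious crossings, which is one reason autocorrelation-based methods are often preferred in practice.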

The voice processing unit 31 also divides the data cut out of the model speech waveform data at every predetermined time by windowing (for example, into frames of 25 ms each) and obtains an amplitude spectrum by Fourier analysis (FFT). It then applies a mel filter bank to the obtained amplitude spectrum and applies a discrete cosine transform (DCT) to the logarithm of the mel filter bank output to calculate MFCCs (mel-frequency cepstral coefficients), which it specifies for each model sentence element section as a feature quantity representing the vocal tract characteristics of the model. The voice processing unit 31 then stores in the RAM model feature quantity data indicating the feature quantity, specified for each model sentence element section, that represents the vocal tract characteristics of the model. Similarly, the voice processing unit 31 divides the data cut out of the speaker speech waveform data at every predetermined time by windowing and obtains an amplitude spectrum by Fourier analysis. It then applies a mel filter bank to the obtained amplitude spectrum and applies a discrete cosine transform to the logarithm of the mel filter bank output to calculate MFCCs, which it specifies for each speaker sentence element section as a feature quantity representing the vocal tract characteristics of the speaker. The voice processing unit 31 then stores in the RAM speaker feature quantity data indicating the feature quantity, specified for each speaker sentence element section, that represents the vocal tract characteristics of the speaker.

Next, the reading evaluation unit 32 (an example of the evaluation point calculation means) calculates an evaluation point for the speaker's reading for each section of at least one of the speaker sentence element sections and the speaker interval sections, and for each preset evaluation item. Examples of evaluation items for a speaker sentence element section are intonation, volume, articulation, and speed. An example of an evaluation item for a speaker interval section is pause timing. For example, the reading evaluation unit 32 calculates the difference between the model pitch and the speaker pitch specified by the voice processing unit 31 for each sentence element section (that is, for each pair of a model sentence element section and a speaker sentence element section having the same serial number), and calculates an evaluation point for the speaker's intonation for each sentence element section based on the calculated difference. This evaluation point has, for example, a maximum of 30 points, and is calculated so that it becomes higher (closer to the maximum) as the difference approaches 0. The reading evaluation unit 32 also calculates the difference between the model sound pressure and the speaker sound pressure for each sentence element section, and calculates an evaluation point for the speaker's volume for each sentence element section based on the calculated difference. This evaluation point has, for example, a maximum of 30 points, and is calculated so that it becomes higher as the difference approaches 0. The reading evaluation unit 32 also calculates, for each sentence element section, the similarity between the feature amount indicating the vocal tract characteristics of the model and the feature amount indicating the vocal tract characteristics of the speaker, and calculates an evaluation point for the speaker's articulation for each sentence element section based on the calculated similarity. This evaluation point has, for example, a maximum of 30 points, and is calculated so that it becomes higher as the similarity increases. The reading evaluation unit 32 also calculates, for each sentence element section, the time difference between the time length of the model sentence element section and the time length of the speaker sentence element section, and calculates an evaluation point for the speaker's speed (reading speed) for each sentence element section based on the absolute value of the calculated time difference. This evaluation point has, for example, a maximum of 30 points, and is calculated so that it becomes higher as the absolute value of the time difference approaches 0. Finally, the reading evaluation unit 32 calculates, for each interval section, the time difference between the time length of the model interval section and the time length of the speaker interval section, and calculates an evaluation point for the speaker's pause timing for each interval section based on the absolute value of the calculated time difference. This evaluation point has, for example, a maximum of 30 points, and is calculated so that it becomes higher as the absolute value of the time difference approaches 0.
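One way to realize an evaluation point that has a maximum of 30 and rises as the difference approaches 0 is sketched below. The description does not give the actual mapping, so the linear falloff, the `tolerance` parameter, and the assumption that similarity is normalized to the range 0 to 1 are all hypothetical choices made for illustration.

```python
def score_from_difference(diff, tolerance, full_marks=30):
    """Map a difference to an evaluation point.

    Assumption: full marks at a difference of 0, falling linearly to 0 points
    once the absolute difference reaches `tolerance` (a hypothetical parameter).
    """
    ratio = min(abs(diff) / tolerance, 1.0)
    return round(full_marks * (1.0 - ratio))

def score_from_similarity(similarity, full_marks=30):
    """Map a similarity to an evaluation point (higher similarity, higher score).

    Assumption: similarity is normalized to [0, 1]; values outside are clamped.
    """
    clamped = max(0.0, min(similarity, 1.0))
    return round(full_marks * clamped)

if __name__ == "__main__":
    # Intonation: average pitch difference of 25 Hz with a 50 Hz tolerance.
    print(score_from_difference(25.0, 50.0))
    # Speed: the speaker section is 0.3 s shorter than the model section.
    print(score_from_difference(-0.3, 1.0))
    # Articulation: similarity between model and speaker feature amounts.
    print(score_from_similarity(0.8))
```

The difference-based function would serve intonation, volume, speed, and pause timing alike, with a tolerance chosen per evaluation item.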

The reading evaluation unit 32 (an example of the section evaluation point calculation means) also calculates a section evaluation point for the speaker's reading for each sentence element section, based on the evaluation points for the individual evaluation items calculated for that sentence element section. For example, when the evaluation items are intonation and volume, the reading evaluation unit 32 calculates the average (or the sum) of the evaluation point for intonation and the evaluation point for volume as the section evaluation point for each sentence element section. When intonation is the only evaluation item, the reading evaluation unit 32 may use the evaluation point for intonation as the section evaluation point. The reading evaluation unit 32 (an example of the overall evaluation point calculation means) also calculates an overall evaluation point for each evaluation item over all sentence element sections, based on the per-item evaluation points calculated for each sentence element section as described above. For example, the reading evaluation unit 32 calculates the average (or the sum) of the per-item evaluation points calculated for each sentence element section as the overall evaluation point for that evaluation item over all sentence element sections. The reading evaluation unit 32 further calculates an overall evaluation point for pause timing over all interval sections, based on the evaluation points calculated for each interval section as described above. For example, the reading evaluation unit 32 calculates the average or the sum of the evaluation points for pause timing calculated for each interval section as the overall evaluation point for pause timing over all interval sections. Then, the reading evaluation unit 32 (an example of the overall evaluation point calculation means) calculates the sum (or the average) of the per-item overall evaluation points over all sentence element sections and the overall evaluation point for pause timing over all interval sections as the overall evaluation point for all sections (that is, the sentence element sections and the interval sections).
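The aggregation described above can be sketched as follows, choosing the average for section and per-item scores and the sum for the final overall score (the description allows either in each place). The function names and the dictionary layout are hypothetical.

```python
from statistics import mean

def section_score(item_scores):
    """Section evaluation point: average of the per-item points of one section."""
    return mean(list(item_scores.values()))

def overall_scores(section_item_scores, pause_scores):
    """Aggregate per-section points into overall points.

    section_item_scores: one dict per sentence element section, e.g.
        {"intonation": 30, "volume": 20}  (hypothetical layout)
    pause_scores: one pause-timing point per interval section.
    Returns (per-item overall points, pause overall point, overall point).
    """
    items = list(section_item_scores[0].keys())
    per_item = {item: mean([s[item] for s in section_item_scores])
                for item in items}
    pause_overall = mean(pause_scores)
    # Overall point over all sections: sum of the per-item overall points
    # and the pause overall point (the average would also be permitted).
    total = sum(per_item.values()) + pause_overall
    return per_item, pause_overall, total

if __name__ == "__main__":
    sections = [{"intonation": 30, "volume": 20},
                {"intonation": 10, "volume": 20}]
    print(section_score(sections[0]))
    print(overall_scores(sections, [30, 10]))
```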

Next, the display processing unit 33 causes the display D to display a display screen on which the following are arranged: a model display area, in which information on the voice serving as a model for reading a sentence aloud is displayed; a speaker display area, in which information on the voice uttered when the speaker read the sentence aloud is displayed; an overall evaluation point display area, in which the overall evaluation point for the speaker's reading is displayed; and the like. FIGS. 2 to 4 show display examples of a display screen on which a model display area 51, a speaker display area 52, an overall evaluation point display area 53, and the like are arranged. On the display screen shown in FIG. 2(A), the time axis t is taken as, for example, the horizontal axis (X axis), and the model display area 51 and the speaker display area 52 are arranged one above the other in the vertical direction orthogonal to the time axis t. Here, in the model display area 51, the display processing unit 33 displays at least a part of the text representing each sentence element of the sentence to be read aloud in model text display areas 51a1 to 51a3, each having a length corresponding to the time length of the respective model sentence element section, along the time axis t for each model sentence element section. The text displayed within each of the model text display areas 51a1 to 51a3 is the text associated with the model sentence element section data of the respective model sentence element section.

Further, in the speaker display area 52, the display processing unit 33 (an example of the first display control means) displays at least a part of the text representing each sentence element of the sentence read aloud by the speaker in speaker text display areas (an example of the first display area) 52a1 to 52a3, each having a length corresponding to the time length of the respective speaker sentence element section, along the time axis t for each speaker sentence element section. The text displayed within the outline of each of the speaker text display areas 52a1 to 52a3 is the text associated with the speaker sentence element section data of the respective speaker sentence element section. For example, the length D1 (in other words, the number of pixels) of the speaker text display area 52a1 in the direction parallel to the time axis t is set according to the time length of the speaker sentence element section corresponding to the speaker text display area 52a1. That is, the longer the time length of a sentence element section, the longer the corresponding text display area. When the text does not fit within a text display area, in the present embodiment the part of the text that would extend beyond the area is not displayed, as in the speaker text display area 52a3 shown in FIG. 2(A). As shown in FIG. 2(A), a speaker space area 52s1 is located along the time axis t between the speaker text display area 52a1 and the speaker text display area 52a2. Likewise, a speaker space area 52s2 is located along the time axis t between the speaker text display area 52a2 and the speaker text display area 52a3. For example, the length D2 of the speaker space area 52s1 in the direction parallel to the time axis t is set according to the time length of the speaker interval section corresponding to the speaker space area 52s1. That is, the longer the time length of a speaker interval section, the longer the corresponding speaker space area.
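The relationship between section time lengths and the lengths D1 and D2 can be sketched as a simple proportional layout. The scale of 100 pixels per second, the tuple layout, and the function name are hypothetical; the description only states that a longer section yields a longer area.

```python
def layout_regions(sections, pixels_per_second=100):
    """Place text and space areas along the time axis t.

    sections: list of (kind, start_sec, end_sec), where kind is "text" for a
        sentence element section or "space" for an interval section.
    Returns a list of (kind, x, width) in pixels. The width is proportional
    to the time length of the section (assumed linear scale).
    """
    regions = []
    for kind, start, end in sections:
        x = round(start * pixels_per_second)
        width = round((end - start) * pixels_per_second)
        regions.append((kind, x, width))
    return regions

if __name__ == "__main__":
    speaker_sections = [
        ("text", 0.0, 1.2),   # corresponds to an area such as 52a1 (length D1)
        ("space", 1.2, 1.5),  # corresponds to an area such as 52s1 (length D2)
        ("text", 1.5, 3.0),
    ]
    for region in layout_regions(speaker_sections):
        print(region)
```

Because the model display area and the speaker display area share the same time axis, using one scale for both keeps corresponding sections visually comparable.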

Further, the display processing unit 33 (an example of the fifth display control means) displays icons 52a11 to 52a31, which represent the section evaluation points calculated by the reading evaluation unit 32, for each speaker sentence element section along the time axis t, in areas (an example of the fourth display area) arranged in correspondence with the speaker text display areas 52a1 to 52a3 of the speaker sentence element sections to which the icons correspond. In the example of FIG. 2(A), the icons 52a11 to 52a31 are arranged at the right end inside the speaker text display areas 52a1 to 52a3. Examples of icon designs representing a section evaluation point are a design indicating sunny weather, a design indicating cloudy weather, and a design indicating rain. When the maximum section evaluation point is 30, the sunny design indicates that the section evaluation point is, for example, 21 to 30; the cloudy design indicates that it is, for example, 11 to 20; and the rain design indicates that it is, for example, 0 to 10. The designs and the number of designs of the icons representing section evaluation points can be set arbitrarily.
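The mapping from a section evaluation point to an icon design can be sketched as below, using the example ranges given above. The design names are hypothetical labels, and the handling of fractional points (which can arise when the section evaluation point is an average) is an assumption: a value between two ranges falls into the lower one.

```python
def icon_for_score(score):
    """Return the icon design for a section evaluation point (maximum 30).

    Ranges follow the example in the description: 21-30 sunny, 11-20 cloudy,
    0-10 rain. Assumption: a fractional point such as 20.5 is treated as
    belonging to the lower range.
    """
    if score >= 21:
        return "sunny"
    if score >= 11:
        return "cloudy"
    return "rain"

if __name__ == "__main__":
    for points in (30, 20, 5):
        print(points, icon_for_score(points))
```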

Further, the display processing unit 33 displays, in a model graph display area 51b along the time axis, a model graph representing the time-series change of at least one sound element of the model pitch and the model sound pressure specified by the voice processing unit 31. The display processing unit 33 (an example of the second display control means) also displays, in a speaker graph display area 52b (an example of the second display area) along the time axis t, a speaker graph representing the time-series change of at least one sound element of the speaker pitch and the speaker sound pressure specified by the voice processing unit 31. Here, the display processing unit 33 displays, along the time axis t, a speaker graph representing the time-series change of the same type of sound element as the sound element represented by the model graph (that is, the model pitch or the model sound pressure); for example, the sound element of the same type as the model pitch is the speaker pitch. In the example of FIG. 2(A), the model graph display area 51b displays model graphs (line graphs) 51b11 to 51b31 representing the time-series change of the model pitch and model graphs (bar graphs) 51b12 to 51b32 representing the time-series change of the model sound pressure, separated for each model sentence element section along the time axis t. The speaker graph display area 52b displays speaker graphs (line graphs) 52b11 to 52b31 representing the time-series change of the speaker pitch and speaker graphs (bar graphs) 52b12 to 52b32 representing the time-series change of the speaker sound pressure, separated for each speaker sentence element section along the time axis t.

The display screen shown in FIG. 2(A) also contains an overall evaluation point display area 53 (an example of the third display area), a scroll bar 54, and a "Close" key 55 for closing the display screen. The display processing unit 33 (an example of the third display control means) displays the overall evaluation points calculated by the reading evaluation unit 32 in the overall evaluation point display area 53. In the example of FIG. 2(A), the overall evaluation point for each evaluation item (intonation, volume, articulation, and speed) over all speaker sentence element sections, the overall evaluation point for pause timing over all speaker interval sections, and the overall evaluation point for all sections are displayed. The contents displayed in the model display area 51 and the speaker display area 52 are scrolled in the direction parallel to the time axis t in response to user operation of the scroll bar 54.

When any one of the speaker text display areas 52a1 to 52a3 and the speaker space areas 52s1 and 52s2 is designated by a user operation, the display processing unit 33 pops up a window W11 (an example of the first window) that displays the evaluation points calculated for the section (sentence element section or interval section) corresponding to the designated area. This makes it possible to display the details of the evaluation points calculated for whichever section the speaker or other user particularly wants to check, in a display mode that is easier to view.

The example of FIG. 2(B) shows a case where the speaker text display area 52a1 is designated by placing the mouse pointer P (cursor) over the speaker text display area 52a1. In this case, the display processing unit 33 pops up the window W11, which displays the evaluation points for the speaker's intonation, volume, articulation, and speed calculated for the speaker sentence element section corresponding to the designated speaker text display area 52a1. This makes it possible to display the details of the evaluation points calculated for whichever speaker sentence element section the speaker or other user particularly wants to check, in association with the text or the like representing the sentence element of that section, in a display mode that is easier to view. The window W11 displayed in this way is erased (that is, the window W11 closes) when the designation of the speaker text display area 52a1 is released by the pointer P leaving the speaker text display area 52a1. On the other hand, the display of the window W11 is continued (kept displayed) when the mouse button is clicked while the mouse pointer P is over the speaker text display area 52a1. The window W11 kept displayed in this way is erased when the mouse button is clicked again while the mouse pointer P is over the speaker text display area 52a1. The evaluation points displayed in the window W11 may be those for only some of intonation, volume, articulation, and speed (for example, volume and speed).
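The hover and click behavior of a pop-up window such as W11 can be sketched as a small state machine. The class and method names are hypothetical, and no particular GUI toolkit is assumed; the three methods would be wired to whatever pointer-enter, pointer-leave, and click events the toolkit provides.

```python
class PopupState:
    """Visibility state of one pop-up window tied to one designated area."""

    def __init__(self):
        self.visible = False  # whether the window is currently shown
        self.pinned = False   # whether a click has made the display continue

    def on_pointer_enter(self):
        # Placing the pointer over the area designates it and shows the window.
        self.visible = True

    def on_pointer_leave(self):
        # Leaving releases the designation; a continued window stays shown.
        if not self.pinned:
            self.visible = False

    def on_click(self):
        # First click continues the display; a second click erases the window.
        if self.pinned:
            self.pinned = False
            self.visible = False
        else:
            self.pinned = True
            self.visible = True

if __name__ == "__main__":
    w11 = PopupState()
    w11.on_pointer_enter()
    w11.on_click()
    w11.on_pointer_leave()
    print(w11.visible)  # still shown, because the display was continued
```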

The example of FIG. 3(A) shows a case where the speaker space area 52s1 is designated by placing the mouse pointer P over the speaker space area 52s1. In this case, the display processing unit 33 pops up a window W21 that displays the evaluation point for the speaker's pause timing calculated for the speaker interval section corresponding to the designated speaker space area 52s1. This makes it possible to display the details of the evaluation point calculated for whichever speaker interval section the speaker or other user particularly wants to check, in a display mode that is easier to view. The window W21 displayed in this way is erased when the designation of the speaker space area 52s1 is released by the pointer P leaving the speaker space area 52s1. On the other hand, the display of the window W21 is continued when the mouse button is clicked while the mouse pointer P is over the speaker space area 52s1. The window W21 kept displayed in this way is erased when the mouse button is clicked again while the mouse pointer P is over the speaker space area 52s1.

The windows W11 and W21 may also be configured to pop up when the speaker graph display area 52b is designated by a user operation. In this case, the speaker graph display area 52b is divided into a plurality of areas corresponding to the respective sections (speaker sentence element sections and speaker interval sections), for example by lines orthogonal to the time axis t. The display processing unit 33 then pops up the window W11, which displays the evaluation points for the speaker's intonation, volume, articulation, and speed calculated for the speaker sentence element section corresponding to the area designated in the speaker graph display area 52b. Alternatively, the display processing unit 33 pops up the window W21, which displays the evaluation point for the speaker's pause timing calculated for the speaker interval section corresponding to the area designated in the speaker graph display area 52b.

Further, as described above, the display processing unit 33 continues the display of the window W11. When, while the display of the window W11 is continued, another area different from the first designated area (that is, a speaker text display area, the speaker graph display area, or a speaker space area) is newly designated by a user operation, the display processing unit 33 pops up a window W12 (an example of the second window) that displays the evaluation points calculated for the section corresponding to the newly designated area, and continues the display of both the window W11 and the window W12. This allows the speaker or other user to compare the evaluation points of different sections and to grasp at a glance in which sections the evaluation is good or poor. The example of FIG. 3(B) shows a state in which the pop-up display of the window W11 and the window W12 is continued. The window W12 displays the evaluation points for the speaker's intonation, volume, articulation, and speed calculated for the speaker sentence element section corresponding to the speaker text display area 52a2, which was designated after the speaker text display area 52a1.
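Keeping several windows displayed at once for comparison can be sketched by tracking which areas have a continued window and which area the pointer is currently over. The class name, method names, and area identifiers are hypothetical, and no GUI toolkit is assumed.

```python
class PopupManager:
    """Tracks which areas currently have a pop-up window displayed."""

    def __init__(self):
        self.pinned = set()   # areas whose window display is continued
        self.hovered = None   # area currently under the pointer, if any

    def hover(self, area_id):
        self.hovered = area_id

    def leave(self):
        self.hovered = None

    def click(self, area_id):
        if area_id in self.pinned:
            # Second click on the same area erases its window.
            self.pinned.discard(area_id)
            if self.hovered == area_id:
                self.hovered = None
        else:
            # First click continues the display of this area's window.
            self.pinned.add(area_id)

    def visible_windows(self):
        shown = set(self.pinned)
        if self.hovered is not None:
            shown.add(self.hovered)
        return sorted(shown)

if __name__ == "__main__":
    manager = PopupManager()
    manager.hover("52a1")
    manager.click("52a1")   # window for 52a1 (such as W11) stays displayed
    manager.leave()
    manager.hover("52a2")   # window for 52a2 (such as W12) pops up as well
    print(manager.visible_windows())
```

A set of continued areas, rather than a single flag, is what allows the evaluation points of two or more sections to be viewed side by side.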

Further, when an icon (any of 52a11 to 52a31) is designated by a user operation, the display processing unit 33 pops up a window W31 (an example of the fourth window) that displays the evaluation point for each of at least one of the speaker's intonation, volume, articulation, and speed calculated for the speaker sentence element section corresponding to the designated icon. The example of FIG. 4(A) shows a case where the icon 52a31 is designated by placing the mouse pointer P over the icon 52a31. In this case, the display processing unit 33 pops up the window W31, which displays the evaluation points for the speaker's intonation, volume, articulation, and speed calculated for the speaker sentence element section corresponding to the designated icon 52a31. Each evaluation point displayed in the window W31 is an evaluation point that was used to calculate the section evaluation point of the speaker sentence element section corresponding to the icon 52a31 (that is, the section evaluation point indicated by the design of the icon 52a31). The pop-up display of the window W31 therefore lets the speaker or other user readily grasp the breakdown of the section evaluation point indicated by the design of the icon 52a31. The window W31 displayed in this way is erased when the designation of the icon 52a31 is released by the pointer P leaving the icon 52a31. On the other hand, the display of the window W31 is continued when the mouse button is clicked while the mouse pointer P is over the icon 52a31. The window W31 kept displayed in this way is erased when the mouse button is clicked again while the mouse pointer P is over the icon 52a31. In this case as well, when another icon different from the icon 52a31 is newly designated by a user operation while the display of the window W31 is continued, a new window displaying the evaluation points calculated for the speaker sentence element section corresponding to the newly designated icon may be popped up, and the display of both the window W31 and the new window may be continued.

Further, when a speaker text display area (any of 52a1 to 52a3) is designated by a user operation, the display processing unit 33 may pop up a window WT (an example of the third window) that displays the entire text of which at least a part is displayed in the designated speaker text display area. The example of FIG. 4(B) shows a case where the speaker text display area 52a3 is designated by placing the mouse pointer P over the speaker text display area 52a3. In this case, the display processing unit 33 pops up the window WT, which displays the entire text associated with the speaker sentence element section data of the speaker sentence element section corresponding to the designated speaker text display area 52a3. As a result, even when the text does not fit within the speaker text display area 52a3, the speaker or other user can grasp the content of the sentence element represented by that text. The window WT displayed in this way is erased when the designation of the speaker text display area 52a3 is released by the pointer P leaving the speaker text display area 52a3. On the other hand, the display of the window WT is continued when the mouse button is clicked while the mouse pointer P is over the speaker text display area 52a3. The window WT kept displayed in this way is erased when the mouse button is clicked again while the mouse pointer P is over the speaker text display area 52a3.

[2. Operation Example of the Reading Evaluation Device S]
Next, operation examples of the reading evaluation device S will be described separately as Example 1 and Example 2. The operations described below assume the following. The model sentence element section data and the speaker voice waveform data have been taken into the control unit 3, and the voice processing unit 31 has specified the model sentence element sections, model interval sections, model sound pressure, model pitch, model feature amount, speaker sentence element sections, speaker interval sections, speaker sound pressure, speaker pitch, and speaker feature amount. These data, the text of each model sentence element section, and the text of each speaker sentence element section are stored in the storage unit 2 in association with, for example, the identification information of the desired audio file serving as the model for the reading. In addition, the reading evaluation unit 32 has calculated the per-item evaluation points, the section evaluation points, and the overall evaluation points, and these evaluation points are stored in the storage unit 2 in association with the identification information of the desired audio file serving as the model for the reading.

(Example 1)
First, Example 1 will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the reading evaluation display processing of the control unit 3 in Example 1. The processing shown in FIG. 5 is started when, for example, the speaker designates, via the operation unit 4, the desired audio file that serves as the model for the reading and issues a display start instruction. When the processing shown in FIG. 5 starts, the control unit 3 (display processing unit 33) reads from the storage unit 2 the data associated with the identification information of the designated audio file (sentence element section data, interval section data, text data, and so on) and the evaluation point data, and causes the display D to show a display screen on which the model display area 51, the speaker display area 52, the overall evaluation point display area 53, and so on are arranged, as shown in FIG. 2(A) (step S1). In the example of FIG. 2(A), the model display area 51, the speaker display area 52, and the overall evaluation point display area 53 are arranged on the same screen. This lets the speaker or another user refer simultaneously to the overall evaluation point for all sections and to the evaluation points for the particular section they want to check. However, the overall evaluation point display area 53 may be arranged on a screen different from the one on which the model display area 51 and the speaker display area 52 are arranged. In other words, the overall evaluation point may be displayed on another screen and need not be displayed on the screen showing the model display area 51 and the speaker display area 52.

Next, the control unit 3 determines whether any one of the speaker text display areas 52a1 to 52a3 and the speaker space areas 52s1 and 52s2 has been designated by a user operation (step S2). For example, when the user operates the mouse so that the mouse pointer P is placed over a speaker text display area or a speaker space area, that area is determined to have been designated by a user operation. Alternatively, when the display D is a touch panel display and the user touches a speaker text display area or a speaker space area with a finger, pen, or the like, that area is determined to have been designated by a user operation. Note that step S2 may also determine, as described above, whether a speaker graph display area has been designated by a user operation. If the control unit 3 determines that any one of the areas has been designated by a user operation (step S2: YES), it proceeds to step S3. If it determines that none of the areas has been designated by a user operation (step S2: NO), it proceeds to step S10.
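One way to picture the determination in step S2 is as a hit test of the pointer position against areas laid out side by side along the time axis. The Python sketch below is a minimal illustration under assumptions that are not taken from the specification: areas are modeled as horizontal pixel ranges, and the function name `designated_area` and the coordinate values in the usage note are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (left edge, right edge, area id); sorted by left edge and non-overlapping (assumed layout)
Area = Tuple[float, float, str]


def designated_area(x: float, areas: List[Area]) -> Optional[str]:
    """Return the id of the area under horizontal position x, or None (step S2: NO)."""
    lefts = [left for left, _right, _area_id in areas]
    i = bisect_right(lefts, x) - 1  # last area whose left edge is <= x
    if i >= 0 and areas[i][0] <= x < areas[i][1]:
        return areas[i][2]
    return None
```

For instance, with a hypothetical layout `[(0, 100, "52a1"), (100, 130, "52s1"), (130, 260, "52a2")]`, a pointer at x = 50 falls in the text area 52a1 and a pointer at x = 100 falls in the space area 52s1.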

In step S3, the control unit 3 determines whether the window corresponding to the designated area is being continuously displayed. If it determines that the window is being continuously displayed (step S3: YES), it proceeds to step S10. If it determines that the window is not being continuously displayed (step S3: NO), it proceeds to step S4. In step S4, the control unit 3 identifies the evaluation points calculated in the section corresponding to the designated area (that is, the area determined to have been designated). Next, the control unit 3 pops up, on the display screen, a window that displays the evaluation points identified in step S4 (step S5). For example, when the speaker text display area 52a1 is designated, a window W11 that displays the evaluation points for the speaker's intonation, volume, articulation, and speed, calculated in the speaker sentence element section corresponding to this area, pops up as shown in FIG. 2(B). When the speaker space area 52s1 is designated, a window W21 that displays the evaluation point for the speaker's pausing, calculated in the speaker interval section corresponding to this area, pops up as shown in FIG. 3(A).
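Steps S4 and S5 amount to a lookup from the designated area to the evaluation points of its section, with the set of items depending on whether a text area or a space area was designated. The following Python sketch shows that lookup under assumptions: the dictionary layout, the item keys, and the function name `scores_for_area` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Evaluation items shown for each kind of area (item keys are illustrative names)
ITEMS_BY_KIND: Dict[str, Tuple[str, ...]] = {
    "text": ("intonation", "volume", "articulation", "speed"),  # sentence element section
    "space": ("pause",),                                         # interval section
}


def scores_for_area(
    area_id: str, kind: str, scores: Dict[str, Dict[str, int]]
) -> Dict[str, int]:
    """Pick the evaluation points to show in the pop-up window for one area (steps S4 and S5)."""
    if kind not in ITEMS_BY_KIND:
        raise ValueError(f"unknown area kind: {kind}")
    section_scores = scores[area_id]
    # Keep only the items relevant to this kind of area, in a fixed display order.
    return {item: section_scores[item] for item in ITEMS_BY_KIND[kind] if item in section_scores}
```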

Next, the control unit 3 determines whether the designation of the area has been cancelled (step S6). For example, when the user operates the mouse so that the mouse pointer P leaves the speaker text display area or the speaker space area, the designation of that area is determined to have been cancelled. Alternatively, when the display D is a touch panel display and the user lifts the finger, pen, or the like from the speaker text display area or the speaker space area, the designation of that area is determined to have been cancelled. If the control unit 3 determines that the designation of the area has been cancelled (step S6: YES), it erases the window corresponding to that area (step S7) and proceeds to step S10.

If, on the other hand, the control unit 3 determines that the designation of the area has not been cancelled (step S6: NO), it determines whether an instruction to continue displaying the window corresponding to that area has been given (step S8). For example, when the user clicks the mouse button while the mouse pointer P is over the area judged in step S6, it is determined that an instruction to continue displaying the window corresponding to that area has been given. Alternatively, when the display D is a touch panel display and the user, while touching the area judged in step S6 with a finger, pen, or the like, briefly lifts the finger or pen and quickly taps the area, it is determined that an instruction to continue displaying the corresponding window has been given. If the control unit 3 determines that such an instruction has been given (step S8: YES), it keeps the window displayed (step S9) and proceeds to step S10. If it determines that no such instruction has been given (step S8: NO), it returns to step S6.

In step S10, the control unit 3 determines whether an instruction to cancel the continued display of a continuously displayed window has been given. If no window is being continuously displayed at step S10, no such instruction can be accepted in the first place, so the processing proceeds to step S12. If the control unit 3 determines that an instruction to cancel the continued display has been given (step S10: YES), it proceeds to step S11. For example, when the user clicks the mouse button with the mouse pointer P placed over a continuously displayed window, it is determined that an instruction to cancel the continued display of that window has been given. Alternatively, when the display D is a touch panel display and the user taps a continuously displayed window with a finger, pen, or the like, it is determined that an instruction to cancel the continued display of that window has been given. In step S11, the control unit 3 erases the window whose continued display was cancelled and proceeds to step S12. If the control unit 3 determines that no instruction to cancel the continued display has been given (step S10: NO), it proceeds to step S12.

In step S12, the control unit 3 determines whether a display end instruction has been given. If it determines, in response to a user operation of the close button 55, that a display end instruction has been given (step S12: YES), it ends the processing shown in FIG. 5. If it determines that no display end instruction has been given (step S12: NO), it returns to step S2. Although not shown in the flowchart, during the processing shown in FIG. 5 the control unit 3 scrolls the model display area 51 and the speaker display area 52 to the left or right in response to a user operation of the scroll bar 54.
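The flow of FIG. 5 can be approximated as a loop over user events. The Python sketch below is an interpretation, not the flowchart itself: the flowchart polls for state, whereas this example consumes a prepared event list, and the event names (`enter`, `leave`, `click_area`, `click_window`, `close`) and the `Screen` class are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Event = Tuple[str, Optional[str]]  # (event kind, area id or None)


@dataclass
class Screen:
    shown: Set[str] = field(default_factory=set)   # areas whose window is on screen
    pinned: Set[str] = field(default_factory=set)  # areas whose window is continuously displayed
    hovered: Optional[str] = None                  # currently designated area


def run(events: List[Event]) -> Screen:
    screen = Screen()                        # S1: initial display screen
    for kind, area in events:
        if kind == "enter" and area is not None:   # S2: an area is designated
            screen.hovered = area
            if area not in screen.pinned:          # S3: NO, so S4 and S5 pop up the window
                screen.shown.add(area)
        elif kind == "leave":                      # S6: designation cancelled
            if screen.hovered is not None and screen.hovered not in screen.pinned:
                screen.shown.discard(screen.hovered)   # S7: erase the window
            screen.hovered = None
        elif kind == "click_area":                 # S8: continued-display instruction
            if screen.hovered is not None and screen.hovered in screen.shown:
                screen.pinned.add(screen.hovered)      # S9: keep the window displayed
        elif kind == "click_window":               # S10: cancel continued display
            if area in screen.pinned:
                screen.pinned.discard(area)            # S11: erase that window
                screen.shown.discard(area)
        elif kind == "close":                      # S12: display end instruction
            break
    return screen
```

Because a pinned window is skipped by the `leave` branch, a second area can be designated while the first window stays on screen, so two windows can be visible at once.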

(Example 2)
Next, Example 2 will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the reading evaluation display processing of the control unit 3 in Example 2. Example 2 switches the content displayed in the window depending on whether the text region or the icon region within a speaker text display area is designated. The processing shown in FIG. 6 omits the handling of a speaker space area designated by a user operation. The processing shown in FIG. 6 is started in the same way as the processing shown in FIG. 5. Step S21 shown in FIG. 6 is the same as step S1 shown in FIG. 5.

In step S22, the control unit 3 determines whether the text region in any one of the speaker text display areas 52a1 to 52a3 has been designated by a user operation. If it determines that a text region has been designated by a user operation (step S22: YES), it proceeds to step S23. If it determines that no text region has been designated by a user operation (step S22: NO), it proceeds to step S30.

In step S23, the control unit 3 determines whether the window corresponding to the designated text region is being continuously displayed. If it determines that the window is being continuously displayed (step S23: YES), it proceeds to step S30. If it determines that the window is not being continuously displayed (step S23: NO), it proceeds to step S24. In step S24, the control unit 3 identifies the text associated with the speaker sentence element section data of the speaker sentence element section corresponding to the designated text region. Next, the control unit 3 pops up, on the display screen, a window that displays the entire text identified in step S24 (step S25). As a result, the window WT, which displays the entire text of which at least a part is displayed in the designated speaker text display area, pops up as shown in FIG. 4(B). The processing of steps S26 to S29 is the same as that of steps S6 to S9 shown in FIG. 5.

In step S30, the control unit 3 determines whether any one of the icons 52a11 to 52a31 has been designated by a user operation. If it determines that any one of the icons has been designated by a user operation (step S30: YES), it proceeds to step S31. If it determines that none of the icons has been designated by a user operation (step S30: NO), it proceeds to step S38.

In step S31, the control unit 3 determines whether the window corresponding to the designated icon is being continuously displayed. If it determines that the window is being continuously displayed (step S31: YES), it proceeds to step S38. If it determines that the window is not being continuously displayed (step S31: NO), it proceeds to step S32. In step S32, the control unit 3 identifies the evaluation points calculated in the speaker sentence element section corresponding to the designated icon. Next, the control unit 3 pops up, on the display screen, a window that displays the evaluation points identified in step S32 (step S33). As a result, a window W31 that displays the evaluation points calculated in the speaker sentence element section corresponding to the designated icon pops up as shown in FIG. 4(A). The processing of steps S34 to S37 is the same as that of steps S6 to S9 shown in FIG. 5. The processing of steps S38 to S40 is the same as that of steps S10 to S12 shown in FIG. 5.
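The switching performed in Example 2 can be summarized as a dispatch on which region of the speaker text display area was designated. The Python sketch below illustrates that dispatch only; the `Section` class, the `target` strings, and the function name `popup_content` are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Section:
    """Data held for one speaker sentence element section (hypothetical layout)."""

    text: str               # full text of the sentence element
    scores: Dict[str, int]  # evaluation points per evaluation item


def popup_content(section: Section, target: str) -> Union[str, Dict[str, int]]:
    """Choose what the pop-up window shows for the designated region."""
    if target == "text":   # S22: YES, then S24 and S25 show the full text (window WT)
        return section.text
    if target == "icon":   # S30: YES, then S32 and S33 show the evaluation points (window W31)
        return dict(section.scores)
    raise ValueError(f"unknown target: {target}")
```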

As described above, according to the above embodiment, while the overall evaluation point for the speaker's reading is displayed, the reading evaluation apparatus S pops up, when any one of the speaker text display areas 52a1 to 52a3 and the speaker space areas 52s1 and 52s2 is designated by a user operation, a window that displays the evaluation points calculated in the section corresponding to the designated area. The details of the evaluation points calculated in whichever section the speaker or another user particularly wants to check can therefore be displayed in a more easily viewable form.

Reference Signs List
1 communication unit
2 storage unit
3 control unit
4 operation unit
5 interface unit
6 bus
31 voice processing unit
32 reading evaluation unit
33 display processing unit
S reading evaluation apparatus

Claims

A reading evaluation apparatus comprising:
evaluation point calculation means for calculating, based on voice waveform data indicating the waveform of the voice uttered when a speaker reads a sentence aloud, an evaluation point for the reading for each section of at least one of a sentence element section, which runs from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, which runs from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element;
overall evaluation point calculation means for calculating an overall evaluation point for the reading based on the evaluation points calculated for each section by the evaluation point calculation means;
first display control means for displaying at least a part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of that sentence element section;
second display control means for displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the voice waveform data at predetermined time intervals shorter than the time length of the sentence element section;
third display control means for displaying, in a third display area, the overall evaluation point calculated by the overall evaluation point calculation means; and
fourth display control means for popping up, when any one of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, a first window that displays the evaluation point calculated in the section corresponding to the designated area, and for popping up, when the first display area is designated by a user operation, a third window that displays the entire text of which at least a part is displayed in the designated first display area.

A reading evaluation apparatus comprising:
evaluation point calculation means for calculating, based on voice waveform data indicating the waveform of the voice uttered when a speaker reads a sentence aloud, an evaluation point for the reading for each section of at least one of a sentence element section, which runs from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, which runs from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element;
overall evaluation point calculation means for calculating an overall evaluation point for the reading based on the evaluation points calculated for each section by the evaluation point calculation means;
first display control means for displaying at least a part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of that sentence element section;
second display control means for displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the voice waveform data at predetermined time intervals shorter than the time length of the sentence element section;
third display control means for displaying, in a third display area, the overall evaluation point calculated by the overall evaluation point calculation means;
fourth display control means for popping up, when any one of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, a first window that displays the evaluation point calculated in the section corresponding to the designated area;
section evaluation point calculation means for calculating, for each sentence element section, a section evaluation point for the reading based on the evaluation points that were calculated for that sentence element section and that correspond to each of at least one of intonation, volume, articulation, and speed set in advance as evaluation items for the reading; and
fifth display control means for displaying an icon representing each section evaluation point calculated by the section evaluation point calculation means, for each sentence element section along the time axis, in a fourth display area arranged in correspondence with each of the first display areas,
wherein, when an icon is designated by a user operation, the fourth display control means pops up a fourth window that displays the evaluation points calculated in the sentence element section corresponding to the designated icon, namely the evaluation points for each of at least one of intonation, volume, articulation, and speed set in advance as evaluation items for the reading.

The reading evaluation apparatus according to claim 1, comprising:
evaluation point calculation means for calculating, based on voice waveform data indicating the waveform of the voice uttered when a speaker reads a sentence aloud, an evaluation point for the reading for each section of at least one of a sentence element section, which runs from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, which runs from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element;
overall evaluation point calculation means for calculating an overall evaluation point for the reading based on the evaluation points calculated for each section by the evaluation point calculation means;
first display control means for displaying at least a part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of that sentence element section;
second display control means for displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the voice waveform data at predetermined time intervals shorter than the time length of the sentence element section;
third display control means for displaying, in a third display area, the overall evaluation point calculated by the overall evaluation point calculation means; and
fourth display control means for popping up, when any one of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, a first window that displays the evaluation point calculated in the section corresponding to the designated area,
wherein the fourth display control means pops up the first window when the mouse pointer is placed over the one area in response to the user operation, erases the first window when the mouse pointer leaves the one area, and continues displaying the first window when the mouse button is clicked while the mouse pointer is over the one area, and
wherein, while display of the first window continues, the fourth display control means pops up a second window that displays the evaluation point calculated in the section corresponding to another area different from the one area when the mouse pointer is placed over that other area, erases the second window when the mouse pointer leaves the other area, and continues displaying the second window when the mouse button is clicked while the mouse pointer is over the other area.

The reading evaluation apparatus according to claim 3, wherein, while display of the first window continues, the fourth display control means erases the first window when the mouse button is clicked again while the mouse pointer is over the one area.

The reading evaluation apparatus according to any one of claims 1 to 4, wherein, when the first display area or the second display area is designated by a user operation, the fourth display control means pops up a first window that displays the evaluation points calculated in the section corresponding to the designated first display area or second display area, namely the evaluation points for each of at least one of intonation, volume, articulation, and speed set in advance as evaluation items for the reading.

The reading evaluation apparatus according to any one of claims 1 to 4, wherein, when the space area is designated by a user operation, the fourth display control means pops up a first window that displays the evaluation point calculated in the section corresponding to the designated space area, namely the evaluation point for pausing set in advance as an evaluation item for the reading.

A display control method implemented by one or more computers, comprising:
an evaluation point calculation step of calculating, based on voice waveform data indicating the waveform of the voice uttered when a speaker reads a sentence aloud, an evaluation point for the reading for each section of at least one of a sentence element section, which runs from the start timing to the end timing of each sentence element constituting the sentence read aloud, and an interval section, which runs from the end timing of any one of the plurality of sentence elements to the start timing of the next sentence element;
an overall evaluation point calculation step of calculating an overall evaluation point for the reading based on the evaluation points calculated for each section in the evaluation point calculation step;
a first display control step of displaying at least a part of the text representing each sentence element constituting the sentence read aloud, for each sentence element section along a time axis, in a first display area whose length corresponds to the time length of that sentence element section;
a second display control step of displaying, in a second display area along the time axis, a graph representing the time-series change of at least one sound element of pitch and sound pressure identified from the voice waveform data at predetermined time intervals shorter than the time length of the sentence element section;
a third display control step of displaying, in a third display area, the overall evaluation point calculated in the overall evaluation point calculation step; and
a fourth display control step of popping up, when any one of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, a first window that displays the evaluation point calculated in the section corresponding to the designated area, and of popping up, when the first display area is designated by a user operation, a third window that displays the entire text of which at least a part is displayed in the designated first display area.

A display control method implemented by one or more computers, comprising:
A sentence element interval from the start timing to the end timing of each sentence element constituting the sentence read aloud based on speech waveform data indicating a waveform of speech emitted when the speaker aloud the sentence, and a plurality of the sentence elements An evaluation point calculating step of calculating an evaluation point for the reading aloud for at least one of interval segments from the end timing of any one of the sentence elements to the start timing of the next one of the sentence elements;
An overall evaluation point calculating step of calculating an overall evaluation point for the reading on the basis of the evaluation point for each section calculated in the evaluation point calculating step;
At least a portion of the text representing each sentence element constituting the aloud sentence is placed along the time axis in the first display area of a length corresponding to the time length of each of the sentence element sections. A first display control step of displaying each time;
A graph representing a time-series change of at least one sound element of the pitch and the sound pressure specified based on the voice waveform data at predetermined time intervals shorter than the time length of the sentence element interval A second display control step of displaying on the display area along the time axis;
A third display control step of displaying the comprehensive evaluation point calculated in the comprehensive evaluation point calculating step in a third display area;
One of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation. In the case, a fourth display control step of pop-up displaying a first window displaying the evaluation point calculated in the section corresponding to the designated one of the areas;
It is the evaluation point corresponding to each of at least one of intonation, volume, glide tongue, and speed set in advance as an evaluation item for the aloud reading, and the evaluation point calculated for each of the sentence element sections A section evaluation point calculation step of calculating a section evaluation point for the reading aloud for each sentence element section based on the following;
In the fourth display area arranged corresponding to each of the first display areas, the icon representing each of the section evaluation points calculated in the section evaluation point calculation step for each of the sentence element sections along the time axis A fifth display control step to be displayed on the
the display control method including the foregoing steps, wherein,
in the fourth display control step, when one of the icons is designated by a user operation, a fourth window is displayed as a pop-up, the fourth window showing the evaluation points calculated for the sentence element section corresponding to the designated icon, one for each of the at least one of intonation, volume, articulation, and speed set in advance as evaluation items for the reading.
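
The section evaluation point and the icon that represents it can be sketched as follows. This is a minimal illustration, not the claimed implementation: the claim does not state how the per-item evaluation points are combined or how an icon is chosen, so the unweighted mean, the 0 to 100 scale, the thresholds, and all names (`SectionScores`, `section_evaluation_point`, `icon_for`, `fourth_window_lines`) are assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean

# Evaluation items named in the claim. The 0-100 scale is an assumption.
ITEMS = ("intonation", "volume", "articulation", "speed")


@dataclass
class SectionScores:
    text: str          # text of the sentence element
    item_scores: dict  # evaluation point per evaluation item


def section_evaluation_point(section: SectionScores) -> float:
    """Combine per-item evaluation points into one section evaluation point.

    Assumption: an unweighted mean over the items that were evaluated.
    """
    return mean(section.item_scores[i] for i in ITEMS if i in section.item_scores)


def icon_for(point: float) -> str:
    """Map a section evaluation point to an icon label (hypothetical thresholds)."""
    if point >= 80:
        return "good"
    if point >= 50:
        return "fair"
    return "poor"


def fourth_window_lines(section: SectionScores) -> list:
    """Lines shown in the pop-up opened by designating the icon: one per item."""
    return [f"{i}: {section.item_scores[i]}" for i in ITEMS if i in section.item_scores]


if __name__ == "__main__":
    s = SectionScores("Good morning", {"intonation": 90, "volume": 70,
                                       "articulation": 80, "speed": 60})
    print(icon_for(section_evaluation_point(s)))
    print("\n".join(fourth_window_lines(s)))
```

Keeping the per-item points alongside the combined point lets the icon give an at-a-glance summary while the fourth window still shows the detail behind it.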

A display control method implemented by one or more computers, comprising:
An evaluation point calculating step of calculating, on the basis of speech waveform data indicating a waveform of speech uttered when a speaker reads a sentence aloud, an evaluation point for the reading for each section of at least one of the following kinds: sentence element sections, each running from the start timing to the end timing of a sentence element constituting the sentence read aloud, and interval sections, each running from the end timing of one of the plurality of sentence elements to the start timing of the next sentence element;
An overall evaluation point calculating step of calculating an overall evaluation point for the reading on the basis of the evaluation point calculated for each section in the evaluation point calculating step;
A first display control step of displaying, for each sentence element, at least a portion of the text representing that sentence element in a first display area whose length corresponds to the time length of the corresponding sentence element section, the first display areas being arranged along a time axis;
A second display control step of displaying, in a second display area along the time axis, a graph representing a time-series change of at least one sound element of pitch and sound pressure, the sound element being specified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element sections;
A third display control step of displaying, in a third display area, the overall evaluation point calculated in the overall evaluation point calculating step;
A fourth display control step of displaying as a pop-up, when any one of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, a first window showing the evaluation point calculated for the section corresponding to the designated area;
wherein,
in the fourth display control step, the first window is displayed as a pop-up when a mouse pointer is placed over the one area in response to the user operation, the first window is erased when the mouse pointer leaves the one area, and display of the first window is continued when an operation button of the mouse is clicked while the mouse pointer is over the one area; and
while display of the first window continues, a second window showing the evaluation point calculated for the section corresponding to another area different from the one area is displayed as a pop-up when the mouse pointer is placed over the other area, the second window is erased when the mouse pointer leaves the other area, and display of the second window is continued when the operation button of the mouse is clicked while the mouse pointer is over the other area.
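
The hover and click behaviour recited in this claim amounts to a small state machine, sketched below independently of any GUI toolkit. The class and method names (`PopupController`, `pointer_enter`, `pointer_leave`, `click`, `visible`) are hypothetical. The claim does not say how a window whose display is continued is eventually closed, so this sketch simply keeps it open.

```python
class PopupController:
    """Tracks which evaluation pop-up windows are visible (illustrative sketch)."""

    def __init__(self):
        self.hovered = None  # area currently under the mouse pointer, if any
        self.pinned = set()  # areas whose window was kept open by a click

    def pointer_enter(self, area):
        # Placing the pointer over an area pops up its window.
        self.hovered = area

    def pointer_leave(self, area):
        # Leaving an area erases its window unless the window was pinned.
        if self.hovered == area:
            self.hovered = None

    def click(self, area):
        # A click continues the display only while the pointer is over the area.
        if self.hovered == area:
            self.pinned.add(area)

    def visible(self):
        """Return the set of areas whose pop-up window is currently shown."""
        shown = set(self.pinned)
        if self.hovered is not None:
            shown.add(self.hovered)
        return shown


if __name__ == "__main__":
    c = PopupController()
    c.pointer_enter("element-0")
    c.click("element-0")          # first window stays open
    c.pointer_leave("element-0")
    c.pointer_enter("interval-0")  # second window pops up alongside the first
    print(sorted(c.visible()))
```

Separating the hovered area from the pinned set is what allows a second window to appear, disappear, or be pinned while the first window remains on screen, so evaluation points of two sections can be compared side by side.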

An evaluation point calculating step of calculating, on the basis of speech waveform data indicating a waveform of speech uttered when a speaker reads a sentence aloud, an evaluation point for the reading for each section of at least one of the following kinds: sentence element sections, each running from the start timing to the end timing of a sentence element constituting the sentence read aloud, and interval sections, each running from the end timing of one of the plurality of sentence elements to the start timing of the next sentence element;
An overall evaluation point calculating step of calculating an overall evaluation point for the reading on the basis of the evaluation point calculated for each section in the evaluation point calculating step;
A first display control step of displaying, for each sentence element, at least a portion of the text representing that sentence element in a first display area whose length corresponds to the time length of the corresponding sentence element section, the first display areas being arranged along a time axis;
A second display control step of displaying, in a second display area along the time axis, a graph representing a time-series change of at least one sound element of pitch and sound pressure, the sound element being specified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element sections;
A third display control step of displaying, in a third display area, the overall evaluation point calculated in the overall evaluation point calculating step;
A fourth display control step of displaying as a pop-up, when any one of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, a first window showing the evaluation point calculated for the section corresponding to the designated area, and of displaying as a pop-up, when the first display area is designated by a user operation, a third window showing the entire text of which at least a portion is displayed in the designated first display area;
A program causing a computer to execute the foregoing steps.
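
The relationship between the first display areas and the third window can be illustrated as follows: each area's length is proportional to the time length of its sentence element section, so long text may only partly fit, and the full text is kept for the pop-up. The scale factors `px_per_sec` and `char_px`, the fixed-width character assumption, and the names `Element` and `layout` are all assumptions of this sketch, not details from the claim.

```python
from dataclasses import dataclass


@dataclass
class Element:
    text: str     # text of the sentence element
    start: float  # start timing in seconds
    end: float    # end timing in seconds


def layout(elements, px_per_sec=100.0, char_px=10.0):
    """Place one box per sentence element along the time axis.

    The box width is proportional to the section's time length. Text that does
    not fit is cut to the characters that do, and the full text is kept so a
    pop-up can show all of it when the box is designated.
    """
    boxes = []
    for e in elements:
        width = (e.end - e.start) * px_per_sec
        max_chars = int(width // char_px)  # fixed-width characters assumed
        boxes.append({
            "x": e.start * px_per_sec,
            "width": width,
            "shown": e.text[:max_chars],
            "full": e.text,
        })
    return boxes


if __name__ == "__main__":
    for box in layout([Element("Yesterday", 0.0, 0.5), Element("everyone", 1.0, 2.0)]):
        print(box)
```

The gap between the end of one box and the start of the next corresponds to the space area of the claim, which stands for the interval section between two sentence elements.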

An evaluation point calculating step of calculating, on the basis of speech waveform data indicating a waveform of speech uttered when a speaker reads a sentence aloud, an evaluation point for the reading for each section of at least one of the following kinds: sentence element sections, each running from the start timing to the end timing of a sentence element constituting the sentence read aloud, and interval sections, each running from the end timing of one of the plurality of sentence elements to the start timing of the next sentence element;
An overall evaluation point calculating step of calculating an overall evaluation point for the reading on the basis of the evaluation point calculated for each section in the evaluation point calculating step;
A first display control step of displaying, for each sentence element, at least a portion of the text representing that sentence element in a first display area whose length corresponds to the time length of the corresponding sentence element section, the first display areas being arranged along a time axis;
A second display control step of displaying, in a second display area along the time axis, a graph representing a time-series change of at least one sound element of pitch and sound pressure, the sound element being specified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element sections;
A third display control step of displaying, in a third display area, the overall evaluation point calculated in the overall evaluation point calculating step;
A fourth display control step of displaying as a pop-up, when any one of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, a first window showing the evaluation point calculated for the section corresponding to the designated area;
A section evaluation point calculating step of calculating, for each sentence element section, a section evaluation point for the reading on the basis of the evaluation points that are calculated for that sentence element section and that correspond to at least one of intonation, volume, articulation, and speed set in advance as evaluation items for the reading;
A fifth display control step of displaying, along the time axis in fourth display areas arranged in correspondence with the respective first display areas, icons each representing the section evaluation point calculated for a sentence element section in the section evaluation point calculating step;
A program causing a computer to execute the foregoing steps, wherein,
in the fourth display control step, when one of the icons is designated by a user operation, a fourth window is displayed as a pop-up, the fourth window showing the evaluation points calculated for the sentence element section corresponding to the designated icon, one for each of the at least one of intonation, volume, articulation, and speed set in advance as evaluation items for the reading.

An evaluation point calculating step of calculating, on the basis of speech waveform data indicating a waveform of speech uttered when a speaker reads a sentence aloud, an evaluation point for the reading for each section of at least one of the following kinds: sentence element sections, each running from the start timing to the end timing of a sentence element constituting the sentence read aloud, and interval sections, each running from the end timing of one of the plurality of sentence elements to the start timing of the next sentence element;
An overall evaluation point calculating step of calculating an overall evaluation point for the reading on the basis of the evaluation point calculated for each section in the evaluation point calculating step;
A first display control step of displaying, for each sentence element, at least a portion of the text representing that sentence element in a first display area whose length corresponds to the time length of the corresponding sentence element section, the first display areas being arranged along a time axis;
A second display control step of displaying, in a second display area along the time axis, a graph representing a time-series change of at least one sound element of pitch and sound pressure, the sound element being specified from the speech waveform data at predetermined time intervals shorter than the time length of the sentence element sections;
A third display control step of displaying, in a third display area, the overall evaluation point calculated in the overall evaluation point calculating step;
A fourth display control step of displaying as a pop-up, when any one of the first display area, the second display area, and a space area located along the time axis between the plurality of first display areas is designated by a user operation, a first window showing the evaluation point calculated for the section corresponding to the designated area;
A program causing a computer to execute the foregoing steps, wherein,
in the fourth display control step, the first window is displayed as a pop-up when a mouse pointer is placed over the one area in response to the user operation, the first window is erased when the mouse pointer leaves the one area, and display of the first window is continued when an operation button of the mouse is clicked while the mouse pointer is over the one area; and
while display of the first window continues, a second window showing the evaluation point calculated for the section corresponding to another area different from the one area is displayed as a pop-up when the mouse pointer is placed over the other area, the second window is erased when the mouse pointer leaves the other area, and display of the second window is continued when the operation button of the mouse is clicked while the mouse pointer is over the other area.
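
Resolving a designated position on the time axis to the section whose evaluation point should be shown can be sketched as below. Sentence element sections and the interval sections between them are built from start and end timings, then looked up by time. The function names `build_sections` and `section_at`, the tuple layout, and the half-open boundary convention are assumptions made for illustration.

```python
def build_sections(elements):
    """Build the ordered list of sections from sentence element timings.

    elements: list of (start, end) pairs in seconds, sorted by start.
    Returns tuples (kind, index, start, end), where kind is "element" for a
    sentence element section or "interval" for the gap before the next element.
    """
    sections = []
    for i, (start, end) in enumerate(elements):
        sections.append(("element", i, start, end))
        if i + 1 < len(elements) and elements[i + 1][0] > end:
            sections.append(("interval", i, end, elements[i + 1][0]))
    return sections


def section_at(sections, t):
    """Return (kind, index) of the section containing time t, or None."""
    for kind, index, start, end in sections:
        if start <= t < end:  # half-open range: a boundary belongs to the later section
            return kind, index
    return None


if __name__ == "__main__":
    secs = build_sections([(0.0, 0.5), (1.0, 2.0)])
    print(section_at(secs, 0.25))  # inside the first sentence element
    print(section_at(secs, 0.75))  # inside the pause that follows it
```

A lookup by time works for all three kinds of designated area, because the first display areas, the space areas, and the graph in the second display area share the same time axis.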