JP6418179B2 - Reading aloud practice device, display control method, and program

Info

Publication number: JP6418179B2
Application number: JP2016041017A
Authority: JP
Inventors: 成田　健
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2016-03-03
Filing date: 2016-03-03
Publication date: 2018-11-07
Anticipated expiration: 2036-03-03
Also published as: JP2017156615A

Description

The present invention relates to the technical field of systems and the like that can support a speaker in reading text aloud.

In recent years, systems have become known that, for the purpose of supporting practice in language learning, announcing, recitation, and the like, display two kinds of information on a screen so that they can be compared: information about model speech for reading aloud a sentence containing a plurality of sentence elements (for example, phrases or words), and information about the speech uttered by a practitioner reading the same sentence aloud. Such information includes, for example, sound pressure, pitch, and text representing the sentence elements. For example, the technique disclosed in Patent Document 1 displays the pitch of the model speech together with the pitch of the learner's speech for each word.

In such a system, a predetermined program identifies chunks in the speech waveform on the basis of speech waveform data representing the waveform of the speech produced when the sentence is read aloud, and divides the waveform into sentence-element units (for example, phrases), thereby specifying a plurality of sentence element sections (in other words, the waveform sections of the sentence elements). The system also divides the text of the whole sentence into sentence-element units (for example, phrases) on the basis of the text data of the sentence, thereby specifying a plurality of texts (that is, texts representing the sentence elements). The system then stores the specified sentence element sections and texts in association with each other in time series. For example, Patent Document 2 discloses a technique for storing speech waveform data in association with each character constituting text information.

Patent Document 1: JP 2007-139868 A
Patent Document 2: JP-A-4-305730

Using data in which sentence element sections and texts are associated in time series, it is conceivable to support a practitioner (speaker) in reading-aloud practice by drawing the text representing a sentence element in a limited area corresponding to, for example, a display bar whose length depends on the time length of the corresponding sentence element section. However, when a long text corresponds to a short sentence element section, the text may be cut off at the end of the display bar, so that the entire text cannot be drawn. In that case, the practitioner discovers the cut-off at the end of the display bar partway through reading the text aloud, which hinders smooth reading. As a result, the evaluation of the practitioner's reading may be lower than it would have been had the text not been cut off.
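The cut-off problem described above can be illustrated with a minimal sketch. It assumes a fixed-width font, and the parameters `pixels_per_second` and `char_width_px` as well as all function names are hypothetical; the source does not specify how the bar length or the drawable character count is computed.

```python
def bar_length_px(start_s: float, end_s: float, pixels_per_second: float) -> int:
    """Length of the display bar, proportional to the section's time length."""
    return int(round((end_s - start_s) * pixels_per_second))


def visible_text(text: str, bar_px: int, char_width_px: int) -> str:
    """Part of `text` that fits inside the bar (fixed-width font assumed)."""
    max_chars = max(bar_px // char_width_px, 0)
    return text[:max_chars]


def is_truncated(text: str, bar_px: int, char_width_px: int) -> bool:
    """True when the bar is too short to show the whole text."""
    return len(visible_text(text, bar_px, char_width_px)) < len(text)
```

With a 1.5-second section drawn at 100 pixels per second and 10-pixel characters, only 15 characters fit, so a longer text would be cut off at the end of the bar.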

The present invention has been made in view of the above, and provides a reading aloud practice device, a display control method, and a program that can effectively support a speaker in reading text aloud smoothly.

To solve the above problem, the invention according to claim 1 comprises: reproduction control means for reproducing speech waveform data representing the waveform of speech produced when a sentence is read aloud; first display control means for displaying, in a first display area and in time series from the beginning of the sentence, objects each having a length corresponding to the time length of a sentence element section, the sentence element sections being obtained by division based on the speech waveform data and each extending from the reading start timing to the reading end timing of one of a plurality of sentence elements constituting the sentence; second display control means for associating texts, which are obtained by division based on text data of the sentence and each represent one of the sentence elements, with the objects in time series from the beginning of the sentence, and for displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and third display control means for, as the speech waveform data is reproduced, displaying the entire text representing a sentence element in a second display area based on the reading start timing of that sentence element, and erasing the entire text displayed in the second display area based on the reading end timing.

The invention according to claim 2 is the reading aloud practice device according to claim 1, wherein, as the speech waveform data is reproduced, the third display control means pops up, in the second display area and based on the reading start timing of the sentence element, a window in which the entire text representing the sentence element is displayed, and erases the popped-up window based on the reading end timing.

The invention according to claim 3 is the reading aloud practice device according to claim 2, wherein the first display control means scrolls the first display area in which the objects are displayed as the speech waveform data is reproduced, and the third display control means displays the window, in which the entire text representing the sentence element is displayed, at a fixed position.

The invention according to claim 4 is the reading aloud practice device according to claim 2 or 3, wherein, when the total number of characters in the text representing the sentence element is greater than a threshold, the third display control means displays the entire text in the window on a plurality of lines.

The invention according to claim 5 is the reading aloud practice device according to any one of claims 2 to 4, wherein, when the total number of characters in the text representing the sentence element is greater than a threshold, the third display control means displays the text in the window portion by portion, switching between portions at predetermined time intervals.
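The two behaviors for long texts (multi-line display and timed switching between portions) can be sketched as follows. This is only an illustration: it assumes that the line or portion width equals the character-count threshold and that text is split at character boundaries, neither of which is stated in the source. The names `window_lines` and `portion_at` are hypothetical.

```python
from typing import List


def window_lines(text: str, threshold: int) -> List[str]:
    """One line if the text is short enough, otherwise several lines."""
    if len(text) <= threshold:
        return [text]
    return [text[i:i + threshold] for i in range(0, len(text), threshold)]


def portion_at(text: str, threshold: int, elapsed_s: float, interval_s: float) -> str:
    """Portion shown `elapsed_s` seconds after the window appeared.

    The window advances to the next portion every `interval_s` seconds
    and stays on the last portion once it has been reached.
    """
    portions = window_lines(text, threshold)
    index = min(int(elapsed_s // interval_s), len(portions) - 1)
    return portions[index]
```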

The invention according to claim 6 is the reading aloud practice device according to any one of claims 1 to 5, wherein the third display control means displays the entire text representing a sentence element in the second display area based on the reading start timing of that sentence element only when just a part of that text is displayed in the limited area corresponding to the object.

The invention according to claim 7 is the reading aloud practice device according to any one of claims 1 to 6, wherein the third display control means starts displaying the entire text representing the sentence element a predetermined time before the reading start timing of the sentence element.

The invention according to claim 8 is the reading aloud practice device according to any one of claims 1 to 7, wherein the third display control means erases the entire text displayed in the second display area a predetermined time after the reading end timing.
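The timing rules of claims 7 and 8 amount to widening the display period around the sentence element section. The sketch below assumes times in seconds from the start of playback; the parameter names `lead_s` and `lag_s` and both function names are hypothetical, and clamping the start to zero is an added assumption.

```python
from typing import Tuple


def window_display_period(
    start_s: float, end_s: float, lead_s: float = 0.0, lag_s: float = 0.0
) -> Tuple[float, float]:
    """Return (show_at, hide_at) for the full-text display.

    The text appears `lead_s` seconds before the reading start timing
    and is erased `lag_s` seconds after the reading end timing.
    """
    show_at = max(start_s - lead_s, 0.0)  # never before playback starts
    hide_at = end_s + lag_s
    return show_at, hide_at


def is_window_visible(
    playback_s: float, start_s: float, end_s: float,
    lead_s: float = 0.0, lag_s: float = 0.0,
) -> bool:
    """True while the playback position lies inside the display period."""
    show_at, hide_at = window_display_period(start_s, end_s, lead_s, lag_s)
    return show_at <= playback_s < hide_at
```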

The invention according to claim 9 is the reading aloud practice device according to any one of claims 1 to 8, wherein a plurality of partial display areas are set in the second display area, and, when the periods during which the texts representing two adjacent sentence elements are displayed in the second display area partially overlap, the third display control means displays the respective texts in mutually different partial display areas.
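One possible way to pick a partial display area for each text is sketched below. It is an assumption, not the method of the source: each text goes to area 0 unless its display period overlaps that of the preceding text, in which case it takes the next area in rotation. The function name `assign_areas` and the parameter `n_areas` are hypothetical.

```python
from typing import List, Sequence, Tuple


def assign_areas(periods: Sequence[Tuple[float, float]], n_areas: int = 2) -> List[int]:
    """Assign a partial display area index to each (show_at, hide_at) period.

    `periods` must be in time order. Adjacent periods that overlap are
    placed in different areas so that both texts stay readable.
    """
    areas: List[int] = []
    for i, (show_at, _hide_at) in enumerate(periods):
        overlaps_previous = i > 0 and show_at < periods[i - 1][1]
        if overlaps_previous:
            areas.append((areas[-1] + 1) % n_areas)
        else:
            areas.append(0)
    return areas
```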

The invention according to claim 10 is the reading aloud practice device according to any one of claims 1 to 9, further comprising fourth display control means for displaying the entire text representing the sentence in a third display area and changing the display color of the text representing a sentence element, within the text displayed in the third display area, based on the reading start timing of that sentence element.

The invention according to claim 11 is a display control method executed by one or more computers, comprising: a reproduction control step of reproducing speech waveform data representing the waveform of speech produced when a sentence is read aloud; a first display control step of displaying, in a first display area and in time series from the beginning of the sentence, objects each having a length corresponding to the time length of a sentence element section, the sentence element sections being obtained by division based on the speech waveform data and each extending from the reading start timing to the reading end timing of one of a plurality of sentence elements constituting the sentence; a second display control step of associating texts, which are obtained by division based on text data of the sentence and each represent one of the sentence elements, with the objects in time series from the beginning of the sentence, and displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and a third display control step of, as the speech waveform data is reproduced, displaying the entire text representing a sentence element in a second display area based on the reading start timing of that sentence element, and erasing the entire text displayed in the second display area based on the reading end timing.

The invention according to claim 12 causes a computer to execute: a reproduction control step of reproducing speech waveform data representing the waveform of speech produced when a sentence is read aloud; a first display control step of displaying, in a first display area and in time series from the beginning of the sentence, objects each having a length corresponding to the time length of a sentence element section, the sentence element sections being obtained by division based on the speech waveform data and each extending from the reading start timing to the reading end timing of one of a plurality of sentence elements constituting the sentence; a second display control step of associating texts, which are obtained by division based on text data of the sentence and each represent one of the sentence elements, with the objects in time series from the beginning of the sentence, and displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and a third display control step of, as the speech waveform data is reproduced, displaying the entire text representing a sentence element in a second display area based on the reading start timing of that sentence element, and erasing the entire text displayed in the second display area based on the reading end timing.

According to the inventions of claims 1, 11, and 12, even when the text representing a sentence element is displayed cut off in the limited area corresponding to the object, the entire text is displayed in the second display area for a certain period, so that smooth reading of the text by the speaker can be effectively supported.

According to the invention of claim 2, the window in which the entire text representing the sentence element is displayed is popped up in the second display area for a certain period, so the visibility of the text representing the sentence element can be improved.

According to the invention of claim 3, the window remains at a fixed position even when the second display area scrolls, so the visibility of the text representing the sentence element can be improved.

According to the invention of claim 4, even when the text representing the sentence element has a large number of characters, the entire text can be displayed in the window.

According to the invention of claim 5, even when the text representing the sentence element has a large number of characters, the entire text can be shown effectively in the window by switching between portions.

According to the invention of claim 6, the processing load and memory usage required to display the entire text representing the sentence element can be reduced.

According to the invention of claim 7, the speaker can read the text representing the sentence element with time to spare.

According to the invention of claim 8, the speaker can read the text representing the sentence element with time to spare.

According to the invention of claim 9, the visibility of the texts representing two adjacent sentence elements can be improved.

According to the invention of claim 10, when the text representing a sentence element is read aloud, the speaker can see at a glance where that text is located within the whole sentence.

A diagram showing a schematic configuration example of the reading aloud practice device S according to the present embodiment.
A diagram showing an example of the sentence element section list.
A diagram showing an example of the display screen displayed on the display D.
A diagram showing examples of the limited area corresponding to a display bar.
(A) is a diagram showing an example of the display screen when the display periods of the windows corresponding to adjacent sentence element sections partially overlap, and (B) is a diagram showing an example of the display screen when two balloon display areas are set in the right display area 522.
A diagram showing an example of the display screen when the windows corresponding to two adjacent sentence element sections are displayed in mutually different balloon display areas.
(A) is a diagram showing an example of the display screen when the entire text representing a sentence element is displayed in the window on a plurality of lines, and (B) and (C) are diagrams showing examples of the display screen when the text representing a sentence element is displayed in the window portion by portion, switched at predetermined time intervals.
A diagram showing an example of the display screen when the window remains fixed while the display area is scrolling.
A diagram showing an example of display screen transitions when a speaker practices reading aloud.
A diagram showing an example of display screen transitions when a speaker practices reading aloud.
A diagram showing an example of display screen transitions when a speaker practices reading aloud.
A flowchart showing an example of the display processing performed during the reading aloud practice processing.
A flowchart showing an example of the display processing for window [i].
A diagram showing an example of the balloon list.
A conceptual diagram showing the display states of a window displayed as a balloon.

Embodiments of the present invention are described below with reference to the drawings.

[1. Configuration and functions of the reading aloud practice device S]
First, the configuration and functions of a reading aloud practice device S according to an embodiment of the present invention are described with reference to FIG. 1. FIG. 1 is a diagram showing a schematic configuration example of the reading aloud practice device S according to the present embodiment. Examples of the reading aloud practice device include a personal computer and a portable information terminal (such as a smartphone). As shown in FIG. 1, the reading aloud practice device S includes a communication unit 1, a storage unit 2, a control unit 3, an operation unit 4, an interface (IF) unit 5, and the like, and these components are connected to a bus 6. The reading aloud practice device S is also referred to as an utterance practice device. The operation unit 4 receives an operation from a user (user operation) and outputs a signal corresponding to the user operation to the control unit 3. Examples of user operations include mouse operations and keyboard operations. When the display D is a touch panel display, the user operation may be a touch operation with the user's finger, a pen, or the like. A microphone M, a display D, and the like are connected to the interface unit 5. The microphone M picks up the speech uttered when a speaker, that is, a practitioner doing utterance practice for language learning, announcing, recitation, or the like, reads aloud a sentence (passage) containing a plurality of sentence elements.

A sentence element is a unit constituting a sentence. Examples of sentence elements include a phrase, a clause (bunsetsu), a word, and a combined phrase formed by joining a plurality of phrases. Here, a phrase is generally the unit read in a single breath when reading a sentence. A phrase consists of one or more clauses; that is, one phrase may consist of a single clause or of a plurality of clauses. A clause is, for example, a group of one or more words. Words include independent words such as nouns, verbs, adjectives, adverbs, and conjunctions (parts of speech that can form a clause on their own) and attached words such as auxiliary verbs and particles (parts of speech that cannot form a clause on their own). Examples of sentences to be read aloud include passages used in language learning, announcing, recitation, and the like.

The display D displays, in accordance with display commands from the control unit 3, a display screen on which display areas and the like described later are arranged. The microphone M and the display D may be integrated with the reading aloud practice device S or may be separate from it.

The communication unit 1 connects to a network (not shown) by wire or wirelessly and communicates with a server or the like. The storage unit 2 consists of, for example, a hard disk drive, and stores an OS (operating system) and a reading aloud practice processing program (including the program of the present invention). The reading aloud practice processing program causes the control unit 3, acting as a computer, to execute reading aloud practice processing. The program may be downloaded as an application from a predetermined server, or may be provided stored on a recording medium such as a CD or DVD. The storage unit 2 also stores text data of a sentence containing a plurality of sentence elements, and speech waveform data representing the waveform of speech that serves as a model for reading the sentence aloud (hereinafter, "model speech waveform data"). In the text data, the text (characters) representing each sentence element constituting the sentence to be read aloud is, for example, defined separately for each sentence element. For example, the sentence elements are delimited by punctuation marks inserted between them. Alternatively, serial numbers may be assigned to the texts representing the sentence elements in order from the beginning. The model speech waveform data is stored in a predetermined audio file format.
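As a rough illustration of text data delimited by punctuation, the sketch below splits a sentence into sentence-element texts. The set of delimiter characters and the function name `split_sentence_elements` are assumptions made for this example; the source only states that punctuation marks may separate the sentence elements.

```python
import re
from typing import List

# Assumed delimiter set: common Western and Japanese punctuation marks.
_ELEMENT_RE = re.compile(r"[^,.;:!?、。]+[,.;:!?、。]*")


def split_sentence_elements(text: str) -> List[str]:
    """Split `text` into sentence-element texts, keeping trailing punctuation."""
    pieces = (match.strip() for match in _ELEMENT_RE.findall(text))
    return [piece for piece in pieces if piece]
```

Serial numbers in order from the beginning can then be obtained with `enumerate(split_sentence_elements(text))`.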

The control unit 3 consists of a CPU (Central Processing Unit) acting as a computer, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. By running the reading aloud practice processing program, the control unit 3 functions as a speech processing unit 31, a display processing unit 32, and a reading aloud evaluation unit 33. The speech processing unit 31 is an example of the reproduction control means of the present invention. The display processing unit 32 is an example of the first display control means, the second display control means, and the third display control means of the present invention.

The speech processing unit 31 reads the model speech waveform data stored in the predetermined audio file format from the storage unit 2 as the processing target and reproduces it. The speech waveform data is discretized time-series sound pressure waveform data, for example monaural waveform data with a sampling rate of 44.1 kHz and 16-bit quantization. Sound pressure is the change in air pressure (Pa) caused by a sound wave. In this embodiment, the sound pressure level (dB) is used as the sound pressure. The sound pressure level expresses the magnitude of the effective sound pressure (Pa), which is the root mean square (RMS) of the instantaneous sound pressure (Pa), as a value that is easy to handle in calculation. The sound pressure level (dB) is also called volume in a broad sense.
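The relation between RMS and a level in dB can be sketched as follows. Sound pressure level is conventionally defined as 20·log10(p_rms / p0) with p0 = 20 µPa in air, which requires calibrated pressure values. Digital samples are usually uncalibrated, so this sketch takes the reference as a parameter; `reference=1.0` (full scale) is an assumption, not a value from the source.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root mean square of a block of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples: Sequence[float], reference: float = 1.0) -> float:
    """Level of the block in dB relative to `reference`."""
    value = rms(samples)
    if value == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(value / reference)
```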

Based on the reproduced model speech waveform data, the speech processing unit 31 specifies, for each sentence element, the sentence element section from the reading start time (that is, the reading start timing) to the reading end time (that is, the reading end timing). Here, a sentence element section is a section obtained by dividing the chunks of the speech waveform into, for example, phrase units. The reading start time and the reading end time are each an elapsed time from the reproduction start point (0:00) of the model speech waveform data. The speech processing unit 31 registers the reading start time and reading end time of each specified sentence element section in a sentence element section list, in association with the text representing the corresponding sentence element. The texts representing the sentence elements are extracted, for example, from the text data associated with the reproduced model speech waveform data, and are associated in order starting from, for example, the first sentence element section. FIG. 2 is a diagram showing an example of the sentence element section list. In the list shown in FIG. 2, the text representing the sentence element, the reading start time, and the reading end time are associated with each of the plurality of sentence element sections specified from the model speech waveform data. The index i of sentence element section [i] (i = 0, 1, 2, 3, 4, 5) in FIG. 2 is a serial number.
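A sentence element section list of this kind could be represented as below. The field names, the use of seconds as the time unit, and the function `build_section_list` are assumptions for illustration; the source specifies only that each section carries a serial number, a text, a reading start time, and a reading end time.

```python
from typing import List, NamedTuple, Sequence, Tuple


class SentenceElementSection(NamedTuple):
    """One row of the sentence element section list."""
    index: int      # serial number i of sentence element section [i]
    text: str       # text representing the sentence element
    start_s: float  # reading start time, seconds from the start of playback
    end_s: float    # reading end time, seconds from the start of playback

    @property
    def duration_s(self) -> float:
        """Time length of the section."""
        return self.end_s - self.start_s


def build_section_list(
    sections: Sequence[Tuple[float, float]], texts: Sequence[str]
) -> List[SentenceElementSection]:
    """Pair sections and texts in order, starting from the first section."""
    if len(sections) != len(texts):
        raise ValueError("number of sections and texts must match")
    return [
        SentenceElementSection(i, text, start, end)
        for i, ((start, end), text) in enumerate(zip(sections, texts))
    ]
```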

Here, the reading start time and the reading end time may each be recognized from the speech waveform, or from the sound pressure level (dB) calculated as described above. For example, the speech processing unit 31 recognizes the point at which the amplitude of the speech waveform becomes equal to or greater than a predetermined value as the reading start time. Alternatively, the speech processing unit 31 recognizes the point at which the sound pressure level (dB) becomes equal to or greater than a predetermined value as the reading start time. Likewise, the speech processing unit 31 recognizes, for example, the point at which the amplitude of the speech waveform falls below a predetermined value as the reading end time. Alternatively, the speech processing unit 31 recognizes the point at which the sound pressure level (dB) falls below a predetermined value as the reading end time. Preferably, the point at which the sound pressure level (dB) falls below the predetermined value is recognized as a reading end time, and the point at which it next reaches the predetermined value is recognized as a reading start time, only when the time between those two points (the silent time) is equal to or greater than a threshold (the same applies to the amplitude of the speech waveform). The intent is that sentence elements are not separated at a silent interval shorter than the threshold. The speech processing unit 31 may also specify, based on the reproduced model speech waveform data, an interval section from the reading end time of any one of the sentence elements to the reading start time of the next sentence element.
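The threshold rule with a minimum silent time can be sketched as follows. The input is assumed to be one level value per analysis frame; the function name `find_sections` and the parameters `threshold` and `min_silence_frames` are hypothetical, and results are frame indices rather than times.

```python
from typing import List, Optional, Sequence, Tuple


def find_sections(
    levels: Sequence[float], threshold: float, min_silence_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of sentence element sections.

    `start` is the first frame at or above `threshold`; `end` is the first
    frame below it. A silent gap shorter than `min_silence_frames` does not
    split a section.
    """
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_voiced = -1
    for i, level in enumerate(levels):
        if level < threshold:
            continue
        if start is None:
            start = i  # first voiced frame of the first section
        elif i - last_voiced - 1 >= min_silence_frames:
            # The gap was long enough: close the section and open a new one.
            sections.append((start, last_voiced + 1))
            start = i
        last_voiced = i
    if start is not None:
        sections.append((start, last_voiced + 1))
    return sections
```

With frames taken every 10 ms, a frame index converts to a time as `index * 0.01` seconds.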

The voice processing unit 31 also specifies, at predetermined time intervals, the sound pressure level (dB) obtained from data cut out of the reproduced model voice waveform data every predetermined time, as the sound pressure. The voice processing unit 31 then stores sound pressure data indicating the sound pressure specified at each predetermined time interval in the RAM. Similarly, the voice processing unit 31 calculates a fundamental frequency (Hz) from data cut out of the reproduced voice waveform data every predetermined time, and specifies the calculated fundamental frequency (Hz) as the pitch at predetermined time intervals. Known methods such as the zero-cross method or vector autocorrelation can be applied to specify the pitch (also called intonation). The voice processing unit 31 then stores pitch data indicating the pitch specified at each predetermined time interval in the RAM. The predetermined time used for specifying the sound pressure and the pitch is shorter than the time length of a sentence element section and is set to, for example, about 10 ms. Further, the voice processing unit 31 divides the data cut out of the reproduced model voice waveform data every predetermined time into frames by windowing (for example, frames of 25 ms) and applies Fourier analysis (FFT) to obtain an amplitude spectrum. The voice processing unit 31 then applies a mel filter bank to the obtained amplitude spectrum, takes the logarithm of the filter bank outputs, and applies a discrete cosine transform (DCT) to calculate MFCCs (mel-frequency cepstral coefficients), which it specifies for each sentence element section as a feature quantity indicating the vocal tract characteristics of the model. The voice processing unit 31 stores feature quantity data indicating the feature quantity specified for each sentence element section in the RAM.
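As a rough illustration of the per-frame sound pressure and pitch extraction described above, the following sketch computes an RMS level in dB and a zero-cross pitch estimate for each 10 ms frame. The function names, the reference value `ref`, and the floor used for silent frames are assumptions for the example; the embodiment does not state how the dB reference is chosen, and a real system would need a more robust pitch estimator.

```python
import math

def frame_level_db(frame, ref=1.0):
    """RMS level of one frame in dB relative to ref (ref is an assumed reference)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12) / ref)  # floor avoids log10(0)

def frame_pitch_hz(frame, sample_rate):
    """Zero-cross estimate: average spacing of rising zero crossings in the frame."""
    rising = [i for i in range(1, len(frame)) if frame[i - 1] < 0.0 <= frame[i]]
    if len(rising) < 2:
        return None  # too few crossings to estimate a period
    return (len(rising) - 1) * sample_rate / (rising[-1] - rising[0])

def analyze(samples, sample_rate, step_s=0.010):
    """Return (time_s, level_db, pitch_hz) for each consecutive step_s frame."""
    step = int(round(sample_rate * step_s))
    out = []
    for start in range(0, len(samples) - step + 1, step):
        frame = samples[start:start + step]
        out.append((start / sample_rate,
                    frame_level_db(frame),
                    frame_pitch_hz(frame, sample_rate)))
    return out

# Hypothetical input: 20 ms of a 400 Hz sine sampled at 16 kHz.
sr = 16000
tone = [math.sin(2 * math.pi * 400 * n / sr + 0.3) for n in range(320)]
for t, level, pitch in analyze(tone, sr):
    print(round(t, 3), round(level, 2), pitch)
```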

The voice processing unit 31 also receives voice waveform data (hereinafter referred to as "speaker voice waveform data") indicating the waveform of the voice uttered by the speaker when reading the above sentence aloud during reading practice and collected by the microphone M. Based on the input speaker voice waveform data, the voice processing unit 31 specifies, for each sentence element, the sentence element section from the reading start time to the reading end time in the same manner as for the model voice waveform data, and associates the text representing each sentence element with the corresponding specified sentence element section. The voice processing unit 31 also specifies, at predetermined time intervals, the sound pressure level (dB) obtained from data cut out of the input speaker voice waveform data every predetermined time, as the sound pressure, and calculates a fundamental frequency (Hz) from data cut out of the input speaker voice waveform data every predetermined time, specifying the calculated fundamental frequency (Hz) as the pitch at predetermined time intervals. Furthermore, in the same manner as for the model voice waveform data, the voice processing unit 31 specifies, for each sentence element section, a feature quantity indicating the vocal tract characteristics of the speaker from the input speaker voice waveform data. The voice processing unit 31 may also specify, based on the input speaker voice waveform data, an interval section extending from the reading end time of any one of the sentence elements to the reading start time of the next sentence element.

Next, the display processing unit 32 causes the display D to display a display screen on which the speaker practices reading aloud. FIGS. 3(A) and 3(B) are diagrams showing an example of the display screen displayed on the display D. As shown in FIG. 3(A), the display screen is provided with a model display area 51, a speaker display area 52, a full text display area 53, a play button 54, a scroll bar 55, a record button 56, a comprehensive evaluation score display area 57, and the like. The model display area 51 (an example of the first display area) is an area for displaying information obtained from the model voice waveform data, and the speaker display area 52 is an area for displaying information obtained from the speaker voice waveform data. The model display area 51 and the speaker display area 52 are arranged one above the other (in the vertical direction orthogonal to the time axis t) with the time axis t between them, and can be scrolled in the direction of the arrow (leftward).

For example, when the speaker designates the play button 54 by a user operation in order to practice reading aloud, the model voice waveform data is reproduced, and the display processing unit 32, following the reproduction of the model voice waveform data, displays the information obtained from the model voice waveform data in the model display area 51 while scrolling the model display area 51, as shown in FIGS. 3(A) and 3(B). In the example of FIG. 3(A), the information obtained from the model voice waveform data and displayed in the model display area 51 consists of display bars (an example of objects) 51a1 to 51a5 whose lengths correspond to the time lengths of the sentence element sections, line graphs 51b1 to 51b5 representing the change in pitch over time, and bar graphs 51c1 to 51c5 representing the change in sound pressure over time. On the other hand, when the speaker designates the record button 56 by a user operation in order to practice reading aloud, recording of the voice uttered by the speaker and collected by the microphone M starts, and the display processing unit 32, following the input of the speaker voice waveform data indicating the waveform of that voice, displays the information obtained from the speaker voice waveform data (not shown) in the speaker display area 52 in real time while scrolling the speaker display area 52. The scroll bar 55 moves rightward, following the scrolling of the model display area 51 and the speaker display area 52.

In FIGS. 3(A) and 3(B), the vertical line RP drawn over the model display area 51 and the speaker display area 52 indicates the current reproduction time (reproduction position) in the model voice waveform data. It does not depend on the scrolling of the model display area 51 and the speaker display area 52 and is displayed fixed at the illustrated position. The speaker display area 52 is divided into a left display area 521 and a right display area 522 on either side of the vertical line RP. Information obtained from the speaker voice waveform data and displayed in real time is displayed in the left display area 521 (that is, it emerges from the position of the vertical line RP). The left display area 521 displays, as information obtained from the speaker voice waveform data, display bars whose lengths correspond to the time lengths of the sentence element sections, a line graph representing the change in pitch over time, and a bar graph representing the change in sound pressure over time.

When the display processing unit 32 displays the display bars whose lengths correspond to the time lengths of the sentence element sections specified from the model voice waveform data (51a1 to 51a5 in the example of FIG. 3(A)) in the model display area 51 in time series from the beginning of the sentence, it associates the texts representing the sentence elements with the display bars in time series from the beginning of the sentence, and displays (that is, draws) all or part of the text representing each sentence element in the limited area corresponding to the associated display bar. Here, the limited area corresponding to a display bar means a finite area that depends on the time length of the sentence element section corresponding to that display bar; in the example of FIG. 3(A), it is the area delimited by the display bar. Consequently, when a long text (that is, a text representing a sentence element) is associated with a short sentence element section, it may not be possible to draw the entire text within the finite area. For example, the sentence element section corresponding to the display bar 51a3 is associated with 「ありがとうございます。」 ("Thank you very much.", the whole of the text representing the sentence element), but the limited area corresponding to the display bar 51a3 shows only 「ありがとうございま」, so the text representing the sentence element is cut off (that is, only part of the text is displayed).
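The cut-off described above can be reproduced with a small sketch. It assumes, purely for illustration, a fixed-width font and a bar whose pixel length is proportional to the section's time length; `fit_text_to_bar`, `px_per_second`, and `char_px` are hypothetical names and parameters, not taken from the embodiment.

```python
def fit_text_to_bar(text, duration_ms, px_per_second, char_px):
    """Return (text that fits inside the bar, whether it was cut off).

    Assumes every character is char_px pixels wide and the bar length is
    duration_ms scaled by px_per_second (both are illustrative assumptions).
    """
    bar_px = duration_ms * px_per_second // 1000
    max_chars = bar_px // char_px
    shown = text[:max_chars]
    return shown, len(shown) < len(text)

# A 0.9 s section drawn at 200 px/s with 20 px characters holds 9 characters.
shown, cut_off = fit_text_to_bar("ありがとうございます。", 900, 200, 20)
print(shown, cut_off)
```

The returned flag is the condition under which a pop-up window for the full text becomes useful.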

In the example of FIG. 3(A), the limited area corresponding to a display bar is the area delimited by the display bar, but it is not limited to this. FIG. 4 is a diagram showing examples of the limited area corresponding to a display bar. The display bar shown in FIG. 4(A) corresponds to the display bar 51a3 shown in FIG. 3(A), and 「ありがとうございま」, which is part of the text representing the sentence element, fits inside the display bar. In the example of FIG. 4(B), on the other hand, 「ありがとうございま」 does not fit inside the display bar and appears to protrude from it. The interior of the display bar shown in FIG. 4(B) has a gradation: the color becomes lighter from bottom to top until it can no longer be distinguished from the background color (for example, white). In the example of FIG. 4(B), not only the area where the gradation is applied but also the area that can no longer be distinguished from the background color belongs to the limited area corresponding to the display bar. In the example of FIG. 4(C), 「ありがとうございま」 is displayed in an area adjacent to the display bar, and such an adjacent area also belongs to the limited area corresponding to the display bar. In the example of FIG. 4(D), the left and right ends of the display bar are dark and the color becomes lighter toward the center; the central area need not be distinguishable from the background color. In the example of FIG. 4(D), the limited area corresponding to the display bar includes the central area that cannot be distinguished from the background color. In the above examples the display bar is rectangular, but it is not limited to this shape. An object, of which the display bar is one example, may be polygonal, circular, or cloud-shaped, may be a wavy line, or may simply have its boundaries marked by vertical bars.

As described above, when the text representing a sentence element is cut off in the limited area corresponding to the display bar, the speaker is prevented from reading the text aloud smoothly. In the present embodiment, therefore, the display processing unit 32, following the reproduction of the model voice waveform data, displays a window 51p3, which shows the whole of the text representing the sentence element associated with the display bar (「ありがとうございます。」 in the example of FIG. 3(A)), as a balloon (an example of pop-up display) in the right display area 522 (an example of the second display area) based on the reading start time (that is, the reading start timing) of this sentence element (that is, of the sentence element section), and erases the window 51p3 based on the reading end time (that is, the reading end timing) of this sentence element. Popping up a window that shows the whole of the text representing the sentence element in the right display area 522 for a fixed time in this way improves the visibility of the text. For example, the window 51p3 is displayed as a balloon when the reading start time of this sentence element, measured from the reproduction start point (0:00) of the model voice waveform data, arrives, and is erased when the reading end time of this sentence element arrives. In this case, the display continuation period of the window 51p3 is "ST (Standard visible Time)" shown in FIG. 3(A).

However, considering how easily the speaker can read the text, it is desirable to allow some margin in the display continuation period of the window 51p3. For this reason, the display processing unit 32 displays the window 51p3 from a predetermined time BT (Before Time) before the reading start time of the sentence element and erases it a predetermined time AT (After Time) after the reading end time of the sentence element. In this case, the display continuation period of the window 51p3 is "VT (Visible Time)" (= BT + ST + AT) shown in FIG. 3(A). This allows the speaker to read the text displayed in the window 51p3 with time to spare. In the example of FIGS. 3(A) and 3(B), the window 51p3 showing 「ありがとうございます。」 is erased when the time VT has elapsed since it was displayed as a balloon, after which the window 51p4 showing 「この電車は、」 ("This train is") is displayed as a balloon. The window 51p3 may instead be displayed as a balloon from the predetermined time BT before the reading start time of the sentence element and erased when the reading end time of the sentence element arrives. Alternatively, the window 51p3 may be displayed as a balloon when the reading start time of the sentence element arrives and erased the predetermined time AT after the reading end time of the sentence element. The display processing unit 32 may also display the window showing the whole of the text representing a sentence element as a balloon in the right display area 522, based on the reading start time of the sentence element, only when part of that text is displayed in the limited area corresponding to the display bar (that is, only when the text is cut off). This reduces the processing load of displaying windows and the amount of memory used.
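The relationship VT = BT + ST + AT can be written out as a short sketch. The function name `popup_interval` and the clamping of the show time at 0:00 are assumptions for the example; the embodiment does not state what happens when BT reaches before the start of reproduction.

```python
def popup_interval(start_ms, end_ms, bt_ms=0, at_ms=0):
    """Return (show time, erase time) of a window, in ms from playback start.

    start_ms / end_ms are the reading start and end times of the sentence
    element. The show time is clamped at 0 (an assumption of this sketch).
    """
    show = max(0, start_ms - bt_ms)
    erase = end_ms + at_ms
    return show, erase

# ST = 1200 ms, BT = 500 ms, AT = 300 ms  ->  VT = 2000 ms
show, erase = popup_interval(5000, 6200, bt_ms=500, at_ms=300)
print(show, erase, erase - show)
```

Setting `bt_ms` or `at_ms` to zero gives the two variants in which only one of the margins is applied.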

The balloon display of a window refers to a display in which the window pops up on (in other words, is superimposed on) the right display area 522 (pop-up display) and, in addition, is drawn so that it appears to emerge from the corresponding display bar. Displaying the window as if it emerged from the display bar lets the speaker clearly grasp the correspondence between the display bar and the window. The balloon may take any form as long as this correspondence can be clearly grasped. In the example of FIG. 3(B), the window has the shape of a rectangle with a triangle added to its top, and the apex of the triangle touches the display bar; however, since the apex only needs to point to the display bar, it may be separated from the display bar. What points to the display bar need not be a triangle: it may be a figure made up of several separated ellipses, or a line (for example, a solid or broken line). The window may also have the shape of a polygon, a circle, or a cloud with a triangle (or a figure made up of several separated ellipses, or a line) added to its top. The length (number of pixels) of the window in the time axis direction (horizontal direction) is preferably configured to change according to the number of characters of the text displayed in the window. In this case, the fewer characters the text has, the shorter the window is in the time axis direction. If the correspondence between the display bar and the window is clear, the window may simply be popped up on the right display area 522 (that is, without the balloon). As another example, the whole of the text representing the sentence element may be displayed directly on the right display area 522 without popping up a window.

The full text display area 53 (an example of the third display area) is an area for displaying the whole of the text representing the sentence (that is, the full text). The display processing unit 32 displays the whole of the text representing the sentence in the full text display area 53 as shown in FIG. 3(A), and, when displaying the text representing a sentence element on the right display area 522 (for example, when displaying the window showing that text as a balloon), changes the display color of the corresponding text within the text displayed in the full text display area 53 based on the reading start time of the sentence element (that is, highlights it). This lets the speaker see at a glance, while reading the text of a sentence element aloud, where that text is located in the sentence as a whole. In the example of FIG. 3(A), while the window 51p3 showing 「ありがとうございます。」 is displayed, the display color of the text 53h3 「ありがとうございます。」 in the full text display area 53 is changed from black to red. In the example of FIG. 3(A), the text 53h3 is additionally changed to bold and its background color is also changed. Likewise, in the example of FIG. 3(B), while the window 51p4 showing 「この電車は、」 is displayed, the display color of the text 53h4 「この電車は、」 in the full text display area 53 is changed from black to red, and the text 53h4 is additionally changed to bold with its background color also changed. The comprehensive evaluation score display area 57 is an area for displaying the comprehensive evaluation score for each evaluation item (for example, intonation, volume, articulation, and speed) over all sentence element sections, the comprehensive evaluation score for pauses over all interval sections, and the comprehensive evaluation score over all sections (total score). The comprehensive evaluation scores displayed in the comprehensive evaluation score display area 57 are calculated by the reading evaluation unit 33 (described in detail later).

In the example of FIG. 3(A), the window 51p3 is displayed as a balloon from the predetermined time BT before the reading start time of the sentence element and erased the predetermined time AT after the reading end time of the sentence element. However, when the gap between adjacent sentence element sections is shorter than the predetermined time BT or the predetermined time AT, the display continuation periods of the windows corresponding to the two adjacent sentence element sections (that is, of the texts representing the two adjacent sentence elements) overlap in the right display area 522. As a result, the windows corresponding to the adjacent sentence element sections may be displayed on top of each other, impairing visibility. FIG. 5(A) is a diagram showing an example of the display screen when the display continuation periods of the windows corresponding to adjacent sentence element sections partly overlap. "VT1" in FIG. 5(A) indicates the display continuation period of the window corresponding to the display bar 51a4 (not yet displayed in this example), "VT2" indicates the display continuation period of the window corresponding to the display bar 51a5, and "VT3" indicates the display continuation period of the window corresponding to the display bar 51a6. As shown in FIG. 5(A), the display continuation periods VT1 and VT2 overlap, so during the overlapping period the two windows (that is, the texts representing two adjacent sentence elements) are displayed on top of each other and visibility is impaired.

In the present embodiment, therefore, a plurality of balloon display areas (an example of partial display areas) are set in the right display area 522. When the display continuation periods during which the windows corresponding to two adjacent sentence element sections (that is, the texts representing two adjacent sentence elements) are displayed in the right display area 522 partly overlap, the display processing unit 32 displays the respective windows in different balloon display areas. This prevents the windows corresponding to two adjacent sentence element sections from being displayed on top of each other and improves the visibility of the texts representing the two adjacent sentence elements. FIG. 5(B) is a diagram showing an example of the display screen when two balloon display areas are set in the right display area 522. In the example of FIG. 5(B), a first balloon area 522a and a second balloon area 522b are set, but three or more balloon areas may be set. Also, in the example of FIG. 5(B), the first balloon area 522a and the second balloon area 522b are arranged one above the other, but they may instead be arranged side by side.
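One simple way to keep overlapping windows apart is to give each window the first balloon area that is free at its show time. The sketch below is only an illustrative allocation policy, not the procedure of the embodiment; `assign_balloon_areas` and its error handling are assumptions.

```python
def assign_balloon_areas(intervals, num_areas=2):
    """Assign each (show, erase) interval, sorted by show time, to a balloon area.

    Returns a list of area indices (0 = first balloon area). Each window takes
    the lowest-numbered area whose previous window has already been erased.
    """
    free_at = [float("-inf")] * num_areas  # time at which each area becomes free
    result = []
    for show, erase in intervals:
        for area in range(num_areas):
            if free_at[area] <= show:
                free_at[area] = erase
                result.append(area)
                break
        else:
            raise ValueError("more overlapping windows than balloon areas")
    return result

# Adjacent windows overlap pairwise, so they alternate between the two areas.
print(assign_balloon_areas([(0, 10), (8, 18), (16, 26)]))
```

With non-overlapping windows this policy always reuses the first area; with pairwise overlap and two areas it produces the alternating placement described for FIG. 6.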

FIGS. 6(A) to 6(C) are diagrams (transition diagrams) showing an example of the display screen when the windows corresponding to two adjacent sentence element sections are displayed in different balloon display areas. First, as shown in FIG. 6(A), the window 51p4 showing 「この電車は、」 is displayed as a balloon in the first balloon area 522a at the point the predetermined time BT before the reading start time of the sentence element. Next, as shown in FIG. 6(B), the window 51p5 showing 「東西線急行、」 ("Tozai Line Express,") is displayed as a balloon in the second balloon area 522b at the point the predetermined time BT before the reading start time of the sentence element. In this way, the windows 51p4 and 51p5 corresponding to two adjacent sentence element sections are displayed as balloons in different balloon areas. Moreover, in the example of FIG. 6(B), the two windows 51p4 and 51p5 have different background colors, which greatly improves the visibility of the texts representing the two adjacent sentence elements. Next, as shown in FIG. 6(C), the window 51p4 showing 「この電車は、」 is erased from the first balloon area 522a at the point the predetermined time AT after the reading end time of the sentence element. Thereafter, the windows corresponding to two adjacent sentence element sections continue to be displayed alternately in the different balloon areas. In the example of FIG. 6(C), when the window 51p4 is erased from the first balloon area 522a, the window 51p5 may be erased from the second balloon area 522b and displayed as a balloon in the first balloon area 522a instead. In that case, of the texts representing two adjacent sentence elements, the window showing the text to be read first is always displayed in the upper first balloon area 522a, and the window showing the text to be read later is always displayed in the lower second balloon area 522b.

As described above, FIGS. 3 and 6 show examples in which the whole of the text representing a sentence element is displayed on one line in the window. However, when the total number of characters of the text representing a sentence element is larger than a threshold, the display processing unit 32 may display the whole of the text in the window on multiple lines. Here, the threshold is set to, for example, the maximum number of characters that can be displayed on one line in the window. FIG. 7(A) is a diagram showing an example of the display screen when the whole of the text representing a sentence element is displayed in the window on multiple lines. In the example of FIG. 7(A), the text in the window 51p11 displayed as a balloon in the first balloon area 522a is shown on two lines. Thus, even when the text associated with a sentence element section has many characters, the whole of the text can be displayed in the window. When the whole of the text is displayed in the window on multiple lines, the text may be broken at a position such as a punctuation mark, with the text after the break displayed on the next line.
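The multi-line layout with breaks at punctuation can be sketched as below. The set of punctuation marks in `PUNCT`, the function name `wrap_text`, and the fallback to a hard break when a line contains no punctuation are all assumptions for the example.

```python
PUNCT = "、。,.!?！？"  # break candidates (an assumed, non-exhaustive set)

def wrap_text(text, max_chars):
    """Split text into lines of at most max_chars characters (max_chars >= 1).

    A line is broken just after the last punctuation mark that fits; if the
    line holds no punctuation, it is broken at max_chars.
    """
    lines = []
    rest = text
    while len(rest) > max_chars:
        chunk = rest[:max_chars]
        cut = max(chunk.rfind(p) for p in PUNCT) + 1
        if cut <= 0:
            cut = max_chars  # no punctuation found: hard break
        lines.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        lines.append(rest)
    return lines

# 12 characters with an 8-character line limit break after the first comma.
print(wrap_text("この電車は、東西線急行、", 8))
```

A text no longer than `max_chars` comes back as a single line, which corresponds to the one-line case of FIGS. 3 and 6.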

As another example, when the total number of characters of the text representing the sentence element is larger than a threshold, the display processing unit 32 may display the text in the window part by part, switching at predetermined time intervals. Here, the threshold is set, for example, to the maximum number of characters that can be displayed on a plurality of lines in the window. FIGS. 7B and 7C are diagrams illustrating an example of a display screen in which the text representing a sentence element is displayed in the window part by part, switched at predetermined time intervals. As shown in FIG. 7B, a part of the text representing the sentence element is displayed on two lines in the window 51p11 displayed as a balloon in the first balloon area 522a. When a predetermined time elapses, the text is scrolled upward by one line, as shown in FIG. 7C, so that the remaining text that had not been displayed is displayed. As a result, even when the text associated with a sentence element section has a large number of characters, the entire text can be effectively displayed in the window by switching.

As described above, the display processing unit 32 scrolls the model display area 51 and the speaker display area 52, in which the display bars are displayed, in accordance with the reproduction of the model voice waveform data. Even during this scrolling, the window displaying the entire text representing the sentence element is fixedly displayed on the right display area 522 (for example, in an upper layer above the right display area 522). FIGS. 8A and 8B are diagrams (transition diagrams) illustrating an example of a display screen in which the window is fixedly displayed even while the display areas are scrolled. As shown in FIG. 8A, the window 51p21 is displayed as a balloon from the display bar 51a21. Thereafter, as shown in FIG. 8B, even when the display bar 51a21 corresponding to the window 51p21 moves leftward due to scrolling, the window 51p21 remains fixedly displayed (that is, it does not move to a region outside the first balloon area 522a or the second balloon area 522b, for example, to the position of popNG). This improves the visibility of the text representing the sentence element.

Next, the reading aloud evaluation unit 33 associates the sentence element sections specified from the model voice waveform data with the sentence element sections specified from the speaker voice waveform data, and evaluates the speaker's reading for each sentence element section. At this time, the reading aloud evaluation unit 33 preferably evaluates the speaker's reading for each section and for each of a plurality of evaluation items. Examples of the evaluation items include intonation, volume, articulation, and speed. For example, the reading aloud evaluation unit 33 calculates, for each sentence element section, the difference between the pitch specified from the model voice waveform data and the pitch specified from the speaker voice waveform data, and calculates an evaluation score for the speaker's intonation for each sentence element section based on the calculated difference. This evaluation score has, for example, a maximum of 30 points and is calculated so that it becomes higher (closer to the maximum) as the difference approaches 0. The reading aloud evaluation unit 33 also calculates, for each sentence element section, the difference between the sound pressure specified from the model voice waveform data and the sound pressure specified from the speaker voice waveform data, and calculates an evaluation score for the speaker's volume for each sentence element section based on the calculated difference. This evaluation score has, for example, a maximum of 30 points and is calculated so that it becomes higher as the difference approaches 0. The reading aloud evaluation unit 33 also calculates, for each sentence element section, the similarity between a feature quantity indicating the vocal tract characteristics of the model and a feature quantity indicating the vocal tract characteristics of the speaker, and calculates an evaluation score for the speaker's articulation for each sentence element section based on the calculated similarity. This evaluation score has, for example, a maximum of 30 points and is calculated so that it becomes higher as the similarity increases. Further, the reading aloud evaluation unit 33 calculates, for each sentence element section, the time difference between the time length of the sentence element section specified from the model voice waveform data and the time length of the sentence element section specified from the speaker voice waveform data, and calculates an evaluation score for the speaker's speed (reading speed) for each sentence element section based on the absolute value of the calculated time difference. This evaluation score has, for example, a maximum of 30 points and is calculated so that it becomes higher as the absolute value of the time difference approaches 0.
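
The description states only that each evaluation score has a maximum (for example, 30 points) and rises as the difference approaches 0 or as the similarity increases; it does not give the scoring curve. The following is a minimal sketch under the assumption of a linear fall-off. The function names, the `tolerance` parameter, and the clamping of similarity to the range 0 to 1 are hypothetical.

```python
def difference_score(model_value: float, speaker_value: float,
                     tolerance: float, full_marks: float = 30.0) -> float:
    """Score that equals full_marks when the difference is 0.

    Assumption: the score falls linearly and reaches 0 once the absolute
    difference is at least `tolerance`. The source does not define the curve.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    diff = abs(model_value - speaker_value)
    return full_marks * max(0.0, 1.0 - diff / tolerance)


def similarity_score(similarity: float, full_marks: float = 30.0) -> float:
    """Score that rises with similarity (assumed to lie in the range 0 to 1)."""
    return full_marks * min(1.0, max(0.0, similarity))
```

The same `difference_score` shape could serve for pitch (intonation), sound pressure (volume), and section duration (speed), each with its own tolerance.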

The reading aloud evaluation unit 33 may also associate the interval sections specified from the model voice waveform data with the interval sections specified from the speaker voice waveform data, and evaluate the speaker's pausing for each interval section. For example, the reading aloud evaluation unit 33 calculates, for each interval section, the time difference between the time length of the interval section specified from the model voice waveform data and the time length of the interval section specified from the speaker voice waveform data, and calculates an evaluation score for the speaker's pausing for each interval section based on the absolute value of the calculated time difference. This evaluation score has, for example, a maximum of 30 points and is calculated so that it becomes higher as the absolute value of the time difference approaches 0. The reading aloud evaluation unit 33 further calculates the average value (or the total value) of the evaluation scores of each evaluation item over the sentence element sections as the overall evaluation score for that evaluation item across all sentence element sections, and calculates the average value (or the total value) of the evaluation scores over the interval sections as the overall evaluation score for pausing across all interval sections. Furthermore, the reading aloud evaluation unit 33 calculates the total value (or the average value) of the overall evaluation scores for the evaluation items across all sentence element sections and the overall evaluation score for pausing across all interval sections as the overall evaluation score for all sections (all sentence element sections and all interval sections).
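
The aggregation described above can be illustrated with a short sketch. It uses the average for the per-item and interval scores and the total for the final score, which is one of the combinations the description allows. The function name, the dictionary layout, and the key names are hypothetical.

```python
from statistics import mean


def overall_scores(section_scores: list[dict[str, float]],
                   interval_scores: list[float]) -> dict[str, float]:
    """Average each evaluation item over all sentence element sections,
    average the interval scores, then sum everything into a total.

    Assumes both lists are non-empty and every section has the same items.
    """
    items = section_scores[0].keys()
    result = {item: mean(s[item] for s in section_scores) for item in items}
    result["interval"] = mean(interval_scores)
    # The right-hand side is evaluated before the "total" key is added.
    result["total"] = sum(result.values())
    return result
```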

[2. Operation of reading aloud practice device S]
Next, the operation of the reading aloud practice device S will be described with reference to FIGS. 9 to 15. FIGS. 9 to 11 are diagrams showing examples of display screen transitions when the speaker practices reading aloud. FIG. 12 is a flowchart showing an example of the display process performed during the reading aloud practice process by the control unit 3, and FIG. 13 is a flowchart showing an example of the display process for window [i] by the control unit 3. FIG. 14 is a diagram showing an example of the balloon list created in the display process shown in FIG. 12. FIG. 15 is a conceptual diagram showing the display states of a window displayed as a balloon in the display process shown in FIG. 12.

First, when the reading aloud practice processing program is started by a user operation and the reading aloud practice process begins, the display screen shown in FIG. 9A is displayed on the display D. When the speaker designates the practice button 61 by a user operation (for example, by clicking it with the mouse) on the display screen shown in FIG. 9A, the display screen shown in FIG. 9B is displayed on the display D.

Next, when the speaker enters a user ID and a password by a user operation and designates the login button 62 on the display screen shown in FIG. 9B, the login process is started. During the login process, the display screen shown in FIG. 9C is displayed on the display D. When the login succeeds, the display screen shown in FIG. 10A is displayed on the display D.

Next, when the speaker selects the file name of a sentence to be read aloud from the list display 63 by a user operation and designates the practice button 64 on the display screen shown in FIG. 10A, the model voice waveform data and text data corresponding to the selected file name are read from the storage unit 2 into the control unit 3, and the display screen shown in FIG. 10B is displayed on the display D.

Next, when the speaker designates the playback button 54 and the recording button 56 (or only the recording button 56) by a user operation on the display screen shown in FIG. 10B, reproduction of the model voice waveform data, input of the speaker voice waveform data (that is, recording of the voice uttered by the speaker), and scrolling of the model display area 51 and the speaker display area 52 are started, and the display process shown in FIG. 12 is also started. The arrow button 58 displayed on the display screen shown in FIG. 10B returns the scroll position of the model display area 51 and the speaker display area 52 to the beginning. The time T displayed on the display screen shown in FIG. 10B indicates how many seconds from the beginning the current scroll position (for example, the current playback time RP) is. When the display process shown in FIG. 12 is started in this way, the recording button 56 is replaced by the end button 59, as shown in FIG. 10C. Details of the display process will be described later. Then, when the speaker designates the end button 59 by a user operation, or when reproduction of the model voice waveform data ends (in other words, recording of the voice uttered by the speaker ends), the display screen shown in FIG. 11A is displayed on the display D.

Next, when the speaker designates the “Yes” button 65 by a user operation on the display screen shown in FIG. 11A, the reading aloud evaluation unit 33 starts the evaluation (scoring) process for the speaker's reading. During this evaluation process, the display screen shown in FIG. 11B is displayed on the display D. On the other hand, when the speaker designates the “No” button 66 by a user operation on the display screen shown in FIG. 11A, the state returns to that before practice (that is, before the recording button 56 was designated). When the evaluation process is completed, the display screen shown in FIG. 11C is displayed on the display D.

Information obtained from the speaker voice waveform data is displayed in the speaker display area 52 of the display screen shown in FIG. 11C. When the speaker, by a user operation, places for example the mouse pointer over the display bar 52a1, whose length corresponds to the time length of a certain sentence element section (that is, mouses over it), a window 67 showing the evaluation results for the speaker's reading in that sentence element section (the evaluation score for each evaluation item) pops up. In this window 67, the point allotment (maximum score) and an icon whose display form depends on the scoring rate relative to the maximum score are displayed to the right of the evaluation score for each evaluation item. For example, an icon representing sunny weather indicates 90% or more, an icon representing cloudy weather indicates 50 to 79%, and an icon representing rain indicates less than 50%.
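
The icon selection can be sketched as a simple threshold lookup. Note that the description gives no icon for scoring rates from 80% to just under 90%, so the sketch below returns `None` for that range rather than guessing. The function name, the icon labels, and the treatment of "50 to 79%" as "at least 50 and below 80" are assumptions.

```python
from typing import Optional


def weather_icon(score: float, full_marks: float) -> Optional[str]:
    """Map a score to an icon label based on the scoring rate in percent."""
    rate = 100.0 * score / full_marks
    if rate >= 90:
        return "sunny"
    if 50 <= rate < 80:
        return "cloudy"
    if rate < 50:
        return "rainy"
    return None  # 80% to under 90%: not specified in the description
```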

In the overall evaluation score display area 57 of the display screen shown in FIG. 11C, the overall evaluation score for each evaluation item (for example, intonation, volume, articulation, and speed) across all sentence element sections, the overall evaluation score for pausing across all interval sections, and the overall evaluation score (total score) for all sections are displayed. When the speaker places, for example, the mouse pointer over the overall evaluation score display area 57 by a user operation (that is, mouses over it), the overall evaluation score display area 68 is displayed in place of the overall evaluation score display area 57. In the overall evaluation score display area 68, the point allotment (maximum score) is displayed to the right of the overall evaluation score for each evaluation item across all sentence element sections.

Details of the display process shown in FIG. 12 will now be described. When the display process shown in FIG. 12 is started, the main module of the display processing unit 32 in the control unit 3 creates a balloon list based on the sentence element section list associated with the model voice waveform data that was read (step S1). The balloon list shown in FIG. 14 is a list created based on the sentence element section list shown in FIG. 2. The record of each window [i] shown in FIG. 14 corresponds one-to-one to the record of each sentence element section [i] shown in FIG. 2. As shown in FIG. 14, the text, display start time, display end time, display state, display area, number of display lines, and switching time are registered in the record of each window [i]. Each text (text representing a sentence element) shown in FIG. 14 is displayed in the corresponding window [i]. The display start time of each window [i] shown in FIG. 14 is set to a time that is a predetermined time BT before the corresponding reading start time (the reading start time with the same [i] in FIG. 2). The display end time of each window [i] shown in FIG. 14 is set to a time that is a predetermined time AT after the reading end time (the reading end time with the same [i] in FIG. 2). In the example of FIG. 14, the predetermined time BT and the predetermined time AT are each set to 500 ms, but these times can be set arbitrarily. The display state of window [i] shown in FIG. 14 is held in the variable status, which changes during the display process in the order ready (preparing for display) → visible (being displayed) → complete (displayed), as shown in FIG. 15. “Current Time” shown in FIG. 15 indicates the current playback time (playback position).
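
Step S1 can be sketched as a one-to-one mapping from sentence element section records to window records. This is an illustration only: the class and field names are hypothetical, the number of display lines and the switching time are omitted, and alternating the display area between 1 and 2 for consecutive windows is an assumption based on the alternating balloon areas described for FIG. 6. The actual contents of FIGS. 2 and 14 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SentenceElementSection:
    text: str
    reading_start_ms: int
    reading_end_ms: int


@dataclass
class BalloonWindow:
    text: str
    display_start_ms: int
    display_end_ms: int
    status: str = "ready"  # ready -> visible -> complete
    display_area: int = 1  # 1: first balloon area, 2: second balloon area


def build_balloon_list(sections: list[SentenceElementSection],
                       bt_ms: int = 500, at_ms: int = 500) -> list[BalloonWindow]:
    """Create one window record per sentence element section (step S1)."""
    windows = []
    for i, sec in enumerate(sections):
        windows.append(BalloonWindow(
            text=sec.text,
            display_start_ms=sec.reading_start_ms - bt_ms,  # BT before reading start
            display_end_ms=sec.reading_end_ms + at_ms,      # AT after reading end
            display_area=1 if i % 2 == 0 else 2,            # assumed alternation
        ))
    return windows
```

Because each window is shown BT early and kept AT late, the display periods of adjacent windows can overlap, which is why two balloon areas are useful.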

The display area of window [i] shown in FIG. 14 indicates the area in which window [i] is displayed. A window [i] whose display area is “1” is displayed in the first balloon area 522a, whereas a window [i] whose display area is “2” is displayed in the second balloon area 522b. The number of display lines of window [i] shown in FIG. 14 indicates the number of lines on which the text representing the sentence element is displayed in window [i]. For example, if the maximum number of characters that can be displayed on one line in window [i] is set in advance, the number of display lines of window [i] is determined by dividing the number of characters of the text to be displayed in it by that preset maximum number of characters. The switching time of window [i] shown in FIG. 14 applies when the text representing the sentence element does not fit within the preset maximum number of display lines (for example, two lines) in window [i], and indicates the time from the display start time of window [i] until upward scrolling by one line starts (switching starts). For example, if the preset maximum number of display lines in window [i] is two and three lines are required to display the entire text in window [i], the switching time of window [i] is calculated as the display start time of window [i] plus 1.5 × (display duration per line). Here, the display duration per line is calculated by dividing the display duration of window [i], from its display start time to its display end time, by the number of display lines of window [i]. In this case, the switching time of window [i] may instead be calculated as the display start time of window [i] plus 2 × (display duration per line).
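
The line-count and switching-time arithmetic above can be worked through in a short sketch. Two points are assumptions rather than statements from the description: the division for the line count is rounded up, and when more than one switch is needed, later switches are spaced one per-line duration apart (the description only works through the case of three lines in a two-line window). The function and parameter names are hypothetical.

```python
import math


def display_line_count(text: str, max_chars_per_line: int) -> int:
    """Number of lines needed for the text (rounding up is an assumption)."""
    if max_chars_per_line <= 0:
        raise ValueError("max_chars_per_line must be positive")
    return max(1, math.ceil(len(text) / max_chars_per_line))


def switching_times(display_start_ms: float, display_end_ms: float,
                    line_count: int, max_lines: int = 2,
                    factor: float = 1.5) -> list[float]:
    """Times at which the window scrolls up by one line.

    The first switch is at display start + factor * (duration per line),
    where factor is 1.5 (or 2) as in the description.
    """
    extra = line_count - max_lines
    if extra <= 0:
        return []  # the text fits, so no switching time is registered
    per_line = (display_end_ms - display_start_ms) / line_count
    return [display_start_ms + (factor + k) * per_line for k in range(extra)]
```

For a window displayed from 0 ms to 3000 ms that needs three lines, the display duration per line is 1000 ms, so the single switch occurs at 1500 ms with the factor 1.5, or at 2000 ms with the factor 2.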

Next, the main module of the display processing unit 32 assigns 0 to the variable i indicating a serial number (step S2). The main module then determines whether the display of all windows registered in the balloon list created in step S1 has been completed (step S3). If the main module determines that the display of all windows registered in the balloon list has been completed (step S3: YES), it ends the display process. If the main module determines that the display of all windows registered in the balloon list has not been completed (step S3: NO), it refers to the balloon list and searches for windows that satisfy the condition that the display state is not complete (displayed) and the display start time is less than or equal to Current Time (the current playback time), that is, the display start time has arrived (step S4). The main module then determines whether any window satisfying the condition in step S4 has been extracted (step S5). If the main module determines that no such window has been extracted (step S5: NO), the process returns to step S3. If the main module determines that such a window has been extracted (step S5: YES), the process proceeds to step S6.

In step S6, the main module of the display processing unit 32 determines whether the display state of window [i] is ready (preparing for display). If the main module determines that the display state of window [i] is not ready (step S6: NO), it increments the variable i by 1 (step S8) and proceeds to step S9. In step S9, the main module determines whether all the windows extracted in step S5 have been processed. If the main module determines that not all of the windows extracted in step S5 have been processed (step S9: NO), the process returns to step S6. If the main module determines that all of the windows extracted in step S5 have been processed (step S9: YES), the process returns to step S3.

On the other hand, if the main module of the display processing unit 32 determines that the display state of window [i] is ready (preparing for display) (step S6: YES), it generates and executes a sub-module of the display processing unit 32 by issuing a display command for window [i] (step S7). That is, in this example, the sub-module of the display processing unit 32 is executed in a new thread independent of the thread in which the main module is executed. The sub-module starts displaying window [i] as shown in FIG. 13. The sub-module performs its processing asynchronously with the main module. Such processing is performed, for example, by the multitasking function of the OS. As described above, when the display durations of a plurality of windows [i] partially overlap, a plurality of sub-modules may operate asynchronously in the same time period.

At the start of display of window [i] shown in FIG. 13, the sub-module of the display processing unit 32 displays window [i] as a balloon in the first balloon area 522a or the second balloon area 522b according to the balloon list. The sub-module then sets the display state of window [i] to visible (being displayed) (step S11), thereby updating the display state of window [i] in the balloon list. Next, the sub-module assigns 0 to the variable j (step S12). The sub-module then refers to the balloon list and determines whether there is display switching for window [i] (step S13). For example, if a switching time for window [i] is registered in the balloon list, the sub-module determines that there is display switching for window [i] (step S13: YES) and proceeds to step S14. If the sub-module determines that there is no display switching for window [i] (step S13: NO), it proceeds to step S17.

In step S14, the sub-module of the display processing unit 32 refers to the balloon list and determines whether the j-th switching time, counted from the display start time of window [i], has arrived. If the sub-module determines that the j-th switching time has arrived (step S14: YES), it proceeds to step S15. If the sub-module determines that the j-th switching time has not arrived (step S14: NO), the process returns to step S13. In step S15, the sub-module switches the display in window [i] by one line (that is, scrolls upward by one line). The sub-module then increments the variable j by 1 (step S16) and proceeds to step S17.

In step S17, the sub-module of the display processing unit 32 refers to the balloon list and determines whether the display end time of window [i] is less than or equal to Current Time (the current playback time), that is, whether the display end time has arrived. If the sub-module determines that the display end time of window [i] is less than or equal to Current Time (step S17: YES), it proceeds to step S18. If the sub-module determines that the display end time of window [i] is not less than or equal to Current Time (step S17: NO), the process returns to step S13. In step S18, the sub-module sets the display state of window [i] to complete (displayed), thereby updating the display state of window [i] in the balloon list, erases window [i] from the display, and ends the process.
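
The status transitions driven by the flowcharts of FIGS. 12 and 13 can be condensed into a single-threaded sketch. This is a deliberate simplification and not the structure described above: the main module's polling loop and the per-window sub-module threads are collapsed into one `tick` function called with the current playback time, and line switching (steps S13 to S16) is omitted. The function names and dictionary keys are hypothetical.

```python
def tick(windows: list[dict], current_time_ms: int) -> list[str]:
    """Advance every window's status for the given playback time.

    Returns the texts of the windows that are visible after the update.
    """
    for w in windows:
        # Steps S4 to S7 and S11: a ready window whose display start time
        # has arrived becomes visible.
        if w["status"] == "ready" and w["start_ms"] <= current_time_ms:
            w["status"] = "visible"
        # Steps S17 and S18: a visible window whose display end time has
        # arrived is erased and marked complete.
        if w["status"] == "visible" and w["end_ms"] <= current_time_ms:
            w["status"] = "complete"
    return [w["text"] for w in windows if w["status"] == "visible"]


def all_complete(windows: list[dict]) -> bool:
    """Step S3: the display process ends once every window is complete."""
    return all(w["status"] == "complete" for w in windows)
```

Calling `tick` repeatedly with increasing playback times until `all_complete` returns true mirrors the loop back to step S3. When the display periods of two windows overlap, both texts are returned for the overlapping times.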

As described above, according to the embodiment, the reading aloud practice device S is configured to display, as a balloon, a window showing the entire text representing the sentence element associated with a display bar based on the reading start time of that sentence element, and to erase the balloon window based on the reading end time of that sentence element. Therefore, even when the text representing the sentence element is displayed truncated in the limited area corresponding to the display bar, the speaker's smooth reading of the text can be effectively supported.

DESCRIPTION OF SYMBOLS
1 Communication unit
2 Storage unit
3 Control unit
4 Operation unit
5 Interface unit
6 Bus
31 Voice processing unit
32 Display processing unit
33 Reading aloud evaluation unit
S Reading aloud practice device

Claims

A reading aloud practice device comprising:
reproduction control means for reproducing speech waveform data indicating a waveform of speech produced when a sentence is read aloud;
first display control means for displaying, in a first display area and in time series from the beginning of the sentence, objects each having a length corresponding to the time length of one of a plurality of sentence element sections obtained by dividing the speech waveform data, each sentence element section extending from a reading start timing to a reading end timing of one of a plurality of sentence elements constituting the sentence;
second display control means for associating texts, which are obtained by dividing the text data of the sentence and each of which represents a sentence element, with the objects in time series from the beginning of the sentence, and for displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and
third display control means for displaying, in accordance with the reproduction of the speech waveform data, the entire text representing the sentence element in a second display area based on the reading start timing of the sentence element, and for erasing the entire text displayed in the second display area based on the reading end timing of the sentence element.

The reading aloud practice device according to claim 1, wherein the third display control means, in accordance with the reproduction of the speech waveform data, pops up a window displaying the entire text representing the sentence element in the second display area based on the reading start timing of the sentence element, and erases the popped-up window based on the reading end timing.

The reading aloud practice device according to claim 2, wherein the first display control means scrolls the first display area in which the objects are displayed in accordance with the reproduction of the speech waveform data, and
the third display control means fixedly displays the window in which the entire text representing the sentence element is displayed.

The reading aloud practice device, wherein the third display control means displays the entire text in the window in a plurality of lines when the total number of characters of the text representing the sentence element is larger than a threshold value.

The reading aloud practice device according to any one of claims ... to 4, wherein the third display control means, when the total number of characters of the text representing the sentence element is larger than a threshold value, displays the text in the window while switching it part by part at predetermined time intervals.

The reading aloud practice device according to any one of claims 1 to 5, wherein the third display control means displays the entire text representing the sentence element in the second display area based on the reading start timing of the sentence element only when a part of the text representing the sentence element is displayed in the limited area corresponding to the object.

The reading aloud practice device according to claim 1, wherein the third display control means displays the entire text representing the sentence element from a predetermined time before the reading start timing of the sentence element.

The reading aloud practice device according to any one of claims 1 to 7, wherein the third display control means erases the entire text displayed in the second display area a predetermined time after the reading end timing.

The reading aloud practice device according to any one of claims 1 to 8, wherein a plurality of partial display areas are set in the second display area, and
when the periods during which the texts representing two adjacent sentence elements are displayed in the second display area partially overlap, the third display control means displays the texts in mutually different partial display areas.

The reading aloud practice device according to claim 1, further comprising fourth display control means for displaying text representing the entire sentence in a third display area, and for changing the display color of the text representing a sentence element within the text displayed in the third display area based on the reading start timing of that sentence element.

A display control method executed by one or more computers, the method comprising:
a reproduction control step of reproducing speech waveform data indicating the waveform of a voice reading a sentence aloud;
a first display control step of displaying, in a first display area in time series from the beginning of the sentence, objects each having a length corresponding to the time length of a sentence element section from the reading start timing to the reading end timing of each of a plurality of sentence elements constituting the sentence, the sentence element sections being obtained by dividing the speech waveform data;
a second display control step of associating texts representing the sentence elements, obtained by dividing the text data of the sentence, with the objects in time series from the head of the sentence, and displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and
a third display control step of displaying, in accordance with the reproduction of the speech waveform data, the entire text representing a sentence element in a second display area based on the reading start timing of that sentence element, and erasing the entire text displayed in the second display area based on the reading end timing.

A program that causes a computer to execute:
a reproduction control step of reproducing speech waveform data indicating the waveform of a voice reading a sentence aloud;
a first display control step of displaying, in a first display area in time series from the beginning of the sentence, objects each having a length corresponding to the time length of a sentence element section from the reading start timing to the reading end timing of each of a plurality of sentence elements constituting the sentence, the sentence element sections being obtained by dividing the speech waveform data;
a second display control step of associating texts representing the sentence elements, obtained by dividing the text data of the sentence, with the objects in time series from the head of the sentence, and displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and
a third display control step of displaying, in accordance with the reproduction of the speech waveform data, the entire text representing a sentence element in a second display area based on the reading start timing of that sentence element, and erasing the entire text displayed in the second display area based on the reading end timing.