JP6418179B2 - Reading aloud practice device, display control method, and program - Google Patents

Reading aloud practice device, display control method, and program

Info

Publication number
JP6418179B2
Authority
JP
Japan
Prior art keywords
sentence
display
reading
text
displayed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2016041017A
Other languages
Japanese (ja)
Other versions
JP2017156615A (en)
Inventor
成田 健
Original Assignee
ブラザー工業株式会社 (Brother Industries, Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ブラザー工業株式会社 (Brother Industries, Ltd.)
Priority to JP2016041017A
Publication of JP2017156615A
Application granted granted Critical
Publication of JP6418179B2


Description

  The present invention relates to the technical field of systems capable of supporting the reading aloud of text by a speaker.

  In recent years, for the purpose of supporting practice in language learning, announcing, or reading aloud, there are known systems that display on a screen information related to a model (example) voice for reading aloud a sentence containing a plurality of sentence elements (for example, phrases and words), such as its sound pressure, its pitch, and the text representing each sentence element, in such a way that it can be compared with information related to the voice produced when the practitioner reads the sentence aloud. For example, in the technique disclosed in Patent Document 1, the pitch of the model voice and the pitch of the learner's voice are displayed together for each word.

  In such a system, a predetermined program identifies contiguous runs of the speech waveform on the basis of speech waveform data indicating the waveform of the voice when the sentence is read aloud, and divides the waveform into a plurality of sentence element sections (in other words, sections of the waveform corresponding to individual sentence elements, such as phrases). The system also divides the text of the whole sentence into sentence elements (for example, phrases) on the basis of the text data of the sentence, thereby identifying a plurality of texts each representing one sentence element. The system then stores the identified sentence element sections and the texts in association with each other in time series. For example, Patent Document 2 discloses a technique for storing speech waveform data in association with each character constituting text information.

Patent Document 1: JP 2007-139868 A
Patent Document 2: JP 4-305730 A

  Using data in which the sentence element sections and the texts are associated with each other in time series, it is conceivable to assist the practitioner (speaker) in reading aloud by, for example, drawing the text representing each sentence element in a limited area corresponding to a display bar whose length corresponds to the time length of that sentence element section. However, if a long text corresponds to a short sentence element section, the text may be cut off at the end of the display bar, so that the entire text cannot be drawn. In that case, the practitioner encounters the break in the text at the end of the display bar while reading the text representing the sentence element, and smooth reading of the text is hindered. As a result, the practitioner's reading-aloud evaluation may be lower than it would be if the text were not cut off.

  The present invention has been made in view of the above points, and has as its object to provide a reading aloud practice device, a display control method, and a program that can effectively support smooth reading aloud of text by a speaker.

  In order to solve the above problem, the invention according to claim 1 comprises: reproduction control means for reproducing voice waveform data indicating a voice waveform when a sentence is read aloud; first display control means for displaying, in a first display area in time series from the head of the sentence, objects each having a length corresponding to the time length of a sentence element section, the sentence element sections being obtained by dividing the voice waveform data into a plurality of sections each extending from the reading start timing to the reading end timing of one of the plurality of sentence elements constituting the sentence; second display control means for associating texts, which are obtained by dividing the text data of the sentence into a plurality of texts each representing one sentence element, with the objects in time series from the head of the sentence, and for displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and third display control means for, in accordance with the reproduction of the voice waveform data, displaying the entire text representing a sentence element in a second display area based on the reading start timing of that sentence element, and erasing the entire text displayed in the second display area based on the reading end timing.

  According to a second aspect of the present invention, in the reading aloud practice device according to the first aspect, the third display control means pops up, in accordance with the reproduction of the voice waveform data, a window displaying the entire text representing the sentence element in the second display area based on the reading start timing of the sentence element, and erases the popped-up window based on the reading end timing.

  According to a third aspect of the present invention, in the reading aloud practice device according to the second aspect, the first display control means scrolls the first display area in which the objects are displayed in accordance with the reproduction of the voice waveform data, and the third display control means fixedly displays the window in which the entire text representing the sentence element is displayed.

  According to a fourth aspect of the present invention, in the reading aloud practice device according to the second or third aspect, when the total number of characters of the text representing the sentence element is larger than a threshold, the third display control means displays the text in the window in a plurality of lines.

  According to a fifth aspect of the present invention, in the reading aloud practice device according to any one of the second to fourth aspects, when the total number of characters of the text representing the sentence element is larger than a threshold, the third display control means switches the text portion by portion at predetermined time intervals and displays it in the window.

  According to a sixth aspect of the present invention, in the reading aloud practice device according to any one of the first to fifth aspects, the third display control means displays the entire text representing the sentence element in the second display area based on the reading start timing of the sentence element only when merely a part of the text representing the sentence element is displayed in the limited area corresponding to the object.

  According to a seventh aspect of the present invention, in the reading aloud practice device according to any one of the first to sixth aspects, the third display control means starts displaying the entire text representing the sentence element a predetermined time before the reading start timing of the sentence element.

  According to an eighth aspect of the present invention, in the reading aloud practice device according to any one of the first to seventh aspects, the third display control means erases the entire text displayed in the second display area a predetermined time after the reading end timing.

  According to a ninth aspect of the present invention, in the reading aloud practice device according to any one of the first to eighth aspects, a plurality of partial display areas are set in the second display area, and when the periods in which the texts representing two sentence elements are displayed in the second display area partially overlap, the third display control means displays those texts in mutually different partial display areas.

  According to a tenth aspect of the present invention, the reading aloud practice device according to any one of the first to ninth aspects further comprises fourth display control means for displaying the entire text representing the sentence in a third display area and for changing, based on the reading start timing of a sentence element, the display color of the text representing that sentence element within the text displayed in the third display area.

  The invention according to claim 11 is a display control method executed by one or more computers, comprising: a reproduction control step of reproducing voice waveform data indicating a voice waveform when a sentence is read aloud; a first display control step of displaying, in a first display area in time series from the head of the sentence, objects each having a length corresponding to the time length of a sentence element section, the sentence element sections being obtained by dividing the voice waveform data into a plurality of sections each extending from the reading start timing to the reading end timing of one of the plurality of sentence elements constituting the sentence; a second display control step of associating texts, which are obtained by dividing the text data of the sentence into a plurality of texts each representing one sentence element, with the objects in time series from the head of the sentence, and displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and a third display control step of, in accordance with the reproduction of the voice waveform data, displaying the entire text representing a sentence element in a second display area based on the reading start timing of that sentence element, and erasing the entire text displayed in the second display area based on the reading end timing.

  The invention according to claim 12 is a program that causes a computer to execute: a reproduction control step of reproducing voice waveform data indicating a voice waveform when a sentence is read aloud; a first display control step of displaying, in a first display area in time series from the head of the sentence, objects each having a length corresponding to the time length of a sentence element section, the sentence element sections being obtained by dividing the voice waveform data into a plurality of sections each extending from the reading start timing to the reading end timing of one of the plurality of sentence elements constituting the sentence; a second display control step of associating texts, which are obtained by dividing the text data of the sentence into a plurality of texts each representing one sentence element, with the objects in time series from the head of the sentence, and displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object; and a third display control step of, in accordance with the reproduction of the voice waveform data, displaying the entire text representing a sentence element in a second display area based on the reading start timing of that sentence element, and erasing the entire text displayed in the second display area based on the reading end timing.

  According to the inventions described in claims 1, 11, and 12, even when only part of the text representing a sentence element can be displayed in the limited area corresponding to the object, the entire text is displayed in the second display area for a certain period of time, so that smooth reading aloud of the text by the speaker can be effectively supported.

  According to the second aspect of the present invention, since the window displaying the entire text representing the sentence element is popped up in the second display area for a certain period of time, the visibility of the text representing the sentence element can be improved.

  According to the invention described in claim 3, since the window is fixedly displayed even while the display area is scrolled, the visibility of the text representing the sentence element can be improved.

  According to the fourth aspect of the present invention, even when the total number of characters of the text representing the sentence element is large, the entire text can be displayed in the window.

  According to the fifth aspect of the present invention, even when the number of characters of the text representing the sentence element is large, the entire text can be effectively switched and displayed in the window.

  According to the sixth aspect of the present invention, it is possible to reduce the processing load and the memory usage required for displaying the entire text representing the sentence element.

  According to the seventh aspect of the present invention, the speaker can be given time to spare in reading the text representing the sentence element.

  According to the invention described in claim 8, the speaker can be given time to spare in reading the text representing the sentence element.

  According to the invention described in claim 9, the visibility of the texts representing two adjacent sentence elements can be improved.

  According to the tenth aspect of the present invention, when reading aloud the text representing a sentence element, the speaker can grasp at a glance where that text is located within the whole sentence.

[Brief Description of the Drawings]
FIG. 1 is a diagram showing a schematic configuration example of the reading aloud practice device S according to the present embodiment.
FIG. 2 is a diagram showing an example of a sentence element section list.
FIG. 3 is a diagram showing an example of a display screen displayed on the display D.
FIG. 4 is a diagram showing examples of the limited area corresponding to a display bar.
FIG. 5(A) is a diagram showing an example of the display screen when the display periods of the windows corresponding to adjacent sentence element sections partially overlap, and FIG. 5(B) is a diagram showing an example of the display screen when two balloon display areas are set in the right side display area 522.
FIG. 6 is a diagram showing an example of the display screen when the windows corresponding to two adjacent sentence element sections are displayed in mutually different balloon display areas.
FIG. 7(A) is a diagram showing an example of the display screen when the entire text representing a sentence element is displayed in the window in a plurality of lines, and FIGS. 7(B) and 7(C) are diagrams showing examples of the display screen when portions of the text representing a sentence element are switched at predetermined time intervals and displayed in the window.
FIG. 8 is a diagram showing an example of the display screen when the window is fixedly displayed even while the display area is scrolled.
FIGS. 9 to 11 are diagrams showing transition examples of the display screen when a speaker practices reading aloud.
FIG. 12 is a flowchart showing an example of the display process performed during the reading aloud practice process.
FIG. 13 is a flowchart showing an example of the display process of window[i].
FIG. 14 is a diagram showing an example of a speech balloon list.
FIG. 15 is a conceptual diagram showing the display state of a window displayed as a balloon.

  Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[1. Configuration and function of reading aloud practice device S]
First, the configuration and functions of the reading aloud practice device S according to one embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a diagram showing a schematic configuration example of the reading aloud practice device S according to the present embodiment. Examples of the reading aloud practice device include a personal computer and a portable information terminal (such as a smartphone). As shown in FIG. 1, the reading aloud practice device S comprises a communication unit 1, a storage unit 2, a control unit 3, an operation unit 4, an interface (IF) unit 5, and the like, which are connected to one another. The reading aloud practice device S is also called an utterance practice device. The operation unit 4 receives an operation (user operation) from a user and outputs a signal corresponding to the user operation to the control unit 3. Examples of user operations include mouse operations and keyboard operations. When the display D is a touch panel display, the user operation may be a contact operation with the user's finger or a pen. The interface unit 5 is connected to a microphone M, a display D, and the like. The microphone M collects the voice uttered when a speaker practicing, for example, language learning, announcing, or reading aloud reads a sentence containing a plurality of sentence elements. A sentence element is a unit constituting a sentence. Examples of sentence elements include phrases, clauses, words, and combined phrases in which a plurality of phrases are joined. Here, a phrase is a unit that is generally read in one breath when reading a sentence aloud. A clause is composed of one or more phrases; that is, one clause may consist of a single phrase, or of a plurality of phrases. A phrase is, for example, a group of one or more words.
Words include independent words such as nouns, verbs, adjectives, adverbs, and conjunctions (parts of speech that can constitute a phrase by themselves) and adjunct words such as auxiliary verbs and particles (parts of speech that cannot constitute a phrase by themselves). Examples of sentences to be read aloud include sentences used in language learning, announcements, and reading aloud. The display D displays a display screen on which display areas, described later, are arranged in accordance with display commands from the control unit 3. The microphone M and the display D may be integrated with the reading aloud practice device S or may be separate from it.

  The communication unit 1 connects to a network (not shown) by wire or wirelessly and communicates with a server or the like. The storage unit 2 includes, for example, a hard disk drive, and stores an OS (Operating System) and a reading aloud practice processing program (which includes the program of the present invention). The reading aloud practice processing program is a program for causing the control unit 3, as a computer, to execute the reading aloud practice process. The reading aloud practice processing program may be downloaded from a predetermined server as an application, or may be provided stored on a recording medium such as a CD or a DVD. The storage unit 2 also stores text data of a sentence containing a plurality of sentence elements, and voice waveform data indicating the waveform of a voice that serves as a model when the sentence is read aloud (hereinafter referred to as "model voice waveform data"). Here, in the text data, the text (characters) representing each sentence element constituting the sentence to be read aloud is demarcated for each sentence element; for example, the text is delimited by punctuation marks inserted between sentence elements. Alternatively, serial numbers may be assigned to the texts representing the sentence elements in order from the head of the sentence. The model voice waveform data is stored in a predetermined audio file format.
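  The delimitation of the text data described above, in which punctuation inserted between sentence elements separates the texts and serial numbers are assigned in order from the head of the sentence, can be sketched as follows. This is a minimal illustration, not the embodiment's implementation; the delimiter set and the sample sentence are assumptions.

```python
import re

def split_sentence_elements(text, delimiters="、。，．,."):
    # Split the whole-sentence text at the punctuation marks inserted
    # between sentence elements, keeping each mark with its element.
    pattern = "([^" + re.escape(delimiters) + "]+[" + re.escape(delimiters) + "]?)"
    elements = [m.strip() for m in re.findall(pattern, text) if m.strip()]
    # Assign serial numbers in order from the head of the sentence.
    return list(enumerate(elements))

elements = split_sentence_elements("Good morning, everyone. Thank you.")
# elements: [(0, "Good morning,"), (1, "everyone."), (2, "Thank you.")]
```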

  The control unit 3 includes a CPU (Central Processing Unit) serving as a computer, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The control unit 3 functions as a voice processing unit 31, a display processing unit 32, and a reading aloud evaluation unit 33 in accordance with the reading aloud practice processing program. The voice processing unit 31 is an example of the reproduction control means in the present invention. The display processing unit 32 is an example of the first display control means, the second display control means, and the third display control means in the present invention.

  The voice processing unit 31 reads the model voice waveform data stored in the predetermined audio file format from the storage unit 2 as a processing target and reproduces it. The voice waveform data is discretized time-series sound pressure waveform data, for example monaural waveform data with a sampling rate of 44.1 kHz and 16-bit quantization. Sound pressure refers to the change in air pressure (Pa) caused by sound waves. In the present embodiment, sound pressure is expressed as a sound pressure level (dB), a numerical value that is easy to handle in calculation and that represents the effective sound pressure (Pa), which is the root mean square (RMS) of the instantaneous sound pressure (Pa). The sound pressure level (dB) is also loosely called volume.
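  The relationship just described, in which the effective (RMS) sound pressure in Pa is expressed as a level in dB, is the standard formula Lp = 20 log10(p_rms / p0) with the reference pressure p0 = 20 µPa. A minimal sketch follows; the sample values are assumptions:

```python
import math

def sound_pressure_level_db(samples, p_ref=20e-6):
    # Effective sound pressure = root mean square (RMS) of the
    # instantaneous sound pressures (Pa), expressed as a level in dB
    # relative to the standard reference pressure of 20 micropascals.
    rms = math.sqrt(sum(p * p for p in samples) / len(samples))
    return 20.0 * math.log10(rms / p_ref)

# 0.02 Pa RMS corresponds to 60 dB: 20 * log10(0.02 / 0.00002) = 60.
level = sound_pressure_level_db([0.02, -0.02, 0.02, -0.02])
```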

  Based on the reproduced model voice waveform data, the voice processing unit 31 specifies, for each sentence element, the sentence element section from the reading start time (that is, the reading start timing) to the reading end time (that is, the reading end timing) of that sentence element. Here, a sentence element section is a section obtained by dividing a contiguous run of the speech waveform, for example in phrase units. The reading start time and the reading end time are each expressed as an elapsed time from the playback start time (0:00) of the model voice waveform data. The voice processing unit 31 registers the reading start time and the reading end time of the sentence element section specified for each sentence element in a sentence element section list, in association with the text representing that sentence element. The texts representing the sentence elements are extracted, for example, from the text data associated with the reproduced model voice waveform data, and are associated in order from the first sentence element section. FIG. 2 is a diagram showing an example of the sentence element section list. In the sentence element section list shown in FIG. 2, a text representing a sentence element, a reading start time, and a reading end time are associated with each of a plurality of sentence element sections specified from the model voice waveform data. The index i in the sentence element section [i] (i = 0, 1, 2, 3, 4, 5) shown in FIG. 2 is a serial number.
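  The sentence element section list of FIG. 2 can be sketched as a simple time-series structure in which each section [i] carries its text, reading start time, and reading end time (elapsed seconds from the 0:00 playback start). The section times and texts below are illustrative assumptions, not values taken from FIG. 2:

```python
def build_section_list(sections, texts):
    # Pair each sentence element section (reading start time, reading
    # end time) with the text extracted for it, in order from the first
    # section, and assign the serial number i as in FIG. 2.
    return [
        {"i": i, "text": text, "start": start, "end": end}
        for i, ((start, end), text) in enumerate(zip(sections, texts))
    ]

section_list = build_section_list(
    [(0.0, 0.8), (1.1, 2.0)],          # hypothetical section times (s)
    ["Good morning,", "everyone."],     # hypothetical element texts
)
# section_list[1]: {"i": 1, "text": "everyone.", "start": 1.1, "end": 2.0}
```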

  Here, the reading start time and the reading end time may each be recognized from the voice waveform itself, or from the sound pressure level (dB) calculated as described above. For example, the voice processing unit 31 recognizes the point at which the amplitude of the voice waveform becomes equal to or greater than a predetermined value as a reading start time, or recognizes the point at which the sound pressure level (dB) becomes equal to or greater than a predetermined value as a reading start time. Likewise, the voice processing unit 31 recognizes the point at which the amplitude of the voice waveform falls below a predetermined value as a reading end time, or recognizes the point at which the sound pressure level (dB) falls below a predetermined value as a reading end time. In these cases, only when the time from the point at which the sound pressure level (dB) falls below the predetermined value until the point at which it again becomes equal to or greater than the predetermined value (the silent time) is equal to or longer than a threshold value does the voice processing unit 31 recognize the point of falling below the predetermined value as a reading end time and the point of again reaching the predetermined value as a reading start time (the same applies to the amplitude of the voice waveform). This means that when the silent time is shorter than the threshold value, the sentence element is not divided at that point. Note that the voice processing unit 31 may also specify, based on the reproduced model voice waveform data, the interval section from the reading end time of one sentence element to the reading start time of the next sentence element.
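  The rule described above, under which a level at or above the predetermined value marks a reading start, a level below it marks a reading end, and a silent time shorter than the threshold does not divide the sentence element, can be sketched as follows. The per-frame levels, frame period, and both thresholds are assumptions for illustration:

```python
def detect_sections(levels, frame_s, level_thresh, min_silence_s):
    # levels: sound pressure level (dB) per analysis frame;
    # frame_s: frame period (s). A frame at or above level_thresh is
    # voiced; a silent run shorter than min_silence_s does NOT divide
    # the sentence element section.
    min_gap = int(round(min_silence_s / frame_s))
    sections, start, silent = [], None, 0
    for i, lv in enumerate(levels):
        if lv >= level_thresh:
            if start is None:
                start = i          # reading start (frame index)
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_gap:  # silence long enough: close section
                sections.append((start * frame_s, (i - silent + 1) * frame_s))
                start, silent = None, 0
    if start is not None:          # close a section still open at the end
        sections.append((start * frame_s, (len(levels) - silent) * frame_s))
    return sections

# The one-frame dip at frame 2 is shorter than 30 ms of silence, so it
# does not split the first section; the 3-frame gap at frames 5-7 does.
levels = [50, 50, 30, 50, 50, 30, 30, 30, 50, 50]
sections = detect_sections(levels, frame_s=0.01, level_thresh=40, min_silence_s=0.03)
```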

  In addition, the voice processing unit 31 specifies the sound pressure level (dB) from data cut out from the reproduced model voice waveform data at every predetermined time as the sound pressure at each predetermined time interval, and stores sound pressure data indicating the sound pressure specified at the predetermined time intervals in the RAM. The voice processing unit 31 also calculates the fundamental frequency (Hz) from data cut out from the reproduced model voice waveform data at every predetermined time, and specifies the calculated fundamental frequency (Hz) as the pitch at each predetermined time interval. A known method such as the zero-cross method or autocorrelation can be applied to identify the pitch (also referred to as intonation or pitch height). The voice processing unit 31 then stores pitch data indicating the pitch specified at the predetermined time intervals in the RAM. The predetermined time used for specifying the sound pressure and the pitch is shorter than the time length (temporal length) of a sentence element section, and is set to about 10 ms, for example. Further, the voice processing unit 31 divides the data cut out from the reproduced model voice waveform data at every predetermined time by windowing (for example, into frames of 25 ms), and performs a Fourier transform (FFT) on each frame to obtain its amplitude spectrum. The voice processing unit 31 then multiplies the obtained amplitude spectrum by a mel filter bank, and calculates MFCCs (mel-frequency cepstral coefficients) by applying a discrete cosine transform (DCT) to the logarithm of the mel filter bank outputs, thereby specifying, for each sentence element section, a feature amount indicating the vocal tract characteristics of the model.
  The voice processing unit 31 stores, in the RAM, feature amount data indicating the feature amount specified for each sentence element section.
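  Of the pitch identification methods named above, the zero-cross method is the simplest: a periodic waveform crosses zero twice per period, so the fundamental frequency of a frame can be estimated from its sign-change count. A minimal sketch follows; the test tone is an assumption, and the embodiment's roughly 10 ms analysis interval would give a correspondingly coarser estimate:

```python
import math

def zero_cross_pitch(frame, sample_rate):
    # Estimate the fundamental frequency (Hz) of one analysis frame by
    # the zero-cross method: a periodic waveform crosses zero twice per
    # period, so f0 is about crossings / (2 * frame duration). The
    # resolution is limited by the frame length.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    duration = len(frame) / sample_rate
    return crossings / (2.0 * duration)

# A 100 Hz sine sampled at 8 kHz for 0.1 s (hypothetical input).
frame = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
pitch = zero_cross_pitch(frame, 8000)  # close to 100 Hz
```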

  Note that the voice processing unit 31 also receives as input voice waveform data indicating the voice uttered when the speaker reads the above sentence aloud during reading aloud practice and collected by the microphone M (hereinafter referred to as "speaker voice waveform data"). The voice processing unit 31 then specifies, based on the input speaker voice waveform data and in the same manner as for the model voice waveform data, the sentence element section from the reading start time to the reading end time of each sentence element, and associates the text representing each sentence element with the specified sentence element section. Further, the voice processing unit 31 specifies the sound pressure level (dB) from data cut out from the input speaker voice waveform data at every predetermined time as the sound pressure at each predetermined time interval, calculates the fundamental frequency (Hz) from data cut out from the input speaker voice waveform data at every predetermined time, and specifies the calculated fundamental frequency (Hz) as the pitch at each predetermined time interval. Furthermore, the voice processing unit 31 specifies, for each sentence element section, a feature amount indicating the vocal tract characteristics of the speaker from the input speaker voice waveform data by the same method as for the model voice waveform data. The voice processing unit 31 may also specify, based on the input speaker voice waveform data, the interval section from the reading end time of one sentence element to the reading start time of the next sentence element.

  Next, the display processing unit 32 causes the display D to display a display screen on which the speaker practices reading aloud. FIGS. 3(A) and 3(B) are diagrams showing an example of the display screen displayed on the display D. As shown in FIG. 3(A), the display screen is provided with a model display area 51, a speaker display area 52, a full-text display area 53, a play button 54, a scroll bar 55, a record button 56, a comprehensive evaluation score display area 57, and the like. The model display area 51 (an example of the first display area) is an area for displaying information obtained from the model voice waveform data, and the speaker display area 52 is an area for displaying information obtained from the speaker voice waveform data. The model display area 51 and the speaker display area 52 are arranged one above the other (in the vertical direction orthogonal to the time axis t) on either side of the time axis t, and can be scrolled in the arrow direction (to the left).

  For example, when the play button 54 is designated by a user operation so that the speaker can practice reading aloud, the model voice waveform data is reproduced, and the display processing unit 32 displays the information obtained from the model voice waveform data in the model display area 51 while scrolling the model display area 51 in accordance with the reproduction of the model voice waveform data, as shown in FIGS. 3(A) and 3(B). In the example of FIG. 3(A), as information obtained from the model voice waveform data, display bars (examples of objects) 51a1 to 51a5 each having a length corresponding to the time length of a sentence element section, line graphs 51b1 to 51b5 representing the time-series change of the pitch, and bar graphs 51c1 to 51c5 representing the time-series change of the sound pressure are displayed in the model display area 51. On the other hand, for example, when the record button 56 is designated by a user operation so that the speaker can practice reading aloud, recording of the voice uttered by the speaker and collected by the microphone M is started, and the display processing unit 32 displays information (not shown) obtained from the speaker voice waveform data in the speaker display area 52 in real time while scrolling the speaker display area 52 in accordance with the input of the speaker voice waveform data indicating the waveform of that voice. The scroll bar 55 moves to the right following the scrolling of the model display area 51 and the speaker display area 52.

  In FIGS. 3(A) and 3(B), the vertical lines RP drawn on the model display area 51 and the speaker display area 52 indicate the current playback time (playback position) in the model voice waveform data, and are fixedly displayed at the positions shown in the figures regardless of the scrolling of the model display area 51 and the speaker display area 52. The speaker display area 52 is divided into a left display area 521 and a right display area 522 on either side of the vertical line RP. The information obtained from the speaker voice waveform data and displayed in real time appears in the left display area 521 (that is, it emerges from the position of the vertical line RP). In the left display area 521, as information obtained from the speaker voice waveform data, a display bar having a length corresponding to the time length of a sentence element section, a line graph representing the time-series change of the pitch, and a bar graph representing the time-series change of the sound pressure are displayed.

  In addition, when displaying the display bars (51a1 to 51a5 in the example of FIG. 3(A)) having lengths corresponding to the time lengths of the sentence element sections specified from the model voice waveform data in the model display area 51 in time series from the head of the sentence, the display processing unit 32 associates the text representing each sentence element with its display bar in time series from the head of the sentence, and displays (that is, draws) all or part of the text representing the sentence element in a limited area corresponding to the associated display bar. Here, the limited area corresponding to a display bar means a finite area that depends on the time length of the sentence element section corresponding to that display bar; in the example of FIG. 3(A), it is the area inside the display bar. For this reason, when a long text (that is, a text representing a sentence element) is associated with a short sentence element section, the entire text may not fit in the finite area. For example, the sentence element section corresponding to the display bar 51a3 is associated with "Thank you." (the entire text representing the sentence element), but because the limited area corresponding to the display bar 51a3 is too short for "Thank you.", the text representing the sentence element is displayed cut off (that is, only part of the text representing the sentence element is displayed).

  In the example of FIG. 3(A), the limited area corresponding to a display bar is the area inside the display bar, but the limited area is not restricted to this. FIG. 4 is a diagram illustrating examples of the limited area corresponding to a display bar. The display bar shown in FIG. 4(A) corresponds to the display bar 51a3 shown in FIG. 3(A), and "Thank you", which is part of the text representing the sentence element, fits within the display bar. In contrast, in the example of FIG. 4(B), "Thank you", which is part of the text representing the sentence element, does not fit within the display bar and appears to protrude from it. The display bar shown in FIG. 4(B) has a gradation whose color becomes lighter from bottom to top until it can no longer be distinguished from the background color (for example, white). In the example of FIG. 4(B), not only the region with visible gradation but also the region that cannot be distinguished from the background color corresponds to the limited area of the display bar. Meanwhile, in the example of FIG. 4(C), "Thank you", which is part of the text representing the sentence element, is displayed in an area adjacent to the display bar; this adjacent area also corresponds to the limited area of the display bar. Further, in the example of FIG. 4(D), the display bar is darker at its left and right ends and lighter toward the center, so that the central region may not be distinguishable from the background color. In the example of FIG. 4(D), the limited area of the display bar includes this central region that cannot be distinguished from the background color. In the above examples, the shape of the display bar is rectangular, but the present invention is not limited to this; for example, an object such as a display bar may be a polygon, a circle, or a cloud shape, or may be a wavy line or simply indicated by a vertical bar.

  As described above, when the text representing a sentence element is displayed cut off in the limited area corresponding to the display bar, smooth reading aloud of the text by the speaker is hindered. Therefore, in the present embodiment, as the model voice waveform data is played back, the display processing unit 32 displays a window 51p3, in which the entire text representing the sentence element ("Thank you." in the example of FIG. 3(A)) is displayed, as a balloon (pop-up) in the right display area 522 (an example of a second display area) based on the reading start time (that is, the reading start timing) of that sentence element (that is, the sentence element section), and erases the window 51p3 displayed as a balloon based on the reading end time (that is, the reading end timing) of the sentence element. By thus popping up the window in which the entire text representing the sentence element is displayed in the right display area 522 for a certain period of time, the visibility of the text representing the sentence element can be improved. For example, the window 51p3 in which the text representing the sentence element is displayed appears as a balloon when the reading start time of the sentence element, measured from the playback start time (0:00) of the model voice waveform data, arrives, and is erased when the reading end time of the sentence element arrives. In this case, the display continuation period of the window 51p3 is "ST (Standard visible Time)" shown in FIG.

  However, considering the readability of the text by the speaker, it is desirable to give the display continuation period of the window 51p3 some margin of time. For this reason, the display processing unit 32 displays the window 51p3 a predetermined time BT (Before Time) before the reading start time of the sentence element, and erases the window 51p3 a predetermined time AT (After Time) after the reading end time of the sentence element. In this case, the display continuation period of the window 51p3 is "VT (Visible Time)" (= BT + ST + AT) shown in FIG. This allows the speaker to read the text displayed in the window 51p3 with some leeway. In the example of FIGS. 3(A) and 3(B), the window 51p3 in which "Thank you" is displayed is erased when the time "VT" has elapsed since it appeared as a balloon, and the window 51p4 in which "This train is" is displayed then appears as a balloon. The window 51p3 may also be configured to appear as a balloon a predetermined time BT before the reading start time of the sentence element and to be erased when the reading end time of the sentence element arrives, or to appear as a balloon when the reading start time of the sentence element arrives and to be erased a predetermined time AT after the reading end time of the sentence element. Note that the display processing unit 32 may display the window containing the entire text representing the sentence element as a balloon in the right display area 522, based on the reading start time of the sentence element, only when part of the text representing the sentence element is displayed in the limited area corresponding to the display bar (that is, when the text representing the sentence element is displayed cut off). As a result, the processing load for displaying windows and the memory usage can be reduced.
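  The timing relationship VT = BT + ST + AT described above can be sketched as a small helper. This is an illustrative sketch, not code from the embodiment: the function name and millisecond units are assumptions, and the 500 ms defaults merely echo the example values given later for the balloon list of FIG. 14.

```python
def window_display_period(read_start_ms, read_end_ms, bt_ms=500, at_ms=500):
    """Compute a window's display start/end times with margins.

    BT (Before Time) and AT (After Time) are margins added before the
    reading start time and after the reading end time of the sentence
    element; ST is the sentence element section itself, and the total
    display continuation period is VT = BT + ST + AT.
    """
    display_start = read_start_ms - bt_ms  # balloon appears early
    display_end = read_end_ms + at_ms      # balloon lingers after reading ends
    st = read_end_ms - read_start_ms       # ST: length of the sentence element section
    vt = display_end - display_start       # VT = BT + ST + AT
    return display_start, display_end, st, vt
```

With the 500 ms defaults, a sentence element read from 2.0 s to 3.0 s yields a window shown from 1.5 s to 3.5 s, i.e. VT of 2000 ms.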

  Incidentally, the balloon display of a window means a display (pop-up display) in which the window pops up on (in other words, is superimposed on) the right display area 522 so as to appear to blow out from the corresponding display bar. By displaying the window so that it blows out from the display bar, the speaker can clearly grasp the correspondence between the display bar and the window. Such a balloon display form may be configured in any way as long as the correspondence between the window and the display bar can be clearly grasped. In the example of FIG. 3(B), the window has a shape in which a triangle is added to the top of a rectangle, and the vertex of the triangle touches the display bar; since the vertex only needs to point at the display bar, it may be away from the display bar. The pointer to the display bar need not be a triangle, and may instead be a figure made up of a plurality of separated ellipses, or a line (for example, a solid line or a broken line). The window may also have a shape in which a triangle (or a figure made up of a plurality of separated ellipses, or a line) is added to the top of a polygon, a circle, or a cloud shape. The length (number of pixels) of the window in the time axis direction (horizontal direction) may be configured to change according to the number of characters of the text displayed in the window; in this case, the smaller the number of characters of the text, the shorter the length of the window in the time axis direction. In addition, as long as the correspondence between the display bar and the window is clear, the window may simply be popped up on the right display area 522 (that is, without the balloon shape). As another example, the entire text representing the sentence element may be displayed directly on the right display area 522 without popping up a window.

  The full text display area 53 (an example of a third display area) is an area for displaying the entire text representing the sentence (that is, the full text). As shown in FIG. 3(A), the display processing unit 32 displays the entire text representing the sentence in the full text display area 53, and when the text representing a sentence element is displayed in the right display area 522 (for example, when a window displaying the text representing the sentence element is displayed as a balloon), changes the display color of the portion of the text displayed in the full text display area 53 that represents that sentence element, based on the reading start time of the sentence element (that is, highlights it). Thereby, when reading aloud the text representing the sentence element, the speaker can grasp at a glance where that text is located in the whole sentence. In the example of FIG. 3(A), while the window 51p3 in which "Thank you" is displayed is shown, the display color of the text 53h3 "Thank you" representing the sentence element in the text displayed in the full text display area 53 is changed from black to red; the text 53h3 is further changed to bold and its background color is also changed. Similarly, in the example of FIG. 3(B), while the window 51p4 displaying "This train is" is shown, the display color of the text 53h4 "this train" representing the sentence element in the text displayed in the full text display area 53 is changed from black to red; the text 53h4 is further changed to bold and its background color is also changed. The comprehensive evaluation score display area 57 is an area for displaying a comprehensive evaluation score for each evaluation item (for example, inflection, volume, smooth tongue, and speed) over all sentence element sections, a comprehensive evaluation score for the pauses over all interval sections, and a comprehensive evaluation score (total score) over all sections.
The comprehensive evaluation score displayed in the comprehensive evaluation score display area 57 is calculated by the reading aloud evaluation unit 33 (details will be described later).

  In the example of FIG. 3(A), each window is displayed as a balloon from a predetermined time BT before the reading start time of its sentence element and is erased a predetermined time AT after the reading end time of the sentence element. Consequently, when the interval between adjacent sentence element sections is shorter than the predetermined time BT or the predetermined time AT, the windows corresponding to two adjacent sentence element sections (that is, the texts representing two adjacent sentence elements) are displayed in the right display area 522 overlapping each other, which may impair visibility. FIG. 5(A) is a diagram illustrating an example of the display screen when the display continuation periods of the windows corresponding to adjacent sentence element sections partially overlap. "VT1" shown in FIG. 5(A) indicates the display continuation period of the window corresponding to the display bar 51a4 (not yet displayed in this example), "VT2" indicates the display continuation period of the window corresponding to the display bar 51a5, and "VT3" indicates the display continuation period of the window corresponding to the display bar 51a6. As shown in FIG. 5(A), the display continuation period VT1 and the display continuation period VT2 partially overlap; during the overlapping period, the two windows (that is, the texts representing the two adjacent sentence elements) are displayed on top of each other, and visibility is impaired.

  Therefore, in the present embodiment, a plurality of balloon display areas (an example of partial display areas) are set in the right display area 522. When part of the display continuation periods during which the windows corresponding to two adjacent sentence element sections (that is, the texts representing two adjacent sentence elements) are displayed in the right display area 522 overlap, the display processing unit 32 displays the windows in different balloon display areas. This prevents the windows corresponding to two adjacent sentence element sections from being displayed on top of each other, and improves the visibility of the texts representing the two adjacent sentence elements. FIG. 5(B) is a diagram illustrating an example of the display screen when two balloon display areas are set in the right display area 522. In the example of FIG. 5(B), a first balloon area 522a and a second balloon area 522b are set, but three or more balloon areas may be set. Also, in the example of FIG. 5(B), the first balloon area 522a and the second balloon area 522b are arranged one above the other, but they may instead be arranged side by side.
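  The area-assignment rule just described — place a window in a different balloon area whenever its display period overlaps the previous window's — can be sketched as follows. The function name and data layout are illustrative assumptions; the patent describes the behavior but not a specific algorithm.

```python
def assign_balloon_areas(periods):
    """Assign each window to balloon area 1 or 2 so that windows whose
    display continuation periods overlap never share an area.

    `periods` is a list of (display_start, display_end) tuples sorted by
    display_start; two balloon areas are assumed, as in FIG. 5(B).
    """
    areas = []
    for i, (start, _end) in enumerate(periods):
        if i > 0 and periods[i - 1][1] > start:       # overlaps the previous window
            areas.append(2 if areas[-1] == 1 else 1)  # alternate to the other area
        else:
            areas.append(1)                           # default: first (upper) area
    return areas
```

For the situation of FIG. 5(A), where VT1 and VT2 overlap and VT2 and VT3 overlap, consecutive windows alternate between areas 1 and 2, matching the "display area" column that appears later in the balloon list of FIG. 14.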

  FIGS. 6(A) to 6(C) are diagrams (transition diagrams) illustrating an example of the display screen when the windows corresponding to two adjacent sentence element sections are displayed in different balloon display areas. First, as shown in FIG. 6(A), the window 51p4 in which "This train is" is displayed appears as a balloon in the first balloon area 522a at a time a predetermined time BT before the reading start time of its sentence element. Subsequently, as shown in FIG. 6(B), the window 51p5 in which "Tozai Line Express" is displayed appears as a balloon in the second balloon area 522b at a time a predetermined time BT before the reading start time of its sentence element. Thus, the windows 51p4 and 51p5 corresponding to two adjacent sentence element sections are displayed as balloons in different balloon areas. Moreover, in the example of FIG. 6(B), because the background colors of the two windows 51p4 and 51p5 differ from each other, the visibility of the texts representing the two adjacent sentence elements can be improved considerably. Subsequently, as shown in FIG. 6(C), the window 51p4 in which "This train is" is displayed is erased from the first balloon area 522a at a time a predetermined time AT after the reading end time of its sentence element. Thereafter, the windows corresponding to adjacent sentence element sections are displayed alternately in the different balloon areas. In the example of FIG. 6(C), when the window 51p4 is erased from the first balloon area 522a, the window 51p5 may be configured to be erased from the second balloon area 522b and displayed as a balloon in the first balloon area 522a instead. That is, of the texts representing two adjacent sentence elements, the window displaying the text to be read aloud first may always be displayed in the upper first balloon area 522a, and the window displaying the text to be read aloud later may always be displayed in the lower second balloon area 522b.

  As described above, FIGS. 3 and 6 show examples in which the entire text representing a sentence element is displayed in one line in the window, but when the total number of characters of the text representing the sentence element is greater than a threshold value, the display processing unit 32 may display the entire text in the window in a plurality of lines. Here, the threshold value is set, for example, to the maximum number of characters that can be displayed in one line of the window. FIG. 7(A) is a diagram illustrating an example of the display screen when the entire text representing a sentence element is displayed in the window in a plurality of lines. In the example of FIG. 7(A), the text in the window 51p11 displayed as a balloon in the first balloon area 522a is displayed in two lines. As a result, even when the number of characters of the text associated with the sentence element section is large, the entire text can be displayed in the window. When the entire text is displayed in the window in a plurality of lines, the text may be divided at a position such as a punctuation mark, with the text after the division position displayed on the next line.
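  The multi-line display with a preferred break at punctuation can be sketched as below. This is a simplified illustration under assumed conventions: it considers only "," and "." as punctuation and falls back to a hard break at the line limit, whereas the embodiment only says the text "may be divided at a position such as a punctuation mark".

```python
def wrap_window_text(text, max_chars_per_line):
    """Split a sentence element's text into lines for the window,
    preferring to break just after a punctuation mark when one falls
    within the per-line character limit."""
    lines = []
    while len(text) > max_chars_per_line:
        chunk = text[:max_chars_per_line]
        # break after the last punctuation mark in the chunk, if any
        cut = max(chunk.rfind(c) for c in ",.")
        cut = cut + 1 if cut > 0 else max_chars_per_line
        lines.append(text[:cut])
        text = text[cut:].lstrip()
    lines.append(text)
    return lines
```

For example, a 22-character text with a comma at position 9 and a 15-character line limit is split after the comma rather than mid-phrase.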

  As another example, when the total number of characters of the text representing the sentence element is greater than a threshold value, the display processing unit 32 may display the text in the window part by part, switching parts at predetermined time intervals. Here, the threshold value is set, for example, to the maximum number of characters that can be displayed across the plurality of lines of the window. FIGS. 7(B) and 7(C) are diagrams illustrating an example of the display screen when parts of the text representing a sentence element are switched at predetermined time intervals and displayed in the window. As shown in FIG. 7(B), part of the text representing the sentence element is displayed in two lines in the window 51p11 displayed as a balloon in the first balloon area 522a; then, as shown in FIG. 7(C), the text representing the sentence element is scrolled upward by one line, so that the remaining text that had not yet been displayed appears. As a result, even when the number of characters of the text associated with the sentence element section is large, the entire text can be effectively displayed in the window by switching.
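  The scroll-by-one-line switching of FIGS. 7(B) and 7(C) amounts to selecting which lines are visible at a given playback time. The sketch below assumes, for simplicity, a single switching step at one switching time; the function name and parameters are illustrative, not from the embodiment.

```python
def visible_lines(lines, max_display_lines, current_time, switch_time):
    """Return the lines currently shown in the window.

    Before `switch_time` the first `max_display_lines` lines are shown;
    at and after it the text is scrolled up by one line, exposing the
    remaining text, as in FIGS. 7(B)-(C).
    """
    offset = 1 if current_time >= switch_time else 0
    return lines[offset:offset + max_display_lines]
```

With three lines and a two-line window, lines 1-2 are shown before the switching time and lines 2-3 after it.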

  As described above, the display processing unit 32 scrolls the model display area 51 and the speaker display area 52, in which the display bars are displayed, in accordance with the playback of the model voice waveform data; even during this scrolling, the window in which the entire text representing the sentence element is displayed remains fixed on the right display area 522 (for example, in a layer above the right display area 522). FIGS. 8(A) and 8(B) are diagrams (transition diagrams) illustrating an example of the display screen in which a window remains fixedly displayed while the display areas are scrolled. As shown in FIG. 8(A), the window 51p21 is displayed as a balloon from the display bar 51a21. Thereafter, as shown in FIG. 8(B), even when the display bar 51a21 corresponding to the window 51p21 moves leftward by scrolling, the window 51p21 remains fixedly displayed (that is, the window 51p21 does not move to an area outside the first balloon area 522a or the second balloon area 522b, for example, to the position of popNG). Thereby, the visibility of the text representing the sentence element can be improved.

  Next, the reading aloud evaluation unit 33 associates the sentence element sections specified from the model voice waveform data with the sentence element sections specified from the speaker voice waveform data, and evaluates the speaker's reading aloud for each sentence element section. At this time, the reading aloud evaluation unit 33 may evaluate the speaker's reading aloud for each section and for each of a plurality of evaluation items. Examples of the evaluation items include inflection, volume, smooth tongue, and speed. For example, the reading aloud evaluation unit 33 calculates, for each sentence element section, the difference between the pitch specified from the model voice waveform data and the pitch specified from the speaker voice waveform data, and calculates an evaluation score for the speaker's inflection for each sentence element section based on the calculated difference. This evaluation score is calculated so that, for example, the full score is 30 points and the score becomes higher (closer to the full score) as the difference approaches 0. The reading aloud evaluation unit 33 also calculates, for each sentence element section, the difference between the sound pressure specified from the model voice waveform data and the sound pressure specified from the speaker voice waveform data, and calculates an evaluation score for the speaker's volume for each sentence element section based on the calculated difference; this evaluation score is likewise calculated so that, for example, the full score is 30 points and the score becomes higher as the difference approaches 0. Further, the reading aloud evaluation unit 33 calculates, for each sentence element section, the similarity between a feature quantity indicating the vocal tract characteristics of the model and a feature quantity indicating the vocal tract characteristics of the speaker, and calculates an evaluation score for the speaker's smooth tongue for each sentence element section based on the calculated similarity; this evaluation score is calculated so that, for example, the full score is 30 points and the higher the similarity, the higher the score. Moreover, the reading aloud evaluation unit 33 calculates, for each sentence element section, the time difference between the time length of the sentence element section specified from the model voice waveform data and the time length of the sentence element section specified from the speaker voice waveform data, and calculates an evaluation score for the speaker's speed (reading speed) for each sentence element section based on the absolute value of the calculated time difference; this evaluation score is calculated so that, for example, the full score is 30 points and the score becomes higher as the absolute value of the time difference approaches 0.
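  The scoring rule shared by these items — a 30-point full score that decays as the model/speaker difference grows — can be illustrated as follows. The patent gives only the qualitative rule, so the linear mapping and the `tolerance` parameter (the difference at which the score reaches 0) are assumptions for illustration.

```python
def difference_score(model_value, speaker_value, full_marks=30, tolerance=1.0):
    """Map the difference between a model value and a speaker value
    (pitch, sound pressure, or section time length) to an evaluation
    score: the closer the difference is to 0, the closer the score is
    to full marks (30 points in the text's example)."""
    diff = abs(model_value - speaker_value)
    score = full_marks * max(0.0, 1.0 - diff / tolerance)
    return round(score, 1)
```

A perfect match scores 30.0; a difference at or beyond the tolerance scores 0.0.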

  Further, the reading aloud evaluation unit 33 may associate the interval sections specified from the model voice waveform data with the interval sections specified from the speaker voice waveform data, and evaluate the speaker's pauses for each interval section. For example, the reading aloud evaluation unit 33 calculates, for each interval section, the time difference between the time length of the interval section specified from the model voice waveform data and the time length of the interval section specified from the speaker voice waveform data, and calculates an evaluation score for the speaker's pauses for each interval section based on the absolute value of the time difference; this evaluation score is calculated so that, for example, the full score is 30 points and the score becomes higher as the absolute value of the time difference approaches 0. Moreover, the reading aloud evaluation unit 33 may calculate the average value (or total value) of the evaluation scores of each evaluation item over the sentence element sections as the comprehensive evaluation score for that evaluation item over all sentence element sections, and the average value (or total value) of the evaluation scores over the interval sections as the comprehensive evaluation score for the pauses over all interval sections. Furthermore, the reading aloud evaluation unit 33 calculates the total value (or average value) of the comprehensive evaluation scores for the evaluation items over all sentence element sections and the comprehensive evaluation score for the pauses over all interval sections as the comprehensive evaluation score over all sections (all sentence element sections and all interval sections).
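  The aggregation just described — per-item averages over the sentence element sections, an average over the interval sections, and a total over all sections — can be sketched as a small function. The dictionary layout is an illustrative assumption; the patent only specifies the averaging/summing behavior.

```python
def comprehensive_scores(per_section_scores, pause_scores):
    """Aggregate per-section evaluation scores into comprehensive scores.

    `per_section_scores` maps an evaluation item (inflection, volume,
    smooth tongue, speed) to its list of per-sentence-element-section
    scores; `pause_scores` lists the per-interval-section pause scores.
    Averages are used per item, and the overall score is the sum of the
    item averages plus the pause average.
    """
    item_totals = {item: sum(s) / len(s) for item, s in per_section_scores.items()}
    pause_total = sum(pause_scores) / len(pause_scores)
    overall = sum(item_totals.values()) + pause_total
    return item_totals, pause_total, overall
```

These three return values correspond to what the comprehensive evaluation score display area 57 shows: per-item comprehensive scores, the pause comprehensive score, and the total score.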

[2. Operation of reading aloud practice device S]
Next, the operation of the reading aloud practice device S will be described with reference to FIGS. 9 to 15. FIGS. 9 to 11 are diagrams showing examples of display screen transitions when the speaker practices reading aloud. FIG. 12 is a flowchart showing an example of the display process performed by the control unit 3 during the reading aloud practice process, and FIG. 13 is a flowchart showing an example of the window [i] display process performed by the control unit 3. FIG. 14 is a diagram showing an example of the balloon list created in the display process shown in FIG. 12. FIG. 15 is a conceptual diagram showing the display states of the windows displayed as balloons in the display process shown in FIG. 12.

  First, when the reading aloud practice processing program is started by a user operation and the reading aloud practice process begins, the display screen shown in FIG. 9(A) is displayed on the display D. When, on the display screen shown in FIG. 9(A), the speaker designates the practice button 61 by a user operation (for example, clicks it with the mouse), the display screen shown in FIG. 9(B) is displayed on the display D.

  Next, when, on the display screen shown in FIG. 9(B), the speaker inputs a user ID and password by a user operation and designates the login button 62, the login process is started. During this login process, the display screen shown in FIG. is displayed. When the login process succeeds, the display screen shown in FIG. 10(A) is displayed on the display D.

  Next, when, on the display screen shown in FIG. 10(A), the speaker selects the file name of a sentence to be read aloud from the list display 63 by a user operation and designates the practice button 64, the model voice waveform data and text data corresponding to the selected file name are read from the storage unit 2 into the control unit 3, and the display screen shown in FIG. 10(B) is displayed on the display D.

  Next, when, on the display screen shown in FIG. 10(B), the speaker designates the playback button 54 and the recording button 56 (or only the recording button 56) by a user operation, playback of the model voice waveform data, input of the speaker voice waveform data (that is, recording of the voice uttered by the speaker), and scrolling of the model display area 51 and the speaker display area 52 are started, and the display process shown in FIG. 12 begins. The arrow button 58 displayed on the display screen shown in FIG. 10(B) is a button for returning the scroll of the model display area 51 and the speaker display area 52 to the top. The time T displayed on the display screen shown in FIG. 10(B) indicates how many seconds the current scroll position (for example, the current playback time RP) is from the beginning. When the display process shown in FIG. 12 is started, the display of the recording button 56 is switched to the display of the end button 59 as shown in FIG. Details of the display process will be described later. Then, when the speaker designates the end button 59 by a user operation, or when playback of the model voice waveform data ends (in other words, recording of the voice uttered by the speaker ends), the display screen shown in FIG. 11(A) is displayed on the display D.

  Next, when, on the display screen shown in FIG. 11(A), the speaker designates the "Yes" button 65 by a user operation, the evaluation (scoring) process for the speaker's reading aloud by the reading aloud evaluation unit 33 is started. During this evaluation process, the display screen shown in FIG. is displayed. On the other hand, when, on the display screen shown in FIG. 11(A), the speaker designates the "No" button 66 by a user operation, the state returns to that before practice (that is, before the recording button 56 was designated). When the evaluation process is completed, the display screen shown in FIG. 11(C) is displayed on the display D.

  In the speaker display area 52 of the display screen shown in FIG. 11(C), the information obtained from the speaker voice waveform data is displayed. When the speaker performs a user operation, for example, superimposing the mouse pointer on (that is, mousing over) the display bar 52a1 whose length corresponds to the time length of a certain sentence element section, a window 67 displaying the evaluation result (the evaluation score for each evaluation item) for the speaker's reading aloud in that sentence element section pops up. In this window 67, the full score and an icon whose display form corresponds to the score rate relative to the full score are displayed to the right of the evaluation score of each evaluation item. For example, an icon representing sunny weather indicates a score rate of 90% or more, an icon representing cloudy weather indicates 50 to 79%, and an icon representing rain indicates less than 50%.

  In addition, the comprehensive evaluation score display area 57 of the display screen shown in FIG. 11(C) displays the comprehensive evaluation score for each evaluation item (for example, inflection, volume, smooth tongue, and speed) over all sentence element sections, the comprehensive evaluation score for the pauses over all interval sections, and the comprehensive evaluation score (total score) over all sections. When the speaker performs a user operation, for example, superimposing the mouse pointer on (that is, mousing over) the comprehensive evaluation score display area 57, a comprehensive evaluation score display area 68 is displayed in place of the comprehensive evaluation score display area 57. In the comprehensive evaluation score display area 68, the full score is displayed to the right of the comprehensive evaluation score of each evaluation item over all sentence element sections.

  Details of the display process shown in FIG. 12 will now be described. When the display process shown in FIG. 12 is started, the main module of the display processing unit 32 in the control unit 3 creates a balloon list based on the sentence element section list associated with the read model voice waveform data (step S1). The balloon list shown in FIG. 14 is a list created based on the sentence element section list shown in FIG. 2. The record of each window [i] shown in FIG. 14 corresponds one-to-one with the record of each sentence element section [i] shown in FIG. 2. As shown in FIG. 14, the text, display start time, display end time, display state, display area, number of display lines, and switching time are registered in the record of each window [i]. Each text (the text representing a sentence element) shown in FIG. 14 is displayed in the corresponding window [i]. The display start time of each window [i] shown in FIG. 14 is set to a time a predetermined time BT before the corresponding reading start time (the reading start time with the same [i] in FIG. 2). The display end time of each window [i] shown in FIG. 14 is set to a time a predetermined time AT after the corresponding reading end time (the reading end time with the same [i] in FIG. 2). In the example of FIG. 14, the predetermined time BT and the predetermined time AT are each set to 500 ms, but these times can be set arbitrarily. The display state of each window [i] shown in FIG. 14 is a variable status; as shown in FIG. 15, this status changes in the order ready (preparing to display) → visible (displaying) → complete (display completed). "Current Time" shown in FIG. 15 indicates the current playback time (playback position).

  The display area of each window [i] shown in FIG. 14 indicates the area in which the window [i] is displayed: a window [i] whose display area is "1" is displayed in the first balloon area 522a, while a window [i] whose display area is "2" is displayed in the second balloon area 522b. The number of display lines of each window [i] shown in FIG. 14 indicates the number of lines in which the text representing the sentence element is displayed in the window [i]. For example, if the maximum number of characters that can be displayed in one line of the window [i] is preset, the number of display lines of the window [i] is determined by dividing the number of characters of the text by this preset maximum number of characters. The switching time of the window [i] shown in FIG. 14 indicates, when the text representing the sentence element does not fit in the maximum number of display lines (for example, two lines) preset for the window [i], the time from the display start time of the window [i] until scrolling (switching) by one line starts. For example, if the maximum number of display lines preset for the window [i] is two and three lines are required to display the entire text in the window [i], the switching time of the window [i] is calculated by adding 1.5 × (display continuation time per line) to the display start time of the window [i]. Here, the display continuation time per line is calculated by dividing the display continuation period from the display start time to the display end time of the window [i] by the number of display lines of the window [i]. In the above case, the switching time of the window [i] may instead be calculated as the time obtained by adding 2 × (display continuation time per line) to the display start time of the window [i].
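  The number-of-display-lines and switching-time computations above can be sketched as follows, using the 1.5 × (display continuation time per line) variant from the text. The function name and the rounding-up of the line count are illustrative assumptions.

```python
import math

def window_layout(text, display_start, display_end, max_chars_per_line,
                  max_display_lines=2):
    """Compute the number of display lines and the switching time for a
    window record in the balloon list (FIG. 14).

    Lines = ceil(characters / max chars per line). When the text needs
    more lines than the window can show at once, the switching time is
    the display start time plus 1.5 x (display continuation time per
    line), where the per-line time is the display continuation period
    divided by the number of lines.
    """
    lines = math.ceil(len(text) / max_chars_per_line)
    switch_time = None  # no switching needed when the text fits
    if lines > max_display_lines:
        per_line = (display_end - display_start) / lines
        switch_time = display_start + 1.5 * per_line
    return lines, switch_time
```

For a 10-character text, a 4-character line limit, and a 3000 ms display continuation period starting at 0, this yields 3 lines and a switching time of 1500 ms.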

  Next, the main module of the display processing unit 32 substitutes 0 for a variable i indicating a serial number (step S2). The main module then determines whether the display of all the windows registered in the balloon list created in step S1 has been completed (step S3). When the main module of the display processing unit 32 determines that the display of all the windows registered in the balloon list has been completed (step S3: YES), the display process ends. On the other hand, when the main module determines that the display of all the windows has not been completed (step S3: NO), it searches the balloon list for windows whose display state is not complete (display finished) and whose display start time is less than or equal to Current Time (that is, whose display start time has arrived) (step S4). The main module then determines whether any window satisfying the condition of step S4 has been extracted (step S5). If the main module determines that no window satisfying the condition of step S4 has been extracted (step S5: NO), the process returns to step S3. On the other hand, if the main module determines that a window satisfying the condition of step S4 has been extracted (step S5: YES), the process proceeds to step S6.

  In step S6, the main module of the display processing unit 32 determines whether the display state of window [i] is ready (preparing to display). When the main module determines that the display state of window [i] is not ready (step S6: NO), it increments the variable i by 1 (step S8) and proceeds to step S9. In step S9, the main module determines whether all the windows extracted in step S5 have been processed. When the main module determines that not all the windows extracted in step S5 have been processed (step S9: NO), the process returns to step S6. On the other hand, when the main module determines that all the windows extracted in step S5 have been processed (step S9: YES), the process returns to step S3.
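Steps S2 to S9 amount to a polling loop over the balloon list. The sketch below models one pass of that loop using plain dictionaries; the key names and the `start_submodule` callback are assumptions for illustration.

```python
def main_loop_pass(balloons, current_time, start_submodule):
    """One pass of steps S3-S9 of the main module.
    Returns True when all windows have been displayed (step S3: YES)."""
    if all(b["state"] == "complete" for b in balloons):        # S3
        return True
    # S4: windows whose display state is not complete and whose
    # display start time is less than or equal to Current Time
    due = [b for b in balloons
           if b["state"] != "complete" and b["display_start"] <= current_time]
    for b in due:                                              # S6-S9
        if b["state"] == "ready":                              # S6: YES
            start_submodule(b)  # S7; the submodule itself sets the state to visible
    return False
```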

  On the other hand, when the main module of the display processing unit 32 determines that the display state of window [i] is ready (preparing to display) (step S6: YES), it generates and executes a submodule of the display processing unit 32 in response to a display command for window [i] (step S7). That is, in this example, the submodule of the display processing unit 32 is executed on a new thread independent of the thread on which the main module is executed. As shown in FIG. 13, the submodule of the display processing unit 32 starts displaying window [i]. Note that the submodule performs its processing asynchronously with the main module of the display processing unit 32; such processing is realized, for example, by the multitasking function of the OS. Further, as described above, when the display continuation periods of a plurality of windows [i] partially overlap, a plurality of submodules may operate asynchronously during the same time period.
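The generate-and-execute behavior of step S7 corresponds to spawning one thread per window. A minimal sketch, assuming Python's standard `threading` module stands in for the OS multitasking function mentioned above:

```python
import threading

def start_submodule(balloon, run_display):
    """Step S7: run the display submodule for one window on its own thread,
    so the main module keeps polling the balloon list while windows are shown.
    When display periods overlap, several such threads run concurrently."""
    t = threading.Thread(target=run_display, args=(balloon,), daemon=True)
    t.start()
    return t
```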

  When the display of window [i] shown in FIG. 13 is started, the submodule of the display processing unit 32 displays window [i] as a balloon in the first balloon area 522a or the second balloon area 522b according to the balloon list. Next, the submodule sets the display state of window [i] to visible (being displayed) (step S11), thereby updating the display state of window [i] in the balloon list. Next, the submodule substitutes 0 for a variable j (step S12). The submodule then refers to the balloon list and determines whether display switching is required for window [i] (step S13). For example, if a switching time for window [i] is registered in the balloon list, the submodule determines that display switching is required for window [i] (step S13: YES) and proceeds to step S14. On the other hand, when the submodule determines that no display switching is required for window [i] (step S13: NO), the process proceeds to step S17.

  In step S14, the submodule of the display processing unit 32 refers to the balloon list and determines whether the j-th switching time, measured from the display start time of window [i], has arrived. If the submodule determines that the j-th switching time has arrived (step S14: YES), the process proceeds to step S15. On the other hand, when the submodule determines that the j-th switching time has not arrived (step S14: NO), the process returns to step S13. In step S15, the submodule performs a one-line display switch in window [i] (that is, scrolls the text upward by one line). Next, the submodule increments the variable j by 1 (step S16) and proceeds to step S17.

  In step S17, the submodule of the display processing unit 32 refers to the balloon list and determines whether the display end time of window [i] is less than or equal to Current Time (the current playback time), that is, whether the display end time has arrived. If the submodule determines that the display end time of window [i] is less than or equal to Current Time (step S17: YES), the process proceeds to step S18. On the other hand, when the submodule determines that the display end time of window [i] has not yet arrived (step S17: NO), the process returns to step S13. In step S18, the submodule sets the display state of window [i] to complete (display finished), thereby updating the display state of window [i] in the balloon list, erases window [i] from the display, and ends the process.
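Putting steps S10 to S18 together, one submodule's lifetime can be sketched as the loop below. The `clock`, `scroll_one_line`, and `erase` callbacks and the dictionary keys are illustrative assumptions; a real implementation would sleep between polls rather than busy-wait.

```python
def run_balloon(balloon, clock, scroll_one_line, erase):
    """Submodule lifecycle for one window [i] (steps S10-S18).
    clock() returns Current Time in milliseconds."""
    balloon["state"] = "visible"                          # S10-S11: show the balloon
    j = 0                                                 # S12
    switches = balloon.get("switch_times", [])            # empty -> no switching (S13: NO)
    while clock() < balloon["display_end"]:               # S17: display end time arrived?
        if j < len(switches) and clock() >= switches[j]:  # S13-S14: j-th switching time
            scroll_one_line(balloon)                      # S15: scroll up by one line
            j += 1                                        # S16
    balloon["state"] = "complete"                         # S18: update state and erase
    erase(balloon)
```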

  As described above, according to the embodiment, the reading aloud practice device S displays, as a balloon, a window containing all of the text representing the sentence element associated with the display bar, based on the reading start time of that sentence element, and erases the balloon-displayed window based on the reading end time of that sentence element. Therefore, even though the text representing the sentence element is displayed in a limited area corresponding to the display bar, the device can effectively support smooth reading aloud of the text by the speaker.

DESCRIPTION OF SYMBOLS: 1 Communication unit, 2 Storage unit, 3 Control unit, 4 Operation unit, 5 Interface unit, 6 Bus, 31 Voice processing unit, 32 Display processing unit, 33 Reading aloud evaluation unit, S Reading aloud practice device

Claims (12)

  1. Reproduction control means for reproducing speech waveform data indicating the waveform of a speech reading a sentence aloud;
    First display control means for displaying, in a first display area in time series from the beginning of the sentence, objects each having a length corresponding to the time length of a sentence element section from the reading start timing to the reading end timing of a corresponding one of a plurality of sentence elements constituting the sentence, the sentence element sections being obtained by dividing the speech waveform data into a plurality of sections;
    Second display control means for associating, in time series from the beginning of the sentence, texts obtained by dividing the text data of the sentence into a plurality of parts with the objects, each text representing a sentence element, and for displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object;
    Third display control means for, in accordance with the reproduction of the speech waveform data, displaying all of the text representing a sentence element in a second display area based on the reading start timing of the sentence element, and for erasing all of the text displayed in the second display area based on the reading end timing;
    A reading aloud practice device characterized by comprising:
  2.   The reading aloud practice device according to claim 1, wherein the third display control means, in accordance with the reproduction of the speech waveform data, pops up in the second display area a window in which all of the text representing the sentence element is displayed, based on the reading start timing of the sentence element, and erases the popped-up window based on the reading end timing.
  3. The reading aloud practice device according to claim 2, wherein the first display control means scrolls the first display area in which the objects are displayed in accordance with the reproduction of the speech waveform data, and
    the third display control means displays the window, in which all of the text representing the sentence element is displayed, at a fixed position.
  4.   The reading aloud practice device according to claim 2 or 3, wherein the third display control means displays all of the text in the window over a plurality of lines when the total number of characters of the text representing the sentence element is larger than a threshold.
  5.   The reading aloud practice device according to any one of claims 2 to 4, wherein the third display control means, when the total number of characters of the text representing the sentence element is larger than a threshold, switches the text part by part at predetermined time intervals and displays it in the window.
  6.   The reading aloud practice device according to any one of claims 1 to 5, wherein the third display control means displays all of the text representing the sentence element in the second display area based on the reading start timing of the sentence element only when a part of the text representing the sentence element is displayed in the limited area corresponding to the object.
  7.   The reading aloud practice device according to claim 1, wherein the third display control means displays all of the text representing the sentence element from a predetermined time before the reading start timing of the sentence element.
  8.   The reading aloud practice device according to any one of claims 1 to 7, wherein the third display control means erases all of the text displayed in the second display area a predetermined time after the reading end timing.
  9. The reading aloud practice device according to any one of claims 1 to 8, wherein a plurality of partial display areas are set in the second display area, and
    when the periods in which the texts representing two adjacent sentence elements are displayed in the second display area partially overlap, the third display control means displays those texts in mutually different partial display areas.
  10.   The reading aloud practice device according to claim 1, further comprising fourth display control means for displaying all of the text representing the sentence in a third display area and for changing, based on the reading start timing of each sentence element, the display color of the text representing that sentence element within the text displayed in the third display area.
  11. A display control method executed by one or more computers,
    A reproduction control step of reproducing speech waveform data indicating the waveform of a speech reading a sentence aloud;
    A first display control step of displaying, in a first display area in time series from the beginning of the sentence, objects each having a length corresponding to the time length of a sentence element section from the reading start timing to the reading end timing of a corresponding one of a plurality of sentence elements constituting the sentence, the sentence element sections being obtained by dividing the speech waveform data into a plurality of sections;
    A second display control step of associating, in time series from the beginning of the sentence, texts obtained by dividing the text data of the sentence into a plurality of parts with the objects, each text representing a sentence element, and displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object;
    A third display control step of, in accordance with the reproduction of the speech waveform data, displaying all of the text representing a sentence element in a second display area based on the reading start timing of the sentence element, and erasing all of the text displayed in the second display area based on the reading end timing;
    A display control method comprising the above steps.
  12. A reproduction control step of reproducing speech waveform data indicating the waveform of a speech reading a sentence aloud;
    A first display control step of displaying, in a first display area in time series from the beginning of the sentence, objects each having a length corresponding to the time length of a sentence element section from the reading start timing to the reading end timing of a corresponding one of a plurality of sentence elements constituting the sentence, the sentence element sections being obtained by dividing the speech waveform data into a plurality of sections;
    A second display control step of associating, in time series from the beginning of the sentence, texts obtained by dividing the text data of the sentence into a plurality of parts with the objects, each text representing a sentence element, and displaying all or part of the text representing each sentence element in a limited area corresponding to the associated object;
    A third display control step of, in accordance with the reproduction of the speech waveform data, displaying all of the text representing a sentence element in a second display area based on the reading start timing of the sentence element, and erasing all of the text displayed in the second display area based on the reading end timing;
    A program causing a computer to execute the above steps.
JP2016041017A 2016-03-03 2016-03-03 Reading aloud practice device, display control method, and program Active JP6418179B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016041017A JP6418179B2 (en) 2016-03-03 2016-03-03 Reading aloud practice device, display control method, and program


Publications (2)

Publication Number Publication Date
JP2017156615A JP2017156615A (en) 2017-09-07
JP6418179B2 true JP6418179B2 (en) 2018-11-07

Family

ID=59809681

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2016041017A Active JP6418179B2 (en) 2016-03-03 2016-03-03 Reading aloud practice device, display control method, and program

Country Status (1)

Country Link
JP (1) JP6418179B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111645A (en) * 2019-06-04 2019-08-09 西南石油大学 A kind of intelligence religion rod based on voice control

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000206859A (en) * 1999-01-13 2000-07-28 Kobe Steel Ltd Information output device
JP4075349B2 (en) * 2001-10-18 2008-04-16 カシオ計算機株式会社 Electronic book apparatus and electronic book data display control method
US20060057545A1 (en) * 2004-09-14 2006-03-16 Sensory, Incorporated Pronunciation training method and apparatus
JP5007401B2 (en) * 2005-01-20 2012-08-22 株式会社国際電気通信基礎技術研究所 Pronunciation rating device and program
JP2007139868A (en) * 2005-11-15 2007-06-07 Yamaha Corp Language learning device
JP5340797B2 (en) * 2009-05-01 2013-11-13 任天堂株式会社 Learning support program and learning support apparatus
JP6283936B2 (en) * 2014-02-26 2018-02-28 ブラザー工業株式会社 Karaoke device and karaoke program

Also Published As

Publication number Publication date
JP2017156615A (en) 2017-09-07


Legal Events

Date Code Title Description
A621 Written request for application examination; JAPANESE INTERMEDIATE CODE: A621; effective date: 20170901
A977 Report on retrieval; JAPANESE INTERMEDIATE CODE: A971007; effective date: 20180828
TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model); JAPANESE INTERMEDIATE CODE: A01; effective date: 20180911
A61 First payment of annual fees (during grant procedure); JAPANESE INTERMEDIATE CODE: A61; effective date: 20180924
R150 Certificate of patent or registration of utility model; ref document number: 6418179; country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150