JP2014059330A

JP2014059330A - Tone display control device and program

Info

Publication number: JP2014059330A
Application number: JP2012202584A
Authority: JP
Inventors: Kiyotaka Kaji; 清貴梶
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2012-09-14
Filing date: 2012-09-14
Publication date: 2014-04-03

Abstract

PROBLEM TO BE SOLVED: To provide a tone display control device with which, when a user inputs speech reading a text aloud during language learning by oral reading, it can easily be confirmed visually whether the tone of the speech changes appropriately for the text.

SOLUTION: When a model voice reading out a sentence is input, either read directly by a teacher or the like, or selected from a database and output from a voice output part, the model voice data is converted into text by voice recognition, the tone at the end of each word is determined, and a model text Tm, in which the letters at the end of each recognized word are distinguished in a form according to that tone, is generated and displayed in a model display area 17M. When a voice read out by a user (learner) following the model voice and the text Tm is input, a user text Tu, generated by the same voice recognition and tone determination processing so that the end of each word is likewise distinguished, is displayed for comparison in a user display area 17U adjoining the model display area 17M.

Description

The present invention relates to a tone display control device for learning the tone of, for example, a language that is not the learner's native language, and to a control program therefor.

In learning example sentences of a foreign language or the like, a widely used method is for the learner to listen to model voice data, spoken by a teacher or recorded in advance, and to read the sentence aloud with reference to that model voice.

A language learning device has also been proposed that displays a model speech waveform and the waveform of the learner's reading side by side at positions corresponding to the example sentence, so that the learner can check the differences while studying (see, for example, Patent Document 1).

Patent Document 1: JP 2000-250401 A

With the conventional language learning device, the learner can check the difference between the model speech waveform and the learner's own speech waveform, and can practice so that the learner's waveform approaches the model waveform. However, although loudness is easy to see in a speech waveform, it is hard to tell from the waveform whether the tone of the learner's reading is correct or incorrect. For example, some learners intend to raise the pitch at the end of a word but in fact do not raise it and merely pronounce the ending more strongly.

The present invention has been made in view of this problem. Its object is to provide a tone display control device and a control program therefor that make it possible, when speech is input for a text, to easily confirm visually whether the tone of the speech changes appropriately for that text.

The tone display control device according to the present invention comprises: a display unit; voice input means for inputting speech reading a sentence aloud; voice recognition means for recognizing the speech input by the voice input means and converting it into text; tone determination means for determining the tone of the speech input by the voice input means for each word converted into text by the voice recognition means; tone reduction text generation means for generating, based on the text data produced by the voice recognition means and the per-word tone determined by the tone determination means, tone reduction text data in which each word is distinguished in a form corresponding to its tone; and tone reduction text display control means for displaying the tone reduction text data generated by the tone reduction text generation means on the display unit.

The program according to the present invention is a program for controlling a computer of an electronic device having a display unit. It causes the computer to function as: voice input means for inputting speech reading a sentence aloud; voice recognition means for recognizing the speech input by the voice input means and converting it into text; tone determination means for determining the tone of the speech input by the voice input means for each word converted into text by the voice recognition means; tone reduction text generation means for generating, based on the text data produced by the voice recognition means and the per-word tone determined by the tone determination means, tone reduction text data in which each word is distinguished in a form corresponding to its tone; and tone reduction text display control means for displaying the tone reduction text data generated by the tone reduction text generation means on the display unit.

According to the present invention, when speech is input for a text, it becomes possible to easily confirm visually whether the tone of the speech changes appropriately for that text.

FIG. 1: Block diagram showing the configuration of the electronic circuit of the touch panel PDA 10 according to an embodiment of the tone display control device of the present invention.
FIG. 2: Front view showing the external configuration of the touch panel PDA 10 displaying the language learning support screen G.
FIG. 3: Diagram explaining the basic operation of the language learning support function of the touch panel PDA 10.
FIG. 4: Flowchart showing the language learning support processing of the touch panel PDA 10.
FIG. 5: Flowchart showing the speech recognition and tone detection processing that accompanies the language learning support processing of the touch panel PDA 10.
FIG. 6: Diagram showing the display operation (part 1) in the learning text display window 17T during the language learning support processing of the touch panel PDA 10.
FIG. 7: Diagram showing the display operation (part 2) in the learning text display window 17T during the language learning support processing of the touch panel PDA 10.
FIG. 8: Diagram showing the display operation in the learning text display window 17T during read-aloud learning of Chinese by the language learning support processing of the touch panel PDA 10.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the electronic circuit of a touch panel PDA 10 according to an embodiment of the tone display control device of the present invention.

This tone display control device is configured as a touch panel PDA (personal digital assistant) 10 equipped with the read-aloud learning support function described below, or as a PC (personal computer), a mobile phone, a portable game machine, or the like.

The PDA 10 is constituted by a computer that reads a program recorded on any of various recording media, or a transmitted program, and whose operation is controlled by the program thus read. Its electronic circuit includes a CPU (central processing unit) 11.

The CPU 11 controls the operation of each part of the circuit in accordance with a device control program that is stored in advance in the memory 12, read into the memory 12 from an external recording medium 13 such as a ROM card via the recording medium reading unit 14, or read into the memory 12 from a Web server (in this case, a program server) on the Internet N via the communication control unit 18.

The device control program stored in the memory 12 is started in response to an input signal corresponding to a user operation on the touch panel color display unit 17, a voice signal input via the voice input unit 15, a communication signal exchanged with a Web server on the Internet N connected via the communication control unit 18, or a connection communication signal from a memory card (recording medium) 13 such as an EEPROM, RAM, or ROM connected externally via the recording medium reading unit 14.

Thus, the memory 12, the recording medium reading unit 14, the voice input unit 15, the voice output unit 16, the touch panel color display unit 17, the communication control unit 18, and the like are connected to the CPU 11.

The device control programs stored in the memory 12 include a system program that governs the overall operation of the PDA 10 and a communication program for data communication, via the communication control unit 18, with Web servers on the Internet N, a user PC (personal computer) not shown, and the like. They also include various application programs, either stored in advance or downloaded from outside, such as a mail function, an audio recording/playback function, a video recording/playback function, a document processing function, and a schedule management function.

In addition, the memory 12 stores a language learning support program 12a, which displays example sentence text for language learning and, in response to input of voice data corresponding to that text, displays the text itself in a form that identifies its tone (rising or falling). It also stores a word speech recognition processing module (program) 12b, which performs speech recognition, word sampling, tone analysis, phrase detection, and related processing on the voice data when the language learning support program 12a is executed.

The memory 12 also holds a model example sentence database 12c, which stores a plurality of model example sentences for language learning, each as a pair of text and voice data. In addition, the following work areas required for executing the language learning support program 12a are reserved: a model voice memory 12d, a user voice memory 12e, a speech recognition text memory (model/user) 12f, a word-by-word recognition completion time sampling memory (model/user) 12g, a word-by-word ending waveform registration table (model/user) 12h, a word-by-word ending tone registration table (model/user) 12i, a word-by-word tone reduction text memory (model/user) 12j, a practice word setting memory 12k, and the like.

The model voice memory 12d stores either model voice data input from the voice input unit 15, for example when a teacher reads an example sentence aloud, or the voice data of an example sentence selectively read from the model example sentence database 12c in response to a user operation (model voice data) (see (1) in FIG. 3).

The user voice memory 12e stores voice data input from the voice input unit 15, for example by a learner, on the basis of the text of the model example sentence and its voice (user voice data) (see (1) in FIG. 3).

The speech recognition text memory (model/user) 12f stores the model and user text data obtained by converting the waveforms of the model voice data and the user voice data into text through speech recognition by the word speech recognition processing module 12b (see (2) in FIG. 3).

The word-by-word recognition completion time sampling memory (model/user) 12g stores the recognition completion times t1, t2, ..., tn of the individual words, measured from the start of input of the voice data and sampled by a timer or the like as the model voice data and the user voice data are recognized (see (3) in FIG. 3).

In the word-by-word ending waveform registration table (model/user) 12h, the waveform of the voice data immediately before each recognition completion time t1, t2, ..., tn sampled during speech recognition is registered as the waveform of the ending portion of the corresponding word (see (4) in FIG. 3).
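The extraction of word-ending waveforms from the recognition completion times can be sketched as follows. This is only an illustrative sketch: the function name `word_end_segments`, the fixed tail window `tail_seconds`, and the rule that a tail never reaches back into the previous word are assumptions made here, since the description does not state how much of the waveform before each time t1, t2, ..., tn is taken.

```python
def word_end_segments(samples, sample_rate, completion_times, tail_seconds=0.2):
    """Return, for each word, the samples just before its recognition completion time.

    samples          -- sequence of audio samples (mono)
    sample_rate      -- samples per second
    completion_times -- per-word completion times t1..tn in seconds from input start
    tail_seconds     -- assumed length of the ending portion (not given in the source)
    """
    tail_len = int(tail_seconds * sample_rate)
    segments = []
    prev_end = 0
    for t in completion_times:
        end = min(int(t * sample_rate), len(samples))
        # Assumption: do not let the tail window reach into the previous word.
        start = max(prev_end, end - tail_len)
        segments.append(samples[start:end])
        prev_end = end
    return segments
```

Each returned segment plays the role of one entry of the table 12h; a shorter or longer window may suit a real recognizer better.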

In the word-by-word ending tone registration table (model/user) 12i, the tone (rising or falling) of the ending portion of each word, determined by pitch analysis of the waveform of that ending portion (see, for example, JP 2000-250401 A), is registered (see (5) in FIG. 3). Phrase break information (phrases 1, 2, ...) detected from the content of the recognized text data is also registered in this table.
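One possible way to decide between a rising and a falling ending is sketched below. The description only says that the ending waveform is pitch-analyzed; the autocorrelation pitch estimate, the comparison of the first and second halves of the ending, the 75 to 500 Hz search range, and the names `estimate_pitch`, `classify_tone`, and `synth_tone` are all assumptions introduced for illustration and are not taken from the cited publication.

```python
import math


def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of a frame from its autocorrelation peak.

    Returns 0.0 when the frame is too short to search the requested range.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def classify_tone(tail, sample_rate):
    """Label a word ending 'rising' if its second half is higher in pitch, else 'falling'."""
    half = len(tail) // 2
    f_start = estimate_pitch(tail[:half], sample_rate)
    f_end = estimate_pitch(tail[half:], sample_rate)
    return "rising" if f_end > f_start else "falling"


def synth_tone(freq, n, sample_rate):
    """Generate n samples of a sine wave, used here only to exercise the sketch."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    sr = 8000
    ending = synth_tone(150, 400, sr) + synth_tone(250, 400, sr)
    print(classify_tone(ending, sr))
```

Real speech is noisier than a synthetic sine, so a practical implementation would likely need voicing detection and smoothing before the two halves are compared.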

The word-by-word tone reduction text memory (model/user) 12j stores tone-delimited reduced text data, which is generated from the model and user text data by coloring the ending portion of each word according to its tone (rising: red; falling: blue) and by inserting a delimiter ap at the positions given by the phrase break information (see (6) in FIG. 3).
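The generation of tone-delimited reduced text can be illustrated with a short sketch. It assumes a plain-text rendering in which the color is written after the word in parentheses, as in the example strings of this description, instead of actually coloring the ending letters; the names `TONE_COLORS` and `build_tone_text` and the use of `None` for words without a registered tone are assumptions.

```python
# Tone-to-color mapping stated in the description: rising is red, falling is blue.
TONE_COLORS = {"rising": "red", "falling": "blue"}


def build_tone_text(words, tones, phrase_breaks, delimiter="/"):
    """Build an annotated text such as 'Get(red) ready(blue) to say / goodbye(blue)'.

    words         -- recognized words in order
    tones         -- 'rising', 'falling', or None for each word
    phrase_breaks -- set of word indexes before which a delimiter is inserted
    """
    parts = []
    for index, (word, tone) in enumerate(zip(words, tones)):
        if index in phrase_breaks:
            parts.append(delimiter)
        color = TONE_COLORS.get(tone)
        parts.append(f"{word}({color})" if color else word)
    return " ".join(parts)


if __name__ == "__main__":
    words = ["Get", "ready", "to", "say", "goodbye"]
    tones = ["rising", "falling", None, None, "falling"]
    print(build_tone_text(words, tones, phrase_breaks={4}))
```

On a color display the same per-word tone list could drive the styling of only the final letters of each word, which is what the embodiment shows.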

The practice word setting memory 12k is used as follows: the model tone-delimited reduced text data and the user tone-delimited reduced text data stored in the word-by-word tone reduction text memory (model/user) 12j are compared, and when their tone information (the color of the word ending) differs, the word in which the difference occurs is stored as a word to be practiced (see FIGS. 6(C) and 6(D)).

FIG. 2 is a front view showing the external configuration of the touch panel PDA 10 displaying the language learning support screen G.

The language learning support screen G, displayed on the touch panel display unit 17 during the learning support processing by the language learning support program 12a, shows the following keys: a [Model] key 19m for setting a model voice input mode in which model voice data corresponding to an example sentence is input; an [Example sentence] key 19i operated in the model voice input mode to select and input the model voice data from the model example sentence database 12c; a [User] key 19u for setting a user voice input mode in which the user's voice data corresponding to the example sentence is input; a [Voice input] key 19v operated when the model voice or the user's voice is input from the voice input unit 15; and a [Practice] key 19p for setting a practice mode when the tone of the model (teacher) voice data and the tone of the user (learner) voice data for the example sentence differ.

The model tone-delimited reduced text data Tm stored in the word-by-word tone reduction text memory (model/user) 12j is displayed in the model display area 17M of the learning text display window 17T, and the user tone-delimited reduced text data Tu is displayed in the user display area 17U of the same window.

In the touch panel PDA 10 configured in this way, the CPU 11 controls the operation of each part of the circuit in accordance with the instructions written in the language learning support program 12a and the word speech recognition processing module 12b, and software and hardware operate in cooperation to realize the functions described in the following explanation of operation.

Next, the specific operation of the language learning support function of the touch panel PDA 10 configured as above will be described.

FIG. 3 is a diagram explaining the basic operation of the language learning support function of the touch panel PDA 10.

FIG. 4 is a flowchart showing the language learning support processing of the touch panel PDA 10.

FIG. 5 is a flowchart showing the speech recognition and tone detection processing that accompanies the language learning support processing of the touch panel PDA 10.

FIG. 6 is a diagram showing the display operation (part 1) in the learning text display window 17T during the language learning support processing of the touch panel PDA 10.

FIG. 7 is a diagram showing the display operation (part 2) in the learning text display window 17T during the language learning support processing of the touch panel PDA 10.

When the language learning support program 12a is started in response to a user operation, the language learning support screen G (see FIG. 2), on which the [Model] key 19m through the [Practice] key 19p are arranged, is displayed on the touch panel display unit 17.

To input a model reading of an arbitrary example sentence to be studied, the [Model] key 19m is touched to set the model voice input mode (step S1 (Yes)), and then the [Voice input] key 19v is touched (step S2 (Yes)). When, for example, a teacher's reading of the example sentence "Get ready to say goodbye to the old, hello to the new." is then input sequentially from the voice input unit 15 (step S3), the input model voice data is stored in the model voice memory 12d as shown in (1) of FIG. 3, and is sequentially subjected to speech recognition processing and tone detection processing by the word speech recognition processing module 12b (step SA).

Alternatively, when the [Example sentence] key 19i is touched (step S4 (Yes)) after the [Model] key 19m has been touched to set the model voice input mode (step S1 (Yes)), the voice data of a model example sentence selected from the model example sentence database 12c by a user operation (for example, "Get ready to say goodbye to the old, hello to the new.") is sequentially read out, output from the voice output unit 16, and stored in the model voice memory 12d, as shown in (1) of FIG. 3 (step S5). The data is then sequentially subjected to speech recognition processing and tone detection processing by the word speech recognition processing module 12b (step SA).

The model voice data stored in the model voice memory 12d is sequentially input to the word speech recognition processing module 12b (step A1) and, as shown in (2) of FIG. 3, is stored in the speech recognition text memory (model) 12f as the model text data "Get ready to say goodbye to the old, hello to the new." (step A2).

As the model voice data is recognized, the recognition completion times t1, t2, ..., tn of the individual words, measured from the start of input of the voice data, are sampled by a timer or the like and stored in the word-by-word recognition completion time sampling memory (model) 12g, as shown in (3) of FIG. 3 (step A3).

Then, based on the sampled recognition completion times t1, t2, ..., tn of the words in the model voice data, the waveform of the model voice data immediately before each recognition completion time is extracted as the waveform of the ending portion of the corresponding word and registered in the word-by-word ending waveform registration table (model) 12h, as shown in (4) of FIG. 3 (step A4).

Next, the waveform of the ending portion of each word of the model voice registered in the word-by-word ending waveform registration table (model) 12h is pitch-analyzed to determine the tone (rising or falling) of that ending portion, and the result is registered in the word-by-word ending tone registration table (model) 12i, as shown in (5) of FIG. 3 (step A5).

At this time, phrase break information (phrases 1, 2, ...) is detected from the content of the recognized model text data "Get ready to say goodbye to the old, hello to the new." and registered in the word-by-word ending tone registration table (model) 12i (step A6).

Then, as shown in (6) of FIG. 3, the model text data "Get ready to say goodbye to the old, hello to the new." stored in the speech recognition text memory (model) 12f is given colors corresponding to the tone of each word ending registered in the word-by-word ending tone registration table (model) 12i (rising: red; falling: blue), and the delimiter ap is inserted at the positions given by the phrase break information (phrases 1, 2, ...). The resulting model tone-delimited reduced text data Tm "Get(red) ready(blue) to say / goodbye(blue) to the old(blue), hello(red) to the(red) new(blue)." is stored in the word-by-word tone reduction text memory (model) 12j. As shown in FIG. 6(A), the model tone-delimited reduced text data Tm generated and stored in this memory is displayed sequentially in the model display area 17M of the learning text display window 17T in step with the input of the model voice data (step S6).

After the model tone-delimited reduced text data Tm "Get(red) ready(blue) to say / goodbye(blue) to the old(blue), / hello(red) to the(red) new(blue)." has been displayed in the model display area 17M of the learning text display window 17T in step with the input of the model voice data, as shown in FIG. 6(B), the [User] key 19u is touched to set the user voice input mode so that the user (learner) can input a reading modeled on it (step S7 (Yes)).

When the [Voice input] key 19v is then touched (step S8 (Yes)) and the user's reading of the model tone-delimited reduced text data Tm is input sequentially from the voice input unit 15 (step S9), the input user voice data is stored in the user voice memory 12e as shown in (1) of FIG. 3. In the same way as the model voice data (see (2) to (6) of FIG. 3), it is sequentially subjected to speech recognition processing and tone detection processing by the word speech recognition processing module 12b (step SA).

Then the user text data "Get ready to say goodbye to the old, hello to the new." stored in the speech recognition text memory (user) 12f is given colors corresponding to the tone of each word ending registered in the word-by-word ending tone registration table (user) 12i (rising: red; falling: blue), and the delimiter ap is inserted at the positions given by the phrase break information (phrases 1, 2, ...). The resulting user tone-delimited reduced text data Tu "Get(red) ready(red) to say / goodbye(blue) to the old(blue), / hello(red) to the(red) new(blue)." is stored in the word-by-word tone reduction text memory (user) 12j. As shown in FIGS. 6(B) to 6(C), the user tone-delimited reduced text data Tu generated and stored in this memory is displayed sequentially in the user display area 17U of the learning text display window 17T in step with the input of the user's voice data (step S10).

Here, the model tone-delimited reduced text data Tm and the user tone-delimited reduced text data Tu stored in the word-by-word tone reduction text memory (model/user) 12j are compared, and it is determined whether their tone information (the color of each word ending) differs (step S11).

When it is determined that the tone information of the model tone-delimited reduced text data Tm and that of the user tone-delimited reduced text data Tu do not differ (step S11 (No)), a message indicating that the tone of the user's voice matches the tone of the model voice (such as "GOOD!") is displayed (step S13).

On the other hand, when it is determined in step S11 that, as shown in FIG. 6(C), the tone information (blue) of the second word "ready(blue)" in the model tone-delimited reduced text data Tm differs from the tone information (red) of the second word "ready(red)" in the user tone-delimited reduced text data Tu (step S11 (Yes)), a yellow marker My is added to the portion "y" of both the model text and the user text where the tone information differs, so that it is identified on the display (step S12).
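The comparison of steps S11 to S13 can be sketched as a word-by-word comparison of the two tone lists. The sketch assumes that the model and user recognitions yielded the same word sequence, so that positions line up; the function name `find_practice_words` and the list representation of tones are assumptions made for illustration.

```python
def find_practice_words(words, model_tones, user_tones):
    """Return (index, word) pairs whose ending tone differs between model and user.

    An empty result corresponds to the matching case, in which a message
    such as "GOOD!" is shown instead of a marker.
    """
    return [
        (index, word)
        for index, (word, model, user) in enumerate(zip(words, model_tones, user_tones))
        if model != user
    ]


if __name__ == "__main__":
    words = ["Get", "ready", "to", "say", "goodbye"]
    model = ["rising", "falling", None, None, "falling"]
    user = ["rising", "rising", None, None, "falling"]
    mismatches = find_practice_words(words, model, user)
    print(mismatches if mismatches else "GOOD!")
```

The returned indexes identify where a marker would be drawn in both display areas and which words are candidates for the practice mode.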

Here, as shown in FIG. 6(D), when the [Practice] key 19p is touched (step S14 (Yes)), the word "ready", whose tone information (the color of the word ending) differed, is stored in the practice word setting memory 12k as the practice target word, and the practice target word "ready" in the model tone-delimited reduced text data Tm displayed in the model display area 17M is identified by a red underline RL (step S15).

Then, of the model voice data stored in the model voice memory 12d, the portion corresponding to the practice target word "ready" identified by the red underline RL is read out on the basis of the sampling times t1 to t2 of the word "ready" stored in the word-by-word recognition completion time sampling memory (model) 12g, and is output from the voice output unit 16 (step S16).
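The cut-out in step S16 can be sketched as follows. Following appended item [5], the sketch takes the span of a word to run from the recognition completion time of the preceding word to that of the word itself. Treating the times as sample indices and starting the first word at index 0 are assumptions made here, as are the function names and the sample values.

```python
def word_span(completion_times, index):
    """Return (t1, t2) for a word: previous word's completion time to its own."""
    start = completion_times[index - 1] if index > 0 else 0  # assumed: first word starts at 0
    end = completion_times[index]
    return start, end


def cut_word(samples, completion_times, index):
    """Slice the model voice data for one word out of the full recording."""
    start, end = word_span(completion_times, index)
    return samples[start:end]


# Toy recording of 10 samples holding three words that end at samples 3, 7, and 10.
samples = list(range(10))
completion_times = [3, 7, 10]

print(cut_word(samples, completion_times, 1))
```

For a multi-word practice target the same idea applies with the completion time before the first word and the completion time of the last word.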

Then, when the user, following the model voice for the portion corresponding to the practice target word "ready" output from the voice output unit 16, reads the same word "ready" aloud and that voice is input from the voice input unit 15 (step S17), the user's voice data "ready" is stored in the user voice memory 12e and, as before, undergoes speech recognition and tone detection in accordance with the word speech recognition processing module 12b (step SA).

Then, the user's text data "ready" stored in the speech recognition text memory (user) 12f is colored according to the tone of the ending part of that word registered in the word-by-word ending tone registration table (user) 12i (rising part: red / falling part: blue), and delimiters ap are placed at the positions given by the phrase break information (phrases 1, 2, ...); here no delimiter ap is added because there is only the single word "ready". The resulting user tone-delimited reduced text data Tu "ready(blue)" is generated and stored in the word-by-word tone reduction text memory (user) 12j. The data Tu "ready(blue)" is then displayed, with the red underline RL added, in the user display area 17U of the learning text display window 17T in step with the input of the user's voice data, as shown in FIG. 6(D) (step S18).

Here, the model tone-delimited reduced text data Tm and the user tone-delimited reduced text data Tu corresponding to the practice target word "ready", stored in the word-by-word tone reduction text memory (model / user) 12j, are compared. If it is determined that their tone information (the color of the word ending) differs (step S19 (Yes)), a yellow marker My is added, as before, to the differing text portion of both the model and the user so that it is identified on the display (step S20), and processing returns to step S16 to retry with the same practice target word.

On the other hand, if it is determined that, as shown in FIG. 6(D), the tone information (the color of the word ending) of the model tone-delimited reduced text data Tm "ready(blue)" and of the user tone-delimited reduced text data Tu "ready(blue)" for the practice target word "ready" does not differ (step S19 (No)), the yellow marker My that had been added to the text portion "y" of both the model and the user is erased and only a marker frame M is added in its place, so that the portion is identified on the display as a history of the misread location (step S21).
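The retry behavior of steps S19 to S21 can be sketched as a small loop over successive user attempts at one practice target. The function name and the string labels for the marker states are hypothetical; the sketch only tracks which marker is shown after each attempt.

```python
def practice_word(model_tone, user_tones):
    """Return the marker shown after each attempt, stopping at the first match."""
    history = []
    for tone in user_tones:
        if tone != model_tone:
            marker = "yellow"  # step S20: yellow marker My, retry from step S16
        else:
            marker = "frame"   # step S21: yellow marker replaced by marker frame M
        history.append(marker)
        if marker == "frame":
            break              # tones match, so this practice target is done
    return history


# The model reads "ready" with a falling ending; the user matches on the third try.
print(practice_word("falling", ["rising", "rising", "falling"]))
```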

Then it is determined whether the practice target word "ready" currently stored in the practice word setting memory 12k has come to cover the whole of the model text data "Get ready to say goodbye to the old, hello to the new." stored in the speech recognition text memory (model) 12f (step S22).

If it is determined that the practice target word "ready" does not cover the whole of the model text data "Get ready to say goodbye to the old, hello to the new." (step S22 (No)), the character string of the practice target is extended, one word at a time, to the words before and after it while still including the word "ready" where the tone originally differed. As shown in FIG. 7(A), the extended practice target "Get ready" in the model tone-delimited reduced text data Tm displayed in the model display area 17M is identified by the red underline RL (step S23).

The reading practice process of steps S16 to S23 is then repeated in the same way, as shown in FIGS. 7(A) to 7(D), while the practice target stored in the practice word setting memory 12k is extended one word at a time.

After that, when it is determined in step S22 that, as shown in FIG. 7(D), the practice target has reached the whole of the model text data "Get ready to say goodbye to the old, hello to the new." (step S22 (Yes)), the series of language learning support processes ends.
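The way the practice target grows in steps S22 and S23 can be sketched as follows. The description only states that the target is extended one word at a time toward the preceding and following words, and shows "ready" becoming "Get ready". The order used here (preceding words first, then following words) is an assumption chosen to match that example, and the function names are hypothetical. The retry on a tone mismatch (step S19) is left out.

```python
def extend_span(start, end, n_words):
    """Extend the half-open word range [start, end) by one word (step S23)."""
    if start > 0:
        return start - 1, end  # assumed order: take the preceding word first
    if end < n_words:
        return start, end + 1  # then take the following word
    return start, end


def practice_spans(words, diff_index):
    """List every practice target from the differing word up to the whole sentence."""
    start, end = diff_index, diff_index + 1
    spans = [" ".join(words[start:end])]
    while (start, end) != (0, len(words)):  # step S22: whole sentence reached?
        start, end = extend_span(start, end, len(words))
        spans.append(" ".join(words[start:end]))
    return spans


sentence = "Get ready to say goodbye to the old, hello to the new."
for target in practice_spans(sentence.split(), 1):
    print(target)
```

The loop ends once the target equals the full example sentence, which corresponds to the Yes branch of step S22.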

Therefore, according to the language learning support function of the touch panel PDA 10 configured as above, model voice data reading out an example sentence is input either directly from the voice input unit 15, read aloud by a teacher or the like, or by selecting it from the model example sentence database 12c and outputting it from the voice output unit 16. The model voice data is then converted into text by speech recognition, the tone of the ending part is determined for each word, and model text data Tm, in which the ending characters of each recognized word are identified in a form corresponding to that tone (color, font, pattern, added mark, etc.), is generated and displayed in the model display area 17M. When the user (learner) then inputs a reading of the same example sentence, following the model voice data and the text data Tm, user text data Tu, generated by the same speech recognition and tone determination processing with the ending part of each word identified, is displayed for comparison in the user display area 17U placed alongside the model display area 17M.

As a result, when the user (learner) inputs speech for a text, the user can easily confirm by eye whether the tone of that speech changes appropriately for the text. In addition, because the model voice and the user's input voice are displayed side by side, the user can easily see which parts of the text match the model voice and which parts are wrong.

Further, according to the language learning support function of the touch panel PDA 10 configured as above, when the user text data Tu displayed in the user display area 17U has a portion that differs from the model text data Tm displayed in the model display area 17M (that is, the identification form corresponding to the tone of a word's ending part differs), that portion is identified by the yellow marker My.

As a result, the user (learner) can see more clearly which parts of the text match the model voice and which parts are wrong.

Furthermore, according to the language learning support function of the touch panel PDA 10 configured as above, the word identified by the yellow marker My is set as the practice target word, the practice target word in the model text data Tm is identified by the red underline RL, and the model voice data for that word is cut out and output. When the user (learner), following this identification of the practice target word in the model text data Tm and the model voice data for the word, inputs a reading of the same word, the text data Tu of that word, generated by the same speech recognition and tone determination processing, is displayed for comparison in the user display area 17U.

As a result, the user can practice reading aloud effectively by isolating only the words whose tone in the user's reading differed from the tone of the model reading of the sentence.

In the embodiment above, the speech recognition text conversion process and the tone determination process are applied to the reading of an English example sentence, and text data is generated and displayed in which the ending part of each word is identified in a form (color, font, pattern, added mark, etc.) corresponding to its tone (rising / falling). For Chinese, which consists of a sequence of Chinese characters, the same processes may instead generate and display text data in which each character is identified in a form corresponding to its tone (the four tones), as shown in FIG. 8: first tone, red character or red overline added; second tone, blue character or blue rising line added; third tone, brown character or brown underline added; fourth tone, green character or green falling line added.

FIG. 8 is a diagram showing the display operation in the learning text display window 17T during Chinese reading-aloud learning by the language learning support process of the touch panel PDA 10.

In FIG. 8, the model display area 17M of the learning text display window 17T shows the tone-delimited reduced text data Tm of the model Chinese example sentence together with its speech-recognized Pinyin notation Pin, and the user display area 17U shows, for comparison, the tone-delimited reduced text data Tu input by the user's voice following the model tone-delimited reduced text data Tm. As in the embodiment above, Chinese characters whose tone differs between the model text data Tm and the user text data Tu are identified by the yellow marker My.
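The four-tone display forms described for Chinese can be sketched as a lookup table. The colors and line marks come from the description; the table name, the function name, and the sample word 中国 (zhōng guó, first tone then second tone) are illustrative choices and not the example sentence of FIG. 8.

```python
# Display form per tone: (character color, added line mark).
TONE_FORMS = {
    1: ("red", "overline"),
    2: ("blue", "rising line"),
    3: ("brown", "underline"),
    4: ("green", "falling line"),
}


def annotate(chars_with_tones):
    """Attach the display form for its tone to every recognized character."""
    return [(char, *TONE_FORMS[tone]) for char, tone in chars_with_tones]


def differing_chars(model, user):
    """Characters to mark with the yellow marker My: tone differs by position."""
    return [m_char for (m_char, m_tone), (_, u_tone) in zip(model, user) if m_tone != u_tone]


model = [("中", 1), ("国", 2)]
user = [("中", 1), ("国", 4)]  # hypothetical misreading of the second character

print(annotate(model))
print(differing_chars(model, user))
```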

Each processing method and database of the touch panel PDA 10 described in the embodiment of the tone display control device, namely the language learning support process shown in the flowchart of FIG. 4, the speech recognition and tone detection process accompanying it shown in the flowchart of FIG. 5, and the model example sentence database 12c, can be stored and distributed as a computer-executable program on an external recording medium 13 such as a memory card (ROM card, RAM card, etc.), a magnetic disk (floppy disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory. The computer of an electronic device having voice input/output units (15, 16) and a display unit (17) reads the program stored on the external recording medium 13 into its storage device (12) and has its operation controlled by that program, thereby realizing the language learning support function described in the embodiment and executing the same processing by the methods described above.

The program data for realizing each of the above methods can also be transmitted over a network N in the form of program code, and the language learning support function described above can also be realized by loading this program data, through the communication control unit 18, into a computer connected to the network N.

The present invention is not limited to the embodiments above and can be modified in various ways at the implementation stage without departing from its gist. Furthermore, the embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, even if some constituent elements are deleted from all those shown in an embodiment, or some are combined in a different form, the resulting configuration can be extracted as an invention provided that the problem described in the section on the problem to be solved by the invention can be solved and the effect described in the section on the effect of the invention can be obtained.

The inventions described in the claims as originally filed in the present application are appended below.

[1]
A tone display control device comprising:
a display unit;
voice input means for inputting a voice reading a sentence aloud;
voice recognition means for recognizing the voice input by the voice input means and converting it into text;
tone determination means for determining the tone of the voice input by the voice input means for each word converted into text by the voice recognition means;
tone reduction text generation means for generating, on the basis of the text data produced by the voice recognition means and the tone of each word determined by the tone determination means, tone reduction text data in which each word is identified in a form corresponding to its tone; and
tone reduction text display control means for displaying the tone reduction text data generated by the tone reduction text generation means on the display unit.

[2]
The tone display control device according to [1], wherein
the display unit has a model display area and a user display area that can be compared with the model display area, and
the tone reduction text display control means displays, when a model reading voice is input by the voice input means, the model tone reduction text data generated by the tone reduction text generation means in the model display area, and displays, when a user's reading voice is input by the voice input means, the user's tone reduction text data generated by the tone reduction text generation means in the user display area.

[3]
The tone display control device according to [2], further comprising:
tone difference determination means for determining whether the tone of each word determined by the tone determination means differs between the case where the model reading voice is input by the voice input means and the case where the user's reading voice is input; and
difference identification display control means for, when the tone difference determination means determines that the tones differ, identifying and displaying the portion of the word determined to differ in both the model tone reduction text data displayed in the model display area and the user's tone reduction text data displayed in the user display area.

[4]
The tone display control device according to [3], further comprising:
voice output means for outputting voice;
model voice output control means for, when the tone difference determination means determines that the tones differ, cutting out the model voice data corresponding to the word determined to differ from the model reading voice input by the voice input means and causing the voice output means to output it; and
user text redisplay control means for, when the user's reading voice is input by the voice input means after the model reading voice has been output by the model voice output control means, causing the tone reduction text display control means to newly display the user's tone reduction text data generated by the tone reduction text generation means in the user display area.

[5]
The tone display control device according to [4], further comprising
word-by-word recognition completion time extraction means for extracting, for each word converted into text by the voice recognition means, the recognition completion time measured from the beginning of the voice input by the voice input means,
wherein the model voice output control means, when the tone difference determination means determines that the tones differ, cuts out the model voice data corresponding to the word determined to differ from the model reading voice input by the voice input means, on the basis of the recognition completion times of that word and of the word preceding it extracted by the word-by-word recognition completion time extraction means, and causes the voice output means to output it.

[6]
A program for controlling a computer of an electronic device having a display unit, the program causing the computer to function as:
voice input means for inputting a voice reading a sentence aloud;
voice recognition means for recognizing the voice input by the voice input means and converting it into text;
tone determination means for determining the tone of the voice input by the voice input means for each word converted into text by the voice recognition means;
tone reduction text generation means for generating, on the basis of the text data produced by the voice recognition means and the tone of each word determined by the tone determination means, tone reduction text data in which each word is identified in a form corresponding to its tone; and
tone reduction text display control means for displaying the tone reduction text data generated by the tone reduction text generation means on the display unit.

Description of Symbols
10 ... Touch panel PDA (tone display control device)
11 ... CPU
12 ... Memory
12a ... Language learning support program
12b ... Word speech recognition processing module
12c ... Model example sentence database
12d ... Model voice memory
12e ... User voice memory
12f ... Speech recognition text memory (model / user)
12g ... Word-by-word recognition completion time sampling memory (model / user)
12h ... Word-by-word ending waveform registration table (model / user)
12i ... Word-by-word ending tone registration table (model / user)
12j ... Word-by-word tone reduction text memory (model / user)
12k ... Practice word setting memory
13 ... External recording medium
14 ... Recording medium reading unit
15 ... Voice input unit
16 ... Voice output unit
17 ... Touch panel display unit
17T ... Learning text display window
17M ... Model display area
17U ... User display area
18 ... Communication control unit
19m ... [Model] key
19i ... [Example sentence] key
19u ... [User] key
19v ... [Voice input] key
19p ... [Practice] key
G ... Language learning support screen
Tm ... Model tone-delimited reduced text data
Tu ... User tone-delimited reduced text data
ap ... Delimiter
My ... Yellow marker
M ... Marker frame

Claims

A tone display control device comprising:
a display unit;
voice input means for inputting a voice reading a sentence aloud;
voice recognition means for recognizing the voice input by the voice input means and converting it into text;
tone determination means for determining the tone of the voice input by the voice input means for each word converted into text by the voice recognition means;
tone reduction text generation means for generating, on the basis of the text data produced by the voice recognition means and the tone of each word determined by the tone determination means, tone reduction text data in which each word is identified in a form corresponding to its tone; and
tone reduction text display control means for displaying the tone reduction text data generated by the tone reduction text generation means on the display unit.

The tone display control device according to claim 1, wherein
the display unit has a model display area and a user display area that can be compared with the model display area, and
the tone reduction text display control means displays, when a model reading voice is input by the voice input means, the model tone reduction text data generated by the tone reduction text generation means in the model display area, and displays, when a user's reading voice is input by the voice input means, the user's tone reduction text data generated by the tone reduction text generation means in the user display area.

The tone display control device according to claim 2, further comprising:
tone difference determination means for determining whether the tone of each word determined by the tone determination means differs between the case where the model reading voice is input by the voice input means and the case where the user's reading voice is input; and
difference identification display control means for, when the tone difference determination means determines that the tones differ, identifying and displaying the portion of the word determined to differ in both the model tone reduction text data displayed in the model display area and the user's tone reduction text data displayed in the user display area.

The tone display control device according to claim 3, further comprising:
voice output means for outputting voice;
model voice output control means for, when the tone difference determination means determines that the tones differ, cutting out the model voice data corresponding to the word determined to differ from the model reading voice input by the voice input means and causing the voice output means to output it; and
user text redisplay control means for, when the user's reading voice is input by the voice input means after the model reading voice has been output by the model voice output control means, causing the tone reduction text display control means to newly display the user's tone reduction text data generated by the tone reduction text generation means in the user display area.

The tone display control device according to claim 4, further comprising
word-by-word recognition completion time extraction means for extracting, for each word converted into text by the voice recognition means, the recognition completion time measured from the beginning of the voice input by the voice input means,
wherein the model voice output control means, when the tone difference determination means determines that the tones differ, cuts out the model voice data corresponding to the word determined to differ from the model reading voice input by the voice input means, on the basis of the recognition completion times of that word and of the word preceding it extracted by the word-by-word recognition completion time extraction means, and causes the voice output means to output it.

A program for controlling a computer of an electronic device having a display unit, the program causing the computer to function as:
voice input means for inputting a voice reading a sentence aloud;
voice recognition means for recognizing the voice input by the voice input means and converting it into text;
tone determination means for determining the tone of the voice input by the voice input means for each word converted into text by the voice recognition means;
tone reduction text generation means for generating, on the basis of the text data produced by the voice recognition means and the tone of each word determined by the tone determination means, tone reduction text data in which each word is identified in a form corresponding to its tone; and
tone reduction text display control means for displaying the tone reduction text data generated by the tone reduction text generation means on the display unit.