JP2005031207A

JP2005031207A - Pronunciation practice support system, pronunciation practice support method, pronunciation practice support program, and computer readable recording medium with the program recorded thereon

Info

Publication number: JP2005031207A
Application number: JP2003193824A
Authority: JP
Inventors: Yoichi Yamashita (山下洋一); Akihiro Aoi (青井昭博); Kunio Arakawa (荒川邦雄)
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2003-07-08
Filing date: 2003-07-08
Publication date: 2005-02-03

Abstract

PROBLEM TO BE SOLVED: To provide a pronunciation practice support system and a pronunciation practice support method capable of providing pronunciation practice content for English conversation and the like in a mobile environment such as a cellular phone, together with a pronunciation practice support program that implements the system and a computer-readable recording medium on which the program is recorded.

SOLUTION: The pronunciation practice support system 1 is communicably connected to a learner's mobile device 100 and comprises a line call control device 10 that acquires voice data input by the learner on the mobile device 100, an utterance rating engine 33 that rates the learner's pronunciation contained in the voice data acquired by the line call control device 10, a content editing unit 44 that edits content according to the rating result of the utterance rating engine 33, and a Web server 20 that presents the content edited by the content editing unit 44 on the mobile device 100.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

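[0001]
[Technical Field of the Invention]
The present invention relates to a system that supports pronunciation practice, for example for English conversation, and more particularly to a pronunciation practice support system that rates a learner's pronunciation, a pronunciation practice support method, a pronunciation practice support program, and a computer-readable recording medium on which the program is recorded.
[0002]
[Prior Art]
It is estimated that there are 13 million learners of English in Japan at any given time. The number of English conversation schools has exceeded 10,000, and more than 5 million people take English proficiency tests of various kinds every year. Japanese people are thus highly motivated to learn English, and English conversation in particular.
[0003]
At the same time, while many people want to try communicating in English, many potential learners hesitate to actually study, citing circumstances such as "I cannot set aside a block of study time" and "English conversation schools are expensive and far away," or worries such as "even if I buy teaching materials, I cannot keep going" and "books, magazines, and television alone are too one-directional."
[0004]
Against this background, services that let people learn English casually on a mobile phone have been offered. For example, one service has learners practice "listening to English" by playing back audio of a celebrity speaking short phrases tied to a column (Non-Patent Document 1). Another service, intended to promote sales of movie DVDs, has learners practice "listening to English" by playing back audio samples of English lines from films (Non-Patent Document 2).
[0005]
It is also said that repeating the cycle of the four "basic learning elements" (reading, speaking, listening, and writing) is effective for acquiring communicative ability in English conversation. In Japan, many people consider themselves "poor at English conversation" despite studying English for many years in junior high school, high school, and university, and the cause is thought to be that they "have not practiced speaking." Japanese learners of English therefore benefit from the "communicative approach," which builds on the basic learning elements and trains (gives experience in) speaking from the viewpoint of communication in realistic situations.
[0006]
In this regard, Non-Patent Document 3 describes personal computer software that simulates free conversation in English using live-action video and speech recognition. This software allows the speech level to be set in five steps so that it can recognize a wide range of speech, from fluent English to the katakana-style English of beginners.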
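[0007]
[Non-Patent Document 1]
"Thane Camus no ABC" (セインカミュのＡＢＣ), [online],
[retrieved June 23, 2003], Internet
<URL: http://i.thane.tc/no_member_general.php>
[0008]
[Non-Patent Document 2]
"Cinema Eikaiwa" (シネマ英会話, "Cinema English Conversation"), [online],
[retrieved June 23, 2003], Internet
<URL: http://i_cinema.tsutaya.co.jp/>
[0009]
[Non-Patent Document 3]
Learningware Co., Ltd. (株式会社ラーニングウェア), "NativeWorld Feature Overview", [online],
[retrieved June 23, 2003], Internet
<URL: http://www.learningware.co.jp/product/nw/nw_toku.htm>
[0010]
[Non-Patent Document 4]
Yoichi Yamashita, Department of Information Science (情報学科), Faculty of Science and Engineering, Ritsumeikan University, "An Automatic Speech Rating System for Language Learners" (語学学習者のための発声自動評定システム), presented at the 9th Forum of 知性連合推進機構, November 11, 2002.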
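[0011]
[Problems to Be Solved by the Invention]
However, conventional English learning services for mobile phones offer only "listening to English" practice; no service has offered "speaking English" practice.
[0012]
Software that offers "speaking English" practice does exist, but because it runs on a personal computer, it imposes significant constraints of place and time on the learner; that is, it lacks portability and immediate availability. It also requires the learner to have the skills needed to use a personal computer, so it is not as easy to use as a mobile phone. Furthermore, because changing the teaching material requires a software upgrade, the material is difficult to change flexibly.
[0013]
The present invention was made to solve the above problems, and its object is to provide a pronunciation practice support system and a pronunciation practice support method capable of supplying pronunciation practice content, for English conversation and the like, in a mobile environment such as a mobile phone. Objects of the present invention also include providing a pronunciation practice support program that implements the pronunciation practice support system and a computer-readable recording medium on which the program is recorded.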
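[0014]
[Means for Solving the Problems]
To solve the above problems, the pronunciation practice support system of the present invention is a pronunciation practice support system communicably connected to a learner's terminal device, comprising: voice data acquisition means for acquiring voice data input by the learner through the terminal device; pronunciation rating means for rating the learner's pronunciation contained in the voice data acquired by the voice data acquisition means; content editing means for editing content according to the rating result of the pronunciation rating means; and content presentation means for presenting the content edited by the content editing means on the terminal device.
[0015]
The pronunciation practice support method of the present invention is a pronunciation practice support method performed by a pronunciation practice support system communicably connected to a learner's terminal device, comprising: a voice data acquisition step of acquiring voice data input by the learner through the terminal device; a pronunciation rating step of rating the learner's pronunciation contained in the voice data acquired in the voice data acquisition step; a content editing step of editing content according to the rating result of the pronunciation rating step; and a content presentation step of presenting the content edited in the content editing step on the terminal device.
[0016]
With the above configuration and method, voice data input by the learner through the terminal device (such as a mobile phone) is acquired, the learner's pronunciation contained in the voice data is rated in the pronunciation practice support system, and content corresponding to the result is presented on the terminal device. In this specification, "pronunciation" means sound produced by the learner; it is not limited to utterances of foreign-language words or sentences and also includes, for example, karaoke singing and musical instrument performance. In other words, by incorporating pronunciation rating means suited to the target voice data, the pronunciation practice support system of the present invention can be widely applied as a system for remotely rating the various sounds a learner produces.
[0017]
Pronunciation practice content for English conversation and the like can therefore be supplied in a mobile environment such as a mobile phone, meeting the demand of learners who want easy access to "speaking" practice.
[0018]
In addition, because the content presented on the terminal device is sent from the pronunciation practice support system each time, it is easier to change than content on a stand-alone device such as a personal computer.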
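[0019]
Also, because the content presented on the terminal device is edited according to the rating result of the pronunciation rating means, content suited to the learner's situation, such as learning progress and proficiency, can be presented. A learning service that dynamically reflects each learner's situation in the learning material can thus be provided.
[0020]
Accordingly, if the pronunciation practice support system is used, for example, for an English conversation learning service, English learners can be given the opportunity for a "communicative approach" that builds on the basic learning elements and trains (gives experience in) speaking from the viewpoint of communication in realistic situations. This meets English learners' wishes such as "I want to try learning in small pockets of free time," "an easy learning service that delivers reasonable results," "is there a way to enjoy and learn casually?", "content that makes me want to continue even if my willpower is weak," and "interactive content that fits my own schedule."
[0021]
Furthermore, in the pronunciation practice support system of the present invention, the pronunciation rating means can change the strictness of its rating according to an accuracy parameter, and the system comprises accuracy parameter changing means for changing the accuracy parameter according to the history of rating results of the pronunciation rating means, and accuracy parameter holding means for holding, for each learner, the accuracy parameter set by the accuracy parameter changing means.
[0022]
With this configuration, the accuracy parameter additionally allows the strictness of rating by the pronunciation rating means to be changed according to the history of its rating results.
[0023]
The strictness of rating can therefore be changed efficiently according to the learner's situation, such as learning progress and proficiency. Situations that differ from learner to learner can thus be reflected automatically in the learning content on the system side, making it possible to provide a service in which, as at an English conversation school, the learning content is flexibly customized to each learner.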
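[0024]
Furthermore, the pronunciation practice support system of the present invention comprises learner authentication means for authenticating the learner based on the voice data acquired by the voice data acquisition means.
[0025]
With this configuration, learner authentication, which is needed to provide a different service to each learner, can additionally be performed based on voice data the learner inputs through the terminal device. The voice data for authentication may be the learner's name or a greeting. Authentication may also be combined with an ID or password.
[0026]
The learner can therefore use the service of the pronunciation practice support system without reluctance and without being conscious of being authenticated.
[0027]
Furthermore, the pronunciation practice support system of the present invention comprises learner data storage means storing data for each learner, and the content editing means includes classmate adding means for making a character that simulates another learner appear in the content based on the data stored in the learner data storage means.
[0028]
With this configuration, other learners can additionally appear in a learner's content, which gives the content the atmosphere of a classroom. Moreover, because the other learners appearing in the content are based on data of real learners, the learner can feel a sense of reality.
[0029]
Avatars can be used to present the other learners. Representing the virtual personalities who serve as conversation partners with avatars enables smooth communication, so high learning effectiveness and continuity can be expected while keeping the service easy to use.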
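[0030]
Furthermore, in the pronunciation practice support system of the present invention, the pronunciation rating means performs alignment by labeling that uses both the phonemes of the learner's native language and the phonemes of the foreign language being learned.
[0031]
With this configuration, the pronunciation rating means can appropriately align words and phrases by labeling with both the phonemes of the learner's native language (e.g., Japanese) and those of the foreign language being learned (e.g., English), and can therefore rate accurately without judging the input to be a rating error or an inappropriate utterance.
[0032]
Consequently, even when a terminal device such as a mobile phone, in which the voice call function and the data display/browsing function run as separate processes, is used as the mobile terminal, the pronunciation practice support system can provide its service without reducing learning efficiency or usability.
[0033]
The pronunciation rating means is therefore well suited as a pronunciation rating engine incorporated in a pronunciation practice support system that provides content for practicing foreign-language pronunciation.
[0034]
The pronunciation practice support program of the present invention is a computer program that causes a computer to function as each of the above means.
[0035]
With this configuration, the pronunciation practice support system can be implemented by implementing each of its means on a computer.
[0036]
The computer-readable recording medium of the present invention is a computer-readable recording medium on which is recorded a pronunciation practice support program that causes a computer to implement each of the above means and thereby operate the pronunciation practice support system.
[0037]
With this configuration, the pronunciation practice support system can be implemented on a computer by the pronunciation practice support program read from the recording medium.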
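[0038]
[Embodiments of the Invention]
An embodiment of the present invention is described below with reference to Figs. 1 to 13.
[0039]
Fig. 1 is a functional block diagram schematically showing the configuration of the pronunciation practice support system 1 according to this embodiment.
[0040]
The pronunciation practice support system 1 supplies pronunciation practice content, for English conversation and the like, to a mobile device (terminal device) 100 such as a mobile phone via a communication network.
[0041]
In this embodiment, a configuration that rates whether the learner's pronunciation is acceptable as English is described in order to provide an English conversation practice service, but the type of voice data rated is not limited to English. That is, the pronunciation practice support system 1 can adopt an utterance rating engine 33 (described later) suited to the voice data being rated. The pronunciation practice support system 1 can therefore also be configured as a system that supports practice in speaking languages other than English, karaoke, or musical instrument performance.
[0042]
As shown in Fig. 1, the pronunciation practice support system 1 is connected to the mobile device 100 via a mobile phone network (not shown). Specifically, for voice calls, the voice call module 101 of the mobile device 100 is connected to the voice call module 11 of the line call control device 10. For data communication, the data communication module 102 of the mobile device 100 is connected to the data communication module 21 of the Web server 20.
[0043]
The pronunciation practice support system 1 comprises the line call control device (voice data acquisition means) 10, the Web server (content presentation means) 20, the utterance rating server 30, and the database server 40. These devices are connected to one another for data communication via a LAN (local area network) through the data communication modules 12, 21, 31, and 41. The device configuration within the pronunciation practice support system 1 can be changed as appropriate. For example, several units of each device may be provided depending on processing and communication speed, or the four devices shown in Fig. 1 may be integrated into a single device. The blocks of each device in Fig. 1 omit the device manager, operating system, native application interface, and the like.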
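[0044]
The line call control device 10 comprises a voice call module 11, a data communication module 12, a voice call control unit 13, and a voice data conversion unit 14. In the line call control device 10, the voice call control unit 13 receives a voice call signal from the mobile device 100, and the voice data conversion unit 14 converts that signal into a voice data signal that can be processed in the pronunciation practice support system 1. That is, the line call control device 10 acquires from the mobile device 100 the learner's voice data to be rated. The line call control device 10 may also include a terminal authentication unit that authenticates the mobile device 100.
[0045]
The Web server 20 comprises a data communication module 21 and a Web server unit 22. The Web server 20 transmits the content of the English conversation learning service to the mobile device 100 and causes the mobile device 100 to present it. Specifically, the Web server unit 22 communicates with the virtual machine 103 or the Web browser 105 of the mobile device 100, obtains from the content editing unit 44 of the database server 40 the HTML (hypertext markup language) files and the like that the mobile device 100 uses to present the content, and transmits them to the mobile device 100.
[0046]
The utterance rating server 30 comprises a data communication module 31, a speech recognition engine 32, an utterance rating engine (pronunciation rating means) 33, and an utterance data pattern storage unit 34. The utterance rating server 30 rates the learner's pronunciation contained in the voice data acquired by the line call control device 10 and sends the rating result to the database server 40.
[0047]
The speech recognition engine 32 segments phonemes from the learner's voice data.
[0048]
The utterance rating engine 33 performs alignment by labeling with both Japanese phonemes (the learner's native language) and English phonemes (the foreign language being learned), matches the words or sentences spoken by the learner against a data pattern, and automatically rates how similar the fundamental frequency patterns of the two utterances are. The utterance rating engine 33 refers as needed to the utterance data patterns stored in the utterance data pattern storage unit 34. It can also change the strictness of its rating according to an accuracy parameter 57 (Fig. 2(a)). The accuracy parameter 57 is stored in the user database 45, and the utterance rating engine 33 obtains the learner's accuracy parameter 57 via the user management unit 43.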
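The paragraph above says only that engine 33 rates how similar the fundamental frequency (F0) patterns of two aligned utterances are; it does not give the formula. As a hedged illustration, the sketch below assumes the contours are already aligned frame by frame, skips unvoiced frames (F0 = 0), and uses the Pearson correlation of log-F0 mapped onto a 0 to 100 scale. The function name `f0_similarity` and the scoring rule are assumptions, not the method of Non-Patent Document 4.

```python
import math

def f0_similarity(native_f0: list[float], learner_f0: list[float]) -> float:
    """Score 0..100 for how closely two already-aligned F0 contours match.

    Assumed rule (illustrative only): Pearson correlation of log-F0 over frames
    voiced in both contours, mapped from [-1, 1] to [0, 100]. Working in log-F0
    makes the score insensitive to an overall pitch shift between speakers.
    """
    if len(native_f0) != len(learner_f0):
        raise ValueError("contours must be aligned frame by frame")
    # Keep only frames where both speakers are voiced (F0 > 0).
    pairs = [(math.log(a), math.log(b))
             for a, b in zip(native_f0, learner_f0) if a > 0 and b > 0]
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat contour carries no shape to compare
    r = cov / math.sqrt(vx * vy)
    return 50.0 * (r + 1.0)
```

Because of the log scale, a learner who reproduces the native speaker's intonation an octave higher still scores near 100, which fits the stated goal of rating the prosodic pattern rather than absolute pitch.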
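[0049]
The utterance data pattern storage unit 34 stores utterance data patterns of native English speakers, which are used to rate whether the learner's pronunciation in the content is acceptable. The utterance data pattern of the learner's own authentication utterance is stored separately in the user database 45 as a voice file 53 (Fig. 2(a)).
[0050]
The database server 40 comprises a data communication module 41, a user authentication unit (learner authentication means) 42, a user management unit 43, a content editing unit (content editing means) 44, a user database (learner data storage means, accuracy parameter holding means) 45, and a content database 46.
[0051]
The user authentication unit 42 authenticates the learner who uses the mobile device 100. It may use a password, or, as described later, have the utterance rating engine 33 compare the input with the learner's own utterance data pattern registered in advance (voice file 53).
[0052]
The user management unit 43 manages the user database 45. Fig. 2(a) is an explanatory diagram showing an example of the data structure of the data stored in the user database 45. Data for managing learners is registered in the user database 45 for each learner. Fig. 2(a) is only an example, and the user database 45 is not limited to it.
[0053]
Specifically, as shown in Fig. 2(a), the user database 45 registers, keyed by the learner's user ID 51, an associated telephone number 52, voice file 53, page number 54, application parameter 55, log 56, and accuracy parameter 57. The user's nickname, e-mail address, and the like can also be registered in the user database 45 as appropriate.
[0054]
The user ID 51 is a character string identifying the learner. The telephone number 52 is a character string giving the learner's telephone number. The voice file 53 is a 16-bit, 16 kHz PCM (pulse-code modulation) file containing the learner's own utterance data pattern, registered in advance for learner authentication. The page number 54 is a character string indicating the HTML page that represents the most recent state of the service the learner is using. The application parameter (content ID) 55 is a character string identifying a page of content. The log 56 is a character string recording the learner's service usage history. The accuracy parameter 57 is a numerical value (1 to 5 in this embodiment) indicating the strictness of rating by the utterance rating engine 33.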
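The record of Fig. 2(a) can be pictured as a simple data type. The sketch below is a hypothetical in-memory model: the class and field names, the use of a list (rather than a single string) for log 56, the default of level 5 (borrowed from the service-start default described in [0103]), and the assumption that voice file 53 is mono are all illustrative choices, not a schema given in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class UserRecord:
    """Hypothetical model of one row of user database 45 (Fig. 2(a))."""
    user_id: str                 # user ID 51 (key)
    phone_number: str            # telephone number 52
    voice_file: bytes            # voice file 53: 16-bit, 16 kHz PCM samples
    page_number: str = ""        # page number 54: most recent HTML page
    content_id: str = ""         # application parameter (content ID) 55
    log: list[str] = field(default_factory=list)  # log 56: usage history entries
    accuracy: int = 5            # accuracy parameter 57: 1 (lenient) .. 5 (strict)

    def __post_init__(self) -> None:
        if not 1 <= self.accuracy <= 5:
            raise ValueError("accuracy parameter must be between 1 and 5")

def pcm_duration_seconds(pcm: bytes, sample_rate: int = 16_000,
                         sample_width: int = 2) -> float:
    """Duration of mono PCM audio; 16-bit samples occupy 2 bytes each."""
    return len(pcm) / (sample_rate * sample_width)
```

At 16 bits and 16 kHz, one second of mono audio is 32,000 bytes, so a five-word authentication phrase of a few seconds fits comfortably in a per-user record.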
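[0055]
The user management unit 43 also includes an accuracy parameter changing unit (accuracy parameter changing means) 43a. The accuracy parameter changing unit 43a changes the accuracy parameter 57 according to the history of rating results of the utterance rating engine 33 and stores it in the user database 45 in association with the user ID 51. Managing the accuracy parameter 57 together with the user ID 51 in this way allows each learner's learning progress and proficiency to be reflected dynamically in the learning content.
[0056]
The content editing unit 44 uses data stored in the content database 46 to edit content according to the rating result of the utterance rating engine 33.
The content editing unit 44 also includes a classmate adding unit (classmate adding means) 44a. The classmate adding unit 44a makes a character simulating another learner appear in the content, based on data stored in the user database 45.
[0057]
The content database 46 stores teaching material content supplied by content suppliers. Fig. 2(b) is an explanatory diagram showing an example of the data structure of the data stored in the content database 46. Fig. 2(b) is only an example, and the content database 46 is not limited to it.
[0058]
Specifically, as shown in Fig. 2(b), the content database 46 registers, keyed by a content ID 61, an associated example utterance sentence 62, question 63, and utterance sample 64. The content ID 61 is a character string identifying a page of content. The example utterance sentence 62 is a character string displayed as the sentence to be spoken. The question 63 is a character string displayed as a question. The utterance sample 64 is voice data serving as a model utterance; the same audio is stored in several data formats (e.g., 16-bit, 16 kHz PCM) so that one can be selected according to the playback capabilities of the mobile device 100.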
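The content record of Fig. 2(b), including the choice among several encodings of the same utterance sample 64, can be sketched as follows. This is a hypothetical model: the class name, the format labels such as `"pcm16_16k"`, and the rule of taking the first format the handset lists are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ContentRecord:
    """Hypothetical model of one row of content database 46 (Fig. 2(b))."""
    content_id: str          # content ID 61 (key)
    example_sentence: str    # example utterance sentence 62
    question: str            # question 63
    # utterance sample 64: the same audio in several formats, keyed by a format label
    samples: dict[str, bytes] = field(default_factory=dict)

    def select_sample(self, supported: list[str]) -> Optional[bytes]:
        """Return the sample in the first format the handset supports, if any."""
        for fmt in supported:
            if fmt in self.samples:
                return self.samples[fmt]
        return None
```

Storing every format up front trades storage for simplicity: the server never transcodes on demand, it only looks up the encoding a given mobile device 100 can play.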
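[0059]
As shown in Fig. 1, the mobile device 100 comprises a voice call module 101, a data communication module 102, a virtual machine 103, and a Web browser 105. As a mobile phone user interface, it also has at least a display panel, operation keys, a microphone, and a speaker (not shown). The virtual machine 103 presents content by executing a client program 104 received from the Web server 20. The Web browser 105 displays content according to HTML files received from the Web server 20. The mobile device 100 may include only one of the virtual machine 103 and the Web browser 105.
[0060]
Although the mobile device 100 is described as a mobile phone in this embodiment, any terminal device with a voice call function and an application execution environment can be used as the mobile device 100. The pronunciation practice support system 1 is particularly suitable when a terminal device such as a mobile phone, in which the voice call function and the data display/browsing function run as separate processes, is used as the mobile device 100.
[0061]
Next, the operation of the pronunciation practice support system 1 is described.
[0062]
First, the course selection process is described with reference to Fig. 3. Fig. 3 is an explanatory diagram showing example screens displayed in the course selection process.
[0063]
When the learner accesses the pronunciation practice support system 1 from the mobile device 100 for the first time, the mobile device 100 displays screen W11 according to an HTML file received from the Web server unit 22. When the learner selects "English conversation school" on screen W11, screen W12 is displayed. When the learner then selects "Business conversation course" on screen W12, screen W13 is displayed. When the learner selects one of the instructors on screen W13, screen W14 is displayed. Screen W14 shows a hotspot labeled "Listen to the live voice," which plays the instructor's voice. When the learner selects the hotspot "Select this course" on screen W14, the course selection process ends.
[0064]
When the course selection process ends, the learner data registration process begins. In this process, a setting screen (not shown) is displayed on the mobile device 100, and the learner enters the telephone number 52, a nickname, an e-mail address, and the like. The voice file 53 used to authenticate the learner is also registered in this process.
[0065]
When the course selection and learner data registration processes are complete, the entered data is sent from the Web server unit 22 to the user management unit 43. The user management unit 43 then issues a user ID 51 to the learner and stores the entered data in the user database 45 in association with the user ID 51.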
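[0066]
Next, learner authentication by utterance in the pronunciation practice support system 1 is described with reference to Figs. 4 to 6. Fig. 4 is a flowchart showing the flow of learner authentication by utterance. Fig. 5 is an explanatory diagram showing the registration of the voice file 53 at first access. Fig. 6 is an explanatory diagram showing learner authentication using the voice file 53 at the second and subsequent accesses.
[0067]
As shown in Fig. 4, at first access the user authentication unit 42 has the learner read an English task sentence aloud and collects the utterance data pattern. Specifically, as shown in Fig. 5, a short task sentence containing the learner's nickname, "My name is Jimmy," is first displayed (screen W21). On screen W21, after the learner presses the "speak" button and reads the task sentence, the user management unit 43 saves the learner's voice data in the user database 45 as the voice file 53. When the voice file 53 has been saved, the learning content starts (screen W22).
[0068]
A phrase of about five words in the language being learned, for example, is suitable as the authentication task sentence; its content can be chosen as appropriate. Before fixing the voice file 53, the user authentication unit 42 may also have the utterance rating engine 33 evaluate whether the voice data input by the learner is suitable as an utterance data pattern for authentication.
[0069]
As shown in Fig. 4, at the second and subsequent accesses, the user authentication unit 42 first obtains the user ID 51 that the learner entered on the mobile device 100 (S11). Next, it displays the authentication task sentence on the mobile device 100 (screen W31) and obtains the voice data of the learner reading it (S12). It then has the utterance rating engine 33 compare this voice data, by data pattern matching, with the voice file 53 stored in the user database 45 in association with the user ID 51 (S13). If the utterance rating engine 33 finds a match (YES in S14), the user authentication unit 42 authenticates the learner (S15) and starts the learning content (screen W32). If the match fails (NO in S14), the user authentication unit 42 does not authenticate the learner (S16) and displays the end screen W33.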
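Steps S11 to S16 of Fig. 4 reduce to a small decision function. In this sketch the pattern matching of the utterance rating engine 33 is passed in as a callable, because the patent does not describe its internals; the function name `authenticate`, its parameters, and the decision to treat an unknown user ID like a failed match are assumptions.

```python
from typing import Callable, Mapping

def authenticate(user_id: str,
                 utterance: bytes,
                 enrolled: Mapping[str, bytes],
                 match: Callable[[bytes, bytes], bool]) -> bool:
    """Learner authentication by utterance, following S11-S16 of Fig. 4 (sketch).

    user_id   -- ID entered on the handset (S11)
    utterance -- the learner reading the displayed task sentence (S12)
    enrolled  -- user ID 51 -> voice file 53 registered at first access
    match     -- stand-in for pattern matching by utterance rating engine 33 (S13)
    """
    reference = enrolled.get(user_id)
    if reference is None:
        # Assumed behaviour: an unknown ID cannot be matched, so treat it as S16.
        return False
    if match(utterance, reference):   # S14: YES
        return True                   # S15: authenticated, start content (W32)
    return False                      # S14: NO -> S16, show end screen W33
```

Injecting `match` keeps the flow testable without audio: a trivial equality check can stand in for the engine during development.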
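[0070]
In this way, the pronunciation practice support system 1 can authenticate the learner by utterance. Because the learner is authenticated by speaking a task sentence in the foreign language being studied, the authentication feels like part of the lesson and does not seem unnatural.
[0071]
Next, the flow of the learning content is described with reference to Figs. 7 to 9. Fig. 7 is an explanatory diagram showing the basic flow of the learning content. Fig. 8 is an explanatory diagram showing the flow of a conversation event with a classmate that occurs in the learning content. Fig. 9 is an explanatory diagram of the avatars (incarnations) displayed in the learning content.
[0072]
In Figs. 7, 8, and 10 (described later), the learner is "Jimmy (male)" and the classmate is "Rolly (female)." The instructor, the learner, and the classmate are each displayed as an avatar (Fig. 9).
[0073]
As shown in Fig. 7, the learning content repeats, for each question, the cycle of (1) displaying progress (screen W41), (2) displaying the question and the answer choices (screen W42), (3) displaying the answer result (screen W43), and (4) displaying the points earned (screen W44). If one lesson consists of five questions, this cycle is repeated five times.
[0074]
Specifically, screen W41 shows the course name, the name of the lesson available, the name of the next step, and the points earned so far.
[0075]
Screen W42 shows the question and the answer choices. When the learner presses the "answer" button and then reads aloud the sentence of the chosen option, the voice data is sent from the mobile device 100 to the pronunciation practice support system 1. Selection with keys, buttons, or the like may also be allowed instead of selection by voice.
[0076]
Screen W43 shows the result of the utterance rating engine 33 rating the voice data input by the learner. Here it shows that "Jimmy," the learner, answered correctly, together with the points earned.
[0077]
Screen W44 shows the points earned in this step. When the learner presses the "confirm" button on this screen, the display moves to the first screen of the next question (corresponding to W42).
[0078]
The contents of screens W41 to W44 are determined by the content editing unit 44 according to the learner's answer, whether it is correct, the time taken to answer, and so on, and the data for presenting them is created from data in the content database 46. For example, an avatar whose facial expression reflects the current point total is displayed. If few points are earned in a step, the learner does not advance to the next step but repeats the same step, and an "avatar with a troubled expression" is displayed.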
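The step logic just described (repeat a step with a troubled avatar when too few points are earned, otherwise advance with an avatar reflecting the running total) can be sketched as below. The patent gives no numbers, so `PASS_POINTS`, the expression names, and the total-points cut-offs are hypothetical values chosen for illustration.

```python
# Hypothetical pass mark: the text says only that a step with "few" points is repeated.
PASS_POINTS = 30

def avatar_expression(total_points: int) -> str:
    """Hypothetical mapping from the running point total to an avatar face."""
    if total_points >= 100:
        return "delighted"
    if total_points >= 50:
        return "smiling"
    return "neutral"

def next_step(current_step: int, points_in_step: int,
              total_points: int) -> tuple[int, str]:
    """Return (step to show next, avatar expression) after a step ends."""
    if points_in_step < PASS_POINTS:
        # Too few points: stay on the same step and show a troubled avatar.
        return current_step, "troubled"
    return current_step + 1, avatar_expression(total_points)
```

Keeping this decision in the content editing unit 44, rather than on the handset, matches the design in which each screen is regenerated on the server for every step.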
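[0079]
As shown on screen W43 of Fig. 7, the classmate answers the instructor's question together with the learner. Even if the learner answers correctly, fewer points are earned if the learner answers more slowly than the classmate. This makes the learner compete with the classmate to answer first, giving the learner the feel of an English conversation class.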
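The scoring rule above (a correct answer earns fewer points when it comes after the classmate's) can be written as a one-line decision. The point values 10 and 5, and the choice to treat a tie as the learner being first, are assumptions; the patent states only that the slower correct answer earns less.

```python
def question_points(correct: bool, learner_time_s: float, classmate_time_s: float,
                    full: int = 10, slower: int = 5) -> int:
    """Points for one question (sketch with hypothetical point values).

    A wrong answer earns nothing; a correct answer earns `full` points if it is no
    slower than the classmate's answer, and only `slower` points otherwise.
    """
    if not correct:
        return 0
    return full if learner_time_s <= classmate_time_s else slower
```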
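[0080]
The classmate is created virtually by the classmate adding unit 44a from the data of another learner selected from the user database 45, and is added to the content. Specifically, the classmate adding unit 44a generates the classmate from the other learner's profile data, such as gender, nickname, and correct-answer rate. It then adjusts the classmate's correct-answer rate and answer time according to the learner's correct-answer rate and accuracy parameter 57; that is, when the learner's level is high, a classmate who answers correctly in a short time is presented. One classmate or several may appear.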
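A possible shape for the classmate adding unit 44a is sketched below. The patent states the inputs (another learner's gender, nickname, and correct-answer rate, plus the learner's own correct-answer rate and accuracy parameter 57) and the direction of the adjustment (a stronger learner faces a faster classmate), but not the formulas. The equal blending of correct-answer rates, the linear time formula (10 s at level 1 down to 2 s at level 5), and the reading of a higher accuracy parameter as a stronger learner are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Classmate:
    nickname: str
    gender: str
    correct_rate: float    # probability that the classmate answers correctly
    answer_time_s: float   # seconds the classmate takes to answer

def make_classmate(profile: dict, learner_correct_rate: float,
                   learner_accuracy: int) -> Classmate:
    """Create a simulated classmate tuned to the learner's level (sketch).

    profile          -- another learner's data: 'nickname', 'gender', 'correct_rate'
    learner_accuracy -- accuracy parameter 57 (1..5), assumed higher = stronger learner
    """
    # Pull the classmate's accuracy halfway toward the learner's so contests stay close.
    rate = 0.5 * profile["correct_rate"] + 0.5 * learner_correct_rate
    # Stronger learners face faster classmates: 10 s at level 1 down to 2 s at level 5.
    answer_time = 12.0 - 2.0 * learner_accuracy
    return Classmate(profile["nickname"], profile["gender"], rate, answer_time)
```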
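[0081]
Alternatively, the learners' answers and answer times may be recorded in the user database 45, and from this data the classmate adding unit 44a may present a classmate who reproduces answers that other learners actually gave in the past. This makes the English conversation class even more realistic.
[0082]
Next, the conversation event with a classmate that occurs in the learning content is described with reference to Fig. 8. A conversation event is an event in which a classmate suddenly speaks to the learner in the middle of the learning content. Conversation events may be built into the learning content in advance or generated at random by the content editing unit 44. A learner who is using the learning service at that moment may also be chosen as the classmate who appears.
[0083]
For example, while the learner is setting the English conversation level on screen W51, the display suddenly switches to screen W52 and the classmate speaks to the learner. When the learner clicks the instructor avatar on screen W52, advice from the instructor is displayed (screen W53). Screen W53 shows a sentence for replying to the classmate; pressing the "Speak" button lets the learner speak the reply, and pressing the "Listen" button plays the instructor's reading of the sentence. The instructor's reading can be played repeatedly.
[0084]
If the utterance rating engine 33 rates the learner's spoken reply as appropriate, screen W54, in which the classmate replies, is displayed. Screen W54 also shows "Friendship Degree" points awarded according to how well the learner handled the exchange. When the friendship degree reaches a predetermined value, additional services may be provided, such as allowing the learner to exchange messages with the real learner on whom the classmate is modeled. After the conversation event ends, the original screen W51 is displayed again.
[0085]
Such conversation events make the atmosphere of the English conversation class more realistic and keep the lessons from becoming monotonous.
[0086]
Next, the process of changing the strictness of rating by the utterance rating engine 33 is described with reference to Fig. 10, an explanatory diagram of that process.
[0087]
The utterance rating engine 33 can adjust the strictness of its rating based on the accuracy parameter 57 stored in the user database 45. The accuracy parameter 57 is changed automatically by the accuracy parameter changing unit 43a based on the voice data the learner inputs while following the learning content.
[0088]
Specifically, voice data that the learner inputs on the mobile device 100 while following the learning content (screen W61) is passed to the utterance rating engine 33 via the line call control device 10 ((1), (2): voice data acquisition step). At this point, the utterance rating engine 33 queries the user management unit 43 for the learner's accuracy parameter 57 ((3)) and obtains it ((4)).
[0089]
Next, the utterance rating engine 33 rates the acceptability of the pronunciation in the voice data with a strictness corresponding to the accuracy parameter 57 (pronunciation rating step), and sends the rating result to the user management unit 43 and the content editing unit 44 ((5)). The content editing unit 44 then generates content reflecting the rating result (content editing step) and transmits this content (screen W62) to the mobile device 100 via the Web server 20 ((6): content presentation step).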
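The message flow (1) to (6) of Fig. 10 can be summarized as one request handler. In this sketch each server component is replaced by a callable so the ordering of the steps is visible; the function name `handle_utterance` and all parameter names are hypothetical, and a real deployment would place these calls on separate servers connected by the LAN.

```python
from typing import Callable

def handle_utterance(user_id: str,
                     voice: bytes,
                     accuracy_of: Callable[[str], int],
                     rate: Callable[[bytes, int], int],
                     record_result: Callable[[str, int], None],
                     edit_content: Callable[[str, int], str]) -> str:
    """Data flow of Fig. 10 (sketch); each callable stands in for a component.

    (1)(2) `voice` has arrived from the handset via line call control device 10
    (3)(4) engine 33 asks user management unit 43 for accuracy parameter 57
    (5)    the rating goes to unit 43 (logging) and content editing unit 44
    (6)    the returned page is what Web server 20 sends back to the handset
    """
    accuracy = accuracy_of(user_id)          # (3), (4)
    score = rate(voice, accuracy)            # pronunciation rating step
    record_result(user_id, score)            # (5) to user management unit 43
    return edit_content(user_id, score)      # (5) to content editing unit 44, then (6)
```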
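[0090]
Meanwhile, the user management unit 43 records the rating result in the log 56 (Fig. 2(a)), and the accuracy parameter changing unit 43a changes the accuracy parameter 57 based on the history of rating results (for example, when the number of incorrect or correct answers reaches a predetermined count). The user management unit 43 stores the log 56 and the accuracy parameter 57 in the user database 45 in association with the learner's user ID 51. Although the accuracy parameter 57 has five levels in Fig. 10, any number of levels may be used.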
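The count-based example in the paragraph above can be sketched as a small state update. The patent gives "incorrect or correct answers reaching a predetermined count" only as an example; the threshold of three, the use of consecutive runs rather than totals, and the one-level step size are assumptions.

```python
from dataclasses import dataclass

@dataclass
class AccuracyState:
    level: int = 5          # accuracy parameter 57: 1 (lenient) .. 5 (strict)
    correct_streak: int = 0
    wrong_streak: int = 0

def update_accuracy(state: AccuracyState, correct: bool,
                    n: int = 3, lo: int = 1, hi: int = 5) -> AccuracyState:
    """Count-based update of the accuracy parameter (sketch).

    n consecutive misses make rating one level more lenient; n consecutive
    successes make it one level stricter, clamped to [lo, hi].
    """
    if correct:
        state.correct_streak += 1
        state.wrong_streak = 0
        if state.correct_streak >= n:
            state.level = min(hi, state.level + 1)
            state.correct_streak = 0
    else:
        state.wrong_streak += 1
        state.correct_streak = 0
        if state.wrong_streak >= n:
            state.level = max(lo, state.level - 1)
            state.wrong_streak = 0
    return state
```

Resetting the counter after each change means the level moves at most one step per run of `n` answers, which avoids large jumps from a single good or bad session.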
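[0091]
As a result, the pronunciation practice support system 1 can change the strictness of rating automatically while the learning content is in progress, according to the learner's speaking ability and proficiency. The learner neither has to set the strictness nor is made aware of it. As at an English conversation school, the teaching material can thus be adapted flexibly and seamlessly to the learner's state, providing an efficient learning service.
[0092]
The utterance rating engine 33 is now described in detail. The technique implemented in the utterance rating engine 33 was proposed by the inventor of the present application (Non-Patent Document 4).
[0093]
One factor that makes English difficult for Japanese people to acquire is the difference in prosodic control between Japanese and English. One way for learners of English (hereinafter, learners) to acquire the prosodic control of native English speakers (hereinafter, native speakers) is to imitate the native speakers' prosody when speaking. To support such learning by computer and rate the degree of similarity automatically, the two utterances being compared must first be aligned appropriately.
[0094]
The utterance rating engine 33 performs automatic labeling of the learner's speech with both English and Japanese phonemes, including phonemes likely to occur in Japanese speakers' pronunciation, and aligns the utterances on that basis. For example, the automatic labeling takes into account typical pronunciation errors of Japanese speakers, such as allowing not only English /th/ but also Japanese /z/ as the label for "th" in "the." For the Japanese phoneme models, the 43 phonemes provided by the "Development of Basic Software for Japanese Dictation" project (http://winnie.kuis.kyoto-u.ac.jp/dictation/) can be used. For the English phoneme models, for example, 46 phonemes built with HTK (The HTK Book (Version 2.1)) can be used.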
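The mixed-phoneme labeling above amounts to letting each English phoneme be realized either as itself or as a Japanese phoneme a Japanese speaker is likely to substitute. The sketch below expands a word's phoneme string into all allowed label sequences, the kind of alternative-pronunciation list an HMM aligner could then choose among. Only the /th/ to /z/ entry comes from the text; the /l/ to /r/ entry, the phoneme symbols, and the `en:`/`ja:` prefixes used to keep the two phoneme inventories apart are illustrative assumptions.

```python
from itertools import product

# Hypothetical substitution table: English phoneme -> Japanese phonemes that a
# Japanese speaker may produce instead. Only "th" -> "z" is taken from the text.
JA_SUBSTITUTES = {
    "th": ["z"],
    "l": ["r"],
}

def label_variants(english_phones: list[str]) -> list[list[str]]:
    """All mixed English/Japanese label sequences allowed for one word (sketch)."""
    options = [[f"en:{p}"] + [f"ja:{q}" for q in JA_SUBSTITUTES.get(p, [])]
               for p in english_phones]
    return [list(seq) for seq in product(*options)]
```

For "the" written as `["th", "ah"]`, this yields the English-only labeling and a variant with Japanese /z/, so a learner who says "za" is still aligned to the right word instead of being rejected.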
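[0095]
Fig. 11 is an explanatory diagram showing an example of alignment by automatic labeling with Japanese and English phonemes in the utterance rating engine 33. As shown in Fig. 11, the learner-speech segment (frames) that should correspond to each frame of the native speaker's speech was determined from manual labeling results, and the proportion of frames for which the corresponding learner frame chosen automatically by the utterance rating engine 33 deviated from it by 100 ms or more was 9.40%. This deviation is far smaller than that of conventional alignment methods; that is, automatic labeling with both English and Japanese phonemes aligns two utterances accurately.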
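The 9.40% figure is a simple ratio: for each native-speech frame, compare the learner frame chosen by manual labeling with the one chosen automatically, and count the frames where they differ by 100 ms or more. A minimal sketch of that metric follows; times are given in integer milliseconds to avoid floating-point edge cases at exactly 100 ms, and the function name and input layout are assumptions.

```python
def misalignment_rate(reference_ms: list[int], automatic_ms: list[int],
                      tol_ms: int = 100) -> float:
    """Percentage of native-speech frames whose automatically aligned learner frame
    lies tol_ms or more away from the manually labelled one (sketch).

    reference_ms[i], automatic_ms[i] -- time (ms) of the learner frame mapped to
    native frame i by manual labelling and by the engine, respectively.
    """
    if len(reference_ms) != len(automatic_ms) or not reference_ms:
        raise ValueError("need two equal-length, non-empty mappings")
    off = sum(1 for r, a in zip(reference_ms, automatic_ms) if abs(r - a) >= tol_ms)
    return 100.0 * off / len(reference_ms)
```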
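[0096]
Conventional utterance rating engines aligned native-speaker and learner utterances at the phoneme level and rated them using (1) fundamental frequency patterns, (2) spectral information, (3) automatic labeling with English phonemes only, and the like. With these methods, however, words in a phrase are frequently aligned incorrectly and drift out of place. Specifically, with a conventional engine, when a learner says "an apple" where "apple" is expected, the whole of "an apple" is often aligned to "apple," making correct rating impossible. A user saying "an apple" for "apple" is then either handled as a case the system design cannot accept (an exception case) or reported to the user as an incorrect utterance.
[0097]
In contrast, because the utterance rating engine 33 aligns by labeling with both Japanese and English phonemes as described above, alignment deviations are far fewer than with conventional techniques. Specifically, even if the user says "an apple" for "apple," "an" is aligned with "an" and "apple" with "apple," each correctly. As a result, the pronunciation practice support system 1 using the utterance rating engine 33 can tell the learner that "apple" was pronounced correctly when it was.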
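The "an apple" example can be illustrated at the word level: if the expected words are aligned to the recognized words with a sequence diff, an inserted word is simply left unmatched instead of being merged into "apple." The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in; engine 33 works at the phoneme level with mixed labeling, so this is an analogy for the behavior, not its method, and the function name is hypothetical.

```python
from difflib import SequenceMatcher

def align_expected_words(expected: list[str], recognized: list[str]) -> dict[int, int]:
    """Map each expected word index to the index of the recognized word it matches.

    Words the learner inserts (e.g. "an" before "apple") stay unmatched rather
    than being absorbed into a neighbouring expected word. Word-level stand-in
    for the phoneme-level alignment described in the text.
    """
    sm = SequenceMatcher(a=expected, b=recognized, autojunk=False)
    mapping: dict[int, int] = {}
    for block in sm.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping
```

With the mapping in hand, only the audio span of the matched "apple" would be passed on for rating, so the learner gets feedback on "apple" without being asked to retry.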
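[0098]
When a terminal device such as a mobile phone, in which the voice call function and the data display/browsing function (Web browser, etc.) are implemented as separate processes, is used as the mobile device 100, it is desirable for the service flow from speaking to result notification to complete in a single pass. With a conventional engine as described above, when "an apple" is spoken where "apple" is the premise of the rating, the input is judged to be a rating error or an inappropriate utterance, and the learner is forced to retry the service flow. This seriously harms learning efficiency and usability in an English conversation learning service and is unacceptable in terms of quality. The problem does not arise on a personal computer or similar device in which the two functions are integrated in the interface.
[0099]
Thus, the utterance rating engine 33 can appropriately align words and phrases by labeling with both English and Japanese phonemes, and can therefore rate accurately without judging the input to be a rating error or an inappropriate utterance. Consequently, the pronunciation practice support system 1 can provide its service without reducing learning efficiency or usability even when a terminal device such as a mobile phone, in which the voice call function and the data display/browsing function run as separate processes, is used as the mobile device 100.
[0100]
The utterance rating engine 33 rates the accuracy of the learner's utterance by the degree of pattern matching (degree of fit) between the learner's utterance data and the native speaker's data used for rating. This allows the quality of utterances to be classified using a representation such as a score distribution.
[0101]
For example, when the degree of matching is expressed on a scale of 0 to 100 points, accepting utterances that score 80 or above is naturally stricter than accepting those that score 60 or above. How well a learner speaks depends heavily on the individual's ability, so fixing statically the level from which utterances are accepted ignores the obvious individual differences among learners in any service that relies on that judgment.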
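The 80-versus-60 example can be expressed as a lookup from the accuracy parameter to a minimum accepted score. Only the idea that a higher cut-off means stricter rating comes from the text; the specific cut-offs per level in `ACCEPT_THRESHOLD` are hypothetical.

```python
# Hypothetical cut-offs: minimum 0-100 matching score accepted at each level of
# accuracy parameter 57 (5 = strictest, 1 = most lenient).
ACCEPT_THRESHOLD = {1: 40, 2: 50, 3: 60, 4: 70, 5: 80}

def is_accepted(match_score: float, level: int) -> bool:
    """True if a pattern-matching score is accepted at the given strictness level."""
    return match_score >= ACCEPT_THRESHOLD[level]
```

The same score of 65 is thus accepted at level 3 but rejected at level 5, which is exactly the difference in strictness the paragraph describes.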
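[0102]
The pronunciation practice support system 1 therefore collects, dynamically and from sources such as the learner's past history, the tendency of the score distribution produced by pattern matching, and judges the learner's ability from that data at each rating. It then moves the point in the distribution above which utterances are accepted up or down accordingly. As a result, the strictness of rating can be switched.
[0103]
The accuracy parameter represents the set of pattern-matching levels from which an utterance is accepted. It can be set, for example, in five levels, from level 5 (strictest rating) to level 1 (most lenient).
Even if the default acceptance set at the start of the service is level 5 and above, if the learner's subsequent rating results are concentrated at levels 3 to 4, the accuracy parameter changing unit 43a changes the acceptance set to level 3 and above or level 4 and above. This enables utterance rating that follows the learner's current speaking tendency. Where the learner's rating results are concentrated is tracked through the learner's own log 56 (Fig. 2(a)).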
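The distribution-based adjustment above can be sketched by taking the level at which the learner's logged results cluster most often. The patent does not say how "concentrated" is measured; using the most frequent level (the mode), requiring a minimum of ten logged results before changing anything, and the function name are assumptions.

```python
from collections import Counter

def adapt_accept_level(result_levels: list[int], current: int = 5,
                       min_samples: int = 10) -> int:
    """Choose the acceptance level from where the learner's results cluster (sketch).

    result_levels -- level (1..5) into which each past rating fell, as kept in log 56
    current       -- acceptance level in force now (level 5 at service start)
    Until min_samples results exist, the current level is kept.
    """
    if len(result_levels) < min_samples:
        return current
    mode_level, _ = Counter(result_levels).most_common(1)[0]
    return mode_level
```

Applied to a log concentrated at levels 3 and 4, this moves the acceptance set from the default level 5 down to where the learner actually performs, as in the example above.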
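[0104]
As described above, the pronunciation practice support system 1 can provide a learning service in the mobile phone environment, which is highly portable, always connected to the network, widely adopted, and familiar to the average user. Because the service can be customized for each learner, learning matched to the learner's ability can be provided efficiently. It is therefore possible to provide a learning service that is easy to use yet can be expected to deliver high learning effectiveness and continuity.
[0105]
With the pronunciation practice support system 1, the utterance rating engine rates pronunciation and prosody objectively and the rating result is linked to the content, which provides practice in "speaking English." Because frequently used real-world conversations can be extracted as teaching material, it also provides practice in "listening to English" and "reading English," and solving questions provides simulated practice in "writing English." Using avatars for the instructor and the classmates further improves learning effectiveness.
[0106]
With the pronunciation practice support system 1, the learner can (1) listen to native speech, understand the situation, and learn how to say things; (2) make a phone call and speak while imagining a real situation; (3) have the utterance rated; and (4) check the result of speaking immediately on the mobile phone screen.
[0107]
The pronunciation practice support system 1 can realize a learning service built on the following service concepts. It realizes a "communicative approach" with highly practical learning content that realistically simulates English conversational communication and always assumes real communication. It realizes an "edutainment approach," in which learners experience useful English conversation while enjoying various communication events with the instructor and classmates. It also realizes a "task-based approach," in which the basic learning elements are turned into content in a balanced way within conversational tasks.
[0108]
Each device constituting the pronunciation practice support system 1 (the line call control device 10, the Web server 20, the utterance rating server 30, and the database server 40) can be built on a general-purpose computer such as a workstation. The mobile device 100 can be built on a general-purpose computer, including a mobile phone or a PDA (personal digital assistant).
[0109]
That is, each device of the pronunciation practice support system 1 and the mobile device 100 consists of a CPU (central processing unit) that executes the instructions of programs implementing its functions, a ROM (read-only memory) storing boot logic, a RAM (random access memory) into which the programs are loaded, a storage device (recording medium) such as a hard disk storing the programs and the various databases, input devices such as a keyboard and mouse, output devices such as a monitor, speakers, and a printer, and network equipment for connecting to an external network, all connected by an internal bus.
[0110]
To present content obtained from the pronunciation practice support system 1, the mobile device 100 need only have a standard Internet browsing function and be able to connect to the Web server 20 over a network.
[0111]
Fig. 12 is an explanatory diagram showing the configuration of a mobile device 100 equipped with the virtual machine 103. The virtual machine 103 is an application execution environment that executes the client program 104 (for example, a Java (registered trademark) program) obtained from the Web server 20. When content is presented by executing the client program 104 on the virtual machine 103 in this way, processing triggered by events, input, and the like can run on the mobile device 100 itself, because the client program 104 is executable code. A variety of behaviors suited to the content can therefore be implemented.
[0112]
Fig. 13 is an explanatory diagram showing the configuration of a mobile device 100 equipped with the Web browser 105. The Web browser 105 is a kind of native application that displays, on the screen of the mobile device 100, documents and data whose structure is described in a markup language such as HTML or SHTML obtained from the Web server 20. When content is presented by the Web browser 105 according to HTML files and the like, the pronunciation practice support system 1 can be used from many mobile devices, since nearly all of them have a Web browser.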
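[0113]
Finally, the object of the present invention can also be achieved by supplying a system or device with a recording medium on which the program code (executable program, intermediate code program, or source program) of the pronunciation practice support program, the software implementing the functions described above, is recorded in computer-readable form, and having the computer (or CPU, MPU, or DSP) of the system or device read and execute the program code recorded on the medium.
[0114]
Specifically, each functional block of the line call control device 10, the Web server 20, the utterance rating server 30, and the database server 40 is implemented in each device by a microprocessor or the like executing a predetermined program stored in memory (not shown).
[0115]
The recording medium supplying the program code may be separable from the system or device. It may also be a medium that holds the program code fixedly so that it can be supplied. The recording medium may be mounted in the system or device so that the computer can read the recorded program code directly, or mounted so that it can be read through a program reading device connected to the system or device as an external storage device.
[0116]
For example, the recording medium may be a tape medium such as magnetic tape or cassette tape; a disk medium, including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical disks such as CD-ROM, MO, MD, DVD, and CD-R; a card medium such as an IC card (including memory cards) or an optical card; or a semiconductor memory such as mask ROM, EPROM, EEPROM, or flash ROM.
[0117]
The program code may be recorded so that the computer can read it from the recording medium and execute it directly, or so that it is transferred from the recording medium to a program storage area of main memory and then read from main memory and executed by the computer.
[0118]
Furthermore, the system or device may be configured to connect to a communication network, and the program code may be supplied over that network. The communication network is not particularly limited; specifically, the Internet, an intranet, an extranet, a LAN, ISDN, a VAN, a CATV network, a virtual private network, a telephone network, a mobile communication network, a satellite communication network, and the like can be used. The transmission medium of the network is also not particularly limited; wired media such as IEEE 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL lines, and wireless media such as infrared (IrDA or remote controls), Bluetooth, 802.11 wireless, HDR, mobile phone networks, satellite links, and terrestrial digital networks can be used. The present invention can also be realized in the form of a carrier wave or a data signal sequence in which the program code is embodied by electronic transmission.
[0119]
The functions described above are realized not only when the computer executes the program code it has read, but also when an OS or the like running on the computer performs part or all of the actual processing according to the instructions of the program code.
[0120]
Furthermore, the functions described above are also realized when the program code read from the recording medium is written to memory on a function expansion board inserted in the computer or a function expansion unit connected to it, and a CPU or the like on the board or unit then performs part or all of the actual processing according to the instructions of the program code.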
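[0121]
[Effects of the Invention]
As described above, the pronunciation practice support system of the present invention is a pronunciation practice support system communicably connected to a learner's terminal device, comprising: voice data acquisition means for acquiring voice data input by the learner through the terminal device; pronunciation rating means for rating the learner's pronunciation contained in the voice data acquired by the voice data acquisition means; content editing means for editing content according to the rating result of the pronunciation rating means; and content presentation means for presenting the content edited by the content editing means on the terminal device.
[0122]
Likewise, the pronunciation practice support method of the present invention is a pronunciation practice support method performed by a pronunciation practice support system communicably connected to a learner's terminal device, comprising: a voice data acquisition step of acquiring voice data input by the learner through the terminal device; a pronunciation rating step of rating the learner's pronunciation contained in the voice data acquired in the voice data acquisition step; a content editing step of editing content according to the rating result of the pronunciation rating step; and a content presentation step of presenting the content edited in the content editing step on the terminal device.
[0123]
Pronunciation practice content for English conversation and the like can therefore be supplied in a mobile environment such as a mobile phone, which has the effect of meeting the demand of learners who want easy access to "speaking" practice.
[0124]
In addition, because the content presented on the terminal device is sent from the pronunciation practice support system each time, the content is easier to change than on a stand-alone device such as a personal computer.
[0125]
Also, because the content presented on the terminal device is edited according to the rating result of the pronunciation rating means, content suited to the learner's situation, such as learning progress and proficiency, can be presented. This has the effect of providing a learning service that dynamically reflects each learner's situation in the learning material.
[0126]
Furthermore, in the pronunciation practice support system of the present invention, the pronunciation rating means can change the strictness of its rating according to an accuracy parameter, and the system comprises accuracy parameter changing means for changing the accuracy parameter according to the history of rating results of the pronunciation rating means, and accuracy parameter holding means for holding, for each learner, the accuracy parameter set by the accuracy parameter changing means.
[0127]
This additionally allows the strictness of rating to be changed efficiently according to the learner's situation, such as learning progress and proficiency, so that situations that differ from learner to learner are reflected automatically in the learning content on the system side. This has the effect of making it possible to provide a service in which, as at an English conversation school, the learning content is flexibly customized to each learner.
[0128]
Furthermore, the pronunciation practice support system of the present invention comprises learner authentication means for authenticating the learner based on the voice data acquired by the voice data acquisition means.
[0129]
This has the further effect that the learner can use the service of the pronunciation practice support system without reluctance and without being conscious of being authenticated.
[0130]
Furthermore, the pronunciation practice support system of the present invention comprises learner data storage means storing data for each learner, and the content editing means includes classmate adding means for making a character that simulates another learner appear in the content based on the data stored in the learner data storage means.
[0131]
This has the further effect that other learners can appear in a learner's content, giving the content the atmosphere of a classroom. Because the other learners appearing in the content are based on data of real learners, it also has the effect of giving the learner a sense of reality.
[0132]
Furthermore, in the pronunciation practice support system of the present invention, the pronunciation rating means performs alignment by labeling with both the phonemes of the learner's native language and the phonemes of the foreign language being learned.
[0133]
This has the further effect that, even when a terminal device such as a mobile phone, in which the voice call function and the data display/browsing function run as separate processes, is used as the mobile terminal, the pronunciation practice support system can provide its service without reducing learning efficiency or usability.
[0134]
The pronunciation practice support program of the present invention is a computer program that causes a computer to function as each of the above means.
[0135]
This has the effect that the pronunciation practice support system can be implemented by implementing each of its means on a computer.
[0136]
The computer-readable recording medium of the present invention is a computer-readable recording medium on which is recorded a pronunciation practice support program that causes a computer to implement each of the above means and thereby operate the pronunciation practice support system.
[0137]
This has the effect that the pronunciation practice support system can be implemented on a computer by the pronunciation practice support program read from the recording medium.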
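[Brief Description of the Drawings]
[Fig. 1] A functional block diagram schematically showing the configuration of a pronunciation practice support system according to an embodiment of the present invention.
[Fig. 2] Explanatory diagrams showing data structures used by the pronunciation practice support system shown in Fig. 1; Fig. 2(a) shows an example data structure of the data stored in the user database, and Fig. 2(b) an example data structure of the data stored in the content database.
[Fig. 3] An explanatory diagram showing example screens displayed in the course selection process on the mobile device shown in Fig. 1.
[Fig. 4] A flowchart showing the flow of user authentication by utterance in the pronunciation practice support system shown in Fig. 1.
[Fig. 5] An explanatory diagram showing the registration of the voice file at first access in the pronunciation practice support system shown in Fig. 1.
[Fig. 6] An explanatory diagram showing learner authentication using the voice file at the second and subsequent accesses in the pronunciation practice support system shown in Fig. 1.
[Fig. 7] An explanatory diagram showing the basic flow of the learning content presented on the mobile device shown in Fig. 1.
[Fig. 8] An explanatory diagram showing the flow of a conversation event with a classmate that occurs in the learning content presented on the mobile device shown in Fig. 1.
[Fig. 9] An explanatory diagram of the avatars displayed on the mobile device shown in Fig. 1.
[Fig. 10] An explanatory diagram showing the process of changing the strictness of rating by the utterance rating engine in the pronunciation practice support system shown in Fig. 1.
[Fig. 11] An explanatory diagram showing an example of alignment by automatic labeling with Japanese and English phonemes in the utterance rating engine of the pronunciation practice support system shown in Fig. 1.
[Fig. 12] An explanatory diagram showing a configuration of the mobile device shown in Fig. 1 equipped with a virtual machine.
[Fig. 13] An explanatory diagram showing a configuration of the mobile device shown in Fig. 1 equipped with a Web browser.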
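[Description of Reference Numerals]
1 Pronunciation practice support system
10 Line call control device (voice data acquisition means)
20 Web server (content presentation means)
33 Utterance rating engine (pronunciation rating means)
42 User authentication unit (learner authentication means)
43a Accuracy parameter changing unit (accuracy parameter changing means)
44 Content editing unit (content editing means)
44a Classmate adding unit (classmate adding means)
45 User database (learner data storage means, accuracy parameter holding means)
57 Accuracy parameter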
１００移動機（端末装置）[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a system for supporting pronunciation practice such as English conversation, and more particularly, a pronunciation practice support system for assessing a learner's pronunciation, a pronunciation practice support method, a pronunciation practice support program, and a computer-readable record recording the same It relates to the medium.
[0002]
[Prior art]
It is estimated that there are always 13 million English learners in Japan. The number of English conversation schools has exceeded 10,000 classrooms. The number of examinees taking various English proficiency tests exceeds 5 million each year. In this way, Japanese people are highly motivated to learn English, especially English conversation.
[0003]
Also, while there is a desire to challenge English conversation communication, there are circumstances such as “I can not take a long time to study”, “English school is expensive and far away”, “I can not continue even if I buy teaching materials”, Many potential English learners are hesitant about learning because of anxiety such as “books, magazines, and television alone are too unilateral”.
[0004]
Under these circumstances, services that allow easy learning of English using a mobile phone are provided. For example, there is a service that allows a learner to practice “listening to English” by playing audio data in which a talent speaks a short phrase entangled in a column (Non-Patent Document 1). In addition, there is a service that allows a learner to practice “listening to English” by reproducing audio data of English sentences of a movie line for sales promotion of a movie DVD (Non-patent Document 2).
[0005]
In order to learn English conversation communication, it is said that it is effective to repeat the cycle of “learning basic elements”, “read”, “speak”, “listen” and “write”. In Japan, despite learning English for a long time at junior high school, high school, and university, many people say that they are not good at English conversation communication, but the reason is that they are not practicing speaking. It is considered. Therefore, it is effective for Japanese learners of English to use the “Communicative Approach” to train (experience) speaking from the viewpoint of communication, assuming actual situations based on basic learning elements. It is.
[0006]
In this regard, Non-Patent Document 3 describes software for a personal computer that artificially reproduces a free talk in English conversation using live action and voice recognition. In addition, this software can set the utterance level to 5 levels so that it can be recognized widely from fluent English to beginner katakana English.
[0007]
[Non-Patent Document 1]
“Sain Camus ABC”, [online],
[Search June 23, 2003] Internet
<URL: http: // i. thane. tc / no_member_general. php>
[0008]
[Non-Patent Document 2]
“Cinema English”, [online],
[Search June 23, 2003] Internet
<URL: http: // i_cinema. tsutaya. co. jp />
[0009]
[Non-Patent Document 3]
Learning Wear Co., Ltd., “Native World Functional Overview”, [online]
[Search June 23, 2003] Internet
<URL: http: // www. learningware. co. jp / product / nw / nw_toku. htm>
[0010]
[Non-Patent Document 4]
"Voice-speaking automatic rating system for language learners", Ritsumeikan University Faculty of Science and Technology, Department of Information Yoichi Yamashita, November 11, 2002, at the 9th Forum of the Intellectual Union Promotion Organization
[0011]
[Problems to be solved by the invention]
However, the conventional English learning service using a mobile phone is only “listening to English” practice, and there is no service that provides “speaking English” practice.
[0012]
Conventionally, there is software that provides “speaking English” practice, but because it is for a personal computer, there are significant restrictions on the place and time for the learner. That is, there were problems in portability and immediate availability. In addition, the learner is required to have skills for using a personal computer. Therefore, this software was not as easy to use as a mobile phone. Furthermore, since the change of the content of the teaching material involves an upgrade, it has been difficult to flexibly change the content of the teaching material.
[0013]
The present invention has been made to solve the above problems, and its purpose is to provide a pronunciation practice support system and pronunciation practice support capable of supplying pronunciation practice content such as English conversation in a mobile environment such as a mobile phone. It is to provide a method. Further, the object of the present invention includes providing a pronunciation practice support program that realizes the pronunciation practice support system and a computer-readable recording medium that records the program.
[0014]
[Means for Solving the Problems]
In order to solve the above problems, the pronunciation practice support system of the present invention is a pronunciation practice support system that is communicably connected to a learner's terminal device, and acquires voice data input by the learner from the terminal device Audio data acquisition means for performing, pronunciation evaluation means for evaluating the pronunciation of the learner included in the audio data acquired by the audio data acquisition means, and content editing for editing the content according to the evaluation result by the pronunciation evaluation means Means, and content presenting means for presenting the content edited by the content editing means to the terminal device.
[0015]
The pronunciation practice support method of the present invention is a method performed by a pronunciation practice support system communicably connected to a learner's terminal device, and comprises: a voice data acquisition step of acquiring voice data input by the learner from the terminal device; a pronunciation rating step of rating the learner's pronunciation included in the voice data acquired in the voice data acquisition step; a content editing step of editing content according to the rating result of the pronunciation rating step; and a content presentation step of presenting the content edited in the content editing step on the terminal device.
[0016]
With the above configuration and method, the pronunciation practice support system acquires the voice data input by the learner from the terminal device (such as a mobile phone), rates the learner's pronunciation included in that voice data, and presents content corresponding to the rating result on the terminal device. In this specification, “pronunciation” refers broadly to sounds produced by the learner; it is not limited to uttering words or sentences in a foreign language and also includes, for example, karaoke singing and musical instrument performance. That is, by installing pronunciation rating means suited to the target voice data, the pronunciation practice support system of the present invention is widely applicable as a system for remotely rating various sounds produced by a learner.
[0017]
Thus, pronunciation practice content such as English conversation can be supplied in a mobile environment such as a mobile phone, meeting the demand of learners who want easy access to “speaking” practice.
[0018]
In addition, because the content presented on the terminal device is transmitted from the pronunciation practice support system each time, the content is easier to change than with stand-alone software on a device such as a personal computer.
[0019]
Moreover, because the content presented on the terminal device is edited according to the rating result of the pronunciation rating means, content suited to the learner's situation, such as learning progress and proficiency, can be presented. It is therefore possible to provide a learning service that dynamically reflects each learner's situation in the learning content.
[0020]
Accordingly, if the pronunciation practice support system is used as, for example, an English conversation learning service, it can give English learners the opportunity of a “communicative approach”: training in “speaking” from the viewpoint of communication, in simulated real situations built on basic learning elements (experience). It can therefore meet learner needs such as “I want to learn in my spare moments”, “an easy learning service that can still be expected to have some effect”, “a way to learn easily and enjoyably”, “content I will keep using even if my willpower is weak”, and “interactive content that works around my convenience”.
[0021]
Furthermore, in the pronunciation practice support system of the present invention, the pronunciation rating means can change the strictness of the rating according to an accuracy parameter, and the system comprises accuracy parameter changing means for changing the accuracy parameter according to the history of rating results of the pronunciation rating means, and accuracy parameter holding means for holding, for each learner, the accuracy parameter set by the accuracy parameter changing means.
[0022]
With the above configuration, the strictness of the rating by the pronunciation rating means can be changed, via the accuracy parameter, according to the history of its rating results.
[0023]
The strictness of the rating can therefore be adjusted efficiently to the learner's situation, such as learning progress and proficiency, and the system can automatically reflect each learner's differing situation in the learning content. This makes it possible to provide a service that flexibly customizes the learning content to the learner, as an English conversation school does.
[0024]
Furthermore, the pronunciation practice support system of the present invention is characterized by comprising learner authentication means for authenticating a learner based on the voice data acquired by the voice data acquisition means.
[0025]
With the above configuration, the learner authentication needed to provide a different service to each learner can also be performed on the basis of voice data input by the learner from the terminal device. The voice data for authentication may be, for example, the learner's name or a greeting. Authentication may also be combined with an ID and a password.
[0026]
The learner can therefore use the service of the pronunciation practice support system without reluctance and without being conscious of authentication.
[0027]
The pronunciation practice support system of the present invention further comprises learner data storage means for storing learner data for each learner, and the content editing means includes classmate adding means for causing a character simulating another learner to appear in the content on the basis of the data stored in the learner data storage means.
[0028]
With the above configuration, other learners can appear in a learner's content, giving the content the atmosphere of a classroom. Moreover, because the other learners appearing in the content are based on the data of actual learners, the learner can feel a sense of reality.
[0029]
An avatar can be used to present the other learners. Representing the virtual personality of the conversation partner with an avatar makes communication smoother, so a high learning effect and continued use can readily be expected.
[0030]
Further, the pronunciation practice support system of the present invention is characterized in that the pronunciation rating means performs association by labeling using both the phonemes of the learner's native language and the phonemes of the foreign language being learned.
[0031]
With the above configuration, the pronunciation rating means judges whether words and phrases are appropriate by labeling with both the phonemes of the learner's native language (for example, Japanese) and the phonemes of the foreign language being learned (for example, English). It can therefore rate accurately, without rating errors and without wrongly judging utterances to be inappropriate.
[0032]
Therefore, even when the terminal device is one in which the processes of a voice call function and a data display/browsing function overlap, as in a mobile phone, the pronunciation practice support system can provide the service without reducing learning efficiency or usability.
[0033]
The pronunciation rating means is therefore suitable as a pronunciation rating engine for a pronunciation practice support system that provides pronunciation practice content for a foreign language.
[0034]
The pronunciation practice support program of the present invention is a computer program that causes a computer to function as each of the above-described means.
[0035]
With the above configuration, the pronunciation practice support system can be implemented by realizing each of its means on a computer.
[0036]
The computer-readable recording medium of the present invention is a recording medium on which is recorded a pronunciation practice support program that operates the pronunciation practice support system by causing a computer to realize each of the above-described means.
[0037]
With the above configuration, the pronunciation practice support system can be realized on a computer by the pronunciation practice support program read from the recording medium.
[0038]
DETAILED DESCRIPTION OF THE INVENTION
One embodiment of the present invention will be described below with reference to FIGS.
[0039]
FIG. 1 is a functional block diagram outlining the configuration of the pronunciation practice support system 1 according to the present embodiment.
[0040]
The pronunciation practice support system 1 supplies content for pronunciation practice such as English conversation to a mobile device (terminal device) 100 such as a mobile phone via a communication network.
[0041]
In this embodiment, a configuration that rates how appropriate the learner's pronunciation is as English is described, in order to provide a service for practicing English conversation; however, the type of voice data to be rated is not limited to English. That is, the pronunciation practice support system 1 can employ an utterance rating engine 33 (described later) suited to the voice data to be rated, and can thus be configured as a system that supports practice of utterances in languages other than English, karaoke, or musical instrument performance.
[0042]
As shown in FIG. 1, the pronunciation practice support system 1 is connected to a mobile device 100 via a mobile phone communication network (not shown). Specifically, the voice call module 101 of the mobile device 100 and the voice call module 11 of the line call control device 10 are connected for a voice call. Further, the data communication module 102 of the mobile device 100 and the data communication module 21 of the Web server 20 are connected for data communication.
[0043]
The pronunciation practice support system 1 includes a line call control device (voice data acquisition means) 10, a Web server (content presentation means) 20, an utterance rating server 30, and a database server 40. The devices constituting the pronunciation practice support system 1 are connected over a LAN (local area network) through their data communication modules 12, 21, 31, and 41. The device configuration of the pronunciation practice support system 1 can be changed as appropriate; for example, several devices of a given type may be provided depending on processing and communication speed, or the four devices shown in FIG. 1 may be integrated into a single device. The device manager, operating system, native application interface, and the like are omitted from the device blocks shown in FIG. 1.
[0044]
The line call control device 10 includes a voice call module 11, a data communication module 12, a voice call control unit 13, and a voice data conversion unit 14. In the line call control device 10, the voice call control unit 13 receives a voice call signal from the mobile device 100, and the voice data conversion unit 14 converts it into a voice data signal that the pronunciation practice support system 1 can process. That is, the line call control device 10 acquires, from the mobile device 100, the voice data of the learner whose pronunciation is to be rated. The line call control device 10 may also include a terminal authentication unit that authenticates the mobile device 100.
[0045]
The Web server 20 includes a data communication module 21 and a Web server unit 22. The Web server 20 transmits the content of the English conversation learning service to the mobile device 100 and causes the mobile device 100 to present it. Specifically, the Web server unit 22 communicates with the virtual machine 103 or the Web browser 105 of the mobile device 100, obtains from the content editing unit 44 the HTML (hypertext markup language) files and the like with which the mobile device 100 presents content, and transmits them to the mobile device 100.
[0046]
The utterance rating server 30 includes a data communication module 31, a speech recognition engine 32, an utterance rating engine (pronunciation rating means) 33, and an utterance data pattern storage unit 34. The utterance rating server 30 rates the learner's pronunciation included in the voice data acquired by the line call control device 10, and transmits the rating result to the database server 40.
[0047]
The speech recognition engine 32 cuts out phonemes from the learner's speech data.
[0048]
The utterance rating engine 33 associates the learner's uttered word or sentence with the data pattern by labeling with both Japanese phonemes (the learner's native language) and English phonemes (the foreign language being learned), and on the basis of this association automatically rates the degree of similarity between the fundamental frequency patterns of the two utterances. The utterance rating engine 33 refers as needed to the utterance data patterns stored in the utterance data pattern storage unit 34. In addition, the utterance rating engine 33 can change the strictness of the rating according to the accuracy parameter 57 (FIG. 2A). The accuracy parameter 57 is stored in the user database 45, and the utterance rating engine 33 obtains the learner's accuracy parameter 57 via the user management unit 43.
[0049]
The utterance data pattern storage unit 34 stores utterance data patterns of native English speakers, used to rate the appropriateness of the learner's pronunciation in the content. The utterance data pattern of the learner's own authentication utterance, used to authenticate the learner, is instead stored in the user database 45 as the audio file 53 (FIG. 2A).
[0050]
The database server 40 includes a data communication module 41, a user authentication unit (learner authentication means) 42, a user management unit 43, a content editing unit (content editing means) 44, a user database (learner data storage means, accuracy parameter holding means) 45, and a content database 46.
[0051]
The user authentication unit 42 authenticates the learner who uses the mobile device 100. A password may be used for this, or, as described later, the utterance rating engine 33 may compare the input utterance with the learner's own utterance data pattern (audio file 53) registered in advance.
[0052]
The user management unit 43 manages the user database 45. FIG. 2A is an explanatory diagram illustrating an example of a data structure of data stored in the user database 45. In the user database 45, data for managing learners is registered for each learner. FIG. 2A is an example, and the user database 45 is not limited to this.
[0053]
Specifically, as shown in FIG. 2A, the user database 45 registers the telephone number 52, audio file 53, page number 54, application parameter 55, log 56, and accuracy parameter 57 in association with the learner's user ID 51 as the key. In addition, the user's nickname, e-mail address, and the like can be registered as appropriate.
[0054]
The user ID 51 is a character string identifying the learner. The telephone number 52 is a character string giving the learner's telephone number. The audio file 53 is a 16-bit, 16 kHz PCM (pulse-code modulation) file containing the learner's own utterance data pattern, registered in advance for authenticating the learner. The page number 54 is a character string indicating the HTML page that represents the most recent state of the service used by the learner. The application parameter (content ID) 55 is a character string identifying a content page. The log 56 is a character string recording the learner's service use history. The accuracy parameter 57 is a numerical value (1 to 5 in the present embodiment) indicating the strictness of the rating by the utterance rating engine 33.
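The record layout of FIG. 2A can be sketched as a Python data class. This is an illustrative sketch only: the field names, the in-memory dictionary standing in for the user database 45, the sample values, and the default accuracy value of 3 are assumptions, not part of the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserRecord:
    """One row of the user database 45, keyed by user ID 51.

    Field names are hypothetical; the specification only names the items.
    """
    user_id: str        # user ID 51 (key)
    phone_number: str   # telephone number 52
    audio_file: str     # audio file 53: enrolled utterance (16-bit, 16 kHz PCM)
    page_number: str    # page number 54: HTML page of the most recent service state
    content_id: str     # application parameter (content ID) 55
    log: List[str] = field(default_factory=list)  # log 56: service use history
    accuracy: int = 3   # accuracy parameter 57 (default of 3 is an assumption)

    def __post_init__(self) -> None:
        # The embodiment assigns values 1 to 5 to the accuracy parameter 57.
        if not 1 <= self.accuracy <= 5:
            raise ValueError("accuracy parameter 57 must be between 1 and 5")


# The user database 45 as a simple in-memory map: user ID 51 -> record.
user_db: Dict[str, UserRecord] = {}

record = UserRecord("u001", "090-0000-0000", "u001_auth.pcm", "lesson1.html", "C001")
user_db[record.user_id] = record
record.log.append("lesson1:step1:correct")  # hypothetical log entry format
```

A production system would of course hold these rows in a real database; the dictionary only makes the key relationship to user ID 51 explicit.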
[0055]
Further, the user management unit 43 includes an accuracy parameter changing unit (accuracy parameter changing means) 43a. The accuracy parameter changing unit 43a changes the accuracy parameter 57 according to the history of rating results from the utterance rating engine 33 and stores it in the user database 45 in association with the user ID 51. Managing the accuracy parameter 57 per user ID 51 in this way allows each learner's learning progress and skill level to be reflected dynamically in the learning content.
[0056]
The content editing unit 44 uses the data stored in the content database 46 to edit the content according to the rating result by the utterance rating engine 33.
In addition, the content editing unit 44 includes a classmate adding unit (classmate adding means) 44a. The classmate adding unit 44a makes a character who simulates another learner appear in the content based on the data stored in the user database 45.
[0057]
The content database 46 stores teaching material content data supplied from a content supplier. FIG. 2B is an explanatory diagram showing an example of the data structure of data stored in the content database 46. FIG. 2B is an example, and the content database 46 is not limited to this.
[0058]
Specifically, as shown in FIG. 2B, the content database 46 registers an utterance example sentence 62, a question 63, and an utterance sample 64 in association with the content ID 61 as the key. The content ID 61 is a character string identifying a content page. The utterance example sentence 62 is a character string displayed as the example sentence to be uttered. The question 63 is a character string displayed as the question. The utterance sample 64 is audio data serving as a model utterance; the same audio is stored in several data formats (for example, 16-bit, 16 kHz PCM) so that a format suited to the playback function of the mobile device 100 can be selected.
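A minimal sketch of a content record from FIG. 2B, together with one possible way of choosing the utterance sample 64 that matches the playback function of the mobile device 100. The field names, format labels, file names, and the preference-order selection rule are hypothetical; the specification states only that the same audio is stored in several formats.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ContentRecord:
    """One row of the content database 46, keyed by content ID 61 (field names hypothetical)."""
    content_id: str           # content ID 61 (key)
    example_sentence: str     # utterance example sentence 62
    question: str             # question 63
    samples: Dict[str, str]   # utterance sample 64: format label -> audio file


def select_sample(record: ContentRecord, playable: List[str]) -> Optional[str]:
    """Return the sample in the first format (in the terminal's preference order)
    that the record provides, or None if no stored format is playable."""
    for fmt in playable:
        if fmt in record.samples:
            return record.samples[fmt]
    return None


content_db: Dict[str, ContentRecord] = {
    "C001": ContentRecord(
        "C001",
        "My name is Jimmy.",
        "Introduce yourself.",
        {"pcm_16bit_16khz": "c001.pcm", "compressed": "c001.amr"},
    )
}
```

Letting the terminal state its preference order keeps the server side independent of individual handset models.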
[0059]
As shown in FIG. 1, the mobile device 100 includes a voice call module 101, a data communication module 102, a virtual machine 103, and a web browser 105. Moreover, the mobile device 100 includes at least a display panel, operation keys, a microphone, and a speaker (not shown) as a mobile phone user interface. The virtual machine 103 presents content by executing the client program 104 received from the Web server 20. The web browser 105 displays content according to the HTML file received from the web server 20. Only one of the virtual machine 103 and the web browser 105 may be provided.
[0060]
In the present embodiment, the mobile device 100 is described as a mobile phone, but any terminal device that has a voice call function and an application operating environment can be used as the mobile device 100. The pronunciation practice support system 1 is particularly suitable when the mobile device 100 is a terminal device, such as a mobile phone, in which the processes of the voice call function and the data display/browsing function overlap.
[0061]
Subsequently, the operation of the pronunciation practice support system 1 will be described.
[0062]
First, the attendance selection process will be described with reference to FIG. 3. FIG. 3 is an explanatory diagram showing an example of the screens displayed in the attendance selection process.
[0063]
When the learner first accesses the pronunciation practice support system 1 from the mobile device 100, the mobile device 100 displays the screen W11 according to the HTML file received from the Web server unit 22. When the learner selects “English conversation classroom” on the screen W11, the screen W12 is displayed. Next, when the learner selects “business conversation course” on the screen W12, the screen W13 is displayed. Next, when the learner selects one of the lecturers on the screen W13, the screen W14 is displayed. On the screen W14, a hot spot “Let's listen to the live voice” for reproducing the voice of the instructor is displayed. When the learner selects a hot spot “select this course” on the screen W14, the attendance selection process ends.
[0064]
When the attendance selection process ends, the process proceeds to a learner data registration process. In the learner data registration process, a setting screen (not shown) is displayed on the mobile device 100 to allow the learner to input a telephone number 52, a nickname, an e-mail address, and the like. In the learner data registration process, the audio file 53 for authenticating the learner is registered.
[0065]
When the attendance selection process and the learner data registration process are completed, the set data is transmitted from the Web server unit 22 to the user management unit 43. At this time, the user management unit 43 issues a user ID 51 to the learner. The set data is stored in the user database 45 in association with the user ID 51.
[0066]
Next, a learner authentication process by utterance in the pronunciation practice support system 1 will be described with reference to FIGS. 4 to 6. FIG. 4 is a flowchart showing the flow of the learner authentication process by utterance. FIG. 5 is an explanatory diagram showing the registration process of the audio file 53 performed at the first access. FIG. 6 is an explanatory diagram showing a learner authentication process using the audio file 53 performed at the second and subsequent accesses.
[0067]
As shown in FIG. 4, at the first access the user authentication unit 42 collects the learner's utterance data pattern by having the learner read an English task sentence aloud. Specifically, as shown in FIG. 5, a short task sentence containing the learner's nickname, “My name is Jimmy”, is first displayed (screen W21). After the learner presses the “speak” button on screen W21 and reads the task sentence, the user management unit 43 stores the learner's voice data in the user database 45 as the audio file 53. When the audio file 53 has been saved, the learning content starts (screen W22).
[0068]
For example, a phrase of about five words in the language being learned is suitable as the task sentence for authentication; its content can be chosen as appropriate. Before finalizing the audio file 53, the user authentication unit 42 may also have the utterance rating engine 33 evaluate whether the voice data input by the learner is suitable as an utterance data pattern for authentication.
[0069]
Then, as shown in FIG. 4, at the second and subsequent accesses, the user authentication unit 42 first acquires the user ID 51 that the learner entered on the mobile device 100 (S11). Next, a task sentence for authentication is displayed on the mobile device 100 (screen W31), and the voice data of the learner reading it aloud is acquired (S12). The utterance rating engine 33 then collates this voice data, by data pattern matching, with the audio file 53 stored in the user database 45 in association with the user ID 51 (S13). If the collation succeeds (YES in S14), the user authentication unit 42 authenticates the learner (S15) and starts the learning content (screen W32). If the collation fails (NO in S14), the user authentication unit 42 does not authenticate the learner (S16) and displays the end screen W33.
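The authentication flow of FIG. 4 (S11 to S16) can be sketched as follows. The collation performed by the utterance rating engine 33 is replaced here by a hypothetical toy matcher (Euclidean distance between fixed-length feature vectors, with an assumed threshold of 1.0); the actual matching method is not specified at this point, and acquiring the voice data (S12) is left to the caller.

```python
import math
from typing import Callable, Mapping, Sequence

Features = Sequence[float]


def toy_match(probe: Features, enrolled: Features, threshold: float = 1.0) -> bool:
    """Hypothetical stand-in for collation by the utterance rating engine 33:
    accept when the two feature vectors are close in Euclidean distance."""
    if len(probe) != len(enrolled):
        return False
    return math.dist(probe, enrolled) <= threshold


def authenticate(user_id: str, probe: Features,
                 enrolled_db: Mapping[str, Features],
                 match: Callable[[Features, Features], bool] = toy_match) -> bool:
    """Return True for S15 (authenticated) and False for S16 (not authenticated).

    `probe` is the voice data already acquired from the learner (S12), reduced
    to features; `enrolled_db` maps user ID 51 to features of audio file 53.
    """
    enrolled = enrolled_db.get(user_id)   # S11: look up the entered user ID
    if enrolled is None:
        return False                      # unknown ID (handling is an assumption)
    return match(probe, enrolled)         # S13/S14: collate with the enrolled pattern


enrolled_db = {"u001": [0.0, 1.0, 2.0]}
```

Passing the matcher as a parameter keeps the S11 to S16 control flow separate from whatever collation method the engine actually uses.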
[0070]
The pronunciation practice support system 1 can thus authenticate the learner by utterance. Because the learner authenticates by uttering a task sentence in the foreign language being learned, the learner perceives authentication as part of the learning, and it does not feel unnatural.
[0071]
Next, the flow of the learning content will be described with reference to FIGS. 7 to 9. FIG. 7 is an explanatory diagram showing the basic flow of the learning content. FIG. 8 is an explanatory diagram showing the flow of a conversation event with a classmate that occurs within the learning content.
FIG. 9 is an explanatory diagram of an avatar (incarnation) displayed as learning content.
[0072]
In FIGS. 7, 8, and 10 (described later), the learner is “Jimmy (male)” and the classmate is “Rolly (female)”. The instructor, the learner, and classmates are each displayed as avatars (FIG. 9).
[0073]
As shown in FIG. 7, the learning content repeats, for each question, a cycle of (1) progress display (screen W41), (2) question sentence and option display (screen W42), (3) answer result display (screen W43), and (4) display of the points obtained (screen W44). Therefore, if a lesson consists of five questions, this cycle is repeated five times.
[0074]
Each screen will be described in detail. On the screen W41, a course name, a lesson name that can be learned, a next step name, and the number of points obtained so far are displayed.
[0075]
On the screen W42, question sentences and answer options are displayed. After the learner presses the “answer” button and reads a sentence of an option to be selected, the voice data is transmitted from the mobile device 100 to the pronunciation practice support system 1. Instead of selection by voice input, selection by a key, a button, or the like may be possible.
[0076]
Screen W43 displays the result of the utterance rating engine 33 rating the voice data input by the learner. Here, screen W43 shows that “Jimmy”, the learner, answered correctly, together with the points obtained.
[0077]
On the screen W44, the points obtained in this step are displayed. When the learner presses the “confirm” button on this screen, the screen shifts to the first screen (corresponding to W42) of the next question.
[0078]
The content editing unit 44 creates the contents of screens W41 to W44 based on the learner's answer, whether it is correct, the time taken to answer, and the like. For example, an avatar with an expression matching the current number of points is displayed. If few points are obtained in one step, the same step is repeated instead of proceeding to the next step, and an “avatar with a troubled expression” is displayed.
[0079]
Here, as shown on screen W43 in FIG. 7, the classmate answers the instructor's question together with the learner. Even if the learner answers correctly, fewer points are awarded if the answer comes later than the classmate's. The learner thus competes with the classmate to answer first, and can feel the atmosphere of an English conversation classroom.
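The point rules described for screens W41 to W44 can be illustrated with a small sketch. The point values (10 for a correct answer no later than the classmate's, 3 for a slower correct answer), the pass mark, and treating a tie as not slower are all assumptions chosen for illustration.

```python
def score_answer(correct: bool, answer_time: float, classmate_time: float,
                 full_points: int = 10, slow_points: int = 3) -> int:
    """Points for one question: nothing for a wrong answer, fewer points for a
    correct answer that came after the classmate's (point values hypothetical)."""
    if not correct:
        return 0
    return full_points if answer_time <= classmate_time else slow_points


def next_step(step: int, step_points: int, pass_mark: int) -> int:
    """Advance only when enough points were earned in the step; otherwise repeat it."""
    return step + 1 if step_points >= pass_mark else step


# One five-question step: (correct?, learner answer time, classmate answer time) in seconds.
answers = [(True, 2.0, 3.0), (True, 4.0, 3.0), (False, 1.0, 3.0),
           (True, 1.5, 2.0), (True, 2.5, 2.5)]
step_points = sum(score_answer(c, t, ct) for c, t, ct in answers)
```

Here the step earns 10 + 3 + 0 + 10 + 10 = 33 points, so with a hypothetical pass mark of 40 the learner would repeat the step.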
[0080]
This classmate is created virtually by the classmate adding unit 44a from the data of another learner selected from the user database 45 and added to the content. Specifically, the classmate adding unit 44a generates a classmate from profile data of another learner, such as sex, nickname, and correct answer rate. The classmate's correct answer rate and answer time are then adjusted to the learner's correct answer rate and accuracy parameter 57; that is, when the learner's level is high, the classmate answers correctly in a short time. One or more classmates may appear.
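One possible way to generate a classmate from another learner's profile and tune it to the current learner is sketched below. The tuning rules (copying the learner's correct answer rate, and dividing an assumed base answer time of 8 seconds by the accuracy parameter 57) are hypothetical; the specification says only that the classmate's correct answer rate and answer time are adjusted to the learner.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class Classmate:
    nickname: str         # taken from another learner's profile
    sex: str              # taken from another learner's profile
    correct_rate: float   # probability that the simulated classmate answers correctly
    answer_time: float    # seconds the simulated classmate takes to answer


def make_classmate(profile: Dict[str, str], learner_rate: float, accuracy: int,
                   base_time: float = 8.0) -> Classmate:
    """Build a classmate from another learner's profile, tuned to this learner.

    Hypothetical rules: the correct rate tracks the learner's (clamped to [0, 1]),
    and the answer time shrinks as accuracy parameter 57 (1..5) rises.
    """
    rate = min(1.0, max(0.0, learner_rate))
    return Classmate(profile["nickname"], profile["sex"], rate, base_time / accuracy)


def classmate_answers_correctly(cm: Classmate, rng: random.Random) -> bool:
    # random() returns a value in [0, 1), so rate 1.0 always succeeds and 0.0 never does.
    return rng.random() < cm.correct_rate


rolly = make_classmate({"nickname": "Rolly", "sex": "female"}, learner_rate=0.8, accuracy=4)
```

Passing an explicit `random.Random` makes the classmate's behavior reproducible in tests.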
[0081]
Alternatively, the learners' answers and answer times may be recorded in the user database 45, and the classmate adding unit 44a may present a classmate that reproduces, based on that data, the answers another learner actually gave in the past. This makes the English conversation class even more realistic.
[0082]
Next, a conversation event with a classmate that occurs within the learning content will be described with reference to FIG. 8. A conversation event is an event in which a classmate suddenly speaks to the learner partway through the learning content. Conversation events may be built into the learning content in advance or generated randomly by the content editing unit 44. Another learner who is using the learning service at that moment may also be selected as the classmate who appears.
[0083]
For example, after the learner has set the English conversation level on screen W51, the display suddenly switches to screen W52 and the classmate speaks to the learner. When the learner clicks the instructor's avatar on screen W52, advice from the instructor is displayed (screen W53). Screen W53 shows a response sentence to say to the classmate: pressing the “Speak” button lets the learner input the response utterance, and pressing the “Listen” button plays the instructor reading the sentence. The instructor's voice can be played repeatedly.
[0084]
Then, if the utterance rating engine 33 rates the learner's voice data as appropriate, screen W54, in which the classmate responds, is displayed. Screen W54 also shows “Friendship Degree” points awarded according to the learner's response. When the friendship degree reaches or exceeds a predetermined value, an additional service may be provided, such as allowing messages to be exchanged with the other learner who served as the model for the classmate. After the conversation event ends, the original screen W51 is displayed again.
[0085]
By generating such a conversation event, the atmosphere of the English conversation classroom can be made more realistic, and learning can be prevented from becoming monotonous.
[0086]
Next, the process of changing the strictness of the rating by the utterance rating engine 33 will be described with reference to FIG. 10. FIG. 10 is an explanatory diagram showing the process of changing the strictness of the rating by the utterance rating engine 33.
[0087]
The utterance rating engine 33 can adjust the strictness of the rating based on the accuracy parameter 57 stored in the user database 45. The accuracy parameter 57 is automatically changed by the accuracy parameter changing unit 43a based on voice data input by the learner according to the learning content.
[0088]
Specifically, voice data that the learner inputs on the mobile device 100 in response to the learning content (screen W61) is passed to the utterance rating engine 33 via the line call control device 10 ((1), (2): voice data acquisition step). At this time, the utterance rating engine 33 queries the user management unit 43 for the learner's accuracy parameter 57 ((3)) and acquires it ((4)).
[0089]
Next, the utterance rating engine 33 rates the appropriateness of the pronunciation included in the voice data with a strictness corresponding to the accuracy parameter 57 (pronunciation rating step) and transmits the rating result to the user management unit 43 and the content editing unit 44 ((5)). The content editing unit 44 then generates content reflecting the rating result (content editing step) and transmits this content (screen W62) to the mobile device 100 via the Web server 20 ((6): content presentation step).
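The pronunciation rating step can be pictured as a threshold test whose strictness is set by the accuracy parameter 57. This sketch assumes that the utterance rating engine 33 produces a similarity score between 0 and 1; both that scale and the threshold table below are hypothetical.

```python
from typing import Dict

# Hypothetical mapping from accuracy parameter 57 to the minimum similarity score
# (0.0 to 1.0) an utterance needs to be rated acceptable; higher value = stricter.
THRESHOLDS: Dict[int, float] = {1: 0.50, 2: 0.60, 3: 0.70, 4: 0.80, 5: 0.90}


def rate_pronunciation(similarity: float, accuracy: int) -> bool:
    """Accept the utterance if its similarity reaches the threshold selected
    by the learner's accuracy parameter 57."""
    return similarity >= THRESHOLDS[accuracy]
```

Under these assumptions the same utterance with similarity 0.75 passes for a learner at level 3 but fails for a learner at level 4.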
[0090]
Meanwhile, the user management unit 43 records the rating result in the log 56 (FIG. 2A), and the accuracy parameter changing unit 43a changes the accuracy parameter 57 based on the recorded results (for example, when the number of incorrect or correct answers reaches a predetermined count). The user management unit 43 stores the log 56 and the accuracy parameter 57 in the user database 45 in association with the learner's user ID 51. In FIG. 10 the accuracy parameter 57 has five stages, but any number of stages can be chosen.
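A minimal sketch of the accuracy parameter changing unit 43a, assuming one concrete reading of the example rule: after an assumed three correct (or three incorrect) ratings since the last change, the accuracy parameter 57 is raised (or lowered) by one stage, within the five stages of FIG. 10. The counter fields and the count of three are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccuracyState:
    level: int = 3       # accuracy parameter 57 (starting value is an assumption)
    correct: int = 0     # correct ratings since the last change
    incorrect: int = 0   # incorrect ratings since the last change


def update_accuracy(state: AccuracyState, passed: bool, n: int = 3,
                    lo: int = 1, hi: int = 5) -> AccuracyState:
    """Record one rating result and change the level once a count reaches n."""
    if passed:
        state.correct += 1
    else:
        state.incorrect += 1
    if state.correct >= n:
        state.level = min(hi, state.level + 1)   # doing well: rate more strictly
        state.correct = state.incorrect = 0
    elif state.incorrect >= n:
        state.level = max(lo, state.level - 1)   # struggling: rate more leniently
        state.correct = state.incorrect = 0
    return state
```

Resetting both counters after each change means the learner must show a new run of results at the new level before it moves again.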
[0091]
In this way, the pronunciation practice support system 1 can automatically change the strictness of the rating while the learning content is in progress, according to the learner's speaking ability and proficiency. The learner is neither required to set the strictness of the rating nor made aware of it. Therefore, as at an English conversation school, an efficient learning service can be provided that adapts the teaching material flexibly and comfortably to the learner's state.
[0092]
The utterance rating engine 33 is described in detail below. The technique it implements was proposed by the inventors of the present invention (Non-Patent Document 4).
[0093]
One reason English is difficult for Japanese speakers to learn is that Japanese and English use prosody differently. A learner of English (hereinafter "learner") can acquire the prosody of a native English speaker (hereinafter "native speaker") by imitating the native speaker's prosody when speaking. For a computer to support this kind of learning and evaluate the similarity automatically, the two utterances being compared must first be aligned with each other appropriately.
[0094]
The utterance rating engine 33 aligns the two utterances by automatically labeling the learner's speech with both English phonemes and Japanese phonemes. This means that segments the learner pronounces in a Japanese manner can still be labeled. For example, the labeling takes typical Japanese-speaker error patterns into account by allowing the "th" in "the" to be transcribed not only as the English phoneme /th/ but also as the Japanese phoneme /z/. For the Japanese phoneme models, the 43 phonemes provided by the "Development of Japanese dictation basic software" project (http://winnie.kuis.kyoto-u.ac.jp/diction/) can be used. For the English phoneme models, for example, 46 phonemes created with HTK (The HTK Book (Version 2.1)) can be used.
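The idea of allowing Japanese-phoneme realizations alongside English ones can be illustrated by expanding a word sequence into every permitted phoneme sequence. A forced aligner would then pick the best-scoring path among them. This Python sketch is hypothetical: the lexicon entries, the "E:"/"J:" prefixes, and the phoneme symbols are illustrative only. They are not taken from the 43-phoneme Japanese or 46-phoneme English model sets.

```python
from itertools import product

# Hypothetical mini-lexicon. "E:" marks English phonemes and "J:" Japanese ones,
# so each word may be realized either in an English or in a Japanese manner.
LEXICON = {
    "the": [["E:th", "E:ah"], ["J:z", "J:a"]],
    "apple": [["E:ae", "E:p", "E:ah", "E:l"],
              ["J:a", "J:q", "J:p", "J:u", "J:r", "J:u"]],
}


def pronunciation_variants(words):
    """Return every phoneme sequence the lexicon allows for `words`."""
    per_word = [LEXICON[w] for w in words]
    # product() picks one variant per word; sum(..., []) concatenates the lists.
    return [sum(choice, []) for choice in product(*per_word)]
```

A real recognizer would encode these alternatives as a pronunciation network rather than enumerating them, because the number of variants grows multiplicatively with sentence length.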
[0095]
FIG. 11 is an explanatory diagram showing an example of alignment by automatic labeling with Japanese and English phonemes in the utterance rating engine 33. As shown in FIG. 11, manual labeling was first used to determine which segment (frames) of the learner's speech corresponds to each part of the native speaker's speech. The utterance rating engine 33 then determined the same correspondence automatically. Measured against the manual reference, 9.40% of frames were displaced by 100 ms or more. This deviation is far smaller than that of conventional alignment methods. Automatic labeling with English and Japanese phonemes therefore aligns two utterances accurately.
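The 9.40% figure is the share of frames whose automatic correspondence is off by at least 100 ms from the manual reference. That metric can be sketched in Python as follows. Representing each frame's correspondence as a learner-speech time in milliseconds is an assumption, and the numbers in the usage example are toy values, not the experimental data.

```python
def misalignment_rate(manual_ms, auto_ms, tolerance_ms=100):
    """Percentage of native-speech frames whose automatically assigned
    learner-speech position deviates from the manual reference by
    tolerance_ms or more."""
    if len(manual_ms) != len(auto_ms):
        raise ValueError("frame lists must have the same length")
    if not manual_ms:
        raise ValueError("no frames to compare")
    off = sum(1 for m, a in zip(manual_ms, auto_ms) if abs(a - m) >= tolerance_ms)
    return 100.0 * off / len(manual_ms)
```

For example, misalignment_rate([0, 10, 20, 30, 40], [0, 10, 150, 30, 40]) gives 20.0, because one of the five frames is off by 130 ms.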
[0096]
Conventional utterance rating engines align native-speaker and learner utterances at the phoneme level and rate them on that basis. For the alignment they use (1) the fundamental frequency pattern, (2) spectral information, or (3) automatic labeling with English phonemes only. With these methods, however, words within a phrase are frequently misaligned. For example, when a learner says "an apple" where "apple" is expected, a conventional engine fairly often aligns the whole of "an apple" with "apple" and cannot rate it correctly. The system design must then either treat "an apple" spoken in response to "apple" as an unacceptable (exception) case, or report it to the user as an incorrect utterance.
[0097]
In contrast, the utterance rating engine 33 aligns the utterances by labeling with both Japanese and English phonemes, as described above. This markedly reduces misalignment compared with the conventional methods. Specifically, even if the user says "an apple" in response to "apple", "an" is correctly aligned as "an" and "apple" as "apple". The pronunciation practice support system 1 using the utterance rating engine 33 can therefore tell the learner that the utterance of "apple" was correct.
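The practical effect described here is that the extra word "an" no longer drags "apple" out of alignment. This can be illustrated at the word level with an ordinary edit-distance (Levenshtein) alignment. Note that this is a swapped-in technique used for illustration only. The utterance rating engine 33 works by phoneme-level automatic labeling, not by word-string alignment, and align_words is a hypothetical name.

```python
def align_words(expected, recognized):
    """Edit-distance alignment of two word lists.

    Returns (expected_word, recognized_word) pairs in order; None marks a word
    that was inserted (only in recognized) or deleted (only in expected).
    """
    n, m = len(expected), len(recognized)
    # cost[i][j]: minimal edits aligning expected[:i] with recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if expected[i - 1] == recognized[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # deletion
                             cost[i][j - 1] + 1)        # insertion
    # Trace back from the bottom-right corner to recover the pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j] ==
                cost[i - 1][j - 1] + (expected[i - 1] != recognized[j - 1])):
            pairs.append((expected[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and cost[i][j] == cost[i][j - 1] + 1:
            pairs.append((None, recognized[j - 1]))
            j -= 1
        else:
            pairs.append((expected[i - 1], None))
            i -= 1
    return pairs[::-1]
```

Calling align_words(["apple"], ["an", "apple"]) returns [(None, "an"), ("apple", "apple")]. The word "an" is identified as an insertion, and "apple" is compared with "apple", so it can be rated as correct.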
[0098]
Suppose the mobile device 100 is a terminal such as a mobile phone, in which the voice call function and the data display/browsing function (Web browser, etc.) run as separate processes. In that case, the service flow from voice utterance to result notification should ideally be completed in a single pass. With a conventional utterance rating engine as described above, "an apple" spoken where "apple" is expected is judged to be a rating error or an inappropriate utterance. The learner is then forced to repeat the flow. In learning services such as English conversation, this seriously harms learning efficiency and usability and is unacceptable in terms of quality. The problem does not arise on a personal computer or similar device, where both functions are integrated in one interface.
[0099]
As described above, the utterance rating engine 33 can align words and phrases appropriately by labeling with both English and Japanese phonemes, and can therefore rate utterances correctly. The pronunciation practice support system 1 can thus provide its service without degrading learning efficiency or usability. This holds even when the mobile device 100 is a terminal such as a mobile phone, in which the voice call function and the data display/browsing function run as separate processes.
[0100]
Further, the utterance rating engine 33 rates the accuracy of the learner's utterance from a degree of pattern matching (matching degree). This compares the learner's utterance data with the native speaker's reference data for the utterance being rated. Utterance quality can thereby be graded, for example as a score distribution.
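The patent does not specify how the matching degree is computed. As one possible illustration, the Python sketch below compares a learner's feature sequence (for example, a fundamental-frequency contour) with the native speaker's using dynamic time warping (DTW). DTW is a standard technique substituted here for illustration. The sketch then maps the normalized distance onto a 0-100 score; that mapping and its scale parameter are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def matching_score(learner, native, scale=10.0):
    """Hypothetical mapping of DTW distance to a 0-100 matching degree.

    The distance is normalized by the combined sequence length; `scale`
    controls how quickly the score falls and is an assumed value.
    """
    if not learner or not native:
        raise ValueError("both sequences must be non-empty")
    avg = dtw_distance(learner, native) / (len(learner) + len(native))
    return 100.0 / (1.0 + avg / scale)
```

Identical or merely time-stretched contours score 100.0, and the score falls as the contours diverge.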
[0101]
For example, suppose the matching degree is expressed as 0 to 100 points. Accepting utterances that score 80 points or more is then naturally a stricter evaluation than accepting those that score 60 points or more. Utterance quality depends heavily on each learner's ability, so fixing the acceptance threshold statically ignores the obvious individual differences among the learners who use the service.
[0102]
The pronunciation practice support system 1 therefore dynamically collects the tendency of the score distribution produced by pattern matching, drawing on sources such as the learner's past history. At each rating it assesses the learner's ability from this data and raises or lowers the score at which an utterance is accepted. This makes it possible to switch the strictness of the rating.
[0103]
Here, the accuracy parameter specifies the matching-rate level at or above which an utterance is accepted. It can be set, for example, in five levels, from level 5 (strictest evaluation) to level 1 (most lenient evaluation).
Suppose the acceptance criterion at service start is set by default to level 5 or higher. If the learner's subsequent rating results are concentrated at levels 3 to 4, the accuracy parameter changing unit 43a changes the acceptance criterion to level 3 or higher, or to level 4 or higher. Utterances can thereby be rated according to the learner's current speaking tendency. The distribution of each learner's rating results is managed in that learner's own log 56 (FIG. 2(a)).
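The level adjustment described above could be sketched as follows in Python, under several assumptions. Scores are 0-100 matching degrees, and each level corresponds to a hypothetical 20-point band. The acceptance level is set to the band where the learner's most recent scores (a hypothetical window of 20) are most concentrated. Ties go to the more lenient level.

```python
from collections import Counter


def score_to_level(score):
    """Hypothetical banding of a 0-100 matching score into levels 1..5."""
    return max(1, min(5, int(score // 20) + 1))


def acceptance_level(score_log, window=20):
    """Return the level where the learner's recent scores are concentrated.

    Falls back to level 5 (assumed service-start default) when the log is
    empty; ties between equally frequent levels go to the lower level.
    """
    recent = score_log[-window:]
    if not recent:
        return 5
    counts = Counter(score_to_level(s) for s in recent)
    return max(counts, key=lambda lv: (counts[lv], -lv))


def is_accepted(score, level):
    """An utterance is accepted if its score band reaches the acceptance level."""
    return score_to_level(score) >= level
```

Take a log such as [62, 71, 55, 68, 48, 75, 66]. Five scores fall in the level-4 band and two in the level-3 band, so the acceptance level moves from the default 5 to 4.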
[0104]
As described above, the pronunciation practice support system 1 can provide a learning service in a mobile-phone environment. That environment is highly portable, always connected to the network, widely adopted, and familiar to most users. Because the learning can also be customized for each learner, a learning service matched to the learner's ability can be provided efficiently. The result is an easy-to-use learning service that can be expected to produce strong learning effects and to be continued over time.
[0105]
In the pronunciation practice support system 1, the utterance rating engine objectively rates pronunciation and prosody and links the rating result to the content, so the learner practices "speaking English." Because conversations frequently used in real situations can be extracted, the learner also practices "listening to English" and "reading English." Solving the exercises, in turn, serves as simulated practice in "writing English." Moreover, using avatars for the instructor and classmates can improve the learning effect.
[0106]
With the pronunciation practice support system 1, the learner can (1) listen to native speech, understand the situation, and learn the utterance; (2) speak into the mobile phone while imagining a real scene; (3) have the utterance rated; and (4) check the results of speaking immediately on the mobile phone screen.
[0107]
The pronunciation practice support system 1 can realize a learning service based on the following service concepts. The first is a "communicative approach": highly practical learning content that realistically simulates English conversation and always assumes real communication. The second is an "edutainment approach," in which the learner experiences useful English conversation while enjoying various communication events with the instructor and classmates. The third is a "task-based approach," in which learning of basic elements is integrated in a balanced way into conversational tasks.
[0108]
Each device constituting the pronunciation practice support system 1 (the line call control device 10, the Web server 20, the utterance rating server 30, and the database server 40) can be built on a general-purpose computer such as a workstation. The mobile device 100 can likewise be built on a general-purpose computing device such as a mobile phone or a PDA (Personal Digital Assistant).
[0109]
That is, each device constituting the pronunciation practice support system 1, and the mobile device 100, comprises the following components connected by an internal bus:
- a CPU (central processing unit) that executes the instructions of the programs implementing each function;
- a ROM (read-only memory) storing boot logic;
- a RAM (random-access memory) into which the programs are loaded;
- storage devices (recording media) such as hard disks that store the programs and various databases;
- input devices such as a keyboard and mouse;
- output devices such as a monitor, speakers, and a printer; and
- a network connection device for connecting to an external network.
[0110]
To present the content obtained from the pronunciation practice support system 1, the mobile device 100 needs only a standard Internet browsing function and the ability to connect to the Web server 20 via the network.
[0111]
FIG. 12 is an explanatory diagram showing a configuration of the mobile device 100 that includes a virtual machine 103. The virtual machine 103 is an application execution environment that runs a client program 104 (for example, a Java (registered trademark) program) obtained from the Web server 20. Content can be presented by running the client program 104 on the virtual machine 103. Because the client program 104 is itself a program, event- and input-driven processing can then be executed on the mobile device 100 itself. This allows a variety of behaviors suited to the content to be implemented.
[0112]
FIG. 13 is an explanatory diagram showing a configuration of the mobile device 100 that includes a Web browser 105. The Web browser 105 is a kind of native application. It displays, on the screen of the mobile device 100, documents and data structured in a markup language such as HTML that are obtained from the Web server 20. Content can also be presented by the Web browser 105 from HTML files and the like. Since most mobile devices have a Web browser installed, the pronunciation practice support system 1 can then be used from many mobile devices.
[0113]
Finally, the object of the present invention can also be achieved as follows. A system or apparatus is supplied with a recording medium on which the program code (executable program, intermediate code program, or source program) of the pronunciation practice support program, the software that realizes the functions described above, is recorded in computer-readable form. The computer (or CPU, MPU, DSP) of the system or apparatus then reads and executes the program code recorded on the recording medium.
[0114]
Specifically, each functional block of the line call control device 10, the Web server 20, the utterance rating server 30, and the database server 40 is realized by the microprocessor in that device executing a predetermined program stored in a memory (not shown).
[0115]
The recording medium for supplying the program code may be separable from the system or apparatus, or it may be a medium fixed in place so that it can supply the program code. It may be mounted in the system or apparatus so that the computer can read the recorded program code directly. Alternatively, it may be mounted so that the program code is read via a program reading device connected to the system or apparatus as an external storage device.
[0116]
Examples of the recording medium include:
- tapes such as magnetic tape and cassette tape;
- disks, including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical discs such as CD-ROM, MO, MD, DVD, and CD-R;
- cards such as IC cards (including memory cards) and optical cards; and
- semiconductor memories such as mask ROM, EPROM, EEPROM, and flash ROM.
[0117]
The program code may be recorded so that the computer reads it from the recording medium and executes it directly. It may also be recorded so that it is first transferred from the recording medium to a program storage area in main memory, from which the computer then reads and executes it.
[0118]
Furthermore, the system or apparatus may be configured to be connectable to a communication network, and the program code may be supplied via that network. The communication network is not particularly limited. Specifically, the Internet, an intranet, an extranet, a LAN, ISDN, a VAN, a CATV communication network, a virtual private network, a telephone line network, a mobile communication network, a satellite communication network, and the like can be used. The transmission medium constituting the communication network is likewise not particularly limited. Wired media such as IEEE 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL lines can be used. Wireless media can also be used, such as infrared (IrDA or remote control), Bluetooth, 802.11 wireless, HDR, mobile phone networks, satellite links, and terrestrial digital networks. The present invention can also be realized in the form of a carrier wave or data signal sequence in which the program code is embodied by electronic transmission.
[0119]
The functions described above are realized not only when the computer executes the program code it has read. They are also realized when an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program code.
[0120]
Furthermore, the functions described above are also realized in the following way. The program code read from the recording medium is written into memory on a function expansion board inserted into the computer, or in a function expansion unit connected to the computer. A CPU or the like on that board or unit then performs part or all of the actual processing based on the instructions of the program code.
[0121]
【Effects of the Invention】
As described above, the pronunciation practice support system of the present invention is a pronunciation practice support system communicably connected to a learner's terminal device. It comprises: voice data acquisition means for acquiring voice data input by the learner from the terminal device; pronunciation rating means for rating the learner's pronunciation contained in the voice data acquired by the voice data acquisition means; content editing means for editing content according to the rating result of the pronunciation rating means; and content presenting means for presenting the content edited by the content editing means on the terminal device.
[0122]
The pronunciation practice support method of the present invention is performed by a pronunciation practice support system communicably connected to a learner's terminal device. It comprises: a voice data acquisition step of acquiring voice data input by the learner from the terminal device; a pronunciation rating step of rating the learner's pronunciation contained in the voice data acquired in the voice data acquisition step; a content editing step of editing content according to the rating result of the pronunciation rating step; and a content presentation step of presenting the content edited in the content editing step on the terminal device.
[0123]
Pronunciation practice content for English conversation and the like can therefore be delivered in a mobile environment such as a mobile phone. This meets the needs of learners who want an easy way to practice "speaking."
[0124]
Further, since the content to be presented on the terminal device is transmitted from the pronunciation practice support system each time, the content can be easily changed as compared with a stand-alone device such as a personal computer.
[0125]
Moreover, the content presented on the terminal device is edited according to the rating result of the pronunciation rating means. Content suited to the learner's situation, such as learning progress and proficiency, can therefore be presented. The result is a learning service that dynamically reflects each learner's situation in the learning content.
[0126]
Furthermore, in the pronunciation practice support system of the present invention, the pronunciation rating means can change the strictness of the rating according to an accuracy parameter. The system further comprises accuracy parameter changing means, which changes the accuracy parameter according to the history of rating results of the pronunciation rating means. It also comprises accuracy parameter holding means, which holds, for each learner, the accuracy parameter set by the accuracy parameter changing means.
[0127]
The strictness of the rating can therefore be changed efficiently according to the learner's situation, such as learning progress and proficiency. The system automatically reflects each learner's individual situation in the learning content. As in an English conversation school, the service can thus customize the learning content flexibly to each learner.
[0128]
Furthermore, the pronunciation practice support system of the present invention is configured to include learner authentication means for authenticating a learner based on the voice data acquired by the voice data acquisition means.
[0129]
The learner can therefore use the service of the pronunciation practice support system without any reluctance and without being made aware of being authenticated.
[0130]
The pronunciation practice support system of the present invention further comprises learner data storage means for storing learner data for each learner. The content editing means includes classmate adding means, which causes characters imitating other learners to appear in the content based on the data stored in the learner data storage means.
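This section does not specify how the classmate adding means 44a selects other learners. The following is a minimal Python sketch. It assumes that each learner record holds an accuracy level and that classmates are the learners whose level is closest to the current learner's. The Learner record and pick_classmates are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    """Hypothetical learner record as it might be read from the user database."""
    user_id: str
    nickname: str
    level: int  # accuracy parameter, 1..5


def pick_classmates(me, learners, k=2):
    """Choose up to k other learners whose accuracy level is closest to me.

    Ties in level distance are broken by user_id so the result is stable.
    """
    others = [other for other in learners if other.user_id != me.user_id]
    others.sort(key=lambda other: (abs(other.level - me.level), other.user_id))
    return others[:k]
```

Selecting by similar level is only one plausible design. It would keep the characters in the content at roughly the learner's own proficiency, in keeping with the classroom atmosphere the text describes.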
[0131]
Other learners can therefore appear in a learner's content, which gives the content the atmosphere of a classroom. In addition, because the other learners appearing in the content are based on actual learner data, the learner can feel a sense of reality.
[0132]
Furthermore, in the pronunciation practice support system of the present invention, the pronunciation rating means performs alignment by labeling with both phonemes of the learner's native language and phonemes of the foreign language being learned.
[0133]
The pronunciation practice support system can therefore provide its service without lowering learning efficiency or usability. This holds even when the terminal device is, for example, a mobile phone in which the voice call function and the data display/browsing function run as separate processes.
[0134]
The pronunciation practice support program of the present invention is a computer program that causes a computer to function as each of the means described above.
[0135]
Therefore, it is possible to realize the pronunciation practice support system by realizing each means of the pronunciation practice support system with a computer.
[0136]
The computer-readable recording medium of the present invention records the pronunciation practice support program. That program operates the pronunciation practice support system by causing a computer to implement each of the means described above.
[0137]
Therefore, the pronunciation practice support system can be realized on the computer by the pronunciation practice support program read from the recording medium.
[Brief description of the drawings]
FIG. 1 is a functional block diagram outlining the configuration of a pronunciation practice support system according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the structure of data used by the pronunciation practice support system shown in FIG. 1; FIG. 2(a) shows an example of the structure of data stored in the user database, and FIG. 2(b) shows an example of the structure of data stored in the content database.
FIG. 3 is an explanatory diagram showing an example of a screen displayed during the attendance selection process on the mobile device shown in FIG. 1.
FIG. 4 is a flowchart showing the flow of user authentication by utterance in the pronunciation practice support system shown in FIG. 1.
FIG. 5 is an explanatory diagram showing the audio file registration process performed at the first access in the pronunciation practice support system shown in FIG. 1.
FIG. 6 is an explanatory diagram showing the learner authentication process using an audio file, performed at the second and subsequent accesses in the pronunciation practice support system shown in FIG. 1.
FIG. 7 is an explanatory diagram showing the basic flow of learning content presented on the mobile device shown in FIG. 1.
FIG. 8 is an explanatory diagram showing the flow of a conversation event with classmates that occurs in learning content presented on the mobile device shown in FIG. 1.
FIG. 9 is an explanatory diagram of an avatar displayed on the mobile device shown in FIG. 1.
FIG. 10 is an explanatory diagram showing the process by which the utterance rating engine changes the strictness of the rating in the pronunciation practice support system shown in FIG. 1.
FIG. 11 is an explanatory diagram showing an example of alignment by automatic labeling with Japanese and English phonemes in the utterance rating engine of the pronunciation practice support system shown in FIG. 1.
FIG. 12 is an explanatory diagram showing a configuration of the mobile device shown in FIG. 1 that includes a virtual machine.
FIG. 13 is an explanatory diagram showing a configuration of the mobile device shown in FIG. 1 that includes a Web browser.
[Explanation of symbols]
1 Pronunciation practice support system
10 Line call control device (voice data acquisition means)
20 Web server (content presentation means)
33 Utterance rating engine (pronunciation rating means)
42 User authentication unit (learner authentication means)
43a Accuracy parameter changing unit (Accuracy parameter changing means)
44 Content editing section (content editing means)
44a Classmate addition part (Classmate addition means)
45 User database (learner data storage means, accuracy parameter holding means)
57 Accuracy parameters
100 Mobile equipment (terminal equipment)

Claims

A pronunciation practice support system communicably connected to a learner's terminal device,
Voice data acquisition means for acquiring voice data input by the learner from the terminal device; pronunciation rating means for rating the pronunciation of the learner included in the voice data acquired by the voice data acquisition means;
Content editing means for editing content according to the rating result by the pronunciation rating means;
A pronunciation practice support system comprising: content presenting means for presenting the content edited by the content editing means on the terminal device.

The pronunciation rating means can change the strictness of the rating according to the accuracy parameter, and
Accuracy parameter changing means for changing the accuracy parameter according to the history of the rating results by the pronunciation rating means;
The pronunciation practice support system according to claim 1, further comprising accuracy parameter holding means for holding the accuracy parameter set by the accuracy parameter changing means for each learner.

The pronunciation practice support system according to claim 1, further comprising learner authentication means for authenticating a learner based on the voice data acquired by the voice data acquisition means.

The pronunciation practice support system according to any one of claims 1 to 3, further comprising learner data storage means for storing learner data for each learner, wherein the content editing means includes classmate adding means for causing a character imitating another learner to appear in the content based on the data stored in the learner data storage means.

The pronunciation practice support system according to claim 1, wherein the pronunciation rating means performs alignment by labeling with both phonemes of the learner's native language and phonemes of the foreign language being learned.

A pronunciation practice support method by a pronunciation practice support system that is communicably connected to a learner's terminal device,
An audio data acquisition step of acquiring audio data input by the learner from the terminal device;
A pronunciation rating step for rating the pronunciation of the learner included in the voice data acquired in the voice data acquisition step;
A content editing step for editing the content according to the rating result in the pronunciation rating step;
A pronunciation practice support method comprising: a content presentation step of presenting the content edited in the content editing step on the terminal device.

A pronunciation practice support program for operating the pronunciation practice support system according to any one of claims 1 to 5, wherein the pronunciation practice support program causes a computer to function as each of the above means.

A computer-readable recording medium on which the pronunciation practice support program according to claim 7 is recorded.