JP2004347906A - Educational system and speech input/output apparatus

Info

Publication number: JP2004347906A
Application number: JP2003145718A
Authority: JP
Inventors: Toshiki Kanemichi (金道 敏樹); Hitoshi Hayashi (林 仁)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-05-23
Filing date: 2003-05-23
Publication date: 2004-12-09

Abstract

PROBLEM TO BE SOLVED: In an educational system that uses a video conference system, a user, particularly a child, cannot tolerate speech from a monitor screen or a speaker for a long time, or cannot concentrate on it.

SOLUTION: The educational system has first and second sound input/output apparatuses and supports education by transmitting and receiving sound information bidirectionally between them. When sound information input at at least one of the apparatuses is output at the other apparatus, the system converts the sound. Converting the output sound provides an environment that the user can tolerate for a long time and concentrate in, so the user can learn with concentration.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、音声入出力装置等を有して、双方向に音声情報を送受信することにより教育を支援する教育システム等に関する。
【０００２】
【従来の技術】
従来から、テレビ会議システムを用いた、教育システムがある。本テレビ会議システムは、正当な受講資格を有する者であることを認証する認証機能、複数の生徒同士を分けるグループ分け機能等の教育に必要な機能を備えたシステムである（特許文献１参照）。例えば、本テレビ会議システムを用いて、英会話教育を行う場合、本システムは、上記のグループ分け機能により、同じレベルの生徒同士を同一のグループに構成し、グループ内の生徒同士が会話の練習を行えるようにする。すなわち、英会話力に差がある生徒同士が練習を行わせるよりも、同等の英会話力を有する生徒同士に会話練習を行わせて、効率的に英会話力を向上させることを提供する。また、本システムは、正当な受講資格を有する者であるか否かを判断でき、不正な者が受講できないようにすることを提供する。
【０００３】
【特許文献１】
特開２００１−２２４０００号公報（第１頁、第１図等）
【０００４】
【発明が解決しようとする課題】
しかしながら、テレビ会議システムを用いた教育システムは、以下の潜在的な問題を有する。利用者（特に、子供）は、モニター画面またはスピーカーの音声に対して、長時間耐えられない、または集中できない等の問題がある。なぜならば、利用者は、モニター画面の向こうにいる人（先生）からでなく、感情のない機械から教わっているように感じ、結果、授業がつまらなくなり、集中力が低下する。また、直接、人間同士が接する授業のように、緊張感が生じにくいため、利用者は、授業に集中できず、学習効果が上がらない。
【０００５】
【課題を解決するための手段】
以上の課題を解決するために、本発明は、第一音声入出力装置と第二音声入出力装置を有し、二つの音声入出力装置により双方向に音声情報を送受信することにより教育を行うことを支援する教育システムであって、少なくとも一方の音声入出力装置から入力された音声情報を他方の音声入出力装置に出力する際に、音声変換を行う教育システムである。かかるシステムは、出力される音声を変換し、利用者が長時間耐えられる、または集中できる環境を提供する。よって、利用者は、集中して学習できる。
【０００６】
また、他の本発明は、上記の音声変換された音声がキャラクタの音声である教育システムである。かかるシステムにより、入力された音声が、他方で出力される際に、キャラクタの音声で出力される。例えば、キャラクタの音声が漫画の主人公の声である場合に、利用者（特に、子供）は、集中して、楽しく学習できる。
【０００７】
また、他の本発明は、上記の音声変換された音声がユーザにより指定された音声である教育システムである。かかるシステムにより、入力された音声が、他方で出力される際に、ユーザが指定した音声で出力される。例えば、ユーザが指定した音声が利用者の憧れの俳優の声である場合に、利用者は、集中して、楽しく学習できる。
【０００８】
【発明の実施の形態】
【０００９】
以下に、本発明の実施の形態について、図面を用いて詳細に説明する。なお、本実施の形態において、同一の符号を用いた構成要素やフローチャートのステップなどは、同じ機能を果たすので、一度説明したものについて説明を省略する場合がある。
【００１０】
（実施の形態１）
図１は、本実施の形態における教育システムの構成を示すブロック図である。教育システムは、第一端末１０１と第二端末１０２を有する。第一端末１０１は、第一音声入出力装置１０１１、第一映像入出力装置１０１２、第一送受信部１０１３を有する。第一音声入出力装置１０１１は、第一音声入力部１０１１１、第一音声出力部１０１１２、音声変換部１０１１３を有する。また、第一映像入出力装置１０１２は、第一映像入力部１０１２１、第一映像出力部１０１２２を有する。第二端末１０２は、第二音声入出力装置１０２１、第二映像入出力装置１０２２、第二送受信部１０２３を有する。第二音声入出力装置１０２１は、第二音声入力部１０２１１、第二音声出力部１０２１２を有する。また第二映像入出力装置１０２２は、第二映像入力部１０２２１、第二映像出力部１０２２２を有する。
【００１１】
第一音声入力部１０１１１は、第一ユーザの音声の入力を受け付ける。第一音声入力部１０１１１は、通常、マイクロホンおよびそのドライバーソフト等により実現され得る。
【００１２】
第一音声出力部１０１１２は、第二ユーザの音声情報を出力する。第一音声出力部１０１１２は、通常、スピーカーおよびそのドライバーソフト等により実現され得る。
【００１３】
音声変換部１０１１３は、第一送受信部１０１３が受信した第二ユーザの音声情報を、例えば、キャラクタの音声を表す情報に、またはユーザにより指定された音声情報に変換する。音声情報とは、音声に関する情報であり、例えば、音声を電気信号に変換した情報等を有する。また、キャラクタの音声を表す情報とは、例えば、テレビ漫画のキャラクター（声優）の音声、俳優の音声、美しく澄んだ音声、アナウンサーの音声、ユーザ自身の音声、ユーザの知り合いの音声、その他の音声を表す情報等である。また、ユーザにより指定された音声情報とは、例えば、上記キャラクタの音声情報等である。音声情報を指定する手段は、例えば、音声指定手段等（例えば、指定ボタンを押す等）により実現されうる。音声変換の方法は、例えば、音声情報に所定の合成処理を行い、予め決められた、または指定のキャラクタの音声に変換する方法等である。この合成処理は、例えば、加算合成（例えば、周波数発信器と加算器により構成される処理）、減算合成（所定の周波数フィルター処理）、音を揺らす処理、倍音を少しずらす処理、高・低音部を強調する（基音と倍音の比率を変える）処理、周波数を変化させて音程を変化させる処理等、または上記の処理の組み合わせにより合成する処理等である。また、音声変換の方法は、以下の方法でもよい。まず、音声情報を認識（例えば、基本周波数、ホルマント等の周波数の特徴を抽出して、認識）する。次に、認識された音声情報を文字データに変換する。次に、当該文字データに相当するキャラクタの音声波形を取得して、文字データに基づいて、音声波形を合成する。なお、キャラクタの音声波形は、音声変換部１０１１３に内蔵の記録媒体に予め記録されている。音声変換部１０１１３は、通常、ＭＰＵやメモリ等から実現され得る。音声変換するための処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。
【００１４】
第一映像入力部１０１２１は、第一ユーザの映像の入力を受け付ける。第一映像入力部１０１２１は、映像入力手段等で実現できる。第一映像入力部１０１２１は、例えば、デジタルビデオカメラ等が好適である。
【００１５】
第一映像出力部１０１２２は、第一送受信部１０１３が受信した第二ユーザの映像情報を出力する。第一映像出力部１０１２２は、例えば、ディスプレイとそのドライバーソフト等により実現される。
【００１６】
第一送受信部１０１３と第二送受信部１０２３は、音声情報、映像情報等を送信または受信する。映像情報とは、映像の情報であり、例えば、映像入力部が入力を受け付けたユーザの映像の情報等である。第一送受信部１０１３と第二送受信部１０２３は、例えば、ネットワークカードとそのドライバーソフト等を有して実現される。なお、音声情報と映像情報は、多重化されて送信されても良く、送信の形態は、問わない。かかる送信の形態は、他の実施の形態においても同様である。
【００１７】
第二音声入力部１０２１１は、第二ユーザの音声の入力を受け付ける。第二音声入力部１０２１１は、通常、マイクロホンおよびそのドライバーソフト等により実現され得る。
【００１８】
第二音声出力部１０２１２は、第二送受信部１０２３が受信した第一ユーザの音声情報を出力する。第二音声出力部１０２１２は、通常、スピーカーおよびそのドライバーソフト等により実現され得る。
【００１９】
第二映像入力部１０２２１は、第二ユーザの映像の入力を受け付ける。第二映像入力部１０２２１は、映像入力手段等で実現できる。第二映像入力部１０２２１は、例えば、デジタルビデオカメラ等が好適である。
【００２０】
第二映像出力部１０２２２は、第二送受信部１０２３が受信した第一ユーザの映像情報を出力する。第二映像出力部１０２２２は、例えば、ディスプレイとそのドライバーソフト等により実現される。
【００２１】
以下、本実施の形態における第一端末１０１において、音声および映像を入力し、音声情報および映像情報を外部装置（例えば、第二端末１０２等）に送信する動作について図２のフローチャートを用いて説明する。
【００２２】
（ステップＳ２０１）第一端末１０１は、第二端末１０２と接続する入力を受け付けたか否かを判断する。第一端末１０１が第二端末１０２と接続する入力を受け付けた場合、ステップＳ２０２に行き、受け付けない場合、ステップＳ２０１に戻る。なお、接続する入力は、第一端末１０１の電源スイッチをＯＮにする入力、または接続スイッチをＯＮにする入力等のものでも良く、その他の入力手段でも良い。また、接続する入力は、第二端末１０２が送信した接続信号（接続する旨の信号）を受信したことでも良い。
【００２３】
（ステップＳ２０２）第一音声入力部１０１１１は、音声の入力を受け付けたか否かを判断する。また、第一映像入力部１０１２１は、映像の入力を受け付けたか否かを判断する。第一音声入力部１０１１１または第一映像入力部１０１２１は、音声および映像の入力を受け付けた場合、ステップＳ２０３に行き、受け付けなければ、ステップＳ２０２に戻る。
【００２４】
（ステップＳ２０３）第一送受信部１０１３は、ステップＳ２０２で入力を受け付けた音声情報および映像情報を第二端末１０２に送信する。
【００２５】
（ステップＳ２０４）第一端末１０１は、第二端末１０２との接続を切断する入力を受け付けたか否かを判断する。第一端末１０１は、第二端末１０２との接続を切断する入力を受け付けた場合、第一端末１０１の動作は終了し、切断する入力を受け付けない場合、ステップＳ２０２に行く。なお、切断する入力は、第一端末１０１の電源スイッチをＯＦＦにする入力、または接続スイッチをＯＦＦにする入力等のものでも良い。
【００２６】
なお、第二端末１０２において音声および映像を入力し、外部装置（例えば、第一端末１０１等）に送信する動作は、上記フローチャートで説明した動作と同様である。また、本実施の形態は、映像の入力を受け付けない、映像情報を送受信しない、および映像情報を出力しないものでも良い。かかる形態は、他の実施の形態においても同様である。
【００２７】
以下、本実施の形態における第一端末１０１において、第二端末１０２から音声情報及び映像情報を受信し、音声情報を音声変換し、出力する動作について図３のフローチャートを用いて説明する。
【００２８】
（ステップＳ３０１）第一端末１０１は、第二端末１０２と接続する入力を受け付けたか否かを判断する。第一端末１０１が第二端末１０２と接続する入力を受け付けた場合、ステップＳ３０２に行き、受け付けない場合、ステップＳ３０１に戻る。なお、接続する入力の手段は、ステップＳ２０１と同様である。
【００２９】
（ステップＳ３０２）第一送受信部１０１３は、第二端末１０２から音声情報および映像情報を受信したか否かを判断する。第一送受信部１０１３が、音声情報および映像情報を受信した場合、ステップＳ３０３に行き、受信しない場合には、ステップＳ３０２に戻る。
【００３０】
（ステップＳ３０３）音声変換部１０１１３は、ステップＳ３０２で受信された音声情報を、予め決められた音声情報、またはユーザによって指定されたキャラクタの音声情報に音声変換する。
【００３１】
（ステップＳ３０４）第一音声出力部１０１１２は、ステップＳ３０３で音声変換された、キャラクタの音声情報を出力する。
【００３２】
（ステップＳ３０５）第一映像出力部１０１２２は、ステップＳ３０２で受信された映像情報を出力する。
【００３３】
（ステップＳ３０６）第一端末１０１は、第二端末１０２との接続を切断する入力を受け付けたか否かを判断する。第一端末１０１は、第二端末１０２との接続を切断する入力を受け付けた場合、第一端末１０１の動作は終了し、切断する入力を受け付けない場合、ステップＳ３０２に行く。なお、切断する入力の手段は、ステップＳ２０４と同様である。
【００３４】
なお、第二端末１０２において音声情報および映像情報を受信し、出力する動作は、上記フローチャートで説明した動作と同様である。
【００３５】
以下、本実施の形態における教育システム（英会話教育のシステム）の具体的な動作について説明する。図４は、本実施の形態における教育システムの概念の例を示す図である。図４において、今、第一端末４０１は、パソコン４０２のモニター４０２１の上に設置されている。また、第一端末４０１は、デジタルビデオカメラ４０１１を有し、またマイク付きヘッドホン４０１２（音声入出力装置）と専用線を介して接続されている。第二端末４１１は、テレビ４１２のテレビ画面４１２１の上に設置されている。また、第二端末４１１は、デジタルビデオカメラ４１１１を有し、またマイク付きヘッドホン４１１２（音声入出力装置）と専用線を介して接続されている。また、第一端末４０１、第二端末４１１、教育システムのサーバー装置４２０は、インターネットを介して接続されている。モニター４０２１は、第二端末４１１側のユーザの映像が出力されている。またテレビ画面４１２１は、第一端末４０１側のユーザの映像が出力されている。なお、第一端末４０１は、生徒である。また、第二端末４１１は、英会話の先生である。
【００３６】
また教育システムのサーバー装置４２０は、以下の動作を行う。例えば、サーバー装置４２０は、日々の授業内容（授業スケジュール）を出力指示する、または授業の終了時間を出力指示する等を行う。また、サーバー装置４２０は、第一端末４０１と第二端末４１１のアクセス許可の判断を行い、例えば、不正に授業を受ける者が存在する場合に、第一端末４０１と第二端末４１１を接続しないようにする等の動作を行う。また、サーバー装置４２０は、例えば、複数の端末から音声情報および映像情報を受信し、各端末に送信する等を行う。
【００３７】
以下、音声情報をキャラクタの音声に変換する動作について具体的に説明する。マイク付きヘッドホン４１１２のマイクは、第二端末４１１側のユーザが発声した音声を、取得し、電気信号（音声情報）に変換する。第二送受信部は、当該音声情報を、第一端末４０１に送信する。第一送受信部は、当該音声情報を受信する。
【００３８】
次に、音声変換部１０１１３は、受信した音声情報を、予め決められたキャラクタの音声を表す情報に変換する。または、ユーザにより指定された音声情報に変換する。予め決められたキャラクタの音声を表す情報が、テレビ漫画Ａの主人公Ｂの音声を表す情報である場合、主人公Ｂの音声波形データは、音声変換部１０１１３に内蔵の記録媒体に、予め記録されている。また、ユーザにより指定された音声情報に変換する場合、指定された音声情報の音声波形データは、音声変換部１０１１３に内蔵の記録媒体に予め、記録されている。ここで音声波形データは、例えば、主人公Ｂの音声を再現できる波形データである。当該波形データのデータ構造は問わない。
【００３９】
また、ユーザがキャラクタの音声情報を指定する方法は、例えば、ユーザが、毎回、指定ボタン（音声指定手段）を押して、当該指定ボタンに応じたキャラクタの音声情報を指定する方法等である。また、キャラクタの音声情報を毎回指定せず、前回指定されたキャラクタの音声情報が、指定されるものでも良い。
【００４０】
音声変換部１０１１３は、受信した音声情報の基本周波数、ホルマント等の周波数の特徴を検出する。次に、受信した音声情報とキャラクタの音声波形データの各々の基本周波数、ホルマント等のデータを比較減算する。比較減算の結果が零（つまり、値が一致する）または、ほぼ零になるように、音声変換部１０１１３は、受信した音声情報に上記の合成処理を行う。なお、上記の合成処理以外の処理を行い、音声変換するものでも良い。
【００４１】
また、音声変換部１０１１３は、以下に示す方法で音声変換するものでも良い。まず、音声変換部１０１１３は、受信した音声情報を認識（例えば、基本周波数、ホルマント等の周波数の特徴を抽出して、認識）する。次に、音声変換部１０１１３は、認識された音声情報を文字データに変換する。次に、音声変換部１０１１３は、当該文字データに相当するキャラクタの音声波形を取得して、文字データに基づいて、音声波形を合成する。
【００４２】
次に、マイク付きヘッドホン４０１２は、音声変換部１０１１３が変換した音声情報を出力する。よって、第二端末４１１側のユーザの音声を、予め決められた、またはユーザが指定したキャラクタの音声で、第一端末４０１側のユーザは、聞くことができる。
【００４３】
なお、第一端末４０１側のユーザが発声した音声を、マイク付きヘッドホン４０１２のマイクは、取得し、電気信号（音声情報）に変換する。第一送受信部は、当該音声情報を、第二端末４１１に送信する。第二送受信部は、当該音声情報を受信する。第二音声出力部のマイク付きヘッドホン４１１２は、当該音声情報を音声に変換して出力する。
【００４４】
なお、本実施の形態において、音声変換部１０１１３は、第一端末４０１に内蔵されているものだったが、音声変換部１０１１３は、第二端末４１１、またはサーバー装置４２０に内蔵されているものでも良い。音声変換部１０１１３がサーバー装置４２０に内蔵されている場合、第二端末４１１は、音声情報をサーバー装置４２０に送信し、サーバー装置４２０は、音声情報を受信し、当該音声情報を音声変換し、変換された音声情報を、第一端末４０１に送信する。また、音声変換部１０１１３が第二端末４１１に内蔵されている場合、第二端末４１１は、音声情報を音声変換し、当該音声変換した音声情報を第一端末４０１に送信する。
【００４５】
また、本実施の形態において、第二端末４１１は、音声変換部を有さないものだったが、第二端末４１１は、音声変換部を有するものでも良い。かかる場合、第二端末４１１は、第一端末４０１側のユーザの音声をキャラクタの音声に変換して、出力できる。
【００４６】
また、本実施の形態において、映像出力部は、他方の端末から入力された映像を出力するものだったが、他方の音声情報をキャラクタの音声情報に変換することに応じて、映像出力部は、かかるキャラクタの映像情報を出力するものでも良い。なお、キャラクタの映像情報は、端末に内蔵の記録媒体に格納されているものでも良く、サーバー装置に内蔵の記録媒体に格納されているものでも良い。
【００４７】
以上、本実施の形態によれば、第二端末で入力された音声が、第一端末で出力される場合に、キャラクタの音声情報で出力される。例えば、キャラクタの音声がテレビ漫画の主人公の声である場合に、利用者（特に、子供）は、集中して、楽しく学習できる。また、例えば、ユーザが指定した音声が利用者の憧れの俳優の声である場合に、利用者は、集中して、楽しく学習できる。さらに、同一人物（例えば、キャラクタＡ）が別々の授業を同時に行うことは物理的に不可能であった。本発明によれば、入力された先生の音声をキャラクタＡの音声に音声変換できる。よって、同時に行われる別々の授業であっても、各々の生徒は、キャラクタＡの音声で授業を受けることができる。例えば、授業内容が異なるクラスＡの授業とクラスＢの授業が同時に行われる場合、クラスＡ、Ｂの生徒は、人気の高いキャラクタＡの音声で同時に授業を受けることができる。よって、クラスＡ、Ｂの生徒は、授業に集中でき、学習効果も向上する。
【００４８】
さらに、本実施の形態において説明した教育システムの動作について、ソフトウェアで実現し、当該ソフトウェアを例えば、サーバー上に置いて、ソフトウェアダウンロードにより当該ソフトウェアを配布しても良い。さらにソフトウェアをＣＤ−ＲＯＭ等の記録媒体に記録して流布しても良い。このことは、すべての他の実施の形態においても同様である。なお、本実施の形態における動作をソフトウェアで実現した場合のプログラムは、以下のようになる。コンピュータに、少なくとも一方の音声入出力装置から入力された音声情報を受信する音声情報受信ステップと、当該受信した音声情報を音声変換する音声変換ステップと、当該音声変換された音声情報を出力する音声情報出力ステップを実行させるためのプログラムである。なお、音声情報出力ステップにおける、音声変換された音声情報を出力するとは、他の音声入出力装置に送信することを含む概念である。
【００４９】
（実施の形態２）
図５は、本実施の形態における教育システムの構成を示すブロック図である。教育システムは、第一端末５０１、第二端末１０２、サーバー装置５０３を有する。第一端末５０１は、第一音声入出力装置５０１１、第一映像入出力装置１０１２、第一送受信部１０１３を有する。第一音声入出力装置５０１１は、第一音声入力部１０１１１、第一音声出力部１０１１２を有する。また、第一映像入出力装置１０１２は、第一映像入力部１０１２１、第一映像出力部１０１２２を有する。サーバー装置５０３は、サーバー送受信部５０３１、音声変換部１０１１３を有する。
【００５０】
サーバー送受信部５０３１は、端末から音声情報、映像情報等を受信する、また、端末に音声情報、映像情報等を送信する。サーバー送受信部５０３１は、例えば、ネットワークカードとそのドライバーソフト等を有して実現される。
【００５１】
音声変換部１０１１３は、キャラクタ指定情報に応じて、音声情報を音声変換する。なお、音声変換する方法は、実施の形態１と同様である。音声変換部１０１１３は、キャラクタ指定情報を端末から取得する。キャラクタ指定情報は、指定されたキャラクタに関する情報であり、例えば、「第二端末１０２の音声情報はキャラクタＡの音声情報に変換」、「第三端末の音声情報はキャラクタＢの音声情報に変換」等の情報を有する。なお、端末は、キャラクタ指定情報を予め記録していても良く、キャラクタ指定情報の入力を受け付けるキャラクタ指定入力受付手段を有するものでも良い。なお、キャラクタ指定情報は、端末から取得せず、サーバー装置５０３に内蔵の記録媒体に、記録しているものでも良い。
【００５２】
以下、本実施の形態におけるサーバー装置５０３の動作について図６のフローチャートを用いて説明する。なお、第一端末５０１が送信する音声情報を第一音声情報、第二端末１０２が送信する音声情報を第二音声情報とする。
【００５３】
（ステップＳ６０１）サーバー装置５０３は、第一端末５０１および第二端末１０２と接続する入力を受け付けたか否かを判断する。サーバー装置５０３が接続する入力を受け付けた場合、ステップＳ６０２に行き、受け付けない場合、ステップＳ６０１に戻る。なお、接続する入力は、サーバー装置５０３の電源スイッチをＯＮにする入力、または接続スイッチをＯＮにする入力等のものでも良く、その他の入力手段でも良い。
【００５４】
（ステップＳ６０２）サーバー送受信部５０３１は、音声情報を受信したか否かを判断する。また、サーバー送受信部５０３１は、映像情報を受信したか否かを判断する。サーバー送受信部５０３１は、音声情報および映像情報を受信した場合、ステップＳ６０３に行き、受信しない場合、ステップＳ６０２に戻る。
【００５５】
（ステップＳ６０３）音声変換部１０１１３は、ステップＳ６０２において受信した音声情報が、第二音声情報か否かを判断する。音声変換部１０１１３は、第二音声情報である場合、ステップＳ６０４に行き、そうでない場合、ステップＳ６０５に行く。例えば、第二音声情報か否かを判断する方法は、以下である。音声情報に伴って受信された電話番号が、第二端末１０２の電話番号である場合、第二音声情報であると、音声変換部１０１１３は判断する。なお、第二端末１０２の電話番号等は、サーバー装置５０３に内蔵の記録媒体に、予め保持されている。
【００５６】
（ステップＳ６０４）音声変換部１０１１３は、第二音声情報を音声変換する。
【００５７】
（ステップＳ６０５）サーバー送受信部５０３１は、ステップＳ６０４において音声変換した第二音声情報を第一端末５０１に送信する。また、サーバー送受信部５０３１は、第一音声情報を第二端末１０２に送信する。
【００５８】
（ステップＳ６０６）サーバー装置５０３は、第一端末５０１および第二端末１０２との接続を切断する入力を受け付けたか否かを判断する。サーバー装置５０３は、接続を切断する入力を受け付けた場合、サーバー装置５０３の動作は終了し、切断する入力を受け付けない場合、ステップＳ６０２に行く。なお、切断する入力は、サーバー装置５０３の電源スイッチをＯＦＦにする入力、または接続スイッチをＯＦＦにする入力等のものでも良い。
【００５９】
以上、サーバー装置５０３の動作を図６のフローチャートを用いて説明した。かかる説明において、第二音声情報を音声変換するものだったが、第一音声情報を音声変換するものでも良く、第一音声情報および第二音声情報を音声変換するものでも良い。
【００６０】
また、第一端末５０１と第二端末１０２の動作は、以下である。第一端末５０１または第二端末１０２は、音声入力部が入力を受け付けた音声、および映像入力部が入力を受け付けた映像を、サーバー装置５０３に送信する。第一端末５０１または第二端末１０２は、サーバー装置５０３から受信した音声情報を音声出力部が出力し、またサーバー装置５０３から受信した映像情報を映像出力部が出力する。
【００６１】
以下、本実施の形態における教育システムの具体的な動作について説明する。図７は、本実施の形態における教育システムの概念の例を示す図である。図７において、今、第一端末７０１は、パソコン７０２のモニター７０２１の上に設置されている。また、第一端末７０１は、デジタルビデオカメラ７０１１を有し、またマイク付きヘッドホン７０１２（音声入出力装置）と専用線を介して接続されている。第二端末７１１は、テレビ７１２のテレビ画面７１２１の上に設置されている。また、第二端末７１１は、デジタルビデオカメラ７１１１を有し、またマイク付きヘッドホン７１１２（音声入出力装置）と専用線を介して接続されている。第三端末７２１は、パソコン７２２のモニター７２２１の上に設置されている。また、第三端末７２１は、デジタルビデオカメラ７２１１を有し、またマイク付きヘッドホン７２１２（音声入出力装置）と専用線を介して接続されている。なお、第一端末７０１、第二端末７１１は、生徒である。また、第三端末７２１は、英会話の先生である。
【００６２】
また、第一端末７０１、第二端末７１１、第三端末７２１、教育システムのサーバー装置７３０は、インターネットを介して接続されている。モニター７０２１は、第二端末７１１側のユーザ、および第三端末７２１側のユーザの映像が出力されている。またテレビ画面７１２１は、第一端末７０１側のユーザ、および第三端末７２１側のユーザの映像が出力されている。また、モニター７２２１は、第一端末７０１側のユーザ、および第二端末７１１側のユーザの映像が出力されている。
【００６３】
今、第一端末７０１は、キャラクタ指定情報（以下、「第一キャラクタ指定情報」という）をサーバー装置７３０に送信する。第一キャラクタ指定情報の内容は、「第二端末７１１の音声情報は俳優Ａの音声情報Ａに変換する」、また、「第三端末７２１の音声情報は女優Ｂの音声情報Ｂに変換する」等である。また、第二端末７１１は、キャラクタ指定情報（以下、「第二キャラクタ指定情報」という）をサーバー装置７３０に送信する。第二キャラクタ指定情報の内容は、「第一端末７０１の音声情報は変換しない」、また、「第三端末７２１の音声情報はアナウンサーＣの音声情報Ｃに変換する」等である。また、第三端末７２１は、キャラクタ指定情報（以下、「第三キャラクタ指定情報」という）をサーバー装置７３０に、送信する。第三キャラクタ指定情報の内容は、「第一端末７０１の音声情報は変換しない」、また、「第二端末７１１の音声情報は変換しない」等である。なお、各々のキャラクタ指定情報は、予め、各々の端末に内蔵の記録媒体に記録されている。
【００６４】
第一端末７０１、第二端末７１１、第三端末７２１は、音声情報、映像情報をサーバー装置７３０に送信する。サーバー送受信部５０３１は、各端末からの音声情報、映像情報を受信する。
【００６５】
音声変換部１０１１３は、第一キャラクタ指定情報に基づいて、第二音声情報を俳優Ａの音声情報Ａに変換する。また、音声変換部１０１１３は、第三音声情報を女優Ｂの音声情報Ｂに変換する。また、音声変換部１０１１３は、その他のキャラクタ指定情報に応じて、音声情報を音声変換する（音声変換しない旨の情報の場合は、音声変換しない）。
【００６６】
サーバー送受信部５０３１は、音声変換部１０１１３が音声変換した音声情報と映像情報を各々の端末に送信する。例えば、サーバー送受信部５０３１は、第一端末７０１に、音声変換した第二音声情報、音声変換した第三音声情報、および第二映像情報、第三映像情報を送信する。
【００６７】
なお、本実施の形態において、各々の端末は、音声情報等をサーバー装置７３０に送信するものだったが、音声変換部が音声情報を音声変換しない場合、各々の端末は、音声情報をサーバー装置７３０に送信せず、直接、その他の端末に送信するものでも良く、他の外部装置を経由して送信するものでも良い。
【００６８】
また、本実施の形態において、各端末は、キャラクタ指定情報をサーバー装置７３０に送信するものだったが、サーバー装置７３０が、キャラクタ指定情報テーブルを有し、音声変換部が、当該キャラクタ指定情報テーブルに基づいて、音声情報を音声変換するものでも良い。以下に、キャラクタ指定情報テーブルに基づいて音声情報を音声変換する方法について、図８、９を用いて説明する。図８のテーブルは、音声変換される音声情報が指定されているデータ例を示す図である。図８のテーブルにおいて、レコード番号は、キャラクタ指定情報テーブルのレコード番号を示す。また、ユーザ識別子は、ユーザを識別する情報である。図８のテーブルにおいて、例えば、レコード番号「１」の内容は、「生徒Ｂ」で識別される端末に送信される「先生Ａ」の音声情報が音声変換されることを表している。つまり、図８のテーブルは、現在、教育システムを利用している先生と生徒を管理するテーブルである。図９におけるキャラクタ指定情報テーブルは、レコード番号、ユーザ識別子、キャラクタ識別子、ポインタ等の情報を有するテーブルである。キャラクタ識別子は、キャラクタの音声情報を識別する情報である。ポインタは、キャラクタ識別子に応じたキャラクタの音声波形データファイルを指定するポインタである。音声波形データは、予め、サーバー装置７３０に内蔵の記録媒体に記録されている。図９のテーブルにおいて、例えば、レコード番号「１」の内容は、図８のテーブルにおいて指定される「先生Ａ」の音声情報を、ポインタが指定するキャラクタ「漫画の主人公Ｅ」の音声波形に音声変換することを表している。また、音声変換された「先生Ａ」の音声情報を、「生徒Ｂ」で識別される端末に送信することを表している。なお、音声変換する具体的方法は、実施の形態１において説明した方法と同様である。また、図９のテーブルにおいて、「ユーザ識別子」で識別される生徒が、「キャラクタ識別子」で識別される音声に、先生の音声情報が変換されることを指示した旨を示す。
【００６９】
以上、本実施の形態によれば、複数の端末からの音声情報をキャラクタの音声情報に変換し、出力できる。よって、複数人を対象にするグループレッスン等の場合においても、利用者（特に、生徒）は、集中して、楽しく学習できる。
【００７０】
【発明の効果】
本発明によれば、第二端末で入力された音声が、第一端末で出力される際に、キャラクタの音声で出力される。よって、利用者は、集中して学習できる。例えば、キャラクタの音声がテレビ漫画の主人公の声である場合に、利用者（特に、子供）は、集中して、楽しく学習できる。
【図面の簡単な説明】
【図１】実施の形態１における教育システムの構成を示すブロック図
【図２】実施の形態１における第一端末の動作について説明するフローチャート
【図３】実施の形態１における第二端末の動作について説明するフローチャート
【図４】実施の形態１における教育システムの概念の例を示す図
【図５】実施の形態２における教育システムの構成を示すブロック図
【図６】実施の形態２におけるサーバー装置の動作について説明するフローチャート
【図７】実施の形態２における教育システムの概念の例を示す図
【図８】実施の形態２における音声情報が指定されているデータ例を示す図
【図９】実施の形態２におけるキャラクタ指定情報テーブルの例を示す図
【符号の説明】
１０１、４０１、５０１、７０１第一端末
１０２、４１１、７１１第二端末
１０１１、５０１１第一音声入出力装置
１０１２第一映像入出力装置
１０１３第一送受信部
１０１１１第一音声入力部
１０１１２第一音声出力部
１０１１３音声変換部
１０２１第二音声入出力装置
１０２２第二映像入出力装置
１０２３第二送受信部
１０２１１第二音声入力部
１０２１２第二音声出力部
４０２、７０２、７２２パソコン
４０２１、７０２１、７２２１モニター
４０１１、７０１１、７２１１デジタルビデオカメラ
４０１２、７０１２、７２１２マイク付きヘッドホン
４１２、７１２テレビ
４１２１、７１２１テレビ画面
４２０、５０３、７３０サーバー装置
５０３１サーバー送受信部
７２１第三端末
[0001]
[Technical field of the invention]
The present invention relates to an education system and the like that has audio input/output devices and supports education by transmitting and receiving audio information in both directions.
[0002]
[Prior art]
Education systems that use a video conference system are already known. Such a video conference system provides functions needed for education, such as an authentication function that verifies that a person is validly qualified to attend and a grouping function that divides students into groups (see Patent Document 1). For example, when the video conference system is used for English conversation lessons, the grouping function places students of the same level in the same group so that the students in a group can practice conversation with one another. In other words, rather than pairing students whose English conversation skills differ, the system has students of equivalent skill practice together, which improves their skills efficiently. The system can also determine whether a person is validly qualified to attend, and so prevents unauthorized persons from attending.
[0003]
[Patent Document 1]
JP 2001-224000 A (Page 1, FIG. 1, etc.)
[0004]
[Problems to be solved by the invention]
However, an education system that uses a video conference system has the following potential problems. Users, especially children, cannot tolerate the monitor screen or the sound from the speaker for long periods, or cannot concentrate on them. This is because the user feels as if he or she is being taught by an emotionless machine rather than by the person (the teacher) on the other side of the monitor screen; as a result, the lesson becomes boring and concentration drops. In addition, the sense of tension that arises in a lesson where people meet face to face is unlikely to arise, so the user cannot concentrate on the lesson and the learning effect does not improve.
[0005]
[Means for Solving the Problems]
To solve the above problems, the present invention is an education system that has a first audio input/output device and a second audio input/output device and supports education by transmitting and receiving audio information bidirectionally between the two devices, wherein voice conversion is performed when audio information input at at least one of the audio input/output devices is output at the other audio input/output device. Such a system converts the output voice and provides an environment in which the user can endure long sessions and concentrate. The user can therefore concentrate on learning.
[0006]
Another aspect of the present invention is an education system in which the converted voice is the voice of a character. With such a system, voice input on one side is output on the other side as the voice of the character. For example, when the character's voice is that of a manga protagonist, the user (especially a child) can concentrate and enjoy learning.
[0007]
Yet another aspect of the present invention is an education system in which the converted voice is a voice specified by the user. With such a system, voice input on one side is output on the other side in the voice specified by the user. For example, when the specified voice is that of an actor whom the user admires, the user can concentrate and enjoy learning.
[0008]
[Embodiments of the invention]
[0009]
Embodiments of the present invention are described in detail below with reference to the drawings. In the embodiments, components, flowchart steps, and the like that carry the same reference numerals perform the same function, so their description may be omitted once they have been explained.
[0010]
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of the education system according to the present embodiment. The education system has a first terminal 101 and a second terminal 102. The first terminal 101 has a first audio input/output device 1011, a first video input/output device 1012, and a first transmission/reception unit 1013. The first audio input/output device 1011 has a first audio input unit 10111, a first audio output unit 10112, and a voice conversion unit 10113. The first video input/output device 1012 has a first video input unit 10121 and a first video output unit 10122. The second terminal 102 has a second audio input/output device 1021, a second video input/output device 1022, and a second transmission/reception unit 1023. The second audio input/output device 1021 has a second audio input unit 10211 and a second audio output unit 10212. The second video input/output device 1022 has a second video input unit 10221 and a second video output unit 10222.
[0011]
The first voice input unit 10111 receives a first user's voice input. The first voice input unit 10111 can be generally realized by a microphone and its driver software.
[0012]
The first audio output unit 10112 outputs audio information of the second user. The first audio output unit 10112 can be generally realized by a speaker and its driver software or the like.
[0013]
The voice conversion unit 10113 converts the second user's audio information received by the first transmission/reception unit 1013 into, for example, information representing the voice of a character or audio information specified by the user. Audio information is information relating to voice and includes, for example, information obtained by converting voice into an electric signal. Information representing the voice of a character is, for example, information representing the voice of a TV cartoon character (voice actor), the voice of an actor, a beautiful and clear voice, the voice of an announcer, the user's own voice, the voice of an acquaintance of the user, or some other voice. The audio information specified by the user is, for example, the audio information of one of the above characters. The audio information can be specified by, for example, a voice specifying means (such as pressing a specification button). One method of voice conversion is to apply predetermined synthesis processing to the audio information and so convert it into the voice of a predetermined or specified character. This synthesis processing is, for example, additive synthesis (for example, processing built from a frequency oscillator and an adder), subtractive synthesis (predetermined frequency filtering), processing that makes the sound waver, processing that slightly shifts the harmonics, processing that emphasizes the high or low range (changing the ratio of the fundamental to the harmonics), processing that changes the pitch by changing the frequency, or a combination of these. Voice conversion may also be performed as follows. First, the audio information is recognized (for example, by extracting frequency features such as the fundamental frequency and formants). Next, the recognized audio information is converted into text data. Then the character's voice waveforms corresponding to the text data are obtained, and a voice waveform is synthesized based on the text data. The character's voice waveforms are recorded in advance on a recording medium built into the voice conversion unit 10113. The voice conversion unit 10113 can usually be realized by an MPU, memory, and the like. The processing procedure for voice conversion is usually implemented in software, and the software is recorded on a recording medium such as a ROM. However, it may also be implemented in hardware (a dedicated circuit).
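As an illustration only, some of the synthesis processes listed above can be sketched in a few lines of Python. The function names (`additive_synthesis`, `change_pitch`, `combine`), the representation of audio as a plain list of float samples, and the linear-interpolation resampler are assumptions made for this sketch; the specification leaves the implementation open.

```python
import math

def additive_synthesis(samples, sample_rate, tone_hz, gain=0.2):
    """Additive synthesis: add the output of a sine oscillator to each sample."""
    return [
        s + gain * math.sin(2.0 * math.pi * tone_hz * n / sample_rate)
        for n, s in enumerate(samples)
    ]

def change_pitch(samples, ratio):
    """Change pitch by resampling with linear interpolation.

    ratio > 1 raises the pitch, ratio < 1 lowers it. This naive approach
    also shortens or lengthens the signal by the same factor.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    for i in range(int(len(samples) / ratio)):
        pos = i * ratio
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

def combine(samples, processes):
    """Apply several processes in order (the 'combination of these' case)."""
    for process in processes:
        samples = process(samples)
    return samples
```

A caller might chain the steps as `combine(voice, [lambda s: change_pitch(s, 1.2), lambda s: additive_synthesis(s, 8000, 440.0)])`, where the ratio and tone frequency are arbitrary illustrative values.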
[0014]
The first video input unit 10121 receives input of video of the first user. The first video input unit 10121 can be realized by video input means or the like; a digital video camera, for example, is suitable.
[0015]
The first video output unit 10122 outputs the video information of the second user received by the first transmission/reception unit 1013. The first video output unit 10122 is realized by, for example, a display and its driver software.
[0016]
The first transmission/reception unit 1013 and the second transmission/reception unit 1023 transmit or receive audio information, video information, and the like. Video information is information about video, for example the video of a user whose input the video input unit has received. The first transmission/reception unit 1013 and the second transmission/reception unit 1023 are realized by, for example, a network card and its driver software. The audio information and the video information may be multiplexed for transmission; the form of transmission does not matter. The same applies to the other embodiments.
[0017]
The second audio input unit 10211 receives input of the second user's voice. It can usually be realized by a microphone and its driver software.
[0018]
The second audio output unit 10212 outputs the audio information of the first user received by the second transmission/reception unit 1023. It can usually be realized by a speaker and its driver software.
[0019]
The second video input unit 10221 receives input of video of the second user. It can be realized by video input means or the like; a digital video camera, for example, is suitable.
[0020]
The second video output unit 10222 outputs the video information of the first user received by the second transmission/reception unit 1023. It is realized by, for example, a display and its driver software.
[0021]
The operation in which the first terminal 101 of the present embodiment takes in audio and video and transmits audio information and video information to an external device (for example, the second terminal 102) is described below with reference to the flowchart of FIG. 2.
[0022]
(Step S201) The first terminal 101 determines whether an input to connect to the second terminal 102 has been received. If so, the process proceeds to step S202; otherwise, it returns to step S201. The connection input may be an input that turns on the power switch of the first terminal 101, an input that turns on a connection switch, or input by some other means. Receipt of a connection signal (a signal requesting connection) transmitted by the second terminal 102 may also serve as the connection input.
[0023]
(Step S202) The first audio input unit 10111 determines whether audio input has been received, and the first video input unit 10121 determines whether video input has been received. If audio and video input has been received, the process proceeds to step S203; otherwise, it returns to step S202.
[0024]
(Step S203) The first transmission/reception unit 1013 transmits the audio information and video information whose input was received in step S202 to the second terminal 102.
[0025]
(Step S204) The first terminal 101 determines whether or not an input for disconnecting the connection with the second terminal 102 has been received. When the first terminal 101 receives an input to disconnect the connection with the second terminal 102, the operation of the first terminal 101 ends. When the first terminal 101 does not receive an input to disconnect, the process proceeds to step S202. Note that the input to disconnect may be an input to turn off the power switch of the first terminal 101 or an input to turn off the connection switch.
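The control flow of steps S201 to S204 can be sketched as a simple loop. This is a minimal sketch rather than the patented implementation: `run_sender` and its four callables are hypothetical names introduced here so that the loop can be shown without real microphone, camera, or network code.

```python
def run_sender(connect_requested, capture, send, disconnect_requested):
    """Mirror steps S201-S204 of FIG. 2 using caller-supplied callables.

    connect_requested()    -> True once a connection input has been received
    capture()              -> an (audio, video) pair, or None if there is no input yet
    send(frame)            -> transmits the pair to the other terminal
    disconnect_requested() -> True once a disconnection input has been received
    """
    # S201: wait until a connection input is received.
    while not connect_requested():
        pass
    while True:
        # S202: check for audio and video input; with none, return to S202.
        frame = capture()
        if frame is None:
            continue
        # S203: transmit the received audio and video information.
        send(frame)
        # S204: end on a disconnection input, otherwise go back to S202.
        if disconnect_requested():
            return
```

A real terminal would block or sleep instead of spinning in the two wait loops; the busy loops are kept only to match the flowchart one to one.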
[0026]
The operation of inputting audio and video in the second terminal 102 and transmitting it to an external device (for example, the first terminal 101 or the like) is the same as the operation described in the above flowchart. Further, the present embodiment may be one that does not accept video input, does not transmit and receive video information, and does not output video information. Such a form is the same in other embodiments.
[0027]
The operation in which the first terminal 101 of the present embodiment receives audio information and video information from the second terminal 102, voice-converts the audio information, and outputs it is described below with reference to the flowchart of FIG. 3.
[0028]
(Step S301) The first terminal 101 determines whether an input for connecting to the second terminal 102 has been received. If the first terminal 101 has received an input to connect to the second terminal 102, the process proceeds to step S302; otherwise, the process returns to step S301. The input means for connection is the same as in step S201.
[0029]
(Step S302) The first transmission/reception unit 1013 determines whether audio information and video information have been received from the second terminal 102. If so, the process proceeds to step S303; otherwise, it returns to step S302.
[0030]
(Step S303) The voice conversion unit 10113 voice-converts the audio information received in step S302 into predetermined audio information or into the audio information of a character specified by the user.
[0031]
(Step S304) The first voice output unit 10112 outputs the voice information of the character that has been voice-converted in step S303.
[0032]
(Step S305) The first video output unit 10122 outputs the video information received in Step S302.
[0033]
(Step S306) The first terminal 101 determines whether an input for disconnecting the connection with the second terminal 102 has been received. If the first terminal 101 receives an input to disconnect the connection with the second terminal 102, the operation of the first terminal 101 ends. If the first terminal 101 does not receive an input to disconnect, the process proceeds to step S302. The means for inputting disconnection is the same as in step S204.
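The receiving side described in steps S301 to S306 differs from the sending side mainly in that voice conversion sits between reception and output. The sketch below shows that ordering; `run_receiver` and all of its callables are hypothetical stand-ins for the transmission/reception unit, voice conversion unit, and output units.

```python
def run_receiver(connect_requested, receive, convert,
                 play_audio, show_video, disconnect_requested):
    """Mirror steps S301-S306 of FIG. 3 using caller-supplied callables."""
    # S301: wait until a connection input is received.
    while not connect_requested():
        pass
    while True:
        # S302: check for received audio and video; with none, return to S302.
        packet = receive()
        if packet is None:
            continue
        audio, video = packet
        # S303: convert the received voice (for example, to a character's voice).
        converted = convert(audio)
        # S304: output the converted audio information.
        play_audio(converted)
        # S305: output the received video information unchanged.
        show_video(video)
        # S306: end on a disconnection input, otherwise go back to S302.
        if disconnect_requested():
            return
```

Because `convert` is passed in, the same loop covers both a predetermined character voice and a voice selected by the user.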
[0034]
The operation of receiving and outputting audio information and video information in the second terminal 102 is the same as the operation described in the above flowchart.
[0035]
A specific operation of the education system of the present embodiment (an English conversation education system) is described below. FIG. 4 shows an example of the concept of the education system of the present embodiment. In FIG. 4, the first terminal 401 is placed on top of the monitor 4021 of a personal computer 402. The first terminal 401 has a digital video camera 4011 and is connected by a dedicated line to headphones with a microphone 4012 (an audio input/output device). The second terminal 411 is placed on top of the television screen 4121 of a television 412. The second terminal 411 has a digital video camera 4111 and is connected by a dedicated line to headphones with a microphone 4112 (an audio input/output device). The first terminal 401, the second terminal 411, and the server device 420 of the education system are connected via the Internet. The monitor 4021 shows video of the user at the second terminal 411, and the television screen 4121 shows video of the user at the first terminal 401. The user at the first terminal 401 is a student, and the user at the second terminal 411 is an English conversation teacher.
[0036]
The server device 420 of the education system performs the following operations. For example, it issues instructions to output the content of each day's lessons (the lesson schedule) or the lesson end time. It also decides whether to permit access by the first terminal 401 and the second terminal 411; for example, when someone attempts to take a lesson without authorization, it does not connect the first terminal 401 and the second terminal 411. In addition, the server device 420, for example, receives audio information and video information from a plurality of terminals and transmits it to each terminal.
[0037]
Hereinafter, the operation of converting the voice information into the voice of a character will be specifically described. The microphone of the headphone with microphone 4112 acquires the voice uttered by the user of the second terminal 411 and converts the voice into an electric signal (voice information). The second transmission / reception unit transmits the voice information to the first terminal 401. The first transmission / reception unit receives the voice information.
[0038]
Next, the voice conversion unit 10113 converts the received voice information into information representing the voice of a predetermined character, or into voice information specified by the user. When the information representing the voice of the predetermined character is, for example, information representing the voice of the hero B of the TV cartoon A, the voice waveform data of the hero B is recorded in advance on a recording medium built into the voice conversion unit 10113. When converting into voice information specified by the user, voice waveform data of the specified voice information is recorded in advance on a recording medium built into the voice conversion unit 10113. Here, the voice waveform data is, for example, waveform data from which the voice of the hero B can be reproduced. The data structure of the waveform data does not matter.
[0039]
One method for the user to specify the voice information of a character is, for example, to press a specification button (voice specification means) each time, thereby specifying the voice information of the character corresponding to that button. Alternatively, the voice information of the character specified the previous time may be used without specifying it each time.
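The two designation methods above can be sketched as a small piece of state, shown here in Python. This is only an illustrative sketch: the class name `VoiceDesignation`, the button labels, and the mapping from buttons to characters are assumptions and do not appear in the embodiment.

```python
class VoiceDesignation:
    """Tracks which character voice is currently designated (hypothetical helper)."""

    def __init__(self, button_to_character):
        # Mapping from specification button to character; the contents are assumed.
        self._button_to_character = dict(button_to_character)
        self._last_designated = None  # nothing has been designated yet

    def press(self, button):
        """Designate the character assigned to the pressed specification button."""
        self._last_designated = self._button_to_character[button]
        return self._last_designated

    def current(self):
        """Return the character designated last time, or None if there is none."""
        return self._last_designated


designation = VoiceDesignation({"button_1": "hero B", "button_2": "actor A"})
designation.press("button_2")         # the user designates a voice once
lesson_voice = designation.current()  # later lessons reuse the previous designation
```

Keeping the previous designation in one place means the user does not need to press a button before every lesson.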
[0040]
The voice conversion unit 10113 detects frequency characteristics, such as the fundamental frequency and formants, of the received voice information. Next, it compares these characteristics with the corresponding data, such as the fundamental frequency and formants, of the character's voice waveform data and takes the difference. The voice conversion unit 10113 then performs synthesis processing on the received voice information so that the difference becomes zero (that is, the values match) or substantially zero. Note that voice conversion may be performed by a process other than this synthesis process.
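As a rough illustration of driving the difference toward zero, the following Python sketch handles only the fundamental frequency and ignores formants. The autocorrelation estimator, the resampling-based pitch shift, the 8000 Hz sample rate, and the 200 Hz and 250 Hz test tones are all assumptions chosen for illustration; the embodiment does not specify any of them, and resampling also shortens the signal, which a practical converter would avoid.

```python
import math


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def shift_pitch(samples, ratio):
    """Naive pitch shift by linear-interpolation resampling (changes the duration)."""
    n_out = int((len(samples) - 1) / ratio)
    out = []
    for n in range(n_out):
        pos = n * ratio
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


def convert_toward_character(samples, sample_rate, character_f0):
    """Shift the received voice so its F0 difference from the character nears zero."""
    received_f0 = estimate_f0(samples, sample_rate)
    converted = shift_pitch(samples, character_f0 / received_f0)
    residual = estimate_f0(converted, sample_rate) - character_f0
    return converted, residual


SAMPLE_RATE = 8000  # assumed sample rate
# Assumed stand-in for the teacher's voice: a pure 200 Hz tone.
teacher = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
converted, residual = convert_toward_character(teacher, SAMPLE_RATE, character_f0=250.0)
```

Here `residual` plays the role of the comparison-and-subtraction result: the conversion is considered successful when it is zero or substantially zero.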
[0041]
Further, the voice conversion unit 10113 may perform voice conversion by the following method. First, the voice conversion unit 10113 recognizes the received voice information (for example, it extracts and recognizes frequency features such as the fundamental frequency and formants). Next, the voice conversion unit 10113 converts the recognized voice information into text data. Next, the voice conversion unit 10113 obtains the voice waveform of the character corresponding to the text data and synthesizes speech from that waveform based on the text data.
[0042]
Next, the headphone with microphone 4012 outputs the voice information converted by the voice conversion unit 10113. Therefore, the user of the first terminal 401 can hear the speech of the user of the second terminal 411 in the voice of a character that is determined in advance or specified by the user.
[0043]
Note that the microphone of the headphone with microphone 4012 acquires the voice uttered by the user of the first terminal 401 and converts it into an electric signal (voice information). The first transmission / reception unit transmits the voice information to the second terminal 411. The second transmission / reception unit receives the voice information. The headphone with microphone 4112, serving as the second audio output unit, converts the voice information into sound and outputs it.
[0044]
In the present embodiment, the voice conversion unit 10113 is built into the first terminal 401. However, the voice conversion unit 10113 may instead be built into the second terminal 411 or the server device 420. When the voice conversion unit 10113 is incorporated in the server device 420, the second terminal 411 transmits voice information to the server device 420, and the server device 420 receives the voice information, performs voice conversion on it, and transmits the converted voice information to the first terminal 401. When the voice conversion unit 10113 is incorporated in the second terminal 411, the second terminal 411 performs voice conversion on the voice information and transmits the voice-converted voice information to the first terminal 401.
[0045]
Further, in the present embodiment, the second terminal 411 does not have a voice conversion unit, but the second terminal 411 may have a voice conversion unit. In such a case, the second terminal 411 can convert the voice of the user on the first terminal 401 side into the voice of the character and output it.
[0046]
Further, in the present embodiment, the video output unit outputs the video input from the other terminal. However, when the audio information from the other terminal is converted into the audio information of a character, the video output unit may instead output video information of that character. Note that the video information of the character may be stored in a recording medium built into the terminal, or may be stored in a recording medium built into the server device.
[0047]
As described above, according to the present embodiment, when the voice input at the second terminal is output at the first terminal, it is output as the voice of a character. For example, when the voice of the character is the voice of the hero of a TV cartoon, the user (especially a child) can concentrate and enjoy learning. Further, for example, when the voice specified by the user is the voice of an actor whom the user admires, the user can concentrate and enjoy learning. Furthermore, it has conventionally been physically impossible for the same person (for example, character A) to conduct different classes at the same time. According to the present invention, the input teacher's voice can be converted into the voice of the character A, so each student can take a lesson in the voice of the character A even when lessons are conducted simultaneously. For example, when class A and class B, which have different lesson contents, are held at the same time, the students of both classes can take their lessons in the voice of the popular character A at the same time. Therefore, the students in classes A and B can concentrate on the lesson, and the learning effect is improved.
[0048]
Furthermore, the operation of the education system described in the present embodiment may be realized by software. The software may be placed on, for example, a server and distributed by software download, or it may be recorded on a recording medium such as a CD-ROM and distributed. The same applies to all other embodiments. When the operation in the present embodiment is realized by software, the program is as follows: a program for causing a computer to execute a voice information receiving step of receiving voice information input from at least one voice input / output device, a voice conversion step of voice-converting the received voice information, and a voice information output step of outputting the voice-converted voice information. It should be noted that outputting the voice-converted voice information in the voice information output step is a concept that includes transmitting the voice information to another voice input / output device.
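The three steps of such a program can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names, the representation of voice information as a list of samples, and the placeholder conversion (a simple polarity inversion standing in for real voice conversion) are all hypothetical.

```python
from typing import Callable, Iterable, List

VoiceInformation = List[float]  # assumed representation: a list of samples


def run_voice_program(
    received: Iterable[VoiceInformation],
    convert: Callable[[VoiceInformation], VoiceInformation],
    output: Callable[[VoiceInformation], None],
) -> int:
    """Run the receiving, conversion, and output steps; return the number handled."""
    handled = 0
    for voice_information in received:          # voice information receiving step
        converted = convert(voice_information)  # voice conversion step
        output(converted)                       # voice information output step
        handled += 1
    return handled


def placeholder_convert(voice_information: VoiceInformation) -> VoiceInformation:
    """Stand-in for voice conversion (inverts polarity only; not a real converter)."""
    return [-sample for sample in voice_information]


sent: List[VoiceInformation] = []  # stands in for transmission to another device
count = run_voice_program([[0.25, -0.5], [0.75]], placeholder_convert, sent.append)
```

Passing the output step as a callable reflects the note above that output includes transmission to another voice input / output device.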
[0049]
(Embodiment 2)
FIG. 5 is a block diagram illustrating a configuration of the education system according to the present embodiment. The education system includes a first terminal 501, a second terminal 102, and a server device 503. The first terminal 501 includes a first audio input / output device 5011, a first video input / output device 1012, and a first transmission / reception unit 1013. The first audio input / output device 5011 has a first audio input unit 10111 and a first audio output unit 10112. The first video input / output device 1012 has a first video input unit 10121 and a first video output unit 10122. The server device 503 includes a server transmission / reception unit 5031 and a voice conversion unit 10113.
[0050]
The server transmission / reception unit 5031 receives audio information, video information, and the like from the terminal, and transmits audio information, video information, and the like to the terminal. The server transmission / reception unit 5031 is realized, for example, by including a network card and its driver software.
[0051]
The voice conversion unit 10113 performs voice conversion on voice information according to the character designation information. The method of voice conversion is the same as in the first embodiment. The voice conversion unit 10113 acquires the character designation information from the terminal. The character designation information is information relating to the designated character, for example, “the voice information of the second terminal 102 is converted into voice information of the character A” or “the voice information of the third terminal 102 is converted into voice information of the character B”. Note that the terminal may record the character designation information in advance, or may have a character designation input receiving means for receiving the input of the character designation information. The character designation information need not be obtained from the terminal; it may instead be recorded on a recording medium built into the server device 503.
[0052]
Hereinafter, the operation of the server device 503 according to the present embodiment will be described with reference to the flowchart in FIG. 6. The voice information transmitted by the first terminal 501 is referred to as first voice information, and the voice information transmitted by the second terminal 102 is referred to as second voice information.
[0053]
(Step S601) The server device 503 determines whether an input for connecting to the first terminal 501 and the second terminal 102 has been received. If the server device 503 has received such an input, the process proceeds to step S602; otherwise, the process returns to step S601. The input for connecting may be an input for turning on the power switch of the server device 503, an input for turning on the connection switch, or an input by other input means.
[0054]
(Step S602) The server transmission / reception unit 5031 determines whether or not audio information has been received. The server transmission / reception unit 5031 also determines whether or not video information has been received. If the audio information and the video information have been received, the process proceeds to step S603; otherwise, the process returns to step S602.
[0055]
(Step S603) The voice conversion unit 10113 determines whether or not the voice information received in step S602 is the second voice information. If it is the second voice information, the process proceeds to step S604; otherwise, the process proceeds to step S605. One method of making this determination is, for example, as follows. If the telephone number received together with the voice information is the telephone number of the second terminal 102, the voice conversion unit 10113 determines that the voice information is the second voice information. Note that the telephone number and the like of the second terminal 102 are stored in advance in a recording medium built into the server device 503.
[0056]
(Step S604) The voice conversion unit 10113 performs voice conversion on the second voice information.
[0057]
(Step S605) The server transmission / reception unit 5031 transmits the second voice information voice-converted in step S604 to the first terminal 501. In addition, the server transmission / reception unit 5031 transmits the first voice information to the second terminal 102.
[0058]
(Step S606) The server device 503 determines whether an input for disconnecting the connection with the first terminal 501 and the second terminal 102 has been received. If the server device 503 receives an input for disconnecting the connection, the operation of the server device 503 ends. If the server device 503 does not receive an input for disconnecting, the process proceeds to step S602. The input for disconnecting may be an input for turning off the power switch of the server device 503 or an input for turning off the connection switch.
[0059]
The operation of the server device 503 has been described above with reference to the flowchart in FIG. 6. In the above description, the second voice information is voice-converted. However, the first voice information may be voice-converted instead, or both the first voice information and the second voice information may be voice-converted.
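The flow of steps S602 to S606 can be sketched in Python as below. This is a simplified sketch, not the server implementation: the `AudioPacket` type, the telephone numbers, and the tagging stand-in for voice conversion are assumptions, video information is omitted, and the end of the packet sequence stands in for the disconnect input of step S606.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class AudioPacket:
    sender_number: str  # telephone number received together with the voice information
    audio: List[float]


def relay(
    packets: Iterable[AudioPacket],
    first_number: str,
    second_number: str,
    convert: Callable[[List[float]], List[float]],
) -> List[Tuple[str, List[float]]]:
    """Return (destination number, voice information) pairs in transmission order."""
    sent = []
    for packet in packets:                            # S602: voice information received
        if packet.sender_number == second_number:     # S603: second voice information?
            converted = convert(packet.audio)         # S604: voice conversion
            sent.append((first_number, converted))    # S605: send to the first terminal
        else:
            sent.append((second_number, packet.audio))  # S605: first voice, unconverted
    return sent                                       # S606: input exhausted, so stop


FIRST, SECOND = "0000-0501", "0000-0102"  # hypothetical telephone numbers
log = relay(
    [AudioPacket(SECOND, [0.5]), AudioPacket(FIRST, [0.25])],
    FIRST,
    SECOND,
    convert=lambda audio: [-x for x in audio],  # stand-in for real voice conversion
)
```

Converting the first voice information as well, as mentioned above, would only require a second `convert` call in the `else` branch.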
[0060]
The operations of the first terminal 501 and the second terminal 102 are as follows. The first terminal 501 or the second terminal 102 transmits, to the server device 503, the audio received by the audio input unit and the video received by the video input unit. In the first terminal 501 or the second terminal 102, the audio output unit outputs the audio information received from the server device 503, and the video output unit outputs the video information received from the server device 503.
[0061]
Hereinafter, a specific operation of the education system according to the present embodiment will be described. FIG. 7 is a diagram illustrating an example of the concept of the education system according to the present embodiment. In FIG. 7, the first terminal 701 is installed on a monitor 7021 of a personal computer 702. The first terminal 701 has a digital video camera 7011 and is connected to a headphone with microphone 7012 (audio input / output device) via a dedicated line. The second terminal 711 is installed on a television screen 7121 of the television 712. The second terminal 711 has a digital video camera 7111 and is connected to a headphone with microphone 7112 (audio input / output device) via a dedicated line. The third terminal 721 is installed on a monitor 7221 of the personal computer 722. The third terminal 721 has a digital video camera 7211 and is connected to a headphone with microphone 7212 (audio input / output device) via a dedicated line. Note that the users of the first terminal 701 and the second terminal 711 are students, and the user of the third terminal 721 is an English conversation teacher.
[0062]
In addition, the first terminal 701, the second terminal 711, the third terminal 721, and the server device 730 of the education system are connected via the Internet. The monitor 7021 outputs images of the user of the second terminal 711 and the user of the third terminal 721. The television screen 7121 outputs images of the user of the first terminal 701 and the user of the third terminal 721. The monitor 7221 outputs images of the user of the first terminal 701 and the user of the second terminal 711.
[0063]
Now, the first terminal 701 transmits character designation information (hereinafter referred to as “first character designation information”) to the server device 730. The contents of the first character designation information include “convert the voice information of the second terminal 711 to voice information A of the actor A” and “convert the voice information of the third terminal 721 to voice information B of the actress B”. Further, the second terminal 711 transmits character designation information (hereinafter referred to as “second character designation information”) to the server device 730. The contents of the second character designation information include “the voice information of the first terminal 701 is not converted” and “the voice information of the third terminal 721 is converted into the voice information C of the announcer C”. The third terminal 721 transmits character designation information (hereinafter referred to as “third character designation information”) to the server device 730. The contents of the third character designation information include “the voice information of the first terminal 701 is not converted” and “the voice information of the second terminal 711 is not converted”. Note that each piece of character designation information is recorded in advance on a recording medium built into the corresponding terminal.
[0064]
The first terminal 701, the second terminal 711, and the third terminal 721 transmit audio information and video information to the server device 730. The server transmission / reception unit 5031 receives audio information and video information from each terminal.
[0065]
The voice conversion unit 10113 converts the second voice information into voice information A of the actor A based on the first character designation information. In addition, the voice conversion unit 10113 converts the third voice information into voice information B of the actress B. In addition, the voice conversion unit 10113 performs voice conversion on voice information in accordance with other character designation information (if the information indicates that voice conversion is not performed, voice conversion is not performed).
[0066]
The server transmission / reception unit 5031 transmits the audio information and the video information converted by the audio conversion unit 10113 to each terminal. For example, the server transmission / reception unit 5031 transmits the first audio information, the second audio information, the third audio information, the second video information, and the third video information to the first terminal 701.
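The per-listener conversion described in this example can be sketched in Python as follows. The designation contents are taken from the first to third character designation information above; everything else is an assumption for illustration, including the dictionary layout, the `deliver` function, the tagging stand-in for voice conversion, and the simplification that a listener's own voice information is not sent back to it.

```python
def deliver(designations, utterances, convert):
    """For each listening terminal, build what it receives from every other terminal."""
    delivered = {}
    for listener, rules in designations.items():
        delivered[listener] = {}
        for speaker, audio in utterances.items():
            if speaker == listener:
                continue  # simplification: a terminal's own voice is not returned
            character = rules.get(speaker)  # None means "not converted"
            delivered[listener][speaker] = convert(audio, character) if character else audio
    return delivered


def tag_with_character(audio, character):
    """Stand-in for voice conversion: only labels the audio with the target voice."""
    return (character, audio)


T1, T2, T3 = "first terminal 701", "second terminal 711", "third terminal 721"
designations = {
    T1: {T2: "actor A", T3: "actress B"},  # first character designation information
    T2: {T1: None, T3: "announcer C"},     # second character designation information
    T3: {T1: None, T2: None},              # third character designation information
}
utterances = {T1: [0.1], T2: [0.2], T3: [0.3]}  # hypothetical voice information
delivered = deliver(designations, utterances, tag_with_character)
```

In this sketch the teacher's voice from the third terminal reaches the first terminal as actress B and the second terminal as announcer C, so the same utterance is converted separately per listener.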
[0067]
In the present embodiment, each terminal transmits voice information and the like to the server device 730. However, when the voice conversion unit does not perform voice conversion on the voice information, each terminal need not transmit the voice information to the server device 730; it may transmit the voice information directly to another terminal or via another external device.
[0068]
Further, in the present embodiment, each terminal transmits character designation information to the server device 730. However, the server device 730 may instead hold a character designation information table, and the voice conversion unit may use that table to perform voice conversion on voice information. Hereinafter, a method of performing voice conversion based on the character designation information table will be described with reference to FIGS. 8 and 9. The table in FIG. 8 is an example of data specifying the voice information to be voice-converted. In the table of FIG. 8, the record number indicates the record number of the character designation information table, and the user identifier is information for identifying a user. For example, the content of record number “1” in FIG. 8 indicates that the voice information of “teacher A” transmitted to the terminal identified as “student B” is voice-converted. That is, the table in FIG. 8 manages the teachers and students who are currently using the education system. The character designation information table in FIG. 9 holds information such as a record number, a user identifier, a character identifier, and a pointer. The character identifier is information for identifying the voice information of a character. The pointer designates the voice waveform data file of the character corresponding to the character identifier. The voice waveform data is recorded in advance on a recording medium built into the server device 730. For example, the content of record number “1” in FIG. 9 indicates that the voice information of “teacher A” specified in the table of FIG. 8 is converted into the voice identified by the character identifier of that record, and that the voice-converted voice information of “teacher A” is transmitted to the terminal identified as “student B”.
The table of FIG. 9 also indicates that the student identified by the “user identifier” has instructed that the teacher's voice information be converted into the voice identified by the “character identifier”. Note that the specific method of voice conversion is the same as the method described in the first embodiment.
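One possible in-memory form of the two tables is sketched below in Python. Only “teacher A”, “student B”, and the column names come from the description; the class and field names, the character identifier value, the file path, and the assumption that the two tables are joined on the record number are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ConversionTarget:
    """One row of a FIG. 8-style table: whose voice is converted for which user."""
    record_number: int
    speaker: str          # e.g. the teacher whose voice information is converted
    user_identifier: str  # the user whose terminal receives the converted voice


@dataclass(frozen=True)
class CharacterDesignation:
    """One row of a FIG. 9-style character designation information table."""
    record_number: int
    user_identifier: str
    character_identifier: str
    pointer: str  # location of the character's voice waveform data file


def find_waveform_pointer(
    targets: List[ConversionTarget],
    designations: List[CharacterDesignation],
    speaker: str,
    listener: str,
) -> Optional[str]:
    """Return the waveform file pointer for this speaker/listener pair, if any."""
    for target in targets:
        if target.speaker == speaker and target.user_identifier == listener:
            for row in designations:
                # Assumed join key: the record number shared by both tables.
                if row.record_number == target.record_number:
                    return row.pointer
    return None  # no record means the voice information is not converted


targets = [ConversionTarget(1, "teacher A", "student B")]
designations = [
    # The character identifier and path are placeholders, not values from FIG. 9.
    CharacterDesignation(1, "student B", "character-001", "/waveforms/character-001.dat")
]
pointer = find_waveform_pointer(targets, designations, "teacher A", "student B")
```

Returning `None` for an unlisted pair leaves the decision not to convert with the caller, which matches the case in which no voice conversion is designated.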
[0069]
As described above, according to the present embodiment, voice information from a plurality of terminals can be converted into voice information of a character and output. Therefore, even in the case of a group lesson for a plurality of people, the user (especially a student) can concentrate and enjoy learning.
[0070]
[Effects of the Invention]
According to the present invention, when the voice input at the second terminal is output at the first terminal, the voice is output as the voice of the character. Therefore, the user can concentrate on learning. For example, when the voice of the character is the voice of the hero of the TV cartoon, the user (especially a child) can concentrate and enjoy learning.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an education system according to a first embodiment.
FIG. 2 is a flowchart illustrating an operation of a first terminal according to the first embodiment.
FIG. 3 is a flowchart illustrating an operation of a second terminal according to the first embodiment.
FIG. 4 is a diagram showing an example of a concept of an education system according to the first embodiment.
FIG. 5 is a block diagram illustrating a configuration of an education system according to the second embodiment.
FIG. 6 is a flowchart illustrating an operation of a server device according to the second embodiment.
FIG. 7 is a diagram showing an example of a concept of an education system according to the second embodiment.
FIG. 8 is a diagram showing an example of data in which audio information is specified in the second embodiment.
FIG. 9 is a diagram showing an example of a character designation information table according to the second embodiment.
[Explanation of symbols]
101, 401, 501, 701 First terminal
102, 411, 711 Second terminal
1011, 5011 First audio input / output device
1012 First video input / output device
1013 First transceiver
10111 First voice input unit
10112 First audio output unit
10113 Voice conversion unit
1021 Second audio input / output device
1022 Second video input / output device
1023 Second transceiver
1021 Second voice input unit
1022 Second audio output unit
402, 702, 722 PC
4021, 7021, 7221 Monitor
4011, 7011, 7211 Digital video camera
4012, 7012, 7212 Headphones with Microphone
412, 712 TV
4121, 7121 TV screen
420, 503, 730 Server device
5031 Server transceiver
721 Third terminal

Claims

1. An education system having a first voice input / output device and a second voice input / output device,
the education system supporting education by transmitting and receiving voice information bidirectionally between the two voice input / output devices,
wherein voice-converted voice information is output when voice information input from at least one voice input / output device is output to the other voice input / output device.

2. The educational system according to claim 1, wherein the voice information resulting from the voice conversion is information representing the voice of a character.

3. The educational system according to claim 1, wherein the voice conversion is conversion into voice information specified by a user of the other voice input / output device.

4. The educational system according to any one of claims 1 to 3, wherein the voice conversion is performed by the first voice input / output device and / or the second voice input / output device.

5. The educational system according to any one of claims 1 to 3, further comprising a server device that receives voice information from at least one voice input / output device and transmits the voice information to the other voice input / output device,
wherein the server device performs voice conversion on the voice information transmitted from the at least one voice input / output device.

6. An audio input / output device constituting an education system having two audio input / output devices,
wherein the audio input / output device receives audio information transmitted from at least one audio input / output device, performs voice conversion on the audio information, and outputs the converted audio information.

7. A program for causing a computer to execute:
a voice information receiving step of receiving voice information input from at least one voice input / output device;
a voice conversion step of voice-converting the received voice information; and
a voice information output step of outputting the voice information after the voice conversion.

8. An audio output method in an education system that has a first audio input / output device and a second audio input / output device and supports education by transmitting and receiving audio information bidirectionally between the two audio input / output devices,
the method comprising outputting voice-converted audio information when audio information input from at least one audio input / output device is output to the other audio input / output device.

9. The audio output method according to claim 8, wherein the audio information obtained as a result of the voice conversion is information representing the voice of a character.

10. The audio output method according to claim 8, wherein the voice conversion is conversion into audio information designated by a user of the other audio input / output device.