JP4651981B2

JP4651981B2 - Education information management server

Info

Publication number: JP4651981B2
Application number: JP2004208232A
Authority: JP
Inventors: ハル安藤; 啓子藤田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-07-15
Filing date: 2004-07-15
Publication date: 2011-03-16
Anticipated expiration: 2024-07-15
Also published as: JP2006030513A

Description

The present invention relates to an information processing apparatus incorporating technology that improves educational effectiveness in fields such as distance learning (e-learning) by customizing the learning environment of the user (the student), and in particular the interface between the user and the system, to suit that user.

E-learning businesses have recently begun to emerge in Japan as well. E-learning falls broadly into two types: asynchronous e-learning and synchronous e-learning.

A widespread form of asynchronous e-learning is web-based training delivered through an ASP or similar service, in which users can use teaching materials whenever and wherever they wish to study. The content uses a multi-path approach that guides students down different paths depending on their answers, but at present each individual web page presents general-purpose information rather than content adapted to the individual student.

In synchronous e-learning, on the other hand, a teacher and students at remote locations are linked by video chat over an ASP or by two-way video communication over an ISDN line, so that they can study as though the class were being held in the same classroom.

The following proposals have been made for e-learning. Patent Document 1, for example, discloses a training support method in which a training server provides individual training guidance and support to a user by means of a virtual space. Patent Document 2 discloses a question presentation method for a host computer that accepts questions from users: the computer presents the question sentences best suited to the user, and the user selects, from the several presented, the one closest to what he or she intends to ask.

Patent Document 1: JP 2004-16738 A
Patent Document 2: JP 2004-05322 A

However, synchronous e-learning requires as many teachers as there are students; teachers often cannot be available at the times students request, and tuition fees are consequently high.

An object of the present invention is to provide an education support apparatus and an education information management server that deliver content adapted to each individual student by generating a simulated environment in which the student (the user) feels as if he or she were conversing with a real teacher.

Another object of the present invention is to provide an education support apparatus and an education information management server that can be expected to improve learning efficiency by projecting onto an animated character the kind of real conversation the student normally has, thereby making the teacher feel familiar.

A brief outline of representative aspects of the invention disclosed in this application for solving the above problems is as follows.

An education information management server comprises a connection unit connected to a student terminal and a teacher terminal, and supports a student in learning through dialogue, via the student terminal, with a teacher or with a digital instructor corresponding to that teacher. The server comprises: a storage unit; dialogue pattern generation means for generating, from student information and teacher information acquired from the student terminal and the teacher terminal, dialogue pattern data for the dialogue-based learning between the student and the teacher; registration means for registering the dialogue pattern data in the storage unit in time series; a dialogue pattern extraction function for extracting characteristic dialogue patterns from those registered in the storage unit and registering them in the storage unit; and a unique digital instructor generation function for generating, from among the registered characteristic dialogue patterns, the teacher's specific dialogue pattern in response to a specific dialogue pattern of the student, as a digital instructor corresponding to the teacher, using an animated character and synthesized speech.
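The registration means only has to keep dialogue pattern data in time order. A minimal sketch, assuming each pattern can be reduced to a time-stamped record with a speaker role and motion/utterance labels; the names `DialoguePattern` and `PatternStore` and all of their fields are hypothetical and do not come from the patent.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class DialoguePattern:
    # Hypothetical record; ordering uses only (start, end) in seconds
    start: float
    end: float
    role: str = field(compare=False)       # "teacher" or "student"
    motion: str = field(compare=False)     # e.g. "nod"
    utterance: str = field(compare=False)  # e.g. recognized text


class PatternStore:
    """Hypothetical stand-in for the storage unit; keeps patterns in time order."""

    def __init__(self):
        self._patterns = []

    def register(self, pattern):
        # insort keeps the list sorted by (start, end) even when records arrive late
        bisect.insort(self._patterns, pattern)

    def in_window(self, t0, t1):
        # Patterns whose [start, end) interval overlaps [t0, t1)
        return [p for p in self._patterns if p.start < t1 and p.end > t0]

    def all(self):
        return list(self._patterns)
```

Sorting on insert rather than on read is a design choice: teacher and student records can reach the server out of order, and readers then always see a time-ordered list.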

According to the present invention, a student (the user), for example, can study while maintaining a realistic conversational environment with a teacher through the motion and voice information output by the animated character, so an improvement in the student's learning effect can be expected.

An embodiment of the education support apparatus according to the present invention is described below with reference to the drawings.

The system shown as an example embodiment mainly handles educational information; in it, a student (the user) learns using learning content.

First, an embodiment of the present invention is described with reference to FIG. 1. FIG. 1A shows the overall hardware configuration of the education support apparatus of the present invention, and FIG. 1B shows the apparatus in terms of its processing functions. Reference numeral 101 denotes an education information management server that accumulates learning-related information and analyzes it. 102 denotes a PC serving as the teacher's terminal, which the teacher uses to give real-time lectures to multiple students and to conduct interactive lessons with them; 103 denotes a PC serving as the terminal of a student who is learning. 104 denotes a video information storage server. Each server and PC has a communication control function, and they can connect to one another via a communication network 105.

The teacher PC 102 is equipped with a speaker 10201 for communicating with students, a camera 10202 for capturing images of the teacher, a microphone 10203 through which the teacher inputs speech, and a keyboard and mouse for entering information. The student PC 103 is equipped with a microphone 10301 and a camera 10302 through which the student communicates with the teacher, a speaker 10303 for playing back the teacher's voice and other audio, and a keyboard and mouse for entering information. For simplicity, one teacher PC is shown connected to one education information management server; in practice, multiple teacher PCs may be connected. Likewise, this example shows one student PC connected to the server, but multiple student PCs can of course be connected to a teacher PC. Each server and PC is a computer comprising a CPU, memory, and input/output devices, and holds programs that, when loaded into memory and executed, cause the computer to implement various functions.

As shown in FIG. 1B, the main functions of the education information management server 101 are a teacher-student dialogue communication function 101A, a text display/editing function 101B, a dialogue pattern extraction function 101C, a unique digital instructor generation function 101D, a learning support function 101E by the teacher (real instructor), a learning support function 101F by the digital instructor, and a learning schedule management function 101G. Each of these functions is executed by a control unit consisting of the CPU, memory, and programs of the education information management server 101.

The teacher-student dialogue communication function 101A registers, in a storage means and in time series, the dialogue pattern data from dialogue-based learning between the student and the teacher. The dialogue pattern extraction function 101C extracts characteristic dialogue patterns from the registered dialogue patterns. More specifically, it comprises: a function that analyzes the moving images and audio of the teacher and student during learning to extract motion patterns and utterance patterns; a function that extracts dialogue patterns by associating at least two sets of extracted motion and utterance patterns with each other in time series, and determines the similarity between the extracted motion and utterance patterns and audio or moving images acquired separately from the teacher and student; and a function that registers a pattern as a dialogue pattern when this similarity is equal to or greater than a predetermined value.
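The patent does not specify how similarity is measured or what the predetermined value is. A minimal sketch of the threshold-and-register step, assuming each candidate pattern and each later observation has already been reduced to a numeric feature vector and using cosine similarity as a swapped-in measure; the function names and the 0.9 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def extract_dialogue_patterns(candidates, observations, threshold=0.9):
    """Register candidates whose best match against later observations meets the threshold.

    candidates:   dict of pattern name -> feature vector (hypothetical features)
    observations: feature vectors from audio/video acquired separately from the same user
    threshold:    stands in for the patent's unspecified "predetermined value"
    """
    registered = {}
    for name, feature in candidates.items():
        best = max((cosine_similarity(feature, obs) for obs in observations), default=0.0)
        if best >= threshold:
            registered[name] = best
    return registered
```

Only candidates that recur in later observations survive, which is one way to read "characteristic": a one-off gesture never reaches the threshold.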

The unique digital instructor generation function 101D takes the teacher's specific dialogue pattern in response to a specific dialogue pattern of the student and generates it, using an animated character and synthesized speech, as a digital instructor corresponding to that teacher.
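At its core, the generated instructor maps a student's dialogue pattern to the teacher's response. A minimal sketch, assuming patterns are identified by string labels and a response is a motion label plus text for a speech synthesizer; `Response`, `DigitalInstructor`, and the fallback behaviour are hypothetical, and the animation renderer and synthesizer are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    motion: str     # animation clip name for the character (hypothetical)
    utterance: str  # text to hand to a speech synthesizer


class DigitalInstructor:
    """Hypothetical sketch: answers a student pattern with the teacher's learned response."""

    def __init__(self, default):
        self._table = {}
        self._default = default

    def learn(self, student_pattern, teacher_response):
        # A later observation replaces an earlier one for the same student pattern
        self._table[student_pattern] = teacher_response

    def respond(self, student_pattern):
        # Fall back to a generic response for patterns never seen with this teacher
        return self._table.get(student_pattern, self._default)
```

The caller would play `respond(...).motion` on the character and pass `.utterance` to the synthesizer at the same moment, matching the synchronized image and audio outputs described for the apparatus.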

The main functions of the teacher PC 102 are a teacher-student dialogue communication function 102A, a text display/editing function 102B, a unique digital instructor generation function 102D, a learning support function 102E by the real instructor, a learning support function 102F by the digital instructor, and a learning schedule management function 102G.

The main functions of the student PC 103 are a teacher-student dialogue communication function 103A, a text display/editing function 103B, a learning support function 103E by the real instructor, and a learning support function 103F by the digital instructor. The video information storage server 104 has a video/audio storage function 104H.

Thus, as an overall configuration, the education support apparatus of the present invention comprises: an input unit that acquires moving images of a pair of users in dialogue; an input unit that acquires the voices of the pair of users in dialogue at the terminals; an input time detection unit that records the input times of the moving images and audio obtained from the input units; an image output unit that displays an animated character; an audio output unit that outputs speech synchronized with the animated character; and a control unit. The control unit analyzes the moving images to extract motion patterns of the pair of users, analyzes the audio to extract the users' utterance patterns, and extracts dialogue patterns by associating at least two sets of extracted motion and utterance patterns in time series. It then determines the similarity between audio or moving images acquired separately from a user and the motion and utterance patterns extracted from that same user. If the similarity is high, it outputs the other user's motion pattern in response to the first user's extracted dialogue pattern as a moving image of the animated character, and simultaneously outputs the other user's utterance pattern from that dialogue pattern as speech.
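The time-series association step can be illustrated with interval overlap. A minimal sketch, assuming each motion and utterance has already been recognized into a labelled segment with start and end times; here one user's "set" is an utterance together with the motions overlapping it, and a dialogue pattern links one user's set to the other user's next set within a gap. `Segment`, `pair_sets`, `dialogue_patterns`, and the 2-second `max_gap` are hypothetical choices, not taken from the patent.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds, from the input time detection unit
    end: float
    label: str    # motion or utterance label (hypothetical)


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def pair_sets(motions, utterances):
    """Form one user's (start, end, overlapping motions, utterance) sets along the time axis."""
    return [
        (u.start, u.end, tuple(m.label for m in motions if overlaps(m, u)), u.label)
        for u in utterances
    ]


def dialogue_patterns(sets_a, sets_b, max_gap=2.0):
    """Link each set of user A to the earliest set of user B starting within max_gap s after it."""
    pairs = []
    for a in sets_a:
        following = [b for b in sets_b if 0 <= b[0] - a[1] <= max_gap]
        if following:
            pairs.append((a, min(following, key=lambda s: s[0])))
    return pairs
```

When a new set from the student later matches the first element of a pair, the second element supplies the motions for the animated character and the utterance for synthesized speech.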

As described above, this configuration lets the student study while keeping a realistic conversational environment with the teacher. In addition, the animated character can be controlled on the basis of the motion patterns so that it performs the motions described in them, and synthesized speech can be output as the character's voice on the basis of the utterance patterns. Specifically, the motion pattern in a dialogue pattern data set is displayed on screen as the motion of the target animated character, and the speech pattern in the same data set is output as that character's voice.

Next, the configuration of the education information management server 101 is described with reference to FIG. 2. 1011 denotes a CPU that performs processing according to the launched programs, 1012 denotes a memory that stores the launched programs and other data, and 1013 denotes a hard disk that stores memory data and other data. Data to be accessed is read into the memory 1012 as needed, and the CPU 1011 performs data processing for education support on it. The hard disk 1013 stores: motion pattern data 101301 for extracting motion information from accumulated video data; a speech recognition dictionary 101302 for recognizing linguistic information from accumulated audio data; video pattern recognition data 101303 for extracting motion patterns from accumulated video data; student text data 101304 used by students in lessons; and answer data 101305, which gives the correct answers to the problems in the text data 101304.

When the server starts, the memory 1012 holds a system program 101201 that controls the entire system, together with a data transmission/reception program 101202, a speech recognition program 101203, a video recognition program 101204, a communication pattern extraction program 101205, a character generation program 101206, a speech synthesis program 101207, and a character-user dialogue control program 101208. The memory 1012 also holds a user data table 700 for each teacher and student. As shown in FIG. 6, for example, the user data table 700 stores data per user and in time series (702) for each data category 701: event data, audio data, and video data. The user data table 700 is used for temporary storage; when the data must be kept for a long time, for processing reasons for example, it is first saved to a predetermined area of the hard disk.
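The layout of the user data table 700 (per user, per data category, time-ordered rows, with spill to disk for long retention) can be sketched as a nested dictionary. This is a minimal in-memory analogue under stated assumptions: the class name, the `(start, end, payload)` row shape, and JSON as the on-disk format are all hypothetical.

```python
import json
from collections import defaultdict

CATEGORIES = ("event", "audio", "video")


class UserDataTable:
    """Hypothetical in-memory analogue of user data table 700."""

    def __init__(self):
        # user_id -> category -> list of (start, end, payload) rows in arrival order
        self._rows = defaultdict(lambda: {c: [] for c in CATEGORIES})

    def add(self, user_id, category, start, end, payload):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self._rows[user_id][category].append((start, end, payload))

    def rows(self, user_id, category):
        # Rows for one user and category, sorted by start time
        return sorted(self._rows[user_id][category], key=lambda r: r[0])

    def spill(self, user_id, path):
        """Write one user's rows to disk as JSON and drop them from memory."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self._rows.pop(user_id), f)
```

Payloads here are assumed to be JSON-serializable references (for example clip identifiers) rather than raw video, which in the described system is held by the video information storage server 104.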

Next, the teacher PC 102 is described in detail with reference to FIG. 3. The memory 1021 of the teacher PC 102 holds a system program 102101 that controls this PC and a data transmission/reception program 102102. Also loaded into memory are a user input information collection/storage program 102103 that collects and stores information the teacher enters with the keyboard and mouse, a video input/storage program 102104 that stores video captured by the camera, an audio input/storage program 102105 that acquires and stores audio from the microphone, and a communication program 102107 for conversing with the students. The information read into memory is used by the teacher-student dialogue communication function, and also by the dialogue pattern extraction function and the unique digital instructor generation function.

On the display of the teacher PC 102, the teacher-student dialogue communication function 102A shows video of the student together with the text content, and the learning schedule management function 102G shows each student's learning history. As shown in FIG. 8, the teacher screen 500 has a student video display area 501 and a learning history display area 502 on the left, and a text display area 503, an ON button 504, and an OFF button 505 on the right. The student's voice is also output. Using the learning support function 102E by the real instructor, the teacher gives the student interactive instruction. The learning schedule management function 102G also lets the teacher review each student's learning history and draw up learning plans.

As shown in FIG. 4, the memory 1031 of the student PC 103 holds a system program 103101 and a data transmission/reception program 103103. Also loaded into memory are a user input information collection/storage program 103102 that collects and stores information the student enters with the keyboard and mouse, a video input/storage program 103104 that stores video captured by the camera, an audio input/storage program 103105 that acquires and stores audio from the microphone, a course text display/editing program 103106 with which the student takes the lesson, and a communication program 103107 for the student's dialogue. FIG. 7 shows an example structure of the communication program 103107, which consists of a real-instructor/student communication program 1031071 and a digital-instructor/student communication program 1031072.

The display of the student PC 103 shows video of the teacher or the digital instructor together with the text content, and also shows data the teacher has entered to explain points to the student; the teacher's voice is output as well. As shown in FIG. 9, the student screen 600 has a teacher video display area 601 and a teacher input data display area 602 on the left, and a text display area 603, an ON button 604, and an OFF button 605 on the right.

As shown in FIG. 5, a system program 104201 and a video/audio storage program 104202 are loaded into the memory 1042 of the video information storage server 104. The hard disk 1043 accumulates captured video data 104301 and audio/sound data 104302.

Information such as the voices of the teacher and students and the data the teacher enters to explain points to students is used by the dialogue pattern extraction function 101C and the unique digital instructor generation function 101D of the education information management server 101.

The functions of the servers and PCs, for example the dialogue communication functions 101A, 102A, and 103A, are identical; they carry separate reference numerals only for convenience, to indicate the server or PC on which they run. Below, where no distinction is needed, functions are described using the numerals of the education information management server 101.

The student PC 103 supports both learning support by the teacher and learning support by the digital instructor. As described later, during learning with the teacher, the teacher-student dialogue communication function 101A displays video of the teacher together with the text content on the display of the student PC 103, and the text display/editing function 101B outputs the teacher's voice. During learning support by the digital instructor, by contrast, an image of the digital instructor is displayed together with the text content, and synthesized speech of the teacher is output. The initial digital instructor prepared in advance (hereinafter, the original digital instructor) is the same for all students, but as lessons progress it becomes an instructor unique to each student. A digital instructor generated in this way is hereinafter called a unique digital instructor.
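The shift from one shared original instructor to per-student unique instructors resembles copy-on-first-use. A minimal sketch, assuming an instructor can be represented as a table from student pattern to a `(motion, utterance)` response; `ORIGINAL_INSTRUCTOR`, `InstructorRegistry`, and the sample entries are hypothetical.

```python
import copy

# Hypothetical shared defaults: student pattern -> (motion, utterance)
ORIGINAL_INSTRUCTOR = {"frown": ("lean_forward", "Shall I explain that again?")}


class InstructorRegistry:
    """Hypothetical sketch: one evolving copy of the original instructor per student."""

    def __init__(self, original):
        self._original = original
        self._per_student = {}

    def for_student(self, student_id):
        # First access clones the original, so later updates never leak between students
        if student_id not in self._per_student:
            self._per_student[student_id] = copy.deepcopy(self._original)
        return self._per_student[student_id]

    def update(self, student_id, student_pattern, teacher_response):
        self.for_student(student_id)[student_pattern] = teacher_response
```

A deep copy keeps the original instructor unchanged for students who have not yet had a lesson, which matches the statement that the original digital instructor does not differ between students.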

Next, the learning support processing in this system is described, beginning with the learning support function by the real instructor. In learning support by the real instructor, the teacher and the student, at locations remote from each other, work through a given text while conversing via their respective PCs.

The student PC 103 supports both learning support by the teacher and learning support by the digital instructor. During learning support by the real instructor, and in parallel with it, the dialogue pattern extraction function 101C of the education information management server 101 extracts characteristic dialogue patterns, and the unique digital instructor generation function 101D generates a unique digital instructor. This unique digital instructor is then used to provide the learning support function 101F by the digital instructor.

First, on the teacher PC 102, the communication program 102107 starts after the PC boots, and a teacher screen 500 such as the one shown in FIG. 8 is displayed. Likewise, on the student PC 103, the communication program 103107 starts after the PC boots, and a student screen 600 such as the one shown in FIG. 9 is displayed.

For learning support by the real instructor, pressing the ON button 504 on the teacher screen 500 lets the teacher, through the data transmission/reception program, enter communication mode with a student who has requested communication. When the student then presses the ON button 604 on the screen 600 of the student PC 103, the data transmission/reception program enters a communication mode in which the student can converse with the teacher PC 102. The student then studies with the teacher while viewing the teacher's image in the teacher video display area 601, the student text data in the text display area 603, and the data in the teacher input data display area 602.

Next, the process of recording and holding the various data (event data, audio data, and video data) during learning support by the teacher is described with reference to FIG. 10.

This section describes how dialogue data between the teacher and the student is accumulated. First, the education information management server 101, the teacher PC 102, and the student PC 103 are started (S1001, S1002). At this point, the programs installed on the teacher PC are launched. Education information is sent from the education information management server 101 to each PC (S1003). The communication program of each PC is also launched (S1003), and the education information is displayed as text on each PC's screen (S1004). The camera then captures video data of the teacher, which the user input information collection/storage program sends to the education information management server (S1007). At the same time, when the teacher's audio data is input through the microphone connected to the teacher PC, the user input information collection/storage program sends it to the education information management server 101.

The audio data is A/D-converted by the audio input/storage program and sent to the education information management server. The video data is A/D-converted when captured by the camera and sent to the server by the data transmission/reception program. Data that the teacher or students enter with the keyboard or mouse is also sent to the server as event code information, together with the input start time and input end time of each item. All data sent to the server is stored in the user data table 700 (S1008 to S1014).
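The analog-to-digital step itself happens in hardware, but its quantization half and the time-stamped record sent to the server can be sketched. A minimal sketch, assuming audio arrives as floating-point samples in [-1.0, 1.0] and is packed as little-endian 16-bit PCM; `to_pcm16` and the `make_record` field names are hypothetical, not the patent's data format.

```python
import struct


def to_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]  # clamp out-of-range input
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def make_record(user_id, kind, start, end, payload):
    # Hypothetical message: who sent it, data category, input start/end times, data
    return {"user": user_id, "kind": kind, "start": start, "end": end, "data": payload}
```

Carrying both the start and end time in every record is what allows the server to place audio, video, and event data from different users on one time axis.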

As shown in FIG. 6, for example, the event data is stored in the user data table 700 in time series for each event type, such as key events and mouse input events. Audio data is stored in the user data table with an input time span running from the voice rise time, when speech begins, to the voice fall time, when speech input ends. Because video is input continuously, video data is stored in the user data table 700 with an input time span running from camera start-up to the end of shooting. To end communication mode, the teacher presses the OFF button 505 on the right side of the screen.
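The patent does not say how voice rise and fall times are detected. A minimal sketch using a simple energy threshold, a swapped-in technique, assuming digitized samples in [-1.0, 1.0]; `voice_intervals`, the 0.1 RMS threshold, and the 160-sample frame are hypothetical values.

```python
def voice_intervals(samples, rate, threshold=0.1, frame=160):
    """Return (rise_s, fall_s) pairs where per-frame RMS energy is at or above threshold.

    A simple energy-based detector; trailing samples that do not fill a frame are ignored.
    """
    intervals = []
    start = None
    n_frames = len(samples) // frame
    for i in range(n_frames):
        chunk = samples[i * frame:(i + 1) * frame]
        rms = (sum(s * s for s in chunk) / frame) ** 0.5
        t = i * frame / rate
        if rms >= threshold and start is None:
            start = t                     # voice rise time
        elif rms < threshold and start is not None:
            intervals.append((start, t))  # voice fall time
            start = None
    if start is not None:
        # Speech still active at the end of the buffer
        intervals.append((start, n_frames * frame / rate))
    return intervals
```

Each returned pair corresponds to one audio row's input time span in the table; a production detector would add hysteresis or a minimum silence length so short pauses do not split an utterance.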

受講者用ＰＣ１０３では、コミュニケーションモードにある場合、接続されているマイクより受講者の音声データを記録し、同時にカメラにより受講者の映像データをユーザ入力情報収集・格納プログラムによって記録する（Ｓ１００７）。また同時に、受講者用テキスト表示・編集プログラムも起動し、受講者が表示されているテキストに対して、例えばキーボードやマウスといった入力デバイスを用いて情報入力すると、入力された情報に応じてテキストが編集される。受講者の音声データや映像データも教師の音声データや映像データと同様に、音声データはサーバに送信された段階で音声入力プログラムによりＡ/Ｄ変換され、映像データはカメラに入力された際にＡ/Ｄ変換され、データ送受信プログラムによってサーバに送信される（Ｓ１００９、Ｓ１０１３）。また、教師や受講者がキーボードやマウスから入力したデータもコード情報としてサーバに送信され、さらに各データの入力時刻、入力終了時刻のデータもサーバに送信される。 In the communication mode, the student PC 103 records the student's voice data through the connected microphone and, at the same time, records the student's video data through the camera by means of the user input information collection/storage program (S1007). The student text display/editing program is started at the same time; when the student enters information into the displayed text with an input device such as a keyboard or mouse, the text is edited accordingly. As with the teacher's data, the student's voice data is A/D converted by the voice input program when it is sent to the server, and the video data is A/D converted when it is input to the camera; both are transmitted to the server by the data transmission/reception program (S1009, S1013). Data that the teacher or student enters with the keyboard or mouse is also transmitted to the server as code information, together with the input start time and input end time of each piece of data.

送信された各データは、教師の場合と同様に、図６に示すようなイベントデータ、音声データ、映像データ毎に、サーバ１０１のメモリ領域であるユーザデータテーブル７００に時系列に格納される（Ｓ１０１０、Ｓ１０１１）。 As in the teacher's case, each piece of transmitted data is stored in time series, separately for event data, voice data, and video data as shown in FIG. 6, in the user data table 700, which is a memory area of the server 101 (S1010, S1011).

受講者がコミュニケーションモードを終了するときには、画面右側のOFFボタン６０５を押す。 To end the communication mode, the student presses the OFF button 605 on the right side of the screen.

教師用ＰＣ１０２及び受講者用ＰＣ１０３でOFFボタンが押された場合、教育情報管理サーバ１０１は、リアルインストラクタによる学習支援が終了したものとして、受講者の学習履歴を更新した後（Ｓ１０１７）、一連の処理を終了する。 When the OFF button has been pressed on both the teacher PC 102 and the student PC 103, the education information management server 101 treats the learning support by the real instructor as finished, updates the student's learning history (S1017), and ends the series of processes.

一方、リアルインストラクタによる学習支援の間に、ユーザデータテーブル７００に格納された映像データは、撮像時刻を付加した映像フレームデータとしてフレームデータ格納領域に格納される。さらに音声・音響データも音声立ち上がり時刻及び立ち下がり時刻を付加したデータとして音声・音響データ格納領域に格納される。 Meanwhile, during the learning support by the real instructor, the video data stored in the user data table 700 is also stored in the frame data storage area as video frame data with the capture time attached. Likewise, the voice/sound data is stored in the voice/sound data storage area with the voice rise time and fall time attached.

次に、図１１〜図１４により、教師による学習支援の際に取得された上記各種データから、対話パターン抽出機能１０１Ｃにより、教師による学習支援の際の特徴的な対話パターンを抽出する処理について説明する。 Next, with reference to FIGS. 11 to 14, the process by which the dialogue pattern extraction function 101C extracts characteristic dialogue patterns of the teacher's learning support from the various data acquired during that support is described.

図１１で、データの認識処理について説明する。この認識処理を行うために、まず、ハードディスクからユーザデータテーブル７００へ必要なデータを読み出す（Ｓ１１０１）。同時に、教育情報管理サーバ１０１上の音声認識プログラム１０１２０３、映像認識プログラム１０１２０５を起動する（Ｓ１１０２、１１０３）。 The data recognition process is described with reference to FIG. 11. To perform this process, the necessary data is first read from the hard disk into the user data table 700 (S1101). At the same time, the voice recognition program 101203 and the video recognition program 101205 on the education information management server 101 are started (S1102, S1103).

次に、音声認識プログラム１０１２０３及び音声認識用辞書１０１３０２によって、格納されている音声・音響データをテキストデータに変換する（Ｓ１１０４）。 Next, the voice recognition program 101203 converts the stored voice/sound data into text data using the voice recognition dictionary 101302 (S1104).

次に、変換されたテキストデータに対応している音声箇所の立ち上がり時刻と立ち下がり時刻をテキストデータに付加し、タイムスタンプ付きテキストデータとして、例えばSpeech[ m ][ n ] (m = 音声パタン数、n=０:音声立ち上がり時刻、n=１：音声立ち下がり時刻、n=２:認識されたテキストデータ)の配列形式で格納する。 Next, the rise time and fall time of the voice segment corresponding to each piece of converted text are attached to that text, and the result is stored as time-stamped text data, for example in an array of the form Speech[m][n] (m = voice pattern number; n = 0: voice rise time, n = 1: voice fall time, n = 2: recognized text data).
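The Speech[m][n] layout can be mirrored with a named tuple, so that indices 0, 1, and 2 give the rise time, fall time, and recognized text. This sketch assumes the recognizer already returns (rise, fall, text) triples; `SpeechPattern` and `build_speech_array` are illustrative names only.

```python
from typing import Iterable, List, NamedTuple, Tuple


class SpeechPattern(NamedTuple):
    """Mirrors one row of Speech[m][n]: n=0 rise, n=1 fall, n=2 text."""
    rise: float
    fall: float
    text: str


def build_speech_array(
    segments: Iterable[Tuple[float, float, str]]
) -> List[SpeechPattern]:
    """Attach timestamps to recognized text and order rows by rise time."""
    rows = [SpeechPattern(rise, fall, text) for rise, fall, text in segments]
    for row in rows:
        # A segment cannot end before it starts; reject malformed input
        if row.fall < row.rise:
            raise ValueError(f"fall time precedes rise time: {row}")
    return sorted(rows)  # tuples sort by rise first, then fall, then text
```

Because `SpeechPattern` is a tuple, `speech[m][0]` and `speech[m].rise` both work, which keeps the code close to the array notation in the text.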

次に、映像認識プログラム及び映像パタン認識用データにより、撮像された映像データ全てから静止時間帯を検出する。次に静止時間帯に挟まれた映像箇所を動作パタンとして切り出し、動作パタンデータとして例えばMove[ i ][ j ]( i = 動作パタン数、j=フレーム数)の配列データとしてメモリ領域に格納する。次に、各動作パタンデータの撮像時刻をタイムスタンプデータとして例えばMove_t[ i ][ p ][ t ] (i = 動作パタン数、p=０:映像パタン起点時刻、p = １：映像パタン終点時刻)のような配列データとして格納する。 Next, the video recognition program detects still periods in all of the captured video data using the video pattern recognition data. The video segments between still periods are then cut out as motion patterns and stored in a memory area as motion pattern data, for example as array data Move[i][j] (i = motion pattern number, j = frame number). The capture times of each motion pattern are then stored as time stamp data, for example as array data such as Move_t[i][p][t] (i = motion pattern number; p = 0: video pattern start time, p = 1: video pattern end time).
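The still-period segmentation described above can be sketched as follows. This is a minimal sketch under stated assumptions: each frame is reduced to a precomputed change score (for example, the mean absolute pixel difference from the previous frame), and `segment_motion`, `threshold`, and `min_still` are hypothetical names; the patent does not specify how still periods are detected. Unlike the strict "between still periods" wording, this sketch also keeps motion at the very start or end of a recording.

```python
from typing import List, Sequence, Tuple


def segment_motion(
    times: Sequence[float],
    diffs: Sequence[float],
    threshold: float,
    min_still: int,
) -> List[Tuple[int, int, float, float]]:
    """Split a recording into motion patterns separated by still periods.

    A run of at least `min_still` consecutive frames whose change score is
    below `threshold` counts as a still period. Returns one tuple per motion
    pattern: (first_frame, last_frame, start_time, end_time), i.e. the
    information kept in Move and Move_t.
    """
    n = len(diffs)
    quiet = [d < threshold for d in diffs]

    # Mark only quiet runs that are long enough to count as still periods.
    still = [False] * n
    k = 0
    while k < n:
        if quiet[k]:
            j = k
            while j < n and quiet[j]:
                j += 1
            if j - k >= min_still:
                for q in range(k, j):
                    still[q] = True
            k = j
        else:
            k += 1

    # Every maximal run of non-still frames becomes one motion pattern.
    patterns = []
    k = 0
    while k < n:
        if still[k]:
            k += 1
            continue
        j = k
        while j < n and not still[j]:
            j += 1
        patterns.append((k, j - 1, times[k], times[j - 1]))
        k = j
    return patterns
```

Requiring a minimum still length keeps a single quiet frame in the middle of a gesture from splitting it into two patterns.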

さらに、抽出された動作パタンどうしで類似しているパタンを検出する。類似しているパタンが検出された場合には、例えば、Move_a[ v ][ w ](v = 類似パタン数、w = 類似パタンである動作パタン番号(i))といった配列形式で格納される（Ｓ１１０５）。 Furthermore, motion patterns that resemble one another are detected among the extracted motion patterns. When similar patterns are detected, they are stored in an array format such as Move_a[v][w] (v = similar pattern number, w = number (i) of a motion pattern belonging to that similar pattern) (S1105).
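The patent does not say how similarity between motion patterns is measured. Purely as an assumption, the sketch below represents each motion pattern by a fixed-length feature vector and groups patterns greedily by Euclidean distance, which yields groups comparable to Move_a[v][w]. `group_similar` and `max_dist` are hypothetical.

```python
import math
from typing import List, Sequence


def group_similar(
    features: Sequence[Sequence[float]], max_dist: float
) -> List[List[int]]:
    """Greedily group motion patterns whose feature vectors are close.

    Each pattern joins the first group whose representative (its first
    member) lies within max_dist; otherwise it starts a new group. Only
    groups with two or more members are returned, since a lone pattern has
    no similar partner.
    """
    groups: List[List[int]] = []
    for i, f in enumerate(features):
        for g in groups:
            if math.dist(features[g[0]], f) <= max_dist:
                g.append(i)
                break
        else:
            groups.append([i])
    return [g for g in groups if len(g) > 1]
```

Greedy grouping is order-dependent; a clustering method would be more robust, but this keeps the idea visible.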

さらに、教師の音声パタンデータと映像パタンデータについては、映像パタンが出現した時間帯と同時間帯に音声パタンが出現している場合には、ユーザメディア間関連情報として、例えばM_S_A[ s ][ q ](s = 関連情報数、q = １：音声パタン番号、q = ２：映像パタン番号)のような配列形式に格納される（Ｓ１１０６〜Ｓ１１０８）。 In addition, for the teacher's voice pattern data and video pattern data, when a voice pattern appears in the same time period as a video pattern, the pair is stored as user inter-media relation information, for example in an array format such as M_S_A[s][q] (s = relation information number; q = 1: voice pattern number, q = 2: video pattern number) (S1106 to S1108).
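Finding the M_S_A pairs amounts to an interval-overlap test between voice pattern periods and video pattern periods. The sketch below assumes that "the same time period" means the two intervals overlap by a nonzero amount; a stricter rule, such as requiring containment, would also be consistent with the text. `relate_media` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_time, end_time)


def relate_media(
    speech: Sequence[Interval], motion: Sequence[Interval]
) -> List[Tuple[int, int]]:
    """Return (voice_pattern_no, video_pattern_no) pairs whose periods overlap."""
    pairs = []
    for si, (s0, s1) in enumerate(speech):
        for mi, (m0, m1) in enumerate(motion):
            # Intervals that merely touch at an endpoint are not counted
            if s0 < m1 and m0 < s1:
                pairs.append((si, mi))
    return pairs
```

The nested loop is quadratic, which is acceptable for one lesson's worth of patterns; a sweep over sorted intervals would scale better.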

例えば、図１４の静止時間帯に挟まれた時間Ｔ１〜Ｔ２における受講者の動作パタンデータ（１）、静止時間帯に挟まれた時間Ｔ３〜Ｔ４における教師の動作パタンデータ（１）及び時間Ｔ５〜Ｔ６における教師の動作パタンデータ（２）が、それぞれ類似している動作パタンとして以前にも現れているものであるとき、類似パタンである動作パタンとして抽出、格納される。また、映像パタンが出現した時間Ｔ１〜Ｔ２、時間Ｔ３〜Ｔ４、及び時間Ｔ５〜Ｔ６にそれぞれ類似している音声パタンが出現している場合には、それぞれユーザメディア間関連情報として記録される。 For example, in FIG. 14, the student's motion pattern data (1) in the period T1 to T2 between still periods, the teacher's motion pattern data (1) in the period T3 to T4 between still periods, and the teacher's motion pattern data (2) in the period T5 to T6 are each extracted and stored as similar patterns when a similar motion pattern has appeared before. Likewise, when similar voice patterns appear in the periods T1 to T2, T3 to T4, and T5 to T6 in which these video patterns appear, each is recorded as user inter-media relation information.

次に、図１２において、パターン抽出の処理を説明する。この処理を行うために、コミュニケーションパタン抽出プログラム１０１２０５を起動する（Ｓ１２０１）。ここでは上記音声認識プログラム、映像認識プログラムによって認識された音声データ及び映像データ、及びユーザデータテーブル７００に格納されているユーザのイベントデータを用いて教師と受講者の対話パタンを抽出するものとする。 Next, the pattern extraction process is described with reference to FIG. 12. To perform this process, the communication pattern extraction program 101205 is started (S1201). Here, the dialogue patterns between the teacher and the student are extracted using the voice data and video data recognized by the voice recognition program and the video recognition program, together with the users' event data stored in the user data table 700.

まず、音声データ間でのコミュニケーションパタンを抽出する。教師から得られたSpeech配列データと受講者から得られたSpeech配列データに関し、両者の発声時間帯を時系列にマッピングする。例えば、教師から得られたSpeech配列データを、T_Speech[ m_t ][ n_t ]とし、受講者から得られたSpeech配列データをS_Speech[ m_s ][ n_s ]とすると、T_Speech[ m_t ][ １ ] とS_Speech[ m_s ][ １ ]の時刻を比較し、出現時刻が早いパタンデータから時系列にマッピングする。例えば、Speech_t_a[ s_１ ][ s_２ ][ s_３ ](s_１=総音声パタン数、s_２=０（教師による音声）、s_２=１（受講者による音声）、 s_３=０（開始時刻）、s_３=１（終了時刻）)といった配列形式で格納する。 First, communication patterns within the voice data are extracted. The utterance periods in the Speech array data obtained from the teacher and in the Speech array data obtained from the student are mapped onto a single timeline. For example, if the teacher's Speech array data is T_Speech[m_t][n_t] and the student's is S_Speech[m_s][n_s], the times T_Speech[m_t][1] and S_Speech[m_s][1] are compared, and the pattern data are mapped in time series starting from the earliest appearance time. The result is stored in an array format such as Speech_t_a[s_1][s_2][s_3] (s_1 = total number of voice patterns; s_2 = 0: teacher's voice, s_2 = 1: student's voice; s_3 = 0: start time, s_3 = 1: end time).
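Mapping both speakers' utterances onto one timeline is a merge of two time-ordered lists. This sketch orders utterances by their start (rise) time and tags each row with 0 for the teacher and 1 for the student, mirroring s_2 in Speech_t_a; the function name and tuple layout are assumptions, and each input list is assumed to be sorted already.

```python
import heapq
from typing import List, Sequence, Tuple

TEACHER, STUDENT = 0, 1  # corresponds to s_2 = 0 / s_2 = 1 in Speech_t_a


def merge_speech(
    teacher: Sequence[Tuple[float, float]],
    student: Sequence[Tuple[float, float]],
) -> List[Tuple[int, float, float]]:
    """Interleave two speakers' (start, end) utterances by start time.

    Each output row is (speaker, start, end), i.e. one Speech_t_a[s_1] entry.
    """
    # Put the start time first so heapq.merge orders rows by it
    t_rows = ((start, TEACHER, end) for start, end in teacher)
    s_rows = ((start, STUDENT, end) for start, end in student)
    return [(who, start, end) for start, who, end in heapq.merge(t_rows, s_rows)]
```

`heapq.merge` avoids re-sorting the combined list because each speaker's data is already in time order.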

さらに、例えば、受講者の音声パタンデータと隣接する教師の音声パタンデータの間の無音時間長を算出し、さらに算出された無音時間長を時系列に、例えば、Blank_t[ a ](a = ブランク数)といった配列形式で格納する。 Furthermore, for example, the length of the silence between a student voice pattern and the adjacent teacher voice pattern is calculated, and the calculated silence lengths are stored in time series in an array format such as Blank_t[a] (a = blank number).
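Given a merged timeline, the silence lengths for Blank_t are the gaps between each utterance's end and the next utterance's start. Recording which speaker is on each side keeps the information needed for the later speaker-change filter. Clamping overlapping speech to a gap of zero is an assumption of this sketch, not something the patent states; `silence_lengths` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def silence_lengths(
    timeline: Sequence[Tuple[int, float, float]]
) -> List[Tuple[int, int, float]]:
    """Blank_t sketch: silence between consecutive utterances.

    timeline rows are (speaker, start, end) sorted by start time, with
    speaker 0 = teacher and 1 = student. Each result is
    (speaker_before, speaker_after, gap_seconds).
    """
    blanks = []
    for (who_a, _, end_a), (who_b, start_b, _) in zip(timeline, timeline[1:]):
        # Overlapping speech (interruptions) is treated as zero silence
        blanks.append((who_a, who_b, max(0.0, start_b - end_a)))
    return blanks
```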

例えば、図１４の２つの音声パタンデータ間の時間Ｔ２とＴ３の間は、無音時間すなわち受講者の発言したのに対して、次に教師が応答するまでの音声のブランクの期間である。 For example, the interval between times T2 and T3, between the two voice patterns in FIG. 14, is a silent period: the blank between the student's utterance and the teacher's next response.

次に、映像データ間でのコミュニケーションパタンを抽出する（Ｓ１２０２）。 Next, a communication pattern between video data is extracted (S1202).

映像情報についても音声情報の場合と同様に、教師の動作パタンと受講者の動作パタンを時系列にマッピングする。例えば、Move_t_a[ a_１ ][ a_２ ][ a_３ ](a_１=総動作パタン数、a_２=０（教師による動作）、a_２=１（受講者による動作）、a_３=０（開始時刻）、a_３=１（終了時刻）)といった配列形式で格納する（Ｓ１２０３）。 For video information, as for voice information, the teacher's motion patterns and the student's motion patterns are mapped in time series and stored in an array format such as Move_t_a[a_1][a_2][a_3] (a_1 = total number of motion patterns; a_2 = 0: teacher's motion, a_2 = 1: student's motion; a_3 = 0: start time, a_3 = 1: end time) (S1203).

次に、上記で行ったパタン抽出結果に基いてメディア内のコミュニケーション状況を抽出する。Speech_t_aに関し、隣接した音声パタンのs_２の値が異なっている場合で、音声パタンが同一であるものが、例えば、３回以上検出された場合には、同パタンを音声コミュニケーションパタンとみなして、C_S_P[ c_１ ][ c_２ ](c_１ = パタン数、c_２ = ０（教師音声パタン）、c_２ = １（受講者音声パタン）)といった配列形式で格納する。また、無音時間長については、Blank_t[ a ]について、Blank_tの両端に対応する音声パタンについてs_２の値が異なっている場合のみを抽出し、s_２が０から１の場合、s_２が１から０の場合に毎にブランクの平均値と最大値、最小値を算出する。それぞれ、例えば、B_０_１[b_１][b_２][b_３]（b_１=０（s_２が０から１の場合)、b_１=１(s_２が１から０の場合)、b_２=０（平均値）、b_２=１（最大値）、b_２=２（最小値））といった配列形式で格納する（Ｓ１２０４）。 Next, the communication status within each medium is extracted on the basis of the pattern extraction results above. In Speech_t_a, when adjacent voice patterns have different s_2 values and the same voice patterns recur, for example, three or more times, they are regarded as a voice communication pattern and stored in an array format such as C_S_P[c_1][c_2] (c_1 = pattern number; c_2 = 0: teacher voice pattern, c_2 = 1: student voice pattern). For the silence lengths, only those Blank_t[a] entries whose two bounding voice patterns have different s_2 values are extracted, and the average, maximum, and minimum blank lengths are calculated separately for the case where s_2 changes from 0 to 1 and the case where it changes from 1 to 0. These are stored in an array format such as B_0_1[b_1][b_2][b_3] (b_1 = 0: s_2 changes from 0 to 1, b_1 = 1: s_2 changes from 1 to 0; b_2 = 0: average, b_2 = 1: maximum, b_2 = 2: minimum) (S1204).
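The two steps of S1204 can be sketched as follows: counting recurring speaker-change pairs to obtain C_S_P candidates, and computing per-direction (average, maximum, minimum) silence statistics in the spirit of B_0_1. Treating equal recognized text as "the same voice pattern", the default threshold of three, and the dictionary keyed by (speaker_before, speaker_after) are assumptions made for illustration; the function names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def voice_comm_patterns(
    timeline: Sequence[Tuple[int, str]], min_count: int = 3
) -> List[Tuple[str, str]]:
    """C_S_P sketch: (teacher_text, student_text) pairs that occur next to
    each other, with a change of speaker, at least min_count times.

    timeline rows are (speaker, recognized_text) in time order;
    speaker 0 = teacher, 1 = student.
    """
    counts: Counter = Counter()
    for (who_a, text_a), (who_b, text_b) in zip(timeline, timeline[1:]):
        if who_a != who_b:
            # Store teacher text first, matching c_2 = 0 / c_2 = 1
            pair = (text_a, text_b) if who_a == 0 else (text_b, text_a)
            counts[pair] += 1
    return [pair for pair, n in counts.items() if n >= min_count]


def blank_stats(
    blanks: Sequence[Tuple[int, int, float]]
) -> Dict[Tuple[int, int], Tuple[float, float, float]]:
    """B_0_1 sketch: (average, maximum, minimum) silence per direction.

    Keys are (0, 1) for teacher->student (b_1 = 0) and (1, 0) for
    student->teacher (b_1 = 1); same-speaker gaps are ignored.
    """
    by_dir: Dict[Tuple[int, int], List[float]] = {}
    for before, after, gap in blanks:
        if before != after:
            by_dir.setdefault((before, after), []).append(gap)
    return {d: (mean(g), max(g), min(g)) for d, g in by_dir.items()}
```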

次に、Move_t_aに関し、隣接した動作パタンのa_２の値が異なっている場合で、動作パタンが同一であるものが、例えば、３回以上検出された場合には、同パタンを動作コミュニケーションパタンとみなして、C_A_P[ c_１ ][ c_２ ](c_１ = パタン数、c_２ = ０（教師動作パタン）、c_２ = １（受講者動作パタン）)といった配列形式で格納する（Ｓ１２０５）。 Next, in Move_t_a, when adjacent motion patterns have different a_2 values and the same motion patterns recur, for example, three or more times, they are regarded as a motion communication pattern and stored in an array format such as C_A_P[c_1][c_2] (c_1 = pattern number; c_2 = 0: teacher motion pattern, c_2 = 1: student motion pattern) (S1205).

次に、上記で行ったパタン抽出結果に基いてメディア間のコミュニケーション状況を抽出する（Ｓ１２０６）。メディア間の場合は、メディア内の場合と同様に、受講者の動作パタンと教師の音声パタンが隣接して出現している場合が、例えば３回以上あった場合には、これをメディア間コミュニケーションパタンとして、例えば、In_M_P[p_１ ][p_２ ](P_１=パタン数、P_２=メディア数、P_３=０（音声パタン）、P_３=１（動作パタン）)といった配列形式で格納する（Ｓ１２０７）。 Next, the communication status between media is extracted on the basis of the pattern extraction results above (S1206). As in the within-medium case, when a student motion pattern and a teacher voice pattern appear next to each other, for example, three or more times, the pair is stored as an inter-media communication pattern in an array format such as In_M_P[p_1][p_2] (p_1 = pattern number, p_2 = medium number, p_3 = 0: voice pattern, p_3 = 1: motion pattern) (S1207).
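In_M_P can be sketched the same way on a timeline that mixes media: count how often a student motion pattern is immediately followed by a teacher voice pattern. The event tuple layout, the "motion"/"speech" labels, and the function name are hypothetical; the threshold of three follows the example in the text.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# (start_time, speaker, media, pattern_id); speaker 0 = teacher, 1 = student
Event = Tuple[float, int, str, str]


def cross_media_patterns(
    events: Sequence[Event], min_count: int = 3
) -> List[Tuple[str, str]]:
    """In_M_P sketch: (student_motion_id, teacher_speech_id) pairs that occur
    back to back at least min_count times on the merged timeline."""
    ordered = sorted(events)  # time order across both media
    counts: Counter = Counter()
    for (_, who_a, media_a, id_a), (_, who_b, media_b, id_b) in zip(
        ordered, ordered[1:]
    ):
        if (who_a, media_a) == (1, "motion") and (who_b, media_b) == (0, "speech"):
            counts[(id_a, id_b)] += 1
    return [pair for pair, n in counts.items() if n >= min_count]
```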

次に、図１３で、ユニークデジタルインストラクタ用のキャラクタ生成について説明する。最初に、キャラクタ生成プログラム１０１２０６を起動する（Ｓ１３０１）。例えば、手話アニメーション生成システムプログラム等を用いてキャラクタによる動作生成を行う。ここでは上記で抽出された教師の動作パタンを読み出し（Ｓ１３０２）、同システムにおける動作特徴量とし、アニメーション基本動作データとして格納する（Ｓ１３０３）。例えば、動作パタンMove_t[ ０ ][ １ ][ １ ]に対して生成されるアニメーション基本動作データは、Anim[ ０ ][ １ ][ １ ][ a_４ ](a_４：動作ポイント数)として配列形式で記述され格納される。 Next, character generation for the unique digital instructor is described with reference to FIG. 13. First, the character generation program 101206 is started (S1301). Character motion is generated using, for example, a sign language animation generation system program. Here, the teacher's motion patterns extracted above are read (S1302), converted into motion feature quantities for that system, and stored as basic animation motion data (S1303). For example, the basic animation motion data generated for the motion pattern Move_t[0][1][1] is described and stored in array format as Anim[0][1][1][a_4] (a_4: number of motion points).

次に、上記アニメーション基本動作データを、オリジナルデジタルインストラクタ、又は、前回までに作成されたユニークデジタルインストラクタと対応付けるための編集を行い、新たなユニークデジタルインストラクタとして生成し、登録する（Ｓ１３０４）。 Next, the basic animation motion data is edited so as to associate it with the original digital instructor or with the unique digital instructor created up to the previous session, and the result is generated and registered as a new unique digital instructor (S1304).

例えば、図１４の時間Ｔ１〜Ｔ２における受講者の音声や動作のパタンデータ（１）の類似パタンの後に、時間が略Ｔ３〜Ｔ４のタイミングで、教師の音声や動作のパタンデータ（１）と（２）の類似パタンが時間が略Ｔ５〜Ｔ６のタイミングで現れるのが抽出、格納されている場合、受講者の音声や動作のパタンデータ（１）を抽出したら、教師の動作のパタンデータ（１）と（２）に類似するアニメーションを生成すると共に、教師の音声に対応する音声を合成する。 For example, suppose it has been extracted and stored that, after a pattern similar to the student's voice and motion pattern data (1) in the period T1 to T2 of FIG. 14, patterns similar to the teacher's voice and motion pattern data (1) and (2) appear at roughly T3 to T4 and T5 to T6. In that case, when the student's voice and motion pattern data (1) is extracted, an animation resembling the teacher's motion pattern data (1) and (2) is generated, and speech corresponding to the teacher's voice is synthesized.

このように、動画像を解析して教師や受講者（利用者）の動作パタン抽出を行い、上記音声を解析して利用者の発声パタン抽出を行い、抽出された動作パタンと発声パタンの少なくとも２セットについてそれぞれ時系列に対応付けて対話パタンの抽出を行い、利用者から別途取得される音声や動画像と同一利用者から抽出した動作パタンと発声パタンとの類似度を判定し、類似度が高ければ、抽出された対話パタンの相手利用者の動作パタンをアニメーションキャラクタによって動画像として出力し、同時に抽出された対話パタンの相手利用者の発声パタンを音声出力するようにする。 In this way, the moving images are analyzed to extract the motion patterns of the teacher or student (user), and the voice is analyzed to extract the user's utterance patterns. Dialogue patterns are extracted by associating at least two sets of extracted motion and utterance patterns with each other in time series. The similarity between voice or moving images acquired separately from a user and the motion and utterance patterns previously extracted from that same user is then judged. If the similarity is high, the partner user's motion pattern in the extracted dialogue pattern is output as a moving image by the animation character, and at the same time the partner user's utterance pattern in that dialogue pattern is output as speech.

なお、ユニークデジタルインストラクタの生成は、教師のＰＣで行うことも出来る。 The unique digital instructor can also be generated on the teacher's PC.

キャラクタ生成プログラム１０１２０６に、テキスト、画像等を入力すると、既に選択されているアニメーションキャラクタに対し、選択された対話パタンに対応して、テキストを音声に変換し、テキストに対応する対話パタン中の動作パタンをアニメーションキャラクタによって表示する。さらに指示された位置や時刻或いは時間帯に画像を表示することによって、例えば教師が自分を模写したキャラクタで自分の講義コンテンツを作成する場合、教師自ら音声を録音する必要がなく、また１度作成したコンテンツに対して修正を加えることができるようになる。 When text, images, or the like are input to the character generation program 101206, the text is converted into speech for the already selected animation character in accordance with the selected dialogue pattern, and the motion pattern of the dialogue pattern corresponding to the text is performed by the animation character. Furthermore, because images can be displayed at a specified position, time, or time period, a teacher who creates lecture content with a character modeled on himself or herself, for example, does not need to record his or her own voice, and content once created can later be revised.

一例として、受講者の動作パタンデータ及び音声パタンデータ（１）が、「画面６００上のテキスト表示個所をマウスで指しながら、「この意味がわかりません。」と質問する。」ものであり、これに応答する教師の動作のパタンデータ及び音声パタンデータ（１）と（２）が、「頭に手をやりながら、一定の間隔を置いて、「えーと、−−えーと」と繰り返すものである。」場合には、教師の動作のパタンに対応するアニメーションとして、例えば、図１９Ａに示すようなオリジナルデジタルインストラクタのアニメーションに加えて、図１９Ｂに示すような特徴動作パタン対応のアニメーションを生成すると共に、ユーザメディア間関連情報としてこのアニメーションに連動する合成音声「えーと、−−えーと」を記録する。 As an example, suppose the student's motion pattern data and voice pattern data (1) consist of the student pointing with the mouse at a text display location on the screen 600 and asking, "I don't understand what this means," and that the teacher's responding motion pattern data and voice pattern data (1) and (2) consist of the teacher putting a hand to his or her head and repeating "Um... um" at regular intervals. In this case, in addition to the animation of the original digital instructor shown in FIG. 19A, an animation corresponding to the teacher's characteristic motion pattern, as shown in FIG. 19B, is generated, and the synthesized voice "Um... um" linked to this animation is recorded as user inter-media relation information.

このようにして生成されたデジタルインストラクタは、教師と各受講者との関係で生成されるので、同じ教師であっても受講者毎に異なったデジタルインストラクタとして生成し、レッスンが続く間成長を続ける。また、レッスンを繰り返すごとに、デジタルインストラクタが更新される。従って、予め用意されたオリジナルデジタルインストラクタは、受講者間で差が無いが、レッスンの進行と共に受講者固有のユニークデジタルインストラクタとなる。このようなユニークデジタルインストラクタは、受講者の個性を反映して生成されるので、受講者の関心をひきつけ、レッスンの効果を高めることができる。 Because a digital instructor generated in this way arises from the relationship between the teacher and each student, a different digital instructor is generated for each student even with the same teacher, and it keeps growing as the lessons continue. The digital instructor is updated each time a lesson is repeated. Thus the original digital instructor prepared in advance is identical for all students, but as lessons progress it becomes a unique digital instructor specific to each student. Because such a unique digital instructor reflects the student's individuality, it can hold the student's interest and make the lessons more effective.

次に、図１５で、上記の操作及び解析が終了した後に、受講者が、上記キャラクタ生成プログラムによって生成されたデジタルインストラクタを用いて学習する場合のフローについて説明する。まず受講者が受講者用ＰＣを起動する（Ｓ１４０１）。次に受講者がコミュニケーションプログラムを起動する（Ｓ１４０２）と、受講者ＰＣ画面に、例えば、「ただいまの時間、デジタルインストラクタが起動されますが、よろしいでしょうか。」というテキスト表示と共にYes/Noボタンが表示される（Ｓ１４０３）。受講者がYesボタンを押すと、コミュニケーションプログラムのサブプログラムであるユーザ−デジタルインストラクタコミュニケーションプログラム及びキャラクタ−ユーザ対話制御プログラムが起動され（Ｓ１４０４）、受講者画面６００が表示される。受講者画面には図１９Ａに示した画面の教師映像表示箇所６１０にキャラクタが表示される。ユーザが受講者用テキストに情報を入力しながら学習を行っているところを受講者用ＰＣのカメラにて撮像する（Ｓ１４０５）。撮像されたデータは、サーバに送信され、上記方法と同様に動作パタンが抽出される。抽出された動作パタンをC_A_P[ c_１ ][ c_２ ]内で検索し（Ｓ１４０６）、同動作パタンが検出された場合には、同動作パタンと対応付けられている教師動作パタンを抽出する（Ｓ１４０７）。さらに同動作パタンに対応付けられているAnimデータを検索し、検索されたAnimデータを用いて画面上に表示されたキャラクタを動作させる。同時に、In_M_Pでも検索を行い、同一パタンが検出された場合には、対応する音声パタンのテキストを用い音声合成プログラムによって音声合成データを生成し、受講者用ＰＣのスピーカから出力する（Ｓ１４０８〜Ｓ１４１０）。 Next, with reference to FIG. 15, the flow by which a student learns with the digital instructor generated by the character generation program, after the above operations and analysis are complete, is described. First, the student starts the student PC (S1401). When the student then starts the communication program (S1402), the student PC screen shows, for example, the message "The digital instructor will now be started. Is that OK?" with Yes/No buttons (S1403). When the student presses the Yes button, the user-digital instructor communication program and the character-user interaction control program, which are subprograms of the communication program, are started (S1404), and the student screen 600 is displayed. On the student screen, the character is displayed at the teacher video display location 610 of the screen shown in FIG. 19A. The student PC's camera captures the user while the user studies and enters information into the student text (S1405). The captured data is transmitted to the server, and motion patterns are extracted in the same way as described above. The extracted motion pattern is searched for in C_A_P[c_1][c_2] (S1406); if it is found, the teacher motion pattern associated with it is extracted (S1407). The Anim data associated with that motion pattern is then retrieved and used to animate the character displayed on the screen. At the same time, In_M_P is also searched; if the same pattern is found, speech synthesis data is generated by the speech synthesis program from the text of the corresponding voice pattern and output from the speaker of the student PC (S1408 to S1410).
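The run-time lookup in S1406 to S1410 can be sketched with plain dictionaries standing in for C_A_P, the Anim data, and In_M_P. `InstructorModel`, `respond_to_motion`, and the `play`/`speak` callbacks are hypothetical placeholders for the character-user interaction control program, the animation player, and the speech synthesis program.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class InstructorModel:
    """Hypothetical lookup tables produced by the extraction steps."""
    c_a_p: Dict[str, str] = field(default_factory=dict)        # student motion -> teacher motion
    anim: Dict[str, List[str]] = field(default_factory=dict)   # teacher motion -> Anim keyframes
    in_m_p: Dict[str, str] = field(default_factory=dict)       # student motion -> teacher speech text


def respond_to_motion(
    model: InstructorModel,
    student_motion: str,
    play: Callable[[List[str]], None],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Drive the character for one detected student motion pattern.

    Returns the teacher motion pattern found in C_A_P, or None if the
    student's motion matched nothing there.
    """
    teacher_motion = model.c_a_p.get(student_motion)
    if teacher_motion is not None and teacher_motion in model.anim:
        play(model.anim[teacher_motion])   # animate the on-screen character
    text = model.in_m_p.get(student_motion)
    if text is not None:
        speak(text)                        # hand the text to speech synthesis
    return teacher_motion
```

Passing `play` and `speak` as callbacks keeps the lookup logic independent of any particular animation or speech engine.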

また、受講者が音声を発声した場合には、上記の音声処理と同様に、音声パタンを検出し、検出された音声パタンをC_S_Pにおいて検索する。ここで同一の音声パタンであるテキストが検出された場合には、本パタンに対応付けられている教師音声パタンを抽出し、同音声パタンであるテキスト情報を音声合成プログラムにより音声合成データを生成し、受講者用ＰＣのスピーカから出力する。同時にIn_M_Pでも検索を行い、同一パタンが検出された場合には、対応する動作パタンデータを用いAnimデータを検索し、受講者用ＰＣに画面表示する。また音声出力のタイミングは、対応するB_０_１[b_１][b_２][b_３]の時間長によって設定する。 When the student speaks, the voice pattern is detected in the same way as in the voice processing above, and the detected voice pattern is searched for in C_S_P. If text of the same voice pattern is found, the teacher voice pattern associated with it is extracted, and the speech synthesis program generates speech synthesis data from that voice pattern's text, which is output from the speaker of the student PC. At the same time, In_M_P is also searched; if the same pattern is found, the Anim data is retrieved using the corresponding motion pattern data and displayed on the student PC screen. The timing of the voice output is set according to the time length in the corresponding B_0_1[b_1][b_2][b_3].
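The patent sets the output timing from B_0_1 without saying which statistic is used. Assuming the average student-to-teacher silence (b_1 = 1, b_2 = 0), the start time of the synthesized reply could be computed as below; `response_time`, the dictionary layout, and the fallback `default_gap` are illustrative assumptions.

```python
from typing import Dict, Tuple

# Key for b_1 = 1 in B_0_1: student speech followed by teacher speech
STUDENT_TO_TEACHER = (1, 0)


def response_time(
    student_end: float,
    b_0_1: Dict[Tuple[int, int], Tuple[float, float, float]],
    default_gap: float = 1.0,
) -> float:
    """Time at which the character should start speaking.

    b_0_1 maps a speaker-change direction to (average, maximum, minimum)
    silence in seconds. Uses the average student->teacher silence learned
    from real lessons, or default_gap (an assumed value) if none exists.
    """
    stats = b_0_1.get(STUDENT_TO_TEACHER)
    gap = stats[0] if stats else default_gap
    return student_end + gap
```

Using the learned pause, rather than replying instantly, is what lets the character reproduce the teacher's own conversational rhythm.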

図１５の音声合成データを生成する処理（Ｓ１４０９）の詳細を、図１６に示す。 FIG. 16 shows the details of the process of FIG. 15 that generates the speech synthesis data (S1409).

まず、In_M_Pでも検索を行い、同一パタンが検出された場合には、対応する音声パタンのテキストを用いて音声合成プログラムによって音声合成データを生成し、受講者用ＰＣのスピーカから出力する（Ｓ１５０１）。次に、検出された音声パタンをC_S_Pにおいて検索する（Ｓ１５０２）。さらに、同一の音声パタンであるテキストが検出された場合には、本パタンに対応付けられている教師音声パタンを抽出し、同音声パタンであるテキスト情報を音声合成プログラムにより音声合成データを生成する（Ｓ１５０３）。そして、音声合成データを受講者用ＰＣのスピーカから出力する（Ｓ１５０４）。 First, In_M_P is also searched; if the same pattern is found, the speech synthesis program generates speech synthesis data from the text of the corresponding voice pattern, which is output from the speaker of the student PC (S1501). Next, the detected voice pattern is searched for in C_S_P (S1502). If text of the same voice pattern is found, the teacher voice pattern associated with it is extracted, and the speech synthesis program generates speech synthesis data from that voice pattern's text (S1503). The speech synthesis data is then output from the speaker of the student PC (S1504).

また、音声合成データを生成する処理（Ｓ１４１０）の詳細を、図１７に示す。まず、In_M_Pでも検索を行い、同一パタンが検出された場合には、対応する動作パタンデータを用いAnimデータを検索し、受講者用ＰＣに画面表示する（Ｓ１５０６）。音声出力のタイミングは、対応するB_０_１[b_１][b_２][b_３]の時間長によって設定する（Ｓ１５０７）。 FIG. 17 shows the details of the speech synthesis data generation process (S1410). First, In_M_P is also searched; if the same pattern is found, the Anim data is retrieved using the corresponding motion pattern data and displayed on the student PC screen (S1506). The voice output timing is set according to the time length in the corresponding B_0_1[b_1][b_2][b_3] (S1507).

例えば、図１８において、受講者から入力される音声情報、映像情報を解析し、時間Ｔ１１〜Ｔ１２における受講者の動作や音声のパタンデータ（１）を抽出した場合、これに対応する対話パタンとして登録されている対話パタンデータから、時間Ｔ１３〜Ｔ１４における教師の動作や音声のパタンデータ（１）及び時間Ｔ１５〜Ｔ１６における教師の音声パタンデータ（２）、及び時間Ｔ２とＴ３の間のユーザメディア間関連情報を抽出する。そして、選択された対話パタンデータセット中の動作パタンを、対象となるアニメーションキャラクタの動作として受講者の画面６００のキャラクタ表示個所６１０に表示し、さらに該対話パタンデータセット中の音声パタンを、対象となるアニメーションキャラクタの音声として受講者の端末から音声表示することによって、発話パタンに基いて該キャラクタの音声として合成音声を出力することができるようになる。 For example, in FIG. 18, when the voice and video information input by the student is analyzed and the student's motion and voice pattern data (1) in the period T11 to T12 is extracted, the following are taken from the dialogue pattern data registered as corresponding to it: the teacher's motion and voice pattern data (1) in the period T13 to T14, the teacher's voice pattern data (2) in the period T15 to T16, and the user inter-media relation information between times T2 and T3. The motion pattern in the selected dialogue pattern data set is then displayed as the motion of the target animation character at the character display location 610 of the student's screen 600, and the voice pattern in that data set is output from the student's terminal as the voice of the target animation character. In this way, synthesized speech based on the utterance pattern can be output as the character's voice.

これにより、受講者は、リアルインストラクタによるレッスンに近い雰囲気で、レッスンに興味を持って臨むことが出来る。また、レッスン中に教師や受講者が無意識に繰り返す動作をアニメーションキャラクタに投影させることで、普段のレッスン中にリアルに行っている対話状況をデジタルインストラクタに反映し、教師への親しみを持たせ、受講者の学習意欲を長期間に亘って維持させることが期待できる。 As a result, the student can approach the lesson with interest, in an atmosphere close to that of a lesson with a real instructor. In addition, by projecting onto the animation character the movements that the teacher and student repeat unconsciously during lessons, the dialogue that actually takes place in ordinary lessons is reflected in the digital instructor. This can be expected to make the student feel closer to the teacher and to sustain the student's motivation to learn over a long period.

また、受講者とアニメーションキャラクタ間における時間的な対話パタンを比較することによって、受講者が実は望んでいる対話パタンを抽出し、リアルインストラクタのレッスンに反映させることもできる。 In addition, by comparing the temporal dialogue patterns between the student and the animation character, the dialogue patterns that the student actually wants can be extracted and reflected in the real instructor's lessons.

なお、上記教師の動作のパタンデータ（２）の後に、教師が受講者の「画面６０２の教師データ表示個所６０２に答えを記入しながら説明する」パターン（３Ａ）と、「それは後で調べて回答しますと応答する」パターン（３Ｂ）とがあったと仮定した場合、パターン（３Ａ）については既に述べた方法により類似パターンを抽出して応答処理すればよい。一方、後者については、例えばサーバやＰＣに「Ｑ&Ａ」の機能を付設しておき、上記受講者の質問内容を教師用ＰＣに送り、これに対して教師が空き時間を利用して「Ｑ&Ａ」を検索し、回答を作成して別途Ｅメール等で受講者用ＰＣに送信したり、あるいは次回のリアルインストラクタとしてのレッスン時に直接回答するようにすればよい。 Suppose further that, after the teacher's motion pattern data (2), the teacher showed either a pattern (3A) of "explaining while writing the answer in the teacher data display location 602 of the student's screen 602" or a pattern (3B) of "responding, 'I will look into it and answer later.'" For pattern (3A), a similar pattern is extracted and the response is handled by the method already described. For pattern (3B), a "Q&A" function can, for example, be provided on the server or PC: the student's question is sent to the teacher PC, and the teacher uses spare time to search the "Q&A", prepares an answer, and either sends it separately to the student PC by e-mail or the like or answers directly at the next lesson as the real instructor.

図２０は、学習スケジュール管理機能で得られた、一人の教師が受け持つ複数の受講者イ、ロ、−−の、学習履歴の例を示す図である。最初のリアルインストラクタによるレッスンの直後は、各受講者が共通のオリジナルデジタルインストラクタを使用して学習する。レッスンが進むにつれて、各受講者に対応したユニークデジタルインストラクタが生成され、それによる学習がなされる。 FIG. 20 shows an example of the learning histories, obtained by the learning schedule management function, of several students (A, B, and so on) taught by one teacher. Immediately after the first lesson with the real instructor, every student learns with the same original digital instructor. As lessons progress, a unique digital instructor is generated for each student and used for that student's learning.

このように、本発明によれば、動画像を解析して一組の利用者（受講者と教師)の動作パタン抽出を行い、上記音声を解析して利用者の発声パタン抽出を行い、抽出された動作パタンと発声パタンの少なくとも２セットについてそれぞれ時系列に対応付けて対話パタンの抽出を行い、利用者から別途取得される音声や動画像と同一利用者から抽出した動作パタンと発声パタンとの類似度を判定し、類似度が高ければ、抽出された利用者（受講者)の対話パタンに対する相手利用者(教師)の動作パタンをアニメーションキャラクタによって動画像として出力し、同時に抽出された対話パタンの相手利用者(教師)の発声パタンを音声出力する。これにより、例えば利用者である受講者が、出力されるアニメーションキャラクタの動作情報、音声情報により、リアルな教師との対話環境を保ちながら学習を行うことができ、受講者の学習効果の向上を見込むことができる。 As described above, according to the present invention, moving images are analyzed to extract the motion patterns of a pair of users (a student and a teacher), and the voice is analyzed to extract the users' utterance patterns. Dialogue patterns are extracted by associating at least two sets of extracted motion and utterance patterns with each other in time series. The similarity between voice or moving images acquired separately from a user and the motion and utterance patterns extracted from that same user is judged; if the similarity is high, the partner user's (teacher's) motion pattern corresponding to the extracted dialogue pattern of the user (student) is output as a moving image by the animation character, and at the same time the partner user's (teacher's) utterance pattern in that dialogue pattern is output as speech. As a result, a user such as a student can learn while keeping a realistic dialogue with the teacher, through the motion and voice of the output animation character, and an improvement in the student's learning effect can be expected.

本発明によれば、受講者と教師との間の時間的な対話パタンを抽出することができ、さらには、受講者とアニメーションキャラクタ間における時間的な対話パタンを比較することによって、受講者が実は望んでいる対話パタンを抽出することもできるようになる。 According to the present invention, the temporal dialogue patterns between the student and the teacher can be extracted; furthermore, by comparing the temporal dialogue patterns between the student and the animation character, the dialogue patterns that the student actually wants can also be extracted.

また、動作パタンと該発話パタンを同期しているデータとして登録する、すなわちこれらをユーザメディア間関連情報として登録することによって、アニメーションキャラクタを該動作パタンに基いて制御し、パタンとして記述された動作を行うことができるようになり、さらには発話パタンに基いて該キャラクタの音声として合成音声を出力することができるようになる。 In addition, by registering the motion pattern and the utterance pattern as synchronized data, that is, as user inter-media relation information, the animation character can be controlled on the basis of the motion pattern to perform the motion that the pattern describes, and synthesized speech based on the utterance pattern can be output as the character's voice.

また、利用者から入力される音声情報、映像情報を解析し、解析された結果に対応する対話パタンを登録されている対話パタンデータから選択し、選択された対話パタンデータセット中の動作パタンを、対象となるアニメーションキャラクタの動作として画面表示し、さらに該対話パタンデータセット中の音声パタンを、対象となるアニメーションキャラクタの音声として音声表示することによって、発話パタンに基いて該キャラクタの音声として合成音声を出力することができるようになる。 In addition, the voice and video information input by the user is analyzed, the dialogue pattern corresponding to the analysis result is selected from the registered dialogue pattern data, the motion pattern in the selected dialogue pattern data set is displayed on screen as the motion of the target animation character, and the voice pattern in that data set is output as the voice of the target animation character. In this way, synthesized speech based on the utterance pattern can be output as the character's voice.

また、利用者間で対話が実施される度に登録され、登録時点で対話パタンが再抽出され、さらに対話パタンの類似度及び頻度を抽出することによって、例えば、教師と受講者間の変化していく過程に追随して、最近の両者における対話パタンを抽出したり、過去の対話パタンを再現したりすることができ、現状の学習状況に応じて利用する対話パタンを変動させることができるようになる。 In addition, the data is registered each time a dialogue takes place between users, the dialogue patterns are re-extracted at registration time, and the similarity and frequency of the dialogue patterns are extracted. This makes it possible, for example, to follow the changing relationship between teacher and student, extracting their most recent dialogue patterns or reproducing past ones, so that the dialogue patterns used can be varied according to the current learning situation.

A diagram showing an example of the configuration of the apparatus in one embodiment of the present invention.
A diagram showing the functional blocks of the apparatus configuration of FIG. 1A.
A diagram showing an example of the internal configuration of the education information management server in one embodiment of the present invention.
A diagram showing an example of the configuration of the teacher PC in one embodiment of the present invention.
A diagram showing an example of the configuration of the student PC in one embodiment of the present invention.
A diagram showing an example of the configuration of the video information storage server in one embodiment of the present invention.
A diagram showing an example of the structure of the user data table that stores event data input by the user, the user's audio data, and the user's video data.
A diagram showing an example of the structure of the communication program.
A diagram showing an example of the teacher PC screen.
A diagram showing an example of the student PC screen.
A diagram showing an example of the flow of learning support processing by the teacher in one embodiment of the present invention.
A diagram showing an example of the flow of data recognition processing in one embodiment of the present invention.
A diagram showing an example of the flow of pattern extraction processing in one embodiment of the present invention.
A diagram showing an example of the character generation processing flow for the unique digital instructor in one embodiment of the present invention.
An explanatory diagram of the motion pattern data and voice pattern data of the student and the teacher.
A diagram showing an example of the flow when a student learns using a digital instructor.
A diagram showing details of the process (S1409) of FIG. 15 that generates and outputs speech synthesis data.
A diagram showing details of the process (S1410) of FIG. 15 that generates and outputs speech synthesis data.
A diagram showing an example of the relationship between the student's voice pattern data and the teacher's synthesized speech when the student learns using a digital instructor.
A diagram showing an example of the animation of the original digital instructor.
A diagram showing an example of the animation corresponding to the teacher's characteristic motion pattern.
A diagram showing an example of the learning histories, obtained by the learning schedule management function, of multiple students taught by one teacher.

Explanation of symbols

101 ... Education information management server, 102 ... Teacher PC, 103 ... Student PC, 104 ... Video information storage server, 105 ... Communication network, 10201 ... Speaker, 10202 ... Camera, 10203 ... Microphone, 101A ... Dialogue communication function between teacher and student, 101B ... Text display and editing function, 101C ... Dialogue pattern extraction function, 101D ... Unique digital instructor generation function, 101E ... Learning support function by the teacher (real instructor), 101F ... Learning support function by the digital instructor, 101G ... Learning schedule management function, 1011 ... CPU, 1012 ... Memory, 1013 ... Hard disk, 101301 ... Motion pattern data, 101302 ... Speech recognition dictionary, 101303 ... Video pattern recognition data, 101304 ... Student text data, 101305 ... Question correct-answer data, 103107 ... Communication program, 500 ... Teacher screen, 600 ... Student screen, 610 ... Character display area, 700 ... User data table.

Claims

An education information management server that has a connection unit connected to a student-side terminal and a teacher-side terminal, and that assists a student, using the student-side terminal, in dialogue-based learning with a teacher or with a digital instructor corresponding to the teacher, the server comprising:
a storage unit;
dialogue pattern generation means for generating dialogue pattern data for the dialogue-based learning between the student and the teacher from student information and teacher information acquired from the student-side terminal and the teacher-side terminal;
registration means for registering the dialogue pattern data in the storage unit in time series;
a dialogue pattern extraction function for extracting a characteristic dialogue pattern from the dialogue patterns registered in the storage unit and registering it in the storage unit; and
a unique digital instructor generation function for generating, from among the characteristic dialogue patterns registered in the storage unit, the teacher's specific dialogue pattern in response to a specific dialogue pattern of the student as a digital instructor corresponding to the teacher, realized by an animation character and synthesized speech.

The education information management server according to claim 1, wherein the dialogue pattern extraction function has:
a function of extracting motion patterns and utterance patterns by analyzing the moving images and voices of the teacher and the student during learning, acquired from the teacher-side terminal and the student-side terminal; a function of extracting a dialogue pattern by associating at least two sets of an extracted motion pattern and an extracted utterance pattern with each other in time series; a function of determining the similarity between voices and moving images separately acquired from the teacher-side terminal and the student-side terminal and the extracted motion pattern and utterance pattern; and a function of registering the characteristic dialogue pattern in the storage unit when the similarity is equal to or greater than a predetermined value.
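The threshold test in this claim can be sketched as follows, assuming each extracted pattern and each separately acquired observation has been turned into a numeric feature vector. Cosine similarity, taking the best match over the observations, and the default threshold of 0.8 are all illustrative assumptions; the claim only requires "a predetermined value". `register_if_characteristic` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def register_if_characteristic(candidate, observations, store, threshold=0.8):
    """Append `candidate` to `store` when its best similarity to any observation meets `threshold`.

    Returns the best similarity found (0.0 if there are no observations).
    """
    best = max((cosine(candidate, o) for o in observations), default=0.0)
    if best >= threshold:
        store.append(candidate)
    return best
```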

The education information management server according to claim 1, wherein
the student information and the teacher information include utterance content, an utterance input time, and an utterance input end time;
the dialogue pattern extraction function
refers to the student information and the teacher information registered in the storage unit and, when it determines that the teacher's utterance was made after the student's utterance, records the teacher's utterance content together with the student's utterance content in the storage unit as a dialogue sub-pattern, and records the length of the silent interval from the student's utterance to the teacher's utterance in the storage unit as the inter-pattern dialogue time data of that dialogue sub-pattern; and
the unique digital instructor generation function
causes the animation character to respond to the student on behalf of the teacher based on the inter-pattern dialogue time data.
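The sub-pattern extraction in this claim can be sketched as below, assuming utterance records carry a speaker label, recognized content, and the input start and end times named in the claim. Matching the student's utterance by exact text in `respond` is a simplifying assumption, and `UtteranceRecord`, `SubPattern`, `extract_sub_patterns`, and `respond` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UtteranceRecord:
    speaker: str    # "student" or "teacher"
    content: str    # recognized utterance content
    start: float    # utterance input time (s)
    end: float      # utterance input end time (s)


@dataclass
class SubPattern:
    student_content: str
    teacher_content: str
    silence: float  # inter-pattern dialogue time data (s)


def extract_sub_patterns(records):
    """Pair each student utterance with an immediately following teacher utterance."""
    ordered = sorted(records, key=lambda r: r.start)
    subs = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker == "student" and nxt.speaker == "teacher":
            subs.append(SubPattern(prev.content, nxt.content, nxt.start - prev.end))
    return subs


def respond(student_content, subs):
    """Return (delay_s, reply) for the animation character, or None if nothing matches."""
    for s in subs:
        if s.student_content == student_content:
            return s.silence, s.teacher_content
    return None
```

Waiting `delay_s` before speaking lets the character reproduce the teacher's habitual pause instead of answering instantly.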

The education information management server according to claim 1, wherein the dialogue pattern extraction function
determines the degree of synchronization between the data extracted as the motion pattern in the dialogue pattern and the data extracted as the utterance pattern and, when it determines that both data are synchronized, registers the motion pattern and the utterance pattern in the storage unit as synchronized data.
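The claim leaves the synchronization measure open. One possible reading, sketched here under that assumption, treats the motion and the utterance as (start, end) intervals and uses their temporal intersection over union as the degree of synchronization; the 0.5 threshold and the dictionary stored per entry are hypothetical.

```python
def overlap(a, b):
    """Length of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def sync_degree(motion, utterance):
    """Temporal intersection over union of a motion interval and an utterance interval."""
    inter = overlap(motion, utterance)
    union = (motion[1] - motion[0]) + (utterance[1] - utterance[0]) - inter
    return inter / union if union > 0 else 0.0


def register_synchronized(motion, utterance, store, threshold=0.5):
    """Store the pair as synchronized data when the degree meets the threshold."""
    if sync_degree(motion, utterance) >= threshold:
        store.append({"motion": motion, "utterance": utterance, "synchronized": True})
        return True
    return False
```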

The education information management server according to claim 1, wherein the storage unit is a user data table that records event data, audio data, and video data in time series.
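A user data table that keeps event, audio, and video data in time series could be sketched with Python's built-in `sqlite3` module as below. The column names, the seconds-based time stamp, and the single `kind` column distinguishing the three data types are illustrative assumptions, not the table layout of the embodiment.

```python
import sqlite3


def create_user_data_table(conn):
    conn.execute("""
        CREATE TABLE user_data (
            user_id TEXT NOT NULL,
            ts      REAL NOT NULL,  -- time stamp in seconds (hypothetical unit)
            kind    TEXT NOT NULL CHECK (kind IN ('event', 'audio', 'video')),
            payload BLOB            -- event code, audio chunk, or video frame reference
        )
    """)


def record(conn, user_id, ts, kind, payload):
    """Insert one time-stamped record using parameter substitution."""
    conn.execute("INSERT INTO user_data VALUES (?, ?, ?, ?)", (user_id, ts, kind, payload))


def timeline(conn, user_id):
    """All records for one user, oldest first, as (ts, kind) pairs."""
    return conn.execute(
        "SELECT ts, kind FROM user_data WHERE user_id = ? ORDER BY ts", (user_id,)
    ).fetchall()
```

Keeping all three data types in one table ordered by time stamp makes it straightforward to replay a session in the order it happened.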