JP2020003624A

JP2020003624A - System for optimizing and virtualizing video and voice for virtual reality lesson room for remote instruction

Info

Publication number: JP2020003624A
Application number: JP2018122553A
Authority: JP
Inventors: 泰宏飯箸; Yasuhiro Iihashi
Original assignee: Individual
Current assignee: Individual
Priority date: 2018-06-27
Filing date: 2018-06-27
Publication date: 2020-01-09

Abstract

PROBLEM TO BE SOLVED: To solve problems in the video and audio exchanged between remote sites in an ICT instruction system: when several participants speak simultaneously, an individual participant's speech cannot be selected and listened to; in exchanges between a learner and an instructor, the speaker cannot be identified from the position of the voice, as a listener normally can; and transmission delay always occurs, so that communication between instructor and learner in particular is not smooth. SOLUTION: On the assumption that delay cannot be completely eliminated when audio and video are transmitted by ICT equipment over the Internet, the invention provides functions that work even in the presence of such delay: individual recording of each learner, visualization of the video and audio delay between instructor and learner, and, through control of audio delay, a virtual lesson room that approximates learners at their respective locations being seated in one classroom. SELECTED DRAWING: Figure 1

Description

The present invention makes it possible, in ICT distance education, to save and play back individual video and audio recordings of teachers and learners so that each can be checked individually. It also virtually reproduces the three-dimensional positional relationship between the instructor and the learners in a classroom, drawing both into the realism of the classroom. Further, for the unavoidable communication delay, visualization makes it possible to distinguish whether a delay stems from the person's own slow response or from network transmission.

Video and audio for ICT education are delivered to remote locations over Internet lines. The approach is considered especially beneficial, and is widely used, for language education for learners overseas or in regions short of instructors. However, the learners' audio arriving over the Internet line is a mixture of many voices, and it has not been possible to listen selectively to check whether a particular speaker's pronunciation was correct. Moreover, because a voice does not arrive from the position where the learner is seated, as it would in an ordinary classroom, the instructor cannot use the learner's position or direction, in addition to voice quality and speech habits, to identify who is speaking. In practice, instructors misidentify the speaker or hesitate over who it is, so practical instruction tends to be incomplete and inadequate. In addition, technical limitations of Internet lines, ICT equipment, and camera systems introduce delays in video and audio, making it difficult to judge whether a learner is responding to the instructor's pronunciation without hesitation or is confused or anxious. These factors have severely constrained educational effectiveness.

Three things were needed: selective listening to a specific individual's speech from among the mixed voices of many speakers; a virtual-reality classroom realized through 3D sound, which existing systems lack; and a solution that does not rely solely on reducing the delay caused by the Internet environment and ICT equipment. Although Internet line speeds and ICT equipment performance continue to improve, it is difficult to provide every learner with the highest-performance ICT equipment.

In the lesson format using ICT equipment that is now in wide use, when many participants (learners and the instructor) speak at once, their voices are mixed, and there is the defect that a specific individual's voice cannot be heard selectively, even after the fact. In addition, when several learners speak simultaneously, the instructor cannot instantly identify the speaker from the direction of the learner and can rely only on cues such as voice quality and habits of answering. The instructor therefore cannot respond as accurately and quickly as in an ordinary classroom, which hinders instruction.
Furthermore, in this lesson format it is typical for the single instructor's video to be captured with ICT equipment and the audio delivered as a monaural or stereo source, while the learners respond using the video cameras and monaural microphones of smartphones, tablets, or computers in a price range that students can afford. The connection is an ordinary Internet line, and overseas many lines are of low quality. The learners' ICT equipment consists of ordinary commercial products. Both video and audio incur delays of 0.2 seconds up to about 1 second or more, which hinders communication between the instructor and the learners.

Therefore, as shown in FIG. 1, a method was devised in which the video and audio of each learner during a lesson are acquired separately, recorded, and at the same time processed individually and reproduced appropriately at the receiving end. This makes it possible to identify each of up to about 30 learners and to check each learner's response to the instructor individually. Next, as shown in FIG. 2, a function was invented that produces a three-dimensional acoustic effect, as if the speaking positions of the instructor and learners were arranged in a virtual classroom. This acoustic effect lets the instructor picture where each learner's utterance is coming from, bringing the experience closer to lecturing in a classroom. Furthermore, as shown in FIG. 3, a technique was invented for visualizing the relative delay of video and audio during a lesson. By visualizing the expected time for the instructor's utterance to reach each learner and for the learner's spoken response to return, the instructor can judge whether a learner is responding appropriately.

JP 2015-080255 A
JP 2012-239184 A
JP-T-2009-518996

The problems this system sets out to solve are that, in the video conferencing used for ICT education, the utterances of participating instructors and learners become a mixed audio stream that cannot be separated; that the mutual positional relationships of a classroom are lost; and that communication is delayed. The delay itself, however, never becomes zero: a delay of roughly several tens of milliseconds to 1 second still occurs.

The present invention acquires the video and audio of the instructor and each learner individually and records each of them; applies processing such as delay and reverberation to the sound so that the positional relationship between instructor and learners approaches that of a classroom; and visualizes the delay of the audio reaching the instructor and learners. Together these resolve the problems that have been inherent in ICT distance education and bring about closer communication between instructor and learners.

The virtual-reality classroom for ICT distance education of the present invention acquires video and audio separately for each learner, produces a virtual-reality stereophonic effect that simulates lecturing in a classroom, and analyzes and visualizes the delay of audio and video over the Internet. As a result, the timing at which a learner's utterance should be recognized becomes clear, communication with learners becomes closer, video and audio can be played back when needed, and the instructor's judgments become more accurate.

FIG. 1 is an explanatory diagram of recording video and audio separately for each learner. (Example 1)
FIG. 2 is an explanatory diagram of virtualizing the positional relationship between the instructor and the learners. (Example 2)
FIG. 3 is an explanatory diagram of visualizing audio and video delay. (Example 3)

The optimization and virtualization of video and audio in the ICT education system are implemented using various programming techniques in a virtual environment on the Internet.

In FIG. 1, which shows an embodiment of the present invention, 1001 is the instructor's computer, 1002 is a web camera with a built-in microphone, 1003 is a function that records video and audio for each learner in the Internet virtual environment, and 1004 is a learner's smartphone, tablet PC, or personal computer with a built-in camera and microphone.

With the video and audio recording function 1003 of the Internet virtual environment, an administrator on the side providing the course can designate participants and record their video and audio as desired.

Video and audio can be recorded for the instructor and each learner individually, and all participants in a course can also be recorded at the same time. For playback, a single recording can be played on its own or several recordings can be played simultaneously.
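The per-participant recording and selective playback described above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the `LessonRecorder` class, its method names, and the representation of audio as plain lists of sample values are all hypothetical.

```python
from collections import defaultdict


class LessonRecorder:
    """Keeps one audio track per participant (hypothetical structure)."""

    def __init__(self):
        # participant id -> list of (start_sample, samples) chunks
        self.tracks = defaultdict(list)

    def record(self, participant, start, samples):
        """Store a chunk of samples for one participant, beginning at index `start`."""
        self.tracks[participant].append((start, list(samples)))

    def play(self, participants):
        """Mix the tracks of the chosen participants into one list of samples."""
        chunks = [c for p in participants for c in self.tracks.get(p, [])]
        length = max((start + len(s) for start, s in chunks), default=0)
        mixed = [0.0] * length
        for start, samples in chunks:
            for i, value in enumerate(samples):
                mixed[start + i] += value
        return mixed
```

Because each participant's audio is stored separately, playing back one learner alone or several together is just a matter of which identifiers are passed to `play`.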

FIG. 2 shows the invention for reproducing, as a virtual classroom, the relationship between the instructor and learners (2001) scattered across remote areas. In the virtual classroom (2003), learners are placed in seats in the order in which they log in from their remote locations.

On the instructor's own PC (2004), the instructor can visually check which seat each learner has been placed in.
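Seating learners in login order could look like the sketch below. The row-by-row layout and the six-seat row width are assumptions made for illustration; the source only states that seats follow login order and that a class has up to about 30 learners. The function name `assign_seats` is hypothetical.

```python
def assign_seats(login_order, columns=6, capacity=30):
    """Map each learner to a (row, column) seat in login order, row by row.

    `columns` is an assumed row width; `capacity` reflects the class size
    of about 30 mentioned in the description.
    """
    if len(login_order) > capacity:
        raise ValueError("classroom is full")
    return {
        learner: divmod(index, columns)
        for index, learner in enumerate(login_order)
    }
```

The resulting mapping is what an instructor-side display would need in order to draw the seating chart.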

Next, an acoustic effect is produced in which each learner (2003) placed in a seat sounds as if speaking from that seat's position.

This is achieved by controlling the delay of the sound delivered to the left and right speakers on the instructor side (2004). In this way, a virtual classroom is realized in which learners are placed in designated seats, assuming a classroom of up to about 30 people.
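One way to picture the left/right delay control is to delay each channel by the time sound would take to travel from the seat to each listening point. The sketch below is an assumption-laden illustration, not the patent's method: the 48 kHz sample rate, the listening points 0.1 m either side of the origin, and the function names `channel_delays` and `place_voice` are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C
SAMPLE_RATE = 48_000    # Hz, assumed


def channel_delays(seat_xy, left_xy=(-0.1, 0.0), right_xy=(0.1, 0.0)):
    """Return (left, right) delays in samples for a voice at seat_xy (metres)."""
    def delay(listening_point):
        distance = math.dist(seat_xy, listening_point)
        return round(distance / SPEED_OF_SOUND * SAMPLE_RATE)
    return delay(left_xy), delay(right_xy)


def place_voice(mono, seat_xy):
    """Turn mono samples into (left, right) pairs, delaying each channel."""
    left_delay, right_delay = channel_delays(seat_xy)
    length = len(mono) + max(left_delay, right_delay)
    left = [0.0] * left_delay + list(mono)
    right = [0.0] * right_delay + list(mono)
    # Pad the shorter channel with silence so both have equal length.
    left += [0.0] * (length - len(left))
    right += [0.0] * (length - len(right))
    return list(zip(left, right))
```

A seat to the instructor's left yields a smaller left-channel delay than right-channel delay, so the voice arrives at the left speaker first and is perceived as coming from that side.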

FIG. 3 is a screen image on the side of the instructor giving the ICT course. It visualizes the time taken for the instructor's voice to reach a learner, for the learner to react, and for the learner's utterance to reach the instructor.

The program analyzes the current delay on the Internet and, at the same time, the delay of the instructor-side ICT equipment and the delay on the learner side, and displays the delay to the learner and the delay from the learner as a graph on the instructor's screen. The instructor can call up this display whenever needed and use it as a reference for judging learners' responses.
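As a rough illustration of showing the two delays as a graph, the sketch below renders them as text bars. The source does not specify the graph type; the bar format, the 50 ms scale, and the name `delay_bars` are assumptions.

```python
def delay_bars(delays_ms, scale_ms=50, width=20):
    """Render one text bar per delay; each '#' stands for `scale_ms` milliseconds."""
    lines = []
    for label, ms in delays_ms.items():
        bar = "#" * min(width, round(ms / scale_ms))
        lines.append(f"{label:<22}{bar} {ms} ms")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values only; real values would come from measurement.
    print(delay_bars({"to learner": 200, "from learner": 350}))
```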

In FIG. 3, the instructor's PC (3002) first transmits a sound wave and records the transmission time to microsecond precision. The learner's terminal (3003) receives the sound wave and reflects it back (3004), and the time at which the reflected sound wave reaches the instructor side is likewise measured to microsecond precision (3005). At 3006, the time difference between the two points, the instructor side and the learner side, is determined to microsecond precision. The instructor's PC (3002) visualizes this time difference between the two points (3007) as an image, displayed so that the perceived delay can be felt.
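The send, reflect, and compare steps (3002 to 3006) amount to a round-trip time measurement taken on the instructor side. A minimal sketch follows, assuming a hypothetical callback `send_and_wait_for_echo` that emits the probe and blocks until the learner's terminal has echoed it back; the transport itself is not shown, and the function names are illustrative.

```python
import time


def now_us():
    """Current monotonic time in whole microseconds."""
    return time.monotonic_ns() // 1_000


def measure_round_trip(send_and_wait_for_echo, clock=now_us):
    """Time one probe and return the elapsed microseconds.

    `send_and_wait_for_echo` is a hypothetical callable standing in for
    steps 3002 to 3005; `clock` can be replaced for testing.
    """
    sent_at = clock()          # transmission time recorded on the instructor's PC
    send_and_wait_for_echo()   # probe travels to the learner and is reflected back
    received_at = clock()      # arrival time of the reflected probe
    return received_at - sent_at  # difference between the two timestamps (3006)
```

Both timestamps are taken on the same machine with a monotonic clock, so the result does not depend on the instructor's and learner's clocks being synchronized.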

In particular, the virtual classroom is expected to spread widely beyond ICT learning: in the web conferences held frequently in companies and organizations, meetings also run more smoothly when participants can picture where each utterance is coming from.

1001 Instructor's computer
1002 Web camera
1003 Individual audio and video recording in the Internet environment
1004 Learner's terminal
2001 Terminals of learners in various locations
2002 System on the Internet
2003 Learners in the virtualized classroom, placed by audio delay
2004 Instructor's computer
3001 Program start
3002 Instructor's computer
3003 Learner's terminal
3004 Sound wave reflection by the learner's terminal
3005 Measurement of sound wave arrival time
3006 Comparison of sound wave transmission and reception times
3007 Visualization of sound wave delay time

Claims

A function, in an ICT learning system using the Internet, to record and play back audio and video separately for each learner on the Internet server and on each learner's terminal.

A function, in an ICT learning system using the Internet, to measure for each learner the actual time taken for a sound wave sent from the instructor to arrive at the learner and return, and to visualize that time so that the instructor can recognize the delay visually.

A function, in an ICT learning system using the Internet, to assign learners scattered in distant places to seats in a virtual classroom when they log in, and to adjust the stereo sound image so that each learner's voice reaches the instructor from the assigned seat position, thereby realizing a two-dimensional virtual classroom.