JP6683864B1

JP6683864B1 - Content control system, content control method, and content control program

Info

Publication number: JP6683864B1
Application number: JP2019121263A
Authority: JP
Inventors: 量生川上; 尚小嶋; 寛明齊藤
Original assignee: Dwango Co Ltd
Current assignee: Dwango Co Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2020-04-22
Anticipated expiration: 2039-06-28
Also published as: JP2021006977A

Abstract

PROBLEM TO BE SOLVED: To enhance the visual effect of expression using an avatar. SOLUTION: A content control system according to one embodiment includes at least one processor. The content control system acquires original image data showing a scene in which a teacher conducts a lesson, generates motion data indicating the teacher's motion based on the original image data, determines the specifications of an avatar corresponding to the teacher based on the motion data, generates educational content data for students taking the lesson by placing the avatar based on the determined specifications at a position different from that of the teacher, and outputs the generated educational content data. [Selected drawing] FIG. 4

Description

One aspect of the present disclosure relates to a content control system, a content control method, and a content control program.

An avatar, which is an example of a virtual object, is used in various computer systems. For example, Patent Literature 1 describes a learning system that displays an avatar of an instructor. Two devices are connected to the instructor's equipment: an HMD, which is a head-mounted display on which the screen appears to float in the air, and a glove device (cyber glove), which is an input device worn over the hand that converts the positions and movements of the fingers into electrical signals. Input signals from the glove device, a joypad, a keyboard, a mouse, and the like control the motion of the avatar, which is drawn as the instructor's incarnation in the virtual space.

Patent Literature 1: Japanese Unexamined Patent Publication No. 2009-145883 (JP 2009-145883 A)

It is desirable to enhance the visual effect of expression using avatars in order to attract viewers to content.

A content control system according to one aspect of the present disclosure includes at least one processor. At least one of the at least one processor acquires original image data showing a scene in which a teacher conducts a lesson. At least one of the at least one processor generates motion data indicating the teacher's motion based on the original image data. At least one of the at least one processor determines the specifications of an avatar corresponding to the teacher based on the motion data. At least one of the at least one processor generates educational content data for students taking the lesson by placing the avatar based on the determined specifications at a position different from that of the teacher. At least one of the at least one processor outputs the generated educational content data.
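The five steps recited above (acquire, generate motion data, determine the avatar specification, place the avatar away from the teacher, output) can be sketched as a small pipeline. This is only an illustrative sketch, not the implementation described in the patent: every class and function name below is hypothetical, the original image is assumed to arrive with joint coordinates already detected, and the "specification" is reduced to a pose.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class OriginalImage:
    """Hypothetical stand-in for one frame of original image data."""
    teacher_position: Point      # where the teacher appears in the frame
    teacher_joints: List[Point]  # joint coordinates, assumed already detected


@dataclass
class MotionData:
    joints: List[Point]          # joint coordinates relative to the teacher


@dataclass
class AvatarSpec:
    pose: List[Point]            # pose the avatar should take


@dataclass
class EducationalContent:
    teacher_position: Point
    avatar_position: Point
    avatar: AvatarSpec


def generate_motion_data(image: OriginalImage) -> MotionData:
    # Assumption: motion is summarized as joints relative to the teacher's position.
    tx, ty = image.teacher_position
    return MotionData([(x - tx, y - ty) for x, y in image.teacher_joints])


def determine_avatar_spec(motion: MotionData) -> AvatarSpec:
    # Assumption: the avatar simply mirrors the teacher's pose.
    return AvatarSpec(pose=list(motion.joints))


def generate_content(image: OriginalImage, spec: AvatarSpec,
                     offset: Point = (1.0, 0.0)) -> EducationalContent:
    # The avatar must end up at a position different from the teacher's.
    if offset == (0.0, 0.0):
        raise ValueError("avatar must be placed away from the teacher")
    tx, ty = image.teacher_position
    return EducationalContent(
        teacher_position=image.teacher_position,
        avatar_position=(tx + offset[0], ty + offset[1]),
        avatar=spec,
    )


def control_content(image: OriginalImage) -> EducationalContent:
    """Run the steps in order and return the content data to be output."""
    motion = generate_motion_data(image)
    spec = determine_avatar_spec(motion)
    return generate_content(image, spec)
```

Calling `control_content` on a frame returns content in which the teacher and the avatar occupy different positions while sharing the same pose, which is the property the claim language emphasizes.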

In such an aspect, educational content data representing a teacher and a corresponding avatar is generated. Using this educational content data, which has a configuration not described in Patent Literature 1, can enhance the visual effect of the content and, in turn, can be expected to attract viewers to the content.

According to one aspect of the present disclosure, the visual effect of expression using an avatar can be enhanced.

A diagram showing an example of application of a content distribution system (content control system) according to the embodiment.
A diagram showing an example of a hardware configuration related to the content distribution system according to the embodiment.
A diagram showing an example of a usage scene of a teacher terminal (distributor terminal).
A diagram showing an example of a functional configuration related to the content distribution system according to the embodiment.
A flowchart showing an example of selecting a display mode of educational content.
A flowchart showing an example of outputting educational content data in a composite mode and a virtual mode.
A diagram showing an example of determining the motion of an avatar.
A diagram showing an example of educational content in a live-action mode.
A diagram showing an example of educational content in the composite mode.
A diagram showing another example of educational content in the composite mode.
A diagram showing an example of educational content in the virtual mode.
A sequence diagram showing various examples of providing content.
A diagram showing an example of an auxiliary image displayed on the teacher terminal.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the description of the drawings, the same or equivalent elements are denoted by the same reference signs, and redundant description is omitted.

[System overview]
The content control system according to the embodiment is a computer system that controls content distributed to users. Content refers to information that is provided by a computer or computer system and that can be recognized by a person. Electronic data representing content is called content data. The expression format of content is not limited. For example, content may be expressed by an image (for example, a photograph or video), a document, voice, music, or a combination of any two or more of these elements. Content can be used for various forms of information transmission or communication, and can be used in various situations or for various purposes such as news, education, medical care, games, chat, commerce, lectures, seminars, and training. Content control refers to processing executed in order to provide content to a user. Content control may include at least one of generation, editing, storage, and distribution of content data, and may include other processing.

In the present embodiment, content is expressed using at least an image. An image representing content is called a "content image". A content image is an image through which a person can visually recognize some information. A content image may be a moving image (video) or a still image. Electronic data representing a content image is called content image data.

The content control system supports the transmission of information from a distributor to a viewer by providing content image data to the viewer. A distributor is a person who intends to convey information to a viewer, that is, the sender of the content. A viewer is a person who intends to obtain that information, that is, a user of the content. In one example, the distributor is located remotely from the viewer. A distributor can distribute content personally. For example, the distributor shoots an area that includes the distributor for that distribution. The content control system acquires data of an image in which the distributor appears (image data), identifies the distributor's motion by analyzing that image data, and generates content image data including an avatar that expresses the motion. In the present disclosure, an image analyzed in order to identify the distributor's motion (that is, an image in which the distributor appears) is called an "original image", and electronic data representing the original image is called original image data. The original image can be regarded as material for generating content.

In one example, the avatar may be shown in the content image in place of the distributor, in which case a viewer looking at the content image sees the avatar rather than the distributor. In another example, the avatar may be shown in the content image together with the distributor. By viewing the content image, a viewer can experience augmented reality (AR), virtual reality (VR), or mixed reality (MR). In the present embodiment, the content control system can switch the display mode of the content in response to a viewer's request or automatically. A display mode refers to a method or format for displaying content.
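The mode switching described above can be illustrated with a small sketch. The three mode names are taken from the figure descriptions (live-action, composite or "compound", and virtual). The mapping from each mode to who appears on screen is an assumption inferred from the two examples in this paragraph, and all identifiers are hypothetical.

```python
from enum import Enum
from typing import FrozenSet, Optional


class DisplayMode(Enum):
    LIVE_ACTION = "live-action"
    COMPOSITE = "composite"
    VIRTUAL = "virtual"


# Assumed mapping: which subjects the viewer sees in each mode.
_VISIBLE = {
    DisplayMode.LIVE_ACTION: frozenset({"teacher"}),
    DisplayMode.COMPOSITE: frozenset({"teacher", "avatar"}),
    DisplayMode.VIRTUAL: frozenset({"avatar"}),
}


def visible_subjects(mode: DisplayMode) -> FrozenSet[str]:
    """Return the subjects shown in the content image for a mode."""
    return _VISIBLE[mode]


def select_mode(current: DisplayMode,
                requested: Optional[DisplayMode] = None) -> DisplayMode:
    """Switch to the viewer's requested mode, or keep the current one."""
    return requested if requested is not None else current
```

An automatic switch would call `select_mode` with a mode chosen by the system instead of by the viewer. The selection rule itself is left out because this passage does not specify it.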

Because the content control system identifies the distributor's motion by analyzing image data, the distributor does not need to wear a motion capture device such as a body strap or gloves.

An avatar is a user's alter ego represented by a computer. An avatar is a kind of virtual object, that is, an object that does not actually exist in the real world and is represented only on a computer system. An avatar is not the photographed person (that is, not the user shown in the original image). It is represented by two-dimensional or three-dimensional computer graphics (CG) using image material independent of the original image. The method of representing an avatar is not limited. For example, an avatar may be represented using animation material, or may be rendered realistically based on a live-action image. An avatar may be freely selected by a user of the content control system (for example, a teacher or a student).

In one example, the content image represents a virtual space in which the avatar exists. A virtual space is a virtual two-dimensional or three-dimensional space represented by an image displayed on a computer. Put another way, the content image is an image showing the scenery seen from a virtual camera set in the virtual space. The virtual camera is set in the virtual space so as to correspond to the line of sight of the user viewing the content image.

In one example, the content control system may distribute content to viewers. Distribution refers to processing that transmits information to users via a communication network or a broadcast network. In the present disclosure, distribution is a concept that may include broadcasting. In the present disclosure, a content control system having a function of distributing content is also called a content distribution system.

The method by which the content control system generates and distributes content is not limited. For example, the content control system may control live content. In this case, the content control system generates content data by processing real-time video provided from the distributor terminal and transmits the content data to the viewer terminal in real time. This can be regarded as one form of live Internet broadcasting. Alternatively, the content control system may generate content data by processing video shot in the past. This content data may be transmitted to the viewer terminal or may first be stored in a storage device such as a database. The content control system may be used for time shifting, in which content can be viewed during a given period after real-time distribution. Alternatively, the content control system may be used for on-demand distribution, in which content can be viewed at any time. As described above, a content image may be a still image, so the content control system (content distribution system) may be used to distribute still-image content in real time or later.
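The difference between live distribution, time shifting, and on-demand distribution comes down to when the content may be viewed. The following is a minimal sketch of that rule. The names and the seven-day time-shift window are assumptions chosen for illustration, since the text only speaks of "a given period".

```python
from datetime import datetime, timedelta
from enum import Enum


class Delivery(Enum):
    LIVE = "live"
    TIME_SHIFT = "time_shift"
    ON_DEMAND = "on_demand"


def is_viewable(mode: Delivery, now: datetime,
                live_start: datetime, live_end: datetime,
                time_shift_window: timedelta = timedelta(days=7)) -> bool:
    """Return True if the content can be viewed at `now` under `mode`.

    The seven-day default window is an illustrative assumption.
    """
    if mode is Delivery.LIVE:
        # Viewable only while the real-time distribution is running.
        return live_start <= now <= live_end
    if mode is Delivery.TIME_SHIFT:
        # Viewable for a given period after the real-time distribution ends.
        return live_end < now <= live_end + time_shift_window
    # On demand: viewable at any time.
    return True
```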

In the present disclosure, the expression "transmitting" data or information "to" a computer means transmission intended to finally deliver the data or information to that computer. Note that this expression also covers the case where another computer or communication device relays the data or information during the transmission.

As described above, the purpose and usage scene of content are not limited. In the present embodiment, educational content is presented as an example of content, and the content control system controls educational content data. Educational content is content used by a teacher to give a lesson to students. A teacher is a person who teaches academic subjects, arts, and the like, and a student is a person who receives that teaching. A teacher is an example of a distributor, and a student is an example of a viewer. A teacher may or may not hold a teaching license. A lesson means a teacher teaching academic subjects, arts, and the like to students. The age and affiliation of teachers and students are not limited, and therefore the purpose and usage scene of educational content are not limited either. For example, educational content may be used in various schools such as nursery schools, kindergartens, elementary schools, junior high schools, high schools, universities, graduate schools, vocational schools, preparatory schools, and online schools, or may be used in places or situations other than schools. Accordingly, educational content can be used for various purposes such as early childhood education, compulsory education, higher education, and lifelong learning.

[System configuration]
FIG. 1 is a diagram showing an example of application of a content distribution system (content control system) 1 according to the embodiment. In the present embodiment, the content distribution system 1 includes a server 10. The server 10 is a computer that generates and distributes content image data. The server 10 connects to at least one student terminal 20 via a communication network N. Although FIG. 1 shows two student terminals 20, the number of student terminals 20 is not limited. Furthermore, the server 10 may connect to at least one of a teacher terminal 30, an original image database 40, and a content database 50 via the communication network N. The configuration of the communication network N is not limited. For example, the communication network N may include the Internet or may include an intranet.

The student terminal 20 is a computer used by a student and is an example of a viewer terminal (a computer used by a viewer). The student terminal 20 has a function of accessing the content distribution system 1 to receive and display content data. The student terminal 20 may have a function of shooting and transmitting video. The type and configuration of the student terminal 20 are not limited. For example, the student terminal 20 may be a mobile terminal such as a high-function mobile phone (smartphone), a tablet terminal, a wearable terminal (for example, a head-mounted display (HMD) or smart glasses), a laptop personal computer, or a mobile phone. Alternatively, the student terminal 20 may be a stationary terminal such as a desktop personal computer. Alternatively, the student terminal 20 may be a classroom system including a large screen installed in a room.

The teacher terminal 30 is a computer used by a teacher and is an example of a distributor terminal (a computer used by a distributor). In one example, the teacher terminal 30 is located remotely from the student terminal 20. The teacher terminal 30 has a function of shooting video and a function of accessing the content distribution system 1 to transmit electronic data representing that video (video data). The teacher terminal 30 may have a function of receiving and displaying video or content. The type and configuration of the teacher terminal 30 are not limited. For example, the teacher terminal 30 may be a shooting system having functions of shooting, recording, and transmitting video. Alternatively, the teacher terminal 30 may be a mobile terminal such as a high-function mobile phone (smartphone), a tablet terminal, a wearable terminal (for example, a head-mounted display (HMD) or smart glasses), a laptop personal computer, or a mobile phone. Alternatively, the teacher terminal 30 may be a stationary terminal such as a desktop personal computer.

A classroom administrator or a student operates the student terminal 20 to log in to the content distribution system 1, which allows the student to view educational content. A teacher operates the teacher terminal 30 to log in to the content distribution system 1, which allows the teacher to provide lessons to students. The present embodiment assumes that the users of the content distribution system 1 have already logged in.

The original image database 40 is a device that stores original image data. The original image data represents video or a still image. The original image data is stored in the original image database 40 by any computer, such as the server 10, the teacher terminal 30, or another computer. The original image database 40 can be regarded as a library that stores original images shot in the past.

The content database 50 is a device that stores educational content data. The educational content data represents video or a still image. The content database 50 can be regarded as a library of educational content.

The installation location of each of the original image database 40 and the content database 50 is not limited. For example, the original image database 40 or the content database 50 may be provided in a computer system separate from the content distribution system 1, or may be a component of the content distribution system 1. A single database may function as both the original image database 40 and the content database 50.

FIG. 2 is a diagram showing an example of a hardware configuration related to the content distribution system 1. FIG. 2 shows a server computer 100 that functions as the server 10 and a terminal computer 200 that functions as the student terminal 20 or the teacher terminal 30.

As an example, the server computer 100 includes, as hardware components, a processor 101, a main storage unit 102, an auxiliary storage unit 103, and a communication unit 104.

The processor 101 is a computing device that executes an operating system and application programs. Examples of the processor include a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit), but the type of the processor 101 is not limited to these. For example, the processor 101 may be a combination of a sensor and a dedicated circuit. The dedicated circuit may be a programmable circuit such as an FPGA (Field-Programmable Gate Array) or another type of circuit.

The main storage unit 102 is a device that stores a program for implementing the server 10, calculation results output from the processor 101, and the like. The main storage unit 102 is composed of, for example, at least one of a ROM (Read Only Memory) and a RAM (Random Access Memory).

The auxiliary storage unit 103 is a device that can generally store a larger amount of data than the main storage unit 102. The auxiliary storage unit 103 is composed of a non-volatile storage medium such as a hard disk or flash memory. The auxiliary storage unit 103 stores a server program P1 for causing the server computer 100 to function as the server 10, together with various data. For example, the auxiliary storage unit 103 may store data relating to at least one of a virtual object, such as an avatar, and a virtual space. In the present embodiment, the content control program is implemented as the server program P1.

The communication unit 104 is a device that performs data communication with other computers via the communication network N. The communication unit 104 is composed of, for example, a network card or a wireless communication module.

Each functional element of the server 10 is implemented by loading the server program P1 onto the processor 101 or the main storage unit 102 and causing the processor 101 to execute that program. The server program P1 includes code for implementing each functional element of the server 10. The processor 101 operates the communication unit 104 in accordance with the server program P1 and reads and writes data in the main storage unit 102 or the auxiliary storage unit 103. Each functional element of the server 10 is implemented through such processing.

The server 10 may be composed of one or more computers. When a plurality of computers are used, these computers are connected to one another via a communication network to form a logically single server 10.

As an example, the terminal computer 200 includes, as hardware components, a processor 201, a main storage unit 202, an auxiliary storage unit 203, a communication unit 204, an input interface 205, an output interface 206, and an imaging unit 207.

The processor 201 is a computing device that executes an operating system and application programs. The processor 201 may be, for example, a CPU or a GPU, but the type of the processor 201 is not limited to these.

The main storage unit 202 is a device that stores a program for implementing the student terminal 20 or the teacher terminal 30, calculation results output from the processor 201, and the like. The main storage unit 202 is composed of, for example, at least one of a ROM and a RAM.

The auxiliary storage unit 203 is a device that can generally store a larger amount of data than the main storage unit 202. The auxiliary storage unit 203 is composed of a non-volatile storage medium such as a hard disk or flash memory. The auxiliary storage unit 203 stores a client program P2 for causing the terminal computer 200 to function as the student terminal 20 or the teacher terminal 30, together with various data. For example, the auxiliary storage unit 203 may store data relating to at least one of a virtual object, such as an avatar, and a virtual space.

The communication unit 204 is a device that performs data communication with other computers via the communication network N. The communication unit 204 is composed of, for example, a network card or a wireless communication module.

The input interface 205 is a device that receives data based on a user's operation or motion. For example, the input interface 205 is composed of at least one of a keyboard, operation buttons, a pointing device, a microphone, a sensor, and a camera. The keyboard and operation buttons may be displayed on a touch panel. Just as the type of the input interface 205 is not limited, the input data is not limited either. For example, the input interface 205 may receive data entered or selected with a keyboard, operation buttons, or a pointing device. Alternatively, the input interface 205 may receive voice data input through a microphone. Alternatively, the input interface 205 may receive image data (for example, video data or still image data) shot by a camera.

The output interface 206 is a device that outputs data processed by the terminal computer 200. For example, the output interface 206 is composed of at least one of a monitor, a touch panel, an HMD, and a speaker. A display device such as a monitor, a touch panel, or an HMD displays the processed data on a screen. The speaker outputs the sound represented by the processed audio data.

The imaging unit 207 is a device that shoots images of the real world, specifically a camera. The imaging unit 207 may shoot a moving image (video) or a still image (photograph). When shooting a moving image, the imaging unit 207 processes the video signal based on a given frame rate to obtain a series of frame images arranged in time series as the moving image. The imaging unit 207 can also function as the input interface 205.
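To make the relationship between a frame rate and a time-ordered series of frame images concrete, the following sketch computes the capture time of each frame. It assumes a constant integer frame rate and uses exact fractions to avoid rounding error. The function name is hypothetical and is not part of the described system.

```python
from fractions import Fraction
from typing import List


def frame_timestamps(frame_rate: int, frame_count: int) -> List[Fraction]:
    """Return the capture time in seconds of each frame, in time order.

    Assumes a constant frame rate: frame i is captured at i / frame_rate.
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return [Fraction(i, frame_rate) for i in range(frame_count)]
```

At 30 frames per second, for instance, consecutive frames are 1/30 of a second apart, so a motion analysis that runs per frame sees the teacher's pose sampled at that interval.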

生徒端末２０または教師端末３０の各機能要素は、プロセッサ２０１または主記憶部２０２の上にクライアントプログラムＰ２を読み込ませてそのプログラムを実行させることで実現される。クライアントプログラムＰ２は、生徒端末２０または教師端末３０の各機能要素を実現するためのコードを含む。プロセッサ２０１はクライアントプログラムＰ２に従って通信部２０４、入力インタフェース２０５、出力インタフェース２０６、または撮像部２０７を動作させ、主記憶部２０２または補助記憶部２０３におけるデータの読み出しおよび書き込みを行う。この処理により生徒端末２０または教師端末３０の各機能要素が実現される。 Each functional element of the student terminal 20 or the teacher terminal 30 is realized by loading the client program P2 onto the processor 201 or the main storage unit 202 and executing the program. The client program P2 includes codes for realizing each functional element of the student terminal 20 or the teacher terminal 30. The processor 201 operates the communication unit 204, the input interface 205, the output interface 206, or the imaging unit 207 according to the client program P2 to read and write data in the main storage unit 202 or the auxiliary storage unit 203. By this processing, each functional element of the student terminal 20 or the teacher terminal 30 is realized.

サーバプログラムＰ１およびクライアントプログラムＰ２の少なくとも一つは、ＣＤ−ＲＯＭ、ＤＶＤ−ＲＯＭ、半導体メモリなどの有形の記録媒体に固定的に記録された上で提供されてもよい。あるいは、これらのプログラムの少なくとも一つは、搬送波に重畳されたデータ信号として通信ネットワークを介して提供されてもよい。これらのプログラムは別々に提供されてもよいし、一緒に提供されてもよい。 At least one of the server program P1 and the client program P2 may be provided after being fixedly recorded in a tangible recording medium such as a CD-ROM, a DVD-ROM, or a semiconductor memory. Alternatively, at least one of these programs may be provided via a communication network as a data signal superimposed on a carrier wave. These programs may be provided separately or together.

図３は教師端末３０の利用場面の一例を示す図である。この例では、教師端末３０はプロセッサ２０１、主記憶部２０２、補助記憶部２０３、通信部２０４などを収容するコンピュータ本体２１０と、入力インタフェース２０５として機能する撮像部２０７およびマイクロフォン２１１と、出力インタフェース２０６として機能するモニタ２１２とを備える。教師９０は必要に応じてボード（例えば、ホワイトボード、黒板、電子ホワイトボード、電子黒板など）９１上に文字、図形などを書くかまたは表示させながら授業を行う。撮像部２０７はその授業の場面を撮影することで原画像を得る。教師９０の音声（発話）はマイクロフォン２１１によって記録される。教師端末３０は撮影された映像にその音声が関連付けられた映像データを取得することができる。教師９０はモニタ２１２上に映された画像（例えば、後述の補助画像）を見ながら授業を行ってもよい。 FIG. 3 is a diagram showing an example of a usage scene of the teacher terminal 30. In this example, the teacher terminal 30 includes a computer main body 210 that houses the processor 201, the main storage unit 202, the auxiliary storage unit 203, the communication unit 204, and the like; an imaging unit 207 and a microphone 211 that function as the input interface 205; and a monitor 212 that functions as the output interface 206. The teacher 90 gives a lesson while writing or displaying characters, figures, and the like on a board 91 (for example, a whiteboard, blackboard, electronic whiteboard, or electronic blackboard) as needed. The imaging unit 207 obtains an original image by shooting the scene of the lesson. The voice (utterance) of the teacher 90 is recorded by the microphone 211. The teacher terminal 30 can acquire video data in which that voice is associated with the captured video. The teacher 90 may give the lesson while watching an image (for example, an auxiliary image described later) displayed on the monitor 212.

図４はコンテンツ配信システム１に関連する機能構成の一例を示す図である。サーバ１０は機能要素としてコンテンツ管理部１１、画像取得部１２、モーション特定部１３、コンテンツ生成部１４、出力部１５、および補助画像生成部１６を備える。コンテンツ管理部１１は教育用コンテンツの生成および出力を管理する機能要素であり、画像取得部１２、モーション特定部１３、コンテンツ生成部１４、および出力部１５を含んで構成される。画像取得部１２は原画像データを取得する機能要素である。モーション特定部１３はその原画像データから教師の動作を特定する機能要素である。コンテンツ生成部１４は、教師に対応するアバターを含む教育用コンテンツデータを生成する機能要素である。出力部１５は、その教育用コンテンツデータを出力する機能要素である。補助画像生成部１６は、生徒の様子を示す補助画像の電子データである補助画像データを生成してその補助画像データを教師端末３０上に向けて送信する機能要素である。補助画像は動画像（映像）でもよいし静止画でもよい。補助画像によって教師は授業中の生徒の様子を視認することができる。 FIG. 4 is a diagram showing an example of a functional configuration related to the content distribution system 1. The server 10 includes a content management unit 11, an image acquisition unit 12, a motion identification unit 13, a content generation unit 14, an output unit 15, and an auxiliary image generation unit 16 as functional elements. The content management unit 11 is a functional element that manages generation and output of educational content, and includes an image acquisition unit 12, a motion identification unit 13, a content generation unit 14, and an output unit 15. The image acquisition unit 12 is a functional element that acquires original image data. The motion identification unit 13 is a functional element that identifies the teacher's action from the original image data. The content generation unit 14 is a functional element that generates educational content data including an avatar corresponding to a teacher. The output unit 15 is a functional element that outputs the educational content data. The auxiliary image generation unit 16 is a functional element that generates auxiliary image data, which is electronic data of an auxiliary image showing a state of a student, and transmits the auxiliary image data to the teacher terminal 30. The auxiliary image may be a moving image (video) or a still image. The auxiliary image allows the teacher to visually recognize the students in class.

生徒端末２０は機能要素として要求部２１、受信部２２、表示制御部２３、および送信部２４を備える。要求部２１は、教育用コンテンツの表示モードの切り替えをサーバ１０に要求する機能要素である。受信部２２は教育用コンテンツデータを受信する機能要素である。表示制御部２３はその教育用コンテンツデータを処理して教育用コンテンツを表示装置上に表示する機能要素である。送信部２４は撮像部２０７によって生成された画像データをサーバ１０に向けて送信する機能要素である。 The student terminal 20 includes a request unit 21, a reception unit 22, a display control unit 23, and a transmission unit 24 as functional elements. The request unit 21 is a functional element that requests the server 10 to switch the display mode of the educational content. The receiving unit 22 is a functional element that receives educational content data. The display control unit 23 is a functional element that processes the educational content data and displays the educational content on the display device. The transmission unit 24 is a functional element that transmits the image data generated by the imaging unit 207 to the server 10.

教師端末３０は機能要素として送信部３１、受信部３２、および表示制御部３３を備える。送信部３１は撮像部２０７によって生成された画像データをサーバ１０に向けて送信する機能要素である。受信部３２は補助画像データを受信する機能要素である。表示制御部３３はその補助画像データを処理して補助画像を表示装置上に表示する機能要素である。 The teacher terminal 30 includes a transmitting unit 31, a receiving unit 32, and a display control unit 33 as functional elements. The transmission unit 31 is a functional element that transmits the image data generated by the imaging unit 207 to the server 10. The receiving unit 32 is a functional element that receives the auxiliary image data. The display control unit 33 is a functional element that processes the auxiliary image data and displays the auxiliary image on the display device.

［システムの動作］ [System operation]
コンテンツ配信システム１の動作（より具体的にはサーバ１０の動作）を説明するとともに、本実施形態に係るコンテンツ制御方法（またはコンテンツ配信方法）について説明する。以下では画像処理に関して特に説明し、音声データの処理に関しては詳細な説明を省略する。 The operation of the content distribution system 1 (more specifically, the operation of the server 10) will be described, together with the content control method (or content distribution method) according to the present embodiment. The following description focuses on image processing; a detailed description of audio data processing is omitted.

図５は、教育用コンテンツの表示モードの選択の一例を処理フローＳ１として示すフローチャートである。ステップＳ１１では、コンテンツ管理部１１が次の表示モード（すなわち、切替後の表示モード）を特定する。表示モードの切替方法は限定されず、これに関連して、コンテンツ管理部１１は次の表示モードを任意の手法で特定してよい。例えば、表示モードは生徒端末２０での操作に応答して切り替えられてもよい。この場合には、生徒は教育用コンテンツの表示モードを切り替えるための操作を生徒端末２０上で行う。生徒端末２０では要求部２１がその操作に応答して切替要求をサーバ１０に向けて送信する。切替要求は、表示モードの切替を要求するデータ信号であり、次の表示モードを示す。コンテンツ管理部１１はその切替要求を受信して次の表示モードを特定する。あるいは、表示モードはユーザからの要求を受け付けることなく自動的に切り替えられてもよい。例えば、コンテンツ管理部１１は、教育用コンテンツを制御するために予め設定されたシナリオを参照することで次の表示モードを特定してもよい。 FIG. 5 is a flowchart showing an example of the selection of the display mode of the educational content as the processing flow S1. In step S11, the content management unit 11 specifies the next display mode (that is, the display mode after switching). The display mode switching method is not limited, and in connection with this, the content management unit 11 may specify the next display mode by an arbitrary method. For example, the display mode may be switched in response to an operation on the student terminal 20. In this case, the student performs an operation for switching the display mode of the educational content on the student terminal 20. In the student terminal 20, the request unit 21 transmits a switching request to the server 10 in response to the operation. The switching request is a data signal requesting switching of the display mode, and indicates the next display mode. The content management unit 11 receives the switching request and specifies the next display mode. Alternatively, the display mode may be automatically switched without receiving a request from the user. For example, the content management unit 11 may identify the next display mode by referring to a preset scenario for controlling the educational content.

ステップＳ１２に示すように、次の表示モードが何かによって後続の処理が変わる。表示モードの個数および具体的な形式は限定されず、任意の方針で設定されてよい。本実施形態では一例として、コンテンツ配信システム１が実写モード、複合モード、および仮想モードという３種類の表示モードを提供可能であるとする。実写モードは、実写画像である原画像をそのまま表示する表現形式であり、したがって、この場合には教育用コンテンツはアバターを含まない。複合モードは、教師およびアバターの双方が視認可能である画像を表示する表現形式であり、この場合にはアバターが教師とは異なる位置に表示される。「教師およびアバターの双方が視認可能である」とは、教育用コンテンツを見た人が、教師およびアバターの双方の姿を視覚的に明確に認識することができることをいう。仮想モードは、教師は視認不可だがアバターは視認可能である画像を表示する表現形式である。「教師は視認不可だがアバターは視認可能である」とは、教育用コンテンツを見た人が、教師の姿を視覚的に明確にまたは全く認識することはできないが、アバターの姿は視覚的に明確に認識できることをいう。 As shown in step S12, the subsequent processing depends on what the next display mode is. The number of display modes and their specific formats are not limited and may be set by any policy. In the present embodiment, as an example, the content distribution system 1 can provide three display modes: a live-action mode, a composite mode, and a virtual mode. The live-action mode is an expression format in which the original image, which is a live-action image, is displayed as it is; in this case, therefore, the educational content does not include an avatar. The composite mode is an expression format that displays an image in which both the teacher and the avatar are visible; in this case the avatar is displayed at a position different from that of the teacher. "Both the teacher and the avatar are visible" means that a person who views the educational content can clearly recognize the figures of both the teacher and the avatar. The virtual mode is an expression format that displays an image in which the teacher is not visible but the avatar is visible. "The teacher is not visible but the avatar is visible" means that a person who views the educational content cannot clearly recognize the teacher's figure, or cannot recognize it at all, but can clearly recognize the avatar's figure.

実写モードが指定された場合には処理はステップＳ１３に進む。ステップＳ１３では、コンテンツ管理部１１が原画像データをそのまま教育用コンテンツデータとして出力する。具体的には、画像取得部１２が原画像データを取得し、出力部１５がその原画像データを教育用コンテンツデータとして出力する。原画像データの取得方法は限定されない。例えば、画像取得部１２は教師端末３０から送られてきた画像データを原画像データとして受信してもよい。あるいは、画像取得部１２は生徒端末２０からの要求信号に応答して、その要求信号に対応する画像（例えば、生徒が希望する授業の映像の少なくとも一部）を原画像データベース４０から原画像データとして読み出してもよい。教育用コンテンツデータの出力方法も限定されない。例えば、出力部１５は教育用コンテンツデータを、１以上の生徒端末２０に向けて送信してもよいし、コンテンツデータベース５０に格納してもよい。あるいは、出力部１５は教育用コンテンツデータを、生徒端末２０に向けて送信するとともにコンテンツデータベース５０に格納してもよい。 If the live-action mode is designated, the process proceeds to step S13. In step S13, the content management unit 11 outputs the original image data as it is as educational content data. Specifically, the image acquisition unit 12 acquires the original image data, and the output unit 15 outputs that original image data as educational content data. The method of acquiring the original image data is not limited. For example, the image acquisition unit 12 may receive image data sent from the teacher terminal 30 as the original image data. Alternatively, in response to a request signal from the student terminal 20, the image acquisition unit 12 may read the image corresponding to that request signal (for example, at least a part of the video of the lesson desired by the student) from the original image database 40 as the original image data. The method of outputting the educational content data is also not limited. For example, the output unit 15 may transmit the educational content data to one or more student terminals 20, or may store it in the content database 50. Alternatively, the output unit 15 may both transmit the educational content data to the student terminal 20 and store it in the content database 50.

複合モードが指定された場合には処理はステップＳ１４に進む。ステップＳ１４では、コンテンツ管理部１１が、教師およびアバターの双方が視認可能である教育用コンテンツデータを出力する。 If the composite mode is designated, the process proceeds to step S14. In step S14, the content management unit 11 outputs educational content data in which both the teacher and the avatar are visible.

仮想モードが指定された場合には処理はステップＳ１５に進む。ステップＳ１５では、コンテンツ管理部１１が、教師は視認不可だがアバターは視認可能である教育用コンテンツデータを出力する。 If the virtual mode is designated, the process proceeds to step S15. In step S15, the content management unit 11 outputs educational content data in which the teacher is not visible but the avatar is visible.
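The branching of steps S11 to S15 can be sketched as follows. This is only an illustrative sketch: the names `DisplayMode`, `specify_next_mode`, and `select_step`, the dictionary shape of the switching request, the rule that a student's request takes priority over a preset scenario, and the live-action default are all assumptions that do not appear in the description.

```python
from enum import Enum


class DisplayMode(Enum):
    LIVE_ACTION = "live_action"  # original image shown as is, no avatar
    COMPOSITE = "composite"      # teacher and avatar both visible
    VIRTUAL = "virtual"          # avatar visible, teacher not visible


# Hypothetical mapping from display mode to the step label of FIG. 5.
STEP_FOR_MODE = {
    DisplayMode.LIVE_ACTION: "S13",
    DisplayMode.COMPOSITE: "S14",
    DisplayMode.VIRTUAL: "S15",
}


def specify_next_mode(switch_request=None, scenario=None, elapsed_s=0.0):
    """Step S11: specify the next display mode.

    Assumption: a switching request from a student terminal wins; otherwise
    a preset scenario, given as (start_time_s, mode) pairs sorted by time,
    is consulted. Live-action is used as an arbitrary default.
    """
    if switch_request is not None:
        return DisplayMode(switch_request["next_mode"])
    current = DisplayMode.LIVE_ACTION
    for start_s, mode in scenario or []:
        if elapsed_s >= start_s:
            current = mode
    return current


def select_step(mode):
    """Step S12: branch to the step that handles the given mode."""
    return STEP_FOR_MODE[mode]
```

For example, a request `{"next_mode": "composite"}` would lead to step S14, while a scenario that switches to the virtual mode after 60 seconds would lead to step S15 once that time has passed.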

図６を参照しながらステップＳ１４，Ｓ１５の詳細を説明する。図６は、複合モードおよび仮想モードでの教育コンテンツデータの出力の一例を示すフローチャートである。複合モードと仮想モードとの相違点はアバターの配置方法であり、双方の処理で共通する部分が多いので、図６を参照しながら複合モードおよび仮想モードの双方について説明する。 Details of steps S14 and S15 will be described with reference to FIG. FIG. 6 is a flowchart showing an example of outputting educational content data in the composite mode and the virtual mode. The difference between the composite mode and the virtual mode is the arrangement method of the avatars, and since there are many parts common to both processes, both the composite mode and the virtual mode will be described with reference to FIG.

ステップＳ１０１では、画像取得部１２が原画像データを取得する。ステップＳ１３と同様に、原画像データの取得方法は限定されない。したがって、画像取得部１２は原画像データを、教師端末３０から受信してもよいし、原画像データベース４０から読み出してもよい。 In step S101, the image acquisition unit 12 acquires original image data. Similar to step S13, the method of acquiring the original image data is not limited. Therefore, the image acquisition unit 12 may receive the original image data from the teacher terminal 30 or may read it from the original image database 40.

ステップＳ１０２では、モーション特定部１３がその原画像データに基づいて教師の動作を特定する。人の動作とは、人の姿勢、表情、または身体の動きのことをいい、口の動きを伴う発声も含み得る。一例では、モーション特定部１３は教師が表示されている領域を原画像から特定し、該領域における教師の２次元の動き（例えば、姿勢、表情）を特定し、その動きに対応する複数のジョイントの位置を特定し、それぞれのジョイントの深度を推定する。姿勢を規定するジョイントは教師の身体の部位に対応する。例えば、モーション特定部１３は、関節と顔の主要な部位（眉毛、目、顎など）とにジョイントを設定してもよいし、これらとは別の箇所にジョイントを設定してもよい。 In step S102, the motion identification unit 13 identifies the teacher's motion based on the original image data. A person's motion refers to the person's posture, facial expression, or body movement, and may also include vocalization accompanied by mouth movement. In one example, the motion identification unit 13 identifies, from the original image, the region in which the teacher appears, identifies the teacher's two-dimensional movement (for example, posture and facial expression) in that region, identifies the positions of a plurality of joints corresponding to that movement, and estimates the depth of each joint. The joints that define the posture correspond to parts of the teacher's body. For example, the motion identification unit 13 may set joints at the body's articulations and at the main parts of the face (eyebrows, eyes, chin, and so on), or may set joints at other locations.

モーション特定部１３は隣り合うジョイントの位置関係と、身体運動の合理性および整合性に基づいて予め定められたルール（動作ルール）とに基づいて、カメラのレンズ中心に対するそれぞれのボーン（隣り合うジョイントを結ぶ仮想線）の向きおよび角度を推定する。モーション特定部１３はこの推定によって教師の３次元の動作を特定することができる。身体運動の合理性および整合性とは、人間の可能な動きのことをいう。例えば、その合理性および整合性は、肘および膝は或る一方向には曲がるがその逆方向には曲がらないという制約、首に対する頭の動きの範囲、肩に対する上腕の動きの範囲、指の可動範囲などを含み得る。教師の動作を特定する手法は上記のものに限定されず、モーション特定部１３は他の手法によって教師の動作を特定してもよい。 Based on the positional relationship between adjacent joints and on rules (motion rules) predetermined from the rationality and consistency of body movement, the motion identification unit 13 estimates the direction and angle of each bone (a virtual line connecting adjacent joints) relative to the lens center of the camera. Through this estimation, the motion identification unit 13 can identify the teacher's three-dimensional motion. The rationality and consistency of body movement refer to the movements that are possible for a human. For example, they may include the constraint that elbows and knees bend in one direction but not in the opposite direction, the range of head movement relative to the neck, the range of upper-arm movement relative to the shoulder, and the movable range of the fingers. The method of identifying the teacher's motion is not limited to the above, and the motion identification unit 13 may identify the teacher's motion by another method.
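As a rough illustration of how a bone and a motion rule might be evaluated once three-dimensional joint positions are available, the sketch below computes bone vectors and checks an elbow flexion limit. The function names and the 150-degree limit are hypothetical values chosen for illustration, not values from the description. The check uses an unsigned angle, so it only bounds how far the elbow folds; testing the bending direction would additionally require a hinge axis.

```python
import math


def bone_vector(joint_a, joint_b):
    """Vector of the bone (virtual line) from joint_a to joint_b."""
    return tuple(b - a for a, b in zip(joint_a, joint_b))


def angle_between(u, v):
    """Unsigned angle in degrees between two 3D vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length bone")
    cos_angle = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def elbow_flexion(shoulder, elbow, wrist):
    """Flexion at the elbow: 0 degrees when the arm is straight."""
    upper_arm = bone_vector(shoulder, elbow)
    forearm = bone_vector(elbow, wrist)
    return angle_between(upper_arm, forearm)


def satisfies_elbow_rule(shoulder, elbow, wrist, max_flexion_deg=150.0):
    """Hypothetical motion rule: reject poses that fold the elbow too far."""
    return elbow_flexion(shoulder, elbow, wrist) <= max_flexion_deg
```

A depth estimate that makes the forearm fold back through the upper arm would fail `satisfies_elbow_rule` and could be discarded in favor of another candidate.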

ステップＳ１０３で示すように、モーション特定部１３は原画像データに基づいて、教師（配信者）の動作に関連する現実オブジェクト（これを「関連現実オブジェクト」という）の状態を特定してもよい。現実オブジェクトとは、人が知覚可能なもののことをいい、例えば、物、人、音声などの様々なオブジェクトを含み得る。例えば、現実オブジェクトとは、原画像により映された物体、または、原画像に関連付けられた音声であり得る。教師も現実オブジェクトの一例であるといえる。現実オブジェクトの状態とは、人の知覚によって把握可能な現実オブジェクトの様子のことをいい、その状態は例えば、現実オブジェクトの形状、位置、動き（動作）、音、および声のうちの少なくとも一つを含んでもよい。ステップＳ１０３は省略されてもよい。関連現実オブジェクトとは、配信者の動作に関連して変化し、動作し、出現し、または消える現実オブジェクトのことをいう。ただし、本開示では関連現実オブジェクトは配信者（教師）を含まないものとする。関連現実オブジェクトの種類は限定されない。教師に対応する関連現実オブジェクトの例として、教師が手に取ったり机に置いたりする教科書と、教師による記述（例えば、教師がボード上に書いたりまたは消したりする文字、文字列、記号、または絵）と、教師の発話とのうちの少なくとも一つが挙げられる。モーション特定部１３は一または複数の関連現実オブジェクトを任意の画像解析手法によって特定してよい。 As shown in step S103, the motion identifying unit 13 may identify the state of the physical object (this is referred to as a "related physical object") related to the motion of the teacher (distributor) based on the original image data. The physical object refers to something that can be perceived by a person, and may include various objects such as an object, a person, and a voice. For example, the physical object may be an object shown by the original image or a sound associated with the original image. It can be said that the teacher is also an example of a real object. The state of a real object is a state of the real object that can be grasped by human perception, and the state is, for example, at least one of shape, position, movement (motion), sound, and voice of the real object. May be included. Step S103 may be omitted. Related reality objects are reality objects that change, operate, appear, or disappear in relation to the behavior of the distributor. However, in the present disclosure, the related reality object does not include a distributor (teacher). The type of related reality object is not limited. 
Examples of related real objects corresponding to the teacher include at least one of: a textbook that the teacher picks up or puts on a desk; writing by the teacher (for example, characters, character strings, symbols, or pictures that the teacher writes on or erases from the board); and the teacher's utterances. The motion identification unit 13 may identify one or more related real objects by any image analysis method.

ステップＳ１０４では、モーション特定部１３がローデータを生成する。ローデータとは、特定された教師（配信者）の動作を少なくとも示す電子データのことをいい、ステップＳ１０３が実行された場合には一または複数の関連現実オブジェクトの状態をさらに示す。本実施形態では、教師（配信者）の動作を示すデータを特にモーションデータともいう。ローデータはモーションデータを含む。ローデータおよびモーションデータのいずれについてもデータ構造は限定されず、任意に設計されてよい。例えば、モーション特定部１３は、教師の３次元の動き（例えば、姿勢、表情）を示す複数のジョイントおよび複数のボーンに関する情報と教師の識別子（ＩＤ）とをモーションデータに含めてもよい。ジョイントおよびボーンに関する情報の例として、個々のジョイントの３次元座標、隣り合うジョイントの組合せ（すなわちボーン）とが挙げられるが、この情報の構成はこれに限定されず、任意に設計されてよい。モーション特定部１３は教師の発話および記述の少なくとも一方をテキストに変換してそのテキストをローデータまたはモーションデータに含めてもよい。関連現実オブジェクトを示す情報の構成も限定されず、例えばモーション特定部１３はそれぞれの関連現実オブジェクトについて、識別子（ＩＤ）と状態（例えば形状、位置、文字列など）を示す情報とをローデータに含めてもよい。 In step S104, the motion specifying unit 13 generates raw data. Raw data refers to electronic data that indicates at least the operation of the identified teacher (distributor), and further indicates the state of one or more related real objects when step S103 is executed. In the present embodiment, the data indicating the motion of the teacher (distributor) is also referred to as motion data. Raw data includes motion data. The data structure of both raw data and motion data is not limited and may be designed arbitrarily. For example, the motion identifying unit 13 may include information about a plurality of joints and a plurality of bones that indicate a three-dimensional movement (for example, posture, facial expression) of the teacher and the teacher identifier (ID) in the motion data. Examples of information about joints and bones include three-dimensional coordinates of individual joints and combinations of adjacent joints (that is, bones), but the configuration of this information is not limited to this and may be arbitrarily designed. The motion specifying unit 13 may convert at least one of the teacher's utterance and description into text and include the text in raw data or motion data. 
The configuration of the information indicating a related real object is also not limited. For example, for each related real object, the motion identification unit 13 may include in the raw data an identifier (ID) and information indicating its state (for example, shape, position, or character string).
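One possible shape for the raw data generated in step S104 is sketched below. The description leaves the data structure open, so every class and field name here (`MotionData`, `RelatedObjectState`, `RawData`, `dangling_bones`) is an assumption made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class MotionData:
    teacher_id: str
    joints: dict    # joint name -> (x, y, z) coordinates
    bones: list     # pairs of adjacent joint names
    text: str = ""  # optional transcript of utterances or board writing


@dataclass
class RelatedObjectState:
    object_id: str
    state: dict     # e.g. {"position": (x, y, z), "text": "..."}


@dataclass
class RawData:
    motion: MotionData
    # Empty when step S103 is skipped.
    related_objects: list = field(default_factory=list)


def dangling_bones(motion):
    """Return bones that reference a joint missing from the joint table."""
    return [
        bone for bone in motion.bones
        if any(name not in motion.joints for name in bone)
    ]
```

Keeping the related object states in a separate, optional list mirrors the fact that step S103 may be omitted while the motion data is always present.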

ステップＳ１０５では、コンテンツ生成部１４が、教師に対応するアバターのモデルデータを取得する。モデルデータの取得方法は限定されない。例えば、コンテンツ生成部１４は予め設定されたアバター、あるいはコンテンツ配信システム１のユーザ（例えば教師または生徒）によって指定されたアバターのモデルデータを補助記憶部１０３から読み出してもよい。 In step S105, the content generation unit 14 acquires model data of the avatar corresponding to the teacher. The method of acquiring model data is not limited. For example, the content generation unit 14 may read model data of a preset avatar or an avatar designated by a user (for example, a teacher or a student) of the content distribution system 1 from the auxiliary storage unit 103.

モデルデータとは、仮想オブジェクトの仕様を規定するために用いられる電子データのことをいう。仮想オブジェクトの仕様とは、仮想オブジェクトを制御するための取り決めまたは方法のことをいう。例えば、仕様は仮想オブジェクトの構成（例えば形状および寸法）、動作、および音声のうちの少なくとも一つを含む。アバターのモデルデータのデータ構造は限定されず、任意に設計されてよい。例えば、モデルデータはアバターを構成する複数のジョイントおよび複数のボーンに関する情報と、アバターの外観デザインを示すグラフィックデータと、アバターの属性と、アバターの識別子（ＩＤ）とを含んでもよい。ジョイントおよびボーンに関する情報の例として、個々のジョイントの３次元座標と、隣り合うジョイントの組合せ（すなわちボーン）とが挙げられるが、この情報の構成はこれに限定されず、任意に設計されてよい。アバターの属性とは、アバターを特徴付けるために設定される任意の情報であり、例えば公称寸法、声質、または性格を含み得る。 Model data refers to electronic data used to define the specifications of virtual objects. A virtual object specification refers to an agreement or method for controlling a virtual object. For example, the specifications include at least one of a virtual object's configuration (eg, shape and size), behavior, and audio. The data structure of the avatar model data is not limited and may be arbitrarily designed. For example, the model data may include information about a plurality of joints and a plurality of bones that make up the avatar, graphic data indicating an appearance design of the avatar, an attribute of the avatar, and an avatar identifier (ID). Examples of information about joints and bones include three-dimensional coordinates of individual joints and a combination of adjacent joints (that is, bones), but the structure of this information is not limited to this and may be arbitrarily designed. . An avatar's attributes are any information set to characterize the avatar, and may include, for example, nominal size, voice quality, or personality.

ステップＳ１０６で示すように、コンテンツ生成部１４は一または複数の関連現実オブジェクトのそれぞれに対応する仮想オブジェクト（これを「関連仮想オブジェクト」という）のモデルデータを取得してもよい。ステップＳ１０３が実行されない場合にはステップＳ１０６も省略される。関連仮想オブジェクトは任意の物体を表現してよい。例えば、関連仮想オブジェクトは、現実世界には存在しない物体（例えば、架空のキャラクタ）を表現してもよいし、現実世界に存在する自然物または人工物などを模したものを表現してもよい。あるいは、関連仮想オブジェクトは関連現実オブジェクトに視覚効果を与えるための表現であってもよい。例えば、関連オブジェクトは、教師が用いる教科書に対応する本またはキャラクタでもよいし、教師がボード上に書いた文字列を装飾するためのグラフィック表現でもよいし、教師の発話のテキストに関するグラフィック表現でもよい。関連仮想オブジェクトのモデルデータのデータ構造は限定されず、意図する表現に応じて任意に設計されてよい。例えば、関連仮想オブジェクトがキャラクタであれば、そのモデルデータはアバターのものと同様のデータ構造を有してもよい。あるいは、関連仮想オブジェクトのモデルデータは、外観デザインを示すグラフィックデータのみを含んでもよい。 As shown in step S106, the content generation unit 14 may acquire model data of a virtual object (this is referred to as a “related virtual object”) corresponding to each of the one or more related real objects. When step S103 is not executed, step S106 is also omitted. The associated virtual object may represent any object. For example, the related virtual object may represent an object that does not exist in the real world (for example, a fictional character), or may represent a natural object or an artificial object that exists in the real world. Alternatively, the related virtual object may be a representation for giving a visual effect to the related real object. For example, the related object may be a book or a character corresponding to a textbook used by the teacher, a graphic expression for decorating a character string written on the board by the teacher, or a graphic expression related to the text of the utterance of the teacher. . The data structure of the model data of the related virtual object is not limited and may be arbitrarily designed according to the intended expression. For example, if the related virtual object is a character, its model data may have the same data structure as that of the avatar. Alternatively, the model data of the related virtual object may include only graphic data indicating the appearance design.

ステップＳ１０７では、コンテンツ生成部１４が、アバターを含む教育用コンテンツデータを生成する。ステップＳ１０３，Ｓ１０６が実行された場合には、コンテンツ生成部１４はアバターに加えて一または複数の関連仮想オブジェクトをさらに含む教育用コンテンツデータを生成し得る。 In step S107, the content generation unit 14 generates educational content data including an avatar. When steps S103 and S106 are executed, the content generation unit 14 can generate educational content data that further includes one or more related virtual objects in addition to the avatar.

一例では、コンテンツ生成部１４は原画像データに基づいて仮想空間を設定する。仮想空間の設定は、仮想空間内での仮想カメラの位置を特定する処理と、原画像に映っている１以上の現実オブジェクトのそれぞれの位置および寸法を特定する処理とを含み得る。コンテンツ生成部１４は仮想カメラの光軸方向における各現実オブジェクトの位置、または現実オブジェクト間の位置関係を算出し、この計算結果に基づいて仮想空間を設定してもよい。あるいは、コンテンツ生成部１４は原画像を機械学習などの手法により解析することで仮想空間を設定してもよい。一例では、原画像で示される場面は２次元のスクリーンのように仮想空間内に設定されてもよい。 In one example, the content generation unit 14 sets the virtual space based on the original image data. The setting of the virtual space may include a process of specifying the position of the virtual camera in the virtual space, and a process of specifying the position and the size of each of the one or more real objects shown in the original image. The content generation unit 14 may calculate the position of each real object in the optical axis direction of the virtual camera or the positional relationship between the real objects, and set the virtual space based on the calculation result. Alternatively, the content generation unit 14 may set the virtual space by analyzing the original image by a method such as machine learning. In one example, the scene shown in the original image may be set in the virtual space like a two-dimensional screen.

仮想空間を設定した後に、コンテンツ生成部１４は表示モードに応じてアバターおよび教師に関する制御を実行する。コンテンツ生成部１４はその仮想空間内にアバターを配置し、一または複数の関連仮想オブジェクトが存在する場合にはそれぞれの関連仮想オブジェクトをさらに配置する。「（アバター、関連仮想オブジェクトなどの）オブジェクトを配置する」とは、オブジェクトを決められた位置に置くことをいい、オブジェクトの位置の変更を含む概念である。 After setting the virtual space, the content generation unit 14 executes control related to the avatar and the teacher according to the display mode. The content generation unit 14 arranges an avatar in the virtual space and further arranges each related virtual object when one or a plurality of related virtual objects exist. “Arranging an object (such as an avatar or a related virtual object)” means placing the object at a predetermined position, and is a concept that includes changing the position of the object.

次の表示モードが複合モードである場合には、コンテンツ生成部１４はアバターおよび教師の双方が視認可能である教育用コンテンツデータを生成する。具体的には、コンテンツ生成部１４はアバターを教師とは異なる位置に配置することで、アバターおよび教師の双方を視認可能にする。「アバターを教師とは異なる位置に配置する」とは、教育コンテンツが表示された際にそのコンテンツ画像においてアバターおよび教師の双方が視認可能になるように配置することをいう。コンテンツ生成部１４はコンテンツ画像上でアバターが教師と重ならないようにアバターを配置してもよい。あるいは、コンテンツ生成部１４は、コンテンツ画像上でアバターが教師の一部と重なるが生徒が教師を視認できるようにアバターを配置してもよい。 When the next display mode is the composite mode, the content generation unit 14 generates educational content data in which both the avatar and the teacher are visible. Specifically, the content generation unit 14 places the avatar at a position different from that of the teacher, thereby making both the avatar and the teacher visible. "Placing the avatar at a position different from that of the teacher" means placing the avatar so that both the avatar and the teacher are visible in the content image when the educational content is displayed. The content generation unit 14 may place the avatar so that it does not overlap the teacher in the content image. Alternatively, the content generation unit 14 may place the avatar so that it overlaps part of the teacher in the content image while the student can still see the teacher.

一例では、コンテンツ生成部１４は仮想カメラからアバターおよび教師の双方を写すことができるように、仮想空間においてアバターを教師とは異なる位置に配置してもよい。 In one example, the content generation unit 14 may arrange the avatar at a position different from the teacher in the virtual space so that both the avatar and the teacher can be captured from the virtual camera.

別の例では、コンテンツ生成部１４はアバターが映るウィンドウ（アバターウィンドウまたは第１ウィンドウ）と教師が映るウィンドウ（教師ウィンドウまたは第２ウィンドウ）とを含む画面を生成することで、アバターを教師とは異なる位置に配置してもよい。例えば、コンテンツ生成部１４は、アバターを含む仮想空間を示すアバターウィンドウと、教師が映った実世界を示す教師ウィンドウとに画面を分割してもよい。 In another example, the content generation unit 14 may place the avatar at a position different from that of the teacher by generating a screen that includes a window in which the avatar appears (an avatar window or first window) and a window in which the teacher appears (a teacher window or second window). For example, the content generation unit 14 may divide the screen into an avatar window showing a virtual space that includes the avatar and a teacher window showing the real world in which the teacher appears.

コンテンツ生成部１４はアバターウィンドウを、実写画像領域と仮想オブジェクトとの組合せによって表現してもよいし、実写画像領域を用いることなく仮想オブジェクトおよび仮想背景によって表現してもよい。コンテンツ生成部１４は原画像をそのまま教師ウィンドウとして設定してもよい。コンテンツ生成部１４はアバターウィンドウの面積が教師ウィンドウの面積よりも大きくなるように双方のウィンドウを設定してもよい。あるいは、コンテンツ生成部１４は、原画像（教師ウィンドウ）を仮想空間内に背景の一部として配置することで、アバターを教師とは異なる位置に配置してもよい。この場合には、原画像（教師ウィンドウ）が仮想空間内の一オブジェクトとして配置されるので、アバターと、教師が映った原画像（教師ウィンドウ）との位置関係を３次元的に規定することができる。例えば、コンテンツ生成部１４は仮想空間において原画像（教師ウィンドウ）をアバターよりも奥に位置させてもよい。コンテンツ生成部１４は教師ウィンドウの面積がアバターウィンドウの面積よりも大きくなるように双方のウィンドウを設定してもよい。 The content generation unit 14 may represent the avatar window by a combination of a live-action image area and virtual objects, or by virtual objects and a virtual background without using a live-action image area. The content generation unit 14 may set the original image as it is as the teacher window. The content generation unit 14 may set the two windows so that the area of the avatar window is larger than the area of the teacher window. Alternatively, the content generation unit 14 may place the avatar at a position different from that of the teacher by placing the original image (teacher window) in the virtual space as part of the background. In this case, because the original image (teacher window) is placed as one object in the virtual space, the positional relationship between the avatar and the original image (teacher window) showing the teacher can be defined three-dimensionally. For example, the content generation unit 14 may position the original image (teacher window) behind the avatar in the virtual space. The content generation unit 14 may also set the two windows so that the area of the teacher window is larger than the area of the avatar window.

コンテンツ生成部１４は画面分割のために任意の技術を用いてよく、例えば、ピクチャ・イン・ピクチャ（Ｐｉｃｔｕｒｅ−ｉｎ−Ｐｉｃｔｕｒｅ（ＰｉｎＰ））によって画面分割を実行してもよい。 The content generation unit 14 may use any technique for screen division, and may perform screen division by, for example, picture-in-picture (Picture-in-Picture (PinP)).
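A picture-in-picture layout of the two windows can be computed with simple rectangle arithmetic, as in the sketch below. The `Rect` type, the `pip_layout` function, the bottom-right placement, and the default scale and margin are illustrative assumptions; the sketch covers the variant in which the avatar window fills the screen and the teacher window is the smaller inset.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def pip_layout(screen_w, screen_h, inset_scale=0.25, margin=16):
    """Return (avatar_window, teacher_window) in screen pixel coordinates.

    The avatar window covers the whole screen; the teacher window is an
    inset in the bottom-right corner (placement is an arbitrary choice).
    """
    avatar_window = Rect(0, 0, screen_w, screen_h)
    inset_w = int(screen_w * inset_scale)
    inset_h = int(screen_h * inset_scale)
    teacher_window = Rect(
        screen_w - inset_w - margin,
        screen_h - inset_h - margin,
        inset_w,
        inset_h,
    )
    return avatar_window, teacher_window
```

Swapping the two return values would give the opposite variant, in which the teacher window is the larger of the two.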

アバターウィンドウおよび教師ウィンドウを用いる場合には、コンテンツ生成部１４はアバターまたはウィンドウの表示に関する視覚効果を教育用コンテンツデータに含めてもよい。例えば、コンテンツ生成部１４はアバターウィンドウが教師ウィンドウから飛び出てくるような視覚効果を教育用コンテンツデータに含めてもよい。あるいは、コンテンツ生成部１４はアバターが教師ウィンドウ、または教師の身体から飛び出てくるような視覚効果を教育用コンテンツデータに含めてもよい。 When using the avatar window and the teacher window, the content generation unit 14 may include the visual effect related to the display of the avatar or the window in the educational content data. For example, the content generation unit 14 may include a visual effect in which the avatar window pops out of the teacher window in the educational content data. Alternatively, the content generation unit 14 may include a visual effect in which the avatar pops out of the teacher window or the teacher's body into the educational content data.

The content generation unit 14 determines the specifications of the avatar corresponding to the teacher by setting, based on the raw data (more specifically, the motion data), the position of each joint of the avatar placed at a position different from that of the teacher. "Specifications of the avatar corresponding to the teacher" means that the specifications of the avatar follow, or almost follow, the motion of the teacher. The specifications to be determined may include the motion of the avatar, in which case mirroring of the motion is achieved. By setting the position of each joint of the avatar to match the motion of the teacher, the orientation and angle of each bone of the avatar reflect the posture of the teacher. The content generation unit 14 may adjust the position of each joint of the avatar so that the dimensions of the avatar are the same as or nearly the same as the size of the teacher. Making the avatar the same size as the teacher further emphasizes visually that the avatar is the teacher's alter ego. The content generation unit 14 may also make the dimensions of the avatar different from those of the teacher.
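The joint setting described above can be sketched as a translation plus a uniform scale applied to the teacher's joint positions. This is an illustrative sketch under stated assumptions, not the method of the disclosure: the motion data is assumed to be a mapping from joint names to 3D coordinates, and the names `place_avatar_joints`, `offset`, `scale`, and `root` are hypothetical. Because every joint is moved by the same offset and scaled about the same root joint, the direction of each bone is unchanged, so the avatar's posture mirrors the teacher's.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def place_avatar_joints(teacher_joints: Dict[str, Vec3],
                        offset: Vec3,
                        scale: float = 1.0,
                        root: str = "hips") -> Dict[str, Vec3]:
    """Copy the teacher's pose onto an avatar standing at a different position.

    offset: displacement of the avatar's root joint from the teacher's root.
    scale:  1.0 keeps the avatar the same size as the teacher; other values
            make the avatar larger or smaller while keeping the same posture.
    """
    rx, ry, rz = teacher_joints[root]
    ox, oy, oz = offset
    placed: Dict[str, Vec3] = {}
    for name, (x, y, z) in teacher_joints.items():
        # Position relative to the root is scaled, then shifted by the offset.
        placed[name] = (
            rx + ox + scale * (x - rx),
            ry + oy + scale * (y - ry),
            rz + oz + scale * (z - rz),
        )
    return placed


# Hypothetical motion data for a single frame (units: meters).
teacher = {
    "hips": (0.0, 1.0, 0.0),
    "head": (0.0, 1.6, 0.0),
    "right_hand": (0.5, 1.2, 0.1),
}
avatar = place_avatar_joints(teacher, offset=(2.0, 0.0, 0.0))
small_avatar = place_avatar_joints(teacher, offset=(2.0, 0.0, 0.0), scale=0.5)
```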

When the next display mode is the virtual mode, the content generation unit 14 generates educational content data in which the avatar is visible but the teacher is not. In one example, the content generation unit 14 may place the avatar so that it is superimposed on the teacher. This placement means arranging the avatar so that the teacher is hidden by the avatar when the educational content is displayed on a display device. More specifically, "placing the avatar so that it is superimposed on the teacher" means placing the avatar so that it overlaps the teacher in the scene shown by the original image. Note that "the teacher is hidden by the avatar" covers not only the case where the teacher's body is completely hidden by the avatar, but also the case where part of the teacher's body is not hidden but most of it is hidden by the avatar. For example, due to factors such as a difference in physique between the teacher and the avatar, part of the teacher's body may appear to protrude from the avatar in the educational content; the process of superimposing the avatar on the teacher can include such a case as well. In either case, superimposing the avatar on the teacher prevents the student from seeing the teacher in the content image.

The content generation unit 14 places the avatar in the virtual space so that the avatar is displayed in place of the teacher in the two-dimensional image. Based on the raw data (more specifically, the motion data), the content generation unit 14 determines the specifications of the avatar corresponding to the teacher by aligning the position of each joint of the avatar with the corresponding part (for example, a joint) of the teacher. By aligning the avatar's joints with the teacher's joints, the orientation and angle of each bone of the avatar reflect the teacher's posture, and the dimensions of the avatar are adjusted to be the same as or nearly the same as the size of the teacher.

In the virtual mode, the content generation unit 14 may place the avatar at a position different from that of the teacher, set the motion and dimensions of the avatar in the same manner as in the composite mode, and then make the teacher in the image invisible by an obscuring process. The obscuring process is a process that makes the outlines of the individual parts of an object (the teacher in this example) unclear, so that the presence of the object can be recognized but its appearance cannot be clearly recognized visually. Examples of the obscuring process include mosaic processing, blur processing, and fog processing, but other techniques may be used. Alternatively, the content generation unit 14 may make the teacher invisible by erasing the teacher from the image and restoring the background of the teacher's portion (that is, the portion that was hidden by the teacher). This process is also called retouching.
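As an illustration of one obscuring process named above, the following sketch applies mosaic processing to a rectangular region of a grayscale image. It is a minimal example under stated assumptions: the image is a list of rows of integer pixel values, the region containing the teacher is already known as a bounding box, and the function name `mosaic` and its parameters are hypothetical. How the teacher's region is detected is outside the scope of this sketch.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values


def mosaic(image: Image, top: int, left: int, bottom: int, right: int,
           block: int = 2) -> Image:
    """Return a copy of `image` with the region [top:bottom, left:right] pixelated.

    Each block x block cell in the region is replaced by the integer mean of
    its pixels, so the object's outline becomes unclear while its presence
    remains recognizable. The input image is not modified.
    """
    out = [row[:] for row in image]
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            ys = range(by, min(by + block, bottom))
            xs = range(bx, min(bx + block, right))
            total = sum(image[y][x] for y in ys for x in xs)
            mean = total // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


image = [
    [0, 10, 5, 5],
    [20, 30, 5, 5],
    [7, 7, 7, 7],
    [7, 7, 7, 7],
]
# Assume the teacher occupies the upper-left 2x2 region of this tiny image.
obscured = mosaic(image, top=0, left=0, bottom=2, right=2, block=2)
```

A larger `block` value hides more detail; blur or fog processing could be substituted in the same place.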

In both the composite mode and the virtual mode, the content generation unit 14 may further place one or more related virtual objects. The method of placing each related virtual object is not limited. For example, the content generation unit 14 may place a related virtual object so that it is superimposed on the corresponding real object. Alternatively, the content generation unit 14 may place a related virtual object without superimposing it, or with almost no superimposition, on the corresponding real object. In either case, the content generation unit 14 determines the specifications of each related virtual object based on the raw data. For example, the content generation unit 14 sets the position and dimensions (and the motion, if any) of the related virtual object.

In both the composite mode and the virtual mode, the content generation unit 14 generates educational content data representing the virtual space in which the avatar (and any related virtual objects) is placed. The educational content data may include audio data corresponding to the original image data. The method of generating the educational content data and its data structure are not limited. For example, the content generation unit 14 may generate educational content data that includes virtual space data indicating the virtual space and the position, dimensions, and motion (posture) of each object. Alternatively, the content generation unit 14 may generate the educational content data by executing rendering based on the virtual space that has been set. In this case, the educational content data represents the content image itself, which includes the avatar (and any related virtual objects). In one example, the content generation unit 14 generates the educational content data by combining a live-action image area obtained from the original image with the virtual objects (the avatar and, if any, related virtual objects). This educational content data represents a composite image of the real world shown by the original image and the virtual objects (the avatar and, if any, related virtual objects).

In step S108, the output unit 15 outputs the educational content data. As in step S13, the method of outputting the educational content data is not limited. Accordingly, the output unit 15 may transmit the educational content data to one or more student terminals 20, may store it in the content database 50, or may perform both the transmission and the storage.
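The three output options described above (transmit, store, or both) can be sketched as follows. This is only an illustrative sketch: student terminals and the content database are stood in for by plain Python lists, and the names `OutputTarget` and `output_content` are hypothetical rather than taken from the disclosure.

```python
from enum import Flag, auto
from typing import List


class OutputTarget(Flag):
    """Where the output unit sends the educational content data."""
    TRANSMIT = auto()  # send to student terminals
    STORE = auto()     # store in the content database


def output_content(content: bytes,
                   targets: OutputTarget,
                   terminals: List[List[bytes]],
                   database: List[bytes]) -> None:
    """Deliver `content` to the selected targets.

    Assumption: each terminal is modeled as an inbox list and the database
    as a list; a real system would use network and storage interfaces.
    """
    if not targets:
        raise ValueError("at least one output target is required")
    if OutputTarget.TRANSMIT in targets:
        for inbox in terminals:
            inbox.append(content)
    if OutputTarget.STORE in targets:
        database.append(content)


student_inboxes: List[List[bytes]] = [[], []]
content_db: List[bytes] = []
output_content(b"lesson-frame", OutputTarget.TRANSMIT | OutputTarget.STORE,
               student_inboxes, content_db)
```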

When the output unit 15 transmits the educational content data to a student terminal 20, the receiving unit 22 of the student terminal 20 receives the educational content data, and the display control unit 23 processes the educational content data and displays the educational content on a display device. When rendering has not been executed by the server 10, the display control unit 23 displays the content image by executing rendering based on the educational content data. When the educational content data represents the content image itself, the display control unit 23 displays that content image as-is. The student terminal 20 outputs audio from a speaker in time with the display of the content image.
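The branching on the student terminal described above can be sketched as follows. This is a minimal sketch assuming that the educational content data arrives either as unrendered virtual space data or as an already rendered content image. The types `VirtualSpaceData` and `ContentImage`, the function `prepare_for_display`, and the stand-in renderer `fake_render` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass
class VirtualSpaceData:
    """Unrendered content: object name -> (x, y, z) position in the virtual space."""
    objects: Dict[str, Tuple[float, float, float]]


@dataclass
class ContentImage:
    """Already rendered content image (pixel payload kept opaque here)."""
    pixels: bytes


EducationalContentData = Union[VirtualSpaceData, ContentImage]


def prepare_for_display(data: EducationalContentData,
                        render: Callable[[VirtualSpaceData], ContentImage]) -> ContentImage:
    """Return the content image to display on the student terminal.

    If the server already rendered the content, it is displayed as-is;
    otherwise the terminal renders it from the virtual space data.
    """
    if isinstance(data, ContentImage):
        return data
    return render(data)


def fake_render(space: VirtualSpaceData) -> ContentImage:
    """Stand-in renderer for this sketch: encodes the sorted object names."""
    return ContentImage(pixels=",".join(sorted(space.objects)).encode())


already_rendered = prepare_for_display(ContentImage(b"abc"), fake_render)
rendered_locally = prepare_for_display(
    VirtualSpaceData({"board": (0.0, 0.0, -1.0), "avatar": (0.0, 0.0, 0.0)}),
    fake_render,
)
```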

When the educational content is live content, or when video content in the original image database 40 is processed, step S14 or S15 is executed repeatedly. Step S14 or S15 may be executed for each frame image, or for a series of multiple frame images. Naturally, the teacher moves as time passes, and the avatar in the educational content moves correspondingly. In some cases, a related virtual object is displayed in connection with the teacher's movement.

FIG. 7 is a diagram showing an example of determining the specifications of the avatar in the composite mode and the virtual mode. Based on the original image, the motion specifying unit 13 specifies the three-dimensional motion of the teacher 90 by estimating a plurality of joints 501 and a plurality of bones 502 corresponding to the movement (for example, posture and facial expression) of the teacher 90 (step S12). The motion specifying unit 13 then generates raw data (motion data) indicating the specified motion (step S14). Note that the center of FIG. 7 is a drawing provided for convenience to aid understanding of the joints 501 and the bones 502, and is not intended to indicate that this drawing is essential in the content distribution system 1. Based on the raw data (motion data) and the model data of the avatar, the content generation unit 14 sets an avatar 92 that performs the same motion as the teacher 90 (step S17).

Examples of educational content will be described with reference to FIGS. 8 to 11. In every example, the original image 401 shows a scene in which the teacher 90 stands in front of the board 91.

FIG. 8 is a diagram showing an example of educational content in the live-action mode. In the live-action mode, the content distribution system 1 outputs the original image data as-is as the educational content data, so the original image 401 is provided as-is as the educational content 402.

FIG. 9 is a diagram showing an example of educational content in the composite mode. In this example, the content distribution system 1 provides educational content 403 in which both the avatar 92 and the teacher 90 are visible by means of an avatar window 404 and a teacher window 405. The area of the avatar window 404 is larger than the area of the teacher window 405, and the teacher window 405 is displayed at an edge of the avatar window 404 (the upper right in the example of FIG. 9). The content distribution system 1 (content generation unit 14) may set the teacher window 405 as a screen area separate from the avatar window 404. Alternatively, the content distribution system 1 (content generation unit 14) may place the original image 401 in the virtual space as one element of the avatar's background and set an image of that virtual space as the avatar window 404. In this case, the original image 401 placed in the virtual space is displayed as the teacher window 405.

FIG. 10 is a diagram showing another example of educational content in the composite mode. In this example, the content distribution system 1 places the avatar 92 and the teacher 90 at separate positions in a virtual space based on the original image 401, and provides educational content 406 showing this virtual space.

FIG. 11 is a diagram showing an example of educational content in the virtual mode. In this example, the content distribution system 1 places the avatar 92 so that it is superimposed on the teacher 90, thereby providing educational content 408 in which the teacher 90 is not visible but the avatar 92 is visible.

In all of the educational content 403, 406, and 408 shown in FIGS. 9 to 11, the specifications of the avatar 92 correspond to the teacher 90, so the student can see the avatar 92 performing the same motion as the teacher 90.

In the examples of FIGS. 9 to 11, the motion specifying unit 13 may specify the handwritten word "This", written on the board 91 by the teacher 90, as a related real object (step S13) and generate motion data that includes this word (step S14). The motion specifying unit 13 may include the handwritten word "This" in the motion data as text data (character string data), or it may extract feature points of the individual handwritten characters and set the set of coordinates of those feature points as the motion data for the word "This". Based on such motion data, the content generation unit 14 generates educational content data that includes a related virtual object corresponding to the handwritten word "This" (step S17). When the student terminal 20 displays the educational content data, the student can see the word "This" with a new or additional visual effect (for example, the handwritten "This" with decoration, or "This" converted from handwriting to CG).

As described above, the method of generating and distributing the content is not limited. FIG. 12 is a sequence diagram showing various examples of the provision of educational content by the content distribution system 1. Example (a) of FIG. 12 shows, as processing flow S2, content distribution in the case where the educational content is distributed in real time, that is, in the case of live distribution or live Internet broadcasting. In processing flow S2, the teacher terminal 30 captures, with the imaging unit 207, a scene in which the teacher gives a lesson (step S21), and the transmission unit 31 transmits the video data (original image data) obtained by the capture to the server 10 (step S22). The server 10 executes processing flow S1 on the video data and transmits the educational content data to the student terminal 20 (step S23). The student terminal 20 receives and displays the educational content data (step S24).

In example (a) of FIG. 12, processing flow S2 is executed repeatedly while the teacher terminal 30 is capturing the lesson (in other words, processing flow S2 is executed for each frame image constituting the video data). In the virtual mode, the student can watch in real time a lesson that looks as if the avatar were teaching in place of the teacher; in the composite mode, the student can watch in real time a lesson that looks as if the teacher and the avatar were teaching together. Since the motion of the avatar is determined by analyzing the original image, the teacher can give the lesson in their usual clothes without wearing any motion capture device.

Example (b) of FIG. 12 shows, as processing flow S3, the case where video captured in the past is processed and the educational content is distributed. In processing flow S3, the server 10 reads video data (original image data) showing a lesson captured in the past from the original image database 40 (step S31), executes processing flow S1 on the video data, and transmits the educational content data to the student terminal 20 (step S32). The student terminal 20 receives and displays the educational content data (step S33). The timing at which the server 10 transmits the educational content data to the student terminal 20 is not limited. For example, the server 10 may transmit the educational content data after executing processing flow S1 for all the frame images constituting the video data. Alternatively, each time the server 10 executes processing flow S1 for a frame image, it may transmit the educational content data corresponding to that frame image.
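The two transmission timings described above can be contrasted with a small sketch. It assumes that processing flow S1 can be treated as a function applied to one frame at a time and that transmission is a callable. The names `send_after_all`, `send_per_frame`, and `run_demo` are hypothetical, and the integer "frames" are placeholders for real frame images.

```python
from typing import Callable, Iterable, List


def send_after_all(frames: Iterable[int],
                   process: Callable[[int], int],
                   send: Callable[[List[int]], None]) -> None:
    """Process every frame first, then transmit all content data at once."""
    contents = [process(frame) for frame in frames]
    send(contents)


def send_per_frame(frames: Iterable[int],
                   process: Callable[[int], int],
                   send: Callable[[List[int]], None]) -> None:
    """Transmit the content data for each frame as soon as it is processed."""
    for frame in frames:
        send([process(frame)])


def run_demo(strategy) -> List[str]:
    """Run a strategy on two placeholder frames and record the event order."""
    log: List[str] = []

    def process(frame: int) -> int:
        log.append(f"process {frame}")
        return frame * 10  # placeholder for processing flow S1

    def send(contents: List[int]) -> None:
        log.append(f"send {contents}")

    strategy([1, 2], process, send)
    return log


batch_log = run_demo(send_after_all)
stream_log = run_demo(send_per_frame)
```

The recorded logs differ only in ordering: the batch strategy sends once after all processing, while the per-frame strategy interleaves processing and sending.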

Example (c) of FIG. 12 shows, as processing flow S4, the case where video captured in the past is processed and the educational content is stored. In processing flow S4, the server 10 reads video data (original image data) showing a lesson captured in the past from the original image database 40 (step S41), executes processing flow S1 on the video data, and stores the educational content data in the content database 50 (step S42). For example, the server 10 may store the educational content data after executing processing flow S1 for all the frame images constituting the video data. The student terminal 20 can access the content database 50 at any time to receive and display the educational content (steps S43 and S44).

The method of providing the educational content to the student terminal 20 is not limited. For example, the educational content may be provided to the student terminal 20 via the server 10, or via a computer or computer system different from the server 10. When the server 10 provides the educational content, the student terminal 20, in response to an operation by the student, transmits to the server 10 a content request, which is a data signal for acquiring the educational content. The server 10 receives the content request, reads the educational content data indicated by the request from the content database 50, and transmits the educational content data to the student terminal 20. The method of transmitting the educational content data is not limited; for example, it may be streaming distribution or download.

Examples (b) and (c) of FIG. 12 can both be regarded as the use or reuse of video content that was captured or used in the past. A great deal of educational video content showing teachers giving lessons already exists. Using the content distribution system 1 makes it possible to convert that vast body of video content into more attractive video content that uses avatars.

The method of generating and distributing the content is not limited to the examples of FIG. 12, and other processing flows may be adopted. In any case, the content distribution system 1 can be applied to various distribution methods such as live distribution (live Internet broadcasting), time-shifted distribution, and on-demand distribution.

FIG. 13 is a diagram showing an example of an auxiliary image 410 displayed on the teacher terminal 30. In the example of FIG. 3, this auxiliary image 410 is displayed on the monitor 212. The auxiliary image 410 shows three students (viewers). The configuration of the auxiliary image 410 is not limited. For example, the auxiliary image 410 may be composed of a set of videos or photographs of students captured by the individual student terminals 20. In FIG. 13, the auxiliary image 410 is a set of images 411, 412, and 413 of three students corresponding to three student terminals 20. Alternatively, the auxiliary image 410 may be a single video or image obtained by combining the videos or photographs of the individual students. Alternatively, the auxiliary image 410 may be a single video or photograph showing a plurality of students in one room. Each student may be represented by a live-action image, by an avatar that moves in the same way as the student, or by a still image that is not linked to the student's movement.

The method of generating the auxiliary image is not limited. The auxiliary image generation unit 16 of the server 10 may generate auxiliary image data based on student image data transmitted from one or more student terminals 20. Student image data is electronic data of an image showing a student. The auxiliary image generation unit 16 may generate the auxiliary image data by embedding the student image data from each student terminal 20 as-is in the auxiliary image, or may set the student image data as-is as the auxiliary image data. Alternatively, the auxiliary image generation unit 16 may generate the auxiliary image data without acquiring student image data from the student terminals 20. The auxiliary image generation unit 16 transmits the generated auxiliary image data to the teacher terminal 30. The auxiliary image generation unit 16 may receive, from a student terminal 20, audio data representing sound recorded by that student terminal 20 and associate the audio data with the auxiliary image data. In the teacher terminal 30, the receiving unit 32 receives the auxiliary image data, and the display control unit 33 processes the auxiliary image data and displays the auxiliary image. The auxiliary image may be a live video showing the current state of the students; in this case, the auxiliary image generation unit 16 generates and transmits the individual frame images of the auxiliary video, and the teacher terminal 30 displays those frame images in order. The teacher terminal 30 may process the audio data received from the server 10 and output the students' audio.
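When the auxiliary image is composed of a set of per-student images, one simple way to arrange them is a grid of equally sized tiles. The sketch below only computes tile positions and does not composite any pixels. The function name `tile_layout` and the near-square grid rule are assumptions made for illustration, since the disclosure leaves the configuration of the auxiliary image open.

```python
import math
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def tile_layout(num_students: int, canvas_w: int, canvas_h: int) -> List[Tile]:
    """Return one tile per student, arranged row by row in a near-square grid."""
    if num_students <= 0:
        return []
    cols = math.ceil(math.sqrt(num_students))
    rows = math.ceil(num_students / cols)
    tile_w = canvas_w // cols
    tile_h = canvas_h // rows
    return [
        ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(num_students)
    ]


# Three students, as in the example with images 411, 412, and 413.
tiles = tile_layout(3, 640, 480)
```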

[Effects]
As described above, a content control system according to one aspect of the present disclosure includes at least one processor. At least one of the at least one processor acquires original image data showing a scene in which a teacher gives a lesson. At least one of the at least one processor generates, based on the original image data, motion data indicating the motion of the teacher. At least one of the at least one processor determines, based on the motion data, the specifications of an avatar corresponding to the teacher. At least one of the at least one processor generates educational content data for students taking the lesson by placing the avatar based on the determined specifications at a position different from that of the teacher. At least one of the at least one processor outputs the generated educational content data.

A content control method according to one aspect of the present disclosure includes: a step of acquiring original image data showing a scene in which a teacher gives a lesson; a step of generating, based on the original image data, motion data indicating the motion of the teacher; a step of determining, based on the motion data, the specifications of an avatar corresponding to the teacher; a step of generating educational content data for students taking the lesson by placing the avatar based on the determined specifications at a position different from that of the teacher; and a step of outputting the generated educational content data.

A content control program according to one aspect of the present disclosure causes a computer to execute: a step of acquiring original image data showing a scene in which a teacher gives a lesson; a step of generating, based on the original image data, motion data indicating the motion of the teacher; a step of determining, based on the motion data, the specifications of an avatar corresponding to the teacher; a step of generating educational content data for students taking the lesson by placing the avatar based on the determined specifications at a position different from that of the teacher; and a step of outputting the generated educational content data.

In these aspects, educational content data representing a teacher and an avatar that correspond to each other is generated. Using this educational content data, which has a configuration not described in Patent Literature 1, can be expected to improve the visual effect of the content, and as a result it becomes possible to attract students' interest to the content. In other words, the educational content can be made more engaging.

Rather than simply replacing the teacher with an avatar, providing educational content in which both the teacher and the avatar appear can enhance the visual effect of the content. As a result, students can be expected to feel familiar with the educational content or find it interesting, which in turn can help maintain or improve the motivation of students taking the lesson. Meanwhile, from the standpoint of a distributor such as a teacher, there is no need to wear a motion capture device, so the lesson can be given as usual without incurring the cost of purchasing or using such special equipment.

Furthermore, since educational content data can also be generated from past images that were never intended to have the teacher replaced with an avatar, a vast number of past live-action images can be converted into educational content that uses avatars, making it possible to use or reuse that library.

In a content control system according to another aspect, at least one of the at least one processor may transmit the generated educational content data to a student terminal of the student, thereby causing the educational content data to be displayed on the student terminal. This processing makes it possible to show the educational content data including the avatar to the student.

In a content control system according to another aspect, the motion of the avatar corresponding to the teacher may be the same as the motion of the teacher. Having the avatar perform the same motion as the teacher can enhance the visual effect of the educational content.

In a content control system according to another aspect, at least one of the at least one processor may place the avatar at a position different from that of the teacher by generating a screen including a first window in which the avatar appears and a second window in which the teacher appears. Adopting a technique such as screen splitting can enhance the visual effect of the educational content.

In a content control system according to another aspect, at least one of the at least one processor may make the area of the first window larger than the area of the second window and arrange the second window at an edge of the first window. Configuring the educational content data so that the first window showing the avatar is the main window and the second window showing the teacher is a sub-window can enhance the visual effect of the educational content.
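As an illustration of the layout described above, the following minimal sketch computes a full-screen first window for the avatar and a smaller second window for the teacher anchored at an edge of the first. The names `Rect` and `layout_windows`, the `scale` and `margin` parameters, and the choice of the lower-right corner are assumptions made for this example; the disclosure only requires that the first window be larger than the second and that the second sit at the edge of the first.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height


def layout_windows(screen_w: int, screen_h: int,
                   scale: float = 0.25, margin: int = 0) -> Tuple[Rect, Rect]:
    """Return (first_window, second_window) in screen pixel coordinates.

    The first (avatar) window fills the screen; the second (teacher) window
    is scaled down and anchored to the lower-right corner of the first.
    """
    if not 0 < scale < 1:
        raise ValueError("scale must be strictly between 0 and 1")
    first = Rect(0, 0, screen_w, screen_h)
    sub_w = int(screen_w * scale)
    sub_h = int(screen_h * scale)
    second = Rect(screen_w - sub_w - margin, screen_h - sub_h - margin, sub_w, sub_h)
    return first, second


first, second = layout_windows(1280, 720)
```

Because `scale` is restricted to values below 1, the second window is always smaller in area than the first.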

In a content control system according to another aspect, at least one of the at least one processor may position the second window behind the avatar in the virtual space. Defining the positional relationship between the avatar and the second window three-dimensionally in this way allows the second window to be displayed as part of the avatar's background, which can enhance the visual effect of the educational content.
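One way to picture this depth relationship is a back-to-front draw order, in which objects farther from the virtual camera are drawn first and nearer objects are drawn over them. The sketch below assumes this painter's-style ordering purely for illustration; `SceneObject`, `draw_order`, and the depth values are hypothetical and the disclosure does not specify a rendering method.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SceneObject:
    name: str
    depth: float  # assumed distance from the virtual camera; larger means farther away


def draw_order(objects: Iterable[SceneObject]) -> List[str]:
    """Return object names sorted from farthest to nearest (drawn in that order)."""
    return [obj.name for obj in sorted(objects, key=lambda o: o.depth, reverse=True)]


# The teacher window is placed deeper than the avatar, so it is drawn first
# and ends up as part of the avatar's background.
order = draw_order([
    SceneObject("avatar", depth=2.0),
    SceneObject("teacher_window", depth=5.0),
])
```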

In a content control system according to another aspect, the educational content represented by the educational content data may be switchable between a composite mode, in which both the teacher and the avatar are visible, and a virtual mode, in which the teacher is not visible but the avatar is visible. At least one of the at least one processor may, in the composite mode, place the avatar based on the determined specifications at a position different from that of the teacher. Providing these two display modes can enhance the visual effect of the educational content.

In a content control system according to another aspect, at least one of the at least one processor may, in the virtual mode, make the teacher invisible by placing the avatar based on the determined specifications so that it is superimposed on the teacher. In this case, a visual effect can be produced as if the teacher had been replaced by the avatar.

In a content control system according to another aspect, at least one of the at least one processor may, in the virtual mode, make the teacher invisible by placing the avatar based on the determined specifications at a position different from that of the teacher and applying a blurring process to the teacher shown in the image data. In this case, a visual effect can be produced in which the avatar is displayed while the presence of the teacher is retained in the content.
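The composite mode and the two virtual-mode variants described in the preceding aspects can be summarized as a small decision table. The sketch below is an assumption-laden illustration: the enum names, the `plan_frame` function, and the dictionary keys are hypothetical, and the actual rendering and blurring steps are omitted.

```python
from enum import Enum
from typing import Dict, Union


class DisplayMode(Enum):
    COMPOSITE = "composite"  # teacher and avatar both visible
    VIRTUAL = "virtual"      # avatar visible, teacher not visible


class HideMethod(Enum):
    SUPERIMPOSE = "superimpose"  # avatar drawn over the teacher
    BLUR = "blur"                # avatar elsewhere, teacher region blurred


def plan_frame(mode: DisplayMode,
               hide_method: HideMethod = HideMethod.SUPERIMPOSE) -> Dict[str, Union[str, bool]]:
    """Return a rendering plan for one frame; hide_method applies only in virtual mode."""
    if mode is DisplayMode.COMPOSITE:
        return {"avatar_position": "offset", "teacher_visible": True, "blur_teacher": False}
    if hide_method is HideMethod.SUPERIMPOSE:
        return {"avatar_position": "on_teacher", "teacher_visible": False, "blur_teacher": False}
    return {"avatar_position": "offset", "teacher_visible": False, "blur_teacher": True}


composite_plan = plan_frame(DisplayMode.COMPOSITE)
blur_plan = plan_frame(DisplayMode.VIRTUAL, HideMethod.BLUR)
```

Switching modes then amounts to calling `plan_frame` with a different `DisplayMode` for subsequent frames.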

[Modification]
The present disclosure has been described in detail above based on its embodiment. However, the present disclosure is not limited to the above embodiment. The present disclosure can be modified in various ways without departing from its gist.

In the above embodiment the content distribution system 1 is configured using the server 10, but the content control system may also be applied to direct distribution between user terminals without the server 10. In this case, each functional element of the server 10 may be implemented on any of the user terminals, for example on either the distributor terminal or the viewer terminal. Alternatively, the individual functional elements of the server 10 may be distributed across a plurality of user terminals, for example across the distributor terminal and the viewer terminal. In this connection, the content control program may be implemented as a client program. The content control system may be configured with or without a server.

In the above embodiment, the content control system sets up a virtual space and generates educational content data by placing the avatar (and related virtual objects, if any) in that virtual space. However, the use of a virtual space is not essential. For example, the content control system may generate content data (for example, educational content data) by placing a two-dimensional avatar on a two-dimensional image.

In the above embodiment, the content control system provides three display modes, namely a live-action mode, a composite mode, and a virtual mode, but the display modes of the content are not limited to these. For example, the content control system may provide the composite mode and the virtual mode without providing the live-action mode. Alternatively, the content control system may provide only the composite mode. The content control system may also switch the educational content between two types of composite mode: a first composite mode that uses screen splitting (a composite mode as shown in FIG. 9) and a second composite mode that does not use screen splitting (a composite mode as shown in FIG. 10).

As described above, the content control system may control any type of content other than educational content. For example, the content control system may control any content for supporting any transfer of information or communication between users.

In the present disclosure, the expression "at least one processor executes a first process, executes a second process, ... and executes an n-th process," or an expression corresponding to it, is a concept that includes the case where the executing entity (that is, the processor) of the n processes from the first process to the n-th process changes partway through. That is, this expression covers both the case where all n processes are executed by the same processor and the case where the processor changes among the n processes according to an arbitrary policy.

The processing procedure of the method executed by the at least one processor is not limited to the example in the above embodiment. For example, some of the steps (processes) described above may be omitted, or the steps may be executed in a different order. Any two or more of the steps described above may be combined, and some of the steps may be modified or deleted. Alternatively, other steps may be executed in addition to the steps described above.

1 ... content distribution system, 10 ... server, 11 ... content management unit, 12 ... image acquisition unit, 13 ... motion identification unit, 14 ... content generation unit, 15 ... output unit, 16 ... auxiliary image generation unit, 20 ... student terminal, 21 ... request unit, 22 ... reception unit, 23 ... display control unit, 24 ... transmission unit, 30 ... teacher terminal, 31 ... transmission unit, 32 ... reception unit, 33 ... display control unit, 40 ... original image database, 50 ... content database, 90 ... teacher (distributor), 92 ... avatar, 401 ... original image, 402, 403, 406, 408 ... educational content, 404 ... avatar window (first window), 405 ... teacher window (second window), 410 ... auxiliary image, P1 ... server program, P2 ... client program.

Claims

A content control system comprising at least one processor, wherein:
at least one of the at least one processor acquires original image data showing a scene in which a teacher conducts a lesson,
at least one of the at least one processor generates, based on the original image data, motion data indicating the motion of the teacher,
at least one of the at least one processor determines specifications of an avatar corresponding to the teacher based on the motion data,
at least one of the at least one processor generates educational content data for a student taking the lesson by generating a screen including a first window in which the avatar appears and a second window in which the teacher appears, thereby arranging the avatar based on the determined specifications at a position different from that of the teacher, and by making the area of the first window larger than the area of the second window, and
at least one of the at least one processor outputs the generated educational content data.

The content control system according to claim 1, wherein at least one of the at least one processor transmits the generated educational content data to a student terminal of the student, thereby causing the educational content data to be displayed on the student terminal.

The content control system according to claim 1, wherein the motion of the avatar corresponding to the teacher is the same as the motion of the teacher.

The content control system according to any one of claims 1 to 3, wherein at least one of the at least one processor arranges the second window at an edge of the first window.

The content control system according to claim 4, wherein at least one of the at least one processor positions the second window behind the avatar in a virtual space.

The content control system according to any one of claims 1 to 5, wherein:
the educational content represented by the educational content data is switchable between a composite mode in which both the teacher and the avatar are visible and a virtual mode in which the teacher is not visible but the avatar is visible, and
at least one of the at least one processor arranges, in the composite mode, the avatar based on the determined specifications at a position different from that of the teacher.

The content control system according to claim 6, wherein at least one of the at least one processor makes the teacher invisible in the virtual mode by arranging the avatar based on the determined specifications so that it is superimposed on the teacher.

The content control system according to claim 6, wherein at least one of the at least one processor makes the teacher invisible in the virtual mode by arranging the avatar based on the determined specifications at a position different from that of the teacher and applying a blurring process to the teacher shown in the image data.

The content control system according to any one of claims 1 to 8, wherein at least one of the at least one processor identifies, from the original image data, a region in which the teacher appears, identifies a two-dimensional motion of the teacher in the region, identifies positions of a plurality of joints corresponding to the motion, and generates the motion data based on a positional relationship between adjacent joints and on predetermined motion rules based on the rationality and consistency of body movement.

A content control method executed by a content control system comprising at least one processor, the method comprising:
a step of acquiring original image data showing a scene in which a teacher conducts a lesson;
a step of generating motion data indicating the motion of the teacher based on the original image data;
a step of determining specifications of an avatar corresponding to the teacher based on the motion data;
a step of generating educational content data for a student taking the lesson by generating a screen including a first window in which the avatar appears and a second window in which the teacher appears, thereby arranging the avatar based on the determined specifications at a position different from that of the teacher, and by making the area of the first window larger than the area of the second window; and
a step of outputting the generated educational content data.

A content control program for causing a computer to execute:
a step of acquiring original image data showing a scene in which a teacher conducts a lesson;
a step of generating motion data indicating the motion of the teacher based on the original image data;
a step of determining specifications of an avatar corresponding to the teacher based on the motion data;
a step of generating educational content data for a student taking the lesson by generating a screen including a first window in which the avatar appears and a second window in which the teacher appears, thereby arranging the avatar based on the determined specifications at a position different from that of the teacher, and by making the area of the first window larger than the area of the second window; and
a step of outputting the generated educational content data.