JP2021040305A

JP2021040305A - Video playback device, video playback method, and video distribution system

Info

Publication number: JP2021040305A
Application number: JP2020116421A
Authority: JP
Inventors: 暁彦白井; Akihiko Shirai; 洋典山内; Yosuke Yamanouchi
Original assignee: GREE Inc
Current assignee: GREE Inc
Priority date: 2019-08-31
Filing date: 2020-07-06
Publication date: 2021-03-11
Anticipated expiration: 2039-08-31
Also published as: JP6945693B2

Abstract

To provide a video playback device, a video playback method, and a video distribution system that generate an animation of an avatar that moves smoothly by reflecting the movement of each user when a delay occurs in a transmission line in a system that communicates through the avatar that moves according to the movements of the user. SOLUTION: In a video distribution system 1, a video playback device plays back a video received from a video distribution device, transmits motion data including a facial feature amount representing a facial expression feature of a user to the video distribution device, receives a video including an animation of the user's avatar generated from the video distribution device on the basis of the facial feature amount, determines whether the user's facial expression matches a predetermined reference facial expression, and transmits, to the video distribution device, reference facial expression identification data for identifying the reference facial expression determined to match the user's facial expression while a delay occurs in transmission of the motion data to the video distribution device. SELECTED DRAWING: Figure 1

Description

The disclosure in this specification relates mainly to a video playback device, a video playback method, and a video distribution system.

Video distribution systems are known in which a viewing user who watches a video can participate in that video using an avatar. For example, Japanese Unexamined Patent Publication No. 2012-120098 (Patent Document 1) describes including a viewing user's avatar in a distributed video.

Japanese Unexamined Patent Publication No. 2012-120098

In a video distribution service, viewing-user engagement can be increased by reflecting feedback from viewing users in the distributed video. For example, Patent Document 1 describes displaying a comment from a viewing user in association with that user's avatar. In the system of Patent Document 1, the viewing user provides feedback on the distributed video by posting a comment, and this feedback is reflected in the distributed video.

If the distributed content could include not only comments from a viewing user but also an animation of an avatar that moves according to the viewing user's movements, engagement could be increased further. However, to move the avatar according to the viewing user's movements, motion data indicating those movements must be transmitted in real time from the viewing user's video playback device to the video generation device that generates the video. For the avatar to perform movements that reflect the user's movements, the motion data must include bone data indicating the positions and orientations of the user's bones. The position and orientation of each bone are expressed as three-dimensional vectors. Representing the movement of a human body may require bone data for 20 or more bones.

Because the amount of bone data is large, as described above, it can be difficult, depending on the bandwidth of the transmission path and the degree of congestion, to transmit motion data that includes bone data indicating the viewing user's posture without delay. In particular, the uplink from a user device has a smaller transmission capacity than the downlink used for content distribution, so delays are likely when bone data is transmitted over the uplink. As a result, it is difficult to generate an avatar animation that reflects the viewing user's movements in real time from the bone data transmitted by the viewing user and to include that animation in the video being distributed.

Each user's bone data must be transmitted to the other user's device not only when a viewing user participates in a distributed video through an avatar, but also in a system in which users communicate with each other through avatars that move according to their own movements. In such a system, therefore, it is difficult to generate an avatar animation that moves smoothly and reflects each user's movements when a delay occurs in the transmission path.

An object of the present disclosure is to provide a technical improvement that solves or alleviates at least some of the problems of the prior art described above.

One more specific object of the present invention is to make it possible to generate an avatar animation related to a user's posture from information with a smaller data amount than before.

Other objects of the present disclosure will become apparent from the specification as a whole. The invention disclosed in this specification may, instead of or in addition to the problems above, solve a problem that can be understood from the description of the embodiments for carrying out the invention.

A video playback device according to one aspect includes one or more computer processors. By executing computer-readable instructions, the one or more computer processors play back a video received from a video distribution device; transmit motion data including a posture feature amount representing a feature of a user's posture to the video distribution device; receive, from the video distribution device, a video including an animation of the user's avatar generated based on the posture feature amount; determine, using a classifier that classifies the user's posture based on the posture feature amount, whether the user's posture belongs to a predetermined reference posture; and, while a delay is occurring in transmission of the motion data to the video distribution device, transmit to the video distribution device reference posture identification data that identifies the reference posture to which the user's posture was determined to belong and that has a smaller data amount than the posture feature amount.
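The playback-side choice between full motion data and the compact identifier might be sketched as follows. This is only an illustration under stated assumptions: the names `MotionData`, `ReferencePostureId`, `build_uplink_message`, and the `classify` callback are hypothetical, and the fallback to full motion data when no reference posture matches is an assumption, since the text does not specify that case.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple, Union

Vector3 = Tuple[float, float, float]


@dataclass
class MotionData:
    """Full motion data carrying the posture feature amount (e.g. bone vectors)."""
    posture_features: Sequence[Vector3]


@dataclass
class ReferencePostureId:
    """Compact identifier of a reference posture (hypothetical message type)."""
    posture_id: int


def build_uplink_message(
    features: Sequence[Vector3],
    delay_detected: bool,
    classify: Callable[[Sequence[Vector3]], Optional[int]],
) -> Union[MotionData, ReferencePostureId]:
    """Choose what to send to the video distribution device for one frame."""
    if not delay_detected:
        # No delay: send the full posture feature amount.
        return MotionData(features)
    posture_id = classify(features)
    if posture_id is None:
        # Assumption: with no matching reference posture, keep sending motion data.
        return MotionData(features)
    # Delay and a match: send only the small identifier.
    return ReferencePostureId(posture_id)
```

How the delay itself is detected is left outside this sketch; `delay_detected` is simply passed in.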

In the video playback device according to one aspect, the classifier determines whether the user's posture matches the reference posture based on an evaluation function that takes the posture feature amount as a variable.
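One possible evaluation function is sketched below. The text does not say which function is used, so the root-mean-square distance to a stored template and the threshold value of 0.1 are assumptions chosen only for illustration; `evaluation` and `matches_reference` are hypothetical names.

```python
import math
from typing import Sequence, Tuple

Vector3 = Tuple[float, float, float]


def evaluation(features: Sequence[Vector3], reference: Sequence[Vector3]) -> float:
    """Assumed evaluation function: RMS distance between corresponding components."""
    squared = [
        (a - b) ** 2
        for feature, ref in zip(features, reference)
        for a, b in zip(feature, ref)
    ]
    return math.sqrt(sum(squared) / len(squared))


def matches_reference(
    features: Sequence[Vector3],
    reference: Sequence[Vector3],
    threshold: float = 0.1,  # assumed tolerance, not taken from the text
) -> bool:
    """The posture matches when the evaluation value is within the threshold."""
    return evaluation(features, reference) <= threshold
```

A learned classifier could replace this fixed template comparison without changing the calling code.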

The video playback device in one aspect presents an image representing the reference posture to the user and creates the classifier by learning, as teacher data, the posture feature amount representing the posture the user takes in response to the presented image.

In the video playback device in one aspect, one or more registered animations are registered for the user's avatar. While the delay is occurring, the video playback device in one aspect receives a video including the registered animation corresponding to the reference posture specified based on the reference posture identification data.

The video playback device in one aspect generates a user animation of the avatar based on the posture feature amounts acquired in time series, registers the user animation as a registered animation, generates a sample video including the user animation, and determines the reference posture corresponding to the user animation based on the image of a reference frame selected from among the plurality of frames constituting the sample video.

The video playback device in one aspect calculates, based on a predetermined frame rate, the posture feature amount relating to the user at each of a plurality of feature points relating to the user; calculates a first RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a first frame; calculates a second RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a second frame later in the time series than the first frame; calculates a third RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a third frame later in the time series than the second frame; and, when the sign of a first RMS difference, which is the difference between the second RMS and the first RMS, and the sign of a second RMS difference, which is the difference between the third RMS and the second RMS, are reversed, determines the reference posture corresponding to the user animation based on the posture feature amount in the third frame.
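The sign-reversal test on consecutive RMS differences can be sketched as below. The sketch assumes one scalar feature value per feature point in each frame; `rms` and `find_reference_frame` are hypothetical names introduced for illustration.

```python
import math
from typing import Optional, Sequence


def rms(values: Sequence[float]) -> float:
    """Root mean square of the feature values at all feature points of one frame."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def find_reference_frame(frames: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the index of the first "third frame" at which the sign of the
    RMS difference reverses, or None if no reversal occurs."""
    per_frame = [rms(frame) for frame in frames]
    for i in range(2, len(per_frame)):
        first_diff = per_frame[i - 1] - per_frame[i - 2]   # second RMS - first RMS
        second_diff = per_frame[i] - per_frame[i - 1]      # third RMS - second RMS
        if first_diff * second_diff < 0:                   # signs are opposite
            return i
    return None
```

A reversal marks a turning point of the motion (for example, an arm that stops rising and starts to fall), which is why the frame at that point is a natural candidate for the reference posture.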

The video playback device in one aspect determines the reference posture corresponding to the user animation by comparing first evaluation data, which includes posture feature amounts acquired in time series based on a movement of the user in response to the user animation, with second evaluation data, which includes posture feature amounts acquired in time series based on another movement of the user in response to the user animation.

In one aspect, the posture feature amount includes bone data representing the positions and orientations of the user's bones as three-dimensional vectors.
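To make the difference in data amount concrete, the following sketch packs bone data and a reference posture identifier into bytes. The encoding is an assumption made only for illustration: 20 bones, one position vector and one orientation vector per bone, 32-bit floats, and a one-byte identifier. None of these sizes come from the specification.

```python
import struct
from typing import Sequence, Tuple

Vector3 = Tuple[float, float, float]
Bone = Tuple[Vector3, Vector3]  # (position, orientation), an assumed layout


def pack_bone_data(bones: Sequence[Bone]) -> bytes:
    """Serialize every bone as six little-endian 32-bit floats."""
    flat = [c for position, orientation in bones for c in (*position, *orientation)]
    return struct.pack(f"<{len(flat)}f", *flat)


def pack_reference_posture_id(posture_id: int) -> bytes:
    """Serialize the identifier as a single unsigned byte (assumed range 0-255)."""
    return struct.pack("<B", posture_id)


skeleton = [((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))] * 20
bone_payload = pack_bone_data(skeleton)        # 20 bones * 6 floats * 4 bytes
id_payload = pack_reference_posture_id(7)      # 1 byte
```

Under these assumptions a single frame of bone data takes 480 bytes, against 1 byte for the identifier, before any protocol overhead.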

In one aspect, the motion data includes a facial feature amount representing features of the user's face. The video playback device in one aspect determines, using another classifier that classifies the user's facial expression based on the facial feature amount, whether the user's facial expression belongs to a predetermined reference facial expression and, while the delay is occurring, transmits to the video distribution device reference facial expression identification data that identifies the reference facial expression to which the user's facial expression was determined to belong and that has a smaller data amount than the facial feature amount.

In one aspect, the reference posture identification data is transmitted in place of the motion data while the delay is occurring.

In one aspect, the motion data is transmitted in real time.

A video distribution system according to one aspect includes one or more computer processors and distributes a video including a user's avatar to the user's video playback device. By executing computer-readable instructions, the one or more computer processors receive, from the video playback device via a transmission path, motion data including a posture feature amount representing the user's posture; distribute the video with an animation of the user's avatar generated based on the posture feature amount included in it; when a classifier that classifies the user's posture based on the posture feature amount has determined that the user's posture belongs to a reference posture, receive from the video playback device, while a delay is occurring in transmission of the motion data to the video distribution device, reference posture identification data that identifies the reference posture and has a smaller data amount than the posture feature amount; and distribute the video with a registered animation of the user's avatar generated based on the reference posture identification data included in it.
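On the distribution side, the choice of animation source might look like the sketch below. It assumes, purely for illustration, that an incoming message is either an integer reference posture identifier or a sequence of bone vectors, and that registered animations are kept in a dictionary keyed by identifier; `select_avatar_animation` and the animation names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

Vector3 = Tuple[float, float, float]
Message = Union[int, Sequence[Vector3]]


def select_avatar_animation(
    message: Message,
    registered_animations: Dict[int, str],
) -> Tuple[str, Union[str, List[Vector3]]]:
    """Decide how the avatar animation for the next video segment is produced."""
    if isinstance(message, int):
        # Reference posture identification data: look up the registered animation.
        # An unknown identifier raises KeyError in this sketch.
        return ("registered", registered_animations[message])
    # Full motion data: the animation is generated from the posture features.
    return ("generated", list(message))
```

Because the registered animation is already stored where the video is generated, only the identifier has to cross the delayed uplink.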

A video playback method according to one aspect is performed by one or more computer processors executing computer-readable instructions. The method includes: a step of playing back a video received from a video distribution device; a step of transmitting motion data including a posture feature amount representing a user's posture to the video distribution device; a step of receiving, from the video distribution device, a video including an animation of the user's avatar generated based on the posture feature amount; a step of determining, using a classifier that classifies the user's posture based on the posture feature amount, whether the user's posture belongs to a predetermined reference posture; and a step of transmitting to the video distribution device, while a delay is occurring in transmission of the motion data to the video distribution device, reference posture identification data that identifies the reference posture to which the user's posture was determined to belong and that has a smaller data amount than the posture feature amount.

A video playback device according to one aspect of the present invention calculates, based on a predetermined frame rate, a posture feature amount relating to a user of the video playback device at each of a plurality of feature points relating to the user; calculates a first RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a first frame; calculates a second RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a second frame later in the time series than the first frame; calculates an RMS difference, which is the difference between the second RMS and the first RMS; when the RMS difference is larger than a predetermined threshold, transmits the posture feature amount in the second frame to a video distribution device; receives, from the video distribution device, a video including an animation of the user's avatar generated based on the posture feature amount; and plays back the video.
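The threshold test can be sketched as follows, again assuming one scalar feature value per feature point. The sketch follows the wording literally and compares the signed difference (second RMS minus first RMS) with the threshold; comparing its absolute value instead would be a different reading that the text does not confirm. `rms` and `should_send` are hypothetical names.

```python
import math
from typing import Sequence


def rms(values: Sequence[float]) -> float:
    """Root mean square of the feature values at all feature points of one frame."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def should_send(
    first_frame: Sequence[float],
    second_frame: Sequence[float],
    threshold: float,
) -> bool:
    """True when the RMS difference between the two frames exceeds the threshold,
    meaning the posture feature amount of the second frame should be transmitted."""
    rms_difference = rms(second_frame) - rms(first_frame)
    return rms_difference > threshold
```

Frames whose RMS barely changes are skipped, so the uplink carries feature amounts only when the posture has changed noticeably.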

A video playback device according to one aspect of the present invention calculates, based on a predetermined frame rate, a posture feature amount relating to a user of the video playback device at each of a plurality of feature points relating to the user; calculates a first RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a first frame; calculates a second RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a second frame later in the time series than the first frame; calculates a third RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a third frame later in the time series than the second frame; when the sign of a first RMS difference, which is the difference between the second RMS and the first RMS, and the sign of a second RMS difference, which is the difference between the third RMS and the second RMS, are reversed, transmits the posture feature amount in the third frame; receives, from the video distribution device, a video including an animation of the user's avatar generated based on the posture feature amount; and plays back the video.

According to the embodiments, an avatar animation related to a user's posture can be generated from information with a smaller data amount than before.

A block diagram showing a video distribution system according to one embodiment.
A diagram conceptually showing a three-dimensional skeleton model.
A diagram explaining the reference posture management data stored in the video distribution system of FIG. 1.
A diagram conceptually showing the three-dimensional skeleton model corresponding to a start posture.
A diagram conceptually showing the three-dimensional skeleton model corresponding to a trigger posture.
A diagram explaining the avatar data stored in the video distribution system of FIG. 1.
A diagram explaining the animation management data stored in the video distribution system of FIG. 1.
A diagram schematically showing the frames constituting a captured image.
A table showing an example of the feature amounts at a plurality of feature points in each frame and their RMS.
A diagram showing an example of a video displayed on the video playback device 10 in one embodiment.
A diagram showing an example of a video displayed on the video playback device 10 in one embodiment.
A flow diagram showing part of the processing flow of a video playback method in one embodiment.
A block diagram showing a video distribution system according to another embodiment.
A diagram explaining the reference facial expression management data stored in the video distribution system shown in FIG. 11.
A diagram explaining the registered facial expression management data stored in the video distribution system shown in FIG. 11.
A block diagram showing a video distribution system according to another embodiment.
A block diagram showing a video distribution system according to another embodiment.
A table showing an example of the feature amounts at a plurality of feature points in each frame.

Various embodiments of the present invention are described below with reference to the drawings as appropriate. Identical or similar components are given the same reference numerals across the drawings.

A video distribution system according to one embodiment will be described with reference to FIGS. 1 to 6. FIG. 1 is a block diagram showing a video distribution system 1 according to one embodiment. FIGS. 2, 4a, and 4b are diagrams conceptually showing three-dimensional skeleton models, and FIGS. 3, 5, and 6 are diagrams for explaining information stored in the video distribution system 1.

The video distribution system 1 includes a video playback device 10 and a video distribution device 20. The video playback device 10 and the video distribution device 20 are connected via a network 50 so that they can communicate with each other. The video distribution system 1 may include a storage 60. A viewing user, who is the user of the video playback device 10, can watch a video distributed from the video distribution device 20 on the video playback device 10. The video distribution system 1 may include two or more video playback devices. A viewing user who watches a video distributed from the video distribution device 20 on the video playback device 10 can have his or her own avatar displayed in that video. In other words, the viewing user can participate in the distributed video through his or her own avatar. The viewing user can have the avatar displayed in the distributed video and, through the avatar, interact with the distributing user of the video (or a character in the distributed video) and with other viewing users.

First, the video playback device 10 will be described. The video playback device 10 is an information processing device such as a smartphone. Besides a smartphone, the video playback device 10 may be a mobile phone, a tablet terminal, a personal computer, an electronic book reader, a wearable computer, a game console, or any other information processing device capable of playing back video.

The video playback device 10 is used by a viewing user for watching videos and for other purposes. The video playback device 10 includes a computer processor 11, a communication I/F 12, a storage 13 that stores various information, a display 14 that displays the video being played back, and a sensor unit 15. The video playback device 10 may include other components, such as a sound-collecting microphone. The video playback device 10 plays back the video distributed from the video distribution device 20.

The computer processor 11 is an arithmetic unit that loads an operating system and various programs implementing various functions into memory from the storage 13 or another storage, and executes the instructions contained in the loaded programs. The computer processor 11 is, for example, a CPU, an MPU, a DSP, a GPU, another type of arithmetic unit, or a combination of these. The computer processor 11 may be implemented by an integrated circuit such as an ASIC, PLD, FPGA, or MCU. Although FIG. 1 shows the computer processor 11 as a single component, the computer processor 11 may be a set of multiple physically separate computer processors. In this specification, a program described as being executed by the computer processor 11, or the instructions contained in that program, may be executed by a single computer processor or executed in a distributed manner by multiple computer processors. A program executed by the computer processor 11, or the instructions contained in that program, may also be executed by multiple virtual computer processors. The functions implemented by the computer processor 11 are described later.

The communication I/F 12 is implemented as hardware, firmware, communication software such as a TCP/IP driver or a PPP driver, or a combination of these. The video playback device 10 can exchange data with other devices via the communication I/F 12.

The storage 13 is a storage device accessed by the computer processor 11. The storage 13 is, for example, a magnetic disk, an optical disk, a semiconductor memory, or any other storage device capable of storing data. Various programs can be stored in the storage 13. At least some of the programs and data that can be stored in the storage 13 may instead be stored in a storage that is physically separate from the video playback device 10 (for example, the storage 60).

The display 14 has a display panel and a touch panel. The display panel is a liquid crystal panel, an organic EL panel, an inorganic EL panel, or any other display panel capable of displaying images. The touch panel is configured to detect a player's touch operations (contact operations). The touch panel can detect various touch operations by the player, such as taps, double taps, and drags. The touch panel may include a capacitive proximity sensor and be configured to detect the player's non-contact operations. The video distributed from the video distribution device 20 is displayed on the display 14.

The sensor unit 15 includes one or more sensing devices that detect the posture of the user of the video playback device 10. These sensing devices may include an RGB camera, a depth sensor, and other devices capable of detecting the user's posture. The sensor unit 15 may include a computer processor. The computer processor of the sensor unit 15 may generate three-dimensional skeleton data representing the user's posture by analyzing the data acquired by the sensing devices, and may do so by executing dedicated software. The three-dimensional skeleton data representing the user's posture detected by the sensor unit 15 is an example of the "posture feature amount" recited in the claims. The sensor unit 15 may generate the three-dimensional skeleton data using Light Coding technology, as in Kinect (trademark) provided by Microsoft Corporation. That is, the sensor unit 15 may irradiate a target such as the user with a random infrared pattern, obtain the depth of the target by analyzing the resulting image, and generate the three-dimensional skeleton data by analyzing that depth. The feature amount representing the user's posture detected using Light Coding technology is not limited to three-dimensional skeleton data; other feature amounts representing the user's posture may be used.

The three-dimensional skeleton data is data for expressing the human body or a part of the human body as a three-dimensional skeleton model. The three-dimensional skeleton model models the skeleton of the human body or a part of the human body by a plurality of bones corresponding to the axes of the bones of the human body and a plurality of joints connecting the bones. The three-dimensional skeleton model will be further described with reference to FIG. 2. FIG. 2 is an explanatory diagram for explaining the concept of the three-dimensional skeleton model. In the example shown in FIG. 2, the three-dimensional skeleton model 100 includes eight bones and joints connecting these bones. Each bone of the three-dimensional skeleton model is represented as a line segment connecting two points in three-dimensional coordinates. Therefore, the bones are represented by three-dimensional vectors V1 to V8, each representing the directed line segment corresponding to one bone in three-dimensional space (XYZ space). The three-dimensional skeleton data constituting the three-dimensional skeleton model includes a three-dimensional vector representing each of the bones included in the model. FIG. 2 shows a three-dimensional skeleton model of the upper body of the human body, but the three-dimensional skeleton model may model the entire skeleton of the human body, or a part other than the upper body (for example, an arm or the lower body). Further, in the example of FIG. 2, the upper body is represented by eight bones, but it may be represented by more or fewer than eight bones. A three-dimensional skeleton model containing a larger number of bones may include bones representing the finger bones, which also makes it possible to detect finger movement.
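As a non-limiting illustration, the representation of bones as three-dimensional vectors between joints might be sketched as follows. The joint names and the connectivity of the eight bones are assumptions made for this example only; the actual layout of FIG. 2 is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical connectivity for an eight-bone upper-body model (nine joints).
UPPER_BODY_BONES: List[Tuple[str, str]] = [
    ("pelvis", "chest"),
    ("chest", "head"),
    ("chest", "left_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_hand"),
    ("chest", "right_shoulder"),
    ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_hand"),
]


def bone_vectors(
    joints: Dict[str, Vec3],
    bones: Sequence[Tuple[str, str]] = UPPER_BODY_BONES,
) -> List[Vec3]:
    """Return one vector per bone, pointing from its start joint to its end joint."""
    vectors: List[Vec3] = []
    for start, end in bones:
        sx, sy, sz = joints[start]
        ex, ey, ez = joints[end]
        vectors.append((ex - sx, ey - sy, ez - sz))
    return vectors
```

With the default connectivity, the returned list has eight entries that play the role of the vectors V1 to V8.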

The information stored in the storage 13 will now be described. In the illustrated embodiment, the storage 13 stores the reference posture management data 13a and various other information necessary for using the services provided by the video distribution device 20. In the video distribution system 1 according to one embodiment, reference postures of the user are defined in advance. Only one reference posture may be defined in the video distribution system 1, or a plurality of reference postures may be defined. Each of the one or more reference postures may be a set of postures used to identify a series of movements of the user. For example, a reference posture may include a start posture indicating the posture at the start of the series of movements of the user associated with that reference posture, and a trigger posture indicating a specific posture taken after the start posture in that series of movements. There may be more than one of at least one of the start posture and the trigger posture.

An example of the reference posture management data 13a is shown in FIG. 3. In the example shown in FIG. 3, each reference posture includes a start posture and a trigger posture. Accordingly, the storage 13 can store, in association with reference posture identification data that identifies a reference posture, start posture data indicating the start posture included in that reference posture and trigger posture data indicating the trigger posture included in that reference posture.

The reference posture identification data is, for example, a reference posture ID that identifies the reference posture. The reference posture ID is, for example, an identification code represented by several bits. The number of bits of the reference posture ID can be determined according to the number of reference postures used in the video distribution system 1. The reference posture ID is represented by, for example, data having an information amount of 10 bits or less. The reference posture ID may be represented by data having an information amount of 5 bits or less, 4 bits or less, 3 bits or less, or 2 bits or less. Therefore, the data amount of the reference posture identification data (reference posture ID) is significantly smaller than that of the three-dimensional skeleton data.
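As a rough illustration of this difference in data amount, the following sketch computes the minimum number of bits needed for a fixed-width reference posture ID and compares it with one frame of skeleton data. The function names are hypothetical, and the skeleton size assumes three 32-bit floating-point components per bone, which the specification does not state.

```python
def reference_posture_id_bits(num_reference_postures: int) -> int:
    """Minimum number of bits for a fixed-width ID covering the given number of postures."""
    if num_reference_postures < 1:
        raise ValueError("at least one reference posture is required")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2; use at least one bit.
    return max(1, (num_reference_postures - 1).bit_length())


def skeleton_frame_bits(num_bones: int = 8, bits_per_component: int = 32) -> int:
    """Size of one skeleton frame, assuming three components per bone vector."""
    return num_bones * 3 * bits_per_component
```

Under these assumptions, 32 reference postures fit in a 5-bit ID, whereas a single eight-bone skeleton frame occupies 768 bits.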

The start posture data is data indicating the start posture. The start posture data is, for example, three-dimensional skeleton data indicating the start posture. An example of the start posture is shown in FIG. 4a. The start posture in the illustrated example is a posture in which the right palm is thrust forward with the right fist at the same height as the right shoulder. The start posture data may be three-dimensional skeleton data modeling this start posture. The start posture data may include three-dimensional vector data indicating the position and orientation of each bone in the start posture. The three-dimensional skeleton data indicating the start posture may include eight bones, as schematically shown in FIG. 4a. The vectors T1 to T8 representing these eight bones correspond respectively to the bone vectors V1 to V8 included in the three-dimensional skeleton data of the viewing user generated based on the detection data of the sensor unit 15. When data other than three-dimensional skeleton data is used as the posture feature amount, the start posture data is data that represents the start posture in terms of the posture feature amount actually used.

The trigger posture data is data indicating the trigger posture. The trigger posture data is, for example, three-dimensional skeleton data indicating the trigger posture. An example of the trigger posture is shown in FIG. 4b. The trigger posture in the illustrated example is a posture in which the right arm is thrust diagonally upward and forward. The trigger posture data may be three-dimensional skeleton data modeling this trigger posture. The three-dimensional skeleton data indicating the trigger posture may include three-dimensional vector data indicating the position and orientation of each bone. The three-dimensional skeleton data indicating the trigger posture may include eight bones, as schematically shown in FIG. 4b. The vectors T1 to T8 of these eight bones correspond respectively to the bone vectors V1 to V8 included in the three-dimensional skeleton data of the viewing user generated based on the detection data of the sensor unit 15. When data other than three-dimensional skeleton data is used as the posture feature amount, the trigger posture data is data that represents the trigger posture in terms of the posture feature amount actually used.

次に、動画配信装置２０について説明する。動画配信装置２０は、例えば、動画再生装置１０にネットワーク５０の下り回線を介して動画を配信する動画配信サーバである。動画配信装置２０は、コンピュータプロセッサ２１、通信Ｉ／Ｆ２２、及び各種情報を記憶するストレージ２３、を備える。動画配信装置２０は、集音マイク等の前記以外の構成要素を備えてもよい。 Next, the moving image distribution device 20 will be described. The video distribution device 20 is, for example, a video distribution server that distributes a video to the video playback device 10 via the downlink of the network 50. The video distribution device 20 includes a computer processor 21, a communication I / F 22, and a storage 23 that stores various information. The video distribution device 20 may include components other than the above, such as a sound collecting microphone.

コンピュータプロセッサ２１は、ストレージ２３又はそれ以外のストレージからオペレーティングシステムや様々な機能を実現する様々なプログラムをメモリにロードし、ロードしたプログラムに含まれる命令を実行する演算装置である。上述したコンピュータプロセッサ１１についての説明は、可能な限りコンピュータプロセッサ２１にも当てはまる。コンピュータプロセッサ２１により実現される機能については後述する。 The computer processor 21 is an arithmetic unit that loads various programs that realize an operating system and various functions from the storage 23 or other storage into a memory and executes instructions included in the loaded programs. The above description of the computer processor 11 also applies to the computer processor 21 as much as possible. The functions realized by the computer processor 21 will be described later.

通信Ｉ／Ｆ２２は、ハードウェア、ファームウェア、又はＴＣＰ／ＩＰドライバやＰＰＰドライバ等の通信用ソフトウェア又はこれらの組み合わせとして実装される。動画配信装置２０は、通信Ｉ／Ｆ２２を介して、他の装置とデータを送受信することができる。 The communication I / F 22 is implemented as hardware, firmware, communication software such as a TCP / IP driver or PPP driver, or a combination thereof. The video distribution device 20 can transmit / receive data to / from another device via the communication I / F 22.

The storage 23 is a storage device accessed by the computer processor 21. The storage 23 is, for example, a magnetic disk, an optical disk, a semiconductor memory, or any other storage device capable of storing data. Various programs can be stored in the storage 23. At least a part of the programs and various data that can be stored in the storage 23 may be stored in a storage that is physically separate from the video distribution device 20 (for example, the storage 60).

The storage 23 can store object data 23a, avatar data 23b, animation management data 23c, and various other information necessary for generating and distributing videos.

The object data 23a may include asset data for constructing the virtual space that constitutes a video. The object data 23a includes data for drawing the background of the virtual space constituting the video, data for drawing various physical objects displayed in the video, and data for drawing other various objects displayed in the video. The object data 23a may include object position information indicating the position of each object in the virtual space. In addition to the above, the object data 23a may include data indicating gift objects that are displayed in the video based on a display request from the viewing user of the video playback device 10. The gift objects may include effect objects, normal objects, and decoration objects. The viewing user can purchase a desired gift object. Details of the gift objects displayed in the video are described in the specification of Japanese Patent No. 6446154. In the video distribution system 1 of the present application as well, gift objects can be displayed in the video in the same manner as described in the specification of Japanese Patent No. 6446154.

An example of the avatar data 23b is shown in FIG. 5. As shown in FIG. 5, the avatar data 23b can include avatar identification information of an avatar used by a viewing user in the video distribution system 1 and avatar display information for displaying that avatar in a video. In other words, the storage 23 can store, in association with the avatar identification information of an avatar, the avatar display information for displaying that avatar. The avatar identification information is, for example, an avatar ID that identifies the avatar. The user of the video playback device 10 can set his or her own avatar in the video distribution system 1. In order to manage avatars for each user, the avatar ID may be stored in the storage 23 in association with a user ID that identifies the user. The avatar is displayed in the video as, for example, an image imitating a human or an animal. The avatar display information is information used to display the avatar in the video. The avatar display information includes, for example, part information indicating images of the head, hairstyle, facial parts (eyes, nose, mouth, etc.), torso, clothes, accessories, items, and other parts constituting the avatar, or other skin information for specifying the appearance of the avatar. The user can register his or her own avatar by selecting preferred part images. The avatar display information may include 2D display information for displaying the avatar in the video in 2D and 3D display information for displaying the avatar in the video in 3D. The 3D display information includes part information indicating images of the parts for displaying the avatar three-dimensionally in the video, bone data for expressing the three-dimensional movement of the avatar, and other known information used for displaying an avatar three-dimensionally.

As described above, the video distributed from the video distribution device 20 may include an animation of the avatar of the user of the video playback device 10. As will be described later, the animation of a user's avatar may be generated, based on the posture feature amount indicating the posture of that user, so as to reflect the posture and movement of the user in real time.

A registered animation that has been registered in advance may be adopted as the animation of the avatar. A registered animation is not generated in real time during distribution of the video so as to follow the movement of the viewing user; rather, it is an animation that is registered or defined in advance, before the video is distributed or before display of the avatar animation becomes necessary. An example of the animation management data 23c for managing registered animations is shown in FIG. 6. As illustrated, the animation management data 23c includes reference posture identification data, registered animation identification data (a registered animation ID) that identifies a registered animation, and animation definition data for specifying the animation of the avatar. In the storage 23, the registered animation identification data and the animation definition data are stored in association with the reference posture identification data. The reference posture identification data may be the reference posture ID, as already described. When a plurality of reference posture IDs are used in the video distribution system 1, a plurality of registered animations may be registered according to the number of reference posture IDs. One registered animation selected from the plurality of registered animations can be included in the video. The animation definition data may be data that describes, in time series, bone data indicating the position and orientation of the bones of the avatar.
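As a non-limiting sketch, the association held in the animation management data 23c could be modeled as a table keyed by reference posture ID. The class name, field names, IDs, and keyframe values below are all hypothetical and serve only to illustrate the lookup; FIG. 6 may organize the data differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class RegisteredAnimation:
    animation_id: str            # registered animation ID
    keyframes: List[List[Vec3]]  # time series of per-bone vectors (animation definition data)


# Hypothetical animation management table: reference posture ID -> registered animation.
ANIMATION_TABLE: Dict[int, RegisteredAnimation] = {
    0: RegisteredAnimation("anim-raise-arm", [[(0.0, 1.0, 0.0)], [(0.0, 1.0, 0.5)]]),
    1: RegisteredAnimation("anim-wave", [[(1.0, 0.0, 0.0)], [(0.5, 0.5, 0.0)]]),
}


def select_registered_animation(reference_posture_id: int) -> Optional[RegisteredAnimation]:
    """Return the registered animation associated with the ID, or None if none is registered."""
    return ANIMATION_TABLE.get(reference_posture_id)
```

Because only the small reference posture ID is needed as the key, the full bone data stays on the side that holds the table.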

A registered animation may be generated based on a request from a viewing user. Hereinafter, for convenience of explanation, a registered animation that is registered based on a request from a viewing user is referred to as a "user animation". When a viewing user requests registration of a user animation, the video playback device 10 can prompt the viewing user to perform the movement corresponding to the animation that the user wishes to register. The video playback device 10 acquires, over a predetermined time, the posture feature amount (for example, three-dimensional skeleton data) of the postures taken by the viewing user in response to the display of the image representing the reference posture. The posture feature amount is acquired by the posture data acquisition unit 11b or the sensor unit 15 at predetermined sampling time intervals. In this way, animation definition data indicating the movement corresponding to the animation that the viewing user wishes to register is obtained. This animation definition data may have the same data format as the animation definition data of the registered animations stored as the animation management data 23c. The video playback device 10 transmits the animation definition data defining the user animation acquired in this way to the video distribution device 20, and registers the user animation in the video distribution device 20 as a new registered animation.

When the video distribution device 20 receives the animation definition data defining a user animation, it stores the user animation in the storage 23 as a part of the animation management data 23c. Specifically, the video distribution device 20 issues a registered animation ID that identifies the user animation whose registration is requested, and stores the animation definition data received from the video playback device 10 in association with that registered animation ID. Further, the video distribution device 20 requests the video playback device 10 to determine a reference posture that serves as the trigger for including the requested user animation in the video.

In response to the request from the video distribution device 20, the video playback device 10 determines the reference posture that serves as the trigger for causing the user's own avatar to move in accordance with the newly registered user animation. In this specification, the reference posture that serves as the trigger for moving the avatar in accordance with a newly registered user animation may be referred to as an additional reference posture. The additional reference posture can be determined by various methods. For example, a sample video including the newly registered user animation may be generated, the viewing user who requested registration of the user animation may be asked to select one or more candidate frames from among the frames included in the sample video, and the additional reference posture may be determined based on the image of the avatar contained in the selected frame or frames. The viewing user can select a preferred frame from among the frames. The viewing user can select, for example, a frame containing the avatar in a memorable posture, a frame containing the avatar in a characteristic posture, or any other frame. When two frames are selected from among the frames, the posture of the avatar in the frame that is earlier in the time series may be determined as the start posture, and the posture of the avatar in the frame that is later in the time series may be determined as the trigger posture.

Another method of determining the additional reference posture will now be described. The video playback device 10 instructs the viewing user who requested registration of the user animation to perform the movement corresponding to the user animation a plurality of times. This instruction may be given by voice or by screen display. The video playback device 10 acquires the posture feature amount (for example, three-dimensional skeleton data) corresponding to the movement performed by the viewing user in response to this instruction. Specifically, three-dimensional skeleton data representing the posture of the viewing user moving in accordance with the user animation is acquired at a predetermined sampling interval during a predetermined measurement period. As a result, two sets of three-dimensional skeleton data are obtained, each representing in time series the posture of the viewing user performing the movement corresponding to the user animation from the start to the end of measurement. Next, the three-dimensional skeleton data in the two sets that were acquired at the same timing after the start of measurement are compared with each other, and the additional reference posture is determined based on the result of this comparison. For example, for the three-dimensional skeleton data in the two sets acquired at the same timing after the start of measurement, the sum of the angles formed by the vectors of corresponding bones is calculated, and the posture corresponding to the three-dimensional skeleton data for which this sum of angles is smallest (either of the two sets of skeleton data may be adopted) can be used as the additional reference posture.
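The comparison described above might be sketched as follows, as one possible illustration rather than the definitive procedure. Each recording is assumed to be a list of frames sampled at the same timings, and each frame a list of bone vectors in the same bone order; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Sequence[Vec3]  # one bone vector per bone, in a fixed bone order


def angle_between(u: Vec3, v: Vec3) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("bone vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def pick_additional_reference_posture(
    take_a: List[Frame], take_b: List[Frame]
) -> Tuple[int, Frame]:
    """Return the sample index and posture at which the two recordings agree best.

    Agreement is measured by the sum of angles between corresponding bone vectors.
    The posture is taken from take_a; either recording could be used.
    """
    count = min(len(take_a), len(take_b))
    if count == 0:
        raise ValueError("both recordings must contain at least one frame")

    def angle_sum(i: int) -> float:
        return sum(angle_between(u, v) for u, v in zip(take_a[i], take_b[i]))

    best = min(range(count), key=angle_sum)
    return best, take_a[best]
```

Choosing the sample with the smallest angle sum favors a posture that the user reproduced consistently across repetitions, which is a plausible property for a posture that must later be recognized reliably.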

Next, the functions of the video playback device 10 will be described. The functions of the video playback device 10 are realized by the computer processor 11 executing computer-readable instructions included in a program. By executing the computer-readable instructions included in the program, the computer processor 11 functions as a video playback unit 11a, a posture data acquisition unit 11b, a transmission unit 11c, a classification unit 11d, and a delay monitoring unit 11e. At least a part of the functions realized by the computer processor 11 may be realized by a computer processor other than the computer processor 11. At least a part of the functions realized by the computer processor 11 may be realized by the computer processor 21 of the video distribution device 20 or by another computer processor provided in the video distribution system 1.

動画再生部１１ａは、動画配信装置２０から配信された動画を再生する。再生された動画は、ディスプレイ１４に表示される。 The moving image reproduction unit 11a reproduces the moving image distributed from the moving image distribution device 20. The reproduced moving image is displayed on the display 14.

The posture data acquisition unit 11b acquires a posture feature amount representing the features of the posture of the user who watches the video using the video playback device 10 (also referred to as the "viewing user"). The posture feature amount representing the features of the posture of the viewing user may be three-dimensional skeleton data representing the posture of the viewing user, that is, three-dimensional vector data indicating the position and orientation of the bones of the viewing user. The posture data acquisition unit 11b generates the three-dimensional skeleton data of the viewing user based on, for example, the detection data detected by the sensor unit 15. The three-dimensional skeleton data of the viewing user may be generated at predetermined sampling time intervals. When the sensor unit 15 includes a processor and the three-dimensional skeleton data is generated by the processor of the sensor unit 15, the posture data acquisition unit 11b need not be executed as a function of the computer processor 11.

姿勢データ取得部１１ｂにより取得される姿勢特徴量は、３次元骨格データ以外の特徴量であってもよい。例えば、上述したように、ＬｉｇｈｔＣｏｄｉｎｇ技術を用いて赤外線のランダムパターンをユーザ等の対象に照射し、その画像を解析することによって対象の深度を取得し、当該深度を姿勢特徴量としてもよい。センサユニット１５により検出される検出値及びこの検出値に基づいて算出される様々な値が姿勢特徴量として用いられ得る。 The posture feature amount acquired by the posture data acquisition unit 11b may be a feature amount other than the three-dimensional skeleton data. For example, as described above, a random pattern of infrared rays may be applied to a target such as a user by using the Light Coding technique, and the depth of the target may be obtained by analyzing the image, and the depth may be used as a posture feature amount. The detected value detected by the sensor unit 15 and various values calculated based on the detected value can be used as the posture feature amount.

The posture data acquisition unit 11b may obtain the posture feature amount as follows from a captured image that includes a plurality of frames obtained by imaging the movement of the user of the video playback device 10 at a predetermined frame rate. Specifically, the posture data acquisition unit 11b extracts a plurality of feature points related to the user in each frame of the captured image. Positions suitable for expressing the posture and movement of the user are extracted as the feature points. A plurality of feature points may be referred to as a feature point group. FIG. 7 schematically shows a frame f1, which is one of the frames constituting the captured image captured by the video playback device 10. As illustrated, the frame f1 includes an image U1 of the captured user. P1 to P6 shown in FIG. 7 each indicate an extracted feature point. The number and positions of the feature points are not limited to those shown in FIG. 7. For example, when a random dot pattern is projected onto the body or face of the user of the video playback device 10 by an infrared laser, and the user onto whom the random dot pattern is projected is imaged with a camera capable of capturing the infrared region, the random dots as a whole form the feature point group.

The posture data acquisition unit 11b can obtain a posture feature amount (image vector) for each of the extracted feature points. The posture feature amount at each of the feature points P1 to P6 may be the depth of the feature point or its three-dimensional coordinates. When three-dimensional coordinates are used as the posture feature amount, a three-dimensional world coordinate system is set in the captured image, and a relative amount normalized on the basis of the captured image and the world coordinate system is used as the posture feature amount. The normalized posture feature amount is represented by, for example, a relative amount of 0 to 1.0. The posture feature amount of each of the feature points P1 to P6 in the frame f1 may be an amount of change, that is, the difference between the posture feature amount of that feature point obtained in a frame earlier in the time series than the frame f1 (for example, the immediately preceding frame) and the posture feature amount of that feature point obtained in the frame f1. When the amount of change in the posture feature amount between frames is used as another posture feature amount and the two need to be distinguished, the posture feature amount in a given frame may be called the "in-frame feature amount", and the posture feature amount represented by the amount of change between frames may be called the "inter-frame feature amount". Unless otherwise specified, or unless the context requires otherwise, the term "posture feature amount" by itself includes both the "in-frame feature amount" and the "inter-frame feature amount". The posture feature amounts of the feature points P1 to P6 may be normalized to the range of 0 to 1.0 and expressed as a float array. In this case, the posture feature amount at each of the feature points P1 to P6 is an element of the array.
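The relationship between in-frame and inter-frame feature amounts can be illustrated with a minimal sketch. The function names, the per-axis bounds `lower` and `upper`, and the clamping step are assumptions made for illustration; the specification does not state how the normalization is carried out.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def normalize_points(points: Sequence[Point3D],
                     lower: Point3D,
                     upper: Point3D) -> List[float]:
    """Flatten 3D feature points into a float array scaled to 0..1.0.

    `lower` and `upper` are hypothetical per-axis bounds of the world
    coordinate system; they are not taken from the specification.
    """
    features: List[float] = []
    for point in points:
        for value, lo, hi in zip(point, lower, upper):
            scaled = (value - lo) / (hi - lo)
            features.append(min(1.0, max(0.0, scaled)))  # clamp to 0..1.0
    return features


def inter_frame_features(previous: Sequence[float],
                         current: Sequence[float]) -> List[float]:
    """Element-wise change between the in-frame features of two frames."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same number of features")
    return [c - p for p, c in zip(previous, current)]
```

In this sketch the in-frame array holds three floats per feature point, and the inter-frame array is simply the element-wise difference between two consecutive in-frame arrays, so its values may be negative.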

The transmission unit 11c transmits the posture feature amount representing the posture of the viewing user, acquired by the posture data acquisition unit 11b or the sensor unit 15, to the video distribution device 20. Upon receiving the posture feature amount from the posture data acquisition unit 11b or the sensor unit 15, the transmission unit 11c immediately transmits it to the video distribution device 20. In other words, the transmission unit 11c can transmit the posture feature amount of the viewing user to the video distribution device 20 in real time. As described above, the posture feature amount is acquired at a predetermined sampling time interval or a predetermined frame rate. The posture feature amount can therefore be generated at each predetermined sampling time interval or for each frame. Accordingly, the posture feature amounts of the viewing user acquired continuously over a predetermined time interval can express the body movement of the viewing user as time-series digital data. The posture feature amount representing the posture of the viewing user may be transmitted to the video distribution device 20 together with data other than the posture feature amount. In this specification, data representing the posture or facial expression of the viewing user, or data correlated with the posture or facial expression of the viewing user, may be collectively referred to as the "motion data" of the viewing user. The motion data of the viewing user is transmitted from the video playback device 10 to the video distribution device 20 via a transmission line including the network 50. The motion data may be transmitted in packets. That is, the transmission unit 11c may transmit packets containing the motion data to the video distribution device 20.

The classification unit 11d in one embodiment uses a classifier that classifies the posture of the viewing user to determine, on the basis of the viewing user's posture feature amount (for example, three-dimensional skeleton data), whether the viewing user's posture belongs to a predetermined reference posture. This classifier is, for example, a linear classifier. When the reference posture includes a start posture and a trigger posture, the classification unit 11d can determine whether the viewing user's posture belongs to the start posture and whether it belongs to the trigger posture. The classification unit 11d may first determine whether the viewing user's posture at time t1 belongs to the start posture among the reference postures and, if the posture at time t1 is determined to belong to the start posture, may then determine whether the viewing user's posture at time t2, which is later than time t1, belongs to the trigger posture.

In one embodiment, to obtain training data, the classification unit 11d can display an image representing a reference posture on the display 14 and instruct the viewing user, by voice or on-screen display, to take a posture following that image. The video playback device 10 sequentially acquires, through the posture data acquisition unit 11b or the sensor unit 15, the posture feature amounts of the movement the viewing user makes in response to this instruction. The classification unit 11d can create the classifier by learning, as training data, the posture feature amounts of the posture taken by the viewing user in response to the image representing the reference posture.

The classification unit 11d in one embodiment can determine, for example, whether the posture of the viewing user indicated by the posture feature amount from the posture data acquisition unit 11b or the sensor unit 15 matches a predetermined reference posture, on the basis of an evaluation function that takes the posture feature amount as a variable. When the viewing user's posture is determined to match the reference posture on the basis of the evaluation function, the viewing user's posture belongs to that reference posture. Whether the viewing user's posture matches the start posture can be determined on the basis of the posture feature amount representing the viewing user's posture, generated by the posture data acquisition unit 11b or the sensor unit 15, and the posture feature amount representing the start posture (start posture data). When the posture feature amount is three-dimensional skeleton data, the smaller the sum of the angles formed between the bone vectors V1 to V8 in the viewing user's three-dimensional skeleton data and the corresponding bone vectors T1 to T8 in the start posture data, the higher the similarity between the viewing user's posture detected by the sensor unit 15 and the start posture. That is, there is a negative correlation between the posture similarity and the magnitude of the angle θ formed by a three-dimensional vector in the skeleton data representing the viewing user's posture and the corresponding vector of the three-dimensional skeleton model representing the reference posture. Focusing on this point, a new evaluation function f is defined to normalize the magnitude of the angle θ defined by the two three-dimensional skeleton models. An example of the evaluation function f is shown in equation (1) below.
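Equation (1) is not reproduced in this text. Judging from the definitions that accompany it (the inner product of corresponding vectors, the product of their magnitudes, the count k, and the statement that f yields the average cosine of the angle θ), it presumably has the following form. This is a reconstruction from those definitions, not a copy of the original formula.

```latex
f = \frac{1}{k} \sum_{i=1}^{k} \frac{T_i \cdot V_i}{\lVert T_i \rVert \, \lVert V_i \rVert}
  = \frac{1}{k} \sum_{i=1}^{k} \cos\theta_i \tag{1}
```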

In the above equation, (Ti・Vi) denotes the inner product of corresponding vectors, (‖Ti‖‖Vi‖) denotes the product of the magnitudes of the two vectors, and k denotes the number of vectors constituting the three-dimensional skeleton model. In other words, the evaluation function f of equation (1) yields the average of the cosines (= cos θ) of the angles θ formed by the corresponding vectors of the two three-dimensional skeleton models. When the corresponding vectors of the two three-dimensional skeleton models coincide exactly (angle θ = 0), cos θ = 1, and when the corresponding vectors point in opposite directions (angle θ = 180°), cos θ = -1, so the evaluation function f takes values in the range -1 ≤ f ≤ +1. In this case, for example, if the range of the evaluation function f (-1 ≤ f ≤ +1) is mapped linearly onto percentage values (0 to 100%), the similarity can be expressed as a percentage. The predetermined threshold applied to this similarity can be, for example, 90%. Determination of the similarity between two sets of three-dimensional skeleton data is also disclosed in Japanese Patent Application Laid-Open No. 2013-37454.
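The evaluation described above can be sketched in a few lines. This is a minimal illustration that assumes each skeleton is given as a list of 3D bone vectors; the function names and the default 90% threshold argument are hypothetical and follow only the example values in the text.

```python
import math
from typing import Sequence, Tuple

Vector3 = Tuple[float, float, float]


def evaluation_function(reference: Sequence[Vector3],
                        observed: Sequence[Vector3]) -> float:
    """Mean cosine of the angles between corresponding bone vectors."""
    if len(reference) != len(observed) or not reference:
        raise ValueError("need the same non-zero number of bone vectors")
    total = 0.0
    for t, v in zip(reference, observed):
        dot = sum(a * b for a, b in zip(t, v))
        norm = (math.sqrt(sum(a * a for a in t))
                * math.sqrt(sum(b * b for b in v)))
        if norm == 0.0:
            raise ValueError("bone vectors must be non-zero")
        total += dot / norm
    return total / len(reference)


def similarity_percent(f: float) -> float:
    """Linearly map f in [-1, +1] onto 0..100 percent."""
    return (f + 1.0) / 2.0 * 100.0


def matches(reference: Sequence[Vector3],
            observed: Sequence[Vector3],
            threshold_percent: float = 90.0) -> bool:
    """True when the similarity is at or above the threshold."""
    f = evaluation_function(reference, observed)
    return similarity_percent(f) >= threshold_percent
```

Under this linear mapping, a 90% threshold corresponds to f ≥ 0.8, that is, an average cosine of at least 0.8 across the bones.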

The classification unit 11d may determine that the viewing user's posture matches the start posture when the similarity, calculated as described above, between the posture feature amount representing the viewing user's posture and the posture feature amount representing the start posture is equal to or greater than a predetermined threshold.

Each time the posture data acquisition unit 11b or the sensor unit 15 acquires a posture feature amount of the viewing user, the classification unit 11d may determine whether the viewing user's posture indicated by that posture feature amount belongs to the start posture. In this case, the determination of whether the viewing user's posture matches the start posture is made at the same frequency as the sampling rate used to acquire the viewing user's posture feature amount.

Whether the viewing user's posture matches the trigger posture can be determined in the same manner. That is, the classification unit 11d may calculate the similarity between the posture feature amount representing the viewing user's posture at time t2 and the posture feature amount representing the trigger posture, and may determine that the viewing user's posture matches the trigger posture when this similarity is equal to or greater than a predetermined threshold. The above description of determining whether the viewing user's posture matches the start posture also applies, as far as possible, to the processing for determining whether the viewing user's posture matches the trigger posture.

The delay monitoring unit 11e monitors the transmission delay of the motion data transmitted by the transmission unit 11c from the video playback device 10 to the video distribution device 20. For example, the delay monitoring unit 11e can add a time stamp to a real packet containing motion data before transmission, and can obtain the transmission time of that packet over the transmission line between the video playback device 10 and the video distribution device 20 from the time stamp added at transmission and the time stamp added when the packet is received by the video distribution device 20. The delay monitoring unit 11e can determine that a delay has occurred in the transmission line when this transmission time is equal to or longer than a predetermined reference time, or when it is longer than the reference time. After determining that a delay has occurred in the transmission line, the delay monitoring unit 11e can determine that the delay has been resolved when the transmission time becomes shorter than the predetermined reference time, or when it becomes equal to or shorter than the reference time. When the video distribution device 20 is a server and the video playback device 10 is a client, data is transmitted from the video playback device 10 to the video distribution device 20 over the uplink. In this case, the delay monitoring unit 11e monitors whether a delay has occurred on the uplink of the transmission line between the video playback device 10 and the video distribution device 20. The delay monitoring unit 11e may measure the transmission time of real packets as described above, or may measure the transmission time using pseudo packets that do not contain motion data. While the transmission of motion data from the video playback device 10 of one viewing user is delayed, it may be determined that the transmission of motion data from the video playback device 10 of another viewing user is not delayed. The delay in the transmission line may therefore be determined for each viewing user. Whether a transmission delay has occurred may also be determined in the video distribution device 20 on the basis of the time stamps included in the packets. The video playback device 10 may receive the result of the transmission delay determination made in the video distribution device 20.
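The timestamp-based delay check can be sketched as follows. The class and method names are hypothetical, the "equal to or longer than" variant of the comparison is chosen arbitrarily from the two the text allows, and the sketch assumes the clocks of the two devices are synchronized closely enough for the difference of the two time stamps to be meaningful.

```python
class DelayMonitor:
    """Tracks whether the transmission line is currently judged delayed."""

    def __init__(self, reference_time: float) -> None:
        # Reference time in seconds; the value is an assumption.
        self.reference_time = reference_time
        self.delayed = False

    def observe(self, sent_at: float, received_at: float) -> bool:
        """Update the delay state from one packet's two time stamps."""
        transmission_time = received_at - sent_at
        # Delayed when the transmission time reaches the reference time,
        # resolved again as soon as it drops below it.
        self.delayed = transmission_time >= self.reference_time
        return self.delayed
```

For example, with `DelayMonitor(0.2)`, a packet stamped 11.0 at transmission and 11.5 at reception marks the line as delayed, and a later packet that takes 0.1 seconds marks the delay as resolved. One monitor instance per viewing user would match the per-user determination described above.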

The function of the transmission unit 11c is now described further. When the delay monitoring unit 11e determines that a delay has occurred in the transmission line, the transmission unit 11c can transmit the reference posture identification data (reference posture ID) stored in the storage 13. The reference posture identification data may be transmitted in place of the motion data containing the viewing user's posture feature amount. As described above, the reference posture identification data may include start posture data and trigger posture data. The transmission unit 11c can transmit the trigger posture data as the reference posture identification data. In one embodiment, when the classification unit 11d determines that the viewing user's posture belongs to the trigger posture, the transmission unit 11c transmits the reference posture ID identifying the trigger posture to the video distribution device 20 as the reference posture identification data while the transmission of the motion data is delayed. When the viewing user's posture is determined to belong to the trigger posture within a predetermined interval after the classification unit 11d has determined that the viewing user's posture belongs to the start posture, the transmission unit 11c transmits the reference posture ID identifying the trigger posture to the video distribution device 20 while the transmission of the motion data is delayed. Transmitting the reference posture ID only when the viewing user's posture is determined to belong to the trigger posture after having been determined to belong to the start posture prevents or suppresses unintended transmission of the reference posture ID by the viewing user, compared with transmitting the reference posture ID in response to a determination that the posture belongs to only one of the start posture and the trigger posture. "While the transmission of the motion data is delayed" may mean the period from when the delay monitoring unit 11e determines that a delay has occurred in the transmission line until it determines that the delay has been resolved. The transmission unit 11c need not transmit the reference posture ID identifying the start posture to the video distribution device 20.
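The two-step condition (a start posture followed by a trigger posture within a predetermined interval, during a delay) can be sketched as a small state holder. All names are hypothetical, and clearing the remembered start posture once a trigger posture has been evaluated is a design choice of this sketch, not something the specification prescribes.

```python
from typing import Optional


class TriggerGate:
    """Releases a trigger-posture ID only after a recent start posture."""

    def __init__(self, interval: float) -> None:
        self.interval = interval  # allowed gap in seconds (assumed unit)
        self.start_seen_at: Optional[float] = None

    def update(self, now: float, is_start: bool, is_trigger: bool,
               delayed: bool, trigger_id: str) -> Optional[str]:
        """Return the reference posture ID to send, or None."""
        if is_start:
            self.start_seen_at = now  # remember when the start posture occurred
            return None
        if is_trigger and self.start_seen_at is not None:
            within = now - self.start_seen_at <= self.interval
            self.start_seen_at = None  # each start posture is used once
            if within and delayed:
                return trigger_id
        return None
```

A trigger posture with no preceding start posture, one that arrives after the interval has elapsed, or one detected while no delay is occurring all return `None`, so only the trigger-posture ID is ever sent and the start-posture ID never is.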

Next, the functions realized by the computer processor 21 are described more specifically. By executing computer-readable instructions included in a distribution program, the computer processor 21 functions as a video generation unit 21a, a video distribution unit 21b, and an animation generation unit 21c. At least some of the functions realized by the computer processor 21 may be realized by a computer processor of the video distribution system 1 other than the computer processor 21. For example, at least some of the functions realized by the computer processor 21 may be realized by the computer processor 11 of the video playback device 10 or by another computer processor provided in the video distribution system 1.

The video distribution device 20 can distribute various types of video. In the following, it is assumed that the video distribution device 20 distributes a video including an animation of a character object generated on the basis of the movement of an actor.

The video generation unit 21a detects the movement of an actor by means of a motion sensor worn by the actor. The video generation unit 21a can generate an animation of a character that moves in synchronization with the actor's body movement detected by the motion sensor. The video generation unit 21a may acquire face motion data, which is a digital representation of the movement of the actor's face. In this case, the video generation unit 21a can generate an animation of a character whose facial expression changes in synchronization with the movement of the actor's face. The video generation unit 21a can construct a virtual space using the object data 23a and generate a video that includes this virtual space and the animation of the character corresponding to the actor. The video generation unit 21a can combine the actor's voice, acquired from a microphone, with the generated video. The generation of a video including an animation of a character that moves in synchronization with the movements of an actor's body and facial expression is disclosed in detail in the specification of Japanese Patent No. 644615.

Upon receiving, from a viewing user who is watching a video, a participation request for having an avatar participate in that video, the video generation unit 21a can generate the video so that it includes the viewing user's avatar. The participation request from the viewing user may include a user ID identifying the viewing user. The video generation unit 21a can identify the avatar ID of the viewing user's avatar on the basis of the user ID included in the participation request, and can generate an avatar object representing the viewing user's avatar on the basis of the parts information stored in the storage 23 in association with that avatar ID.

The video distribution unit 21b distributes the video generated by the video generation unit 21a. This video is distributed to the video playback device 10 via the network 50. The received video is played back on the video playback device 10.

FIG. 9a shows a display example of a video distributed from the video distribution device 20 to the video playback device 10a and played back on the video playback device 10. The video played back on the video playback device 10 can be displayed on the display 14. As illustrated, the video 60 displayed on the video playback device 10 includes a character object 51 representing the actor's character, a floor object 54a on which the character object 51 stands, a screen object 54b defining the rear end of the stage, and avatar objects 56a to 56j representing the avatars of the viewing users watching the video 60. The character object 51 can move in the virtual space in synchronization with the actor's body movement. Because the video 60 is distributed from the video distribution device 20 to many video playback devices 10, the video 60 includes many avatars. In FIG. 9a, ten viewing users are assumed to be participating in the video through their avatars. The number of viewing users who can participate in the video may be more than ten or fewer than ten.

The display 14 may display operation buttons for accepting user operations, overlaid on the video 60. In the example shown in FIG. 9a, a gift button 61 for gifting, an evaluation button 62 for providing an evaluation, and an avatar participation button 63 for requesting participation in the video 60 by an avatar are superimposed on the video 60. The gift button 61, the evaluation button 62, and the avatar participation button 63 are displayed so as to be selectable by the viewing user. Operation buttons other than these may also be displayed on the video 60. By selecting the gift button 61, a viewing user watching the video 60 can give a desired gift to the distributor of the video 60 or to the actor appearing in the video 60 through the character 51. By selecting the evaluation button 62, the viewing user can transmit to the video distribution device 20 evaluation information indicating that the video 60 has been rated positively. Evaluation information from various viewing users may be aggregated, and the aggregated result may be displayed together with the video 60. By selecting the avatar participation button 63, the viewing user can transmit to the video distribution device 20 a participation request asking that the user's own avatar participate in the video 60.

The animation generation unit 21c generates animations of the avatars of the viewers participating in the video 60. In one embodiment, the animation generation unit 21c generates an animation of a viewing user's avatar on the basis of the viewing user's posture feature amount received from the video playback device 10. Posture feature amounts received continuously over time (for example, three-dimensional skeleton data) represent the viewing user's body movement as a time series. By continuously receiving the viewing user's posture feature amounts from the video playback device 10, the animation generation unit 21c can therefore generate, on the basis of those posture feature amounts, an animation of an avatar that moves in synchronization with the viewing user's body movement. When the animation generation unit 21c has generated an avatar animation, the video generation unit 21a generates the video 60 so that it includes that avatar animation.

When a transmission delay occurs on the uplink transmission line from the video playback device 10 to the video distribution device 20, the video distribution device 20 may become unable to receive the viewing user's posture feature amounts continuously over time. If the avatar animation were created solely from the posture feature amounts transmitted from the video playback device 10, a transmission delay in the transmission line could make it impossible to generate an avatar animation that reflects the viewing user's movement. In contrast, the animation generation unit 21c in one embodiment does not depend solely on the viewing user's posture feature amounts and can generate the avatar animation on the basis of reference posture identification data (for example, a reference posture ID) from the video playback device 10. Specifically, upon receiving reference posture identification data from the video playback device 10, the animation generation unit 21c can identify the registered animation associated with the received reference posture identification data by referring to the animation management data 23c, and can generate the avatar animation on the basis of the animation definition data associated with the registered animation ID of the identified registered animation. As described above, the reference posture identification data is transmitted from the video playback device 10 to the video distribution device 20 when the delay monitoring unit 11e of the video playback device 10 determines that the uplink transmission of the motion data containing the posture feature amount is delayed. Therefore, even if the posture feature amount cannot be received while the transmission of the motion data is delayed, the animation generation unit 21c can generate the avatar animation on the basis of the reference posture identification data sent from the video playback device 10 during the delay.
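The two-stage lookup (reference posture ID to registered animation ID, then registered animation ID to animation definition data) can be sketched with plain dictionaries. The table contents, key names, and the shape of the definition record are invented for illustration; the actual layout of the animation management data 23c is not given in this passage.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the animation management data 23c:
# reference posture ID -> registered animation ID.
ANIMATION_MANAGEMENT: Dict[str, str] = {
    "pose_raise_hand": "anim_001",
}

# Hypothetical animation definition data keyed by registered animation ID.
ANIMATION_DEFINITIONS: Dict[str, dict] = {
    "anim_001": {"name": "raise right hand", "duration_s": 1.5},
}


def resolve_animation(reference_posture_id: str) -> Optional[dict]:
    """Return the definition data for a reference posture ID, if any."""
    registered_id = ANIMATION_MANAGEMENT.get(reference_posture_id)
    if registered_id is None:
        return None  # unknown reference posture ID
    return ANIMATION_DEFINITIONS.get(registered_id)
```

Because only a short identifier crosses the delayed uplink, the receiving side can still pick a complete, pre-registered motion for the avatar.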

The generation of the avatar animation is now described. An avatar animation generated on the basis of the viewing user's posture feature amounts (for example, three-dimensional skeleton data) expresses avatar movement that reflects the viewing user's movement. For example, suppose the viewing user raises the right hand diagonally upward from a position at the waist. In this case, an animation of the viewing user's avatar is generated on the basis of the viewing user's posture feature amounts. When this animation is included in the video, the avatar in the video raises its right hand (the part corresponding to the avatar's right hand) diagonally upward from a position at the waist, in the same way as the viewing user.

On the other hand, when a delay occurs in the transmission line, generation of the avatar animation based on the posture feature amount may be suspended. For example, when a delay occurs in the transmission line, the transmission unit 11c may suspend transmission of the posture feature amount, and the animation generation unit 21c may accordingly suspend generation of the avatar animation based on the posture feature amount. As described above, the video distribution device 20 can determine whether a delay is occurring based on the time stamp included in a packet received from the video playback device 10. While a delay is occurring in the transmission line, it is difficult for the video distribution device 20 to receive the posture feature amount, which has a large data size, in a timely manner. If avatar animation based on the posture feature amount is continued in such a case, the movement of the avatar in the video may become unnatural. Such unnatural movement is undesirable because it degrades the quality of the distributed video. Suspending generation of the avatar animation based on the posture feature amount while a delay is occurring prevents the avatar from moving unnaturally. Even while generation of the avatar animation based on the posture feature amount is suspended, the animation generation unit 21c can identify a registered animation ID of the avatar based on the reference posture identification data from the video playback device 10, and can generate an avatar animation based on the animation definition data associated with the identified registered animation ID. If the viewing user does not take a posture matching a reference posture while the delay is occurring, the video distribution device 20 does not receive reference posture identification data. In this case, the animation generation unit 21c need not generate an animation of the viewing user's avatar. When the animation generation unit 21c does not generate an animation of the avatar, the avatar remains stationary in the video. In another embodiment, when the video distribution device 20 does not receive reference posture identification data while a delay is occurring, an animation may be created in which the avatar moves according to a basic motion defined for the avatar. The basic motion of the avatar is a predetermined motion such as waving a hand up and down, waving a hand left and right, or jumping. The basic motion may be set in common for a plurality of viewing users. The basic motion differs from the animation generated by the animation generation unit 21c in that the avatar can perform it without information on the avatar's movement (three-dimensional skeleton data or reference posture identification data) being received from the video playback device 10.
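As an illustrative sketch only, the choice of animation source on the distribution side during a delay could be organized as follows. The function names, the 0.5 second threshold, and the returned labels are hypothetical and do not appear in the specification; the sketch also assumes that the clocks of the two devices are synchronized well enough for a time stamp comparison to be meaningful.

```python
from typing import Optional

# Hypothetical threshold; the specification does not give a value.
DELAY_THRESHOLD_SEC = 0.5


def is_delayed(sent_ts: float, received_ts: float,
               threshold: float = DELAY_THRESHOLD_SEC) -> bool:
    """Treat the link as delayed when a packet arrives later than the threshold."""
    return (received_ts - sent_ts) > threshold


def choose_animation_source(delayed: bool,
                            reference_posture_id: Optional[int],
                            use_basic_motion: bool = False) -> str:
    """Pick what drives the avatar for the current interval."""
    if not delayed:
        # No delay: animate directly from the streamed posture features.
        return "posture_features"
    if reference_posture_id is not None:
        # Delay, but a reference posture ID arrived: use the registered animation.
        return "registered_animation"
    # Delay and no ID: keep the avatar still, or fall back to a basic motion.
    return "basic_motion" if use_basic_motion else "stationary"
```

The last branch corresponds to the two alternatives described above: the avatar either remains stationary or performs a predefined basic motion.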

When the animation generation unit 21c generates an avatar animation, the video generation unit 21a generates a video including the animation, and the video distribution unit 21b distributes the video including the avatar animation to the video playback device 10. When the video 60 shown in FIG. 9a includes an animation of the avatar object 56a, the video 60 including that animation is displayed. For example, when an animation in which the avatar object 56a raises its right hand is generated, the avatar object 56a raises its right hand in the video 60, as shown in FIG. 9b.

As described above, the video distribution device 20 may request the video playback device 10 to determine an additional reference posture that serves as a trigger for including, in the video, the user animation whose registration is requested. Some ways of determining the additional reference posture have already been described; another way is described below with further reference to FIG. 8. In the following example, feature points are extracted from images of the user, and the additional reference posture is determined using the root mean square of the posture feature amounts at those feature points. First, the video playback device 10 instructs the user to perform the movement to be registered as a user animation, and captures the user performing that movement at a predetermined frame rate to obtain a captured image comprising a plurality of frames. The video playback device 10 obtains a posture feature amount for each of the extracted feature points. Extraction of feature points from each frame of the captured image has already been described with reference to FIG. 7 and is not repeated here.

FIG. 8 shows a table containing the posture feature amounts in each of 24 frames, from the 0th frame to the 23rd frame. In the table of FIG. 8, the posture feature amounts of feature points P1 to P6 are shown in the "P1" to "P6" columns, respectively. The posture feature amounts are normalized to the range of 0 to 1.0. The video playback device 10 calculates, for each frame, the root mean square (RMS) of the posture feature amounts of the feature points P1 to P6. Letting $x_i$ (where $i$ takes values from 1 to $N$) denote the posture feature amount at each of $N$ feature points $P_1$ to $P_N$, the RMS of the posture feature amounts of the feature points $P_1$ to $P_N$, written RMS(x), is expressed by the following formula.

$$\mathrm{RMS}(x) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2}}$$
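The per-frame RMS calculation can be sketched as below. This is a minimal illustration using the standard definition of the root mean square; the function name `frame_rms` is hypothetical, and the input is assumed to be the list of normalized posture feature amounts of one frame (for example, the six values in the "P1" to "P6" columns).

```python
import math
from typing import Sequence


def frame_rms(features: Sequence[float]) -> float:
    """Root mean square of the posture feature amounts of one frame."""
    if not features:
        raise ValueError("at least one feature amount is required")
    # Mean of the squares, then the square root.
    return math.sqrt(sum(x * x for x in features) / len(features))
```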

FIG. 8 lists the RMS(x) calculated for each frame. In addition to RMS(x), the video playback device 10 may calculate, for each frame, the average of the posture feature amounts over the feature points and an RMS difference, which is the difference between the RMS(x) of that frame and the RMS(x) of the previous frame. The video playback device 10 may also obtain, for each frame, a sign flag indicating whether the RMS difference is positive or negative and an inversion flag indicating whether the sign of the RMS difference has reversed. For example, the RMS difference in the second frame is the difference between RMS(x₂) in the second frame and RMS(x₁) in the first frame. The sign flag is, for example, "1" when the RMS difference is positive and "0" when it is negative. The inversion flag is, for example, "1" when the sign of the RMS difference has reversed and "0" when it has not (that is, when the sign is the same as in the previous frame). FIG. 8 includes these averages, RMS differences, sign flags, and inversion flags. In FIG. 8, the sign of the RMS difference is reversed relative to the immediately preceding frame in the 4th, 10th, 16th, 17th, 19th, and 21st frames, so the inversion flag is set to "1" for those frames. Because no frame precedes the 0th frame, the RMS difference is blank for the 0th frame.
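The derivation of the RMS difference, sign flag, and inversion flag from a sequence of per-frame RMS values can be sketched as follows. The function name `rms_flags` and the sample values are hypothetical and are not taken from FIG. 8. Two points are assumptions, since the specification does not address them: a difference of exactly zero is given sign flag 0, and the first frame that has a difference is given inversion flag 0 because no earlier sign exists to compare against.

```python
from typing import List, Optional, Sequence, Tuple


def rms_flags(rms: Sequence[float]) -> Tuple[List[Optional[float]],
                                             List[Optional[int]],
                                             List[int]]:
    """Return (RMS differences, sign flags, inversion flags), one entry per frame."""
    diffs: List[Optional[float]] = []
    signs: List[Optional[int]] = []
    inversions: List[int] = []
    for k, value in enumerate(rms):
        if k == 0:
            # No preceding frame: the difference and sign are left blank.
            diffs.append(None)
            signs.append(None)
            inversions.append(0)
            continue
        diff = value - rms[k - 1]
        sign = 1 if diff > 0 else 0  # assumption: zero counts as "not positive"
        prev_sign = signs[k - 1]
        inversions.append(1 if prev_sign is not None and sign != prev_sign else 0)
        diffs.append(diff)
        signs.append(sign)
    return diffs, signs, inversions


# Hypothetical RMS values: rising, then falling, then rising again.
_, sign_flags, inversion_flags = rms_flags([0.2, 0.3, 0.4, 0.35, 0.3, 0.45])
```

With these sample values the sign of the difference reverses at frames 3 and 5, which are the frames that receive inversion flag 1.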

In a frame whose inversion flag is "1", the sign of the RMS difference has reversed from the previous frame, so it is presumed that the user made a large movement in that frame. Large movements detectable by the inversion flag are often periodic, such as waving a hand or blinking. Accordingly, in one embodiment of the present invention, a frame whose inversion flag is set to "1", that is, a frame in which the sign of the RMS difference of the posture feature amounts has reversed from the previous frame, is taken as a start frame, which makes it easy to focus on the interval from the start frame until the sign of the RMS difference reverses again (until the inversion flag next becomes "1"). The posture of the user in the start frame can be used as a start posture. The next frame after the start frame whose inversion flag is set to "1" can be taken as a trigger frame, and the posture of the user in the trigger frame can be used as a trigger posture. In this way, the user's movement and posture can be analyzed by focusing on the interval from one frame whose inversion flag is "1" to the next such frame. In the example shown in FIG. 8, the inversion flag becomes "1" for the first time in the 4th frame and for the second time in the 10th frame. Therefore, with the posture of the user in the 4th frame as the start posture and the posture of the user in the 10th frame as the trigger posture, a periodic movement and its trigger posture can be extracted at low computational cost. The start posture data indicating the start posture and the trigger posture data indicating the trigger posture can be calculated or estimated from the root mean squared error (RMSE), the root mean squared percentage error (RMSPE), or the captured image.

In the example shown in FIG. 8, the inversion flag is set to "1" in both the 16th and 17th frames. Because the time interval between consecutive frames is extremely short, it may not be appropriate to use the user's postures in such adjacent frames as the start posture and the trigger posture. A lower limit frame count, which is the minimum number of frames that should exist between the start frame and the trigger frame, may therefore be defined. The lower limit frame count may be, for example, 3 frames or more, taking into account compression of communication packets, differences from neighboring frames, and other factors. A lower limit may also be defined for the time interval between the start frame and the trigger frame. This lower limit may be, for example, 1 second or more, 2 seconds or more, 3 seconds or more, 4 seconds or more, 5 seconds or more, or some other value, taking into account the operating frequency, the user movement of interest, and other factors. If the lower limit frame count is 3, then in FIG. 8 there are 5 frames between the 4th frame and the 10th frame, which exceeds the lower limit frame count, so the 10th frame can be treated as the trigger frame. If the inversion flag were instead set to "1" in the 7th frame, only 2 frames, which does not exceed the lower limit frame count, would exist between the 4th frame and the 7th frame, so the 7th frame would not be used as the trigger frame, and the 10th frame, where the inversion flag next becomes "1", could be used as the trigger frame instead. However, because the RMSPE from the 10th frame to the 16th frame, which is the next frame of interest, is approximately 1 or less, the 10th frame may be judged to be noise and need not be processed as a trigger frame.
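The selection of a start frame and a trigger frame subject to the lower limit frame count can be sketched as follows. The function name `pick_start_and_trigger` is hypothetical. The sketch assumes that a candidate is accepted when the number of frames strictly between the start frame and the candidate is at least the lower limit; the specification's examples (5 frames accepted, 2 frames rejected, with a limit of 3) do not settle the boundary case of exactly 3 frames. The RMSPE-based noise check mentioned above is not modelled.

```python
from typing import Optional, Sequence, Tuple


def pick_start_and_trigger(inverted_frames: Sequence[int],
                           min_between: int = 3) -> Optional[Tuple[int, int]]:
    """Pick (start frame, trigger frame) from frames whose inversion flag is 1.

    inverted_frames must be sorted in ascending order.
    """
    if not inverted_frames:
        return None
    start = inverted_frames[0]
    for candidate in inverted_frames[1:]:
        frames_between = candidate - start - 1
        if frames_between >= min_between:  # assumption: the limit is inclusive
            return start, candidate
    return None
```

For the inversion frames listed for FIG. 8 (4, 10, 16, 17, 19, 21), this returns the 4th frame as the start frame and the 10th frame as the trigger frame. An adjacent pair such as 16 and 17 alone yields no trigger frame.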

As described above, by determining the additional reference postures (the start posture and the trigger posture) using the root mean square of the posture feature amounts at the user's feature points, a user animation can be registered, and the start posture and trigger posture corresponding to that user animation can be determined, without presenting candidate frames or sample videos.

Next, part of the processing included in a video playback method according to one aspect will be described with reference to FIG. 10. FIG. 10 is a flow chart showing part of the processing flow in the video playback method of one embodiment. In the video playback method of FIG. 10, it is assumed that the viewing user is playing a video on the video playback device 10 and has his or her own avatar participating in that video. That is, the video being viewed by the viewing user includes the user's own avatar. The processing shown in FIG. 10 relates to the transmission of posture feature amounts and reference posture identification data to the video distribution device 20 during viewing of the video.

First, in step S11, data on the posture feature amount of the viewing user watching the video is acquired. Also in step S11, the acquired posture feature amount is immediately transmitted to the video distribution device 20. In step S11, motion data including the posture feature amount and other data may be transmitted to the video distribution device 20. The posture feature amount of the viewing user may be three-dimensional skeleton data representing the posture of the viewing user. The posture feature amount may be acquired at predetermined sampling intervals. It is acquired, for example, by the posture data acquisition unit 11b or the sensor unit 15 described above, and is transmitted, for example, by the transmission unit 11c described above.

Next, in step S12, it is determined, based on the posture feature amount of the viewing user acquired in step S11, whether the posture of the viewing user belongs to a predetermined reference posture. This determination is made, for example, by the classification unit 11d described above. If the posture of the viewing user does not belong to any reference posture, the process returns to step S11 and acquisition of the posture feature amount continues. If it is determined that the posture of the viewing user belongs to one of the reference postures, the process proceeds to step S13.

In step S13, it is determined whether a delay is occurring in the transmission of motion data on the transmission line between the video playback device 10 and the video distribution device 20. This determination is made, for example, by the delay monitoring unit 11e described above. If it is determined that no delay is occurring in the transmission line, the process returns to step S11 and acquisition of the posture feature amount continues. If it is determined that a delay is occurring in the transmission line, the process proceeds to step S14.

In step S14, reference posture identification data identifying the reference posture to which the posture of the viewing user was determined to belong in step S12 is transmitted to the video distribution device 20. The reference posture identification data is transmitted, for example, by the transmission unit 11c described above. If it is determined in step S12 that the posture of the viewing user belongs to a start posture and subsequently that it belongs to a trigger posture, trigger posture data identifying that trigger posture is transmitted.

In parallel with steps S11 to S14 above, the video continues to be distributed from the video distribution device 20 to the video playback device 10. Steps S11 to S14 are repeated during distribution of the video. When it is determined in step S13 that no delay is occurring in the transmission line, the distributed video includes an avatar animation generated based on the posture feature amount of the viewing user. Conversely, when it is determined in step S13 that a delay is occurring in the transmission line, the distributed video includes a registered animation of the avatar identified based on the reference posture identification data transmitted to the video distribution device 20.

Step S13 may be executed before step S11 or between step S11 and step S12. The order of the processing shown in FIG. 10 may otherwise be changed as appropriate. In addition to steps S11 to S14, processing not explicitly shown in FIG. 10 may be performed.
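One pass through steps S11 to S14 on the playback side can be sketched as follows, in the order shown in FIG. 10. The function `playback_step`, the message labels, and the callables `classify` and `link_delayed` are hypothetical stand-ins for the classification unit 11d and the delay monitoring unit 11e; actual network transmission is replaced by a returned list of messages so that the sketch stays self-contained.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Message = Tuple[str, object]


def playback_step(features: Sequence[float],
                  classify: Callable[[Sequence[float]], Optional[int]],
                  link_delayed: Callable[[], bool]) -> List[Message]:
    """Return the messages to send for one sampling interval."""
    # S11: the acquired posture feature amount is sent immediately.
    messages: List[Message] = [("posture_features", list(features))]

    # S12: does the posture belong to one of the reference postures?
    reference_id = classify(features)
    if reference_id is None:
        return messages  # back to S11

    # S13: is the transmission line currently delayed?
    if not link_delayed():
        return messages  # back to S11

    # S14: additionally send the small reference posture ID.
    messages.append(("reference_posture_id", reference_id))
    return messages
```

Because the reference posture ID is only a few bits, it is far more likely than the full feature data to arrive in time on a congested link, which is the point of step S14.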

Next, another embodiment of the present invention will be described with reference to FIGS. 11 to 13. FIG. 11 is a block diagram of a video distribution system 101 according to another embodiment of the present invention. The video distribution system 101 includes a video playback device 110 and a video distribution device 120. The video playback device 110 of the video distribution system 101 differs from the video playback device 10 of the video distribution system 1 in that the computer processor 11 functions as a facial feature amount acquisition unit 11f and the storage 13 stores reference facial expression management data 13b. The video distribution device 120 of the video distribution system 101 differs from the video distribution device 20 of the video distribution system 1 in that the storage 23 stores registered facial expression management data 23d. The following describes the points in which the embodiment of FIG. 11 differs from the embodiment of FIG. 1; matters common to both embodiments are not described again.

The facial feature amount acquisition unit 11f acquires a facial feature amount representing the facial features of the viewing user who views the video using the video playback device 110. Specifically, the facial feature amount acquisition unit 11f calculates a feature amount representing facial features from an image including the viewing user's face acquired by the sensor unit 15, according to, for example, HOG, SIFT, SURF, or another known algorithm. For example, a facial feature amount calculated by SIFT is a 128-dimensional feature amount. Thus, like the posture feature amount acquired by the posture data acquisition unit 11b, the facial feature amount has a large data size.

FIG. 12 shows an example of the reference facial expression management data. As shown in FIG. 12, the storage 13 stores reference facial expression data indicating each reference facial expression in association with reference facial expression identification data identifying that reference facial expression. The reference facial expression identification data is, for example, a reference facial expression ID that identifies the reference facial expression. The reference facial expression ID is, for example, an identification code represented by a few bits. The number of bits of the reference facial expression ID can be determined according to the number of reference facial expressions used in the video distribution system 1. The reference facial expression ID is represented, for example, by data of 10 bits or less, and may be represented by data of 5 bits or less, 4 bits or less, 3 bits or less, or 2 bits or less. The reference facial expression identification data (reference facial expression ID) therefore has a far smaller data size than the facial feature amount.
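The size difference between an ID and a feature vector can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and storing each of the 128 feature dimensions as a 32-bit value is an assumption, since the specification does not state the storage width.

```python
def id_bits(num_reference_expressions: int) -> int:
    """Smallest number of bits that can distinguish the given number of IDs."""
    if num_reference_expressions < 1:
        raise ValueError("at least one reference expression is required")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2; use 1 bit when n == 1.
    return max(1, (num_reference_expressions - 1).bit_length())


def descriptor_bits(dimensions: int = 128, bits_per_value: int = 32) -> int:
    """Size of one feature vector; 32 bits per value is an assumption."""
    return dimensions * bits_per_value
```

Under these assumptions, 1024 reference facial expressions need a 10-bit ID, whereas a single 128-dimensional feature vector occupies 4096 bits.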

The reference facial expression data is data indicating a reference facial expression. One or more reference facial expressions may be set in the video distribution system 1. A reference facial expression is, for example, an expression facing forward with the eyes wide open. The reference facial expression data describes the reference facial expression and has the same data structure as the facial feature amount calculated by the facial feature amount acquisition unit 11f. When the facial feature amount calculated by the facial feature amount acquisition unit 11f is a SIFT feature amount calculated by the SIFT algorithm, the reference facial expression data describes the reference facial expression in the same data format as the SIFT feature amount.

The classification unit 11d uses a classifier that classifies the facial expression of the viewing user to determine, based on the viewing user's facial feature amount, whether the viewing user's facial expression belongs to a predetermined reference facial expression. In one embodiment, to obtain training data, the classification unit 11d can display an image representing a reference facial expression on the display 14 and prompt the viewing user to make a facial expression following that image. The video playback device 110 acquires, via the facial feature amount acquisition unit 11f, the facial feature amount of the expression the viewing user makes in response to the displayed image. The classification unit 11d can create the classifier by learning, as training data, the facial feature amounts of the expressions the viewing user makes for the images representing reference facial expressions.

The transmission unit 11c transmits motion data including the facial feature amount representing the facial features of the viewing user, acquired by the facial feature amount acquisition unit 11f, to the video distribution device 120. When the delay monitoring unit 11e determines that a delay is occurring in the transmission of motion data on the transmission line, the transmission unit 11c can transmit the reference facial expression identification data (reference facial expression ID) stored in the storage 13 to the video distribution device 120. When the delay monitoring unit 11e determines that a delay is occurring in the transmission line, the reference facial expression identification data may be transmitted to the video distribution device 120 in place of the facial feature amount.

In the video distribution device 120, the animation generation unit 21c generates animations of the avatars of viewers participating in the video 60. In one embodiment, the animation generation unit 21c generates an animation of a viewing user's avatar based on the facial feature amount of that viewing user received from the video playback device 110. By continuously receiving the viewing user's facial feature amount from the video playback device 110, the animation generation unit 21c can generate an avatar animation whose facial expression changes in synchronization with the movement of the viewing user's face (changes in facial expression).

In one embodiment, the animation generation unit 21c can refer to the registered facial expression management data 23d to generate an avatar animation including facial expression movement based on the reference facial expression identification data (for example, a reference facial expression ID) from the video playback device 110. FIG. 13 shows an example of the registered facial expression management data 23d in the embodiment of FIG. 11. As shown in FIG. 13, the registered facial expression management data 23d includes reference facial expression identification data (reference facial expression ID), registered facial expression identification data (registered facial expression ID) identifying a registered facial expression, and animation definition data for specifying an animation including movement of the avatar's facial expression. The animation definition data defines the movement of the avatar's facial expression. The animation definition data included in the registered facial expression management data 23d may be data describing the positions of the feature points of the avatar's face in time series. On receiving reference facial expression identification data from the video playback device 110, the animation generation unit 21c can refer to the registered facial expression management data 23d to identify the registered facial expression identification data associated with the received reference facial expression identification data, and can generate an avatar animation including facial expression movement based on the animation definition data associated with the identified registered facial expression identification data.
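The lookup from a received reference facial expression ID to animation definition data can be sketched as a simple table. The class name, field names, ID values, and keyframe coordinates below are all hypothetical; the only point taken from the description is the three-part association (reference facial expression ID, registered facial expression ID, animation definition data) and the idea that the definition data may be a time series of facial feature point positions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# One keyframe: positions (x, y) of the avatar's facial feature points.
Keyframe = List[Tuple[float, float]]


@dataclass(frozen=True)
class RegisteredExpression:
    registered_expression_id: int
    keyframes: List[Keyframe]  # animation definition data as a time series


# Hypothetical table keyed by reference facial expression ID.
REGISTERED_EXPRESSIONS: Dict[int, RegisteredExpression] = {
    1: RegisteredExpression(101, [[(0.0, 0.0)], [(0.0, 0.1)]]),
    2: RegisteredExpression(102, [[(0.5, 0.5)], [(0.4, 0.5)], [(0.5, 0.5)]]),
}


def resolve_animation(reference_expression_id: int) -> Optional[List[Keyframe]]:
    """Return the animation definition data for a received ID, or None if unknown."""
    entry = REGISTERED_EXPRESSIONS.get(reference_expression_id)
    return None if entry is None else entry.keyframes
```

Returning `None` for an unknown ID leaves it to the caller to decide what the avatar does in that case.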

Next, another embodiment of the present invention will be described with reference to FIG. 14. FIG. 14 is a block diagram of a video distribution system 201 according to another embodiment of the present invention. Broadly speaking, the two systems differ in that the video distribution system 1 shown in FIG. 1 transmits the posture feature amount representing the viewing user's posture in real time, whereas the video distribution system 201 shown in FIG. 14 transmits the viewing user's posture feature amount only when the viewing user makes a characteristic movement. The video distribution system 201 includes a video playback device 210 and a video distribution device 220. In the video playback device 210 of the video distribution system 201, the computer processor 11 functions as a determination unit 11g and a transmission unit 11h. The video playback device 210 need not have the reference facial expression management data 13b, and the video distribution device 220 need not have the animation management data 23c. Matters common to the embodiment of FIG. 1 are not described again for the embodiment of FIG. 14.

The determination unit 11g determines whether the user of the video playback device 210 has made a characteristic movement. Specifically, the posture data acquisition unit 11b calculates the posture feature amounts of the feature points of the user of the video playback device 210 at a predetermined frame rate, and the determination unit 11g determines, based on the RMS difference of the posture feature amounts of these frames, whether the user has made a characteristic movement. The calculation of the posture feature amount for each frame and of the RMS difference has already been described with reference to FIG. 8. In one embodiment, the determination unit 11g determines that the user has made a characteristic movement in a frame in which the sign of the RMS difference is reversed. As described above, when the sign of the RMS difference in a given frame is reversed from that of the previous frame, it is presumed that the user made, in that frame, a characteristic movement involving a reciprocating motion, such as waving a hand or nodding deeply. For example, when the posture data acquisition unit 11b has calculated the posture feature amounts for the 24 frames shown in FIG. 8, it is determined that the user made a characteristic movement in the 4th, 10th, 16th, 17th, 19th, and 21st frames, for which the inversion flag is set to "1".
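As a non-limiting illustration, the sign-reversal determination described above could be sketched as follows. The exact definition of the RMS difference is the one given with FIG. 8, which is not reproduced here; this sketch assumes that the RMS of each frame's posture feature amounts is taken and that the RMS difference is the signed change from the previous frame. The function names `rms` and `inversion_flags` are hypothetical.

```python
import math


def rms(values):
    """Root mean square of one frame's posture feature amounts."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def inversion_flags(frames):
    """Return one flag per frame; 1 where the sign of the RMS difference
    is reversed relative to the previous frame (assumed definition)."""
    levels = [rms(frame) for frame in frames]
    # Signed change of the RMS level from the previous frame; frame 0 has none.
    diffs = [0.0] + [b - a for a, b in zip(levels, levels[1:])]
    flags = [0] * len(frames)
    for i in range(2, len(frames)):
        if diffs[i] * diffs[i - 1] < 0:  # opposite signs: reciprocating motion
            flags[i] = 1
    return flags


# Made-up single-feature frames: rising, then falling, then rising again.
frames = [[0.1], [0.2], [0.3], [0.2], [0.1], [0.2]]
flags = inversion_flags(frames)
```

Here the flag is raised at the frame where the motion turns back, which corresponds to the frames that the determination unit 11g would treat as containing a characteristic movement.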

In one embodiment, the determination unit 11g determines that the user has made a large movement in a frame in which the RMSPE is greater than a predetermined threshold. This threshold can be, for example, 1.0, and the threshold used by the determination unit 11g can be changed as appropriate. In the example of FIG. 8, the RMSPE is greater than the threshold of 1.0 in the 4th to 7th frames. Therefore, the determination unit 11g can determine that the user made a large movement in the 4th to 8th frames.

The transmission unit 11h selects the frames in which the determination unit 11g has determined that the user made a characteristic movement, and transmits the posture feature amounts in the selected frames to the video distribution device 220. A frame selected by the transmission unit 11h may be referred to as a "selected frame". The selected frames may consist only of the frames in which the determination unit 11g determined that the user made a characteristic movement, or may consist of those frames together with one or more frames that follow them. In this way, the transmission unit 11h is configured to transmit the posture feature amounts in the selected frames to the video distribution device 220 and not to transmit the posture feature amounts in frames other than the selected frames.
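The frame selection described above can be illustrated with a minimal sketch. It assumes that the determination result is available as a list of per-frame flags and that the number of following frames to include is a tunable parameter; `select_frames` and `trailing` are hypothetical names, not terms used in this specification.

```python
def select_frames(flags, trailing=0):
    """Return the indices of the selected frames.

    flags    -- per-frame result of the determination (1 = characteristic movement)
    trailing -- number of frames following each flagged frame to include as well
    """
    selected = set()
    for i, flag in enumerate(flags):
        if flag:
            last = min(i + trailing, len(flags) - 1)
            selected.update(range(i, last + 1))
    return sorted(selected)


flags = [0, 0, 0, 1, 0, 1, 0, 0]
only_flagged = select_frames(flags)            # flagged frames only
with_followers = select_frames(flags, 1)       # each flagged frame plus the next one
```

Only the posture feature amounts of the returned frames would be transmitted; all other frames are skipped, which is what reduces the transmitted data volume.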

In the video distribution device 220 that has received the posture feature amounts transmitted by the transmission unit 11h, the animation generation unit 21c generates an animation of the avatar of the user of the video playback device 210 based on the received posture feature amounts. The method of generating the avatar animation from the posture feature amounts is the same as in the video distribution system 1. In the video distribution system 201, there are periods in which no posture feature amount is transmitted from the video playback device 210 to the video distribution device 220. Specifically, the posture feature amounts in frames other than the selected frames are not transmitted to the video distribution device 220. During such a period, the animation generation unit 21c may create the animation so that the avatar moves according to a basic motion defined for the avatar. As already described, the basic motion of the avatar is a predetermined motion such as swaying its hands.

In the video distribution system 201, the posture feature amounts are transmitted from the video playback device 210 to the video distribution device 220 only for the selected frames, so the amount of transmitted data can be reduced compared with a mode in which the calculated posture feature amounts are transmitted in real time.

Next, another embodiment of the present invention will be described with reference to FIG. 15. FIG. 15 shows a block diagram of a video distribution system 301 according to another embodiment of the present invention. The video distribution system 301 differs from the video distribution system 1 in that the video is generated in the video playback device based on data necessary for video generation, such as posture feature amounts. Specifically, the video distribution system 301 includes a posture data acquisition device 310, a video distribution device 320, and a video playback device 330, which are connected via the network 50. In the video distribution system 301, it is assumed that the video playback device 330 generates a video including an avatar animation generated based on the movement of the user of the posture data acquisition device 310, and plays back the generated video. That is, the user of the posture data acquisition device 310 can have a video including an avatar that moves based on his or her own movement played back on the video playback device 330 used by a viewing user. The user of the posture data acquisition device 310 can interact with the user of the video playback device 330 via the avatar. In this embodiment, the user of the posture data acquisition device 310 may be referred to simply as the "distribution user". For the sake of explanation, FIG. 15 shows one each of the posture data acquisition device 310, the video distribution device 320, and the video playback device 330, but the video distribution system 301 may include a plurality of each of these devices.

The posture data acquisition device 310 acquires a posture feature amount representing the characteristics of the posture of its user, that is, the distribution user. The posture feature amount may be three-dimensional skeleton data representing the posture of the distribution user, that is, three-dimensional vector data indicating the positions and orientations of the distribution user's bones. The posture data acquisition device 310 may include the sensor unit 15 described above, and can generate the three-dimensional skeleton data of the distribution user based on the detection data detected by the sensor unit 15. The posture data acquisition device 310 transmits the acquired posture feature amount to the video distribution device 320. It may also transmit audio data representing the voice of the distribution user to the video distribution device 320.

The video distribution device 320 includes a computer processor 21, a communication I/F 22, and a storage 23, like the video distribution device 20 described above. The video distribution device 320 differs from the video distribution device 20 in the functions realized by the computer processor 21 and in the data stored in the storage 23. The functions realized by the computer processor 21 of the video distribution device 320 and the data stored in its storage 23 are therefore described below.

As illustrated, the storage 23 of the video distribution device 320 in the video distribution system 301 can store object data 23a, avatar data 23b, reference posture management data 323a, and various other information necessary for generating and distributing videos. The object data 23a and the avatar data 23b may be the same as the object data 23a and the avatar data 23b stored in the video distribution system 1. The reference posture management data 323a may be the same as the reference posture management data 13a in the video distribution system 1. That is, as shown in FIG. 3, the reference posture management data 323a can include reference posture identification data that identifies a reference posture, and start posture data and trigger posture data stored in association with that reference posture identification data.

By executing computer-readable instructions, the computer processor 21 functions as a posture feature amount acquisition unit 321a, a transmission unit 321b, a classification unit 321c, and a delay monitoring unit 321d. At least some of the functions realized by the computer processor 21 may be realized by a computer processor of the video distribution system 301 other than the computer processor 21.

The posture feature amount acquisition unit 321a acquires, from the posture data acquisition device 310, the posture feature amount representing the posture of the user of the posture data acquisition device 310.

The transmission unit 321b transmits the posture feature amount representing the posture of the user of the posture data acquisition device 310, acquired from that device, to the video playback device 330. The transmission unit 321b may transmit the posture feature amount to the video playback device 330 immediately upon receiving it from the posture data acquisition device 310. In other words, the transmission unit 321b can transmit the posture feature amount of the distribution user to the video playback device 330 in real time. The transmission unit 321b may transmit packets containing the posture feature amount to the video playback device 330. In addition to the posture feature amount, the transmission unit 321b can transmit data necessary for generating the video in the video playback device 330. For example, it can transmit at least part of the object data 23a and the avatar data 23b to the video playback device 330. Alternatively, the object data 23a and the avatar data 23b may be stored in advance in the video playback device 330 rather than transmitted by the transmission unit 321b, in which case they need not be transmitted from the video distribution device 320 to the video playback device 330. The transmission unit 321b may also transmit, to the video playback device 330, audio data representing the voice of the distribution user that the video distribution device 320 has acquired from the posture data acquisition device 310.

The classification unit 321c uses a classifier that classifies the posture of the distribution user to determine, based on the posture feature amount of the distribution user (for example, three-dimensional skeleton data), whether the posture of the distribution user belongs to a predetermined reference posture. This classifier is, for example, a linear classifier. The classification unit 321c may realize substantially the same function as the classification unit 11d of the video distribution system 1. For example, when the reference posture includes a start posture and a trigger posture, the classification unit 321c can determine whether the posture of the distribution user belongs to the start posture and whether it belongs to the trigger posture.
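By way of example only, a linear classifier of the kind mentioned above could take the following form. The specification does not state how the classifier is parameterized; this sketch assumes one weight vector and one bias per posture, applied to a flattened posture feature vector, with a positive score meaning that the posture belongs to the class. The weights shown are invented for illustration and would in practice be obtained by training.

```python
def belongs_to(features, weights, bias):
    """Linear decision rule: True when w . x + b is positive."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score > 0.0


def classify(features, classifiers):
    """Evaluate every posture classifier on one posture feature vector.

    classifiers -- dict mapping a posture name to a (weights, bias) pair
    """
    return {name: belongs_to(features, w, b) for name, (w, b) in classifiers.items()}


# Hypothetical two-element feature vector and hand-picked parameters.
classifiers = {
    "start": ([1.0, 0.0], -0.5),
    "trigger": ([0.0, 1.0], -0.5),
}
result = classify([0.8, 0.1], classifiers)
```

Evaluating the start posture and the trigger posture separately matches the description that the two determinations are made individually.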

The delay monitoring unit 321d monitors the transmission delay of the motion data transmitted from the video distribution device 320 to the video playback device 330 by the transmission unit 321b. The delay monitoring unit 321d may realize substantially the same function as the delay monitoring unit 11e of the video distribution system 1. For example, the delay monitoring unit 321d adds a time stamp to a real packet containing motion data before transmission, and can obtain the transmission time of that real packet over the transmission path between the video distribution device 320 and the video playback device 330 by using the time stamp added at transmission and the time stamp added when the real packet is received by the video playback device 330. Whether a transmission delay has occurred may be determined in the video playback device 330 based on the time stamps included in the packet. The video distribution device 320 may receive the result of the transmission delay determination made in the video playback device 330.
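The time-stamp comparison described above may be sketched as follows. The sketch assumes that the clocks of the two devices are synchronized and that a delay is judged to have occurred when the transmission time exceeds a fixed threshold; neither the `Packet` structure nor the 0.2-second threshold appears in this specification, and both are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    sent_at: float                       # time stamp added before transmission (seconds)
    received_at: Optional[float] = None  # time stamp added on reception (seconds)


def transmission_time(packet):
    """Transmission time of one packet, from its two time stamps."""
    if packet.received_at is None:
        raise ValueError("packet has not been received yet")
    return packet.received_at - packet.sent_at


def is_delayed(packet, threshold_s=0.2):
    """Judge a transmission delay; the threshold value is an assumption."""
    return transmission_time(packet) > threshold_s


on_time = Packet(b"motion", sent_at=10.0, received_at=10.05)
late = Packet(b"motion", sent_at=10.0, received_at=10.5)
```

Because the judgment needs only the two time stamps carried with the packet, it can be made on the receiving side and its result reported back to the sender.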

The function of the transmission unit 321b is now described further. When the delay monitoring unit 321d determines that a delay has occurred in the transmission path, the transmission unit 321b can transmit reference posture identification data (a reference posture ID) stored in the storage 23 to the video playback device 330. The reference posture identification data may be transmitted in place of the motion data containing the posture feature amount of the distribution user. As described above, the reference posture identification data can include start posture data and trigger posture data. This function, which the transmission unit 321b executes when a transmission delay has occurred, is the same as the function executed by the transmission unit 11c when a transmission delay has occurred.
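The switching behavior of the transmission unit described above can be illustrated with a short sketch. The message layout (a dictionary with a `type` field) and the function name `choose_payload` are assumptions made for this illustration; the specification does not define a wire format.

```python
def choose_payload(features, delayed, reference_posture_id=None):
    """Pick what to send for the current frame.

    features             -- posture feature amounts for the frame
    delayed              -- True when a transmission delay has been determined
    reference_posture_id -- ID of the reference posture the user's posture was
                            classified into, or None if it matched none
    """
    if delayed and reference_posture_id is not None:
        # Small message sent in place of the motion data.
        return {"type": "reference_posture", "id": reference_posture_id}
    # Normal case: full posture feature amounts.
    return {"type": "motion", "features": list(features)}


normal = choose_payload([0.1, 0.2, 0.3], delayed=False, reference_posture_id="RP01")
fallback = choose_payload([0.1, 0.2, 0.3], delayed=True, reference_posture_id="RP01")
```

In this sketch the identifier is sent only when a delay has been determined and a reference posture was actually matched; how the unmatched case is handled during a delay is left open here.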

The video playback device 330 includes a computer processor 11, a communication I/F 12, and a storage 13, like the video playback device 10 described above. The video playback device 330 differs from the video playback device 10 in the functions realized by the computer processor 11 and in the data stored in the storage 13. The functions realized by the computer processor 11 of the video playback device 330 and the data stored in its storage 13 are therefore described below.

Animation management data 313a is stored in the storage 13 of the video playback device 330. The animation management data 313a may be the same as the animation management data 23c stored in the video distribution system 1. For example, as shown in FIG. 6, the animation management data 313a has reference posture identification data, registered animation identification data (a registered animation ID) that identifies a registered animation, and animation definition data for specifying an avatar animation.

By executing computer-readable instructions, the computer processor 11 functions as an animation generation unit 331a, a video generation unit 331b, and a video playback unit 331c. At least some of the functions realized by the computer processor 11 may be realized by a computer processor of the video distribution system 301 other than the computer processor 11.

The animation generation unit 331a may realize substantially the same function as the animation generation unit 21c of the video distribution system 1. For example, the animation generation unit 331a can generate an animation of the avatar of the distribution user of the posture data acquisition device 310. In one embodiment, the animation generation unit 331a generates the animation of the distribution user's avatar based on the posture feature amount of the distribution user received from the video distribution device 320 and on the avatar data 23b. By continuously receiving the posture feature amount of the distribution user from the video distribution device 320, the animation generation unit 331a can generate, based on that posture feature amount, an animation of the avatar that moves in synchronization with the body movements of the distribution user.

The video generation unit 331b can construct a virtual space using the object data 23a acquired from the video distribution device 320, and can generate a video in which the avatar generated by the animation generation unit 331a moves in this virtual space. The voice of the distribution user can be synthesized into this video.

The video playback unit 331c plays back the video generated by the video generation unit 331b. As a result, a video including the animation of the distribution user's avatar is displayed on the display 14 of the video playback device 330.

The processing realized by the animation generation unit 331a when a transmission delay occurs in the transmission path that carries data from the video distribution device 320 to the video playback device 330 will now be described. If a transmission delay occurs in this transmission path, it may become impossible to generate an avatar animation that moves smoothly in a way that reflects the movement of the distribution user. The animation generation unit 331a can generate the avatar animation based on reference posture identification data (for example, a reference posture ID) from the video distribution device 320. Specifically, upon receiving reference posture identification data from the video distribution device 320, the animation generation unit 331a can refer to the animation management data 313a to identify the registered animation associated with the received reference posture identification data, and can generate the avatar animation based on the animation definition data associated with the registered animation ID of the identified registered animation. Thus, even if the animation generation unit 331a cannot receive the posture feature amount while the transmission of motion data is delayed, it can generate the avatar animation based on the reference posture identification data sent from the video distribution device 320 during the delay.
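The lookup through the animation management data described above amounts to a two-step table reference, which can be sketched as follows. The identifiers (`RP01`, `AN01`) and the definition file names are invented for illustration; the actual contents of the animation management data are those shown in FIG. 6.

```python
# Hypothetical animation management data:
# reference posture ID -> registered animation ID -> animation definition data.
POSTURE_TO_ANIMATION = {"RP01": "AN01", "RP02": "AN02"}
ANIMATION_DEFINITIONS = {"AN01": "wave_hand.anim", "AN02": "nod.anim"}


def resolve_animation(reference_posture_id):
    """Return (registered animation ID, definition data) for a received
    reference posture ID, or None when no animation is registered for it."""
    animation_id = POSTURE_TO_ANIMATION.get(reference_posture_id)
    if animation_id is None:
        return None
    return animation_id, ANIMATION_DEFINITIONS[animation_id]


resolved = resolve_animation("RP01")
unknown = resolve_animation("RP99")
```

Since only the short identifier crosses the network, the receiving side can still produce a complete animation from data it already holds.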

In one embodiment, the posture data acquisition device 310 may be configured to be able to realize the functions of the video playback device 330, and the video playback device 330 may be configured to be able to realize the functions of the posture data acquisition device 310. This allows the users of the two devices to communicate with each other bidirectionally via their avatars.

Next, a modification of the above video distribution system will be described with reference to FIG. 16. In the above embodiment, when the classification unit 11d determines that the posture of the viewing user belongs to a reference posture, a reference posture ID identifying that reference posture is transmitted in place of the posture feature amount, which reduces the amount of data representing the user's posture. In the example shown in FIG. 16, an index (subscript) of a feature array, determined as described below, is used in place of the reference posture ID. That is, when a transmission delay occurs, the index of the feature array is transmitted in place of the posture feature amount representing the user's posture. As explained below, the index of the feature array is data of only a few bits and, like the reference posture ID, can be represented with 10 bits or less of information.

In the video distribution system 1, the feature array is determined as follows. First, as described with reference to FIGS. 7 and 8, the video playback device 10 captures images of its user at a predetermined frame rate and extracts, in each frame of the captured images, a plurality of feature points related to the user. For example, six feature points P1 to P6 are extracted, as shown in FIG. 7. The video playback device 10 obtains, for each of the extracted feature points, an image vector relating to the user's movement. The image vector at each of the feature points P1 to P6 is, for example, the depth at that feature point. The image vectors of the feature points P1 to P6 may be normalized to the range 0 to 1.0 and expressed as float arrays. FIG. 16 shows the image vectors of the feature points P1 to P6 acquired in this way. For simplicity of explanation, FIG. 16 shows image vectors for 10 frames, but in practice image vectors are acquired for a number of frames sufficient for statistical analysis.

Next, for each of the feature points P1 to P6, the sum Σ of the elements of the image vector over a predetermined section is calculated. In the table of FIG. 16, the row labeled "Σ" shows the sum of the elements of the image vector over a 26-frame section from the 0th frame to the 25th frame. The feature points P1 to P6 are then sorted in descending order of these sums. In FIG. 16, the result of this sort is shown in the row labeled "RANK". In the example of FIG. 16, the feature points are sorted in the order P4, P3, P6, P2, P1, P5. This sorted array is written as {4,3,6,2,1,5}.
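The summation and ranking steps described above can be sketched as follows. The two frames of values are made up for illustration and are not the values of FIG. 16; they are chosen only so that the resulting order matches the array {4,3,6,2,1,5} used in the text. The function name `rank_feature_points` is hypothetical.

```python
def rank_feature_points(frames):
    """Sum each feature point's values over the section and return the
    1-based feature point numbers sorted by descending sum, plus the sums."""
    n_points = len(frames[0])
    sums = [sum(frame[p] for frame in frames) for p in range(n_points)]
    order = sorted(range(n_points), key=lambda p: sums[p], reverse=True)
    return [p + 1 for p in order], sums


# Each row is one frame; columns are the normalized values for P1..P6.
frames = [
    [0.1, 0.3, 0.8, 0.9, 0.0, 0.5],
    [0.2, 0.3, 0.7, 0.9, 0.1, 0.4],
]
input_array, sums = rank_feature_points(frames)
```

The ranking discards the magnitudes and keeps only the order of the feature points, which is what allows a posture to be expressed as a short permutation.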

The video playback device 10 calculates representative vectors indicating characteristic postures or movements of the user by learning from samples of image vectors relating to the user's movement. The representative vectors are calculated using, for example, Lloyd's algorithm or another known algorithm. Each representative vector thus indicates a characteristic posture of the user, so the user's posture can be identified based on the representative vectors. The video playback device 10 stores each calculated representative vector together with an index. The video playback device 10 may instead acquire representative vectors calculated by another device. In this embodiment, four representative vectors A to D have been calculated, and the elements of each representative vector are sorted in descending order to obtain an array for that representative vector. The arrays obtained by sorting the elements of the representative vectors A to D in descending order are assumed to be {1,3,6,2,4,5}, {4,3,6,2,1,5}, {2,3,6,4,5,1}, and {1,2,3,4,5,6}, respectively.
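For reference, a minimal form of Lloyd's algorithm, which the text names as one way to obtain the representative vectors, is sketched below. It assumes squared Euclidean distance and, to keep the result deterministic, initializes the representative vectors from the first k samples; the specification does not state either choice. The two-dimensional samples are invented for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(samples, k, iterations=10):
    """Alternate nearest-centroid assignment and centroid update."""
    centroids = [list(s) for s in samples[:k]]  # assumed initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            clusters[nearest].append(s)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


samples = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]]
representatives = lloyd(samples, k=2)
```

Each resulting centroid plays the role of a representative vector; storing it with its position in the list gives the index that is later transmitted.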

The video playback device 10 compares the input array {4,3,6,2,1,5}, obtained by sorting the image vectors obtained from the captured images, with the array of each representative vector, and selects the representative vector array closest to the input array. In the example shown in FIG. 16, the input array {4,3,6,2,1,5} matches the array {4,3,6,2,1,5} of the representative vector B among the four representative vector arrays, so {4,3,6,2,1,5} is selected as the feature array. The video playback device 10 transmits the index of the feature array selected in this way to the video distribution device 20.
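The selection of the closest representative array can be illustrated as follows, using the four arrays assumed in the text for the representative vectors A to D. The specification does not say how closeness between arrays is measured; this sketch assumes the number of positions at which two arrays differ, and the indices 0 to 3 assigned to A to D are likewise an assumption.

```python
# Arrays of the representative vectors A to D, indexed 0 to 3 (assumed indexing).
CODEBOOK = [
    [1, 3, 6, 2, 4, 5],  # A
    [4, 3, 6, 2, 1, 5],  # B
    [2, 3, 6, 4, 5, 1],  # C
    [1, 2, 3, 4, 5, 6],  # D
]


def mismatches(a, b):
    """Number of positions at which two arrays differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def nearest_index(input_array, codebook):
    """Index of the codebook array closest to the input array."""
    return min(range(len(codebook)), key=lambda i: mismatches(input_array, codebook[i]))


index = nearest_index([4, 3, 6, 2, 1, 5], CODEBOOK)
# Number of bits needed to distinguish the codebook entries.
bits_needed = (len(CODEBOOK) - 1).bit_length()
```

With four representative vectors the index fits in two bits, which illustrates why the index is far smaller than the posture feature amounts it replaces.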

The video distribution device 20 stores a codebook in which the representative vectors A to D are associated with their arrays. Referring to this codebook, the video distribution device 20 generates the animation of the user's avatar so that the avatar takes the posture corresponding to the representative vector B, which corresponds to the feature array acquired from the video playback device 10.

In the above example, the index of the feature array is transmitted from the video playback device 10 to the video distribution device 20 for each unit section of 26 frames. The length of this unit section can be adjusted as appropriate.

以上のとおり、動画配信システム１において基準姿勢ＩＤに代えて特徴配列のインデックスを伝送し、伝送遅延時にはこの特徴配列のインデックスに基づいてアバタのアニメーションを生成することを説明した。動画配信システム３０１においても同様に、基準姿勢ＩＤに代えて特徴配列を用いることができる。特徴配列は、動画配信システム１０１において基準表情ＩＤに代えて用いることもできる。 As described above, in the video distribution system 1, the index of a feature array is transmitted instead of the reference posture ID, and, when transmission is delayed, the avatar animation is generated based on that index. Similarly, in the video distribution system 301, a feature array can be used instead of the reference posture ID. A feature array can also be used in place of the reference facial expression ID in the video distribution system 101.

以上の実施形態によって奏される作用効果について説明する。上記の一態様において、伝送路において遅延が発生している間は、データ量が大きな姿勢特徴量ではなく、小さなデータ量で視聴ユーザの姿勢に関する情報を特定することができる基準姿勢識別データが動画再生装置１０から動画配信装置２０に対して送信される。これにより、伝送路において遅延が発生している場合に、小さなデータ量の基準姿勢識別データにより、視聴ユーザの姿勢に関する情報を動画配信装置２０に伝達することができる。 The effects achieved by the above embodiments will now be described. In one aspect described above, while a delay occurs in the transmission path, the video playback device 10 transmits to the video distribution device 20 not the posture feature amount, which has a large data amount, but reference posture identification data, which can specify information about the viewing user's posture with a small data amount. As a result, even when a delay occurs in the transmission path, information about the viewing user's posture can be conveyed to the video distribution device 20 by the small reference posture identification data.

一態様においては、視聴ユーザのアバタに関して登録アニメーションを登録することができ、伝送路において遅延が発生している間には、基準姿勢識別データに基づいて特定される基準姿勢に対応する登録アニメーションを含む動画が配信される。これにより、動画再生装置１０から動画配信装置２０への伝送路における遅延のために視聴ユーザの姿勢を表す姿勢特徴量の動画配信装置２０への送信が困難な場合であっても、視聴ユーザの姿勢に関連する登録アニメーションを動画に含めることができる。 In one aspect, registered animations can be registered for the viewing user's avatar, and, while a delay occurs in the transmission path, a video including the registered animation corresponding to the reference posture specified based on the reference posture identification data is distributed. As a result, even when a delay in the transmission path from the video playback device 10 to the video distribution device 20 makes it difficult to transmit the posture feature amount representing the viewing user's posture to the video distribution device 20, a registered animation related to the viewing user's posture can be included in the video.
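The switch between the posture feature amount and the reference posture identification data can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the JSON message layout, the field names `ref_posture_id` and `posture_features`, the registry `REGISTERED_ANIMATIONS`, and the choice to fall back to the full feature data when no reference posture was matched during a delay (the text does not address that case).

```python
import json

# Hypothetical registry: reference posture ID -> registered animation name.
REGISTERED_ANIMATIONS = {1: "wave", 2: "clap"}


def build_payload(posture_features, reference_posture_id, delayed):
    """Playback side: choose what to send for the current frame."""
    if delayed and reference_posture_id is not None:
        # During a delay, send only the small identifier.
        message = {"ref_posture_id": reference_posture_id}
    else:
        # Otherwise send the full posture feature data.
        message = {"posture_features": posture_features}
    return json.dumps(message).encode("utf-8")


def choose_animation(payload):
    """Distribution side: decide how to animate the avatar for a payload."""
    message = json.loads(payload.decode("utf-8"))
    if "ref_posture_id" in message:
        return ("registered", REGISTERED_ANIMATIONS[message["ref_posture_id"]])
    return ("generated", message["posture_features"])


features = [[0.1, 0.2, 0.3]] * 20  # e.g. 20 feature points with 3 values each
full = build_payload(features, 1, delayed=False)
small = build_payload(features, 1, delayed=True)
print(len(full), len(small), choose_animation(small))
```

The identifier payload is a small fraction of the size of the feature payload, which is the property the effect described above relies on.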

上記の一態様においては、動画再生装置３３０において、動画配信装置３２０から受信した配信ユーザの姿勢特徴量に基づいて当該配信ユーザの動きに基づいて動くアバタのアニメーションが生成される。伝送路において遅延が発生している間は、姿勢特徴量ではなく基準姿勢識別データが動画配信装置３２０から動画再生装置３１０に対して送信される。これにより、伝送路において遅延が発生している場合に、配信ユーザの動きを小さなデータ量で表すことができる基準姿勢識別データにより、動画再生装置３３０においてアバタのアニメーションを生成することができる。 In one aspect described above, the video playback device 330 generates an animation of an avatar that moves in accordance with the movement of the distribution user, based on the distribution user's posture feature amount received from the video distribution device 320. While a delay occurs in the transmission path, reference posture identification data, rather than the posture feature amount, is transmitted from the video distribution device 320 to the video playback device 310. As a result, even when a delay occurs in the transmission path, the video playback device 330 can generate the avatar animation from the reference posture identification data, which can represent the distribution user's movement with a small data amount.

一態様においては、ユーザの動きに基づいて生成されるユーザアニメーションを登録アニメーションとして登録することができる。これにより、姿勢特徴量が利用できない場合におけるアバタの動きのバリエーションを増やすことができる。 In one aspect, a user animation generated based on the user's movement can be registered as a registration animation. As a result, it is possible to increase the variation of the movement of the avatar when the posture feature amount cannot be used.

一態様において、伝送路において遅延が発生している間は、データ量が大きな顔特徴量ではなく、小さなデータ量で視聴ユーザの姿勢に関する情報を特定することができる基準表情識別データが動画再生装置１０から動画配信装置２０に対して送信される。これにより、伝送路において遅延が発生している場合に、小さなデータ量の基準表情識別データにより、視聴ユーザの表情に関する情報を動画配信装置２０に伝達することができる。 In one aspect, while a delay occurs in the transmission path, the video playback device 10 transmits to the video distribution device 20 not the facial feature amount, which has a large data amount, but reference facial expression identification data, which can specify information about the viewing user's posture with a small data amount. As a result, even when a delay occurs in the transmission path, information about the viewing user's facial expression can be conveyed to the video distribution device 20 by the small reference facial expression identification data.

本明細書において説明された処理手順、特にフロー図を用いて説明された処理手順においては、その処理手順を構成する工程（ステップ）の一部を省略すること、その処理手順を構成する工程として明示されていない工程を追加すること、及び／又は当該工程の順序を入れ替えることが可能であり、このような省略、追加、順序の変更がなされた処理手順も本発明の趣旨を逸脱しない限り本発明の範囲に含まれる。 In the processing procedures described in this specification, particularly those described with reference to flow diagrams, some of the steps constituting a procedure may be omitted, steps not explicitly shown as constituting the procedure may be added, and/or the order of the steps may be changed. Processing procedures with such omissions, additions, or changes in order are also included in the scope of the present invention as long as they do not depart from the gist of the present invention.

以下に、本願の原出願の出願当初の特許請求の範囲に記載された発明を付記する。
［１］
一又は複数のコンピュータプロセッサを備え、
前記一又は複数のコンピュータプロセッサは、コンピュータ読み取り可能な命令を実行することにより、
動画配信装置から受信した動画を再生し、
ユーザの姿勢の特徴を表す姿勢特徴量を含むモーションデータを前記動画配信装置に送信し、
前記動画配信装置から前記姿勢特徴量に基づいて生成された前記ユーザのアバタのアニメーションを含む動画を受信し、
前記ユーザの姿勢が予め定められた基準姿勢に属するか否かを判定し、
前記モーションデータの前記動画配信装置への伝送に遅延が発生している間に、前記ユーザの姿勢が属すると判定された前記基準姿勢を識別し前記姿勢特徴量よりも少ないデータ量の基準姿勢識別データを前記動画配信装置に送信する、
動画再生装置。
［２］
前記分類器は、前記ユーザの姿勢が前記基準姿勢と合致するか否かを、前記姿勢特徴量を変数として評価関数に基づいて判定する、
［１］に記載の動画再生装置。
［３］
前記ユーザの姿勢が前記基準姿勢に属するか否かの判定は、前記姿勢特徴量に基づいて前記ユーザの姿勢を分類する分類器により行われ、
前記基準姿勢を表す画像を前記ユーザに提示し、提示された前記画像に対して前記ユーザが取った姿勢を表す姿勢特徴量を教師データとして学習することにより前記分類器を作成する、
［１］に記載の動画再生装置。
［４］
前記ユーザのアバタについて一又は複数の登録アニメーションが登録されており、
前記遅延が発生している間に前記基準姿勢識別データに基づいて特定された前記基準姿勢に対応する前記登録アニメーションを含む動画を受信する、
［１］から［３］のいずれか１項に記載の動画再生装置。
［５］
時系列に取得された前記姿勢特徴量に基づいて前記アバタのユーザアニメーションを生成し、
前記ユーザアニメーションを前記登録アニメーションとして登録し、
前記ユーザアニメーションを含むサンプル動画を生成し、
前記サンプル動画を構成する複数のフレームの中から選択された基準フレームの画像に基づいて前記ユーザアニメーションに対応する前記基準姿勢を決定する、
［４］に記載の動画再生装置。
［６］
時系列に取得された前記姿勢特徴量に基づいて前記アバタのユーザアニメーションを生成し、
前記ユーザアニメーションを前記登録アニメーションとして登録し、
所定のフレームレートに基づいて前記ユーザに関する複数の特徴点の各々において前記ユーザに関する前記姿勢特徴量を算出し、
第１フレームにおいて前記複数の特徴点の各々における前記姿勢特徴量の二乗平均平方根である第１ＲＭＳを算出し、
前記第１フレームよりも時系列的に後の第２フレームにおいて前記複数の特徴点の各々における前記姿勢特徴量の二乗平均平方根である第２ＲＭＳを算出し、
前記第２フレームよりも時系列的に後の第３フレームにおいて前記複数の特徴点の各々における前記姿勢特徴量の二乗平均平方根である第３ＲＭＳを算出し、
前記第２ＲＭＳと前記第１ＲＭＳとの差である第１ＲＭＳ差の正負と前記第３ＲＭＳと前記第２ＲＭＳとの差である第２ＲＭＳ差の正負とが逆転した場合に、前記第３フレームにおける前記姿勢特徴量に基づいて前記ユーザアニメーションに対応する前記基準姿勢を決定する、
［４］に記載の動画再生装置。
［７］
時系列に取得された前記姿勢特徴量に基づいて前記アバタのユーザアニメーションを生成し、
前記ユーザアニメーションを前記登録アニメーションとして登録し、
前記ユーザアニメーションに対する前記ユーザの動きに基づいて時系列に取得された前記姿勢特徴量を含む第１評価データと、前記ユーザアニメーションに対する前記ユーザの他の動きに基づいて時系列に取得された前記姿勢特徴量を含む第２評価データと、を比較することで前記ユーザアニメーションに対応する前記基準姿勢を決定する、
［４］に記載の動画再生装置。
［８］
前記姿勢特徴量は、前記ユーザのボーンの位置及び向きを３次元ベクトルで表すボーンデータを含む、
［１］から［７］のいずれか１項に記載の動画再生装置。
［９］
前記モーションデータは、前記ユーザの顔の特徴を表す顔特徴量を含み、
前記一又は複数のコンピュータプロセッサは、
前記顔特徴量に基づいて前記ユーザの表情を分類する他の分類器により前記ユーザの表情が予め定められた基準表情に属するか否かを判定し、
前記遅延が発生している間に、前記ユーザの表情が属すると判定された前記基準表情を識別し前記顔特徴量よりも少ないデータ量の基準表情識別データを前記動画配信装置に送信する、
［１］から［８］のいずれか１項に記載の動画再生装置。
［１０］
前記基準姿勢識別データは、前記遅延が発生している間に前記モーションデータに代えて送信される、
［１］から［９］のいずれか１項に記載の動画再生装置。
［１１］
前記モーションデータはリアルタイムで送信される、
［１］から［１０］のいずれか１項に記載の動画再生装置。
［１２］
一又は複数のコンピュータプロセッサを備え、ユーザのアバタを含む動画を前記ユーザの動画再生装置に配信する動画配信システムであって、
前記一又は複数のコンピュータプロセッサは、コンピュータ読み取り可能な命令を実行することにより、
前記動画再生装置から伝送路を介して前記ユーザの姿勢を表す姿勢特徴量を含むモーションデータを受信し、
前記姿勢特徴量に基づいて生成された前記ユーザのアバタのアニメーションを前記動画に含めて配信し、
前記姿勢特徴量に基づいて前記ユーザの姿勢を分類する分類器により前記ユーザの姿勢が基準姿勢に属すると判定された場合、前記モーションデータの前記動画配信装置への伝送に遅延が発生している間に、前記基準姿勢を識別し前記姿勢特徴量よりも少ないデータ量の基準姿勢識別データを前記動画再生装置から受信し、
前記基準姿勢識別データに基づいて生成された前記ユーザのアバタの登録アニメーションを前記動画に含めて配信する、
動画配信システム。
［１３］
一又は複数のコンピュータプロセッサがコンピュータ読み取り可能な命令を実行することにより実行される動画再生方法であって、
動画配信装置から受信した動画を再生する工程と、
ユーザの姿勢を表す姿勢特徴量を含むモーションデータを前記動画配信装置に送信する工程と、
前記動画配信装置から前記姿勢特徴量に基づいて生成された前記ユーザのアバタのアニメーションを含む動画を受信する工程と、
前記姿勢特徴量に基づいて前記ユーザの姿勢を分類する分類器により前記ユーザの姿勢が予め定められた基準姿勢に属するか否かを判定する工程と、
前記モーションデータの前記動画配信装置への伝送に遅延が発生しているか否かを判定する工程と、
前記遅延が発生している間に、前記ユーザの姿勢が属すると判定された前記基準姿勢を識別し前記姿勢特徴量よりも少ないデータ量の基準姿勢識別データを前記動画配信装置に送信する工程と、
を備える動画再生方法。
［１４］
一又は複数のコンピュータプロセッサを備える動画再生装置であって、
前記一又は複数のコンピュータプロセッサは、コンピュータ読み取り可能な命令を実行することにより、
所定のフレームレートに基づいて前記動画再生装置のユーザに関する複数の特徴点の各々において前記ユーザに関する姿勢特徴量を算出し、
第１フレームにおいて前記複数の特徴点の各々における前記姿勢特徴量の二乗平均平方根である第１ＲＭＳを算出し、
前記第１フレームよりも時系列的に後の第２フレームにおいて前記複数の特徴点の各々における前記姿勢特徴量の二乗平均平方根である第２ＲＭＳを算出し、
前記第２ＲＭＳと前記第１ＲＭＳとの差であるＲＭＳ差を算出し、
前記ＲＭＳ差が所定の閾値よりも大きい場合に動画配信装置に対して前記第２フレームにおける前記姿勢特徴量を送信し、
前記動画配信装置から前記姿勢特徴量に基づいて生成された前記ユーザのアバタのアニメーションを含む動画を受信し、
前記動画を再生する、
動画再生装置。
［１５］
一又は複数のコンピュータプロセッサを備える動画再生装置であって、
前記一又は複数のコンピュータプロセッサは、コンピュータ読み取り可能な命令を実行することにより、
所定のフレームレートに基づいて前記動画再生装置のユーザに関する複数の特徴点の各々において前記ユーザに関する姿勢特徴量を算出し、
第１フレームにおいて前記複数の特徴点の各々における前記姿勢特徴量の二乗平均平方根である第１ＲＭＳを算出し、
前記第１フレームよりも時系列的に後の第２フレームにおいて前記複数の特徴点の各々における前記姿勢特徴量の二乗平均平方根である第２ＲＭＳを算出し、
前記第２フレームよりも時系列的に後の第３フレームにおいて前記複数の特徴点の各々における前記姿勢特徴量の二乗平均平方根である第３ＲＭＳを算出し、
前記第２ＲＭＳと前記第１ＲＭＳとの差である第１ＲＭＳ差の正負と前記第３ＲＭＳと前記第２ＲＭＳとの差である第２ＲＭＳ差の正負とが逆転した場合に、前記第３フレームにおける前記姿勢特徴量を送信し、
前記動画配信装置から前記姿勢特徴量に基づいて生成された前記ユーザのアバタのアニメーションを含む動画を受信し、
前記動画を再生する、
動画再生装置。
［１６］
一又は複数のコンピュータプロセッサを備え、
前記一又は複数のコンピュータプロセッサは、コンピュータ読み取り可能な命令を実行することにより、
ユーザの姿勢の特徴を表す姿勢特徴量を含むモーションデータを動画再生装置に送信し、
前記ユーザの姿勢が予め定められた基準姿勢に属するか否かを判定し、
前記モーションデータの前記動画再生装置への伝送に遅延が発生している間に、前記ユーザの姿勢が属すると判定された前記基準姿勢を識別し前記姿勢特徴量よりも少ないデータ量の基準姿勢識別データを前記動画再生装置に送信する、
動画配信装置。
［１７］
一又は複数のコンピュータプロセッサと、登録アニメーションを記憶するストレージと、を備え、
前記一又は複数のコンピュータプロセッサは、コンピュータ読み取り可能な命令を実行することにより、
動画配信装置から受信したユーザの姿勢の特徴を表す姿勢特徴量を含むモーションデータを受信し、
前記モーションデータに基づいて生成されたユーザのアバタのアニメーションを含む動画を生成し、
前記動画配信システムからの前記モーションデータの伝送に遅延が発生している間に、前記ユーザの姿勢が属すると判定された基準姿勢を識別し前記姿勢特徴量よりも少ないデータ量の基準姿勢識別データに基づいて前記基準姿勢に対応する前記登録アニメーションを含む動画を生成する、
動画再生装置。
The inventions recited in the claims as originally filed in the parent application of the present application are appended below.
[1]
A video playback device comprising one or more computer processors,
wherein the one or more computer processors execute computer-readable instructions to:
play back a video received from a video distribution device;
transmit, to the video distribution device, motion data including a posture feature amount representing a feature of a user's posture;
receive, from the video distribution device, a video including an animation of the user's avatar generated based on the posture feature amount;
determine whether the user's posture belongs to a predetermined reference posture; and
while a delay occurs in transmission of the motion data to the video distribution device, transmit, to the video distribution device, reference posture identification data that identifies the reference posture to which the user's posture is determined to belong and that has a smaller data amount than the posture feature amount.
[2]
The video playback device according to [1],
wherein the classifier determines whether the user's posture matches the reference posture based on an evaluation function that takes the posture feature amount as a variable.
[3]
The video playback device according to [1],
wherein whether the user's posture belongs to the reference posture is determined by a classifier that classifies the user's posture based on the posture feature amount, and
the classifier is created by presenting an image representing the reference posture to the user and learning, as teacher data, the posture feature amount representing the posture taken by the user in response to the presented image.
[4]
The video playback device according to any one of [1] to [3],
wherein one or more registered animations are registered for the user's avatar, and
while the delay occurs, the one or more computer processors receive a video including the registered animation corresponding to the reference posture specified based on the reference posture identification data.
[5]
The video playback device according to [4],
wherein the one or more computer processors:
generate a user animation of the avatar based on the posture feature amounts acquired in time series;
register the user animation as the registered animation;
generate a sample video including the user animation; and
determine the reference posture corresponding to the user animation based on an image of a reference frame selected from among a plurality of frames constituting the sample video.
[6]
The video playback device according to [4],
wherein the one or more computer processors:
generate a user animation of the avatar based on the posture feature amounts acquired in time series;
register the user animation as the registered animation;
calculate, based on a predetermined frame rate, the posture feature amount of the user at each of a plurality of feature points of the user;
calculate a first RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a first frame;
calculate a second RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a second frame that follows the first frame in time series;
calculate a third RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a third frame that follows the second frame in time series; and
when the sign of a first RMS difference, which is the difference between the second RMS and the first RMS, and the sign of a second RMS difference, which is the difference between the third RMS and the second RMS, are reversed, determine the reference posture corresponding to the user animation based on the posture feature amount in the third frame.
[7]
The video playback device according to [4],
wherein the one or more computer processors:
generate a user animation of the avatar based on the posture feature amounts acquired in time series;
register the user animation as the registered animation; and
determine the reference posture corresponding to the user animation by comparing first evaluation data, which includes the posture feature amounts acquired in time series based on a movement of the user with respect to the user animation, with second evaluation data, which includes the posture feature amounts acquired in time series based on another movement of the user with respect to the user animation.
[8]
The video playback device according to any one of [1] to [7],
wherein the posture feature amount includes bone data representing the position and orientation of the user's bones as three-dimensional vectors.
[9]
The video playback device according to any one of [1] to [8],
wherein the motion data includes a facial feature amount representing a feature of the user's face, and
the one or more computer processors:
determine, using another classifier that classifies the user's facial expression based on the facial feature amount, whether the user's facial expression belongs to a predetermined reference facial expression; and
while the delay occurs, transmit, to the video distribution device, reference facial expression identification data that identifies the reference facial expression to which the user's facial expression is determined to belong and that has a smaller data amount than the facial feature amount.
[10]
The video playback device according to any one of [1] to [9],
wherein the reference posture identification data is transmitted in place of the motion data while the delay occurs.
[11]
The video playback device according to any one of [1] to [10],
wherein the motion data is transmitted in real time.
[12]
A video distribution system comprising one or more computer processors and distributing a video including a user's avatar to a video playback device of the user,
wherein the one or more computer processors execute computer-readable instructions to:
receive, from the video playback device via a transmission path, motion data including a posture feature amount representing the user's posture;
distribute the video including an animation of the user's avatar generated based on the posture feature amount;
when a classifier that classifies the user's posture based on the posture feature amount determines that the user's posture belongs to a reference posture, receive, from the video playback device while a delay occurs in transmission of the motion data to the video distribution device, reference posture identification data that identifies the reference posture and has a smaller data amount than the posture feature amount; and
distribute the video including a registered animation of the user's avatar generated based on the reference posture identification data.
[13]
A video playback method performed by one or more computer processors executing computer-readable instructions, the method comprising:
a step of playing back a video received from a video distribution device;
a step of transmitting, to the video distribution device, motion data including a posture feature amount representing a user's posture;
a step of receiving, from the video distribution device, a video including an animation of the user's avatar generated based on the posture feature amount;
a step of determining, using a classifier that classifies the user's posture based on the posture feature amount, whether the user's posture belongs to a predetermined reference posture;
a step of determining whether a delay occurs in transmission of the motion data to the video distribution device; and
a step of transmitting, to the video distribution device while the delay occurs, reference posture identification data that identifies the reference posture to which the user's posture is determined to belong and that has a smaller data amount than the posture feature amount.
[14]
A video playback device comprising one or more computer processors,
wherein the one or more computer processors execute computer-readable instructions to:
calculate, based on a predetermined frame rate, a posture feature amount of a user of the video playback device at each of a plurality of feature points of the user;
calculate a first RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a first frame;
calculate a second RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a second frame that follows the first frame in time series;
calculate an RMS difference, which is the difference between the second RMS and the first RMS;
when the RMS difference is larger than a predetermined threshold, transmit the posture feature amount in the second frame to a video distribution device;
receive, from the video distribution device, a video including an animation of the user's avatar generated based on the posture feature amount; and
play back the video.
[15]
A video playback device comprising one or more computer processors,
wherein the one or more computer processors execute computer-readable instructions to:
calculate, based on a predetermined frame rate, a posture feature amount of a user of the video playback device at each of a plurality of feature points of the user;
calculate a first RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a first frame;
calculate a second RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a second frame that follows the first frame in time series;
calculate a third RMS, which is the root mean square of the posture feature amounts at the plurality of feature points in a third frame that follows the second frame in time series;
when the sign of a first RMS difference, which is the difference between the second RMS and the first RMS, and the sign of a second RMS difference, which is the difference between the third RMS and the second RMS, are reversed, transmit the posture feature amount in the third frame;
receive, from the video distribution device, a video including an animation of the user's avatar generated based on the posture feature amount; and
play back the video.
[16]
A video distribution device comprising one or more computer processors,
wherein the one or more computer processors execute computer-readable instructions to:
transmit, to a video playback device, motion data including a posture feature amount representing a feature of a user's posture;
determine whether the user's posture belongs to a predetermined reference posture; and
while a delay occurs in transmission of the motion data to the video playback device, transmit, to the video playback device, reference posture identification data that identifies the reference posture to which the user's posture is determined to belong and that has a smaller data amount than the posture feature amount.
[17]
A video playback device comprising one or more computer processors and a storage that stores registered animations,
wherein the one or more computer processors execute computer-readable instructions to:
receive, from a video distribution device, motion data including a posture feature amount representing a feature of a user's posture;
generate a video including an animation of the user's avatar generated based on the motion data; and
while a delay occurs in transmission of the motion data from the video distribution system, generate a video including the registered animation corresponding to a reference posture, based on reference posture identification data that identifies the reference posture to which the user's posture is determined to belong and that has a smaller data amount than the posture feature amount.
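
The RMS-based conditions recited in items [6], [14], and [15] above can be illustrated with a short Python sketch. It is only a sketch under assumptions: each frame is represented as a list with one scalar posture feature amount per feature point, the function names are hypothetical, and a difference of exactly zero is treated as "no reversal", a case the items do not address.

```python
import math


def rms(values):
    """Root mean square of the posture feature amounts in one frame."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def exceeds_threshold(frame1, frame2, threshold):
    """Item [14]: True if (second RMS - first RMS) is larger than the threshold."""
    return rms(frame2) - rms(frame1) > threshold


def sign_reversed(frame1, frame2, frame3):
    """Items [6] and [15]: True if the two RMS differences have opposite signs."""
    first_difference = rms(frame2) - rms(frame1)
    second_difference = rms(frame3) - rms(frame2)
    return first_difference * second_difference < 0


# RMS rises from frame 1 to frame 2, then falls again in frame 3.
print(sign_reversed([1, 1], [2, 2], [1, 1]))
print(exceeds_threshold([1, 1], [3, 3], threshold=1.5))
```

A sign reversal marks a local peak or trough of the RMS, so only frames around such turning points would be used or transmitted in this sketch.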

１、１０１、２０１、３０１動画配信システム
１０、１１０、２１０、３３０動画再生装置
１１ａ動画再生部
１１ｂ姿勢データ取得部
１１ｃ送信部
１１ｄ分類部
１１ｅ遅延監視部
１１ｆ顔特徴量取得部
２０、１２０、２２０、３２０動画配信装置
２１ａ動画生成部
２１ｂ動画配信部
２１ｃアニメーション生成部
３１０姿勢データ取得装置
1, 101, 201, 301 Video distribution system
10, 110, 210, 330 Video playback device
11a Video playback unit
11b Posture data acquisition unit
11c Transmission unit
11d Classification unit
11e Delay monitoring unit
11f Facial feature amount acquisition unit
20, 120, 220, 320 Video distribution device
21a Video generation unit
21b Video distribution unit
21c Animation generation unit
310 Posture data acquisition device

Claims

A video playback device comprising one or more computer processors,
wherein the one or more computer processors execute computer-readable instructions to:
play back a video received from a video distribution device;
transmit, to the video distribution device, motion data including a facial feature amount representing a feature of a user's face;
receive, from the video distribution device, a video including an animation of the user's avatar generated based on the facial feature amount;
determine whether the user's facial expression matches a predetermined reference facial expression; and
while a delay occurs in transmission of the motion data to the video distribution device, transmit, to the video distribution device, reference facial expression identification data for identifying the reference facial expression determined to match the user's facial expression,
wherein the data amount of the reference facial expression identification data is smaller than the data amount of the facial feature amount.

The video playback device according to claim 1,
wherein whether the user's facial expression matches the reference facial expression is determined based on an evaluation function that takes the facial feature amount as a variable.

The video playback device according to claim 1,
wherein whether the user's facial expression matches the reference facial expression is determined by a classifier that classifies the user's facial expression based on the facial feature amount, and
the classifier is created by presenting an image representing the reference facial expression to the user and learning, as teacher data, the facial feature amount representing the facial expression made by the user in response to the presented image.

The video playback device according to any one of claims 1 to 3,
wherein one or more registered animations are registered for the user's avatar, and
while the delay occurs, the one or more computer processors receive a video including the registered animation corresponding to the reference facial expression specified based on the reference facial expression identification data.

The video playback device according to claim 4,
wherein the one or more computer processors:
generate a user animation of the avatar based on the facial feature amounts acquired in time series;
register the user animation as the registered animation;
generate a sample video including the user animation; and
determine the reference facial expression corresponding to the user animation based on an image of a reference frame selected from among a plurality of frames constituting the sample video.

The video playback device according to claim 4,
wherein the one or more computer processors:
generate a user animation of the avatar based on the facial feature amounts acquired in time series;
register the user animation as the registered animation;
calculate, based on a predetermined frame rate, the facial feature amount of the user at each of a plurality of feature points of the user;
calculate a first RMS, which is the root mean square of the facial feature amounts at the plurality of feature points in a first frame;
calculate a second RMS, which is the root mean square of the facial feature amounts at the plurality of feature points in a second frame that follows the first frame in time series;
calculate a third RMS, which is the root mean square of the facial feature amounts at the plurality of feature points in a third frame that follows the second frame in time series; and
when the sign of a first RMS difference, which is the difference between the second RMS and the first RMS, and the sign of a second RMS difference, which is the difference between the third RMS and the second RMS, are reversed, determine the reference facial expression corresponding to the user animation based on the facial feature amount in the third frame.

The video playback device according to claim 4,
wherein the one or more computer processors:
generate a user animation of the avatar based on the facial feature amounts acquired in time series;
register the user animation as the registered animation; and
determine the reference facial expression corresponding to the user animation by comparing first evaluation data, which includes the facial feature amounts acquired in time series based on a facial expression of the user with respect to the user animation, with second evaluation data, which includes the facial feature amounts acquired in time series based on another facial expression of the user with respect to the user animation.

The video playback device according to any one of claims 1 to 7,
wherein the motion data is transmitted in real time.

A video distribution system comprising one or more computer processors and distributing a video including a user's avatar to a video playback device of the user,
wherein the one or more computer processors execute computer-readable instructions to:
receive, from the video playback device via a transmission path, motion data including a facial feature amount representing the user's facial expression;
distribute the video including an animation of the user's avatar generated based on the facial feature amount;
while a delay occurs in transmission of the motion data from the video playback device, receive, from the video playback device, reference facial expression identification data for identifying a reference facial expression that matches the user's facial expression; and
distribute the video including a registered animation of the user's avatar generated based on the reference facial expression identification data,
wherein the data amount of the reference facial expression identification data is smaller than the data amount of the facial feature amount.

A video playback method performed by one or more computer processors executing computer-readable instructions, the method comprising:
a step of playing back a video received from a video distribution device;
a step of transmitting, to the video distribution device, motion data including a facial feature amount representing a user's facial expression;
a step of receiving, from the video distribution device, a video including an animation of the user's avatar generated based on the facial feature amount;
a step of determining, using a classifier that classifies the user's facial expression based on the facial feature amount, whether the user's facial expression matches a predetermined reference facial expression;
a step of determining whether a delay occurs in transmission of the motion data to the video distribution device; and
a step of transmitting, to the video distribution device while the delay occurs, reference facial expression identification data for identifying the reference facial expression determined to match the user's facial expression,
wherein the data amount of the reference facial expression identification data is smaller than the data amount of the facial feature amount.

A video distribution device comprising one or more computer processors,
wherein the one or more computer processors execute computer-readable instructions to:
transmit, to a video playback device, motion data including a facial feature amount representing a feature of a user's facial expression;
determine whether the user's facial expression matches a predetermined reference facial expression; and
while a delay occurs in transmission of the motion data to the video playback device, transmit, to the video playback device, reference facial expression identification data for identifying the reference facial expression determined to match the user's facial expression,
wherein the data amount of the reference facial expression identification data is smaller than the data amount of the facial feature amount.

A video playback device comprising one or more computer processors and a storage that stores registered animations,
wherein the one or more computer processors execute computer-readable instructions to:
receive, from a video distribution device, motion data including a facial feature amount representing a feature of a user's facial expression;
generate a video including an animation of the user's avatar generated based on the motion data; and
while a delay occurs in transmission of the motion data from the video distribution device, generate a video including the registered animation corresponding to a reference facial expression, based on reference facial expression identification data for identifying the reference facial expression determined to match the user's facial expression,
wherein the data amount of the reference facial expression identification data is smaller than the data amount of the facial feature amount.