JP2023183624A

JP2023183624A - Information processing device, information processing method and information processing program

Info

Publication number: JP2023183624A
Application number: JP2022097228A
Authority: JP
Inventors: 秀明岡本; Hideaki Okamoto; 裕真鈴木; Yuma Suzuki; 隆之堀; Takayuki Hori; 麟太郎金田; Rintaro Kaneda; 努寺田; Tsutomu Terada; 修平土田; Shuhei Tsuchida; コウミンモウ; Komin Mo
Original assignee: Kobe University NUC; SoftBank Corp
Current assignee: Kobe University NUC; SoftBank Corp
Priority date: 2022-06-16
Filing date: 2022-06-16
Publication date: 2023-12-28
Anticipated expiration: 2042-06-16
Also published as: JP7474447B2

Abstract

To enable an arbitrary exercise video to be generated from an exercise video including movement of a user's body. SOLUTION: An information processing device according to the present application includes: an acquisition part for acquiring a machine learning model including an encoder for generating feature information showing features of time series data on the basis of time series data about a change in a joint angle, and a decoder for generating time series data on the basis of the feature information; and a generation part for generating second time series data about a change in a joint angle of a user from first time series data about the change in the joint angle of the user by using a latent space of the machine learning model. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Conventionally, in the field of machine learning, techniques related to autoencoders have been known. An autoencoder consists of an encoder, which is a neural network that converts target information into a latent representation (also called a feature representation), and a decoder, which is a neural network that restores the target information from the latent representation. Techniques related to the VAE (Variational Autoencoder), which is derived from the autoencoder, are also known. A VAE trains a neural network so that the probability distribution of the latent representations follows a normal distribution. For example, a VAE is used to convert a dataset of pairs of image data corresponding to handwritten digits "0" to "9" (hereinafter also referred to as handwritten digit images) and label data giving the correct digit written in each image into latent representations, and the latent representations are mapped to a latent space. Images are then generated while the latent variable is changed continuously in the latent space of the VAE. A technique is known that, in this way, generates handwritten digit images in which the digit shown in the image changes continuously.

Diederik P. Kingma and 3 others, "Semi-Supervised Learning with Deep Generative Models", [online], June 2014, [retrieved May 31, 2022], Internet <URL: https://arxiv.org/abs/1406.5298v1>

In recent years, research has also been actively conducted on entertainment computing, which is information and communication technology for realizing mental enrichment such as stress relief and emotional healing. For example, there is a need for a technology that can generate an arbitrary exercise video from an exercise video that includes movement of a user's body.

An object of the present application is to provide an information processing device, an information processing method, and an information processing program that make it possible to generate an arbitrary exercise video from an exercise video that includes movement of a user's body.

An information processing device according to the present application includes: an acquisition unit that acquires a machine learning model including an encoder that generates, based on time series data regarding changes in joint angles, feature information indicating features of the time series data, and a decoder that generates the time series data based on the feature information; and a generation unit that uses a latent space of the machine learning model to generate, from first time series data regarding changes in joint angles of a user, second time series data regarding changes in joint angles of the user.

The acquisition unit acquires the machine learning model trained so that the probability distribution of the feature information follows a normal distribution. The generation unit maps the first time series data to the latent space, changes a latent variable in the latent space from a value having first feature information, which corresponds to the first time series data mapped to the latent space, to a value having second feature information, and generates the second time series data based on the second feature information corresponding to the value of the latent variable after the change.

The time series data includes attribute information corresponding to the time series data. The acquisition unit acquires the machine learning model trained so that the probability distribution of the feature information, which indicates features of the time series data including the attribute information, follows a normal distribution. The generation unit changes the latent variable in the latent space from a value having the first feature information corresponding to first attribute information to a value having the second feature information corresponding to second attribute information, and generates the second time series data based on the second feature information corresponding to the value of the latent variable after the change.

The machine learning model further includes a posture estimation model trained to estimate the posture of an object from an image including the object. The generation unit uses the posture estimation model to estimate coordinates of the user's joint points from a first exercise video including movement of the user's body, and generates the first time series data based on the estimated coordinates of the joint points.

Based on the generated second time series data, the generation unit generates a second exercise video including movement of the user's body corresponding to the second time series data.

The machine learning model further includes an image conversion model trained to generate, from a joint image including joint points of a subject, a person image of the subject corresponding to the joint points. The generation unit uses the image conversion model to generate the second exercise video from the second time series data.

An information processing device according to the present application includes: an acquisition unit that acquires time series data regarding changes in joint angles; and a model generation unit that generates a machine learning model including an encoder that generates, based on the time series data, feature information indicating features of the time series data, and a decoder that generates the time series data based on the feature information.

The model generation unit trains the machine learning model so that the degree of similarity between the time series data input to the encoder and the time series data output from the decoder exceeds a predetermined threshold.
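The threshold criterion above can be sketched as follows. This is only an illustration: the source does not say how the degree of similarity is measured, so the `1 / (1 + MSE)` measure, the function names, and the default threshold of 0.9 are all assumptions.

```python
def similarity(original, reconstructed):
    """Return a similarity in (0, 1]; 1.0 means the two sequences are identical.

    Both arguments are lists of frames, and each frame is a list of per-joint
    values. The 1 / (1 + MSE) form is an assumption made for illustration.
    """
    total, count = 0.0, 0
    for frame_a, frame_b in zip(original, reconstructed):
        for a, b in zip(frame_a, frame_b):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    return 1.0 / (1.0 + mse)


def reconstruction_accepted(original, reconstructed, threshold=0.9):
    """True when the decoder output is similar enough to the encoder input."""
    return similarity(original, reconstructed) > threshold
```

In a training loop, a check of this kind could serve as a stopping condition once reconstructions of held-out sequences pass it.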

The model generation unit trains the machine learning model so that the probability distribution of the feature information follows a normal distribution.

The acquisition unit acquires the time series data including attribute information corresponding to the time series data, and the model generation unit trains the machine learning model so that the probability distribution of the feature information, which indicates features of the time series data including the attribute information, follows a normal distribution.

The model generation unit classifies the feature information into clusters according to the attribute information.

The attribute information is information indicating the type of body movement of a subject included in the exercise video corresponding to the time series data, the subject's proficiency in the body movement, a characteristic of the subject's body movement, or biological information of the subject.

An information processing method according to the present application is realized by a program executed by an information processing device, and includes: an acquisition step of acquiring a machine learning model including an encoder that generates, based on time series data regarding changes in joint angles, feature information indicating features of the time series data, and a decoder that generates the time series data based on the feature information; and a generation step of using a latent space of the machine learning model to generate, from first time series data regarding changes in joint angles of a user, second time series data regarding changes in joint angles of the user.

An information processing method according to the present application is realized by a program executed by an information processing device, and includes: an acquisition step of acquiring time series data regarding changes in joint angles; and a model generation step of generating a machine learning model including an encoder that generates, based on the time series data, feature information indicating features of the time series data, and a decoder that generates the time series data based on the feature information.

An information processing program according to the present application causes a computer to execute: an acquisition procedure of acquiring a machine learning model including an encoder that generates, based on time series data regarding changes in joint angles, feature information indicating features of the time series data, and a decoder that generates the time series data based on the feature information; and a generation procedure of using a latent space of the machine learning model to generate, from first time series data regarding changes in joint angles of a user, second time series data regarding changes in joint angles of the user.

An information processing program according to the present application causes a computer to execute: an acquisition procedure of acquiring time series data regarding changes in joint angles; and a model generation procedure of generating a machine learning model including an encoder that generates, based on the time series data, feature information indicating features of the time series data, and a decoder that generates the time series data based on the feature information.

According to one aspect of the embodiment, an arbitrary exercise video can be generated from an exercise video that includes movement of a user's body.

FIG. 1 is a diagram for explaining an overview of information processing according to an embodiment.
FIG. 2 is a diagram illustrating a configuration example of an information processing system according to the embodiment.
FIG. 3 is a diagram illustrating a configuration example of a generation device according to the embodiment.
FIG. 4 is a flowchart showing an information processing procedure performed by the generation device according to the embodiment.
FIG. 5 is a diagram illustrating a configuration example of an information processing device according to the embodiment.
FIG. 6 is a diagram for explaining an example of a latent space according to the embodiment.
FIG. 7 is a flowchart showing an information processing procedure performed by the information processing device according to the embodiment.
FIG. 8 is a diagram for explaining an example of a latent space according to a modification.
FIG. 9 is a hardware configuration diagram showing an example of a computer that implements the functions of the information processing device.

Modes for carrying out the information processing device, information processing method, and information processing program according to the present application (hereinafter referred to as "embodiments") will be described in detail below with reference to the drawings. The information processing device, information processing method, and information processing program according to the present application are not limited by these embodiments. In each of the embodiments below, the same parts are given the same reference numerals, and redundant explanations are omitted.

(Embodiment)
[1. Overview of information processing]
FIG. 1 is a diagram for explaining an overview of information processing according to the embodiment. In FIG. 1, the information processing according to the embodiment is assumed to be realized by the information processing device 100 according to the embodiment. FIG. 1 illustrates a case in which the information processing device 100 uses the latent space of a trained machine learning model M1 to generate, from a dance video G1 capturing a user dancing jazz dance, a dance video G2 in which the genre of the dance performed by the user is changed from jazz dance to hip-hop dance.

Specifically, the information processing device 100 uses a posture estimation model to estimate, for each frame, the coordinates of the joint points of the user captured in the dance video G1. Here, the posture estimation model is a machine learning model trained to estimate the posture of an object from an image including the object. Next, the information processing device 100 calculates, from the estimated coordinates of each joint point, the frame-by-frame amount of change in the joint angle of each joint (hereinafter also referred to as the first time series data of the amount of change in joint angle). The information processing device 100 then inputs the initial joint angle of each joint and the first time series data of the amount of change in the joint angle of each joint to the machine learning model M1 as input information, and generates first feature information. The information processing device 100 maps the generated first feature information to the latent space of the machine learning model M1.
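The conversion from per-frame joint coordinates to an initial angle plus frame-by-frame changes can be sketched as below. This is a minimal illustration under stated assumptions: the joints are 2D points, a joint angle is taken as the interior angle at the middle point of a (neighbor, joint, neighbor) triplet, and the names `joint_angle`, `angle_deltas`, and the keypoint labels are hypothetical. The source does not specify the angle convention or the posture estimation model's output format.

```python
import math


def joint_angle(a, b, c):
    """Interior angle in radians at point b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.atan2(abs(cross), dot)


def angle_deltas(frames, triplets):
    """Return (initial_angles, deltas) for a sequence of frames.

    frames:   list of dicts mapping a joint name to its (x, y) coordinates.
    triplets: list of (neighbor, joint, neighbor) names defining each angle.
    deltas[t] holds the change of every angle between frame t and frame t + 1.
    """
    angles = [[joint_angle(f[a], f[b], f[c]) for a, b, c in triplets]
              for f in frames]
    deltas = [[cur - prev for cur, prev in zip(angles[t], angles[t - 1])]
              for t in range(1, len(angles))]
    return angles[0], deltas
```

For a video of T frames this yields one vector of initial angles and T - 1 delta vectors, which matches the pairing of initial angles and per-frame changes that the text describes as the model input.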

The machine learning model M1 in this embodiment is a machine learning model trained in advance to map feature information corresponding to time series data to a latent space. Specifically, the machine learning model M1 may be a neural network trained in advance so that the probability distribution of the feature information corresponding to the time series data follows a normal distribution. For example, the machine learning model M1 may be a VRAE (Variational Recurrent Autoencoder), which is a machine learning model that applies an RNN (Recurrent Neural Network) to a VAE (Variational Autoencoder) (reference: Otto Fabius and 1 other, "VARIATIONAL RECURRENT AUTO-ENCODERS", [online], December 2014, [retrieved May 31, 2022], Internet <URL: https://arxiv.org/abs/1412.6581v1>). FIG. 1 illustrates the case in which the machine learning model M1 is a VRAE.

Here, the VAE (Variational Autoencoder) on which the VRAE is based will be described in detail. The VAE is known as a type of generative model that generates images. For example, a VAE is trained using, as learning data (also called training data), a dataset of pairs of image data corresponding to handwritten digits "0" to "9" (hereinafter also referred to as handwritten digit images) and label data giving the correct digit written in each image. Specifically, the VAE is trained to convert the dataset into latent representations and to map the latent representations to a latent space. A distinguishing feature of the VAE is that the neural network is trained so that the probability distribution of the latent representations follows a normal distribution. Consequently, in the latent space of a VAE, latent representations corresponding to similar images, which for handwritten digit images means images showing the same digit, tend to be mapped to nearby positions. Mapping the latent representations of images showing the same digit to nearby positions amounts to forming a cluster of latent representations for each digit. For example, a cluster of latent representations corresponding to handwritten digit images showing "1" (hereinafter, the "1" cluster), a cluster corresponding to images showing "2" (hereinafter, the "2" cluster), a cluster corresponding to images showing "3" (hereinafter, the "3" cluster), and so on, are formed, one for each digit. Suppose, for example, that the "2" cluster is mapped closest to the "1" cluster in the latent space, and that the "3" cluster is mapped next closest to the "1" cluster after the "2" cluster. In this case, images are generated while the latent variable in the latent space of the VAE is changed continuously, for example from the mean of the "1" cluster to the mean of the "3" cluster in the order "1" → "2" → "3". This makes it possible to generate handwritten digit images in which the digit shown changes continuously from "1" to "3" in the order "1" → "2" → "3".
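The requirement that latent representations follow a normal distribution is usually enforced, in the standard VAE formulation, by a KL-divergence penalty toward a standard normal together with the reparameterization trick for sampling. The sketch below shows both for a diagonal Gaussian encoder output. The source does not state the loss it uses, so this is the textbook form rather than the patent's method, and the function names are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

The penalty is zero exactly when the encoder outputs a zero mean and unit variance, and adding it to a reconstruction loss pulls the encoded points toward a compact, roughly normal cloud in which nearby points decode to similar outputs.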

Returning to the explanation of FIG. 1. In the example shown in FIG. 1, the machine learning model M1 is trained in advance using, as learning data, a dataset of pairs of dance videos and label data indicating the type of dance included in each dance video (for example, a genre such as jazz dance, ballet dance, or hip-hop dance). Specifically, the machine learning model M1 is trained to convert the dataset into feature information (corresponding to the latent representations described above) and to map the feature information to a latent space. Like the VAE, the machine learning model M1, which is a VRAE, trains its neural network so that the probability distribution of the feature information follows a normal distribution. Consequently, in the latent space of the machine learning model M1, feature information corresponding to similar dance videos, that is, dance videos including the same type of dance, tends to be mapped to nearby positions. Mapping the feature information of dance videos including the same type of dance to nearby positions amounts to forming a cluster of feature information for each type of dance video. FIG. 1 shows how a cluster of feature information corresponding to jazz dance videos (hereinafter, the jazz dance cluster), a cluster corresponding to ballet dance videos (hereinafter, the ballet dance cluster), a cluster corresponding to hip-hop dance videos (hereinafter, the hip-hop dance cluster), and so on, are formed for each type of dance video and mapped onto the latent space. The information processing device 100 may also use a known clustering technique to classify the feature information mapped to the latent space shown in FIG. 1 into clusters according to the type of dance included in the dance video (for example, jazz dance, ballet dance, or hip-hop dance). Then, for example, when the dance video G1 is a jazz dance video, the information processing device 100 maps the first feature information to the position of the jazz dance cluster in the latent space. The point P1 shown in FIG. 1 indicates the position of the first feature information mapped to the latent space.
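One simple way to obtain per-genre clusters from labeled latent points is to average the points of each label and assign new points to the nearest mean. The sketch below does this. It is an assumption for illustration: the source only refers to "a known clustering technique" and names none, and `cluster_means` and `nearest_cluster` are hypothetical helpers.

```python
import math


def cluster_means(latents, labels):
    """Mean latent vector per label (for example, per dance genre)."""
    sums, counts = {}, {}
    for z, label in zip(latents, labels):
        if label not in sums:
            sums[label] = [0.0] * len(z)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], z)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def nearest_cluster(z, means):
    """Label whose cluster mean is closest to z in Euclidean distance."""
    return min(means, key=lambda label: math.dist(z, means[label]))
```

The cluster means computed this way can also serve as the target values when the latent variable is moved from one genre toward another.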

Next, the information processing device 100 changes the latent variable in the latent space from a value having the first feature information to a value having second feature information. For example, the information processing device 100 changes the latent variable from a value having the first feature information, which belongs to the jazz dance cluster, to a value having second feature information, which belongs to the hip-hop dance cluster. The point P2 shown in FIG. 1 indicates the position of the second feature information mapped to the latent space. Suppose, for example, that the ballet dance cluster is mapped closest to the jazz dance cluster in the latent space, and that the hip-hop dance cluster is mapped next closest to the jazz dance cluster after the ballet dance cluster. In this case, the information processing device 100 may continuously change the latent variable from the value having the first feature information, which belongs to the jazz dance cluster, to a value having second feature information located within a predetermined range of the mean of the hip-hop dance cluster. For example, the information processing device 100 may continuously change the latent variable in the order: the value having the first feature information belonging to the jazz dance cluster → the mean of the ballet dance cluster → the value having the second feature information located within a predetermined range of the mean of the hip-hop dance cluster. In this way, the information processing device 100 can generate a dance video in which the genre of the dance performed by the user changes continuously from jazz dance to hip-hop dance, for example in the order jazz dance → ballet dance → hip-hop dance.
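Moving the latent variable continuously through intermediate cluster means can be sketched as piecewise linear interpolation through a list of waypoints. Linear interpolation is an assumption here, since the source only says the latent variable is changed continuously, and `lerp` and `latent_path` are hypothetical names.

```python
def lerp(p, q, t):
    """Point at fraction t (0 to 1) along the straight line from p to q."""
    return [a + (b - a) * t for a, b in zip(p, q)]


def latent_path(waypoints, steps_per_segment):
    """Latent vectors visiting each waypoint in order.

    waypoints could be [first feature point, ballet mean, hip-hop target];
    each consecutive pair is split into steps_per_segment equal steps.
    """
    path = [list(waypoints[0])]
    for p, q in zip(waypoints, waypoints[1:]):
        for k in range(1, steps_per_segment + 1):
            path.append(lerp(p, q, k / steps_per_segment))
    return path
```

Each vector on the path would then be passed to the decoder, so that the decoded motion changes gradually from the starting genre to the target genre.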

Next, based on the second feature information corresponding to the value of the latent variable after the change, the information processing device 100 generates the frame-by-frame amount of change in the joint angle of each joint (hereinafter also referred to as the second time series data of the amount of change in joint angle). For example, the information processing device 100 obtains the second time series data of the amount of change in the joint angle of each joint as the output information of the machine learning model M1. The information processing device 100 then calculates the joint angle of each joint in each frame based on the second time series data output from the machine learning model M1 and the initial joint angle of each joint. Finally, the information processing device 100 calculates the coordinates of each joint point in each frame from the joint angles of each joint in that frame.
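The two calculations above, recovering per-frame angles from the initial angles plus the decoded changes and then converting angles to joint coordinates, can be sketched as follows. The sketch assumes a single planar chain of segments with known lengths, where each angle is the rotation of a segment relative to the previous one. The source does not describe its skeleton model or angle convention, so `integrate_angles`, `chain_points`, and the segment lengths are hypothetical.

```python
import math


def integrate_angles(initial, deltas):
    """Per-frame joint angles: running sum of deltas starting from initial."""
    angles = [list(initial)]
    for delta in deltas:
        angles.append([a + d for a, d in zip(angles[-1], delta)])
    return angles


def chain_points(root, lengths, angles):
    """Joint coordinates of a planar chain for one frame.

    root:    (x, y) of the first joint.
    lengths: length of each segment along the chain.
    angles:  rotation of each segment relative to the previous one, in radians.
    """
    points = [root]
    heading = 0.0
    x, y = root
    for length, angle in zip(lengths, angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points
```

Applying `chain_points` to every frame returned by `integrate_angles` gives the sequence of joint coordinates from which joint images can be drawn.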

Next, the information processing device 100 uses an image conversion model to convert each frame containing the joint points at the calculated coordinates into a dance video G2 showing the user dancing. Here, the image conversion model is a machine learning model trained to generate, from a joint image including joint points of a subject, a person image of the subject corresponding to the joint points. Because the second feature information is mapped to the position of the hip-hop dance cluster in the latent space, the dance video G2 corresponding to the second feature information is a video of the user dancing hip-hop dance.

上述したように、情報処理装置１００は、学習済みの機械学習モデルＭ１を用いて、ダンス映像Ｇ１から第１の特徴情報を生成し、第１の特徴情報を潜在空間にマッピングする。続いて、情報処理装置１００は、潜在空間における潜在変数を第１の特徴情報を持つ値から第２の特徴情報を持つ値に変化させる。続いて、情報処理装置１００は、変化させた後の潜在変数の値に対応する第２の特徴情報に基づいて、ダンス映像Ｇ２を生成する。このように、情報処理装置１００は、機械学習モデルＭ１の潜在空間を用いることにより、利用者のダンス映像Ｇ１を潜在空間上の任意の値に対応したダンス映像Ｇ２へと変化させることができる。すなわち、情報処理装置１００は、機械学習モデルＭ１の潜在空間を用いることにより、利用者のダンス映像Ｇ１からダンス映像Ｇ１を加工したダンス映像Ｇ２へのモーフィングを実現可能にする。例えば、情報処理装置１００は、ダンス映像の種類に応じて分類された潜在空間を用いることにより、ジャズダンスのダンス映像Ｇ１からヒップホップダンスのダンス映像Ｇ２へのモーフィングを実現可能にする。すなわち、情報処理装置１００は、利用者のダンス映像に基づいて、利用者が所望するダンス映像の属性（例えば、ダンスの種類）に応じた新たなダンス映像を生成することができる。したがって、情報処理装置１００は、利用者のダンス映像から任意のダンス映像を生成可能とすることができる。また、情報処理装置１００は、利用者が所望するダンス映像の属性（例えば、ダンスの種類）に応じた新たなダンス映像を利用者に対して提供可能とすることができる。すなわち、情報処理装置１００は、利用者に対して新しいエンタテインメントを提供可能とすることができる。したがって、情報処理装置１００は、利用者に対して精神的な豊かさを提供可能とすることができる。 As described above, the information processing device 100 uses the trained machine learning model M1 to generate the first feature information from the dance video G1, and maps the first feature information to the latent space. Subsequently, the information processing device 100 changes the latent variable in the latent space from a value having the first feature information to a value having the second feature information. Subsequently, the information processing device 100 generates the dance video G2 based on the second feature information corresponding to the value of the latent variable after being changed. In this way, by using the latent space of the machine learning model M1, the information processing device 100 can change the dance video G1 of the user into the dance video G2 corresponding to an arbitrary value on the latent space. That is, the information processing device 100 uses the latent space of the machine learning model M1 to realize morphing from the user's dance video G1 to the dance video G2 obtained by processing the dance video G1. 
For example, by using a latent space classified according to the type of dance video, the information processing device 100 can realize morphing from a jazz dance video G1 to a hip-hop dance video G2. That is, based on the user's dance video, the information processing device 100 can generate a new dance video according to an attribute of the dance video desired by the user (for example, the type of dance). Therefore, the information processing device 100 makes it possible to generate an arbitrary dance video from the user's dance video. Further, the information processing device 100 can provide the user with a new dance video according to the attribute desired by the user (for example, the type of dance). That is, the information processing device 100 can provide new entertainment to the user. Therefore, the information processing device 100 can offer the user emotional enrichment.

〔２．情報処理システムの構成〕 [2. Information processing system configuration]
図２は、実施形態に係る情報処理システム１の構成例を示す図である。図２に示すように、実施形態に係る情報処理システム１には、生成装置２０と情報処理装置１００とが含まれる。生成装置２０と情報処理装置１００とは、各種の通信ネットワークを介して、有線または無線で互いに通信可能に接続される。なお、図２に示した情報処理システム１には、任意の数の生成装置２０と、任意の数の情報処理装置１００とが含まれていてもよい。 FIG. 2 is a diagram showing a configuration example of the information processing system 1 according to the embodiment. As shown in FIG. 2, the information processing system 1 according to the embodiment includes a generation device 20 and an information processing device 100. The generation device 20 and the information processing device 100 are connected to be able to communicate with each other by wire or wirelessly via various communication networks. Note that the information processing system 1 shown in FIG. 2 may include any number of generation devices 20 and any number of information processing devices 100.

生成装置２０は、図１で説明した機械学習モデルＭ１を生成するサーバ装置である。生成装置２０は、機械学習モデルＭ１を生成した場合、生成した機械学習モデルＭ１に関する情報を各利用者の情報処理装置１００に配信する。 The generation device 20 is a server device that generates the machine learning model M1 described in FIG. 1. When generating the machine learning model M1, the generation device 20 distributes information regarding the generated machine learning model M1 to the information processing device 100 of each user.

情報処理装置１００は、図１で説明した情報処理を実現する情報処理装置である。具体的には、情報処理装置１００は、利用者によって使用されるスマートフォン等の端末装置であってよい。情報処理装置１００は、生成装置２０から機械学習モデルＭ１を取得し、図１で説明した情報処理を実現する。 The information processing device 100 is an information processing device that implements the information processing described in FIG. 1. Specifically, the information processing device 100 may be a terminal device such as a smartphone used by a user. The information processing device 100 acquires the machine learning model M1 from the generation device 20 and implements the information processing described in FIG. 1.

〔３．生成装置の構成〕 [3. Configuration of generation device]
図３は、実施形態に係る生成装置２０の構成例を示す図である。生成装置２０は、通信部２１と、記憶部２２と、制御部２３とを有する。 FIG. 3 is a diagram showing a configuration example of the generation device 20 according to the embodiment. The generation device 20 includes a communication unit 21, a storage unit 22, and a control unit 23.

（通信部２１） (Communication unit 21)
通信部２１は、ＮＩＣ（Network Interface Card）やアンテナ等によって実現される。通信部２１は、各種ネットワークと有線または無線で接続され、例えば、情報処理装置１００との間で情報の送受信を行う。 The communication unit 21 is realized by a NIC (Network Interface Card), an antenna, or the like. The communication unit 21 is connected to various networks by wire or wirelessly, and transmits and receives information to and from the information processing device 100, for example.

（記憶部２２） (Storage unit 22)
記憶部２２は、例えば、ＲＡＭ（Random Access Memory)、フラッシュメモリ（Flash Memory）等の半導体メモリ素子、または、ハードディスク、光ディスク等の記憶装置によって実現される。具体的には、記憶部２２は、各種プログラム（情報処理プログラムの一例）を記憶する。また、記憶部２２は、モデル生成部２３２によって生成された機械学習モデルＭ１に関する情報を記憶する。 The storage unit 22 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. Specifically, the storage unit 22 stores various programs (an example of an information processing program). Furthermore, the storage unit 22 stores information regarding the machine learning model M1 generated by the model generation unit 232.

（制御部２３） (Control unit 23)
制御部２３は、コントローラ（controller）であり、例えば、ＣＰＵ（Central Processing Unit）やＭＰＵ（Micro Processing Unit）等によって、生成装置２０内部の記憶装置に記憶されている各種プログラム（情報処理プログラムの一例に相当）がＲＡＭを作業領域として実行されることにより実現される。また、制御部２３は、コントローラであり、例えば、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等の集積回路により実現される。 The control unit 23 is a controller and is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the generation device 20, using a RAM as a work area. Further, the control unit 23 is a controller and is realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

制御部２３は、取得部２３１と、モデル生成部２３２と、配信部２３３を機能部として有し、以下に説明する情報処理の作用を実現または実行してよい。なお、制御部２３の内部構成は、図３に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。また、各機能部は、制御部２３の機能を示したものであり、必ずしも物理的に区別されるものでなくともよい。 The control unit 23 has an acquisition unit 231, a model generation unit 232, and a distribution unit 233 as functional units, and may realize or execute the information processing operation described below. Note that the internal configuration of the control unit 23 is not limited to the configuration shown in FIG. 3, and may be any other configuration as long as it performs information processing to be described later. Further, each functional unit indicates a function of the control unit 23, and does not necessarily have to be physically distinct.

（取得部２３１） (Acquisition unit 231)
取得部２３１は、モデル生成部２３２による機械学習モデルＭ１の学習に用いられる学習データを取得する。具体的には、取得部２３１は、学習データとして、あらかじめ人手によってダンス映像の属性を示す属性情報とダンス映像とが紐づけられた情報を取得してよい。属性情報は、例えば、ダンス映像に含まれる人物（以下、対象者ともいう）が踊るダンスの種類であってよい。より具体的には、取得部２３１は、学習データとして、ダンス映像に含まれるダンスの種類を示すラベル（例えば、ジャズダンス、バレエダンス、ヒップホップダンス等のダンスのジャンルを示すラベル）とダンス映像との組み合わせからなるデータセットを取得してよい。例えば、取得部２３１は、学習データを作成した作成者によって使用される端末装置から学習データを取得してよい。 The acquisition unit 231 acquires learning data used by the model generation unit 232 to train the machine learning model M1. Specifically, the acquisition unit 231 may acquire, as learning data, information in which a dance video has been manually associated in advance with attribute information indicating an attribute of that dance video. The attribute information may be, for example, the type of dance performed by a person included in the dance video (hereinafter also referred to as the subject). More specifically, the acquisition unit 231 may acquire, as learning data, a dataset consisting of pairs of a dance video and a label indicating the type of dance included in that dance video (for example, a label indicating a dance genre such as jazz dance, ballet dance, or hip-hop dance). For example, the acquisition unit 231 may acquire the learning data from a terminal device used by the creator of the learning data.

また、取得部２３１は、関節角度の変化量の時系列データを取得する。具体的には、取得部２３１は、姿勢推定モデルを用いて、学習データのダンス映像に含まれる対象者の関節点の座標をフレームごとに推定してよい。続いて、取得部２３１は、推定された各関節点の座標から各関節の関節角度のフレームごとの変化量を算出してよい。例えば、取得部２３１は、推定された各関節点の座標から９つの関節点（首、両肩、両肘、両腰および両膝）の関節角度のフレームごとの変化量を算出してよい。 The acquisition unit 231 also acquires time-series data of the amount of change in joint angle. Specifically, the acquisition unit 231 may estimate the coordinates of the joint points of the subject included in the dance video of the learning data for each frame using the posture estimation model. Subsequently, the acquisition unit 231 may calculate the amount of change in the joint angle of each joint for each frame from the estimated coordinates of each joint point. For example, the acquisition unit 231 may calculate the amount of change in the joint angles of nine joint points (neck, both shoulders, both elbows, both hips, and both knees) for each frame from the estimated coordinates of each joint point.

例えば、取得部２３１は、第２余弦定理および逆三角定理を用いて、３つの関節点の座標から、フレームごとの各関節の関節角度を算出してよい。例として、取得部２３１が、右膝の関節角度を算出する場合について説明する。取得部２３１は、対象者の骨格モデルにおける右腰の関節点（以下、点Ｂと記載する）の座標（r_hip（x）、r_hip（y））、右膝の関節点（以下、点Ｃと記載する）の座標（r_knee（x）、r_knee（y））、および、右足首の関節点（以下、点Ａと記載する）の座標（r_ankle（x）、r_ankle（y））から、フレームごとの右膝の関節角度を算出してよい。例えば、取得部２３１は、三角形ＡＢＣの各辺の長さの二乗を、それぞれ、（ＢＣ）^２＝（r_knee（x）－r_hip（x））^２＋（r_knee（y）－r_hip（y））^２、（ＣＡ）^２＝（r_knee（x）－r_ankle（x））^２＋（r_knee（y）－r_ankle（y））^２、（ＡＢ）^２＝（r_ankle（x）－r_hip（x））^２＋（r_ankle（y）－r_hip（y））^２によって算出する。続いて、取得部２３１は、三角形ＡＢＣの頂点Ｃの角度（すなわち、右膝の関節角度）をθで表すと、第２余弦定理より、（ＡＢ）^２＝（ＢＣ）^２＋（ＣＡ）^２－２（ＢＣ）*（ＣＡ）cosθが成り立つので、逆三角定理を用いて、θ＝cos^（－１）（（ＢＣ）^２＋（ＣＡ）^２－（ＡＢ）^２）/２（ＢＣ）*（ＣＡ）により算出する。取得部２３１は、右膝の関節角度を算出する場合と同様にして、３つの関節点の座標から、フレームごとの各関節の関節角度を算出してよい。続いて、取得部２３１は、所定のフレームにおける各関節の関節角度と、所定のフレームの次のフレームにおける各関節の関節角度との差分を算出することにより、各関節の関節角度のフレームごとの変化量（以下、関節角度の変化量の時系列データともいう）を算出してよい。なお、上記の例では、取得部２３１が、３つの関節点として、右腰、右膝、右足首のように骨格モデルにおいて連続する３部位の関節点を選択し、これらからなる角度を関節角度として算出する場合について説明したが、本実施形態はこれに限定されない。すなわち、一実施形態において取得部２３１は、骨格モデルにおける任意の３部位からなる角度を関節角度として算出してもよい。 For example, the acquisition unit 231 may calculate the joint angle of each joint for each frame from the coordinates of the three joint points using the second cosine theorem and the inverse triangle theorem. As an example, a case will be described in which the acquisition unit 231 calculates the joint angle of the right knee. The acquisition unit 231 obtains the coordinates (r_hip(x), r_hip(y)) of the right hip joint point (hereinafter referred to as point B) and the right knee joint point (hereinafter referred to as point C) in the subject's skeletal model. From the coordinates (r_knee (x), r_knee (y)) of the joint point of the right ankle (hereinafter referred to as point A) (r_ankle (x), r_ankle (y)), The joint angle of the right knee may be calculated. 
For example, with point A the right ankle, point B the right hip, and point C the right knee, the acquisition unit 231 calculates the square of the length of each side of triangle ABC as follows:
$$(BC)^2 = (\mathrm{r\_knee}(x) - \mathrm{r\_hip}(x))^2 + (\mathrm{r\_knee}(y) - \mathrm{r\_hip}(y))^2$$
$$(CA)^2 = (\mathrm{r\_knee}(x) - \mathrm{r\_ankle}(x))^2 + (\mathrm{r\_knee}(y) - \mathrm{r\_ankle}(y))^2$$
$$(AB)^2 = (\mathrm{r\_ankle}(x) - \mathrm{r\_hip}(x))^2 + (\mathrm{r\_ankle}(y) - \mathrm{r\_hip}(y))^2$$
Subsequently, letting θ denote the angle at vertex C of triangle ABC (that is, the joint angle of the right knee), the law of cosines (the second cosine theorem) gives
$$(AB)^2 = (BC)^2 + (CA)^2 - 2\,(BC)(CA)\cos\theta,$$
so the acquisition unit 231 calculates θ using the inverse cosine:
$$\theta = \cos^{-1}\!\left(\frac{(BC)^2 + (CA)^2 - (AB)^2}{2\,(BC)(CA)}\right)$$
The acquisition unit 231 may calculate the joint angle of each joint for each frame from the coordinates of three joint points in the same manner as for the right knee. Next, the acquisition unit 231 may calculate the frame-by-frame amount of change in the joint angle of each joint (hereinafter also referred to as time series data of the amount of change in joint angle) by taking the difference between the joint angle of each joint in a given frame and the joint angle of that joint in the following frame. In the above example, the acquisition unit 231 selects, as the three joint points, three consecutive parts of the skeletal model, such as the right hip, right knee, and right ankle, and calculates the angle they form as the joint angle; however, the present embodiment is not limited to this. That is, in one embodiment, the acquisition unit 231 may calculate the angle formed by any three parts of the skeletal model as a joint angle.
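The right-knee calculation described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the acquisition unit 231: the function names `joint_angle` and `angle_deltas`, the use of degrees, the sign convention (next frame minus current frame), and the sample coordinates are all assumptions made for this example.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at vertex c of triangle a-b-c, via the law of cosines.

    a, b, c are (x, y) tuples; for the right knee: a=ankle, b=hip, c=knee.
    """
    bc2 = (c[0] - b[0]) ** 2 + (c[1] - b[1]) ** 2
    ca2 = (c[0] - a[0]) ** 2 + (c[1] - a[1]) ** 2
    ab2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    denom = 2.0 * math.sqrt(bc2) * math.sqrt(ca2)
    if denom == 0.0:
        raise ValueError("degenerate triangle: vertex coincides with a neighbour")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (bc2 + ca2 - ab2) / denom))
    return math.degrees(math.acos(cos_theta))

def angle_deltas(angles):
    """Frame-to-frame change (assumed sign: next frame minus current frame)."""
    return [nxt - cur for cur, nxt in zip(angles, angles[1:])]

# Hypothetical joint coordinates for two frames: (ankle A, hip B, knee C).
frames = [
    ((0.0, 0.0), (0.0, 2.0), (0.0, 1.0)),  # straight leg
    ((1.0, 1.0), (0.0, 2.0), (0.0, 1.0)),  # knee bent at a right angle
]
knee_angles = [joint_angle(a, b, c) for a, b, c in frames]
knee_deltas = angle_deltas(knee_angles)
```

The clamp before `math.acos` is a defensive choice for noisy pose estimates; the source does not mention it.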

（モデル生成部２３２） (Model generation unit 232)
モデル生成部２３２は、取得部２３１によって取得された各関節の関節角度の変化量の時系列データ（以下、時系列データともいう）に基づいて時系列データの特徴を示す特徴情報を生成するエンコーダと、特徴情報に基づいて時系列データを生成するデコーダと、を含む機械学習モデルＭ１を生成する。具体的には、機械学習モデルＭ１は、ＶＡＥにＲＮＮを適用した機械学習モデルであるＶＲＡＥであってよい。 The model generation unit 232 generates a machine learning model M1 that includes an encoder, which generates feature information indicating the features of time series data based on the time series data of the amount of change in the joint angle of each joint acquired by the acquisition unit 231 (hereinafter also simply referred to as time series data), and a decoder, which generates time series data based on the feature information. Specifically, the machine learning model M1 may be a VRAE, which is a machine learning model in which an RNN is applied to a VAE.

より具体的には、モデル生成部２３２は、機械学習モデルＭ１のエンコーダを用いて、時系列データから特徴情報を生成してよい。ここで、特徴情報は、時系列データよりも低次元のベクトルであってよい。モデル生成部２３２は、機械学習モデルＭ１のエンコーダを用いて、時系列データを特徴情報に次元圧縮する。続いて、モデル生成部２３２は、機械学習モデルＭ１のデコーダを用いて、特徴情報から時系列データを生成してよい。続いて、モデル生成部２３２は、エンコーダに入力される時系列データと、デコーダから出力される時系列データとの類似度が所定の閾値を超えるように機械学習モデルＭ１を学習させてよい。例えば、モデル生成部２３２は、バックプロパゲーション等を用いて、エンコーダに入力される時系列データと、デコーダから出力される時系列データとの類似度が所定の閾値を超えるまで、機械学習モデルＭ１のエンコーダとデコーダをそれぞれ学習させてよい。また、モデル生成部２３２は、特徴情報の確率分布が正規分布に従うように機械学習モデルＭ１を学習させてよい。例えば、モデル生成部２３２は、特徴情報の確率分布が正規分布に従うと仮定して、正規分布の平均μおよび分散σを出力するようエンコーダを学習させてよい。また、モデル生成部２３２は、エンコーダから出力された平均μおよび分散σに基づいて、正規分布Ｎ（μ、σ）に従う特徴情報をサンプリングし、サンプリングされた特徴情報から時系列データを復元するようデコーダを学習させてよい。このようにして、モデル生成部２３２は、学習済みの機械学習モデルＭ１を生成してよい。 More specifically, the model generation unit 232 may generate feature information from time-series data using the encoder of the machine learning model M1. Here, the feature information may be a vector with a lower dimension than the time series data. The model generation unit 232 uses the encoder of the machine learning model M1 to dimensionally compress the time series data into feature information. Subsequently, the model generation unit 232 may generate time series data from the feature information using the decoder of the machine learning model M1. Subsequently, the model generation unit 232 may cause the machine learning model M1 to learn such that the degree of similarity between the time series data input to the encoder and the time series data output from the decoder exceeds a predetermined threshold. For example, the model generation unit 232 uses backpropagation or the like to generate the machine learning model M1 until the similarity between the time series data input to the encoder and the time series data output from the decoder exceeds a predetermined threshold. The encoder and decoder may be trained separately. 
Furthermore, the model generation unit 232 may train the machine learning model M1 so that the probability distribution of the feature information follows a normal distribution. For example, assuming that the probability distribution of the feature information follows a normal distribution, the model generation unit 232 may train the encoder to output the mean μ and the variance σ of that normal distribution. The model generation unit 232 may also train the decoder to sample feature information following the normal distribution N(μ, σ) based on the mean μ and variance σ output from the encoder, and to restore the time series data from the sampled feature information. In this way, the model generation unit 232 may generate the trained machine learning model M1.
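The sampling step and a training objective of this kind can be sketched as below. This is only an illustration under stated assumptions: the source specifies neither the loss function nor the parameterization, so the use of log-variance (a common convention instead of the variance itself), mean squared error as the similarity measure, the weight `beta`, and all function names are hypothetical. The encoder and decoder networks themselves are not shown.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterised draw z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def mse(x, x_hat):
    """Mean squared reconstruction error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction error plus a KL term pulling the latent toward N(0, I)."""
    return mse(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)

rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]   # hypothetical encoder outputs
z = sample_latent(mu, log_var, rng)       # feature information fed to the decoder
loss = vae_loss([1.0, 2.0, 3.0], [1.0, 2.0, 2.0], mu, log_var)
```

Minimizing the reconstruction term corresponds to raising the similarity between encoder input and decoder output, while the KL term is one standard way to encourage the latent distribution to be normal.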

また、取得部２３１は、時系列データに対応する属性情報を含む時系列データを取得してよい。例えば、取得部２３１は、時系列データに対応する属性情報を含む時系列データとして、ダンス映像に対応するダンスの種類を示すラベルとダンス映像との組のデータセットを取得してよい。また、モデル生成部２３２は、属性情報を含む時系列データの特徴を示す特徴情報を潜在空間にマッピングするよう機械学習モデルＭ１を学習させてよい。例えば、モデル生成部２３２は、取得部２３１によって取得されたデータセットの特徴を示す特徴情報を潜在空間にマッピングするよう機械学習モデルＭ１を学習させてよい。また、モデル生成部２３２は、属性情報を含む時系列データの特徴を示す特徴情報の確率分布が正規分布に従うように機械学習モデルＭ１を学習させてよい。例えば、モデル生成部２３２は、取得部２３１によって取得されたデータセットの特徴を示す特徴情報の確率分布が正規分布に従うように機械学習モデルＭ１を学習させてよい。続いて、モデル生成部２３２は、学習済みの機械学習モデルＭ１の潜在空間にマッピングされた特徴情報を属性情報に応じたクラスタに分類してよい。例えば、モデル生成部２３２は、k-means法を用いて潜在空間にマッピングされた特徴情報同士の距離を算出することで、クラスタリングを行ってよい。なお、モデル生成部２３２は、k-means法の他にも、公知のクラスタリング技術を用いて、潜在空間にマッピングされた特徴情報を属性に応じたクラスタに分類してよい。例えば、モデル生成部２３２は、潜在空間にマッピングされた特徴情報を、属性情報が示すダンスの種類（例えば、ジャズダンス、バレエダンス、ヒップホップダンス等の種類）に応じたクラスタに分類してよい。 Further, the acquisition unit 231 may acquire time series data including attribute information corresponding to the time series data. For example, the acquisition unit 231 may acquire a data set of a dance video and a label indicating the type of dance corresponding to the dance video as time series data including attribute information corresponding to the time series data. Furthermore, the model generation unit 232 may cause the machine learning model M1 to learn to map feature information indicating features of time-series data including attribute information to the latent space. For example, the model generation unit 232 may cause the machine learning model M1 to learn to map feature information indicating the characteristics of the dataset acquired by the acquisition unit 231 to the latent space. Further, the model generation unit 232 may cause the machine learning model M1 to learn such that the probability distribution of feature information indicating the characteristics of time series data including attribute information follows a normal distribution. 
For example, the model generation unit 232 may train the machine learning model M1 so that the probability distribution of the feature information indicating the features of the dataset acquired by the acquisition unit 231 follows a normal distribution. Subsequently, the model generation unit 232 may classify the feature information mapped to the latent space of the trained machine learning model M1 into clusters according to the attribute information. For example, the model generation unit 232 may perform clustering by using the k-means method to calculate the distances between pieces of feature information mapped to the latent space. Besides the k-means method, the model generation unit 232 may use any known clustering technique to classify the feature information mapped to the latent space into clusters according to attributes. For example, the model generation unit 232 may classify the feature information mapped to the latent space into clusters according to the type of dance indicated by the attribute information (for example, jazz dance, ballet dance, or hip-hop dance).
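As a rough illustration of clustering latent points with k-means, the following sketch implements plain Lloyd iterations using only the standard library. The function name `kmeans`, the seeding from the first k points, the fixed iteration count, and the two-dimensional sample points are assumptions for this example; associating each resulting cluster with a dance-type label is not shown.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on equal-length float tuples.

    Returns (centroids, labels).
    """
    # Assumption: the first k points seed the centroids (deterministic, not robust).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical 2-D feature information forming two well-separated groups.
latent_points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0),
                 (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centroids, labels = kmeans(latent_points, k=2)
```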

また、機械学習モデルＭ１は、対象物を含む画像から対象物の姿勢を推定するよう学習された姿勢推定モデルを含んでよい。例えば、モデル生成部２３２は、公知の姿勢推定技術を用いて、ダンス映像からダンス映像に含まれる対象者の姿勢を推定するよう学習された姿勢推定モデルを生成してよい。 Furthermore, the machine learning model M1 may include a posture estimation model trained to estimate the posture of the target object from an image including the target object. For example, the model generation unit 232 may use a known posture estimation technique to generate a posture estimation model trained to estimate the posture of the subject included in the dance video from the dance video.

また、機械学習モデルＭ１は、対象者の関節点を含む関節画像から関節点に対応する対象者の人物画像を生成するよう学習された画像変換モデルを含んでよい。例えば、モデル生成部２３２は、Pix2Pix、CYcleGAN、DiscoGAN、UNIT等の公知の画像変換モデルを用いて、対象者の関節点を含む関節画像から関節点に対応する対象者の人物画像を生成するよう画像変換モデルを学習させてよい。 Furthermore, the machine learning model M1 may include an image conversion model trained to generate, from a joint image containing a subject's joint points, a person image of the subject corresponding to those joint points. For example, the model generation unit 232 may use a known image conversion model such as Pix2Pix, CycleGAN, DiscoGAN, or UNIT, and train it to generate a person image of the subject corresponding to the joint points from a joint image containing the subject's joint points.

（配信部２３３） (Distribution unit 233)
配信部２３３は、モデル生成部２３２によって生成された機械学習モデルＭ１に関する情報を各利用者の情報処理装置１００に配信する。 The distribution unit 233 distributes information regarding the machine learning model M1 generated by the model generation unit 232 to each user's information processing device 100.

〔４．生成装置による情報処理の手順〕 [4. Procedure of information processing by generation device]
図４は、実施形態に係る生成装置２０による情報処理手順を示すフローチャートである。図４に示すように、取得部２３１は、姿勢推定モデルを用いて、ダンス映像に含まれる対象者の関節点の座標を推定する（ステップＳ１１）。続いて、取得部２３１は、各関節点の座標に基づいて、関節角度の変化量の時系列データを生成する（ステップＳ１２）。続いて、モデル生成部２３２は、機械学習モデルＭ１のエンコーダを用いて、時系列データから特徴情報を生成する（ステップＳ１３）。続いて、モデル生成部２３２は、機械学習モデルＭ１のデコーダを用いて、特徴情報から時系列データを生成する（ステップＳ１４）。続いて、モデル生成部２３２は、エンコーダに入力される時系列データと、デコーダから出力される時系列データとの類似度が所定の閾値を超えるように機械学習モデルＭ１を学習させる（ステップＳ１５）。 FIG. 4 is a flowchart showing an information processing procedure by the generation device 20 according to the embodiment. As shown in FIG. 4, the acquisition unit 231 estimates the coordinates of the joint points of the subject included in the dance video using the posture estimation model (step S11). Subsequently, the acquisition unit 231 generates time-series data of the amount of change in joint angle based on the coordinates of each joint point (step S12). Subsequently, the model generation unit 232 generates feature information from the time series data using the encoder of the machine learning model M1 (step S13). Subsequently, the model generation unit 232 generates time series data from the feature information using the decoder of the machine learning model M1 (step S14). Next, the model generation unit 232 trains the machine learning model M1 so that the degree of similarity between the time series data input to the encoder and the time series data output from the decoder exceeds a predetermined threshold (step S15).

〔５．情報処理装置の構成〕 [5. Configuration of information processing device]
図５は、実施形態に係る情報処理装置１００の構成例を示す図である。情報処理装置１００は、通信部１１０と、記憶部１２０と、入力部１３０と、出力部１４０と、制御部１５０とを有する。 FIG. 5 is a diagram illustrating a configuration example of the information processing device 100 according to the embodiment. The information processing device 100 includes a communication unit 110, a storage unit 120, an input unit 130, an output unit 140, and a control unit 150.

（通信部１１０） (Communication unit 110)
通信部１１０は、ＮＩＣやアンテナ等によって実現される。通信部１１０は、各種ネットワークと有線または無線で接続され、例えば、生成装置２０との間で情報の送受信を行う。 The communication unit 110 is realized by a NIC, an antenna, or the like. The communication unit 110 is connected to various networks by wire or wirelessly, and transmits and receives information to and from the generation device 20, for example.

（記憶部１２０） (Storage unit 120)
記憶部１２０は、例えば、ＲＡＭ、フラッシュメモリ等の半導体メモリ素子、または、ハードディスク、光ディスク等の記憶装置によって実現される。具体的には、記憶部１２０は、各種プログラム（情報処理プログラムの一例）を記憶する。また、記憶部１２０は、機械学習モデルＭ１に関する情報を記憶する。 The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk. Specifically, the storage unit 120 stores various programs (an example of an information processing program). Furthermore, the storage unit 120 stores information regarding the machine learning model M1.

（入力部１３０） (Input unit 130)
入力部１３０は、利用者から各種操作が入力される。例えば、入力部１３０は、タッチパネル機能により表示面（例えば出力部１４０）を介して利用者からの各種操作を受け付けてもよい。また、入力部１３０は、情報処理装置１００に設けられたボタンや、情報処理装置１００に接続されたキーボードやマウスからの各種操作を受け付けてもよい。例えば、入力部１３０は、利用者から画面に表示された利用者の特徴情報を加工する操作を受け付けてよい。 The input unit 130 receives various operations from the user. For example, the input unit 130 may receive various operations from the user via a display surface (for example, the output unit 140) using a touch panel function. Further, the input unit 130 may accept various operations from buttons provided on the information processing device 100 or from a keyboard or mouse connected to the information processing device 100. For example, the input unit 130 may receive from the user an operation for processing the user's feature information displayed on the screen.

（出力部１４０） (Output unit 140)
出力部１４０は、例えば、液晶ディスプレイや有機ＥＬ（Electro-Luminescence）ディスプレイ等によって実現される表示画面であり、各種情報を表示するための表示装置である。出力部１４０は、制御部１５０の制御に従って、各種情報を表示する。例えば、出力部１４０は、提供部１５３の制御に従って、潜在空間にマッピングされた特徴情報の画像を表示してよい。なお、情報処理装置１００にタッチパネルが採用される場合には、入力部１３０と出力部１４０とは一体化される。また、以下の説明では、出力部１４０を画面と記載する場合がある。 The output unit 140 is a display screen realized by, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display, and is a display device for displaying various information. The output unit 140 displays various information under the control of the control unit 150. For example, the output unit 140 may display an image of the feature information mapped to the latent space under the control of the providing unit 153. Note that when a touch panel is employed in the information processing device 100, the input unit 130 and the output unit 140 are integrated. Furthermore, in the following description, the output unit 140 may be referred to as a screen.

（制御部１５０） (Control unit 150)
制御部１５０は、コントローラであり、例えば、ＣＰＵやＭＰＵ等によって、情報処理装置１００内部の記憶装置に記憶されている各種プログラム（情報処理プログラムの一例に相当）がＲＡＭを作業領域として実行されることにより実現される。また、制御部１５０は、コントローラであり、例えば、ＡＳＩＣやＦＰＧＡ等の集積回路により実現される。 The control unit 150 is a controller and is realized by, for example, a CPU or an MPU executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the information processing device 100, using a RAM as a work area. Further, the control unit 150 is a controller and is realized by, for example, an integrated circuit such as an ASIC or an FPGA.

制御部１５０は、取得部１５１と、生成部１５２と、提供部１５３と、受付部１５４を機能部として有し、以下に説明する情報処理の作用を実現または実行してよい。なお、制御部１５０の内部構成は、図５に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。また、各機能部は、制御部１５０の機能を示したものであり、必ずしも物理的に区別されるものでなくともよい。 The control unit 150 has an acquisition unit 151, a generation unit 152, a provision unit 153, and a reception unit 154 as functional units, and may realize or execute the information processing operation described below. Note that the internal configuration of the control unit 150 is not limited to the configuration shown in FIG. 5, and may be any other configuration as long as it performs information processing to be described later. Further, each functional unit indicates a function of the control unit 150, and does not necessarily have to be physically distinct.

（取得部１５１） (Acquisition unit 151)
取得部１５１は、関節角度の変化量の時系列データに基づいて時系列データの特徴を示す特徴情報を生成するエンコーダと、特徴情報に基づいて時系列データを生成するデコーダと、を含む機械学習モデルＭ１を取得する。また、取得部１５１は、特徴情報の確率分布が正規分布に従うように学習された機械学習モデルＭ１を取得する。例えば、取得部１５１は、属性情報を含む時系列データの特徴を示す特徴情報の確率分布が正規分布に従うように学習された機械学習モデルＭ１を取得してよい。具体的には、取得部１５１は、生成装置２０から学習済みの機械学習モデルＭ１に関する情報を取得してよい。 The acquisition unit 151 acquires a machine learning model M1 that includes an encoder, which generates feature information indicating the features of time series data based on time series data of the amount of change in joint angle, and a decoder, which generates time series data based on the feature information. Further, the acquisition unit 151 acquires a machine learning model M1 trained so that the probability distribution of the feature information follows a normal distribution. For example, the acquisition unit 151 may acquire a machine learning model M1 trained so that the probability distribution of feature information indicating the features of time series data including attribute information follows a normal distribution. Specifically, the acquisition unit 151 may acquire information regarding the trained machine learning model M1 from the generation device 20.

（生成部１５２） (Generation unit 152)
生成部１５２は、利用者からダンス映像を受け付ける。具体的には、生成部１５２は、利用者自身のダンス映像（以下、利用者のダンス映像ともいう）を受け付けてよい。例えば、生成部１５２は、入力部１３０を介して、利用者から利用者のダンス映像（以下、第１のダンス映像ともいう）を受け付けてよい。例えば、生成部１５２は、第１のダンス映像として、利用者がジャズダンスを踊っている様子を撮影した映像を受け付けてよい。 The generation unit 152 receives dance videos from users. Specifically, the generation unit 152 may receive the user's own dance video (hereinafter also referred to as the user's dance video). For example, the generation unit 152 may receive a user's dance video (hereinafter also referred to as a first dance video) from the user via the input unit 130. For example, the generation unit 152 may receive a video of a user performing a jazz dance as the first dance video.

続いて、生成部１５２は、第１のダンス映像を受け付けた場合、姿勢推定モデルを用いて、第１のダンス映像に撮像された利用者の各関節点の座標をフレームごとに推定してよい。続いて、生成部１５２は、推定された各関節点の座標から各関節の関節角度のフレームごとの変化量（以下、第１の時系列データともいう）を算出してよい。このようにして、生成部１５２は、姿勢推定モデルを用いて、第１のダンス映像から利用者の各関節点の座標を推定し、推定した関節点の座標に基づいて、第１の時系列データを生成してよい。 Subsequently, when receiving the first dance video, the generation unit 152 may use the posture estimation model to estimate the coordinates of each joint point of the user captured in the first dance video for each frame. . Subsequently, the generation unit 152 may calculate the amount of change in the joint angle of each joint for each frame (hereinafter also referred to as first time series data) from the estimated coordinates of each joint point. In this way, the generation unit 152 estimates the coordinates of each joint point of the user from the first dance video using the posture estimation model, and generates the first time series based on the estimated coordinates of the joint points. May generate data.

続いて、生成部１５２は、第１の時系列データを生成した場合、機械学習モデルＭ１のエンコーダを用いて、第１の時系列データを機械学習モデルの潜在空間に写像してよい。例えば、生成部１５２は、各関節の関節角度の初期角度と各関節の関節角度の変化量の第１の時系列データを入力情報として機械学習モデルＭ１のエンコーダに入力し、第１の特徴情報を生成してよい。続いて、生成部１５２は、生成した第１の特徴情報を機械学習モデルＭ１の潜在空間にマッピングしてよい。図６は、実施形態に係る潜在空間の一例について説明するための図である。図６の左側の図における点Ｐ１は、生成部１５２によって潜在空間にマッピングされた第１の特徴情報の位置を示す。図６の左側の図では、利用者から受け付けた第１のダンス映像がジャズダンスの映像なので、生成部１５２によって第１の特徴情報がジャズダンスのクラスタの位置にマッピングされる様子を示す。 Subsequently, when generating the first time series data, the generation unit 152 may map the first time series data to the latent space of the machine learning model using the encoder of the machine learning model M1. For example, the generation unit 152 inputs first time series data of the initial joint angle of each joint and the amount of change in the joint angle of each joint as input information to the encoder of the machine learning model M1, and generates first feature information. may be generated. Subsequently, the generation unit 152 may map the generated first feature information onto the latent space of the machine learning model M1. FIG. 6 is a diagram for explaining an example of the latent space according to the embodiment. A point P1 in the left diagram of FIG. 6 indicates the position of the first feature information mapped to the latent space by the generation unit 152. The diagram on the left side of FIG. 6 shows how the first feature information is mapped to the position of the jazz dance cluster by the generation unit 152, since the first dance video received from the user is a jazz dance video.

（提供部１５３） (Providing unit 153)
提供部１５３は、機械学習モデルＭ１の潜在空間に関する情報を利用者に対して提供する。例えば、提供部１５３は、学習済みの特徴情報がマッピングされた潜在空間に関する情報を表示するよう出力部１４０を制御してよい。図６の左側の図に示す例では、提供部１５３は、学習済みの特徴情報とともに、生成部１５２によって生成された第１の特徴情報が点Ｐ１の位置にマッピングされた様子を示す潜在空間の画像Ｇ３を表示するよう出力部１４０を制御してよい。 The providing unit 153 provides information regarding the latent space of the machine learning model M1 to the user. For example, the providing unit 153 may control the output unit 140 to display information regarding the latent space to which the learned feature information is mapped. In the example shown in the left diagram of FIG. 6, the providing unit 153 may control the output unit 140 to display an image G3 of the latent space that shows, together with the learned feature information, the first feature information generated by the generation unit 152 mapped to the position of point P1.

（受付部１５４） (Reception unit 154)
受付部１５４は、利用者から潜在空間に対する操作を受け付ける。具体的には、受付部１５４は、利用者から潜在空間における潜在変数を変化させる操作を受け付けてよい。図６の右側の図に示す例では、受付部１５４は、潜在空間における潜在変数を第１の特徴情報を持つ値を示す点Ｐ１の位置から第２の特徴情報を持つ値を示す点Ｐ２の位置に変化させる操作を利用者から受け付けてよい。例えば、受付部１５４は、入力部１３０を介して、提供部１５３によって表示された潜在空間の画像Ｇ３に対する操作を受け付けてよい。受付部１５４は、潜在変数をジャズダンスのクラスタに属する第１の特徴情報を持つ値を示す点Ｐ１の位置からヒップホップダンスのクラスタに属する第２の特徴情報を持つ値を示す点Ｐ２の位置に変化させる操作を利用者から受け付けてよい。また、受付部１５４は、利用者によって変化させられた後の潜在変数の値に対応する第２の特徴情報を受け付けてよい。 The reception unit 154 receives operations on the latent space from the user. Specifically, the reception unit 154 may receive from the user an operation for changing a latent variable in the latent space. In the example shown in the right diagram of FIG. 6, the reception unit 154 may receive from the user an operation that changes the latent variable in the latent space from the position of point P1, which indicates a value having the first feature information, to the position of point P2, which indicates a value having the second feature information. For example, the reception unit 154 may accept, via the input unit 130, an operation on the latent space image G3 displayed by the providing unit 153. The reception unit 154 may receive from the user an operation that changes the latent variable from the position of point P1, which indicates a value having first feature information belonging to the jazz dance cluster, to the position of point P2, which indicates a value having second feature information belonging to the hip-hop dance cluster. Further, the reception unit 154 may receive second feature information corresponding to the value of the latent variable after it has been changed by the user.

Further, the generation unit 152 may generate second time-series data of the amount of change in the joint angle of each joint based on the second feature information received by the reception unit 154. For example, based on that second feature information, the generation unit 152 changes the latent variable in the latent space from a value having the first feature information to a value having the second feature information. For example, the generation unit 152 may change the latent variable from a value having first feature information corresponding to first attribute information to a value having second feature information corresponding to second attribute information. Specifically, for example, the generation unit 152 may change the latent variable from a value having feature information corresponding to a label indicating that the dance type is jazz dance to a value having feature information corresponding to a label indicating that the dance type is hip-hop dance. Subsequently, the generation unit 152 may generate the second time-series data using the decoder of the machine learning model M1, based on the second feature information corresponding to the value of the latent variable after the change.
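The change of the latent variable from P1 to P2 followed by decoding can be illustrated with a minimal Python sketch. The description only states that the latent variable is changed from one value to another and then decoded; the straight-line path, the function names, and the `toy_decoder` stand-in for the decoder of the machine learning model M1 are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate_latent(z_start: Sequence[float], z_end: Sequence[float], steps: int) -> List[Vector]:
    """Return `steps` latent points on the straight line from z_start to z_end (both included)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    path: List[Vector] = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (P1) to 1.0 (P2)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


def decode_path(path: Sequence[Vector], decoder: Callable[[Vector], Vector]) -> List[Vector]:
    """Apply a decoder to every latent point on the path."""
    return [decoder(z) for z in path]


def toy_decoder(z: Vector) -> Vector:
    """Hypothetical stand-in for the trained decoder; not the model M1 itself."""
    return [z[0] + z[1], z[0] - z[1]]
```

Calling `decode_path(interpolate_latent(p1, p2, steps), toy_decoder)` yields one decoded sequence per intermediate latent value; the last element corresponds to the point P2 selected by the user.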

Subsequently, the generation unit 152 may calculate the joint angle of each joint for each frame based on the second time-series data output from the machine learning model M1 and the initial joint angle of each joint. The generation unit 152 may then calculate the coordinates of each joint point for each frame from those per-frame joint angles. In this way, the generation unit 152 may generate a video (hereinafter referred to as a joint video) including the joint movements corresponding to the second time-series data of the amount of change in the joint angle of each joint of the user.
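The two calculations above (per-frame joint angles from angle changes plus initial angles, then joint point coordinates from angles) can be sketched as follows. This is only an illustration under simplifying assumptions that are not in the description: a single planar chain of joints, each angle measured relative to the parent bone, and known bone lengths. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def integrate_angles(initial_angles: Sequence[float],
                     angle_deltas: Sequence[Sequence[float]]) -> List[List[float]]:
    """Accumulate per-frame angle changes onto the initial angles.

    angle_deltas[t][j] is the change of joint j at step t. The result holds
    the initial frame followed by one frame per step.
    """
    frames = [list(initial_angles)]
    for deltas in angle_deltas:
        if len(deltas) != len(initial_angles):
            raise ValueError("each step needs one delta per joint")
        frames.append([a + d for a, d in zip(frames[-1], deltas)])
    return frames


def joint_coordinates(angles: Sequence[float],
                      bone_lengths: Sequence[float],
                      root: Point = (0.0, 0.0)) -> List[Point]:
    """Planar forward kinematics for a chain: root first, then one point per bone."""
    x, y = root
    heading = 0.0
    points = [(x, y)]
    for angle, length in zip(angles, bone_lengths):
        heading += angle  # each angle is relative to the parent bone
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points
```

Drawing the points returned by `joint_coordinates` for every frame of `integrate_angles` gives a stick-figure sequence comparable to the joint video; a full-body skeleton would need a tree of joints rather than a single chain.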

Subsequently, the generation unit 152 may generate a second dance video corresponding to the generated second time-series data. Specifically, the generation unit 152 may generate the second dance video from the second time-series data using an image conversion model. For example, the generation unit 152 may use the image conversion model to convert each frame of the joint video corresponding to the second time-series data into a frame of the second dance video, which shows a dancing person. As shown in FIG. 6, the point P2 indicating the second feature information is mapped to the position of the hip-hop dance cluster in the latent space, so the second dance video corresponding to the second feature information is a hip-hop dance video. In this way, the generation unit 152 may generate a video of the user dancing hip-hop (the second dance video) from a video of the user dancing jazz (the first dance video).

[6. Information processing procedure by the information processing device]
FIG. 7 is a flowchart showing an information processing procedure performed by the information processing device 100 according to the embodiment. As shown in FIG. 7, the acquisition unit 151 acquires a pre-trained machine learning model M1 (step S101). The generation unit 152 uses the machine learning model M1 acquired by the acquisition unit 151 to map the first feature information corresponding to the user's dance video onto the latent space (step S102). The providing unit 153 provides the user with information on the latent space onto which the first feature information has been mapped (step S103). The reception unit 154 receives from the user an operation that changes a latent variable in the latent space (step S104). The generation unit 152 generates a new dance video based on the second feature information corresponding to the changed latent variable value received by the reception unit 154 (step S105). The providing unit 153 provides the new dance video generated by the generation unit 152 to the user (step S106).
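The order of steps S101 to S106 can be summarized in a short Python sketch. Every callable passed in is a hypothetical stand-in for the corresponding unit, and the sketch assumes the first time-series data has already been extracted from the user's dance video; it shows only the control flow, not an actual implementation of the device.

```python
from typing import Callable, List, Sequence, Tuple

Latent = List[float]
Series = List[float]
Encoder = Callable[[Sequence[float]], Latent]
Decoder = Callable[[Latent], Series]


def run_procedure(
    acquire_model: Callable[[], Tuple[Encoder, Decoder]],
    first_series: Sequence[float],
    show_latent: Callable[[Latent], None],
    receive_operation: Callable[[Latent], Latent],
    render_video: Callable[[Series], str],
    present: Callable[[str], None],
) -> str:
    """Run steps S101 to S106 in order; every callable is a hypothetical stand-in."""
    encoder, decoder = acquire_model()        # S101: acquire the pre-trained model M1
    z_first = encoder(first_series)           # S102: map the first feature information
    show_latent(z_first)                      # S103: provide the latent space to the user
    z_second = receive_operation(z_first)     # S104: receive the changed latent variable
    video = render_video(decoder(z_second))   # S105: generate the new dance video
    present(video)                            # S106: provide the video to the user
    return video
```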

[7. Modifications]
The processing according to the embodiment described above may be implemented in various forms other than the embodiment described above.

[7-1. About the latent space]
In the embodiment described above, the attribute information is the type of dance performed by the person in the dance video, but the attribute information is not limited to the type of dance. For example, the attribute information may be information indicating the dance proficiency of the person in the dance video corresponding to the time-series data, the characteristics of the dance, or the biological information of the person dancing. For example, the attribute information may be a score indicating dance proficiency. Such a score may be obtained by having a professional dancer rate the skill of the dance in a dance video on a five-point scale from "1" to "5" and assigning that value to the dance video (for example, a larger value for more skillful dancing). The attribute information may also be a score indicating a characteristic of the dance. As with the proficiency score, such a score may be obtained by having a professional dancer numerically rate a characteristic of the dance in the dance video (for example, whether the dance is sharp) and assigning the corresponding value to the dance video. The attribute information may also be a numerical value indicating biological information of the person dancing. For example, biological information (for example, muscle mass) of the person dancing is acquired with a biosensor before the dance video is captured (or while it is being captured), and the value acquired from the biosensor is assigned to the dance video.

FIG. 8 is a diagram for explaining an example of a latent space according to a modification. The diagram on the left side of FIG. 8 shows a case where the attribute information associated with the feature information is dance proficiency. For example, the generation unit 152 may change the latent variable in the latent space from a value having first feature information corresponding to first attribute information (for example, a label indicating poor dancing) to a value having second feature information corresponding to second attribute information (for example, a label indicating skillful dancing). The information processing device 100 can thereby generate a dance video in which the user dances more skillfully than in the original dance video, and can provide that video to the user.

The center diagram in FIG. 8 shows a case where the attribute information associated with the feature information is whether or not the dance is sharp. For example, the generation unit 152 may change the latent variable in the latent space from a value having first feature information corresponding to first attribute information (for example, a label indicating that the dance lacks sharpness) to a value having second feature information corresponding to second attribute information (for example, a label indicating that the dance is sharp). The information processing device 100 can thereby generate a dance video in which the user's dance is sharper than in the original dance video, and can provide that video to the user.

The diagram on the right side of FIG. 8 shows a case where the attribute information associated with the feature information is the muscle mass of the person dancing. For example, the generation unit 152 may change the latent variable in the latent space from a value having first feature information corresponding to first attribute information (for example, a label indicating low muscle mass) to a value having second feature information corresponding to second attribute information (for example, a label indicating high muscle mass). The information processing device 100 can thereby generate a dance video in which the user has more muscle mass than in the original dance video, and can provide that video to the user.
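One way to picture the attribute changes in FIG. 8 is to shift the user's latent value by the offset between two attribute clusters. The description only says that the latent variable is changed from a value associated with the first attribute to a value associated with the second attribute; using cluster centroids and a `strength` factor is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(points: Sequence[Sequence[float]]) -> Vector:
    """Mean of a non-empty list of equally sized latent vectors."""
    if not points:
        raise ValueError("a cluster needs at least one point")
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def shift_toward_attribute(z: Sequence[float],
                           source_cluster: Sequence[Sequence[float]],
                           target_cluster: Sequence[Sequence[float]],
                           strength: float = 1.0) -> Vector:
    """Move z along the direction from the source centroid to the target centroid."""
    src = centroid(source_cluster)
    dst = centroid(target_cluster)
    return [zi + strength * (d - s) for zi, s, d in zip(z, src, dst)]
```

With `strength=1.0` the user's point is displaced by the full centroid offset (for example from the "low proficiency" cluster toward the "high proficiency" cluster); smaller values give intermediate results.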

[7-2. About the user's body movements]
In the embodiment described above, the exercise video including the movement of the user's body is a dance video, but the exercise video is not limited to a dance video. For example, the body movements in the exercise video may be, besides dance, movements in rehabilitation, sports (for example, figure skating), or acting.

[8. Effects]
As described above, the information processing device according to the embodiment (the information processing device 100 in the embodiment) includes an acquisition unit (the acquisition unit 151 in the embodiment) and a generation unit (the generation unit 152 in the embodiment). The acquisition unit acquires a machine learning model that includes an encoder, which generates feature information indicating the features of time-series data based on time-series data regarding changes in joint angles, and a decoder, which generates time-series data based on the feature information. The generation unit generates second time-series data regarding changes in the user's joint angles from first time-series data regarding changes in the user's joint angles, using the latent space of the machine learning model.

As a result, by using the latent space of the machine learning model, the information processing device can change the first time-series data corresponding to the user's first body movement (also referred to as the first movement) into second time-series data corresponding to an arbitrary value in the latent space. Here, the second time-series data corresponds to a second body movement of the user (hereinafter, the second movement) that differs from the first body movement. That is, by using the latent space of the machine learning model, the information processing device makes it possible to morph the first time-series data corresponding to the user's first movement into second time-series data corresponding to the user's second movement, which is a processed version of the first movement. For example, by using a latent space classified according to the type of movement, the information processing device makes it possible to morph first time-series data corresponding to a first movement into second time-series data corresponding to a second movement. The information processing device can also generate the first time-series data from a first exercise video including the user's first movement, and can generate a second exercise video including the user's second movement from the second time-series data. That is, based on the user's first exercise video, the information processing device can generate a new exercise video (for example, the second exercise video) according to an attribute of the exercise video desired by the user (for example, the type of movement). The information processing device therefore makes it possible to generate an arbitrary exercise video from an exercise video that includes the movement of the user's body. Because it makes this possible, the information processing device can contribute to achieving Goal 9 of the Sustainable Development Goals (SDGs), "Industry, Innovation and Infrastructure". The information processing device can also provide the user with a new exercise video according to the attribute desired by the user. That is, it can provide new entertainment to the user, and can therefore provide emotional enrichment to the user.

The acquisition unit also acquires a machine learning model trained so that the probability distribution of the feature information follows a normal distribution. The generation unit maps the first time-series data to the latent space, changes a latent variable in the latent space from a value having the first feature information, which corresponds to the mapped first time-series data, to a value having second feature information, and generates the second time-series data based on the second feature information corresponding to the value of the latent variable after the change.

The information processing device can thereby form, in the latent space, clusters of feature information according to the attributes of exercise videos. It can map the user's first exercise video to the position of the cluster of feature information for the first attribute of that video (hereinafter also referred to as the first attribute cluster). It can change the latent variable from a value having first feature information belonging to the first attribute cluster to a value having second feature information belonging to a second attribute cluster. It can also generate a second exercise video based on the second feature information belonging to the second attribute cluster.

The time-series data also includes attribute information corresponding to the time-series data. The acquisition unit acquires a machine learning model trained so that the probability distribution of feature information indicating the features of time-series data including attribute information follows a normal distribution. The generation unit changes the latent variable in the latent space from a value having first feature information corresponding to first attribute information to a value having second feature information corresponding to second attribute information, and generates the second time-series data based on the second feature information corresponding to the value of the latent variable after the change.

The information processing device can thereby change the latent variable from a value having first feature information belonging to the first attribute cluster to a value having second feature information belonging to the second attribute cluster. It can also generate a second exercise video based on the second feature information belonging to the second attribute cluster.

The machine learning model further includes a posture estimation model trained to estimate the posture of a target object from an image including the target object. Using the posture estimation model, the generation unit estimates the coordinates of the user's joint points from the first exercise video, which includes the movement of the user's body, and generates the first time-series data based on the estimated joint point coordinates.

The information processing device can thereby appropriately estimate the user's posture in the first exercise video, and can therefore appropriately generate information indicating the user's body movement.
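How joint point coordinates might be turned into time-series data of joint angle changes can be sketched as below. The description does not specify the angle convention, so this sketch assumes 2D joint points and defines the joint angle as the signed turn between the incoming and outgoing bones; the function names are hypothetical and the posture estimation model itself is outside the sketch.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def joint_angle(parent: Point, joint: Point, child: Point) -> float:
    """Signed angle in radians at `joint` between the bones parent->joint and joint->child."""
    ax, ay = joint[0] - parent[0], joint[1] - parent[1]
    bx, by = child[0] - joint[0], child[1] - joint[1]
    # atan2(cross, dot) gives the signed turn from the first bone to the second
    return math.atan2(ax * by - ay * bx, ax * bx + ay * by)


def angle_changes(angles_per_frame: Sequence[Sequence[float]]) -> List[List[float]]:
    """Frame-to-frame change of every joint angle, wrapped to the range [-pi, pi]."""
    changes = []
    for prev, cur in zip(angles_per_frame, angles_per_frame[1:]):
        row = []
        for a, b in zip(prev, cur):
            d = b - a
            row.append(math.atan2(math.sin(d), math.cos(d)))  # wrap the difference
        changes.append(row)
    return changes
```

Applying `joint_angle` to the estimated joint points of every frame and passing the result to `angle_changes` yields one row of angle changes per pair of consecutive frames.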

Further, based on the generated second time-series data, the generation unit generates a second exercise video including the movement of the user's body corresponding to the second time-series data.

The information processing device thereby makes it possible to generate a second exercise video including an arbitrary body movement obtained by processing the first exercise video, which includes the user's body movement.

The machine learning model further includes an image conversion model trained to generate, from a joint image including the joint points of a subject, a person image of the subject corresponding to those joint points. The generation unit generates the second exercise video from the second time-series data using the image conversion model.

The information processing device thereby makes it possible to generate a second exercise video that includes the user's body movement, with the user's skeletal model fleshed out into a person image.

As described above, the information processing device according to the embodiment (the generation device 20 in the embodiment) includes an acquisition unit (the acquisition unit 231 in the embodiment) and a model generation unit (the model generation unit 232 in the embodiment). The acquisition unit acquires time-series data regarding changes in joint angles. The model generation unit generates a machine learning model that includes an encoder, which generates feature information indicating the features of the time-series data based on the time-series data, and a decoder, which generates time-series data based on the feature information.

As a result, by using the latent space of the machine learning model, the information processing device makes it possible to change the first time-series data corresponding to the user's first body movement (also referred to as the first movement) into second time-series data corresponding to an arbitrary value in the latent space. That is, by using the latent space of the machine learning model, the information processing device makes it possible to morph the first time-series data corresponding to the user's first movement into second time-series data indicating the user's second body movement (also referred to as the second movement), which is a processed version of the first movement. For example, by using a latent space classified according to the type of movement, the information processing device makes it possible to morph first time-series data corresponding to a first movement into second time-series data corresponding to a second movement. The information processing device also makes it possible to generate the first time-series data from a first exercise video including the user's first movement, and to generate a second exercise video including the user's second movement from the second time-series data. That is, based on the user's first exercise video, the information processing device makes it possible to generate a new exercise video (for example, the second exercise video) according to an attribute of the exercise video desired by the user (for example, the type of movement). The information processing device therefore makes it possible to generate an arbitrary exercise video from an exercise video that includes the movement of the user's body. Because it makes this possible, the information processing device can contribute to achieving Goal 9 of the Sustainable Development Goals (SDGs), "Industry, Innovation and Infrastructure". The information processing device can also provide the user with a new exercise video according to the attribute desired by the user. That is, it can provide new entertainment to the user, and can therefore provide emotional enrichment to the user.

The model generation unit also trains the machine learning model so that the degree of similarity between the time-series data input to the encoder and the time-series data output from the decoder exceeds a predetermined threshold.

The information processing device can thereby improve the accuracy of the machine learning model.
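The similarity criterion can be illustrated with a small check. The description does not say how the degree of similarity is measured, so this sketch assumes a similarity of 1 / (1 + mean squared error), which equals 1.0 for identical sequences and falls toward 0 as they diverge; the function names and the threshold value are hypothetical.

```python
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average squared difference between two equally long, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def similarity(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Assumed similarity measure: 1.0 for a perfect reconstruction, smaller otherwise."""
    return 1.0 / (1.0 + mean_squared_error(original, reconstructed))


def reconstruction_ok(original: Sequence[float],
                      reconstructed: Sequence[float],
                      threshold: float) -> bool:
    """True when the similarity strictly exceeds the predetermined threshold."""
    return similarity(original, reconstructed) > threshold
```

In a training loop, such a check could serve as a stopping condition evaluated on the encoder input and the decoder output.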

Further, the model generation unit trains the machine learning model so that the probability distribution of the feature information follows a normal distribution.

The information processing device thereby makes it possible to form, in the latent space, clusters of feature information according to the attributes of exercise videos.
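In a typical VAE, pushing the latent distribution toward a normal distribution is done by adding a Kullback-Leibler divergence term to the training loss. The description does not give the loss, so the following is a sketch of the standard closed-form term for a diagonal Gaussian encoder output against a standard normal prior, assuming the encoder outputs a mean `mu` and a log-variance `log_var` per latent dimension.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same dimension")
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))
```

The term is zero when the encoder output already matches the standard normal distribution and grows as the mean moves away from zero or the variance away from one; it is usually added to a reconstruction loss.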

The acquisition unit also acquires time-series data that includes attribute information corresponding to the time-series data. The model generation unit trains the machine learning model so that the probability distribution of feature information indicating the features of time-series data including attribute information follows a normal distribution.

Furthermore, the model generation unit classifies the feature information into clusters according to the attribute information.

The information processing device can thereby improve usability when providing the user with information regarding the attribute of the exercise video desired by the user (for example, the type of movement).

The attribute information is information indicating the type of body movement of the subject in the exercise video corresponding to the time-series data, the subject's proficiency in the body movement, the characteristics of the subject's body movement, or the subject's biological information.

The information processing device thereby makes it possible to generate a new exercise video according to the type of body movement, the proficiency in the body movement, the characteristics of the body movement, or the biological information desired by the user.

[9. Hardware configuration]
Information devices such as the generation device 20 and the information processing device 100 according to the embodiment described above are realized by, for example, a computer 1000 configured as shown in FIG. 9. The information processing device 100 is described below as an example. FIG. 9 is a hardware configuration diagram showing an example of a computer that implements the functions of the information processing device 100. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400, and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via a predetermined communication network and sends it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the predetermined communication network.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600, and outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information processing device 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 150 by executing a program loaded onto the RAM 1200. The CPU 1100 reads these programs from the recording medium 1800 and executes them, but as another example, the programs may be acquired from another device via a predetermined communication network.

Some embodiments of the present application have been described above in detail with reference to the drawings, but these are merely examples. The present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.

[10. Others]
Among the processes described in the above embodiments and modifications, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified. For example, the various information shown in each figure is not limited to the illustrated information.

Furthermore, each component of each illustrated device is a functional concept and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of the devices is not limited to what is illustrated, and all or part of them can be functionally or physically distributed or integrated in arbitrary units depending on various loads and usage conditions. For example, the embodiment described above deals with the case where the generation device 20 and the information processing device 100 are separate devices, but the generation device 20 and the information processing device 100 may be a single integrated device. In that case, the information processing device 100 may have the functions of the generation device 20.

Furthermore, the embodiments and modifications described above can be combined as appropriate to the extent that the processing contents do not contradict each other.

1 Information processing system
20 Generation device
21 Communication unit
22 Storage unit
23 Control unit
231 Acquisition unit
232 Model generation unit
233 Distribution unit
100 Information processing device
110 Communication unit
120 Storage unit
130 Input unit
140 Output unit
150 Control unit
151 Acquisition unit
152 Generation unit
153 Provision unit
154 Reception unit

Claims

An information processing device comprising:
an acquisition unit that acquires a machine learning model including an encoder that generates, based on time-series data regarding changes in joint angles, feature information indicating features of the time-series data, and a decoder that generates the time-series data based on the feature information; and
a generation unit that uses a latent space of the machine learning model to generate second time-series data regarding changes in a user's joint angles from first time-series data regarding changes in the user's joint angles.

The information processing device according to claim 1, wherein
the acquisition unit acquires the machine learning model trained such that the probability distribution of the feature information follows a normal distribution, and
the generation unit maps the first time-series data onto the latent space, changes a latent variable in the latent space from a value having first feature information corresponding to the first time-series data mapped onto the latent space to a second value, and generates the second time-series data based on second feature information corresponding to the changed value of the latent variable.
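The latent-space operation recited in this claim (map the first time-series data into the latent space, change the latent variable, decode the changed value) can be sketched roughly as follows. This is only an illustrative sketch: `shift_latent`, `generate_second_series`, `toy_encode`, and `toy_decode` are hypothetical names, the linear interpolation is one assumed way of changing the latent variable, and the toy encoder and decoder merely stand in for the trained model.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def shift_latent(z_first: Sequence[float], z_target: Sequence[float], alpha: float) -> Vector:
    """Move a latent variable from z_first toward z_target by a fraction alpha (0 to 1)."""
    if len(z_first) != len(z_target):
        raise ValueError("latent vectors must have the same dimension")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_first, z_target)]


def generate_second_series(
    first_series: Sequence[float],
    encode: Callable[[Sequence[float]], Vector],
    decode: Callable[[Sequence[float]], Vector],
    z_target: Sequence[float],
    alpha: float,
) -> Vector:
    """Encode the first series, change the latent variable, and decode the changed value."""
    z_first = encode(first_series)
    z_second = shift_latent(z_first, z_target, alpha)
    return decode(z_second)


# Hypothetical stand-ins for a trained encoder and decoder (assumption, not from the source).
def toy_encode(series: Sequence[float]) -> Vector:
    mean = sum(series) / len(series)
    amplitude = (max(series) - min(series)) / 2.0
    return [mean, amplitude]


def toy_decode(z: Sequence[float], length: int = 8) -> Vector:
    mean, amplitude = z
    return [mean + amplitude * math.sin(2.0 * math.pi * t / length) for t in range(length)]


# Knee-angle series in degrees with a small range of motion (illustrative values).
first_series = toy_decode([90.0, 10.0])
# Latent value assumed to correspond to a larger range of motion.
target_latent = [90.0, 30.0]
second_series = generate_second_series(first_series, toy_encode, toy_decode, target_latent, alpha=0.5)
```

With `alpha=0.5` the changed latent value lies halfway between the user's own latent value and the target, so the decoded series keeps the user's mean angle while widening the range of motion.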

The information processing device according to claim 2, wherein
the time-series data includes attribute information corresponding to the time-series data,
the acquisition unit acquires the machine learning model trained such that the probability distribution of the feature information indicating features of the time-series data including the attribute information follows a normal distribution, and
the generation unit changes the latent variable in the latent space from a value having the first feature information corresponding to first attribute information to a value having second feature information corresponding to second attribute information, and generates the second time-series data based on the second feature information corresponding to the changed value of the latent variable.

The information processing device according to claim 1, wherein
the machine learning model further includes a pose estimation model trained to estimate a pose of a target object from an image including the target object, and
the generation unit uses the pose estimation model to estimate coordinates of joint points of the user from a first exercise video including movement of the user's body, and generates the first time-series data based on the estimated coordinates of the joint points.
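One plausible way to turn estimated joint-point coordinates into time-series data of joint angles is to compute, for each frame, the angle formed at a joint by its two neighboring joint points. The claim does not specify the computation, so the sketch below is an assumption: `joint_angle` and `angle_series` are hypothetical helpers, and the 2D hip, knee, and ankle coordinates are made-up values standing in for pose estimation output.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Return the angle in degrees at joint b formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("joint points must not coincide")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def angle_series(frames: Sequence[Tuple[Point, Point, Point]]) -> List[float]:
    """Convert per-frame (hip, knee, ankle) coordinates into a knee-angle time series."""
    return [joint_angle(hip, knee, ankle) for hip, knee, ankle in frames]


# Illustrative per-frame coordinates (assumed values, not taken from the source).
frames = [
    ((0.0, 2.0), (0.0, 1.0), (0.0, 0.0)),  # leg fully extended
    ((0.0, 1.0), (0.0, 0.0), (1.0, 0.0)),  # knee bent at a right angle
]
knee_angles = angle_series(frames)
```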

The information processing device according to claim 1, wherein
the generation unit generates, based on the generated second time-series data, a second exercise video including movement of the user's body corresponding to the second time-series data.

The information processing device according to claim 5, wherein
the machine learning model further includes an image conversion model trained to generate, from a joint image including joint points of a subject, a person image of the subject corresponding to the joint points, and
the generation unit generates the second exercise video from the second time-series data using the image conversion model.

An information processing device comprising:
an acquisition unit that acquires time-series data regarding changes in joint angles; and
a model generation unit that generates a machine learning model including an encoder that generates, based on the time-series data, feature information indicating features of the time-series data, and a decoder that generates the time-series data based on the feature information.

The information processing device according to claim 7, wherein
the model generation unit trains the machine learning model so that the degree of similarity between the time-series data input to the encoder and the time-series data output from the decoder exceeds a predetermined threshold.

The information processing device according to claim 7, wherein
the model generation unit trains the machine learning model so that the probability distribution of the feature information follows a normal distribution.
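The two training conditions recited above (decoder output similar to encoder input, and feature information following a normal distribution) correspond to the two terms commonly used in a VAE objective. The sketch below assumes mean squared error as the dissimilarity measure and the KL divergence to a standard normal as the distribution term; neither choice is stated in the claims, and `reconstruction_error`, `kl_to_standard_normal`, `vae_loss`, and `beta` are hypothetical names.

```python
import math
from typing import Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between the input series and the decoded series."""
    if len(x) != len(x_hat):
        raise ValueError("series must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL divergence from N(mu, exp(log_var)) to N(0, 1), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    mu: Sequence[float],
    log_var: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Reconstruction term plus a weighted term pulling the latent distribution toward N(0, 1)."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


# Illustrative joint-angle values in degrees and latent statistics (assumed, not from the source).
loss = vae_loss([90.0, 100.0], [90.0, 98.0], mu=[1.0], log_var=[0.0])
```

Minimizing the first term raises the similarity between input and output, while minimizing the second keeps the latent space close to a normal distribution so that nearby latent values decode to similar movements.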

The information processing device according to claim 9, wherein
the acquisition unit acquires the time-series data including attribute information corresponding to the time-series data, and
the model generation unit trains the machine learning model so that the probability distribution of the feature information indicating features of the time-series data including the attribute information follows a normal distribution.

The information processing device according to claim 10, wherein
the model generation unit classifies the feature information into clusters according to the attribute information.
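As a rough illustration of classifying feature information into clusters according to attribute information, the sketch below simply groups latent vectors by their attribute label and computes each group's centroid. The claim does not specify a clustering method, so this grouping is an assumption; `cluster_by_attribute`, `centroid`, and the labels "beginner" and "expert" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def cluster_by_attribute(
    latents: Sequence[Sequence[float]], attributes: Sequence[str]
) -> Dict[str, List[Vector]]:
    """Group latent vectors (feature information) by their attribute label."""
    if len(latents) != len(attributes):
        raise ValueError("each latent vector needs exactly one attribute label")
    clusters: Dict[str, List[Vector]] = defaultdict(list)
    for z, attr in zip(latents, attributes):
        clusters[attr].append(list(z))
    return dict(clusters)


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Illustrative latent vectors labeled by proficiency level (assumed values).
latents = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
attributes = ["beginner", "beginner", "expert"]
clusters = cluster_by_attribute(latents, attributes)
beginner_center = centroid(clusters["beginner"])
expert_center = centroid(clusters["expert"])
```

A cluster centroid such as `expert_center` could then serve as the target value when changing a latent variable from one attribute toward another.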

The information processing device according to claim 3 or 10, wherein
the attribute information is information indicating a type of body movement of a subject included in an exercise video corresponding to the time-series data, a proficiency level of the subject's body movement, a characteristic of the subject's body movement, or biological information of the subject.

An information processing method realized by a program executed by an information processing device, the method comprising:
an acquisition step of acquiring a machine learning model including an encoder that generates, based on time-series data regarding changes in joint angles, feature information indicating features of the time-series data, and a decoder that generates the time-series data based on the feature information; and
a generation step of generating second time-series data regarding changes in a user's joint angles from first time-series data regarding changes in the user's joint angles using a latent space of the machine learning model.

An information processing method realized by a program executed by an information processing device, the method comprising:
an acquisition step of acquiring time-series data regarding changes in joint angles; and
a model generation step of generating a machine learning model including an encoder that generates, based on the time-series data, feature information indicating features of the time-series data, and a decoder that generates the time-series data based on the feature information.

An information processing program that causes a computer to execute:
an acquisition procedure of acquiring a machine learning model including an encoder that generates, based on time-series data regarding changes in joint angles, feature information indicating features of the time-series data, and a decoder that generates the time-series data based on the feature information; and
a generation procedure of generating second time-series data regarding changes in a user's joint angles from first time-series data regarding changes in the user's joint angles using a latent space of the machine learning model.

An information processing program that causes a computer to execute:
an acquisition procedure of acquiring time-series data regarding changes in joint angles; and
a model generation procedure of generating a machine learning model including an encoder that generates, based on the time-series data, feature information indicating features of the time-series data, and a decoder that generates the time-series data based on the feature information.