WO2024023902A1 - Information processing device, motion transfer method, and program - Google Patents

Information processing device, motion transfer method, and program Download PDF

Info

Publication number
WO2024023902A1
WO2024023902A1 (PCT/JP2022/028671)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
feature data
video
data
gesture
Prior art date
Application number
PCT/JP2022/028671
Other languages
French (fr)
Japanese (ja)
Inventor
雄貴 蔵内
俊一 瀬古
隆二 山本
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority to PCT/JP2022/028671 priority Critical patent/WO2024023902A1/en
Publication of WO2024023902A1 publication Critical patent/WO2024023902A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/80 2D [Two Dimensional] animation, e.g. using sprites

Definitions

  • the present invention relates to an information processing device, a gesture transcription method, and a program.
  • Non-Patent Document 1 discloses a technique for extracting data indicating a specific gesture from video data of a person and transferring it to video data of another person in real time.
  • the disclosed technology aims to reduce the amount of video data required to transcribe gestures.
  • the disclosed technology is an information processing device including: a feature extraction unit configured to extract a plurality of feature data, each indicating a specific gesture, from gesture video data representing a video that includes gestures; a feature synthesis unit configured to synthesize the plurality of feature data; a control unit configured to receive a transfer request and select feature data corresponding to the transfer request from the synthesized feature data; and a feature transfer unit configured to transfer the selected feature data to input video data to generate output video data.
  • the amount of video data required to transcribe gestures can be kept small.
  • FIG. 1 is a diagram illustrating an example of the functional configuration of an information processing device according to Example 1 of an embodiment of the present invention.
  • FIG. 2 is a flowchart illustrating an example of the flow of feature transfer processing according to Example 1 of the embodiment of the present invention.
  • FIG. 3 is a diagram for explaining an overview of feature transfer processing according to Example 1 of the embodiment of the present invention.
  • FIG. 4 is a diagram for explaining a method for synthesizing feature data according to Example 1 of the embodiment of the present invention.
  • FIG. 5 is a diagram illustrating an example of the functional configuration of an information processing device according to Example 2 of the embodiment of the present invention.
  • FIG. 6 is a flowchart illustrating an example of the flow of feature transfer processing according to Example 2 of the embodiment of the present invention.
  • FIG. 7 is a diagram showing an example of the hardware configuration of a computer.
  • Example 1 and Example 2 will be described as specific examples of this embodiment.
  • Example 1 In this example, a plurality of feature data extracted from data representing a gesture video are combined, and the video is processed based on the combined feature data so that various gestures are reflected on the person appearing in the video.
  • FIG. 1 is a diagram illustrating an example of the functional configuration of an information processing apparatus according to Example 1 of the embodiment of the present invention.
  • the information processing device 10 includes a gesture video storage section 11, a feature extraction section 12, a feature synthesis section 13, a control section 14, an input video storage section 15, a feature transfer section 16, and an output video storage section 17.
  • the gesture video storage unit 11 stores data indicating a gesture video.
  • a gesture video is a pre-recorded video of a person's gestures.
  • Gestures are actions that convey emotions, intentions, and the like, such as facial expressions, blinking, nodding, posture, back-channel responses, and gaze.
  • the feature extraction unit 12 extracts a plurality of feature data from the data indicating the gesture video according to the content of the specific gesture.
  • the data to be extracted is extracted for each gesture content, such as feature data of "smile", feature data of "nod", and so on.
  • the feature synthesis unit 13 synthesizes the extracted plurality of feature data. For example, the feature synthesis unit 13 synthesizes the feature data of "smile" and the feature data of "nod" to generate feature data of "smile and nod", which is a combination of "smile" and "nod".
  • the feature data may be, for example, vector data indicating features. Therefore, the feature synthesis unit 13 may synthesize a plurality of feature data by vector synthesis.
  • the control unit 14 receives the transfer request and selects feature data corresponding to the transfer request from the synthesized feature data.
  • the transfer request is a request for transfer in which a specific gesture is designated by a user's operation or the like.
  • the control unit 14 may select feature data from either the combined feature data or the non-combined feature data. For example, the control unit 14 may select the feature data from any of the feature data of "smile", the feature data of "nod", and the feature data of "smile and nod".
  • the input video storage unit 15 stores data indicating input video.
  • the input video is a video of the user photographed by a photographing device such as a web camera.
  • the feature transfer unit 16 transfers the feature data output by the control unit 14 to the input video.
  • the feature transfer unit 16 transfers feature data of "smiling and nodding" to an input video of an expressionless user, thereby converting it into video data representing a smiling and nodding user, and outputs the video data.
  • the output video storage unit 17 stores the video data output by the feature transfer unit 16.
  • the information processing device 10 executes feature transfer processing in response to a user's operation or the like.
  • FIG. 2 is a flowchart showing an example of the flow of feature transfer processing according to Example 1 of the embodiment of the present invention.
  • the feature extraction unit 12 extracts a plurality of feature data from the gesture video (step S11).
  • the feature synthesis unit 13 synthesizes the plurality of extracted feature data (step S12).
  • the control unit 14 receives a transfer request through a user's operation or the like, it selects feature data corresponding to the transfer request from the synthesized feature data (step S13).
  • the feature transfer unit 16 transfers the feature data to the input video to generate an output video (step S14).
  • the generated output video is stored in the output video storage section 17.
  • the information processing device 10 outputs the generated output video (step S15).
  • FIG. 3 is a diagram for explaining an overview of feature transfer processing according to Example 1 of the embodiment of the present invention.
  • the feature data 101 is an example of feature data of "nod".
  • the feature data 101 is, for example, a feature vector characterized by conversion from a normal video 101a to a "nodding" video 101b.
  • the feature data 102 is an example of "smile" feature data.
  • the feature data 102 is, for example, a feature vector characterized by conversion from a normal video 102a to a "smile" video 102b.
  • the feature data 103 is an example of "smile and nod” feature data that is a combination of "smile” feature data and "nod” feature data.
  • the feature data 103 is, for example, a feature vector characterized by conversion from a normal video 103a to a "smile and nod" video 103b.
  • the normal video 101a, the normal video 102a, and the normal video 103a may be the same video or different videos.
  • the video 104 is an example of an input video.
  • Video 105 is an example of an output video.
  • a video 105 is generated that includes an image in which the person in the video 104 is smiling and nodding.
  • the person appearing in the input video and the person appearing in the gesture video may be the same person or different people. What appears in the input video or the gesture video may or may not be a person, and may be, for example, an animal other than a person, such as a dog or a cat.
  • FIG. 4 is a diagram for explaining a method for synthesizing feature data according to Example 1 of the embodiment of the present invention.
  • the input and output video data and the transferred feature data are each expressed as vector data (a video vector and a feature vector) through processing such as edge extraction applied to the video.
  • an input image 202a in which a person A is captured is characterized by an image vector 301a starting from the origin 201.
  • the input image 202b in which the person B is captured is characterized by an image vector 301b starting from the origin 201.
  • the feature vector 302a and the feature vector 302b may be the same vector.
  • the feature vector 303a and the feature vector 303b may be the same vector.
  • in step S12 of the feature transfer process described above, the feature synthesis unit 13 synthesizes, for example, the feature vector 302a and the feature vector 303a. The feature transfer unit 16 then transfers the combined feature vector to the input video 202a to generate the video 204a, and transfers it to the input video 202b to generate the video 204b.
  • by combining a plurality of feature data extracted from data representing a gesture video and processing the video based on the combined feature data, various gestures can be reflected on the person or other subject appearing in the video. Therefore, when multiple elements such as smiling and nodding are to be combined, videos corresponding to every combination of elements are not required, so the amount of video data required to transcribe gestures can be kept small.
  • Example 2 will be described below with reference to the drawings.
  • the second example differs from the first example in that an emotion is estimated based on the input video. Therefore, the following explanation of the second example focuses on the differences from the first example; parts having the same functional configuration as in the first example are given the same reference numerals used in the explanation of the first example, and their explanation is omitted.
  • This example addresses the following problem: when feature data extracted from a gesture video are transferred to an input video, the facial expression in the source gesture video (for example, the normal video 101a or the normal video 102a shown in FIG. 3) must match the facial expression in the input video. For example, if the source of the gesture video has a neutral expression and the destination has a smiling expression, it is sufficient for the input video to be expressionless; however, if the source of the gesture video has an angry expression and the destination has a smiling expression, the conversion may fail when the input video is expressionless.
  • emotions are estimated based on the input video, and feature data corresponding to the estimated emotions are synthesized.
  • FIG. 5 is a diagram illustrating an example of a functional configuration of an information processing device according to Example 2 of the embodiment of the present invention.
  • the information processing device 10 according to the present embodiment has a configuration in which an emotion estimation unit 18 is added to the information processing device 10 according to the first embodiment.
  • the emotion estimation unit 18 estimates emotions based on the input video. For example, the emotion estimation unit 18 may estimate what kind of emotion the person is feeling based on the facial expression of the person in the input video. For example, the emotion of joy is estimated based on an image of a person with a smiling face.
  • the feature synthesis unit 13 synthesizes feature data corresponding to the estimated emotion from among the plurality of extracted feature data.
  • FIG. 6 is a flowchart showing an example of the flow of feature transfer processing according to Example 2 of the embodiment of the present invention.
  • the feature extraction unit 12 extracts a plurality of feature data from the gesture video (step S21).
  • the emotion estimation unit 18 estimates emotions based on the input video (step S22).
  • the feature synthesis unit 13 synthesizes feature data corresponding to the estimated emotion from among the plurality of extracted feature data (step S23).
  • the control unit 14 receives a transfer request through a user's operation or the like, it selects feature data corresponding to the transfer request from the synthesized feature data (step S24).
  • the feature transfer unit 16 transfers the selected feature data to the input video to generate an output video (step S25).
  • the generated output video is stored in the output video storage section 17.
  • the information processing device 10 outputs the generated output video (step S26).
  • an emotion is estimated based on the input video, and feature data corresponding to the estimated emotion are synthesized.
  • This makes it possible to synthesize and use feature data suitable for the input video.
  • facial expressions can be appropriately converted using feature data based on a gesture video with the same facial expression as the facial expression of a person in the input video.
  • the information processing device 10 according to this embodiment is realized, for example, by the hardware configuration of a computer 500 shown in FIG. 7.
  • FIG. 7 is a diagram showing an example of the hardware configuration of the computer.
  • the computer in FIG. 7 includes a drive device 1000, an auxiliary storage device 1002, a memory device 1003, a CPU 1004, an interface device 1005, a display device 1006, an input device 1007, an output device 1008, and the like, which are interconnected via a bus B.
  • a program that realizes processing on the computer is provided, for example, on a recording medium 1001 such as a CD-ROM or a memory card.
  • a recording medium 1001 such as a CD-ROM or a memory card.
  • the program is installed from the recording medium 1001 to the auxiliary storage device 1002 via the drive device 1000.
  • the program does not necessarily need to be installed from the recording medium 1001, and may be downloaded from another computer via a network.
  • the auxiliary storage device 1002 stores installed programs as well as necessary files, data, and the like.
  • the memory device 1003 reads and stores the program from the auxiliary storage device 1002 when there is an instruction to start the program.
  • the CPU 1004 implements functions related to the device according to programs stored in the memory device 1003.
  • the interface device 1005 is used as an interface for connecting to a network.
  • a display device 1006 displays a GUI (Graphical User Interface) and the like based on a program.
  • the input device 1007 is composed of a keyboard, a mouse, buttons, a touch panel, or the like, and is used to input various operation instructions.
  • An output device 1008 outputs the calculation result.
  • the computer may include a GPU (Graphics Processing Unit) or a TPU (Tensor Processing Unit) instead of the CPU 1004, or may include a GPU or a TPU in addition to the CPU 1004. In that case, the processing may be divided and executed such that the GPU or TPU executes processing that requires special calculations, and the CPU 1004 executes other processing.
  • the information processing device 10 is realized by reading a program for causing the computer 500 to execute each of the above-described processes, and executing the processes specified in the program.
  • the program may be recorded on the recording medium 503a or the like, or may be provided through a network.
  • This specification describes at least the information processing device, gesture transcription method, and program described in the following sections.
  • (Section 1) a feature extraction unit configured to extract a plurality of feature data indicating a specific gesture from gesture video data indicating an image including the gesture; a feature synthesis unit configured to synthesize the plurality of feature data; a control unit configured to receive a transfer request and select feature data corresponding to the transfer request from the synthesized feature data; a feature transfer unit configured to transfer the selected feature data to input video data to generate output video data; Information processing device.
  • the feature synthesis unit is configured to synthesize a plurality of vector data indicating the plurality of feature data by vector synthesis.
  • the information processing device according to item 1.
  • (Section 3) further comprising an emotion estimation unit configured to estimate an emotion based on the input video data,
  • the feature synthesis unit is configured to synthesize feature data corresponding to the estimated emotion from among the plurality of extracted feature data.
  • the information processing device according to item 1 or 2.
  • (Section 4) A gesture transcription method performed by a computer, the method comprising: extracting a plurality of feature data indicating a specific gesture from gesture video data indicating an image including the gesture; a step of synthesizing the plurality of feature data; receiving a transcription request and selecting feature data corresponding to the transcription request from the synthesized feature data; transcribing the selected feature data to input video data to generate output video data; Gesture transcription method.
  • (Section 5) A program for causing a computer to function as each part of the information processing apparatus according to any one of items 1 to 3.
  • 10 Information processing device 11 Gesture video storage unit 12 Feature extraction unit 13 Feature synthesis unit 14 Control unit 15 Input video storage unit 16 Feature transfer unit 17 Output video storage unit 18 Emotion estimation unit 1000 Drive device 1001 Recording medium 1002 Auxiliary storage device 1003 Memory device 1004 CPU 1005 Interface device 1006 Display device 1007 Input device 1008 Output device

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An information processing device comprising: a feature extraction unit configured to extract, from motion video data representing a video that includes motions, a plurality of feature data each indicating a specific motion; a feature synthesis unit configured to synthesize the plurality of feature data; a control unit configured to select, upon receiving a transfer request, feature data corresponding to the transfer request from the synthesized feature data; and a feature transfer unit configured to transfer the selected feature data to input video data to generate output video data.

Description

Information processing device, gesture transcription method, and program
The present invention relates to an information processing device, a gesture transcription method, and a program.
There is a known technology that converts video data of a person into video data in which the person makes a specific gesture such as nodding or smiling. For example, Non-Patent Document 1 discloses a technique for extracting data indicating a specific gesture from video data of one person and transferring it in real time to video data of another person.
In a video conference or the like, adding gestures such as facial expressions, blinking, nodding, posture, back-channel responses, and gaze to the video of a participant can help build smooth interpersonal relationships and facilitate the progress of the meeting. With the conventional technology, however, video data showing the gestures of the source person is simply transferred to the target video data in chronological order; when multiple elements such as smiling and nodding are to be combined, videos corresponding to the number of combinations of elements are required, so the amount of required video data becomes large.
The disclosed technology aims to keep small the amount of video data required to transcribe gestures.
The disclosed technology is an information processing device including: a feature extraction unit configured to extract a plurality of feature data, each indicating a specific gesture, from gesture video data representing a video that includes gestures; a feature synthesis unit configured to synthesize the plurality of feature data; a control unit configured to receive a transfer request and select feature data corresponding to the transfer request from the synthesized feature data; and a feature transfer unit configured to transfer the selected feature data to input video data to generate output video data.
The amount of video data required to transcribe gestures can be kept small.
FIG. 1 is a diagram illustrating an example of the functional configuration of an information processing device according to Example 1 of an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an example of the flow of feature transfer processing according to Example 1 of the embodiment of the present invention.
FIG. 3 is a diagram for explaining an overview of feature transfer processing according to Example 1 of the embodiment of the present invention.
FIG. 4 is a diagram for explaining a method for synthesizing feature data according to Example 1 of the embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of the functional configuration of an information processing device according to Example 2 of the embodiment of the present invention.
FIG. 6 is a flowchart illustrating an example of the flow of feature transfer processing according to Example 2 of the embodiment of the present invention.
FIG. 7 is a diagram showing an example of the hardware configuration of a computer.
Hereinafter, an embodiment of the present invention (the present embodiment) will be described with reference to the drawings. The embodiment described below is merely an example, and embodiments to which the present invention is applied are not limited to the following embodiment.
Hereinafter, Example 1 and Example 2 will be described as specific examples of the present embodiment.
(Example 1)
In this example, a plurality of feature data extracted from data representing a gesture video are combined, and the video is processed based on the combined feature data so that various gestures are reflected on the person or other subject appearing in the video.
FIG. 1 is a diagram illustrating an example of the functional configuration of an information processing device according to Example 1 of the embodiment of the present invention. The information processing device 10 according to this example includes a gesture video storage section 11, a feature extraction section 12, a feature synthesis section 13, a control section 14, an input video storage section 15, a feature transfer section 16, and an output video storage section 17.
The gesture video storage unit 11 stores data representing a gesture video. A gesture video is a pre-recorded video of a person's gestures. Gestures are actions that convey emotions, intentions, and the like, such as facial expressions, blinking, nodding, posture, back-channel responses, and gaze.
The feature extraction unit 12 extracts a plurality of feature data from the data representing the gesture video according to the content of specific gestures. The data are extracted for each gesture content, such as feature data of "smile" and feature data of "nod".
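As a rough illustration only: the publication does not specify how feature data are computed, but FIG. 3 characterizes each feature as the conversion from a normal video to a gesture video, which suggests a difference between embeddings. The `encode()` helper below is a hypothetical stand-in, not part of the disclosure.

```python
import numpy as np

def encode(clip: np.ndarray) -> np.ndarray:
    """Hypothetical embedding of a video clip (T, H, W, C) into a fixed-length vector.
    A real system would use a learned encoder; this stand-in just averages frames."""
    return clip.reshape(clip.shape[0], -1).mean(axis=0)

def extract_feature(neutral_clip: np.ndarray, gesture_clip: np.ndarray) -> np.ndarray:
    """Feature data for one gesture: the vector that converts the normal (neutral)
    video into the gesture video (cf. feature data 101 and 102 in FIG. 3)."""
    return encode(gesture_clip) - encode(neutral_clip)
```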
The feature synthesis unit 13 synthesizes the plurality of extracted feature data. For example, the feature synthesis unit 13 synthesizes the feature data of "smile" and the feature data of "nod" to generate feature data of "smile and nod", which is a combination of "smile" and "nod". The feature data may be, for example, vector data indicating features, in which case the feature synthesis unit 13 may synthesize the plurality of feature data by vector synthesis.
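Under the vector-data reading mentioned above, the synthesis step can be pictured as plain vector addition. This is a sketch of one possible interpretation, not the only form of vector synthesis.

```python
import numpy as np

def synthesize(*feature_vectors: np.ndarray) -> np.ndarray:
    """Combine per-gesture feature vectors, e.g. "smile" + "nod" -> "smile and nod".
    Simple summation; weighted combinations would also qualify as vector synthesis."""
    return np.sum(feature_vectors, axis=0)

# e.g. smile_and_nod = synthesize(smile_vec, nod_vec)
```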
Upon receiving a transfer request, the control unit 14 selects feature data corresponding to the transfer request from the synthesized feature data. A transfer request is a request for transfer in which a specific gesture is designated by a user's operation or the like. Note that the control unit 14 may select feature data from either the combined feature data or the non-combined feature data. For example, the control unit 14 may select the feature data from any of the feature data of "smile", the feature data of "nod", and the feature data of "smile and nod".
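One way to picture the selection step, assuming the feature data (combined and non-combined) are kept in a dictionary keyed by gesture label; the labels and storage layout are illustrative, not taken from the publication.

```python
import numpy as np

# Illustrative store; in practice the vectors come from the extraction and synthesis steps.
feature_store = {
    "smile": np.zeros(128),          # non-combined feature data
    "nod": np.zeros(128),            # non-combined feature data
    "smile and nod": np.zeros(128),  # combined feature data
}

def select_feature(transfer_request: str) -> np.ndarray:
    """Control unit 14: pick the feature data matching the requested gesture."""
    return feature_store[transfer_request]
```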
The input video storage unit 15 stores data representing the input video. The input video is a video of the user captured by an image-capturing device such as a web camera.
The feature transfer unit 16 transfers the feature data output by the control unit 14 to the input video. For example, the feature transfer unit 16 transfers the feature data of "smile and nod" to an input video showing an expressionless user, thereby converting it into video data representing a user who smiles and nods, and outputs the video data.
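A sketch of the transfer step under the same vector reading as FIG. 4: the selected feature vector is added to the encoded input video, and a decoder turns the result back into frames. Both `encode` and `decode` are hypothetical hooks; the publication does not commit to a particular generative model.

```python
import numpy as np

def transfer(input_clip: np.ndarray, feature_vec: np.ndarray, encode, decode) -> np.ndarray:
    """Feature transfer unit 16: reflect the selected gesture onto the input video.
    `encode` maps frames to a video vector, `decode` maps a vector back to frames."""
    video_vec = encode(input_clip)           # video vector of the input (cf. 301a in FIG. 4)
    return decode(video_vec + feature_vec)   # e.g. expressionless user -> smiling, nodding user
```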
The output video storage unit 17 stores the video data output by the feature transfer unit 16.
Next, the operation of the information processing device 10 according to this example will be described. The information processing device 10 executes the feature transfer processing in response to a user's operation or the like.
FIG. 2 is a flowchart showing an example of the flow of the feature transfer processing according to Example 1 of the embodiment of the present invention. When the feature transfer processing starts, the feature extraction unit 12 extracts a plurality of feature data from the gesture video (step S11).
Subsequently, the feature synthesis unit 13 synthesizes the plurality of extracted feature data (step S12). When the control unit 14 receives a transfer request through a user's operation or the like, it selects the feature data corresponding to the transfer request from the synthesized feature data (step S13).
Next, the feature transfer unit 16 transfers the feature data to the input video to generate an output video (step S14). The generated output video is stored in the output video storage section 17. The information processing device 10 then outputs the generated output video (step S15).
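Read together, steps S11 to S15 amount to the following orchestration. This is a sketch built on the hypothetical helpers above; the storage sections 11, 15, and 17 are modelled simply as in-memory variables, and the gesture labels are examples.

```python
def feature_transfer_process(gesture_clips, neutral_clip, input_clip,
                             transfer_request, encode, decode):
    # S11: extract feature data per gesture content
    features = {name: encode(clip) - encode(neutral_clip)
                for name, clip in gesture_clips.items()}
    # S12: synthesize, e.g. "smile" + "nod" -> "smile and nod"
    features["smile and nod"] = features["smile"] + features["nod"]
    # S13: select the feature data matching the transfer request
    selected = features[transfer_request]
    # S14: transfer the selected feature data onto the input video
    output_clip = decode(encode(input_clip) + selected)
    # S15: output (here simply returned; the device would store it in section 17)
    return output_clip
```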
FIG. 3 is a diagram for explaining an overview of the feature transfer processing according to Example 1 of the embodiment of the present invention. The feature data 101 is an example of feature data of "nod". The feature data 101 is, for example, a feature vector that characterizes the conversion from a normal video 101a to a "nodding" video 101b.
The feature data 102 is an example of feature data of "smile". The feature data 102 is, for example, a feature vector that characterizes the conversion from a normal video 102a to a "smile" video 102b.
The feature data 103 is an example of feature data of "smile and nod", which is a combination of the feature data of "smile" and the feature data of "nod". The feature data 103 is, for example, a feature vector that characterizes the conversion from a normal video 103a to a "smile and nod" video 103b.
Here, the normal video 101a, the normal video 102a, and the normal video 103a may be the same video or different videos.
The video 104 is an example of the input video, and the video 105 is an example of the output video. When the feature data 103, which has the characteristics of "smile and nod", is transferred to the video 104, a video 105 is generated that shows the person in the video 104 smiling and nodding.
Here, the person appearing in the input video and the person appearing in the gesture video may be the same person or different people. The subject of the input video or the gesture video does not have to be a person; it may be, for example, an animal other than a person, such as a dog or a cat.
FIG. 4 is a diagram for explaining a method for synthesizing feature data according to Example 1 of the embodiment of the present invention. The input and output video data and the transferred feature data are each expressed as vector data (a video vector and a feature vector) through processing such as edge extraction applied to the video. For example, an input video 202a showing a person A is characterized by a video vector 301a starting from the origin 201.
When the feature vector 302a, which has the characteristics of "smile", is reflected in the input video 202a, a video 203a of the person A smiling is generated. When the feature vector 303a, which has the characteristics of "nod", is then reflected in the video 203a, a video 204a of the person A smiling and nodding is generated.
Similarly, an input video 202b showing a person B is characterized by a video vector 301b starting from the origin 201.
When the feature vector 302b, which has the characteristics of "smile", is reflected in the input video 202b, a video 203b of the person B smiling is generated. When the feature vector 303b, which has the characteristics of "nod", is then reflected in the video 203b, a video 204b of the person B smiling and nodding is generated.
Here, the feature vector 302a and the feature vector 302b may be the same vector. Similarly, the feature vector 303a and the feature vector 303b may be the same vector.
In step S12 of the feature transfer processing described above, the feature synthesis unit 13 synthesizes, for example, the feature vector 302a and the feature vector 303a. The feature transfer unit 16 then transfers the combined feature vector to the input video 202a to generate the video 204a, and transfers it to the input video 202b to generate the video 204b.
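The point of FIG. 4, read in vector terms, is that applying the combined vector is equivalent to applying the individual vectors in sequence, and that the same combined vector can be reused for different people. A small numerical check with made-up vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
img_a, img_b = rng.normal(size=128), rng.normal(size=128)  # video vectors 301a, 301b
smile, nod = rng.normal(size=128), rng.normal(size=128)    # feature vectors 302*, 303*

combined = smile + nod  # step S12
# Sequential transfer equals transfer of the combined vector (videos 204a, 204b)
assert np.allclose((img_a + smile) + nod, img_a + combined)
assert np.allclose((img_b + smile) + nod, img_b + combined)
```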
According to this example, by combining a plurality of feature data extracted from data representing a gesture video and processing the video based on the combined feature data, various gestures can be reflected on the person or other subject appearing in the video. Therefore, when multiple elements such as smiling and nodding are to be combined, videos corresponding to every combination of elements are not required, so the amount of video data required to transcribe gestures can be kept small.
(Example 2)
Example 2 will be described below with reference to the drawings. Example 2 differs from Example 1 in that an emotion is estimated based on the input video. The following description of Example 2 therefore focuses on the differences from Example 1; components having the same functional configuration as in Example 1 are given the same reference numerals as in the description of Example 1, and their description is omitted.
This example addresses the following problem. When feature data extracted based on a gesture video are transferred to an input video, the facial expression in the source gesture video (for example, the normal video 101a or the normal video 102a shown in FIG. 3) must match the facial expression in the input video. For example, if the source of the gesture video has a neutral expression and the destination has a smiling expression, it is sufficient for the input video to be expressionless; however, if the source of the gesture video has an angry expression and the destination has a smiling expression, the conversion may fail when the input video is expressionless.
Therefore, in this example, an emotion is estimated based on the input video, and feature data corresponding to the estimated emotion are synthesized.
FIG. 5 is a diagram illustrating an example of the functional configuration of an information processing device according to Example 2 of the embodiment of the present invention. The information processing device 10 according to this example has a configuration in which an emotion estimation unit 18 is added to the information processing device 10 according to Example 1.
The emotion estimation unit 18 estimates an emotion based on the input video. For example, the emotion estimation unit 18 may estimate what emotion a person is feeling from the facial expression of the person in the input video. For example, the emotion of joy is estimated from a video showing a person with a smiling expression.
The feature synthesis unit 13 according to this example synthesizes, from among the plurality of extracted feature data, the feature data corresponding to the estimated emotion.
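A sketch of how Example 2 might condition synthesis on the estimated emotion. The `estimate_emotion()` helper is a stand-in (the publication only says that an emotion is estimated from, for example, the facial expression in the input video), and keying the feature data by the source expression of the gesture video is an illustrative reading.

```python
def estimate_emotion(input_clip) -> str:
    """Hypothetical emotion estimator, e.g. an expression classifier returning
    labels such as "neutral", "joy", or "anger"."""
    return "joy"  # placeholder result

def synthesize_for_emotion(features_by_emotion: dict, input_clip, gestures: list):
    """Steps S22-S23: combine only feature data whose source gesture video matches
    the emotion estimated from the input video."""
    emotion = estimate_emotion(input_clip)      # step S22
    matching = features_by_emotion[emotion]     # e.g. {"smile": vec, "nod": vec, ...}
    return sum(matching[g] for g in gestures)   # step S23, e.g. gestures=["smile", "nod"]
```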
FIG. 6 is a flowchart showing an example of the flow of the feature transfer processing according to Example 2 of the embodiment of the present invention. When the feature transfer processing starts, the feature extraction unit 12 extracts a plurality of feature data from the gesture video (step S21).
Next, the emotion estimation unit 18 estimates an emotion based on the input video (step S22). Subsequently, the feature synthesis unit 13 synthesizes, from among the plurality of extracted feature data, the feature data corresponding to the estimated emotion (step S23). When the control unit 14 receives a transfer request through a user's operation or the like, it selects the feature data corresponding to the transfer request from the synthesized feature data (step S24).
The feature transfer unit 16 then transfers the selected feature data to the input video to generate an output video (step S25). The generated output video is stored in the output video storage section 17. The information processing device 10 then outputs the generated output video (step S26).
According to this example, an emotion is estimated based on the input video, and feature data corresponding to the estimated emotion are synthesized. This makes it possible to synthesize and use feature data suited to the input video. For example, facial expressions can be converted appropriately by using feature data based on a gesture video whose facial expression matches that of the person in the input video.
<Hardware configuration>
Finally, the hardware configuration of the information processing device 10 according to the present embodiment will be described. The information processing device 10 according to the present embodiment is realized, for example, by the hardware configuration of a computer 500 shown in FIG. 7.
FIG. 7 is a diagram showing an example of the hardware configuration of the computer. The computer in FIG. 7 includes a drive device 1000, an auxiliary storage device 1002, a memory device 1003, a CPU 1004, an interface device 1005, a display device 1006, an input device 1007, an output device 1008, and the like, which are interconnected via a bus B.
A program that realizes the processing on the computer is provided, for example, on a recording medium 1001 such as a CD-ROM or a memory card. When the recording medium 1001 storing the program is set in the drive device 1000, the program is installed from the recording medium 1001 into the auxiliary storage device 1002 via the drive device 1000. However, the program does not necessarily need to be installed from the recording medium 1001, and may be downloaded from another computer via a network. The auxiliary storage device 1002 stores the installed program as well as necessary files, data, and the like.
The memory device 1003 reads the program from the auxiliary storage device 1002 and stores it when an instruction to start the program is given. The CPU 1004 implements the functions of the device according to the program stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network. The display device 1006 displays a GUI (Graphical User Interface) and the like based on the program. The input device 1007 is composed of a keyboard and mouse, buttons, a touch panel, or the like, and is used to input various operation instructions. The output device 1008 outputs calculation results. Note that the computer may include a GPU (Graphics Processing Unit) or a TPU (Tensor Processing Unit) instead of the CPU 1004, or may include a GPU or a TPU in addition to the CPU 1004. In that case, the processing may be divided so that the GPU or TPU executes processing that requires special calculations and the CPU 1004 executes the other processing.
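As a rough illustration of the CPU/GPU division mentioned above, assuming PyTorch as the framework (the publication does not name one): the numerically heavy transfer step can be placed on the accelerator when one is available, while selection, I/O, and other control flow remain on the CPU.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def transfer_on_accelerator(video_vec: torch.Tensor, feature_vec: torch.Tensor) -> torch.Tensor:
    """Run the vector arithmetic of the transfer on the GPU if present,
    then move the result back to the CPU for the rest of the pipeline."""
    return (video_vec.to(device) + feature_vec.to(device)).cpu()
```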
The information processing device 10 according to the present embodiment is realized by reading a program for causing the computer 500 to execute each of the processes described above and executing the processing specified in the program. The program may be recorded on the recording medium 503a or the like, or may be provided through a network.
(Summary of embodiments)
This specification describes at least the information processing device, gesture transcription method, and program described in the following items.
(Item 1)
An information processing device comprising:
a feature extraction unit configured to extract a plurality of feature data, each indicating a specific gesture, from gesture video data representing a video that includes gestures;
a feature synthesis unit configured to synthesize the plurality of feature data;
a control unit configured to receive a transfer request and select feature data corresponding to the transfer request from the synthesized feature data; and
a feature transfer unit configured to transfer the selected feature data to input video data to generate output video data.
(Item 2)
The information processing device according to Item 1, wherein the feature synthesis unit is configured to synthesize a plurality of vector data representing the plurality of feature data by vector synthesis.
(Item 3)
The information processing device according to Item 1 or 2, further comprising an emotion estimation unit configured to estimate an emotion based on the input video data, wherein the feature synthesis unit is configured to synthesize, from among the plurality of extracted feature data, feature data corresponding to the estimated emotion.
(Item 4)
A gesture transcription method performed by a computer, the method comprising:
extracting a plurality of feature data, each indicating a specific gesture, from gesture video data representing a video that includes gestures;
synthesizing the plurality of feature data;
receiving a transfer request and selecting feature data corresponding to the transfer request from the synthesized feature data; and
transferring the selected feature data to input video data to generate output video data.
(Item 5)
A program for causing a computer to function as each unit of the information processing device according to any one of Items 1 to 3.
Although the present embodiment has been described above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.
10 Information processing device
11 Gesture video storage unit
12 Feature extraction unit
13 Feature synthesis unit
14 Control unit
15 Input video storage unit
16 Feature transfer unit
17 Output video storage unit
18 Emotion estimation unit
1000 Drive device
1001 Recording medium
1002 Auxiliary storage device
1003 Memory device
1004 CPU
1005 Interface device
1006 Display device
1007 Input device
1008 Output device

Claims (5)

  1.  An information processing device comprising:
      a feature extraction unit configured to extract a plurality of feature data, each indicating a specific gesture, from gesture video data representing a video that includes gestures;
      a feature synthesis unit configured to synthesize the plurality of feature data;
      a control unit configured to receive a transfer request and select feature data corresponding to the transfer request from the synthesized feature data; and
      a feature transfer unit configured to transfer the selected feature data to input video data to generate output video data.
  2.  The information processing device according to claim 1, wherein the feature synthesis unit is configured to synthesize a plurality of vector data representing the plurality of feature data by vector synthesis.
  3.  The information processing device according to claim 1, further comprising an emotion estimation unit configured to estimate an emotion based on the input video data,
      wherein the feature synthesis unit is configured to synthesize, from among the plurality of extracted feature data, feature data corresponding to the estimated emotion.
  4.  A gesture transcription method performed by a computer, the method comprising:
      extracting a plurality of feature data, each indicating a specific gesture, from gesture video data representing a video that includes gestures;
      synthesizing the plurality of feature data;
      receiving a transfer request and selecting feature data corresponding to the transfer request from the synthesized feature data; and
      transferring the selected feature data to input video data to generate output video data.
  5.  A program for causing a computer to function as each unit of the information processing device according to any one of claims 1 to 3.
PCT/JP2022/028671 2022-07-25 2022-07-25 Information processing device, motion transfer method, and program WO2024023902A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/028671 WO2024023902A1 (en) 2022-07-25 2022-07-25 Information processing device, motion transfer method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/028671 WO2024023902A1 (en) 2022-07-25 2022-07-25 Information processing device, motion transfer method, and program

Publications (1)

Publication Number Publication Date
WO2024023902A1 true WO2024023902A1 (en) 2024-02-01

Family

ID=89705763

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/028671 WO2024023902A1 (en) 2022-07-25 2022-07-25 Information processing device, motion transfer method, and program

Country Status (1)

Country Link
WO (1) WO2024023902A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190295302A1 (en) * 2018-03-22 2019-09-26 Northeastern University Segmentation Guided Image Generation With Adversarial Networks
CN110647780A (en) * 2018-06-07 2020-01-03 东方联合动画有限公司 Data processing method and system
CN112116684A (en) * 2020-08-05 2020-12-22 中国科学院信息工程研究所 Image processing method, device, equipment and computer readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190295302A1 (en) * 2018-03-22 2019-09-26 Northeastern University Segmentation Guided Image Generation With Adversarial Networks
CN110647780A (en) * 2018-06-07 2020-01-03 东方联合动画有限公司 Data processing method and system
CN112116684A (en) * 2020-08-05 2020-12-22 中国科学院信息工程研究所 Image processing method, device, equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAFNI ORAN, ASHUAL ORON, WOLF LIOR: "Single-Shot Freestyle Dance Reenactment", CVPR 2021, 18 June 2021 (2021-06-18), pages 1 - 15, XP093133488 *

Similar Documents

Publication Publication Date Title
US9501663B1 (en) Systems and methods for videophone identity cloaking
US20220150285A1 (en) Communication assistance system, communication assistance method, communication assistance program, and image control program
JP2021144679A (en) System, computer implemented method, program for predicting vision-based joint action and posture motion
US9852358B2 (en) Information processing device, information processing method, and information processing system
Chiu et al. Gesture generation with low-dimensional embeddings
CN111383307A (en) Video generation method and device based on portrait and storage medium
KR102448382B1 (en) Electronic device for providing image related with text and operation method thereof
JP2011217098A (en) Information processing system, conference management device, information processing method, method for controlling conference management device, and program
CN111401101A (en) Video generation system based on portrait
US20220375224A1 (en) Device and method for generating speech video along with landmark
US20240013462A1 (en) Audio-driven facial animation with emotion support using machine learning
JP7370525B2 (en) Video distribution system, video distribution method, and video distribution program
Qi et al. Diverse 3d hand gesture prediction from body dynamics by bilateral hand disentanglement
CN111443854A (en) Action processing method, device and equipment based on digital person and storage medium
WO2024023902A1 (en) Information processing device, motion transfer method, and program
CN114567693A (en) Video generation method and device and electronic equipment
CN114902258A (en) Communication support system and communication support program
EP3923149A1 (en) Information processing device and information processing method
US20230215296A1 (en) Method, computing device, and non-transitory computer-readable recording medium to translate audio of video into sign language through avatar
WO2022244146A1 (en) Information processing device, motion transfer method, and program
JP2000089660A (en) Sign language study supporting device and recording medium with sign language study supporting program recorded therein
KR20220003389A (en) Method and apparatus for learning key point of based neural network
Tan et al. EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
KR102584484B1 (en) Apparatus and method for generating speech synsthesis image
KR102601159B1 (en) Virtual human interaction generating device and method therof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22953000

Country of ref document: EP

Kind code of ref document: A1