JP2024028023A

JP2024028023A - Facial expression processing device, facial expression processing method, and facial expression processing program

Info

Publication number: JP2024028023A
Application number: JP2022131325A
Authority: JP
Inventors: 潮美中川
Original assignee: Sony Semiconductor Solutions Corp
Current assignee: Sony Semiconductor Solutions Corp
Priority date: 2022-08-19
Filing date: 2022-08-19
Publication date: 2024-03-01
Also published as: WO2024038699A1

Abstract

[Problem] To provide a facial expression processing program, a facial expression processing device, and a facial expression processing method that appropriately process a user's facial expression in an image. [Solution] The facial expression processing program of the present disclosure causes a computer to execute a facial expression processing method that includes an emotion input step of receiving input of emotion information from a user, and an image processing step of receiving a first image including the user's face and the emotion information, and generating a second image from the first image by processing the facial expression of the user in the first image based on the emotion information. [Selected drawing] FIG. 3

Description

The present disclosure relates to a facial expression processing device, a facial expression processing method, and a facial expression processing program.

When an online conference is held using a camera built into a computer or a webcam attached externally to a computer, less nonverbal information is conveyed than in a face-to-face meeting, so it is difficult to convey subtle changes in facial expression to the other party. The same applies to streaming distribution. As a result, in online conferences and streaming distribution, what reaches the other party may differ from what the user intended.

In recent years, online conferences and streaming distribution using avatars have also come into use. As above, video that uses avatars conveys less nonverbal information, so it is difficult to convey subtle changes in facial expression to the other party.

An information processing device intended to improve the emotional state of a user in a captured image is also known. This device, for example, measures the smile level of the user in the captured image, processes the captured image to a smile level one step higher, and outputs it.

Japanese Unexamined Patent Application Publication No. 2017-182594

The above information processing device specializes in the smiling expression and aims to improve the emotional state of the user in the captured image. However, because the device does not focus sufficiently on improving communication, handling expressions other than smiles is not contemplated.

The present disclosure therefore provides a facial expression processing program, a facial expression processing device, and a facial expression processing method that appropriately process a user's facial expression in an image.

A facial expression processing program according to a first aspect of the present disclosure causes a computer to execute a facial expression processing method including: an emotion input step of receiving input of emotion information from a user; and an image processing step of receiving a first image including the user's face and the emotion information, and generating a second image from the first image by processing the facial expression of the user in the first image based on the emotion information. As a result, for example, the user can use this program to make up for a lack of skill in expressing emotion, and can prevent unintended content from being conveyed to the other party.

The first aspect may further include an avatar generation step of generating, based on the second image, an avatar, which is a character displayed as the user's alter ego in a virtual space on a network. As a result, for example, by communicating through an avatar with rich facial expressions, the user can communicate without misunderstanding in an online conference.

In the first aspect, the facial expression of the avatar may be generated so as to correspond to the facial expression of the user in the second image. As a result, for example, by communicating through an avatar with rich facial expressions, the user can communicate without misunderstanding in an online conference.

In the first aspect, the image processing step may process the facial expression of the user in the first image into an expression corresponding to the emotion information. As a result, for example, the user can use this program to make up for a lack of skill in expressing emotion, and can prevent unintended content from being conveyed to the other party.

In the first aspect, the image processing step may acquire a teacher image including a human facial expression and process the facial expression of the user in the first image by comparing the first image with the teacher image. This makes it possible, for example, to analyze which facial movements of the user fall short of expressing the emotion.

In the first aspect, the image processing step may analyze the user's emotion from the first image and process the facial expression of the user in the first image based on the result of the analysis, the emotion information, and the result of the comparison. By processing the image based on this comparison result, the user can convey emotions correctly.

In the first aspect, the first image may be an image captured by an imaging device. This enables, for example, accurate gaze detection and emotion analysis.

In the first aspect, the imaging device may be an RGB imaging device or an RGBIR imaging device. This enables, for example, accurate gaze detection and emotion analysis.

The first aspect may further include a scene input step of receiving input of scene information from the user, and the image processing step may receive the first image, the emotion information, and the scene information, and generate the second image from the first image by processing the facial expression of the user in the first image based on the emotion information and the scene information. This makes it possible, for example, to output an image in which the user's facial expression has been processed to suit the atmosphere of the scene. The user can therefore express the intended emotion more accurately and in a way that reaches the other party, which further improves communication.

The first aspect may further include a feedback input step of receiving input of feedback on the second image from the user, and the image processing step may perform learning based on the feedback. As a result, for example, the avatar or the image obtained by processing the user's facial expression is processed again, in response to the user's feedback, so that it becomes the optimal expression. The user can therefore express the intended emotion more accurately and in a way that reaches the other party, which further improves communication.

In the first aspect, the second image may be output as an output image in an online conference. As a result, for example, rich facial expressions come across even through a screen, so the user can smoothly prompt the other user to take a turn speaking and can keep communication flowing.

In the first aspect, the second image may be output to a display device of the user or a display device of the other user. As a result, for example, rich facial expressions come across even through a screen, so the user can smoothly prompt the other user to take a turn speaking and can keep communication flowing.

In the first aspect, the avatar may be output as an output image in an online conference. As a result, for example, by speaking to the other user through an avatar that conveys emotion, the user can give the other user a sense of familiarity even if the other user does not know the user.

In the first aspect, the avatar may be output to a display device of the user or a display device of the other user. As a result, for example, by speaking to the other user through an avatar that conveys emotion, the user can give the other user a sense of familiarity even if the other user does not know the user.

The first aspect may further include an emotion information acquisition step of acquiring emotion information of the other user, and the image processing step may receive the first image, the emotion information, and the emotion information of the other user, and generate the second image from the first image by processing the facial expression of the user in the first image based on the emotion information and the emotion information of the other user. As a result, for example, the user can express the intended emotion more accurately, in a way that reaches the other party and takes the other user's emotion into account, which further improves communication.

In the first aspect, the emotion information acquisition step may receive, from the other user, a third image including the face of the other user, and acquire the emotion information of the other user based on the third image. As a result, for example, the user can express the intended emotion more accurately, in a way that reaches the other party and takes the other user's emotion into account, which further improves communication.

In the first aspect, the emotion information acquisition step may acquire the emotion information of the other user by analyzing the emotion of the other user from the third image. As a result, for example, the user can express the intended emotion more accurately, in a way that reaches the other party and takes the other user's emotion into account, which further improves communication.

In the first aspect, the third image may be an image captured by an imaging device. This enables, for example, accurate gaze detection and emotion analysis.

A facial expression processing device according to a second aspect of the present disclosure includes: an emotion input unit that receives input of emotion information from a user; and an image processing unit that receives a first image including the user's face and the emotion information, and generates a second image from the first image by processing the facial expression of the user in the first image based on the emotion information. As a result, for example, the user can use this device to make up for a lack of skill in expressing emotion, and can prevent unintended content from being conveyed to the other party.

A facial expression processing method according to a third aspect of the present disclosure includes: an emotion input step of receiving input of emotion information from a user; and an image processing step of receiving a first image including the user's face and the emotion information, and generating a second image from the first image by processing the facial expression of the user in the first image based on the emotion information. As a result, for example, the user can use this method to make up for a lack of skill in expressing emotion, and can prevent unintended content from being conveyed to the other party.

Example of a system configuration diagram for holding an online conference using the facial expression processing program of the first embodiment
Example of a system configuration diagram for performing streaming distribution using the facial expression processing program of the first embodiment
System block diagram of an information processing device in which the facial expression processing program of the first embodiment is installed
Flowchart of the facial expression processing program of the first embodiment
Example in which an image including the user's face captured by the imaging device 2 is displayed on the display device 4'
Example in which an image with the user's facial expression processed according to the first embodiment is displayed on the display device 4'
Example in which an avatar corresponding to an "expressionless" user is displayed on the display device 4'
Example in which the avatar of the first embodiment is displayed on the display device 4'
System block diagram of an information processing device in which the facial expression processing program of the second embodiment is installed
Flowchart of the facial expression processing program of the second embodiment
Example in which an image with the user's facial expression processed according to the second embodiment is displayed on the display device 4'
Example in which the avatar of the second embodiment is displayed on the display device 4'
System block diagram of an information processing device in which the facial expression processing program of the third embodiment is installed
Flowchart of the facial expression processing program of the third embodiment
System block diagram of an information processing device in which the facial expression processing program of the fourth embodiment is installed
Flowchart of the facial expression processing program of the fourth embodiment
Example of the hardware configuration of the information processing device 1 in the fifth embodiment

Embodiments of the present disclosure are described below with reference to the drawings.

(First embodiment)
FIG. 1 is an example of a system configuration diagram for holding an online conference using the facial expression processing program of the first embodiment.

FIG. 1 shows an information processing device 1 used by a user participating in an online conference, and an imaging device 2 of the information processing device 1. A user participating in the online conference can, for example, install the facial expression processing program on the information processing device 1 and run it. An example of the information processing device 1 is a computer device such as a PC (personal computer). Examples of the imaging device 2 include an RGB imaging device that includes pixels for red light (R), green light (G), and blue light (B), and an RGBIR imaging device that includes pixels for red light (R), green light (G), blue light (B), and infrared light (IR). The imaging device 2 is, for example, a camera or an image sensor. The imaging device 2 may be an external device connected to the information processing device 1, or a built-in device integrated with the information processing device 1.

FIG. 1 further shows an information processing device 1' used by the other user participating in the online conference, and an imaging device 2' of the information processing device 1'. In FIG. 1, for ease of explanation, the other user's information processing device is written as "information processing device 1'" and the other user's imaging device is written as "imaging device 2'". The details of the information processing device 1' and the imaging device 2' are the same as those of the information processing device 1 and the imaging device 2. The online conference may also be held among three or more users using three or more information processing devices.

Further details of the information processing device 1 and the imaging device 2 shown in FIG. 1 are described below; the description also applies to the information processing device 1' and the imaging device 2'.

The facial expression processing program is a computer program for processing the facial expression of a user in an image. The program receives an image including the user's face from the imaging device 2, and also receives emotion information entered by the user. The program then processes the facial expression of the user in the image based on the emotion information. The image captured by the imaging device 2 is an example of the first image of the present disclosure, and the image obtained by processing the user's facial expression is an example of the second image of the present disclosure. Details of the processing are described later. The program need not be used by every user in the online conference; only users who want their own expression processed may use it. For example, the program may be installed on both the information processing device 1 and the information processing device 1', or only on the information processing device 1.
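As a rough illustration of the data flow described above, the following Python sketch models the first image, the emotion information, and the second image. The names `Frame`, `process_expression`, and the `expression` field are hypothetical and do not appear in the disclosure, and the pixel-level processing is replaced here by a simple label change.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for an image containing the user's face."""
    pixels: bytes
    expression: str  # label for the expression currently shown


def process_expression(first_image: Frame, emotion: str) -> Frame:
    """Return a second image whose expression matches the entered emotion.

    Real processing would alter pixels; this sketch only updates the label
    and leaves the first image untouched.
    """
    return replace(first_image, expression=emotion)


first = Frame(pixels=b"\x00" * 4, expression="expressionless")
second = process_expression(first, "happy")
```

Keeping the first image immutable mirrors the wording of the text, in which the second image is generated from the first rather than overwriting it.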

An image created by the facial expression processing program can be used, for example, as an input image to an online conference program. Instead of accepting the image captured by the imaging device 2 as its input image, the online conference program can accept the image generated by the facial expression processing program. In this case, outputting the image generated by the facial expression processing program to each user realizes facial expression processing in the online conference. The facial expression processing program may also be implemented as part of the functions of the online conference program. For example, the online conference program itself may generate the image obtained by processing the user's facial expression and output it to each user.
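The substitution of input images described above can be sketched as follows. This is a minimal illustration under stated assumptions: frames are represented as strings, and `camera_frames`, `processed_frames`, and `conference_send` are hypothetical stand-ins, not functions of any real conferencing product.

```python
from typing import Iterable, Iterator, List


def camera_frames() -> Iterator[str]:
    """Stand-in for frames captured by the imaging device."""
    yield from ["raw-0", "raw-1"]


def processed_frames(frames: Iterable[str], emotion: str) -> Iterator[str]:
    """Stand-in for the expression processing program."""
    for frame in frames:
        yield f"{frame}+{emotion}"


def conference_send(frames: Iterable[str]) -> List[str]:
    """Stand-in for the conferencing program; it only sees its input frames."""
    return list(frames)


# Camera wired directly to the conferencing program.
direct = conference_send(camera_frames())
# Processed frames substituted as the conferencing program's input.
substituted = conference_send(processed_frames(camera_frames(), "happy"))
```

Because the conferencing stand-in depends only on an iterable of frames, the processing stage can be placed in front of it without changing it, which corresponds to the first arrangement in the text. The second arrangement, where processing is a built-in function of the conferencing program, would move `processed_frames` inside `conference_send`.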

In this online conference, the information processing device 1 transmits the image with the processed facial expression to the information processing device 1' via a network 100. The network 100 may be a wired network or a wireless network, and various networks 100 may be constructed depending on the purpose for which the facial expression processing program is used.

FIG. 2 is an example of a system configuration diagram for performing streaming distribution using the facial expression processing program of the first embodiment. The system of FIG. 2 is described with a focus on its differences from the system of FIG. 1.

A user who is a distributor can, for example, install the facial expression processing program on the information processing device 1 and run it. FIG. 2 shows one information processing device 1 used by the distributor and the imaging device 2 of the information processing device 1. In this streaming distribution, the distributor transmits the image with the processed facial expression to a streaming distribution server 300 via the network 100. The streaming distribution server 300 can stream the image to an information processing device group 500 that includes a plurality of information processing devices 1'. Each information processing device 1' in the information processing device group 500 is used by a user who is a viewer.

An image created by the facial expression processing program can be used, for example, as an input image to a streaming distribution program. The relationship between the facial expression processing program and the streaming distribution program is the same as the relationship between the facial expression processing program and the online conference program described above.

The system configuration of the first embodiment is not limited to the configurations shown in FIG. 1 and FIG. 2, and various system configurations are possible depending on the purpose for which the facial expression processing program is used. For example, communication is not limited to one-to-one and can be performed one-to-N (N is an integer of 1 or more) among information processing devices 1. For ease of explanation, the following describes one-to-one communication, but the description also applies to one-to-N communication.

FIG. 3 is a system block diagram of the information processing device 1 in which the facial expression processing program of the first embodiment is installed.

FIG. 3 shows the information processing device 1 used by a user participating in an online conference, and the imaging device 2, an input device 3, and a display device 4 of the information processing device 1. FIG. 3 further shows the information processing device 1' used by the other user participating in the online conference, and the imaging device 2', an input device 3', and a display device 4' of the information processing device 1'. In FIG. 3, for ease of explanation, the other user's information processing device, imaging device, input device, and display device are written as "information processing device 1'", "imaging device 2'", "input device 3'", and "display device 4'", respectively.

In the present embodiment, an image processing unit 8 in the information processing device 1 processes an image including the user's face captured by the imaging device 2, based on emotion information entered by the user via the input device 3 and an emotion input unit 5. The image processing unit 8 can output the image obtained by processing the user's facial expression to the display device 4 or the display device 4'. The image processing unit 8 can also use an avatar generator 11 to generate an avatar from this image and output the avatar to the display device 4 or the display device 4'. An avatar is a character displayed as the user's alter ego in a virtual space on a network. Each of these functions can be realized by the facial expression processing program. Each functional block is described below.

The input device 3 is, for example, a mouse. Besides a mouse, the input device 3 also includes a keyboard, a microphone, and the like. The input device 3 may be an external device connected to the information processing device 1, or a built-in device integrated with the information processing device 1.

The display device 4 is, for example, a display. The user can output the image with the processed facial expression, or the avatar, generated by the facial expression processing program to the display device 4 or 4' for display. For example, the user may output the image or avatar to the display device 4 or 4' for display via an online conference or streaming distribution. The display device 4 may be an external device connected to the information processing device 1, or a built-in device integrated with the information processing device 1.

The emotion input unit 5 provides a user interface (not shown), for example in the form of a pull-down menu. The emotion input unit 5 receives input of emotion information from the user, for example via the input device 3. The emotion information is information about an emotion, such as "happy" or "sad", and can be selected from a plurality of categories. Input to the emotion input unit 5 is not limited to selection from a pull-down menu; various input methods can be used, such as selecting an icon with a mouse, typing text on a keyboard, or speaking into a microphone.
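One way to picture the category selection described above is a small enumeration with a parser that normalizes whatever the input method delivers. This is only a sketch: the text names "happy" and "sad" as examples, while the `Emotion` enum, the `parse_emotion` helper, and the choice to reject unknown input are assumptions made for illustration.

```python
from enum import Enum


class Emotion(Enum):
    """Emotion categories; only the two examples named in the text are listed."""
    HAPPY = "happy"
    SAD = "sad"


def parse_emotion(raw: str) -> Emotion:
    """Map a menu selection or typed text to an Emotion category."""
    key = raw.strip().lower()
    for emotion in Emotion:
        if emotion.value == key:
            return emotion
    raise ValueError(f"unknown emotion: {raw!r}")


selected = parse_emotion(" Happy ")
```

A pull-down menu would always deliver a valid category, so the `ValueError` path matters mainly for free-form input such as typed or transcribed text.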

記憶部６は、感情情報の手本となる人間の顔の表情を含む画像を教師画像７として記憶する。記憶部６は、例えばハードディスクなどの補助記憶装置上に構築される。記憶部６は、情報処理装置１の内部の装置だけでなく、外付けのハードディスクやクラウドサーバといった、外部の装置上に構築されていてもよい。教師画像７は、撮像装置２やデジタルカメラによって撮像された人間の顔の表情を含む画像の他、人間の顔の表情を含むイラストデータまたは３Ｄデータなど様々なデータを採用することができる。 The storage unit 6 stores an image including a human facial expression serving as a model of emotional information as a teacher image 7. The storage unit 6 is constructed, for example, on an auxiliary storage device such as a hard disk. The storage unit 6 may be constructed not only on an internal device of the information processing device 1 but also on an external device such as an external hard disk or a cloud server. The teacher image 7 can employ various data such as an image including a human facial expression captured by the imaging device 2 or a digital camera, as well as illustration data or 3D data including a human facial expression.

画像処理部８は、ＡＩ（Artificial Intelligence）９と、加工器１０と、アバター生成器１１とを備える。画像処理部８は、ユーザの顔を含む画像と、感情情報とを入力として受け付け、感情情報に基づいて、この画像から、ユーザの顔の表情を加工した画像を生成する。加工した画像は、表示装置４または４’に出力することができる。また、ユーザの顔の表情を加工した画像から、アバターを生成することができ、表示装置４または４’に出力することができる。 The image processing unit 8 includes an AI (Artificial Intelligence) 9, a processor 10, and an avatar generator 11. The image processing unit 8 receives an image including the user's face and emotional information as input, and generates an image in which the facial expression of the user is processed from this image based on the emotional information. The processed image can be output to the display device 4 or 4'. Furthermore, an avatar can be generated from an image in which the user's facial expression has been processed, and can be output to the display device 4 or 4'.

ＡＩ９は、ユーザの顔を含む画像と、感情情報とを入力として受け付ける。また、ＡＩ９は、これらの入力に基づいて、感情表現に足りない顔の動きを分析した結果を出力する。ＡＩ９は、例えば、畳み込みニュ―ラルネットワーク（ＣＮＮ）といったアルゴリズムを利用して、ユーザの顔を含む画像と、感情情報と、感情表現に足りない顔の動きとの関係について、教師画像７を用いて学習することが考えられる。ＡＩ９の処理の一例として、ＡＩ９は、受け付けた画像に対して畳み込み演算を実施することにより、画像に含まれるユーザの顔の特徴を抽出する。ＡＩ９は、抽出した特徴に基づいて、ユーザの顔の表情および目線を検知し、感情を分析する。ここで分析される感情は、何らかの形で数値化したものでもよく、または、「無表情」といった定性的に表現したものでもよい。ＡＩ９は、感情情報と、感情を分析した画像と、教師画像７を比較し、感情表現に足りない顔の動きを分析する。例えば、ＡＩ９は、口元の表情について、感情を分析した画像と、感情情報と対応する教師画像７とを比較して、それぞれの特徴を比較することで、感情表現に足りない顔の動きを数値化してもよい。また、ＡＩ９は、感情表現に足りない顔の動きを確認するために、教師画像７を複数用いて比較してもよい。 The AI 9 receives an image including the user's face and emotional information as input. Based on these inputs, the AI 9 outputs the result of analyzing which facial movements are insufficient to express the emotion. The AI 9 may, for example, use an algorithm such as a convolutional neural network (CNN) and learn, using the teacher images 7, the relationship among an image including the user's face, the emotional information, and the facial movements that are insufficient to express the emotion. As an example of processing by the AI 9, the AI 9 extracts features of the user's face included in the image by performing a convolution operation on the received image. The AI 9 detects the user's facial expression and line of sight based on the extracted features, and analyzes the user's emotion. The emotion analyzed here may be quantified in some form, or may be expressed qualitatively, such as "expressionless." The AI 9 compares the emotional information, the emotion-analyzed image, and the teacher image 7, and analyzes the facial movements that are insufficient to express the emotion. For example, for the expression around the mouth, the AI 9 may compare the emotion-analyzed image with the teacher image 7 corresponding to the emotional information and, by comparing their respective features, quantify the facial movement that is insufficient to express the emotion. Furthermore, the AI 9 may use a plurality of teacher images 7 for comparison in order to confirm the facial movements that are insufficient to express the emotion.
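As one non-limiting illustration of quantifying the facial movement that is insufficient to express an emotion, the comparison against one or more teacher images 7 could be sketched as follows. The per-region feature scores in the range 0 to 1, the region names, and the function name `missing_movement` are assumptions made for this sketch only; the disclosure does not specify a feature representation, and a CNN would normally supply the features.

```python
from statistics import fmean

# Hypothetical per-region expression scores in [0, 1]; the region names
# ("mouth", "eyes") are illustrative and do not come from the disclosure.
Features = dict[str, float]


def missing_movement(user: Features, teachers: list[Features]) -> Features:
    """Return, per facial region, how far the user's expression falls short
    of the average teacher expression (0.0 means nothing is missing)."""
    deficit: Features = {}
    for region, score in user.items():
        # Average the same region over all teacher images used for comparison.
        target = fmean(teacher[region] for teacher in teachers)
        # Only a shortfall counts; exceeding the teacher is not a deficit.
        deficit[region] = max(0.0, target - score)
    return deficit


# Example: an "expressionless" user compared with two "fun" teacher images.
user_features = {"mouth": 0.1, "eyes": 0.5}
teacher_features = [{"mouth": 0.8, "eyes": 0.5}, {"mouth": 0.6, "eyes": 0.4}]
result = missing_movement(user_features, teacher_features)
```

In this example only the mouth shows a shortfall, which corresponds to the case in which the mouth is identified as the region to be processed.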

ＡＩ９は、感情の分析のために、顔の向きや手振り身振りを検知することを含めてもよい。また、ＡＩ９は、受け付けた画像から目線を分析することを含めてもよい。目線を分析する場合、ＡＩ９は、受け付けた画像から、顔と目線の向きを確認する。 AI9 may include detecting facial orientation and hand gestures for emotional analysis. Furthermore, AI9 may include analyzing the line of sight from the received image. When analyzing the line of sight, the AI 9 checks the direction of the face and line of sight from the received image.

加工器１０は、感情を分析した画像と、感情情報と、比較の結果とに基づいて、感情表現が正確に伝わるように、ユーザの顔における表情を加工する。例えば、ユーザが感情情報として、「楽しい」と入力した場合、加工器１０は、「楽しい」という感情表現が正確に伝わるように、表情を加工する。これにより、「楽しい」という感情表現が補完されるように、表情を加工することが可能となる。また、表情の加工は既存のアルゴリズムなどを利用することができる。また、加工器１０は、ＡＩ９が確認した顔と目線の向きに基づいて、目線を加工することを含めてもよい。例えば、加工器１０は、目線が正面を向くように合成することや、適切な頻度または時間で相手側ユーザと目線を合わせるように加工することが考えられる。適切な頻度または時間とは、例えば、目が合う長さを２～３秒に設定する、１分間あたりの目線の合う長さは２０～３０秒以内にする、または話し出すタイミングと話し終わるタイミングでアイコンタクトを取るなどの加工が考えられる。 The processor 10 processes the expression on the user's face based on the emotion-analyzed image, the emotional information, and the comparison result so that the emotional expression is accurately conveyed. For example, when the user inputs "fun" as emotional information, the processor 10 processes the facial expression so that the emotional expression "fun" is accurately conveyed. This makes it possible to process the facial expression so as to complement the emotional expression "fun." Existing algorithms and the like can be used to process the facial expression. The processor 10 may also process the line of sight based on the directions of the face and line of sight confirmed by the AI 9. For example, the processor 10 may synthesize the image so that the line of sight faces forward, or may process the image so that the user makes eye contact with the other user at an appropriate frequency or for an appropriate duration. An appropriate frequency or duration means, for example, processing such as setting the length of each eye contact to 2 to 3 seconds, keeping the total length of eye contact within 20 to 30 seconds per minute, or making eye contact at the moments when the user starts and finishes speaking.
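The eye-contact timing described above could, for example, be turned into a schedule as in the following sketch. The even spacing of contacts within the minute and the names `contact_schedule`, `hold_s`, `budget_s`, and `window_s` are assumptions for illustration; only the 2 to 3 second length per contact and the 20 to 30 second total per minute are taken from the description.

```python
def contact_schedule(
    hold_s: float = 2.5,     # length of one eye contact (2 to 3 s in the text)
    budget_s: float = 25.0,  # total eye contact per window (within 20 to 30 s)
    window_s: float = 60.0,  # window length: one minute
) -> list[tuple[float, float]]:
    """Return (start, end) times, in seconds, of eye contact in one window.

    Contacts are spread evenly over the window; the even spacing is an
    assumption of this sketch, not something the description requires.
    """
    count = int(budget_s // hold_s)  # number of contacts that fit the budget
    period = window_s / count        # time between consecutive contact starts
    return [(i * period, i * period + hold_s) for i in range(count)]


schedule = contact_schedule()
total_contact = sum(end - start for start, end in schedule)
```

With the default values this yields 10 contacts of 2.5 seconds each, 25 seconds in total per minute. Eye contact at the start and end of an utterance would need a speech-activity signal and is not covered by this sketch.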

アバター生成器１１は、加工器１０により表情を加工した画像から、ユーザのアバターを生成する。アバターの生成には、既存のアルゴリズムなどを利用することができる。アバターの表情は、ユーザの顔を加工した画像と対応する表情となるように生成される。生成するアバターは、２次元または３次元キャラクターいずれであってもよい。また、アバターを生成するかどうかは、ユーザが選択することができる。例えば、ボタンアイコンで表されるユーザインタフェース（不図示）によって、ユーザが表情を加工した画像またはアバターのいずれを出力するか切換えできるようにすることが考えられる。 The avatar generator 11 generates a user's avatar from the image whose facial expression has been processed by the processing device 10. Existing algorithms can be used to generate avatars. The avatar's facial expression is generated to correspond to the processed image of the user's face. The avatar to be generated may be either a two-dimensional or three-dimensional character. Furthermore, the user can select whether or not to generate an avatar. For example, a user interface (not shown) represented by a button icon may allow the user to switch between outputting an image with processed facial expressions or an avatar.

図４は、第１実施形態における表情加工プログラムのフローチャートを示す。ここでは、ユーザの顔を含む画像および感情情報に基づいて、アバターを表示装置４’に出力するフローを説明する。 FIG. 4 shows a flowchart of the facial expression processing program in the first embodiment. Here, a flow for outputting an avatar to the display device 4' based on an image including the user's face and emotional information will be described.

ステップＳ１１では、ユーザが、感情入力部５に入力装置３を用いて感情情報を入力する。 In step S11, the user inputs emotional information to the emotional input section 5 using the input device 3.

ステップＳ１２では、画像処理部８におけるＡＩ９が、撮像装置２からユーザの顔を含む画像を受け付ける。ステップＳ１３では、ＡＩ９が、ステップＳ１２で受け付けた画像から、顔の表情および目線を検知する。ステップＳ１４では、ＡＩ９が、顔の表情および目線を検知した画像から、感情を分析する。 In step S12, the AI 9 in the image processing unit 8 receives an image including the user's face from the imaging device 2. In step S13, the AI 9 detects facial expressions and line of sight from the image received in step S12. In step S14, the AI 9 analyzes emotions from the image in which facial expressions and line of sight are detected.

ステップＳ１５では、ＡＩ９が、感情を分析した画像と、感情情報と、記憶部６に記憶される教師画像７とを比較し、感情表現に足りない顔の動きを分析する。ステップＳ１６では、加工器１０が、比較の結果から感情情報に対応するようにユーザの顔を含む画像を加工する。ステップＳ１７では、アバター生成器１１が、表情を加工した画像に基づいてユーザのアバターを生成する。ステップＳ１８では、アバター生成器１１が、生成したアバターの画像を表示装置４’に出力する。 In step S15, the AI 9 compares the emotion-analyzed image, the emotion information, and the teacher image 7 stored in the storage unit 6, and analyzes facial movements that are insufficient to express emotion. In step S16, the processor 10 processes the image including the user's face so that it corresponds to the emotional information based on the comparison result. In step S17, the avatar generator 11 generates the user's avatar based on the image with processed facial expressions. In step S18, the avatar generator 11 outputs the generated avatar image to the display device 4'.

このフローチャートでは、ステップＳ１２からＳ１４までのフローは、ステップＳ１１のフローの後に行われる記載となっているが、ステップＳ１１のフローの前に行われてもよい。つまり、感情情報の入力の後に感情の分析を行う方法としてもよく、また感情の分析を先に完了させた後、ユーザから感情情報の入力を受け付ける方法としてもよい。 In this flowchart, steps S12 to S14 are described as being performed after the flow of step S11, but they may be performed before the flow of step S11. In other words, a method may be adopted in which emotion analysis is performed after the input of emotional information, or a method may be adopted in which emotion analysis is completed first and then input of emotional information is received from the user.
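The flow of steps S11 to S18 can be pictured with the following minimal sketch. Every function here is a hypothetical stand-in: the `Frame` class replaces a real image with a plain expression label, and the comparison in `find_missing_movement` is a placeholder for the teacher-image comparison of step S15.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Stand-in for an image of the user's face; `expression` is a label."""
    expression: str


def analyze_emotion(frame: Frame) -> str:
    # S12 to S14: receive the image, detect expression and gaze, analyze emotion.
    return frame.expression


def find_missing_movement(analyzed: str, emotion: str) -> list[str]:
    # S15: placeholder for the comparison with the teacher images 7.
    return [] if analyzed == emotion else ["mouth"]


def process_expression(frame: Frame, emotion: str, missing: list[str]) -> Frame:
    # S16: edit the face only when some movement is missing.
    return Frame(emotion) if missing else frame


def generate_avatar(frame: Frame) -> str:
    # S17: the avatar's expression mirrors the processed image.
    return f"avatar:{frame.expression}"


def run_pipeline(frame: Frame, emotion: str) -> str:
    # `emotion` is the S11 input. Because analyze_emotion() does not use it,
    # S12 to S14 may run either before or after the user enters the emotion.
    analyzed = analyze_emotion(frame)
    missing = find_missing_movement(analyzed, emotion)
    processed = process_expression(frame, emotion, missing)
    return generate_avatar(processed)  # S18 would send this to display 4'


output = run_pipeline(Frame("expressionless"), "fun")
```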

次に、図５～図８を参照して、ユーザが、感情入力部５に「楽しい」という感情情報を入力した場合における入出力画像の例を説明する。 Next, with reference to FIGS. 5 to 8, examples of input and output images when the user inputs emotional information such as "fun" into the emotion input section 5 will be described.

図５は、撮像装置２によって撮像したユーザの顔を含む画像を表示装置４’に表示した例である。また、図６は、第１実施形態におけるユーザの表情を加工した画像を表示装置４’に表示した例である。図５の画像は、「無表情」であるユーザ２０の例を表している。この例では、ＡＩ９は、図５のユーザの顔を含む画像から、顔の表情および目線を検知し、ユーザの感情について、「無表情」であると分析する。またＡＩ９は、「無表情」と分析された画像と、「楽しい」という感情情報と、手本となる教師画像とを比較し、感情表現に足りない顔の動きを分析する。図５では、感情表現に足りない顔の動きは、口元であると確認され、加工器１０は、図６で示すとおり、口元について、「楽しい」という感情表現になるようにユーザの顔の表情を加工する。加工器１０は、表示装置４’に「無表情」であるユーザ２０を表示する代わりに、「楽しい」という感情情報に対応するユーザ２１を含む画像を出力する。 FIG. 5 is an example in which an image including the user's face captured by the imaging device 2 is displayed on the display device 4'. FIG. 6 is an example in which an image obtained by processing the user's facial expression in the first embodiment is displayed on the display device 4'. The image in FIG. 5 represents an example of the user 20 who is "expressionless." In this example, the AI 9 detects the facial expression and line of sight from the image including the user's face shown in FIG. 5, and analyzes the user's emotion as "expressionless." The AI 9 also compares the image analyzed as "expressionless," the emotional information "fun," and a teacher image that serves as a model, and analyzes the facial movements that are insufficient to express the emotion. In FIG. 5, the facial movement that is insufficient to express the emotion is identified as the mouth, and the processor 10 processes the user's facial expression so that the mouth expresses the emotion "fun," as shown in FIG. 6. Instead of displaying the "expressionless" user 20 on the display device 4', the processor 10 outputs an image including the user 21 corresponding to the emotional information "fun."

次にアバター生成器１１が、表情を加工したアバターを出力する例を示す。図７は、「無表情」のユーザに対応するアバターを表示装置４’に表示した例である。表情加工プログラムによるユーザの表情の加工を行わずに、「無表情」であるユーザ２０に対してアバターを生成すると、図７のように表される。図７では、「無表情」であるユーザ２０に対応する表情として、「無表情」という顔の表情に対応するアバター２２が出力される。一方で、図８は、第１実施形態におけるアバターを表示装置４’に表示した例である。アバター生成器１１は、図６のように、「楽しい」という感情情報に対応するユーザ２１を含む画像の入力を受け付ける。アバター生成器１１は、図８に示すように「楽しい」という顔の表情に対応するアバター２３を生成する。そして、アバター生成器１１は、表示装置４’に、生成したアバターを含む画像を出力する。この例では、犬の２次元キャラクターをアバターとして生成する例を示すが、２次元キャラクターに限定されず、３次元キャラクターなど、様々なアバターを生成することができる。 Next, an example will be shown in which the avatar generator 11 outputs an avatar with a processed facial expression. FIG. 7 is an example in which an avatar corresponding to an "expressionless" user is displayed on the display device 4'. If an avatar is generated for the "expressionless" user 20 without processing the user's facial expression using the facial expression processing program, the result is as shown in FIG. 7. In FIG. 7, the avatar 22 corresponding to the facial expression "expressionless" is output as the expression corresponding to the "expressionless" user 20. On the other hand, FIG. 8 is an example in which the avatar in the first embodiment is displayed on the display device 4'. The avatar generator 11 receives input of an image including the user 21 corresponding to the emotional information "fun," as shown in FIG. 6. The avatar generator 11 generates an avatar 23 corresponding to the facial expression "fun," as shown in FIG. 8. Then, the avatar generator 11 outputs an image including the generated avatar to the display device 4'. In this example, a two-dimensional dog character is generated as the avatar, but the avatar is not limited to two-dimensional characters, and various avatars such as three-dimensional characters can be generated.

なお、表情加工プログラムは、情報処理装置１にインストールして実行する代わりに、外部のサーバ（不図示）によって実行してもよい。例えば、外部のサーバは、ブラウザを介して入力装置３および撮像装置２の入力を受け付けた後、ユーザの表情を加工した画像やアバターを生成し、ユーザの表示装置４’に表示してもよい。 Note that the facial expression processing program may be executed by an external server (not shown) instead of being installed and executed in the information processing device 1. For example, after receiving input from the input device 3 and the imaging device 2 via a browser, the external server may generate an image or avatar in which the user's facial expression is processed and display it on the user's display device 4'.

本実施形態によれば、例えば、ユーザは、自分の足りない感情表現スキルを表情加工プログラムにより補うことができ、自分が意図しない内容で相手に伝わるのを防ぐことができる。また、本実施形態によれば、画面越しでも豊かな表情を感じられるため、ユーザは、オンライン会議などにおいて、相手側ユーザに対してスムーズに発言の交代を促し、コミュニケーションを円滑に進めることができる。 According to the present embodiment, for example, the user can compensate for his or her lack of emotional expression skills with the facial expression processing program, and can prevent unintended content from being conveyed to the other party. Furthermore, according to the present embodiment, rich facial expressions can be perceived even through a screen, so that in an online meeting or the like the user can smoothly prompt the other user to take a turn speaking, and communication can proceed smoothly.

また、表情が豊かに表現されたアバターを通じてコミュニケーションをとることで、ユーザは、オンライン会議において、齟齬なく意思の疎通を図ることができる。また、ユーザは、感情が伝わるアバターを通じて、相手側ユーザに話しかけることで、相手側ユーザが自分のことを知らない場合でも、親近感を持ってもらうことができる。 In addition, by communicating through avatars with rich facial expressions, users can communicate their intentions seamlessly in online meetings. Furthermore, by talking to the other user through an avatar that conveys emotions, the user can create a sense of intimacy with the other user even if the other user does not know the user.

また、本実施形態によれば、ストリーミング配信において、配信者であるユーザは、感情をより鮮明に視聴者に届けることができる。 Further, according to the present embodiment, in streaming distribution, the user who is the distributor can more clearly convey emotions to the viewers.

また、ＡＩ９を用いることで、ユーザの顔を含む画像と、教師画像７とを比較し、ユーザが感情表現に足りない顔の動きを分析することができる。この比較結果から、加工器１０により表情を含む画像を加工することで、ユーザは、正しく感情を伝えることができる。 Furthermore, by using the AI 9, an image including the user's face can be compared with the teacher image 7, and the facial movements that the user lacks for emotional expression can be analyzed. By processing the image including the facial expression with the processor 10 based on this comparison result, the user can convey emotions correctly.

また、ＲＧＢ撮像装置を用いることで、白黒撮像装置とは異なり、正確な視線の検知や感情の分析が可能となる。ＲＧＢＩＲ撮像装置を用いることで、ＲＧＢ情報に加えて、深度情報が得られるため、表情の細かな変化を捉えることができる。また、被写体の色合いや照度にかかわらず、視線を検知や感情を分析することが可能となる。また、近赤外画像では、瞳孔を追跡することができるため、安定した目線の検出が可能となる。 Furthermore, by using an RGB imaging device, it is possible to accurately detect line of sight and analyze emotions, unlike a monochrome imaging device. By using an RGBIR imaging device, depth information can be obtained in addition to RGB information, so minute changes in facial expressions can be captured. Additionally, it is possible to detect the subject's line of sight and analyze their emotions, regardless of the subject's color or illuminance. Furthermore, in a near-infrared image, the pupil can be tracked, making it possible to stably detect the line of sight.

（第２実施形態） (Second embodiment)
図９は、第２実施形態における表情加工プログラムをインストールした情報処理装置１のシステムブロック図である。 FIG. 9 is a system block diagram of the information processing device 1 in which the facial expression processing program according to the second embodiment is installed.

第２実施形態では、情報処理装置１は、第１実施形態における構成に加え、場面入力部１２を備える。場面入力部１２は、例えばプルダウンメニューで表されるユーザインタフェース（不図示）を提供する。場面入力部１２は、ユーザから、例えば入力装置３を用いて場面情報の入力を受け付ける。場面情報は例えば、「ディスカッション」、「プレゼンテーション」または「チャット」といった、場面の雰囲気を表す情報であり、複数のカテゴリから選択することができる。また、場面入力部１２は、プルダウンメニューによる選択に限定されず、マウスによるアイコンの選択、キーボードによる文字入力、マイクによる音声の入力またはセンサーによる入力など、様々な入力方法を採用することができる。このように、場面入力部１２への入力は、マウス以外の入力装置３（例えば、キーボードやマイク）を用いて行われてもよい。 In the second embodiment, the information processing device 1 includes a scene input section 12 in addition to the configuration in the first embodiment. The scene input unit 12 provides a user interface (not shown) represented by a pull-down menu, for example. The scene input unit 12 receives input of scene information from the user using, for example, the input device 3. The scene information is information representing the atmosphere of the scene, such as "discussion," "presentation," or "chat," and can be selected from a plurality of categories. Further, the scene input unit 12 is not limited to selection using a pull-down menu, and can employ various input methods such as selecting an icon using a mouse, inputting characters using a keyboard, inputting voice using a microphone, or inputting using a sensor. In this way, input to the scene input section 12 may be performed using the input device 3 (for example, a keyboard or a microphone) other than the mouse.

本実施形態における画像処理部８は、ユーザの顔を含む画像と、ユーザからの感情情報と、場面情報とを受け付け、感情情報と、場面情報とに基づいて、この画像から、ユーザの顔の表情を加工した画像を生成する。 The image processing unit 8 in this embodiment receives an image including the user's face, emotional information from the user, and scene information, and, based on the emotional information and the scene information, generates from this image an image in which the user's facial expression has been processed.

図１０は、第２実施形態における表情加工プログラムのフローチャートを示す。ここでは、ユーザの顔を含む画像と、感情情報と、場面情報とに基づいて、ユーザの顔の表情を加工した画像を生成し、アバターとして表示装置４’に出力するフローを説明する。また、ステップＳ２１およびＳ２３～２５については、説明を省略する。 FIG. 10 shows a flowchart of the facial expression processing program in the second embodiment. Here, a flow will be described in which an image in which the user's facial expression is processed is generated based on an image including the user's face, emotional information, and scene information, and is output as an avatar to the display device 4'. Furthermore, descriptions of steps S21 and S23 to S25 will be omitted.

ステップＳ２２では、ユーザが場面入力部１２に入力装置３を用いて場面情報を入力する。ステップＳ２６では、ＡＩ９が、感情を分析した画像と、感情情報と、場面情報と、記憶部６に記憶される教師画像７を比較し、感情表現に足りない顔の動きを分析する。ステップＳ２７では、加工器１０が、比較の結果から感情情報に対応するようにユーザの顔を含む画像を加工する。ステップＳ２８では、アバター生成器１１が表情を加工した画像に基づいてユーザのアバターを生成する。アバターの表情は、ユーザの顔を加工した画像と対応する表情となるように生成される。ステップＳ２９では、アバター生成器１１が、生成したアバターの画像を表示装置４’に出力する。 In step S22, the user inputs scene information into the scene input section 12 using the input device 3. In step S26, the AI 9 compares the emotion-analyzed image, emotion information, scene information, and the teacher image 7 stored in the storage unit 6, and analyzes facial movements that are insufficient to express emotion. In step S27, the processor 10 processes the image including the user's face so that it corresponds to the emotional information based on the comparison result. In step S28, the avatar generator 11 generates the user's avatar based on the image with processed facial expressions. The avatar's facial expression is generated to correspond to the processed image of the user's face. In step S29, the avatar generator 11 outputs the generated avatar image to the display device 4'.

このフローチャートでは、ステップＳ２３からＳ２５までのフローは、ステップＳ２１およびＳ２２のフローの後に行われる記載となっているが、ステップＳ２１およびＳ２２のフローの前に行われてもよい。つまり、感情情報および場面情報の入力の後に感情の分析を行う方法としてもよく、また感情の分析を先に完了させた後、ユーザから感情情報および場面情報の入力を受け付ける方法としてもよい。 In this flowchart, steps S23 to S25 are described as being performed after steps S21 and S22, but may be performed before steps S21 and S22. That is, a method may be adopted in which emotion analysis is performed after the input of emotional information and scene information, or a method may be adopted in which emotion analysis is completed first and then input of emotional information and scene information is received from the user.

次に、ユーザが、感情入力部５に「楽しい」という感情情報を入力し、場面入力部１２に、「チャット」という場面情報を入力した場合における出力画像の例を説明する。ユーザの顔を含む画像ついては、実施形態１と同様に図５を用いて説明する。図１１は第２実施形態におけるユーザの表情を加工した画像を表示装置４’に表示した例である。この例では、場面情報である「チャット」という砕けたコミュニケーションであることを考慮して、楽しいという感情表現について、さらに強調を加えている。第１実施形態とは異なり、ＡＩ９によって、感情表現に足りない顔の動きは、口元に加え、目元であると分析される。加工器１０は、図１１で示すとおり、口元および目元について、「楽しい」という感情表現になるように加工する。表示装置４’には、「無表情」であるユーザ２０を表示する代わりに、「楽しい」表情が強調されたユーザ２４を含む画像を出力する。 Next, an example of an output image when the user inputs emotional information such as "fun" into the emotional input section 5 and scene information such as "chat" into the scene input section 12 will be described. The image including the user's face will be described using FIG. 5 as in the first embodiment. FIG. 11 is an example in which an image processed with a user's facial expression according to the second embodiment is displayed on the display device 4'. In this example, considering the informal communication of "chat" which is situational information, more emphasis is placed on the emotional expression of fun. Unlike the first embodiment, the AI9 analyzes facial movements that are insufficient to express emotion as being in the eyes in addition to the mouth. As shown in FIG. 11, the processing device 10 processes the mouth and eyes so that they express the emotion of "fun." Instead of displaying the user 20 with a "neutral" expression, the display device 4' outputs an image including the user 24 with an emphasized "happy" expression.
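One conceivable way to reflect the scene information in the processing is to scale the target strength of the expression by a scene-dependent gain, as in the sketch below. The gain values, the table name `SCENE_GAIN`, and the per-region target scores are assumptions for illustration; the description states only that the "fun" expression is emphasized further for "chat."

```python
# Assumed gains per scene; only the stronger emphasis for "chat" follows the
# description. The numeric values themselves are hypothetical.
SCENE_GAIN = {"discussion": 1.0, "presentation": 1.0, "chat": 1.5}


def scaled_targets(base: dict[str, float], scene: str) -> dict[str, float]:
    """Scale per-region target expression strengths (0 to 1) for the scene."""
    gain = SCENE_GAIN.get(scene, 1.0)  # unknown scenes leave targets unchanged
    # Clamp to 1.0 so that an emphasized expression stays within range.
    return {region: min(1.0, value * gain) for region, value in base.items()}


base_targets = {"mouth": 0.6, "eyes": 0.4}
chat_targets = scaled_targets(base_targets, "chat")
```

Raising the targets in this way also enlarges the gap to the user's current expression, which is consistent with additional regions such as the eyes being identified for processing in the "chat" example.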

アバターとして出力する場合、アバター生成器１１は、ユーザの表情を加工した画像からアバターを生成し、表示装置４’に出力することができる。図１２は、第２実施形態におけるアバターを表示装置４’に表示した例である。図１１におけるユーザの顔の表情と対応するように「楽しい」表情が強調されたアバター２５を含む画像が出力される。 When outputting as an avatar, the avatar generator 11 can generate the avatar from an image processed with the user's facial expression and output it to the display device 4'. FIG. 12 is an example in which the avatar according to the second embodiment is displayed on the display device 4'. An image including the avatar 25 with an emphasized "happy" expression corresponding to the facial expression of the user in FIG. 11 is output.

本実施形態によれば、情報処理装置１は、ユーザによる感情情報だけではなく、場面情報を考慮することにより、場面の雰囲気に合わせてユーザの顔の表情を加工した画像を出力することができる。そのため、ユーザは、自分が表現したい感情をより正確に、かつ相手に伝わるように表現することができ、コミュニケーションをより向上させることができる。 According to the present embodiment, by considering not only the emotional information from the user but also the scene information, the information processing device 1 can output an image in which the user's facial expression has been processed to match the atmosphere of the scene. Therefore, the user can express the emotion he or she wants to express more accurately and in a way that is conveyed to the other party, and communication can be further improved.

（第３実施形態） (Third embodiment)
図１３は、第３実施形態における表情加工プログラムをインストールした情報処理装置１のシステムブロック図である。 FIG. 13 is a system block diagram of the information processing device 1 in which the facial expression processing program according to the third embodiment is installed.

第３実施形態では、情報処理装置１は、第１実施形態における構成に加え、フィードバック入力部１３を備える。フィードバック入力部１３は、例えばプルダウンメニューで表されるユーザインタフェース（不図示）を提供する。フィードバック入力部１３は、加工器１０が生成したユーザの顔の表情を加工して得られた画像、またはアバター生成器１１が生成したアバターの表現が最適かどうかについて、ユーザからフィードバックの入力を受け付ける。フィードバックは例えば、「良」または「否」といった、ユーザの顔の表情を加工して得られた画像に関する良否を表す情報であり、ユーザは、フィードバックについて、複数のカテゴリから選択することができる。また、フィードバック入力部１３は、プルダウンメニューによる選択に限定されず、マウスによるアイコンの選択、キーボードによる文字入力またはマイクによる音声の入力など、様々な入力方法を採用することができる。このように、フィードバック入力部１３への入力は、マウス以外の入力装置（例えば、キーボードやマイク）を用いて行われてもよい。 In the third embodiment, the information processing device 1 includes a feedback input unit 13 in addition to the configuration in the first embodiment. The feedback input unit 13 provides a user interface (not shown) represented by a pull-down menu, for example. The feedback input unit 13 receives feedback input from the user as to whether the image obtained by processing the user's facial expression, generated by the processor 10, or the expression of the avatar generated by the avatar generator 11 is optimal. The feedback is, for example, information such as "good" or "not good" indicating the quality of the image obtained by processing the user's facial expression, and the user can select the feedback from a plurality of categories. Further, the feedback input unit 13 is not limited to selection using a pull-down menu, and can employ various input methods such as selecting an icon using a mouse, inputting characters using a keyboard, or inputting voice using a microphone. In this way, input to the feedback input unit 13 may be performed using an input device other than the mouse (for example, a keyboard or a microphone).

本実施形態では、加工器１０が生成したユーザの顔の表情を加工して得られた画像またはアバター生成器１１が生成したアバターの表現が最適かどうかについて、ユーザがフィードバックを与える。これにより、画像処理部８が、ユーザの表情の加工の再生成またはアバターの再生成を行うことができる。また、ＡＩ９がフィードバックに基づいて追加学習を行うことができる。 In this embodiment, the user gives feedback as to whether the image obtained by processing the user's facial expression generated by the processing device 10 or the expression of the avatar generated by the avatar generator 11 is optimal. Thereby, the image processing unit 8 can regenerate the processing of the user's facial expression or regenerate the avatar. Additionally, the AI 9 can perform additional learning based on feedback.

図１４は、第３実施形態における表情加工プログラムのフローチャートを示す。このフローチャートでは、アバター生成器１１が生成したアバターに対して、フィードバックを与える例を示す。 FIG. 14 shows a flowchart of the facial expression processing program in the third embodiment. This flowchart shows an example in which feedback is given to the avatar generated by the avatar generator 11.

Ｓ３１からＳ３８においては、図４と同様なフローにより、アバター出力を実施する。ステップＳ３９において、ユーザは、表示装置４’に表示されたアバターの表現が最適かどうかについて、フィードバック入力部１３にフィードバックを入力する。 In S31 to S38, avatar output is performed according to a flow similar to that in FIG. In step S39, the user inputs feedback into the feedback input section 13 regarding whether the expression of the avatar displayed on the display device 4' is optimal.

ステップＳ３９において、ユーザが、アバターの表現が最適でないというフィードバックを入力した場合、再度ステップＳ３６からＳ３８のフローを繰り返し、アバターの再生成および再出力を行う。ここで、ステップＳ３６において、加工器１０が、表情の加工を行う際は、別の教師画像７を参照して表情の加工などを行うことが考えられる。 In step S39, if the user inputs feedback that the expression of the avatar is not optimal, the flow from steps S36 to S38 is repeated again to regenerate and reoutput the avatar. Here, in step S36, when processing the facial expression, the processor 10 may refer to another teacher image 7 to process the facial expression.

また、ステップＳ３９において、ユーザが、アバターの表現が最適であるというフィードバックを入力した場合、ステップＳ４０において、ステップＳ３８で作成されたアバターの出力を継続する。ステップＳ４１において、ＡＩ９が、ステップＳ３９でユーザから得られたフィードバックに基づいて追加学習を行う。ＡＩ９が、フィードバックに基づいた追加学習を行うことで、表情の加工の精度を向上させることができる。 Further, in step S39, if the user inputs feedback indicating that the expression of the avatar is optimal, in step S40, the output of the avatar created in step S38 is continued. In step S41, the AI 9 performs additional learning based on the feedback obtained from the user in step S39. By performing additional learning based on feedback, the AI 9 can improve the accuracy of facial expression processing.
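The feedback loop of steps S36 to S41 can be sketched as follows. The callbacks `render` and `ask_user`, the string identifiers for teacher images, and the feedback log are hypothetical stand-ins; how the AI 9 actually performs additional learning is not specified in the description and is outside this sketch.

```python
from typing import Callable, Iterable, Optional


def refine_with_feedback(
    teachers: Iterable[str],
    render: Callable[[str], str],
    ask_user: Callable[[str], bool],
) -> tuple[Optional[str], list[tuple[str, bool]]]:
    """Regenerate the avatar with a different teacher image until accepted.

    Returns the accepted avatar (or None if every attempt was rejected) and
    a log of (teacher, accepted) pairs that could feed additional learning.
    """
    log: list[tuple[str, bool]] = []
    for teacher in teachers:
        avatar = render(teacher)         # S36 to S38: process, generate, output
        accepted = ask_user(avatar)      # S39: user feedback on the expression
        log.append((teacher, accepted))  # kept for S41 (additional learning)
        if accepted:
            return avatar, log           # S40: keep outputting this avatar
    return None, log


# Example with stand-in callbacks: the user accepts the second attempt.
avatar, feedback_log = refine_with_feedback(
    ["teacher_a", "teacher_b", "teacher_c"],
    render=lambda teacher: f"avatar_from_{teacher}",
    ask_user=lambda candidate: candidate == "avatar_from_teacher_b",
)
```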

また、本フローチャートでは、ユーザは、アバターの表現が最適かどうかについて、フィードバックを入力する例を説明したが、アバターを出力しない場合は、ユーザの顔の表情を加工して得られた画像についてフィードバックを入力する。 In this flowchart, an example was described in which the user inputs feedback as to whether the expression of the avatar is optimal; when no avatar is output, the user instead inputs feedback on the image obtained by processing the user's facial expression.

また、アバターやユーザの顔の表情を加工して得られた画像は表示装置４’に出力する前に、ユーザが表情を確認するため、表示装置４に出力することとしてもよい。この場合、表示装置４に出力されたアバターなどに基づいてフィードバックを入力する。 Further, the image obtained by processing the facial expression of the avatar or the user may be output to the display device 4 for the user to check the facial expression before being output to the display device 4'. In this case, feedback is input based on the avatar etc. output to the display device 4.

本実施形態によれば、情報処理装置１は、ユーザによる感情情報に加え、フィードバックの入力を受け付ける。アバターやユーザの顔の表情を加工して得られた画像について、最適な表現となるように再度加工を行うことで、ユーザは、自分が表現したい感情をより正確に、かつ相手に伝わるように表現することができ、コミュニケーションをより向上させることができる。 According to this embodiment, the information processing device 1 receives input of feedback from the user in addition to emotional information. By reprocessing the avatar or the image obtained by processing the user's facial expression so that it becomes the optimal expression, the user can express the emotion he or she wants to express more accurately and in a way that is conveyed to the other party, and communication can be further improved.

（第４実施形態） (Fourth embodiment)
図１５は、第４実施形態における表情加工プログラムをインストールした情報処理装置１のシステムブロック図である。 FIG. 15 is a system block diagram of the information processing device 1 in which the facial expression processing program according to the fourth embodiment is installed.

第４実施形態では、情報処理装置１は、第１実施形態における構成に加え、感情情報取得部１４を備える。感情情報取得部１４は、オンライン会議などにおける相手側ユーザの顔を含む画像に基づいて、相手側ユーザの感情情報を取得する。相手側ユーザの顔を含む画像は、本開示の第３画像の例である。以下、説明を分かりやすくするため、表情の加工を行うユーザを「ユーザＡ」とし、相手側ユーザを「ユーザＢ」として説明する。ユーザＢの顔を含む画像は、例えば、撮像装置２’によって撮像された画像を、情報処理装置１がオンライン会議などで受信した後、情報処理装置１の感情情報取得部１４で受け付けることが考えられる。 In the fourth embodiment, the information processing device 1 includes an emotional information acquisition unit 14 in addition to the configuration in the first embodiment. The emotional information acquisition unit 14 acquires emotional information of the other user based on an image including the other user's face in an online conference or the like. The image including the other user's face is an example of the third image of the present disclosure. Hereinafter, to make the explanation easier to follow, the user whose facial expression is processed will be referred to as "user A" and the other user as "user B." The image including user B's face may be handled, for example, as follows: the information processing device 1 receives an image captured by the imaging device 2' in an online conference or the like, and the emotional information acquisition unit 14 of the information processing device 1 then accepts that image.

本実施形態における情報処理装置１の画像処理部８は、撮像装置２により撮像されたユーザＡの顔を含む画像と、ユーザＡの感情情報と、ユーザＢの感情情報とを受け付け、ユーザＡの顔の表情を加工した画像を生成する。 The image processing unit 8 of the information processing device 1 in this embodiment receives an image including user A's face captured by the imaging device 2, user A's emotional information, and user B's emotional information, and generates an image in which user A's facial expression has been processed.

図１６は、第４実施形態における表情加工プログラムのフローチャートを示す。ここでは、ユーザＡの顔を含む画像と、ユーザＡの感情情報と、ユーザＢの感情情報とに基づいて、ユーザＡのアバターをユーザＢの表示装置４’に出力するフローを説明する。また、ステップＳ５１およびＳ５５～Ｓ５７については説明を省略する。 FIG. 16 shows a flowchart of the facial expression processing program in the fourth embodiment. Here, a flow will be described in which an avatar of user A is output to the display device 4' of user B based on an image including the face of user A, emotional information of user A, and emotional information of user B. Furthermore, descriptions of steps S51 and S55 to S57 will be omitted.

ステップＳ５２では、情報処理装置１の感情情報取得部１４は、ユーザＢの画像を受け付ける。ステップＳ５３では、この感情情報取得部１４が、受け付けた画像から、ユーザＢの顔の表情および目線を検知する。ステップＳ５４では、この感情情報取得部１４が、表情および目線を検知した結果から、ユーザＢの感情を分析する。この感情情報取得部１４は、例えば、ＡＩ９と同様な手法により感情を分析することが考えられる。この感情情報取得部１４は、畳み込みニューラルネットワークを利用し、受け付けた画像に対して畳み込み演算を実施することにより、画像における特徴を抽出する。この感情情報取得部１４は、抽出した特徴に基づいて、ユーザＢの顔の表情および目線を検知し、感情を分析する。ここで分析されるユーザＢの感情は、感情を何らかの形で数値化したものでもよく、または、「無表情」といった定性的に表現したものでもよい。このように、ステップＳ５２～Ｓ５４を通じて、この感情情報取得部１４は、ユーザＢの感情情報を取得する。 In step S52, the emotional information acquisition unit 14 of the information processing device 1 receives the image of user B. In step S53, the emotional information acquisition unit 14 detects user B's facial expression and line of sight from the received image. In step S54, the emotion information acquisition unit 14 analyzes the emotion of user B based on the detected facial expression and line of sight. It is conceivable that the emotion information acquisition unit 14 analyzes emotions using a method similar to the AI 9, for example. The emotional information acquisition unit 14 extracts features in the image by performing a convolution operation on the received image using a convolutional neural network. The emotional information acquisition unit 14 detects the facial expression and line of sight of user B based on the extracted features, and analyzes the emotion. The emotion of user B analyzed here may be expressed numerically in some form, or may be qualitatively expressed such as "expressionless." In this way, the emotional information acquisition unit 14 acquires user B's emotional information through steps S52 to S54.

In step S58, the AI 9 of the information processing device 1 compares the image from which user A's emotion was analyzed, user A's emotional information, user B's emotional information, and the teacher image 7 stored in the storage unit 6 of the information processing device 1, and analyzes which facial movements are lacking in user A's emotional expression. In step S59, the processing device 10 of the information processing device 1 processes the image including user A's face, based on the comparison result, so that it corresponds to the emotional information. In step S60, the avatar generator 11 of the information processing device 1 generates an avatar of user A based on the image with the processed facial expression. The avatar's facial expression is generated so as to correspond to the expression in the processed image of user A's face. In step S61, the generated avatar image is output to the display device 4'.
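Steps S58 to S60 can be sketched as follows under strong simplifying assumptions. Images are replaced by dictionaries of facial-movement intensities between 0 and 1, and the movement names, the `TEACHER_EXPRESSIONS` table, and the `PARTNER_GAIN` rule for reflecting user B's emotion are all hypothetical. The disclosure does not specify how user B's emotional information changes the processing.

```python
from dataclasses import dataclass
from typing import Dict

Expression = Dict[str, float]  # facial movement name -> intensity in [0, 1]

# Hypothetical stand-in for the teacher image 7: target intensities per emotion.
TEACHER_EXPRESSIONS: Dict[str, Expression] = {
    "fun": {"mouth_corner_raise": 0.8, "eye_squint": 0.5},
}

# Assumed rule: emphasize more strongly when user B looks expressionless.
PARTNER_GAIN: Dict[str, float] = {"expressionless": 1.0, "fun": 0.5}

def missing_movements(observed: Expression, teacher: Expression) -> Expression:
    """S58: how far each movement falls short of the teacher expression."""
    return {name: max(0.0, target - observed.get(name, 0.0))
            for name, target in teacher.items()}

def process_expression(observed: Expression, deficit: Expression,
                       gain: float) -> Expression:
    """S59: add the scaled deficit to the observed expression, capped at 1.0."""
    processed = dict(observed)
    for name, shortfall in deficit.items():
        processed[name] = min(1.0, processed.get(name, 0.0) + gain * shortfall)
    return processed

@dataclass
class Avatar:
    owner: str
    expression: Expression

def generate_avatar(owner: str, processed: Expression) -> Avatar:
    """S60: the avatar takes the same expression as the processed image."""
    return Avatar(owner=owner, expression=dict(processed))

def run_flow(observed: Expression, emotion_a: str, emotion_b: str) -> Avatar:
    teacher = TEACHER_EXPRESSIONS[emotion_a]
    deficit = missing_movements(observed, teacher)
    gain = PARTNER_GAIN.get(emotion_b, 1.0)
    return generate_avatar("user A", process_expression(observed, deficit, gain))

observed_a = {"mouth_corner_raise": 0.2, "eye_squint": 0.5}
avatar = run_flow(observed_a, emotion_a="fun", emotion_b="expressionless")
```

In this sketch only the movements that fall short of the teacher expression are changed, which mirrors the idea of supplementing the facial movements that are lacking rather than replacing the whole expression.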

According to this embodiment, the information processing device 1 processes user A's facial expression by reflecting user B's emotional information in addition to the emotional information input by user A. User A can therefore express the emotion he or she wants to convey more accurately, and in a way that reaches the other party, in accordance with user B's emotion, which further improves communication.

The facial expression processing program may also be applied as follows. For example, it may be applied to complaint handling at call centers. By conveying emotions through an avatar, an operator can keep customers from making unnecessarily aggressive remarks.

As another example, the facial expression processing program may be applied to crime prevention in unmanned stores. Having an avatar with human-like facial expressions watch over the store can help deter shoplifting and abnormal behavior.

As another example, the facial expression processing program may be applied to remote customer service using digital signage in unmanned stores. By richly expressing a store clerk's facial expressions through an avatar and displaying the avatar remotely on digital signage, the avatar can provide customer service equivalent to in-person service.

(Fifth embodiment)
FIG. 17 shows an example of the hardware configuration of the information processing device 1 in the fifth embodiment. The information processing device 1 of the fifth embodiment corresponds to an example of the information processing device 1 of the first embodiment.

The information processing device 1 according to this embodiment is configured by a computer device 200. The computer device 200 includes a CPU (Central Processing Unit) 201, a main storage device 202, an auxiliary storage device 203, a communication interface 204, and an input/output interface 205, which are interconnected by a bus 206.

The CPU 201 executes, on the main storage device 202, a computer program that implements the above-described functional configurations of the information processing device 1. When the CPU 201 executes the computer program, the functions of the emotion input unit 5 and the image processing unit 8 in FIG. 3 are realized. This computer program is, for example, a facial expression processing program, an online conference program, or a streaming distribution program.

The main storage device 202 stores the program that implements the processing of this embodiment, data necessary for executing the program, data generated by executing the program, and the like. The program is loaded onto the main storage device 202 and executed. The main storage device 202 is, for example, a RAM (Random Access Memory), but is not limited to this.

The auxiliary storage device 203 stores the program, data necessary for executing the program, data generated by executing the program, and the like. The program and data are read into the main storage device 202 when the processing of this embodiment is performed. The auxiliary storage device 203 is, for example, a hard disk, an optical disk, a flash memory, or a magnetic tape, but is not limited to these. The storage unit 6 in FIG. 3 may be constructed on the auxiliary storage device 203.

The communication interface 204 is a circuit for communicating with an external computer device, by wire or wirelessly, in an online conference or the like.

The input/output interface 205 is a circuit for connecting to the imaging device 2, to input devices such as a mouse (an example of the input device 3), a keyboard, and a microphone, and to output devices such as the display device 4.

The bus 206 is a circuit for interconnecting the CPU 201, the main storage device 202, the auxiliary storage device 203, the communication interface 204, and the input/output interface 205.

Note that the above-mentioned program may be installed in advance on the computer device 200, or may be stored in a storage medium such as a CD-ROM. The program may also be uploaded to the Internet.

Note that the computer device 200 may include one or more of each of the CPU 201, the main storage device 202, the auxiliary storage device 203, the communication interface 204, and the input/output interface 205.

Further, the information processing device 1 may be configured by a single computer device 200, or may be configured as a system consisting of a plurality of mutually connected computer devices 200.
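The composition rules described above (one or more of each component per computer device, and one or more computer devices per information processing device) can be written down as a small data model. This is only an illustrative sketch: the class names, field names, and component labels are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ComputerDevice:
    """Models the computer device 200: at least one of each component."""
    cpus: List[str]
    main_storage: List[str]
    auxiliary_storage: List[str]
    communication_interfaces: List[str]
    io_interfaces: List[str]

    def __post_init__(self) -> None:
        # Reject a device that lacks any of the five component types.
        for name, parts in vars(self).items():
            if not parts:
                raise ValueError(f"{name} must contain at least one component")

@dataclass
class InformationProcessingDevice:
    """Models the information processing device 1: one or more computer devices."""
    computers: List[ComputerDevice]

    def __post_init__(self) -> None:
        if not self.computers:
            raise ValueError("at least one computer device is required")

single = ComputerDevice(
    cpus=["CPU 201"],
    main_storage=["RAM"],
    auxiliary_storage=["flash memory"],
    communication_interfaces=["wired LAN"],
    io_interfaces=["USB"],
)
standalone = InformationProcessingDevice(computers=[single])
clustered = InformationProcessingDevice(computers=[single, single])
```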

According to this configuration, the functions of the facial expression processing program in the first embodiment can be realized by software. Although FIG. 17 shows an example of the hardware configuration of the information processing device 1 in FIG. 3, the functions of the facial expression processing program can likewise be realized by software in the other embodiments using a similar configuration.

Although embodiments of the present disclosure have been described above, these embodiments may be implemented with various modifications without departing from the gist of the present disclosure. For example, two or more embodiments may be combined.

Note that the present disclosure can also take the following configurations.

(1)
A facial expression processing program that causes a computer to execute a facial expression processing method including:
an emotion input step of receiving an input of emotional information from a user; and
an image processing step of receiving a first image including the user's face and the emotional information, and generating a second image from the first image by processing the facial expression of the user in the first image based on the emotional information.

(2)
The facial expression processing program according to (1), further comprising an avatar generation step of generating, based on the second image, an avatar that is a character displayed as an alter ego of the user in a virtual space on a network.

(3)
The facial expression processing program according to (2), wherein the facial expression of the avatar is generated so as to correspond to the facial expression of the user in the second image.

(4)
The facial expression processing program according to (1), wherein the image processing step processes the facial expression of the user in the first image into an expression corresponding to the emotional information.

(5)
The facial expression processing program according to (1), wherein the image processing step acquires a teacher image including a human facial expression and processes the facial expression of the user in the first image by comparing the first image with the teacher image.

(6)
The facial expression processing program according to (5), wherein the image processing step analyzes the user's emotion from the first image and processes the facial expression of the user in the first image based on a result of the analysis, the emotional information, and a result of the comparison.

(7)
The facial expression processing program according to (1), wherein the first image is an image captured by an imaging device.

(8)
The facial expression processing program according to (7), wherein the imaging device is an RGB imaging device or an RGBIR imaging device.

(9)
The facial expression processing program according to (1), further comprising a scene input step of receiving an input of scene information from the user,
wherein the image processing step receives the first image, the emotional information, and the scene information, and generates the second image from the first image by processing the facial expression of the user in the first image based on the emotional information and the scene information.

(10)
The facial expression processing program according to (1), further comprising a feedback input step of receiving, from the user, an input of feedback regarding the second image,
wherein the image processing step performs learning based on the feedback.

(11)
The facial expression processing program according to (1), wherein the second image is output as an output image in an online conference.

(12)
The facial expression processing program according to (1), wherein the second image is output to the user's display device or the other user's display device.

(13)
The facial expression processing program according to (2), wherein the avatar is output as an output image in an online conference.

(14)
The facial expression processing program according to (2), wherein the avatar is output to the user's display device or the other user's display device.

(15)
The facial expression processing program according to (1), further comprising an emotional information acquisition step of acquiring emotional information of the other user,
wherein the image processing step receives the first image, the emotional information, and the emotional information of the other user, and generates the second image from the first image by processing the facial expression of the user in the first image based on the emotional information and the emotional information of the other user.

(16)
The facial expression processing program according to (15), wherein the emotional information acquisition step receives, from the other user, a third image including the face of the other user, and acquires the emotional information of the other user based on the third image.

(17)
The facial expression processing program according to (16), wherein the emotional information acquisition step acquires the emotional information of the other user by analyzing the emotion of the other user from the third image.

(18)
The facial expression processing program according to (16), wherein the third image is an image captured by an imaging device.

(19)
A facial expression processing device including:
an emotion input unit that receives an input of emotional information from a user; and
an image processing unit that receives a first image including the user's face and the emotional information, and generates a second image from the first image by processing the facial expression of the user in the first image based on the emotional information.

(20)
A facial expression processing method including:
an emotion input step of receiving an input of emotional information from a user; and
an image processing step of receiving a first image including the user's face and the emotional information, and generating a second image from the first image by processing the facial expression of the user in the first image based on the emotional information.
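As an illustration of how configurations (9) and (10) could extend the basic method of (1) and (20), the following sketch reduces the first and second images to a single expression intensity between 0 and 1. Everything in it is an assumption made for illustration: the `SCENE_SCALE` values, the feedback labels, and the additive update of `gain` are hypothetical, and the disclosure does not specify the learning procedure.

```python
from typing import Dict

# Hypothetical scene information: how strongly an expression may be emphasized.
SCENE_SCALE: Dict[str, float] = {"business meeting": 0.6, "casual chat": 1.0}

# Hypothetical feedback labels and the direction in which they move the gain.
FEEDBACK_STEP: Dict[str, int] = {"too_weak": 1, "ok": 0, "too_strong": -1}

class ExpressionProcessor:
    """Toy image processing step working on one expression intensity."""

    def __init__(self, gain: float = 1.0, learning_rate: float = 0.1) -> None:
        self.gain = gain
        self.learning_rate = learning_rate

    def process(self, intensity: float, target: float, scene: str) -> float:
        """Move the observed intensity toward the target implied by the
        emotional information, scaled by the scene and the learned gain."""
        strength = self.gain * SCENE_SCALE.get(scene, 1.0)
        processed = intensity + strength * (target - intensity)
        return max(0.0, min(1.0, processed))

    def learn(self, feedback: str) -> None:
        """Feedback input step: nudge the gain up or down, never below zero."""
        step = FEEDBACK_STEP[feedback]
        self.gain = max(0.0, self.gain + self.learning_rate * step)

processor = ExpressionProcessor()
casual = processor.process(0.2, target=0.8, scene="casual chat")
formal = processor.process(0.2, target=0.8, scene="business meeting")
processor.learn("too_strong")
after_feedback = processor.process(0.2, target=0.8, scene="casual chat")
```

With these assumed values the same input is emphasized less in the business meeting scene than in the casual chat, and a "too_strong" feedback reduces the emphasis applied to later images.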

1: Information processing device, 2: Imaging device, 3: Input device, 4: Display device, 5: Emotion input unit,
1': Information processing device of the other user, 2': Imaging device of the other user,
3': Input device of the other user, 4': Display device of the other user,
6: Storage unit, 7: Teacher image, 8: Image processing unit, 9: AI, 10: Processing device,
11: Avatar generator, 12: Scene input unit, 13: Feedback input unit,
14: Emotional information acquisition unit, 20: User who is "expressionless",
21: User corresponding to the emotional information "fun",
22: Avatar corresponding to the "expressionless" facial expression,
23: Avatar corresponding to the "fun" facial expression,
24: User whose "fun" expression is emphasized,
25: Avatar whose "fun" expression is emphasized, 100: Network,
200: Computer device, 201: CPU, 202: Main storage device,
203: Auxiliary storage device, 204: Communication interface,
205: Input/output interface, 206: Bus,
300: Streaming distribution server, 500: Information processing device group

Claims

A facial expression processing program that causes a computer to execute a facial expression processing method including:
an emotion input step of receiving an input of emotional information from a user; and
an image processing step of receiving a first image including the user's face and the emotional information, and generating a second image from the first image by processing the facial expression of the user in the first image based on the emotional information.

The facial expression processing program according to claim 1, further comprising an avatar generation step of generating an avatar, which is a character to be displayed as an alter ego of the user in a virtual space on a network, based on the second image.

The facial expression processing program according to claim 2, wherein the facial expression of the avatar is generated to correspond to the facial expression of the user in the second image.

The facial expression processing program according to claim 1, wherein the image processing step processes the facial expression of the user in the first image into an expression corresponding to the emotional information.

The facial expression processing program according to claim 1, wherein the image processing step acquires a teacher image including a human facial expression and processes the facial expression of the user in the first image by comparing the first image with the teacher image.

The facial expression processing program according to claim 5, wherein the image processing step analyzes the user's emotion from the first image and processes the facial expression of the user in the first image based on a result of the analysis, the emotional information, and a result of the comparison.

The facial expression processing program according to claim 1, wherein the first image is an image captured by an imaging device.

The facial expression processing program according to claim 7, wherein the imaging device is an RGB imaging device or an RGBIR imaging device.

The facial expression processing program according to claim 1, further comprising a scene input step of receiving an input of scene information from the user,
wherein the image processing step receives the first image, the emotional information, and the scene information, and generates the second image from the first image by processing the facial expression of the user in the first image based on the emotional information and the scene information.

The facial expression processing program according to claim 1, further comprising a feedback input step of receiving, from the user, an input of feedback regarding the second image,
wherein the image processing step performs learning based on the feedback.

The facial expression processing program according to claim 1, wherein the second image is output as an output image in an online conference.

The facial expression processing program according to claim 1, wherein the second image is output to the display device of the user or the display device of the other user.

The facial expression processing program according to claim 2, wherein the avatar is output as an output image in an online conference.

The facial expression processing program according to claim 2, wherein the avatar is output to the user's display device or the other user's display device.

The facial expression processing program according to claim 1, further comprising an emotional information acquisition step of acquiring emotional information of the other user,
wherein the image processing step receives the first image, the emotional information, and the emotional information of the other user, and generates the second image from the first image by processing the facial expression of the user in the first image based on the emotional information and the emotional information of the other user.

The facial expression processing program according to claim 15, wherein the emotional information acquisition step receives, from the other user, a third image including the face of the other user, and acquires the emotional information of the other user based on the third image.

The facial expression processing program according to claim 16, wherein the emotional information acquisition step acquires the emotional information of the other user by analyzing the emotion of the other user from the third image.

The facial expression processing program according to claim 16, wherein the third image is an image captured by an imaging device.

A facial expression processing device including:
an emotion input unit that receives an input of emotional information from a user; and
an image processing unit that receives a first image including the user's face and the emotional information, and generates a second image from the first image by processing the facial expression of the user in the first image based on the emotional information.

A facial expression processing method including:
an emotion input step of receiving an input of emotional information from a user; and
an image processing step of receiving a first image including the user's face and the emotional information, and generating a second image from the first image by processing the facial expression of the user in the first image based on the emotional information.