JP2023131824A

JP2023131824A - Information processing device, control program, and control method

Info

Publication number: JP2023131824A
Application number: JP2022036795A
Authority: JP
Inventors: 崇志三上; Takashi Mikami; 浩石黒; Hiroshi Ishiguro; 昇吾西口; Shogo Nishiguchi
Original assignee: Avita Inc
Current assignee: Avita Inc
Priority date: 2022-03-10
Filing date: 2022-03-10
Publication date: 2023-09-22

Abstract

To provide an information processing device, a control program, and a control method that make it easier to talk with a user who is a dialogue partner. SOLUTION: In an information processing system in which a user-side terminal is communicably connected to an operator-side terminal and a server, the operator-side terminal includes a CPU. The operator talks with the user of the communicably connected user-side terminal while looking at an avatar image 210 that is shown on a display device and corresponds to the user, who is the dialogue partner. While the operator is not speaking, the operator-side terminal recognizes the user's facial expression and, when the recognized expression is a smiling or displeased expression, displays an avatar image that expresses the same expression modified in a way favorable to the operator. While the operator is speaking, the operator-side terminal recognizes the operator's facial expression and, when the recognized expression is a smiling or sad expression, displays an avatar image that expresses the same expression at a preset degree. SELECTED DRAWING: Figure 5

Description

The present invention relates to an information processing device, a control program, and a control method, and more particularly to, for example, an information processing device, a control program, and a control method used by an operator who interacts by voice with the user of a communicably connected terminal.

An example of this type of conventional information processing device is disclosed in Patent Document 1. In the information processing system disclosed in Patent Document 1, when a user wants to consult about a product or about how to use a shopping site, the user can press a call button to call and consult an operator. When the user terminal and the operator terminal are connected, the website displayed on the user terminal is displayed on the operator terminal in its current display state. In addition, an image of the operator, or an avatar image synchronized with it, is displayed on the website on the user terminal. The operator can therefore serve the user while using body and hand gestures.

Patent Document 1: Japanese Patent No. 6937534

In Patent Document 1 above, screens that the user does not intend to share, such as website screens other than the shopping site screen and other personal information, are not shown on the operator terminal. In other words, neither an image of the user nor an avatar image synchronized with it is displayed on the operator terminal, and because the operator cannot see the face of the user or of an avatar synchronized with the user, it is difficult for the operator to hold a dialogue. Displaying an image of the user on the operator terminal is conceivable, but it risks violating the user's privacy. Displaying an avatar image synchronized with the image of the user is also conceivable, but there is room for improvement in making the dialogue easier for the operator.

Therefore, the main object of the present invention is to provide a novel information processing device, control program, and control method.

Another object of the present invention is to provide an information processing device, a control program, and a control method that make it easier to interact with the other party.

A first invention is an information processing device comprising: voice detection means for detecting operator voice, which is voice uttered by an operator who interacts with a user; photographing means for photographing a face image of the operator; first recognition means for recognizing the operator's facial expression based on the face image of the operator photographed by the photographing means while the voice detection means is detecting the operator voice; avatar display means for displaying on a display device, when the operator's facial expression recognized by the first recognition means is a predetermined first facial expression, an image of an avatar that corresponds to the user who is the dialogue partner and that expresses the same facial expression as the predetermined first facial expression; and transmission means for transmitting the operator voice detected by the voice detection means to a user-side terminal used by the user.

A second invention is dependent on the first invention, wherein the predetermined first facial expressions are a smiling expression and a sad expression.

A third invention is dependent on the first or second invention, and further comprises: timing determination means for determining, while the voice detection means is detecting the operator voice, whether it is time to nod based on the operator voice; and avatar control means for causing the avatar to perform a nodding motion when the timing determination means determines that it is time to nod.

A fourth invention is dependent on any one of the first to third inventions, and further comprises: receiving means for receiving a face image of the user transmitted from the user-side terminal; and second recognition means for recognizing the user's facial expression based on the face image of the user received by the receiving means while the voice detection means is not detecting the operator voice, wherein, when the user's facial expression recognized by the second recognition means is a predetermined second facial expression, the avatar display means causes the avatar to express the same facial expression as the predetermined second facial expression, modified in a way favorable to the operator.

A fifth invention is dependent on the fourth invention, wherein the predetermined second facial expressions are a smiling expression and a displeased expression, the second recognition means further recognizes the degree of the user's smiling expression or displeased expression, and the avatar display means causes the avatar to express the smile with its degree emphasized when the user's facial expression is a smiling expression, and causes the avatar to express the displeasure with its degree softened when the user's facial expression is a displeased expression.
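The expression-selection rule described in the first, second, fourth, and fifth inventions can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the string labels, the 0.0 to 1.0 degree scale, the gain values, and the function name `choose_avatar_expression` are all assumptions introduced for illustration. The source states only that a smile is emphasized, that displeasure is softened, and that the operator's own expression is shown at a preset degree.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels; the source names the expressions but not their encoding.
SMILE, DISPLEASED, SAD = "smile", "displeased", "sad"

OPERATOR_EXPRESSIONS = {SMILE, SAD}      # "predetermined first facial expressions"
USER_EXPRESSIONS = {SMILE, DISPLEASED}   # "predetermined second facial expressions"


@dataclass(frozen=True)
class AvatarExpression:
    label: str
    degree: float  # assumed scale: 0.0 (neutral) to 1.0 (strongest)


def clamp(value: float) -> float:
    """Keep a degree inside the assumed 0.0..1.0 range."""
    return max(0.0, min(1.0, value))


def choose_avatar_expression(
    operator_speaking: bool,
    operator_expr: Optional[str],
    user_expr: Optional[str],
    user_degree: float,
    preset_degree: float = 0.5,      # assumed "preset degree"
    smile_gain: float = 1.5,         # assumed emphasis factor
    displeasure_gain: float = 0.5,   # assumed softening factor
) -> Optional[AvatarExpression]:
    """Return the expression for the user's avatar, or None to leave it unchanged."""
    if operator_speaking:
        # While the operator speaks, the avatar mirrors the operator's expression.
        if operator_expr in OPERATOR_EXPRESSIONS:
            return AvatarExpression(operator_expr, preset_degree)
        return None
    # While the operator is silent, the avatar follows the user's expression,
    # adjusted so that it is more favorable to the operator.
    if user_expr == SMILE:
        return AvatarExpression(SMILE, clamp(user_degree * smile_gain))
    if user_expr == DISPLEASED:
        return AvatarExpression(DISPLEASED, clamp(user_degree * displeasure_gain))
    return None
```

Returning `None` for any other expression is also an assumption; it stands for leaving the avatar's current expression as it is.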

A sixth invention is dependent on any one of the first to third inventions, and further comprises receiving means for receiving the user's facial expression transmitted from the user-side terminal, wherein, when the voice detection means is not detecting the operator voice and the user's facial expression received by the receiving means is a predetermined second facial expression, the avatar display means causes the avatar to express the same facial expression as the predetermined second facial expression, modified in a way favorable to the operator.

A seventh invention is dependent on the sixth invention, wherein the predetermined second facial expressions are a smiling expression and a displeased expression, the receiving means further receives the degree of the user's smiling expression or displeased expression, and the avatar display means causes the avatar to express the smile with its degree emphasized when the user's facial expression is a smiling expression, and causes the avatar to express the displeasure with its degree softened when the user's facial expression is a displeased expression.

An eighth invention is dependent on any one of the first to fifth inventions, and further comprises line-of-sight detection means for detecting the user's line of sight based on the face image of the user received by the receiving means, wherein the avatar display means sets the avatar's line of sight to match the user's line of sight detected by the line-of-sight detection means and, while the voice detection means is not detecting the operator voice, once the user's line of sight detected by the line-of-sight detection means has been directed forward for a first predetermined time, averts the avatar's line of sight for a second predetermined time regardless of the user's line of sight.

A ninth invention is dependent on the sixth or seventh invention, wherein the receiving means further receives the user's line of sight, and the avatar display means sets the avatar's line of sight to match the received line of sight of the user and, while the voice detection means is not detecting the operator voice, once the user's line of sight detected by the line-of-sight detection means has been directed forward for a first predetermined time, averts the avatar's line of sight for a second predetermined time regardless of the user's line of sight.
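The gaze rule of the eighth and ninth inventions amounts to a small timer-driven state machine, sketched below in Python. The class name `GazeController`, the gaze labels `"front"` and `"away"`, and the time values are assumptions for illustration, since the source gives no concrete values for the first and second predetermined times. The sketch also assumes that an averting period, once started, runs to completion even if the operator starts speaking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GazeController:
    first_time: float = 3.0      # "first predetermined time" in seconds (assumed value)
    second_time: float = 1.0     # "second predetermined time" in seconds (assumed value)
    averted_gaze: str = "away"   # hypothetical label for the averted direction
    _front_since: Optional[float] = None   # when the user started looking forward
    _avert_until: Optional[float] = None   # when the current averting period ends

    def update(self, now: float, user_gaze: str, operator_speaking: bool) -> str:
        """Return the gaze direction the avatar should show at time `now`."""
        if self._avert_until is not None:
            if now < self._avert_until:
                # Still averting: ignore the user's actual gaze.
                return self.averted_gaze
            # Averting period is over; restart the forward-gaze timer.
            self._avert_until = None
            self._front_since = None

        if operator_speaking or user_gaze != "front":
            # The timer only runs while the operator is silent and the user looks forward.
            self._front_since = None
            return user_gaze

        if self._front_since is None:
            self._front_since = now
        if now - self._front_since >= self.first_time:
            self._avert_until = now + self.second_time
            return self.averted_gaze
        return user_gaze
```

A caller would invoke `update` once per rendered frame or per received gaze sample, passing a monotonic timestamp.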

A tenth invention is a control program executed by an information processing device, the control program causing a processor of the information processing device to execute: a voice detection step of detecting operator voice, which is voice uttered by an operator who interacts with a user; a photographing step of photographing a face image of the operator; a recognition step of recognizing the operator's facial expression based on the face image of the operator photographed in the photographing step while the operator voice is being detected in the voice detection step; an avatar display step of displaying on a display device, when the operator's facial expression recognized in the recognition step is a predetermined facial expression, an image of an avatar that corresponds to the user who is the dialogue partner and that expresses the same facial expression as the predetermined facial expression; and a transmission step of transmitting the operator voice detected in the voice detection step to a user-side terminal used by the user.

An eleventh invention is a control method for an information processing device, the control method including: (a) a step of detecting operator voice, which is voice uttered by an operator who interacts with a user; (b) a step of photographing a face image of the operator; (c) a step of recognizing the operator's facial expression based on the face image of the operator photographed in the photographing step while the operator voice is being detected in step (a); (d) a step of displaying on a display device, when the operator's facial expression recognized in step (c) is a predetermined facial expression, an image of an avatar that corresponds to the user who is the dialogue partner and that expresses the same facial expression as the predetermined facial expression; and (e) a step of transmitting the operator voice detected in step (a) to a user-side terminal used by the user.

According to the present invention, it is possible to make it easier to hold a dialogue with the user who is the dialogue partner.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of embodiments given with reference to the drawings.

FIG. 1 is a diagram showing an information processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the electrical configuration of the user-side terminal shown in FIG. 1.
FIG. 3 is a block diagram showing the electrical configuration of the operator-side terminal shown in FIG. 1.
FIG. 4 is a diagram showing an example of a screen displayed on the display device of the user-side terminal.
FIG. 5 is a diagram showing an example of an avatar image displayed on the display device of the operator-side terminal.
FIG. 6(A) is a diagram showing examples of the avatar's smiling expression at different degrees, FIG. 6(B) is a diagram showing examples of the avatar's displeased expression at different degrees, and FIG. 6(C) is a diagram showing examples of the avatar's sad expression at different degrees.
FIG. 7 is a diagram showing an example of a memory map of the RAM of the operator-side terminal.
FIG. 8 is a diagram showing an example of the specific contents of the data storage area shown in FIG. 7.
FIG. 9 is a flow diagram showing part of an example of the control processing of the CPU of the operator-side terminal shown in FIG. 3.
FIG. 10 is a flow diagram showing another part of the example of the control processing of the CPU of the operator-side terminal shown in FIG. 3, following FIG. 9.
FIG. 11 is a flow diagram showing yet another part of the example of the control processing of the CPU of the operator-side terminal shown in FIG. 3, following FIG. 9.
FIG. 12 is a flow diagram showing still another part of the example of the control processing of the CPU of the operator-side terminal shown in FIG. 3, following FIGS. 10 and 11.
FIG. 13 is a flow diagram showing an example of the transmission/reception processing of the CPU of the operator-side terminal shown in FIG. 3.
FIG. 14 is a flow diagram showing an example of the transmission/reception processing of the CPU of the user-side terminal shown in FIG. 2.
FIG. 15 is a flow diagram showing part of the control processing of the CPU of the operator-side terminal in a second embodiment.
FIG. 16 is a flow diagram showing the transmission/reception processing of the CPU of the user-side terminal in the second embodiment.

<First Embodiment>
Referring to FIG. 1, an information processing system 10 according to the first embodiment includes a user-side terminal 12, and the user-side terminal 12 is communicably connected to an operator-side terminal 16 and a server 18 via a network 14.

Although the first embodiment shows one user-side terminal 12 and one operator-side terminal 16, in practice a plurality of user-side terminals 12 and a plurality of operator-side terminals 16 are provided. As described later, chat or talk processing is performed between one user-side terminal 12 and one operator-side terminal 16 selected by the server 18 in response to a request from that user-side terminal 12.

The user-side terminal 12 is used by a user of a predetermined service provided by the server 18, and the operator-side terminal 16 is used by an operator who responds to the user.

The user-side terminal 12 is an information processing device, for example a general-purpose smartphone, and has a browser function. In other examples, another general-purpose terminal such as a tablet PC, a notebook PC, or a desktop PC may be used as the user-side terminal 12.

The network 14 is composed of an IP network including the Internet and an access network for accessing the IP network. A public telephone network, a mobile phone network, a wired LAN, a wireless LAN, CATV (Cable Television), or the like can be used as the access network.

The operator-side terminal 16 is another information processing device, different from the user-side terminal 12. As an example it is a general-purpose notebook PC or desktop PC, but in other examples another general-purpose terminal such as a smartphone or a tablet PC may be used.

The server 18 is yet another information processing device, different from the user-side terminal 12 and the operator-side terminal 16, and a general-purpose server can be used. The server 18 therefore includes a CPU 18a and a storage unit 18b (including an HDD, a ROM, and a RAM), as well as components such as a communication interface and an input/output interface. In the first embodiment, the server 18 operates a site that provides the predetermined service.

FIG. 2 is a block diagram showing the electrical configuration of the user-side terminal 12 shown in FIG. 1. As shown in FIG. 2, the user-side terminal 12 includes a CPU 20, which is connected via an internal bus to a storage unit 22, a communication interface (hereinafter, "communication I/F") 24, and an input/output interface (hereinafter, "input/output I/F") 26.

The CPU 20 is responsible for overall control of the user-side terminal 12. However, an SoC (System-on-a-Chip) that integrates a plurality of functions, such as a CPU function and a GPU (Graphics Processing Unit) function, may be provided in place of the CPU 20. The storage unit 22 includes an HDD, a ROM, and a RAM. However, a nonvolatile memory such as an SSD may be used instead of the HDD, or in addition to the HDD, ROM, and RAM.

The communication I/F 24 has a wired interface for transmitting and receiving control signals and data, under the control of the CPU 20, to and from external computers such as the operator-side terminal 16 and the server 18 via the network 14. However, a wireless interface such as wireless LAN or Bluetooth (registered trademark) can also be used as the communication I/F 24.

An input device 28, a display device 30, a microphone 32, and a speaker 34 are connected to the input/output I/F 26. The input device 28 consists of a touch panel and hardware buttons. The touch panel is a general-purpose touch panel and may be of any type, such as a capacitive, electromagnetic induction, resistive film, or infrared type. The same applies to the operator-side terminal 16 described later.

However, when a notebook PC or a desktop PC is used as the user-side terminal 12, a keyboard and a computer mouse are used as the input device 28.

The display device 30 is an LCD or an organic EL display device. The touch panel described above may be provided on the display surface of the display device 30, or a touch panel display in which the touch panel is formed integrally with the display device 30 may be provided. The same applies to the operator-side terminal 16 described later.

The input/output I/F 26 outputs operation data (or operation information) input from the input device 28 to the CPU 20, and outputs image data generated by the CPU 20 to the display device 30 so that the screen or image corresponding to the image data is displayed on the display device 30. However, image data received from an external computer (for example, the operator-side terminal 16 or the server 18) may also be output by the CPU 20.

The input/output I/F 26 also converts the user's voice detected by the microphone 32 into digital audio data and outputs it to the CPU 20, and converts audio data output by the CPU 20 into an analog audio signal and outputs it from the speaker 34. In the first embodiment, the audio data output from the CPU 20 is audio data received from the operator-side terminal 16.

The user-side terminal 12 also includes a sensor interface (sensor I/F) 36 and a camera 38. The CPU 20 is connected to the camera 38 via the bus and the sensor I/F 36. The camera 38 uses an image sensor such as a CCD or CMOS sensor.

Note that the electrical configuration of the user-side terminal 12 shown in FIG. 2 is only an example, and the configuration need not be limited to it.

When the user-side terminal 12 is a smartphone, it also includes a call circuit for making telephone calls via a mobile phone network, or via a mobile phone network and a public telephone network. Because no such calls are made in the first embodiment, the call circuit is omitted from the drawing. The same applies when the operator-side terminal 16 described later is a smartphone.

FIG. 3 is a block diagram showing the electrical configuration of the operator-side terminal 16 shown in FIG. 1. As shown in FIG. 3, the operator-side terminal 16 includes a CPU 50, which is connected via an internal bus to a storage unit 52, a communication I/F 54, and an input/output I/F 56.

The CPU 50 is responsible for overall control of the operator-side terminal 16. However, an SoC that integrates a plurality of functions, such as a CPU function and a GPU function, may be provided in place of the CPU 50. The storage unit 52 includes an HDD, a ROM, and a RAM. However, a nonvolatile memory such as an SSD may be used instead of the HDD, or in addition to the HDD, ROM, and RAM.

The communication I/F 54 has a wired interface for transmitting and receiving control signals and data, under the control of the CPU 50, to and from external computers such as the user-side terminal 12 and the server 18 via the network 14. However, a wireless interface such as wireless LAN or Bluetooth (registered trademark) can also be used as the communication I/F 54.

An input device 58, a display device 60, a microphone 62, and a speaker 64 are connected to the input/output I/F 56. The microphone 62 and the speaker 64 together form the microphone-equipped headset that the operator uses for voice calls with the user.

A keyboard and a computer mouse are used as the input device 58. However, when a smartphone or a tablet PC is used as the operator-side terminal 16, a touch panel and hardware buttons are provided as the input device 58. The display device 60 is an LCD or an organic EL display device.

The input/output I/F 56 outputs operation data (or operation information) input from the input device 58 to the CPU 50, and outputs image data generated by the CPU 50 to the display device 60 so that the screen corresponding to the image data is displayed on the display device 60.

The input/output I/F 56 also converts the operator's voice detected by the microphone 62 into digital audio data and outputs it to the CPU 50, and converts audio data output by the CPU 50 into an analog audio signal and outputs it from the speaker 64. In the first embodiment, the audio data output from the CPU 50 is audio data received from the user-side terminal 12.

The operator-side terminal 16 also includes a sensor I/F 66 and a camera 68. The CPU 50 is connected to the camera 68 via the bus and the sensor I/F 66. The camera 68 uses an image sensor such as a CCD or CMOS sensor.

In the information processing system 10 configured in this way, the user uses the user-side terminal 12 to view a web screen 100 of the predetermined service provided by the server 18 and to do shopping and the like. A button 110 and a button 112 for communicating with an operator by chat or talk are displayed in front of the web screen 100.

The web screen 100 is displayed on the display device 30 when the user starts a web browser and enters a predetermined URL. The web screen 100 is a screen of the website (or web page) of the predetermined service. FIG. 4 shows an example of a web screen 100 for a certain online shopping service. Although the predetermined service is online shopping in this example, it may be any online service in which user inquiries can be answered by chat or talk.

The buttons 110 and 112 may also be configured to be displayed when a predetermined condition is satisfied. The predetermined conditions are: the user has instructed that the buttons 110 and 112 be displayed; the user has performed no operation for a long time (for example, 30 seconds to several minutes) or more; the user repeatedly or continuously points at the same position or at similar (nearby) positions on the web screen 100 being displayed on the display device 30; and the user returns to the same web screen 100 multiple times (for example, three times) within the predetermined service.
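The display conditions for the buttons 110 and 112 can be written as a single predicate, as in the following Python sketch. The field names, the function name `should_show_contact_buttons`, and the default thresholds are hypothetical, with the thresholds taken from the lower ends of the examples in the text. The sketch further assumes that satisfying any one of the listed conditions is sufficient, which the source does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class BrowsingState:
    """Hypothetical snapshot of the user's activity on the web screen."""
    user_requested_buttons: bool = False       # user explicitly asked for the buttons
    idle_seconds: float = 0.0                  # time since the last user operation
    repeated_pointing_same_area: bool = False  # same or nearby position pointed at repeatedly
    visits_to_current_page: int = 1            # times the user has arrived at this screen


def should_show_contact_buttons(
    state: BrowsingState,
    idle_threshold_s: float = 30.0,   # assumed; the text gives "30 seconds to several minutes"
    revisit_threshold: int = 3,       # assumed; the text gives "for example, three times"
) -> bool:
    """Return True when at least one display condition holds (assumed OR semantics)."""
    return (
        state.user_requested_buttons
        or state.idle_seconds >= idle_threshold_s
        or state.repeated_pointing_same_area
        or state.visits_to_current_page >= revisit_threshold
    )
```

How repeated pointing at the same area is detected is left out of the sketch, because the source does not describe it.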

When the user turns on the button 110, a chat service is provided to the user, and a chat takes place between the user and an operator selected by the server 18. That is, text messages are exchanged between the user-side terminal 12 and the operator-side terminal 16. Chat services are already well known and are not essential to the present invention, so a description is omitted. As an example, the chat service disclosed in Japanese Unexamined Patent Application Publication No. 2020-86677 can be used. A chatbot may also respond in place of an operator.

When the user turns on the button 112, the user can ask, by voice, the operator selected by the server 18 about how to use the online shopping site and about products.

In this case, an image of an avatar corresponding to the operator is displayed on the display device 30 of the user terminal 12. As an example, the image of the avatar corresponding to the operator is an image of part of a human-like character including its face, or of its whole body, and is displayed in front of the web screen 100. The avatar is animated in time with the output of the operator's voice: it moves its mouth, moves its neck, blinks, and changes its facial expression. If the avatar image shows the upper body or the whole body of the character, the avatar also makes gestures.

The display and control of the image of the avatar corresponding to the operator are not essential to the present invention, and known techniques can be adopted, so a description is omitted.

An image of the operator may be displayed instead of the image of the avatar corresponding to the operator.

In an information processing system 10 such as the one described above, the operator generally talks with the user either while viewing video of the user or without viewing video of the user.

In the former case, the operator can see the video of the user and recognize the user's facial expression, which makes it easy to talk with the user. However, because video of the user is displayed, there is room for improvement in protecting the user's privacy.

In the latter case, the user's privacy is considered to be protected, but the operator cannot see video of the user, which makes it difficult to talk with the user.

Therefore, in the first embodiment, in order to make it easy to talk with the user while protecting the user's privacy, an image of an avatar corresponding to the user is displayed on the display device 60 of the operator terminal 16, and the avatar's expression (in the first embodiment, its facial expression), line of sight, and movements are controlled.

FIG. 5 shows an example of an image 210 of the avatar corresponding to the user, displayed on the display device 60 of the operator terminal 16. As shown in FIG. 5, the avatar image 210 is displayed within a rectangular display frame 200. The avatar image 210 corresponding to the user is an image of part of a human-like character including its face. More specifically, the avatar image 210 shows a human head (including the face), the neck, and part of the shoulders.

If the avatar image 210 shows the upper body or the whole body of the human-like character, the avatar also makes gestures. A detailed description is omitted, but in that case the gestures are also controlled when the avatar's facial expression is controlled, as described later.

The same web screen 100 as that displayed on the display device 30 of the user terminal 12 may also be displayed on the display device 60, with the display frame 200 and the avatar image 210 corresponding to the user displayed in front of this web screen 100.

In this case, display information including the URL of the website displayed on the display device 30 of the user terminal 12, the window size, and the position coordinates of the cursor is transmitted to the server 18 when the button 112 is turned on at the user terminal 12, and is then transmitted from the server 18 to the operator terminal 16 when the operator who will respond, that is, the operator terminal 16, is selected. Alternatively, the display information may be transmitted directly from the user terminal 12 to the operator terminal 16 between the start of communication and the start of the talk.
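As an illustration only, the display information could be carried as a small serializable record. The text specifies the contents (URL, window size, cursor coordinates) but not a wire format, so the `DisplayInfo` class, its field names, and the use of JSON below are all assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DisplayInfo:
    """Hypothetical container for the display information sent by the user terminal 12."""
    url: str            # URL of the website shown on display device 30
    window_width: int   # window size in pixels (unit assumed)
    window_height: int
    cursor_x: int       # cursor position coordinates
    cursor_y: int


def encode(info: DisplayInfo) -> str:
    """Serialize the record for transmission (JSON is an assumed format)."""
    return json.dumps(asdict(info))


def decode(payload: str) -> DisplayInfo:
    """Rebuild the record on the receiving side (server 18 or operator terminal 16)."""
    return DisplayInfo(**json.loads(payload))
```

A round trip such as `decode(encode(info))` returns a record equal to `info`, which is all the operator terminal would need to reproduce the same web screen 100.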

The avatar image 210 is an image of an avatar selected in advance by the user or of an avatar selected in advance by the operator. When an image of the avatar selected in advance by the user is displayed, the type of avatar is notified to the operator terminal 16 from the user terminal 12, either via the server 18 or directly.

In the first embodiment, the operator terminal 16 causes the avatar corresponding to the user to show an expression that is favorable toward, or sympathetic with, the operator. The operator terminal 16 also makes the avatar corresponding to the user behave so as to avoid situations in which the operator finds it difficult to speak.

Specifically, when the operator is not speaking, the operator terminal 16 controls the facial expression, line of sight, and head (neck) movement of the avatar corresponding to the user based on the user's facial expression, or on the user's line of sight and head movement. However, some of the user's facial expressions are emphasized or alleviated before being reflected in the avatar corresponding to the user. In addition, the line of sight of the avatar corresponding to the user is sometimes controlled independently of the user's line of sight.

In the first embodiment, when the user's expression is a smiling expression, the degree of the smile is increased and the avatar corresponding to the user is made to show a smiling expression. That is, an avatar image 210 in which the user's smile is emphasized is displayed. When the user's expression is a displeased expression, the degree of displeasure is reduced and the avatar corresponding to the user is made to show a displeased expression. That is, an avatar image 210 in which the user's displeasure is alleviated is displayed. By controlling the avatar's expression in this way, the avatar corresponding to the user shows the user's expression at a degree that has been changed favorably.

The parameters (hereinafter, "setting parameters") for the degree to which a smile is emphasized (hereinafter, "emphasis degree") and the degree to which displeasure is alleviated (hereinafter, "relaxation degree") are set in advance by the operator.

The setting parameters for the emphasis degree and the relaxation degree may instead be set automatically. In that case, as an example, the emphasis degree is set large when the degree of the user's smile is low and set small when the degree of the user's smile is high. Likewise, as an example, the relaxation degree is set small when the degree of the user's displeasure is low and set large when the degree of the user's displeasure is high.
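The emphasis and relaxation described above can be sketched as a simple adjustment of the recognized degree. The text does not say how the degrees are combined, so the additive rule, the linear automatic rule, the 30-point ceiling, and all function names below are illustrative assumptions.

```python
def clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep a degree inside the 0-100% range."""
    return max(low, min(high, value))


def auto_emphasis(smile_degree: float, max_boost: float = 30.0) -> float:
    """Assumed linear rule: faint smiles get a large boost, broad smiles a small one."""
    return max_boost * (1.0 - clamp(smile_degree) / 100.0)


def auto_relaxation(displeasure_degree: float, max_cut: float = 30.0) -> float:
    """Assumed linear rule: mild displeasure is cut a little, strong displeasure a lot."""
    return max_cut * clamp(displeasure_degree) / 100.0


def avatar_degree(expression, user_degree, emphasis=None, relaxation=None):
    """Degree shown by the avatar; None means 'set the parameter automatically'."""
    if expression == "smile":
        boost = auto_emphasis(user_degree) if emphasis is None else emphasis
        return clamp(user_degree + boost)
    if expression == "displeased":
        cut = auto_relaxation(user_degree) if relaxation is None else relaxation
        return clamp(user_degree - cut)
    return clamp(user_degree)  # other expressions pass through unchanged
```

With an operator-set emphasis degree of 20, a 40% smile would be shown at 60%; with a relaxation degree of 20, 50% displeasure would be shown at 30%.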

In the first embodiment, the line of sight of the avatar corresponding to the user is controlled according to the user's line of sight. When the user's line of sight is directed to the front and remains so for a first predetermined time (for example, from a few seconds to about 10 seconds) or longer, the line of sight of the avatar corresponding to the user is averted for a second predetermined time (for example, a predetermined ratio of the first predetermined time), regardless of the user's line of sight. By operating (that is, controlling) the avatar corresponding to the user so that it averts its gaze, the avatar behaves in a way that does not intimidate the operator, and a situation in which the operator finds it difficult to speak can be avoided. The parameter for the predetermined ratio is also included in the setting parameters described above and is set in advance by the operator.
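The gaze rule above amounts to a small timer-driven state machine. The following is a sketch under assumptions: the class `GazeController`, the string labels for gaze directions, and the default values are hypothetical, and the per-frame update interval `dt` is supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class GazeController:
    first_time_s: float = 5.0   # first predetermined time (a few to about 10 s)
    ratio: float = 0.2          # second time = ratio * first time (operator-set)
    _frontal_s: float = field(default=0.0, init=False)
    _avert_left_s: float = field(default=0.0, init=False)

    def update(self, user_gaze: str, dt: float) -> str:
        """Return the avatar's gaze for this frame given the user's gaze label."""
        if self._avert_left_s <= 0.0:
            if user_gaze == "front":
                self._frontal_s += dt
                if self._frontal_s >= self.first_time_s:
                    # Frontal gaze lasted long enough: start averting.
                    self._frontal_s = 0.0
                    self._avert_left_s = self.first_time_s * self.ratio
            else:
                self._frontal_s = 0.0
        if self._avert_left_s > 0.0:
            # While averting, the user's actual gaze is ignored.
            self._avert_left_s -= dt
            return "averted"
        return user_gaze  # otherwise mirror the user's gaze
```

Called once per frame, the controller mirrors the user's gaze until the frontal gaze has lasted `first_time_s`, then reports `"averted"` for `first_time_s * ratio` seconds before mirroring again.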

The user's line of sight while the user and the operator are talking is detected based on the user's face image. Methods of detecting the line of sight from a face image are well known, so a description is omitted.

The movement of the user's head is detected (estimated) as follows. Using the orientation of the user's face in a face image in which the user directly faces the camera 38 as a reference, the current face orientation is calculated from the current face image, and the movement of the user's head is detected (or estimated) based on the current face orientation. The face orientation can be detected from the movement of a plurality of facial feature points extracted from the face image. Although a description is omitted, the same applies when calculating the orientation of the operator's face and detecting the movement of the operator's head.

As described above, the user's line of sight and head movement are reflected in the line of sight and head (neck) movement of the avatar corresponding to the user. The same applies when the user is speaking.

When the operator is not speaking, the user may speak. When the user is speaking, the lips of the avatar corresponding to the user are moved in time with the output of the user's voice. Hereinafter, the avatar corresponding to the user moving its lips in time with the output of the user's voice may be referred to as a speaking motion.

When the operator is speaking, the operator terminal 16 may control the expression of the avatar corresponding to the user based on the operator's facial expression, and may control the movement of the avatar corresponding to the user based on the voice uttered by the operator.

In the first embodiment, the operator terminal 16 recognizes the operator's facial expression and makes the avatar show the same expression. Specifically, when the operator's expression is a smiling expression, the avatar corresponding to the user is made to show a smiling expression, and when the operator's expression is a sad expression, the avatar corresponding to the user is made to show a sad expression. In other words, while the operator is speaking, the avatar is made to sympathize with the operator.

In the first embodiment, the parameters for the degree of smile and the degree of sadness are set in advance by the operator and are included in the setting parameters described above. However, these parameters can also be set according to the operator's own degree of smile and degree of sadness.

As described above, the facial expressions of the user and the operator are each recognized based on face images, and the recognized expression is shown in the avatar image 210. The expressions can be recognized from face images of the user and the operator captured during the conversation.

Methods of recognizing the facial expressions of people such as the user and the operator from face images are already known, so a description is omitted. As examples, the known techniques disclosed in the following can be used: 「小林宏、原文雄：ニューラルネットワークによる人の基本表情認識、計測自動制御学会論文集 Vol.29, No.1, 112/118(1993)」, 「小谷中陽介、本間経康、酒井正夫、阿部健一：ニューラルネットワークを用いた顔表情認識、東北大医保健学科紀要 13(1):23～32, 2004」, and 「西銘大喜、遠藤聡志、當間愛晃、山田孝治、赤嶺有平：畳み込みニューラルネットワークを用いた表情表現の獲得と顔特徴量の分析、人工知能学会論文誌３２巻５号ＦＺ（２０１７年）」.

As another known technique, the technique disclosed in Japanese Patent Application Laid-Open No. 2020-163660 can also be used as a method of recognizing human facial expressions based on feature points extracted from a face image.

The circuit components and data needed to recognize expressions from human face images are provided in the operator terminal 16 as appropriate. Alternatively, a device that recognizes expressions from face images (hereinafter, "recognition device") may be provided in the cloud, so that face images are sent to the recognition device and the expression recognition results are received from it.

By training a neural network on a plurality of expressions of differing degrees (in the first embodiment, smiling, displeased, and sad expressions), not only the expression but also its degree can be recognized (or estimated). The degree of an expression can also be recognized from the difference between neural network outputs at the time the expression is recognized. For example, the degree is recognized from the difference between the output for an expressionless face image and the output for the recognized expression. As an example, the degree of an expression is recognized in the range of 0 to 100%.

When human expressions are recognized using the method of Japanese Patent Application Laid-Open No. 2020-163660, the degree of an expression can also be recognized (or estimated) from the differences (distances) between feature points extracted from face images. For example, for each feature point extracted from an expressionless face image, the distance to the corresponding feature point extracted from the user's face image used for expression recognition is calculated, and the degree of the expression is determined from the calculated distances. Because a distance is calculated for each feature point, the degree is determined based on, for example, the mean, maximum, or variance of the calculated distances.
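The distance-based estimate described above might look like the following sketch. It is not the method of the cited publication: the function name `expression_degree`, the 2-D point format, and the calibration constant `full_scale` (the aggregate value that counts as 100%) are assumptions introduced for illustration.

```python
import math
from statistics import mean


def expression_degree(neutral, current, full_scale, reduce=mean):
    """Estimate an expression degree (0-100%) from feature-point displacement.

    neutral, current: equally ordered lists of (x, y) feature points from the
    expressionless face image and from the current face image.
    full_scale: assumed calibration value of the aggregate that maps to 100%.
    reduce: aggregate over per-point distances (mean, max, or a variance function).
    """
    if len(neutral) != len(current):
        raise ValueError("feature point lists must have the same length")
    # One Euclidean distance per corresponding pair of feature points.
    distances = [math.dist(p, q) for p, q in zip(neutral, current)]
    score = reduce(distances)
    return max(0.0, min(100.0, 100.0 * score / full_scale))
```

Passing `reduce=max` or `reduce=statistics.pvariance` selects the other two aggregates mentioned in the text; each would need its own `full_scale` calibration.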

FIG. 6(A) illustrates differences in the degree of the avatar's smiling expression, FIG. 6(B) illustrates differences in the degree of the avatar's displeased expression, and FIG. 6(C) illustrates differences in the degree of the avatar's sad expression.

The degree (or magnitude) of each of the smiling, displeased, and sad expressions can be set in a plurality of levels (for example, 20 levels) from a minimum (0%) to a maximum (100%). For each expression and each degree of that expression, the avatar's facial expression is determined by parameters (hereinafter, "facial expression parameters") for the following parts: the eyebrows (position and shape), the furrow between the eyebrows (how pronounced it is), the eyes (size of the iris and how far the outer corners are raised or lowered), the eyelids (how open they are), the nasolabial folds (how pronounced they are), and the mouth (position, shape, and how open it is).

As described above, the degree of a human expression, such as that of the user or the operator, is recognized in the range of 0 to 100%, and the avatar's expression is rendered at a degree of 0 to 100%; in the first embodiment, therefore, the facial expression parameters are determined to match the degree of the human expression. Accordingly, an avatar image 210 that shows the same expression as the human expression recognized from the face image, at the same degree, has an expression similar to that of the person.

In the examples shown in FIGS. 6(A) to 6(C), the degree of the expression increases from the left end (that is, expressionless) toward the right end and decreases toward the left end. For each expression, FIGS. 6(A) to 6(C) show the avatar image 210 at a medium (normal) degree of expression and at the maximum degree of expression.

The expressionless avatar image 210 mentioned above is the avatar image 210 at the lowest degree of expression for each of the smiling, displeased, and sad expressions.

Although not illustrated, the degree of each emotion is set in a plurality of levels as described above, so an avatar image 210 with the expression at any of those levels can be displayed.

Furthermore, because each expression can be set in 20 levels, the setting parameters for the emphasis degree, the relaxation degree, the degree of smile, and the degree of sadness described above can be set in 5% increments.
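Since 20 levels over 0 to 100% give a step of 5%, a recognized or adjusted degree can be snapped to the nearest level before the facial expression parameters are looked up. The helper below is a hypothetical sketch; rounding half up is an assumption, and 0% is treated as the expressionless baseline below the 20 levels.

```python
import math

LEVELS = 20
STEP = 100 // LEVELS  # 5 percentage points per level


def quantize(degree: float) -> int:
    """Snap a degree to the nearest 5% level, rounding halves up (assumed)."""
    clamped = max(0.0, min(100.0, degree))
    return math.floor(clamped / STEP + 0.5) * STEP


def level_index(degree: float) -> int:
    """Index of the level (0 = expressionless, 20 = maximum)."""
    return quantize(degree) // STEP
```

For example, a recognized degree of 42% would map to the 40% level (index 8), and 43% to the 45% level (index 9).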

The expressionless avatar is given a face with no wrinkles and left-right symmetry. This design makes the avatar's gender hard to determine, so the avatar is acceptable even to users with a preference biased toward men or toward women. In addition, because the face is featureless, wrinkle-free, and symmetrical, expressions such as a smile or a frown can be rendered simply by adding a few wrinkles, and the strength of the expression can also be controlled easily.

In the first embodiment, when the operator is speaking, the operator terminal 16 makes the avatar corresponding to the user nod at appropriate times. This avoids a situation in which the operator finds it difficult to speak.

Specifically, the avatar corresponding to the user is made to nod when the operator's voice pauses or when the operator utters something that asks the user for agreement. The operator's voice is judged to have paused when its volume stays at or below a preset predetermined level for a third predetermined time (for example, several milliseconds). Whether the operator has uttered something that asks the user for agreement is judged by recognizing the operator's speech and checking whether it contains one of the preset agreement-seeking phrases. Examples of such phrases are 「～ですよね」 ("..., isn't it?") and 「よろしいですか」 ("Is that all right?").
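The two nod triggers above can be sketched as follows. The class `NodTrigger`, the function `asks_for_consent`, and the requirement that speech must precede a pause are assumptions for illustration; the phrase list contains only the two examples given in the text, and the transcript is assumed to come from a separate speech recognizer.

```python
from dataclasses import dataclass, field

CONSENT_PHRASES = ("ですよね", "よろしいですか")  # the two examples from the text


@dataclass
class NodTrigger:
    level_threshold: float   # predetermined volume level
    pause_s: float           # third predetermined time
    _quiet_s: float = field(default=0.0, init=False)
    _armed: bool = field(default=False, init=False)

    def on_audio(self, level: float, dt: float) -> bool:
        """Feed one volume sample; return True once per detected pause."""
        if level > self.level_threshold:
            self._quiet_s = 0.0
            self._armed = True   # assumed: a pause only counts after speech
            return False
        self._quiet_s += dt
        if self._armed and self._quiet_s >= self.pause_s:
            self._armed = False  # do not fire again until speech resumes
            return True
        return False


def asks_for_consent(transcript: str) -> bool:
    """True when the recognized speech contains a preset agreement-seeking phrase."""
    return any(phrase in transcript for phrase in CONSENT_PHRASES)
```

A caller would trigger the nodding motion whenever `on_audio` returns `True` or `asks_for_consent` is true for the latest recognized utterance.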

When the avatar corresponding to the user does not nod and the operator's expression is not reflected in the avatar, the user's expression, line of sight, and head movement are reflected in the avatar. When the avatar does not nod and the operator's expression is reflected in the avatar, the user's line of sight and head movement are reflected in the avatar. When the avatar nods, the user's expression is reflected in the avatar.
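The three cases above can be summarized as a small lookup of which source drives each part of the avatar while the operator is speaking. The function name and the dictionary layout are hypothetical, and because the text does not state what drives the line of sight during a nod, that entry is marked as unspecified rather than guessed.

```python
def avatar_sources(nodding: bool, mirror_operator_expression: bool) -> dict:
    """Return which source drives the avatar's expression, gaze, and head."""
    if nodding:
        # During a nod the head performs the nod and the user's expression is shown.
        return {"expression": "user", "gaze": "unspecified", "head": "nod"}
    if mirror_operator_expression:
        # Operator's expression is mirrored; gaze and head still follow the user.
        return {"expression": "operator", "gaze": "user", "head": "user"}
    # Neither nodding nor mirroring: everything follows the user.
    return {"expression": "user", "gaze": "user", "head": "user"}
```

Giving the nodding case priority over expression mirroring follows the order in which the text states the cases and is otherwise an assumption.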

The user may also speak while the operator is speaking. In that case too, the avatar corresponding to the user performs the speaking motion in time with the output of the user's voice.

FIG. 7 shows an example of a memory map 300 of the storage unit (here, RAM) 52 built into the operator terminal 16. The RAM is used as a work area and a buffer area for the CPU 50. As shown in FIG. 7, the RAM includes a program storage area 302 and a data storage area 304. The program storage area 302 stores a control program for the overall processing executed by the operator terminal 16 of the first embodiment.

The control program includes an operation detection program 302a, a photographing program 302b, a sound detection program 302c, a voice recognition program 302d, a communication program 302e, an image generation program 302f, an image output program 302g, an avatar control program 302h, a facial expression recognition program 302i, a line-of-sight and head movement detection program 302j, a sound output program 302k, and so on.

The operation detection program 302a is a program for detecting operation data 304a input from the input device 58 in accordance with the operator's operation and storing it in the data storage area 304. The photographing program 302b is a program for capturing an image with the camera 68, storing the captured image data in the data storage area 304 as transmission data 304b, and also storing it in the data storage area 304 as captured image data 304i.

The sound detection program 302c is a program for detecting sound input from the microphone 62, storing the corresponding audio data in the data storage area 304 as transmission data 304b, and also storing it in the data storage area 304 as audio data 304k. The voice recognition program 302d is a program for performing speech recognition on the sound detected by the sound detection program 302c. The dictionary data needed for speech recognition is not shown, but is stored in the storage unit (here, HDD or ROM) 52 of the operator terminal 16.

The communication program 302e is a program for communicating (transmitting and receiving data), by wire or wirelessly, with external devices, which in the first embodiment are the user terminal 12 and the server 18.

The image generation program 302f is a program for generating, using the image generation data 304d, image data for images (including the avatar image 210) corresponding to all or part of the various screens to be displayed on the display device 60. The image output program 302g is a program for outputting the image data generated by the image generation program 302f to the display device 60.

The avatar control program 302h is a program for controlling the avatar corresponding to the user. In the first embodiment, the CPU 50 follows the avatar control program 302h to change the avatar's expression, make the avatar move (speaking motion and head (neck) motion), and move the avatar's line of sight.

The facial expression recognition program 302i is a program for recognizing the facial expressions of the user and the operator. As described above, the user's expression and its degree are recognized from the captured image data received from the user terminal 12, and the operator's expression is recognized from the captured image data 304i captured by the camera 68 of the operator terminal 16.

The line-of-sight and head movement detection program 302j is a program for detecting the user's line of sight and head movement from the captured image data received from the user terminal 12. The sound output program 302k is a program for outputting the user's audio data received from the user terminal 12 to the speaker 64.

Although not shown, the program storage area 302 also stores other programs, such as middleware including the operating system of the operator terminal 16, a program for executing browser functions, and various application programs.

FIG. 8 shows an example of the specific contents of the data storage area 304 of the RAM shown in FIG. 7. As shown in FIG. 8, the data storage area 304 stores operation data 304a, transmission data 304b, reception data 304c, image generation data 304d, facial expression parameter data 304e, setting parameter data 304f, user facial expression data 304g, user line-of-sight and head movement data 304h, captured image data 304i, operator facial expression data 304j, audio data 304k, and so on.

操作データ３０４ａは、操作検出プログラム３０２ａに従って検出された操作データである。送信データ３０４ｂは、利用者側端末１２に送信するデータであり、チャットにおける操作者の応答内容についてのテキストデータおよびトークにおける操作者の応答内容についての音声データである。 The operation data 304a is operation data detected according to the operation detection program 302a. The transmission data 304b is data to be transmitted to the user terminal 12, and is text data regarding the contents of the operator's response in the chat and audio data regarding the contents of the operator's response in the talk.

受信データ３０４ｃは、利用者側端末１２から送信され、受信したデータであり、チャットにおける利用者の質問内容についてのテキストデータ、トークにおける利用者の質問内容についての音声データおよび利用者側端末１２のカメラ３８で撮影された撮影画像データである。また、受信データ３０４ｃは、サーバ１８から送信される利用者側端末１２の接続情報データを含む。 The received data 304c is data transmitted from the user terminal 12 and received by the operator terminal 16. It consists of text data regarding the content of the user's question in the chat, audio data regarding the content of the user's question in the talk, and photographed image data taken by the camera 38 of the user terminal 12. The received data 304c also includes connection information data of the user terminal 12 transmitted from the server 18.

画像生成データ３０４ｄは、操作者側端末１６の表示装置６０に表示される各種の画面を生成するためのデータであり、アバターの画像２１０を生成するためのデータを含む。アバターの画像２１０を生成するためのデータは、アバターの静止した状態の画像データおよび首の動きについてのデータを含む。首の動きは、発話時の首の動きおよび頷く時の首の動きである。ただし、複数種類のアバターが設けられるため、アバターの静止した状態の画像データはアバター毎に記憶され、選択されたアバターの画像データが使用される。 The image generation data 304d is data for generating various screens displayed on the display device 60 of the operator terminal 16, and includes data for generating the avatar image 210. The data for generating the avatar image 210 includes image data of the avatar at rest and data about neck movements. The neck movement is the movement of the neck when speaking and the movement of the neck when nodding. However, since a plurality of types of avatars are provided, the image data of the avatar in a stationary state is stored for each avatar, and the image data of the selected avatar is used.

表情パラメータデータ３０４ｅは、微笑む表情、不機嫌な表情および悲しい表情の各々について、表情の度合を最小から最大まで複数の段階で変化させるための各部位の表情パラメータについてのデータである。ただし、複数のアバターが設けられるため、表情パラメータについてのデータはアバター毎に記憶され、選択されたアバターについての表情パラメータが使用される。 The facial expression parameter data 304e is data about the facial expression parameters of each part for changing the degree of facial expression from the minimum to the maximum in multiple stages for each of a smiling expression, a displeased expression, and a sad expression. However, since a plurality of avatars are provided, data regarding facial expression parameters are stored for each avatar, and facial expression parameters for the selected avatar are used.
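The layout of the facial expression parameter data can be pictured with a minimal Python sketch. The text only states that parameters exist per avatar, per expression, and per stage; the avatar name, the part names (`mouth_corner`, `eye_open`), the three-stage table, and the 0.0 to 1.0 degree scale below are all hypothetical choices made for illustration.

```python
# Hypothetical table: avatar -> expression -> list of stages (minimum to maximum).
# Each stage maps a facial part to a parameter value; names and values are invented.
EXPRESSION_PARAMS = {
    "avatar_a": {
        "smile": [
            {"mouth_corner": 0.0, "eye_open": 1.0},   # minimum degree
            {"mouth_corner": 0.5, "eye_open": 0.8},   # intermediate degree
            {"mouth_corner": 1.0, "eye_open": 0.6},   # maximum degree
        ],
    },
}


def part_parameters(avatar: str, expression: str, degree: float) -> dict:
    """Pick the stage closest to a degree given on an assumed 0.0-1.0 scale."""
    stages = EXPRESSION_PARAMS[avatar][expression]
    index = round(degree * (len(stages) - 1))
    index = max(0, min(len(stages) - 1, index))  # clamp out-of-range degrees
    return stages[index]
```

Keeping one table per avatar mirrors the statement that the parameters of the selected avatar are the ones used.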

設定パラメータデータ３０４ｆは、強調度合、緩和度合、微笑み度合、悲しみ度合および所定の割合の各設定パラメータについてのデータである。 The setting parameter data 304f is data regarding each setting parameter of the degree of emphasis, the degree of relaxation, the degree of smile, the degree of sadness, and a predetermined ratio.

利用者表情データ３０４ｇは、利用者側端末１２から受信した撮影画像データから認識した利用者の表情およびその度合を示すデータである。利用者視線および頭部の動きデータ３０４ｈは、利用者側端末１２から受信した撮影画像データから算出した利用者の視線および利用者の頭部の動きを示すデータである。 The user facial expression data 304g is data indicating the user's facial expression and its degree recognized from the captured image data received from the user terminal 12. The user's line of sight and head movement data 304h is data indicating the user's line of sight and the movement of the user's head calculated from the captured image data received from the user terminal 12.

撮影画像データ３０４ｉは、カメラ６８で撮影した画像データである。操作者表情データ３０４ｊは、撮影画像データ３０４ｉから認識した操作者の表情を示すデータである。音声データ３０４ｋは、音検出プログラム３０２ｃに従って検出された操作者の音声についてのデータであり、操作者の音声を認識するために用いられる。 The photographed image data 304i is image data photographed by the camera 68. Operator facial expression data 304j is data indicating the facial expression of the operator recognized from photographed image data 304i. The audio data 304k is data about the operator's voice detected according to the sound detection program 302c, and is used to recognize the operator's voice.

図示は省略するが、データ記憶領域３０４には、制御処理を実行するために必要な他のデータが記憶されたり、タイマ（カウンタ）およびフラグが設けられたりする。 Although not shown, the data storage area 304 stores other data necessary for executing control processing, and is provided with a timer (counter) and a flag.

また、図示は省略するが、利用者側端末１２は操作者側端末１６との間でチャットまたはトークを行うため、利用者側端末１２の記憶部（ここでは、ＲＡＭ）２２には、操作者側端末１６のＲＡＭに記憶されるプログラムおよびデータのうち、チャットまたはトークに必要なプログラムおよびデータと同様のプログラムおよびデータが記憶される。 Although not shown, since the user terminal 12 chats or talks with the operator terminal 16, the storage unit (here, the RAM) 22 of the user terminal 12 stores programs and data similar to those, among the programs and data stored in the RAM of the operator terminal 16, that are necessary for chatting or talking.

具体的には、利用者側端末１２のＲＡＭのプログラム記憶領域には、操作検出プログラム、撮影プログラム、音検出プログラム、通信プログラム、画像生成プログラム、画像出力プログラムおよび音出力プログラムなどが記憶される。 Specifically, the program storage area of the RAM of the user terminal 12 stores an operation detection program, a photography program, a sound detection program, a communication program, an image generation program, an image output program, a sound output program, and the like.

操作検出プログラムは、利用者の操作に従って入力装置２８から入力される操作データを検出し、記憶部２２のデータ記憶領域に記憶するためのプログラムである。撮影プログラムは、カメラ３８で画像を撮影し、撮影した画像についての撮影画像データを送信データとしてデータ記憶領域に記憶するためのプログラムである。音検出プログラムは、マイク３２から入力される音声を検出し、検出した音声についての音声データを送信データとしてデータ記憶領域に記憶するためのプログラムである。 The operation detection program is a program for detecting operation data input from the input device 28 according to a user's operation, and storing the detected operation data in the data storage area of the storage unit 22. The photographing program is a program for photographing an image with the camera 38 and storing photographed image data of the photographed image in the data storage area as transmission data. The sound detection program is a program for detecting the sound input from the microphone 32 and storing the sound data of the detected sound in the data storage area as transmission data.

通信プログラムは、外部の機器、この第１実施例では、操作者側端末１６およびサーバ１８と有線または無線で通信するためのプログラムである。画像生成プログラムは、表示装置３０に表示するための各種の画面に対応する画像データを、画像生成データを用いて生成するためのプログラムである。画像出力プログラムは、画像生成プログラムに従って生成した画像データを表示装置３０に出力するためのプログラムである。音出力プログラムは、受信した操作者の音声データを出力するためのプログラムである。 The communication program is a program for communicating with external equipment, in this first embodiment, the operator terminal 16 and the server 18, by wire or wirelessly. The image generation program is a program for generating image data corresponding to various screens to be displayed on the display device 30 using image generation data. The image output program is a program for outputting image data generated according to the image generation program to the display device 30. The sound output program is a program for outputting the received operator's voice data.

また、記憶部２２のデータ記憶領域には、操作データ、送信データ、受信データおよび画像生成データなどが記憶される。 Further, the data storage area of the storage unit 22 stores operation data, transmission data, reception data, image generation data, and the like.

操作データは、操作検出プログラムに従って検出された操作データである。送信データは、操作者側端末１６およびサーバ１８に送信するデータである。操作者側端末１６に送信するデータは、チャットにおける利用者の質問内容についてのテキストデータ、トークにおける利用者の質問内容についての音声データおよびカメラ３８で撮影した撮影画像データである。サーバ１８に送信するデータは、オンラインショッピングに関するブラウザ上の操作データ（ボタン１１０および１１２についての操作データを含む）である。 The operation data is operation data detected according to the operation detection program. The transmission data is data to be transmitted to the operator terminal 16 and the server 18. The data transmitted to the operator terminal 16 is text data regarding the content of the user's question in the chat, audio data regarding the content of the user's question in the talk, and photographed image data taken by the camera 38. The data transmitted to the server 18 is operation data on the browser regarding online shopping (including operation data for buttons 110 and 112).

受信データは、操作者側端末１６またはサーバ１８から送信され、受信したデータである。操作者側端末１６から受信したデータは、チャットにおける操作者の応答内容についてのテキストデータおよびトークにおける操作者の応答内容についての音声データである。ただし、操作者側端末１６で撮影された撮影画像データを受信する場合もある。また、サーバ１８から受信したデータは、ブラウザに表示するデータおよびサーバ１８によって選択された操作者側端末１６の接続情報データである。 The received data is data transmitted and received from the operator terminal 16 or the server 18. The data received from the operator side terminal 16 is text data about the operator's response in the chat and audio data about the operator's response in the talk. However, photographed image data photographed by the operator side terminal 16 may also be received. The data received from the server 18 is data to be displayed on the browser and connection information data for the operator terminal 16 selected by the server 18.

画像生成データは、利用者側端末１２の表示装置３０に表示される各種の画面を生成するためのデータである。 The image generation data is data for generating various screens displayed on the display device 30 of the user terminal 12.

なお、記憶部５２には、利用者側端末１２のオペレーティングシステムなどのミドルウェア、ブラウザ機能を実行するためのプログラムに加え、利用者とチャットまたはトークを実行するために必要な他のプログラムおよびデータも記憶される。 Note that, in addition to middleware such as the operating system of the user terminal 12 and a program for executing the browser function, the storage unit 52 also stores other programs and data necessary for chatting or talking with the user.

図９－図１２は操作者側端末１６のＣＰＵ５０の制御処理を示すフロー図である。図１３は操作者側端末１６のＣＰＵ５０の送受信処理を示すフロー図である。図１４は利用者側端末１２のＣＰＵ２０の送受信処理を示すフロー図である。 9 to 12 are flowcharts showing control processing of the CPU 50 of the operator terminal 16. FIG. 13 is a flow diagram showing the transmission/reception processing of the CPU 50 of the operator terminal 16. FIG. 14 is a flow diagram showing the transmission and reception processing of the CPU 20 of the user terminal 12.

図示は省略するが、ＣＰＵ５０は、制御処理および送受信処理と並行して、操作者の操作を検出する処理、操作者の画像を撮影する処理および操作者の音声を検出する処理を実行する。 Although not shown, the CPU 50 executes a process of detecting an operation of the operator, a process of photographing an image of the operator, and a process of detecting the voice of the operator in parallel with the control process and the transmission/reception process.

図９に示すように、操作者側端末１６のＣＰＵ５０は、制御処理を開始すると、ステップＳ１で、利用者に対応するアバターの画像２１０を表示する。つまり、ＣＰＵ５０は、利用者に対応するアバターの画像２１０を含む表示枠２００の画像データを生成し、生成した画像データを表示装置６０に出力する。 As shown in FIG. 9, when the CPU 50 of the operator terminal 16 starts the control process, it displays an image 210 of an avatar corresponding to the user in step S1. That is, the CPU 50 generates image data for the display frame 200 including the image 210 of the avatar corresponding to the user, and outputs the generated image data to the display device 60.

次のステップＳ３では、利用者の表情を認識する。ここでは、ＣＰＵ５０は、受信データ３０４ｃに含まれる撮影画像データを取得し、取得した撮影画像データに含まれる利用者の顔画像から表情およびその度合を認識し、対応する利用者表情データ３０４ｇを記憶（更新）する。 In the next step S3, the facial expression of the user is recognized. Here, the CPU 50 acquires the photographed image data included in the received data 304c, recognizes the facial expression and its degree from the user's face image included in the acquired photographed image data, and stores (updates) the corresponding user facial expression data 304g.

次のステップＳ５では、利用者の視線および頭部の動きを検出する。ここでは、ＣＰＵ５０は、ステップＳ３で取得した撮影画像データに含まれる利用者の顔画像から利用者の視線および頭部の動きを検出し、対応する利用者視線および頭部の動きデータ３０４ｈを記憶（更新）する。 In the next step S5, the user's line of sight and head movement are detected. Here, the CPU 50 detects the user's line of sight and head movement from the user's face image included in the photographed image data acquired in step S3, and stores (updates) the corresponding user line-of-sight and head movement data 304h.

ただし、取得した撮影画像データに利用者の顔画像が含まれていない場合には、表情は認識されず、利用者の視線および頭部の動きも検出されない。 However, if the acquired photographic image data does not include a facial image of the user, facial expressions will not be recognized, and the user's line of sight and head movements will not be detected.

続いて、ステップＳ７では、操作者が発話しているかどうかを判断する。ここでは、ＣＰＵ５０は、マイク６２で音声を検出しているかどうかを判断する。ステップＳ７で“ＹＥＳ”である場合には、つまり、操作者が発話している場合には、図１１に示すステップＳ２５に進む。 Subsequently, in step S7, it is determined whether the operator is speaking. Here, the CPU 50 determines whether or not the microphone 62 is detecting audio. If "YES" in step S7, that is, if the operator is speaking, the process advances to step S25 shown in FIG. 11.

一方、ステップＳ７で“ＮＯ”であれば、つまり、操作者が発話していない場合には、図１０に示すステップＳ９で、利用者が微笑んだかどうかを判断する。ここでは、ＣＰＵ５０は、ステップＳ３で認識した利用者の表情が微笑みであるかどうかを判断する。 On the other hand, if "NO" in step S7, that is, if the operator is not speaking, it is determined in step S9 shown in FIG. 10 whether the user smiles. Here, the CPU 50 determines whether the user's facial expression recognized in step S3 is a smiling face.

ステップＳ９で“ＹＥＳ”であれば、つまり、利用者が微笑んだと判断すると、ステップＳ１１で、アバターの画像２１０を、ステップＳ５で検出した利用者の視線に合わせ、強調した微笑む表情で表示させて、図１２に示すステップＳ４９に進む。ステップＳ１１では、ＣＰＵ５０は、利用者が微笑んだ度合よりも高い度合で微笑むようにアバターの表情を生成する。ただし、高くする度合は、予め設定された強調度合のパラメータで決定される。 If "YES" in step S9, that is, if it is determined that the user has smiled, then in step S11 the avatar image 210 is aligned with the user's line of sight detected in step S5 and displayed with an emphasized smiling expression, and the process proceeds to step S49 shown in FIG. 12. In step S11, the CPU 50 generates a facial expression of the avatar so that the avatar smiles to a higher degree than the user smiled. The amount by which the degree is raised is determined by a preset emphasis degree parameter.

一方、ステップＳ９で“ＮＯ”であれば、つまり、利用者が微笑んでいないと判断すると、ステップＳ１３で、利用者が不機嫌な顔をしたかどうかを判断する。ここでは、ＣＰＵ５０は、ステップＳ３で認識した利用者の表情が不機嫌な顔であるかどうかを判断する。 On the other hand, if "NO" in step S9, that is, if it is determined that the user is not smiling, it is determined in step S13 whether or not the user has a displeased face. Here, the CPU 50 determines whether the facial expression of the user recognized in step S3 is a displeased face.

ステップＳ１３で“ＹＥＳ”であれば、つまり、利用者が不機嫌な顔をしたと判断すれば、ステップＳ１５で、アバターの画像２１０を、ステップＳ５で検出した利用者の視線に合わせ、緩和した不機嫌な表情で表示させて、ステップＳ４９に進む。ステップＳ１５では、ＣＰＵ５０は、利用者の不機嫌な表情の度合よりも低い度合でアバターに不機嫌な表情を生成する。ただし、低くする度合は、予め設定された緩和度合のパラメータで決定される。 If "YES" in step S13, that is, if it is determined that the user made a displeased face, then in step S15 the avatar image 210 is aligned with the user's line of sight detected in step S5 and displayed with a softened displeased expression, and the process proceeds to step S49. In step S15, the CPU 50 generates a displeased facial expression on the avatar at a lower degree than the user's displeased expression. The amount by which the degree is lowered is determined by a preset relaxation degree parameter.

一方、ステップＳ１３で“ＮＯ”であれば、つまり、利用者が不機嫌な顔をしていないと判断すれば、ステップＳ１７で、ステップＳ３で認識した利用者の表情をアバターに反映する。つまり、アバターの画像２１０を、利用者の表情で表示する。 On the other hand, if "NO" in step S13, that is, if it is determined that the user does not have a displeased face, then in step S17, the facial expression of the user recognized in step S3 is reflected on the avatar. In other words, the avatar image 210 is displayed with the user's facial expression.
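The branch taken while the operator is not speaking (steps S9 to S17) can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how the emphasis and relaxation parameters are applied, so the additive adjustment, the 0.0 to 1.0 degree scale, and the names `Expression` and `favorable_expression` are hypothetical.

```python
from dataclasses import dataclass

MAX_DEGREE = 1.0  # assumption: degrees are normalized to the range 0.0-1.0


@dataclass(frozen=True)
class Expression:
    kind: str      # illustrative labels: "smile", "displeased", "neutral", ...
    degree: float  # recognized intensity, 0.0 (minimum) to 1.0 (maximum)


def favorable_expression(user: Expression, emphasis: float, relaxation: float) -> Expression:
    """Map the user's recognized expression to the avatar's expression."""
    if user.kind == "smile":
        # S9 YES -> S11: the avatar smiles to a higher degree than the user.
        return Expression("smile", min(MAX_DEGREE, user.degree + emphasis))
    if user.kind == "displeased":
        # S13 YES -> S15: the avatar looks less displeased than the user.
        return Expression("displeased", max(0.0, user.degree - relaxation))
    # S13 NO -> S17: any other expression is reflected unchanged.
    return user
```

Clamping keeps the result inside the range covered by the stored expression stages; a multiplicative adjustment would fit the description equally well.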

続くステップＳ１９では、アバターの視線を逸らすかどうかを判断する。ここでは、ＣＰＵ５０は、利用者が第１所定時間（たとえば、数秒～１０秒程度）以上正面を向いているかどうかを判断する。図示は省略したが、利用者の視線が正面を向いている場合に、その時間がカウントされ、正面以外を向いている場合にはカウントされない。また、利用者が正面以外を向いた後に再度正面を向いた場合には、最初から正面を向いている時間がカウントされる。 In the following step S19, it is determined whether to avert the avatar's line of sight. Here, the CPU 50 determines whether the user has been facing forward for a first predetermined period of time (for example, about several seconds to 10 seconds) or more. Although not shown, the time is counted when the user's line of sight is facing forward, and is not counted when the user's line of sight is facing other than the front. Furthermore, if the user turns to the front again after facing away from the front, the time spent facing the front from the beginning is counted.

ステップＳ１９で“ＹＥＳ”であれば、つまり、視線を逸らすことを判断すれば、ステップＳ２１で、アバターの視線を正面から逸らして、ステップＳ４９に進む。ステップＳ２１では、ＣＰＵ５０は、アバターの視線を第２所定時間正面以外に向ける。第２所定時間は、第１所定時間の所定の割合（たとえば、１割程度）に設定される。所定の割合は、予め設定された所定の割合のパラメータで決定される。また、アバターが視線を逸らす方向すなわち正面以外の方向は予め設定されている複数の方向のうちからランダムに選択または確率で抽選される。 If "YES" in step S19, that is, if it is determined that the avatar's line of sight is to be averted, in step S21, the avatar's line of sight is averted from the front, and the process proceeds to step S49. In step S21, the CPU 50 directs the avatar's line of sight away from the front for a second predetermined period of time. The second predetermined time is set to a predetermined ratio (for example, about 10%) of the first predetermined time. The predetermined ratio is determined by a predetermined ratio parameter set in advance. Further, the direction in which the avatar averts its line of sight, that is, the direction other than the front, is randomly selected from a plurality of preset directions or drawn by lottery with probability.

一方、ステップＳ１９で“ＮＯ”であれば、つまり、視線を逸らすことを判断しなければ、ステップＳ２３で、アバターの画像２１０を、利用者の視線に合わせて、ステップＳ４９に進む。 On the other hand, if "NO" in step S19, that is, if it is not determined that the avatar's line of sight should be averted, then in step S23 the avatar image 210 is aligned with the user's line of sight, and the process proceeds to step S49.
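The gaze-aversion decision of steps S19 to S23 can be sketched in Python as below. The class name `GazeAverter`, the direction labels, the per-call time step `dt`, and the choice to reset the counter after an aversion are assumptions for illustration; the text only specifies the first predetermined time, the ratio that yields the second predetermined time, the restart of counting when the user looks away, and the random choice among preset directions.

```python
import random
from typing import Optional

DIRECTIONS = ("left", "right", "up", "down")  # assumption: the preset directions


class GazeAverter:
    def __init__(self, first_time: float, ratio: float,
                 rng: Optional[random.Random] = None) -> None:
        self.first_time = first_time            # first predetermined time (seconds)
        self.second_time = first_time * ratio   # how long the avatar looks away
        self.rng = rng or random.Random()
        self.front_elapsed = 0.0                # time the user has faced front

    def update(self, user_facing_front: bool, dt: float) -> Optional[str]:
        """Return a direction to avert to, or None to follow the user's gaze."""
        if not user_facing_front:
            self.front_elapsed = 0.0            # counting restarts from zero
            return None
        self.front_elapsed += dt
        if self.front_elapsed >= self.first_time:   # S19 YES
            self.front_elapsed = 0.0                # assumed: restart after averting
            return self.rng.choice(DIRECTIONS)      # S21: random preset direction
        return None                                 # S19 NO -> S23
```

A caller would hold the returned direction for `second_time` seconds before realigning the avatar with the user's line of sight.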

上述したように、操作者が発話している場合には、ステップＳ７で“ＹＥＳ”となり、図１１に示すステップＳ２５で、カメラ６８の撮影画像を取得して、ステップＳ２７で、操作者の表情を認識する。ただし、上述したように、操作者を撮影する処理は、制御処理と並行して実行されており、ステップＳ２５では、ＣＰＵ５０は現在の撮影画像データ３０４ｉを取得する。また、ステップＳ２７では、ＣＰＵ５０は、撮影画像データ３０４ｉに含まれる操作者の顔画像から表情およびその度合を認識し、対応する操作者表情データ３０４ｊを記憶（更新）する。 As described above, if the operator is speaking, the answer is "YES" in step S7; the image captured by the camera 68 is acquired in step S25 shown in FIG. 11, and the operator's facial expression is recognized in step S27. As described above, the process of photographing the operator is executed in parallel with the control process, so in step S25 the CPU 50 acquires the current photographed image data 304i. In step S27, the CPU 50 recognizes the facial expression and its degree from the operator's face image included in the photographed image data 304i, and stores (updates) the corresponding operator facial expression data 304j.

続いて、ステップＳ２９では、頷くタイミングであるかどうかを判断する。ここでは、ＣＰＵ５０は、操作者の音声が途切れたタイミングであるか、操作者が利用者に同意を求めている内容を発話したタイミングであるかどうかを判断する。 Subsequently, in step S29, it is determined whether it is the timing to nod. Here, the CPU 50 determines whether it is the timing at which the operator's voice is interrupted or the timing at which the operator utters the content for which consent is requested from the user.

図示は省略するが、ＣＰＵ５０は、ステップＳ２９の処理を実行するとき、操作者の音声を音声認識する処理も実行する。 Although not shown, when executing the process of step S29, the CPU 50 also executes a process of recognizing the voice of the operator.

ステップＳ２９で“ＹＥＳ”であれば、つまり、頷くタイミングである場合には、ステップＳ３１で、アバターに頷き動作を実行させ、ステップＳ３３で、利用者に対応するアバターを利用者の視線に合わせ、利用者の表情を利用者に対応するアバターに反映して、ステップＳ４９に進む。つまり、アバターの画像２１０を、利用者の視線に合わせ、利用者の表情で表示させる。 If "YES" in step S29, that is, if it is time to nod, the avatar is caused to perform a nodding motion in step S31, and in step S33, the avatar corresponding to the user is aligned with the user's line of sight, The user's facial expression is reflected on the avatar corresponding to the user, and the process advances to step S49. In other words, the avatar image 210 is aligned with the user's line of sight and displayed with the user's facial expression.

一方、ステップＳ２９で“ＮＯ”であれば、つまり、頷くタイミングでない場合には、ステップＳ３５で、操作者が微笑んだかどうかを判断する。ここでは、ＣＰＵ５０は、ステップＳ２７で認識した操作者の顔の表情が微笑みであるかどうかを判断する。 On the other hand, if "NO" in step S29, that is, if it is not the timing to nod, it is determined in step S35 whether or not the operator has smiled. Here, the CPU 50 determines whether the facial expression of the operator recognized in step S27 is a smile.

ステップＳ３５で“ＹＥＳ”であれば、つまり、操作者が微笑んだと判断すると、ステップＳ３７で、予め設定された微笑みの度合で微笑む表情をアバターに表現させて、ステップＳ４５に進む。ステップＳ３７では、ＣＰＵ５０は、予め設定された微笑みの度合で微笑むようにアバターの表情を生成する。ただし、微笑みの度合は、設定パラメータデータ３０４ｆに含まれる微笑みの度合のパラメータで決定される。 If "YES" in step S35, that is, if it is determined that the operator has smiled, in step S37, the avatar is made to express a smiling expression with a preset smile level, and the process proceeds to step S45. In step S37, the CPU 50 generates the facial expression of the avatar so that the avatar smiles at a preset degree of smile. However, the degree of smile is determined by the smile degree parameter included in the setting parameter data 304f.

一方、ステップＳ３５で“ＮＯ”であれば、つまり、操作者が微笑んでいないと判断すると、ステップＳ３９で、操作者が悲しい顔をしたかどうかを判断する。ここでは、ＣＰＵ５０は、ステップＳ２７で認識した操作者の顔の表情が悲しみであるかどうかを判断する。 On the other hand, if "NO" in step S35, that is, if it is determined that the operator is not smiling, it is determined in step S39 whether or not the operator has a sad face. Here, the CPU 50 determines whether the facial expression of the operator recognized in step S27 is sad.

ステップＳ３９で“ＮＯ”であれば、つまり、操作者が悲しい顔をしていないと判断すると、ステップＳ４１で、利用者の表情を利用者に対応するアバターに反映して、ステップＳ４５に進む。 If "NO" in step S39, that is, if it is determined that the operator does not have a sad face, the user's facial expression is reflected in the avatar corresponding to the user in step S41, and the process proceeds to step S45.

一方、ステップＳ３９で“ＹＥＳ”であれば、つまり、操作者が悲しい顔をしたと判断すると、ステップＳ４３で、予め設定された悲しみの度合で悲しむ表情をアバターに表現させて、ステップＳ４５に進む。ステップＳ４３では、ＣＰＵ５０は、予め設定された悲しみの度合で悲しむようにアバターの表情を生成する。ただし、悲しみの度合は、設定パラメータデータ３０４ｆに含まれる悲しみの度合のパラメータで決定される。 On the other hand, if "YES" in step S39, that is, if it is determined that the operator made a sad face, then in step S43 the avatar is made to express a sad expression at a preset degree of sadness, and the process proceeds to step S45. In step S43, the CPU 50 generates a facial expression of the avatar so that the avatar looks sad at the preset degree of sadness. The degree of sadness is determined by the sadness degree parameter included in the setting parameter data 304f.
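The expression selection while the operator is speaking (steps S35 to S43) can be summarized with a short Python sketch. The function name, the string labels, and the tuple representation `(kind, degree)` are hypothetical; the point it illustrates is that the avatar uses preset degrees rather than the operator's recognized degree.

```python
from typing import Tuple

Expression = Tuple[str, float]  # (kind, degree); an illustrative representation


def expression_while_operator_speaks(operator_kind: str,
                                     user_expression: Expression,
                                     smile_degree: float,
                                     sadness_degree: float) -> Expression:
    """Choose the avatar's expression from the operator's recognized expression."""
    if operator_kind == "smile":
        # S35 YES -> S37: smile at the preset smile degree.
        return ("smile", smile_degree)
    if operator_kind == "sad":
        # S39 YES -> S43: look sad at the preset sadness degree.
        return ("sad", sadness_degree)
    # S39 NO -> S41: reflect the user's own expression on the avatar.
    return user_expression
```

For example, `expression_while_operator_speaks("smile", ("neutral", 0.2), 0.6, 0.4)` yields `("smile", 0.6)` regardless of how strongly the operator actually smiled.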

ステップＳ４５では、利用者に対応するアバターの視線を利用者の視線に合わせる。続くステップＳ４７では、利用者の頭部の動きを利用者に対応するアバターに反映して、図１２に示すステップＳ４９に進む。 In step S45, the line of sight of the avatar corresponding to the user is aligned with the line of sight of the user. In the following step S47, the movement of the user's head is reflected on the avatar corresponding to the user, and the process proceeds to step S49 shown in FIG. 12.

図１２に示すように、ステップＳ４９では、利用者が発話しているかどうかを判断する。ここでは、ＣＰＵ５０は、受信データ３０４ｃに、利用者の音声データが含まれているかどうかを判断する。 As shown in FIG. 12, in step S49, it is determined whether the user is speaking. Here, the CPU 50 determines whether the received data 304c includes the user's voice data.

ステップＳ４９で“ＹＥＳ”であれば、つまり、利用者が発話している場合には、ステップＳ５１で、利用者の音声を出力するとともに、利用者に対応するアバターに発話動作を実行させて、ステップＳ５５に進む。ステップＳ５１では、ＣＰＵ５０は、受信した利用者の音声データをスピーカ６４から出力するとともに、この音声データに合せて、アバターの画像２１０の口唇部を動かすとともに、アバターの画像２１０の頭部（首）を動かす。つまり、利用者に対応するアバターが実際にしゃべっているように表現される。 If "YES" in step S49, that is, if the user is speaking, in step S51, the user's voice is output, and the avatar corresponding to the user is made to perform a speaking action, The process advances to step S55. In step S51, the CPU 50 outputs the received user's voice data from the speaker 64, moves the lips of the avatar image 210 in accordance with this voice data, and moves the head (neck) of the avatar image 210. move. In other words, it appears as if the avatar corresponding to the user is actually speaking.

一方、ステップＳ４９で“ＮＯ”であれば、つまり、利用者が発話していない場合には、ステップＳ５３で、利用者の頭部の動きを利用者に対応するアバターに反映して、ステップＳ５５に進む。 On the other hand, if "NO" in step S49, that is, if the user is not speaking, the movement of the user's head is reflected in the avatar corresponding to the user in step S53, and step S55 Proceed to.

ステップＳ５５では、終了かどうかを判断する。ここでは、ＣＰＵ５０は、操作者が制御処理を終了することを指示したり、利用者が対話を終了したりしたかどうかを判断する。 In step S55, it is determined whether the process is finished. Here, the CPU 50 determines whether the operator has given an instruction to end the control process or whether the user has ended the dialogue.

ステップＳ５５で“ＮＯ”であれば、つまり、終了でなければ、図９に示したステップＳ１に戻る。したがって、アバターの画像２１０が更新される。一方、ステップＳ５５で“ＹＥＳ”であれば、つまり、終了であれば、制御処理を終了する。 If "NO" in step S55, that is, if the process is not to be ended, the process returns to step S1 shown in FIG. 9. The avatar image 210 is therefore updated. On the other hand, if "YES" in step S55, that is, if the process is to be ended, the control process ends.

図１３に示すように、ＣＰＵ５０は、送受信処理を開始すると、ステップＳ７１で、利用者側端末１２と通信を開始する。続くステップＳ７３では、操作者の音声を検出したかどうかを判断する。ステップＳ７３で“ＮＯ”であれば、つまり、操作者の音声を検出していなければ、ステップＳ７７に進む。 As shown in FIG. 13, when the CPU 50 starts the transmission/reception process, it starts communication with the user terminal 12 in step S71. In the following step S73, it is determined whether the operator's voice has been detected. If "NO" in step S73, that is, if the operator's voice is not detected, the process advances to step S77.

一方、ステップＳ７３で“ＹＥＳ”であれば、つまり、操作者の音声を検出していれば、ステップＳ７５で、操作者の音声を利用者側端末１２に送信して、ステップＳ７７に進む。 On the other hand, if "YES" in step S73, that is, if the operator's voice is detected, the operator's voice is transmitted to the user terminal 12 in step S75, and the process advances to step S77.

ステップＳ７７では、利用者側端末１２からデータを受信したかどうかを判断する。ステップＳ７７で“ＮＯ”であれば、つまり、利用者側端末１２からデータを受信していない場合には、ステップＳ８１に進む。 In step S77, it is determined whether data has been received from the user terminal 12. If "NO" in step S77, that is, if data is not received from the user terminal 12, the process advances to step S81.

一方、ステップＳ７７で“ＹＥＳ”であれば、つまり、利用者側端末１２からデータを受信している場合には、ステップＳ７９で、受信したデータを記憶して、ステップＳ８１に進む。 On the other hand, if "YES" in step S77, that is, if data is being received from the user terminal 12, the received data is stored in step S79, and the process proceeds to step S81.

ステップＳ８１では、終了かどうかを判断する。ここでは、ＣＰＵ５０は、操作者が送受信処理を終了することを指示したり、利用者が対話を終了したりしたかどうかを判断する。 In step S81, it is determined whether the process is finished. Here, the CPU 50 determines whether the operator has given an instruction to end the transmission/reception process or whether the user has ended the dialogue.

ステップＳ８１で“ＮＯ”であれば、つまり、終了でなければ、ステップＳ７３に戻る。一方、ステップＳ８１で“ＹＥＳ”であれば、つまり、終了であれば、利用者側端末１２との送受信処理を終了する。 If "NO" in step S81, that is, if the process is not finished, the process returns to step S73. On the other hand, if "YES" in step S81, that is, if it is finished, the transmission/reception process with the user terminal 12 is finished.
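The operator-side transmission/reception loop of FIG. 13 (steps S73 to S81) can be sketched in Python as a polling loop. The callables passed in (`detect_voice`, `send`, `receive`, `store`, `finished`) are hypothetical stand-ins for the microphone, the communication unit, and the data storage area; establishing the connection in step S71 is assumed to have happened before the loop starts.

```python
from typing import Callable, Optional


def run_transceiver(detect_voice: Callable[[], Optional[bytes]],
                    send: Callable[[bytes], None],
                    receive: Callable[[], Optional[bytes]],
                    store: Callable[[bytes], None],
                    finished: Callable[[], bool]) -> None:
    """Repeat S73-S81 until an end condition is reported."""
    while True:
        voice = detect_voice()   # S73: was the operator's voice detected?
        if voice is not None:
            send(voice)          # S75: transmit it to the user terminal
        data = receive()         # S77: was data received from the user terminal?
        if data is not None:
            store(data)          # S79: store it as received data
        if finished():           # S81: end instruction or dialogue ended
            break
```

The user-side loop of FIG. 14 has the same shape, with an extra capture-and-send step for the photographed image and audio output in place of storage.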

図１４に示すように、利用者側端末１２のＣＰＵ２０は送受信処理を開始すると、ステップＳ１０１で、操作者側端末１６との通信を開始する。次のステップＳ１０３では、利用者の音声を検出したかどうかを判断する。 As shown in FIG. 14, when the CPU 20 of the user terminal 12 starts the transmission/reception process, it starts communication with the operator terminal 16 in step S101. In the next step S103, it is determined whether the user's voice has been detected.

ステップＳ１０３で“ＮＯ”であれば、つまり、利用者の音声を検出していなければ、ステップＳ１０７に進む。一方、ステップＳ１０３で“ＹＥＳ”であれば、つまり、利用者の音声を検出していれば、ステップＳ１０５で、検出した音声を操作者側端末１６に送信して、ステップＳ１０７に進む。 If "NO" in step S103, that is, if the user's voice is not detected, the process advances to step S107. On the other hand, if "YES" in step S103, that is, if the user's voice is detected, the detected voice is transmitted to the operator terminal 16 in step S105, and the process advances to step S107.

ステップＳ１０７では、撮影画像を取得する。次のステップＳ１０９では、撮影画像を操作者側端末１６に送信する。続いて、ステップＳ１１１で、操作者の音声を受信したかどうかを判断する。 In step S107, a photographed image is acquired. In the next step S109, the photographed image is transmitted to the operator terminal 16. Subsequently, in step S111, it is determined whether the operator's voice has been received.

ステップＳ１１１で“ＮＯ”であれば、つまり、操作者の音声を受信していなければ、ステップＳ１１５に進む。一方、ステップＳ１１１で“ＹＥＳ”であれば、つまり、操作者の音声を受信していれば、ステップＳ１１３で、操作者の音声を出力して、ステップＳ１１５に進む。 If "NO" in step S111, that is, if the operator's voice is not received, the process advances to step S115. On the other hand, if "YES" in step S111, that is, if the operator's voice is being received, the operator's voice is output in step S113, and the process proceeds to step S115.

ステップＳ１１５では、終了かどうかを判断する。ここでは、ＣＰＵ２０は、利用者が送受信処理を終了することを指示したり、操作者が対話を終了したりしたかどうかを判断する。 In step S115, it is determined whether the process is finished. Here, the CPU 20 determines whether the user has given an instruction to end the transmission/reception process or whether the operator has ended the dialogue.

ステップＳ１１５で“ＮＯ”であれば、つまり、終了でなければ、ステップＳ１０３に戻る。一方、ステップＳ１１５で“ＹＥＳ”であれば、つまり、終了であれば、操作者側端末１６との送受信処理を終了する。 If "NO" in step S115, that is, if the process is not finished, the process returns to step S103. On the other hand, if "YES" in step S115, that is, if it is finished, the transmission/reception process with the operator side terminal 16 is finished.

第１実施例によれば、操作者側端末の表示装置に利用者に対応するアバターを表示することで、利用者のプライバシーを守るとともに、操作者が発話している場合には、操作者の表情と同じ表情で利用者に対応するアバターを表現させるので、つまり、アバターが共感してくれるので、操作者は対話の相手である利用者と対話し易い。このため、利用者に応対し易くすることができる。 According to the first embodiment, displaying an avatar corresponding to the user on the display device of the operator terminal protects the user's privacy. In addition, when the operator is speaking, the avatar corresponding to the user is made to express the same facial expression as the operator. In other words, the avatar empathizes with the operator, so the operator can easily talk with the user who is the dialogue partner. This makes it easier to respond to users.

また、第１実施例によれば、操作者が発話していない場合には、利用者の表情を操作者に好意的な表情でアバターを表現させるので、操作者が喋り易い状況を作ることができる。したがって、利用者に応対し易くすることができる。 Further, according to the first embodiment, when the operator is not speaking, the avatar is made to express the user's facial expression with a favorable expression for the operator, so it is possible to create a situation where the operator can easily speak. can. Therefore, it is possible to easily respond to users.

さらに、第１実施例によれば、操作者が発話していない場合には、適宜アバターの視線を逸らすことで、アバターに威圧感の無い動作を行わせるので、操作者が喋り易い状況を作ることができる。したがって、利用者に応対し易くすることができる。 Furthermore, according to the first embodiment, when the operator is not speaking, the avatar averts the avatar's line of sight as appropriate to make the avatar perform non-threatening movements, thereby creating a situation in which the operator can easily speak. be able to. Therefore, it is possible to easily respond to users.

なお、第１実施例では、操作者が発話していない場合には、利用者の表情と同じ表情を操作者に対して好意的に変更してアバターに表現させる処理を実行したり、適宜視線を逸らす処理を実行したりし、操作者が発話している場合には、操作者の表情と同じ表情をアバターに表現させる処理を実行したり、適宜頷き動作を行わせる処理を実行したりしたが、これらすべての処理が実行される必要はない。いずれか１つまたは２つ以上の処理が実行された場合にも、操作者は利用者と対話し易い。各処理を実行するかどうかを操作者が設定し、実行しない処理については、図９－図１２に示した制御処理においてスキップされる。 In the first embodiment, when the operator is not speaking, the process of making the avatar express the user's facial expression after changing it to be favorable to the operator and the process of averting the avatar's line of sight as appropriate are executed. When the operator is speaking, the process of making the avatar express the same facial expression as the operator and the process of making the avatar nod as appropriate are executed. However, not all of these processes need to be executed. Even when only one, or two or more, of these processes are executed, the operator can easily talk with the user. The operator sets whether or not to execute each process, and processes that are not to be executed are skipped in the control process shown in FIGS. 9 to 12.

一例として、操作者が発話していない場合に、利用者の表情と同じ表情を操作者に対して好意的に変更してアバターに表現させる処理を実行しない場合には、ステップＳ７で“ＮＯ”の場合に、図１０に示すステップＳ９およびＳ１３の処理がスキップされ、ステップＳ１７の処理に移行される。したがって、ステップＳ１１およびステップＳ１５の処理が実行されることはない。なお、この場合には、利用者側端末１２から撮影画像データを送信しなくてもよい。 As an example, suppose the process of making the avatar express the user's facial expression after changing it to be favorable to the operator is set not to be executed while the operator is not speaking. In that case, if "NO" in step S7, the processes of steps S9 and S13 shown in FIG. 10 are skipped, and the process moves to step S17. The processes of steps S11 and S15 are therefore never executed. Note that in this case the photographed image data need not be transmitted from the user terminal 12.

また、操作者が発話していない場合に、適宜視線を逸らす処理を実行しない場合には、ステップＳ１７の処理を実行した場合に、ステップＳ１９の処理がスキップされ、ステップＳ２３の処理に移行される。 Further, if the process of averting the avatar's line of sight as appropriate is set not to be executed while the operator is not speaking, then after the process of step S17 is executed, the process of step S19 is skipped and the process moves to step S23.

さらに、操作者が発話している場合に、操作者の表情と同じ表情をアバターに表現させる処理を実行しない場合には、ステップＳ２９で“ＮＯ”の場合に、ステップＳ３５およびＳ３９の処理がスキップされ、ステップＳ４１の処理に移行される。したがって、ステップＳ３７およびＳ４３が実行されることはない。 Furthermore, if the process of making the avatar express the same facial expression as the operator is set not to be executed while the operator is speaking, then if "NO" in step S29, the processes of steps S35 and S39 are skipped and the process moves to step S41. Steps S37 and S43 are therefore never executed.

さらにまた、操作者が発話している場合に、適宜頷き動作を行わせる処理を実行しない場合には、ステップＳ２７の処理が実行されると、ステップＳ２９の処理がスキップされ、ステップＳ３５に移行する。したがって、ステップＳ３１およびＳ３３の処理が実行されることはない。 Furthermore, if the process of making the avatar nod as appropriate is set not to be executed while the operator is speaking, then after the process of step S27 is executed, the process of step S29 is skipped and the process moves to step S35. The processes of steps S31 and S33 are therefore never executed.

Although a detailed description is omitted, when two or more of the processes are set not to be executed, each corresponding process is skipped in the manner described above.
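The per-process settings described above can be illustrated with a minimal sketch. It is not the control processing of FIGS. 9 to 12; the class `ProcessSettings`, its field names, and the function `processes_to_run` are hypothetical names introduced only for illustration, and the ordering of the processes within one pass is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessSettings:
    """Hypothetical switches the operator sets for each of the four processes."""
    favorable_expression: bool = True  # operator silent: user's expression, favorably modified
    avert_gaze: bool = True            # operator silent: avert the avatar's line of sight
    mirror_operator: bool = True       # operator speaking: express the operator's expression
    nod: bool = True                   # operator speaking: nod as appropriate


def processes_to_run(settings: ProcessSettings, operator_speaking: bool) -> List[str]:
    """Return the processes executed in one pass; disabled processes are skipped."""
    if operator_speaking:
        candidates = [("nod", settings.nod),
                      ("mirror_operator", settings.mirror_operator)]
    else:
        candidates = [("favorable_expression", settings.favorable_expression),
                      ("avert_gaze", settings.avert_gaze)]
    return [name for name, enabled in candidates if enabled]
```

Disabling two or more processes simply removes each of them from the returned list, which mirrors the statement that each corresponding process is skipped.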

In the first embodiment, the facial expressions of the user and the operator are recognized from their face images, but the invention need not be limited to this. The facial expressions of the user and the operator can also be recognized from the voice of the user and the voice of the operator, respectively. A known technique can be used to estimate a person's facial expression from voice. For example, the techniques disclosed in Japanese Patent Application Laid-Open No. 2021-12285 and in Hiroki Mori, "Understanding Emotions and Attitudes from Speech," Journal of the Institute of Electronics, Information and Communication Engineers, Vol. 101, No. 9, 2018, can be used.

In the first embodiment, when the operator is speaking, the operator's facial expression is recognized, and the avatar corresponding to the user is made to express a smiling expression at a preset degree of smiling or a sad expression at a preset degree of sadness, but the invention need not be limited to this. Because the degree of the operator's facial expression can also be recognized from the operator's face image, the avatar may instead be made to express the smiling expression or the sad expression at the recognized degree of the operator's expression.
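The choice between the preset degree and the recognized degree can be sketched as follows. The 0.0 to 1.0 scale, the preset values in `PRESET_LEVEL`, and the function name `avatar_expression_level` are assumptions made for illustration and do not appear in the embodiments.

```python
from typing import Optional

# Assumed preset degrees on an assumed 0.0-1.0 scale.
PRESET_LEVEL = {"smile": 0.5, "sad": 0.5}


def avatar_expression_level(expression: str,
                            recognized_level: Optional[float],
                            use_recognized: bool) -> Optional[float]:
    """Return the degree at which the avatar expresses the operator's expression.

    Returns None when the expression is neither smiling nor sad, in which
    case the avatar's expression is left unchanged.
    """
    if expression not in PRESET_LEVEL:
        return None
    if use_recognized and recognized_level is not None:
        # Clamp the recognized degree to the assumed scale.
        return min(max(recognized_level, 0.0), 1.0)
    return PRESET_LEVEL[expression]
```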

Furthermore, in the first embodiment, the avatar corresponding to the user is made to nod at the timing when the operator's voice pauses or at the timing when the operator utters content that asks the user for agreement, but the invention need not be limited to this. Because the operator may find it difficult to speak when the avatar nods too often or too rarely, the frequency of nodding can also be set. The frequency of nodding is stored as a setting parameter. When the frequency of nodding is decreased, the nodding motion is not executed, in proportion to the decrease, even at the timing when the operator's voice pauses or at the timing when the operator utters content that asks the user for agreement. Conversely, when the frequency of nodding is increased, the nodding motion is executed not only at those timings but also, in proportion to the increase, when a fourth predetermined time (for example, 0.5 seconds to several seconds) has elapsed since the immediately preceding nodding motion.
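One way to realize such a frequency setting is sketched below. The class `NodController`, the parameter `frequency` (1.0 meaning the behavior of the first embodiment), and the credit-based rule for dropping cues are assumptions made for illustration. For simplicity, the sketch enables the additional time-based nods whenever the frequency exceeds 1.0, rather than scaling them with the amount of the increase.

```python
from typing import Optional


class NodController:
    """Hypothetical nod scheduler driven by a stored frequency parameter."""

    def __init__(self, frequency: float = 1.0, fourth_time: float = 2.0) -> None:
        self.frequency = frequency      # <1.0: fewer nods, >1.0: more nods
        self.fourth_time = fourth_time  # fourth predetermined time, in seconds
        self._credit = 0.0
        self._last_nod: Optional[float] = None

    def should_nod(self, now: float, cue: bool) -> bool:
        """cue is True when the operator's voice pauses or agreement is requested."""
        if cue:
            # Accumulate credit so that a reduced frequency drops a matching share of cues.
            self._credit += min(self.frequency, 1.0)
            if self._credit >= 1.0:
                self._credit -= 1.0
                self._last_nod = now
                return True
            return False
        # Increased frequency: also nod once the fourth predetermined time has elapsed.
        if (self.frequency > 1.0 and self._last_nod is not None
                and now - self._last_nod >= self.fourth_time):
            self._last_nod = now
            return True
        return False
```

With `frequency=0.5`, every second cue produces a nod; with `frequency=1.5`, every cue produces a nod and a further nod follows once `fourth_time` seconds pass without one.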

Furthermore, in the first embodiment, the avatar corresponding to the user is made to express the user's facial expression at a degree modified to be favorable to the operator, thereby creating a situation in which the operator can easily talk, but the invention need not be limited to this. When the user has a stern-looking or sullen-looking face and an angry or displeased expression is recognized, the degree of anger or displeasure may be reduced, or the expression may be converted into a joyful expression, to create a situation in which the operator can easily talk. The operator may be allowed to enable or disable the reduction of the degree of anger or displeasure and the conversion into a joyful expression. The amount by which the degree of anger or displeasure is reduced is set as a setting parameter. In this way, the avatar corresponding to the user may be made to express the user's facial expression at a degree modified to be favorable to the operator, or to express the user's facial expression after it has been changed into an expression favorable to the operator.
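The favorable modification can be sketched as a small mapping from the recognized expression to the expression given to the avatar. The 0.0 to 1.0 scale, the expression labels, the parameters `reduction` and `smile_gain`, and the choice to keep the original degree when converting to joy are all assumptions made for illustration.

```python
from typing import Tuple


def favorable_expression(expression: str, level: float,
                         reduction: float = 0.5,
                         smile_gain: float = 1.5,
                         convert_to_joy: bool = False) -> Tuple[str, float]:
    """Return the (expression, degree) the avatar should express.

    Smiles are emphasized; anger and displeasure are either reduced by the
    configured amount or, if enabled by the operator, converted into joy.
    """
    if expression == "smile":
        return "smile", min(level * smile_gain, 1.0)
    if expression in ("anger", "displeasure"):
        if convert_to_joy:
            return "joy", level  # assumed: keep the recognized degree
        return expression, max(level - reduction, 0.0)
    return expression, level  # other expressions pass through unchanged
```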

<Second embodiment>
The second embodiment is the same as the first embodiment except that the user-side terminal 12 recognizes the user's facial expression, detects the user's line of sight and head movement, and transmits the recognized facial expression and the detected line of sight and head movement to the operator-side terminal 16. Therefore, only the differences are described, and redundant description is omitted.

Accordingly, in the second embodiment, the facial expression recognition program 302i is also stored in the user-side terminal 12. In addition, the line-of-sight and head movement detection program 302j is stored in the user-side terminal 12 and removed from the operator-side terminal 16.

Specifically, as shown in FIG. 13, part of the control processing of the CPU 50 of the operator-side terminal 16 is changed, and, as shown in FIG. 14, part of the transmission/reception processing of the CPU 20 of the user-side terminal 12 is changed.

In the second embodiment, the process of recognizing the user's facial expression in step S5 and the process of detecting the user's line of sight in step S7 are removed from the control processing of the CPU 50.

Also, in the second embodiment, the process of step S109, which transmits the captured image to the operator-side terminal 16, is removed from the transmission/reception processing of the CPU 20. Between steps S107 and S111, the following processes are added in this order: step S121, which recognizes the user's facial expression; step S123, which detects the user's line of sight and head movement; and step S125, which transmits the user's facial expression, line of sight, and head movement to the operator-side terminal 16.
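The added steps S121 to S125 can be sketched as follows. The recognizer and detector are passed in as callables because the embodiments do not specify their implementation; the function name `build_user_state_message`, the JSON encoding, and the field names `expression`, `gaze`, and `head` are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict


def build_user_state_message(
        frame: Any,
        recognize_expression: Callable[[Any], str],
        detect_gaze_and_head: Callable[[Any], Dict[str, Any]]) -> bytes:
    """Build the message sent to the operator-side terminal in place of the image."""
    expression = recognize_expression(frame)  # corresponds to step S121
    motion = detect_gaze_and_head(frame)      # corresponds to step S123
    payload = {
        "expression": expression,
        "gaze": motion["gaze"],
        "head": motion["head"],
    }
    # The encoded payload is what would be transmitted in step S125.
    return json.dumps(payload).encode("utf-8")
```

Sending only the recognition results means the captured image itself never leaves the user-side terminal.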

In the second embodiment as well, as in the first embodiment, the operator can easily talk with, and easily attend to, the user who is the dialog partner.

In each of the above embodiments, the user-side terminal and the operator-side terminal communicate via the network, but they may instead communicate via the network and a server. In this case, when the server receives image data sent from the user-side terminal toward the operator-side terminal, the server may recognize the user's facial expression and detect the user's line of sight and head movement based on the received image data, and may transmit data on the user's facial expression, line of sight, and head movement to the operator-side terminal in place of the received image data.

In each of the above embodiments, a case in which a user of a shopping site talks with an operator who attends to that user has been described, but the invention need not be limited to this. When two, or three or more, participants hold a video call or web conference, images of avatars corresponding to the dialog partner or to each of the other participants in the conference may be displayed on the display device of the terminal used by each participant, and each avatar may be controlled individually by the methods shown in the above embodiments.

Furthermore, the order in which the steps of the flowcharts shown in the above embodiments are processed may be changed as long as the same result is obtained.

Furthermore, the various screens and specific numerical values given in the above embodiments are merely examples and may be changed as appropriate.

10 ... Information processing system
12 ... User-side terminal
14 ... Network
16 ... Operator-side terminal
18 ... Server
18a, 20, 50 ... CPU
18b, 22, 52 ... Storage unit
24, 54 ... Communication I/F
26, 56 ... Input/output I/F
28, 58 ... Input device
30, 60 ... Display device
32, 62 ... Microphone
34, 64 ... Speaker
36, 66 ... Sensor I/F
38, 68 ... Camera

Claims

An information processing device comprising:
voice detection means for detecting operator voice, which is voice uttered by an operator interacting with a user;
photographing means for photographing a face image of the operator;
first recognition means for recognizing the facial expression of the operator based on the face image of the operator photographed by the photographing means when the operator voice is detected by the voice detection means;
avatar display means for displaying, on a display device, an image of an avatar that corresponds to the user who is the dialog partner and that expresses the same facial expression as a predetermined first facial expression when the facial expression of the operator recognized by the first recognition means is the predetermined first facial expression; and
transmission means for transmitting the operator voice detected by the voice detection means to a user-side terminal used by the user.

The information processing device according to claim 1, wherein the predetermined first expression is a smiling expression and a sad expression.

The information processing device according to claim 1, further comprising: timing determination means for determining, when the operator voice is detected by the voice detection means, whether it is a timing for nodding based on the operator voice; and avatar control means for causing the avatar to perform a nodding motion when the timing determination means determines that it is the timing for nodding.

The information processing device according to any one of claims 1 to 3, further comprising: receiving means for receiving the face image of the user transmitted from the user-side terminal; and second recognition means for recognizing the facial expression of the user based on the face image of the user,
wherein, when the facial expression of the user recognized by the second recognition means is a predetermined second facial expression, the avatar display means causes the avatar to express the same facial expression as the predetermined second facial expression, modified to be favorable to the operator.

The information processing device according to claim 4, wherein the predetermined second facial expression is a smiling expression and a displeased expression,
the second recognition means further recognizes the degree of the smiling expression and of the displeased expression of the user, and
the avatar display means causes the avatar to express the smiling expression with an emphasized degree of smiling when the facial expression of the user is the smiling expression, and causes the avatar to express the displeased expression with a moderated degree of displeasure when the facial expression of the user is the displeased expression.

The information processing device according to claim 1, further comprising receiving means for receiving the facial expression of the user transmitted from the user-side terminal,
wherein, when the voice detection means does not detect the operator voice and the facial expression of the user received by the receiving means is a predetermined second facial expression, the avatar display means causes the avatar to express the same facial expression as the predetermined second facial expression, modified to be favorable to the operator.

The information processing device according to claim 6, wherein the predetermined second facial expression is a smiling expression and a displeased expression,
the receiving means further receives the degree of the smiling expression and of the displeased expression of the user, and
the avatar display means causes the avatar to express the smiling expression with an emphasized degree of smiling when the facial expression of the user is the smiling expression, and causes the avatar to express the displeased expression with a moderated degree of displeasure when the facial expression of the user is the displeased expression.

The information processing device according to any one of claims 1 to 5, further comprising line-of-sight detection means for detecting the line of sight of the user based on the face image of the user received by the receiving means,
wherein the avatar display means sets the line of sight of the avatar according to the line of sight of the user detected by the line-of-sight detection means, and, when the voice detection means does not detect the operator voice and the line of sight of the user detected by the line-of-sight detection means has faced forward for a first predetermined time, averts the line of sight of the avatar for a second predetermined time regardless of the line of sight of the user.

The information processing device according to claim 6, wherein the receiving means further receives the line of sight of the user, and
the avatar display means sets the line of sight of the avatar according to the received line of sight of the user, and, when the voice detection means does not detect the operator voice and the line of sight of the user has faced forward for a first predetermined time, averts the line of sight of the avatar for a second predetermined time regardless of the line of sight of the user.

A control program executed by an information processing device, the control program causing a processor of the information processing device to execute:
a voice detection step of detecting operator voice, which is voice uttered by an operator interacting with a user;
a photographing step of photographing a face image of the operator;
a recognition step of recognizing the facial expression of the operator based on the face image of the operator photographed in the photographing step when the operator voice is detected in the voice detection step;
a step of displaying, on a display device, an image of an avatar that corresponds to the user who is the dialog partner and that expresses the same facial expression as a predetermined facial expression when the facial expression of the operator recognized in the recognition step is the predetermined facial expression; and
a transmitting step of transmitting the operator voice detected in the voice detection step to a user-side terminal used by the user.

A method for controlling an information processing device, the method comprising:
(a) detecting operator voice, which is voice uttered by an operator interacting with a user;
(b) photographing a face image of the operator;
(c) when the operator voice is detected in step (a), recognizing the facial expression of the operator based on the face image of the operator photographed in step (b);
(d) when the facial expression of the operator recognized in step (c) is a predetermined facial expression, displaying, on a display device, an image of an avatar that corresponds to the user who is the dialog partner and that expresses the same facial expression as the predetermined facial expression; and
(e) transmitting the operator voice detected in step (a) to a user-side terminal used by the user.