JP2023072111A

JP2023072111A - Information processing apparatus, control program, control method, and information processing system

Info

Publication number: JP2023072111A
Application number: JP2021184439A
Authority: JP
Inventors: Hiroshi Ishiguro (石黒浩); Shogo Nishiguchi (西口昇吾)
Original assignee: Avita Inc; Osaka University NUC
Current assignee: Avita Inc; Osaka University NUC
Priority date: 2021-11-12
Filing date: 2021-11-12
Publication date: 2023-05-24

Abstract

To display an avatar with a stronger presence. SOLUTION: A user-side terminal (12) includes a CPU (20). A user of the user-side terminal and an operator of an operator-side terminal (16) communicate by voice in real time. When the operator responds to an inquiry from the user, the CPU of the user-side terminal receives voice data of the operator's voice. Volume data of the operator's voice is attached to the received voice data, and the CPU of the user-side terminal calculates, based on the volume data, a ratio for enlarging or reducing an image of an avatar (120). The CPU of the user-side terminal displays the image of the avatar enlarged or reduced at the calculated ratio relative to its normal size, and outputs the received voice data. When the image of the avatar is enlarged, it may protrude from a display frame (152). SELECTED DRAWING: Figure 7
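The volume-based scaling described in the abstract can be pictured with a short sketch. This excerpt does not state the formula, so the linear mapping, the normalized volume range (0.0 to 1.0), the reference volume, and the clamping limits below are all assumptions made for illustration; `ratio_from_volume` and `scaled_size` are hypothetical names.

```python
def ratio_from_volume(volume, reference=0.5, gain=1.0,
                      min_ratio=0.5, max_ratio=2.0):
    """Map a normalized volume (0.0-1.0) to an avatar scale ratio.

    Assumed mapping (not taken from the patent text): a volume equal to
    `reference` gives 1.0 (the normal size), louder speech enlarges the
    avatar, quieter speech reduces it, and the result is clamped.
    """
    ratio = 1.0 + gain * (volume - reference)
    return max(min_ratio, min(max_ratio, ratio))


def scaled_size(width, height, ratio):
    """Return the avatar's pixel size after applying the ratio."""
    return round(width * ratio), round(height * ratio)
```

Under these assumptions, a ratio above 1.0 corresponds to the enlarged avatar and a ratio below 1.0 to the reduced avatar; the clamp keeps a sudden loud sound from scaling the image without bound.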

Description

The present invention relates to an information processing apparatus, a control program, a control method, and an information processing system, and more particularly to an information processing apparatus, a control program, a control method, and an information processing system in which, for example, a user and an operator communicate by chat or talk.

An example of a conventional information processing apparatus of this type is disclosed in Patent Literature 1. In the communication system disclosed in Patent Literature 1, while the telexistence mode is set, an operator terminal converts the operator's facial expression and gestures into coordinate data and transmits the coordinate data, together with response voice data, to a reception terminal. The reception terminal generates an avatar based on the coordinate data sent from the operator terminal, thereby generating character response information in which the operator's facial expression and gestures are reflected in the avatar's facial expression and gestures, and displays that information to the user.

Patent Literature 1: Japanese Patent Application Laid-Open No. 2021-56940

In Patent Literature 1, an avatar image based on the character response information is displayed on the display unit of the operator terminal. However, because the screen of the display unit is two-dimensional, the avatar is displayed flat even if the avatar image is generated from three-dimensional CG image data, and there is room for improvement in conveying the presence of the avatar.

Therefore, a primary object of the present invention is to provide a novel information processing apparatus, control program, control method, and information processing system.

Another object of the present invention is to provide an information processing apparatus, a control program, a control method, and an information processing system capable of displaying an avatar with increased presence.

A first invention is an information processing apparatus comprising: receiving means for receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice; sound output means for outputting the voice received by the receiving means; ratio calculation means for calculating, based on the predetermined information received by the receiving means, a ratio at which an image of an avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received by the receiving means is not being output; and image display means for displaying on a display, when the sound output means outputs the voice, the image of the avatar drawn at the ratio calculated by the ratio calculation means.

A second invention depends on the first invention, wherein the image display means further displays a frame image within which the image of the avatar fits in the normal state, and displays the image of the avatar in front of the frame image.

A third invention depends on the first or second invention, wherein the predetermined information is the volume of the voice uttered by the operator, and the ratio calculation means calculates the ratio based on the volume.

A fourth invention depends on the first or second invention, wherein the predetermined information is the movement of the operator's neck when the operator speaks, and the ratio calculation means calculates the ratio based on the movement of the operator's neck.

A fifth invention is an information processing apparatus comprising: receiving means for receiving, from an operator-side terminal, a voice uttered by an operator and a ratio calculated based on predetermined information obtained when the operator uttered the voice; sound output means for outputting the voice received by the receiving means; and image display means for displaying on a display, when the sound output means outputs the voice, an image of an avatar drawn at the ratio received by the receiving means, wherein the ratio is a ratio at which the image of the avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received by the receiving means is not being output.

A sixth invention is an information processing apparatus comprising: receiving means for receiving a voice uttered by an operator and an image of an avatar drawn at a ratio calculated based on predetermined information obtained when the operator uttered the voice; sound output means for outputting the voice received by the receiving means; and image display means for displaying on a display, when the sound output means outputs the voice, the image of the avatar received by the receiving means, wherein the ratio is a ratio at which the image of the avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received by the receiving means is not being output.

A seventh invention is an information processing apparatus comprising: receiving means for receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice; sound output means for outputting the voice received by the receiving means to a user-side terminal used by a user who interacts with the operator; ratio calculation means for calculating, based on the predetermined information received by the receiving means, a ratio at which an image of an avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received by the receiving means is not being output to the user-side terminal; and image output means for outputting to the user-side terminal, when the sound output means outputs the voice, the image of the avatar drawn at the ratio calculated by the ratio calculation means.

An eighth invention is an information processing apparatus comprising: receiving means for receiving text input by an operator or a voice uttered by the operator; output means for outputting the text or voice received by the receiving means; and image display means for displaying an image of an avatar corresponding to the operator on a display, wherein the image display means displays the image of the avatar so that it fits within a frame image in a normal state in which the text or voice received by the receiving means is not being output, and displays the image of the avatar so that it protrudes from the frame image when the output means outputs the text or voice.

A ninth invention is a control program executed by an information processing apparatus, the control program causing a processor of the information processing apparatus to execute: a receiving step of receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice; a sound output step of outputting the voice received in the receiving step; a ratio calculation step of calculating, based on the predetermined information received in the receiving step, a ratio at which an image of an avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received in the receiving step is not being output; and an image display step of displaying on a display, when the voice is output in the sound output step, the image of the avatar drawn at the ratio calculated in the ratio calculation step.

A tenth invention is a control program executed by an information processing apparatus, the control program causing a processor of the information processing apparatus to execute: a receiving step of receiving, from an operator-side terminal, a voice uttered by an operator and a ratio calculated based on predetermined information obtained when the operator uttered the voice; a sound output step of outputting the voice received in the receiving step; and an image display step of displaying on a display, when the voice is output in the sound output step, an image of an avatar drawn at the ratio received in the receiving step, wherein the ratio is a ratio at which the image of the avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received in the receiving step is not being output.

An eleventh invention is a control program executed by an information processing apparatus, the control program causing a processor of the information processing apparatus to execute: a receiving step of receiving a voice uttered by an operator and an image of an avatar drawn at a ratio calculated based on predetermined information obtained when the operator uttered the voice; a sound output step of outputting the voice received in the receiving step; and an image display step of displaying on a display, when the voice is output in the sound output step, the image of the avatar received in the receiving step, wherein the ratio is a ratio at which the image of the avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received in the receiving step is not being output.

A twelfth invention is a control program executed by an information processing apparatus, the control program causing a processor of the information processing apparatus to execute: a receiving step of receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice; a sound output step of outputting the voice received in the receiving step to a user-side terminal used by a user who interacts with the operator; a ratio calculation step of calculating, based on the predetermined information received in the receiving step, a ratio at which an image of an avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received in the receiving step is not being output to the user-side terminal; and an image output step of outputting to the user-side terminal, when the voice is output in the sound output step, the image of the avatar drawn at the ratio calculated in the ratio calculation step.

A thirteenth invention is a control program executed by an information processing apparatus, the control program causing a processor of the information processing apparatus to execute: a receiving step of receiving text input by an operator or a voice uttered by the operator; an output step of outputting the text or voice received in the receiving step; and an image display step of displaying an image of an avatar corresponding to the operator on a display, wherein the image display step displays the image of the avatar so that it fits within a frame image in a normal state in which the text or voice received in the receiving step is not being output, and displays the image of the avatar so that it protrudes from the frame image when the text or voice is output in the output step.

A fourteenth invention is a control method for an information processing apparatus having a display, the control method comprising: (a) receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice; (b) outputting the voice received in step (a); (c) calculating, based on the predetermined information received in step (a), a ratio at which an image of an avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received in step (a) is not being output; and (d) displaying on the display, when the voice is output in step (b), the image of the avatar drawn at the ratio calculated in step (c).

A fifteenth invention is a control method for an information processing apparatus having a display, the control method comprising: (a) receiving, from an operator-side terminal, a voice uttered by an operator and a ratio calculated based on predetermined information obtained when the operator uttered the voice; (b) outputting the voice received in step (a); and (c) displaying on the display, when the voice is output in step (b), an image of an avatar drawn at the ratio received in step (a), wherein the ratio is a ratio at which the image of the avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice is not being output in step (b).

A sixteenth invention is a control method for an information processing apparatus having a display, the control method comprising: (a) receiving a voice uttered by an operator and an image of an avatar drawn at a ratio calculated based on predetermined information obtained when the operator uttered the voice; (b) outputting the voice received in step (a); and (c) displaying on the display, when the voice is output in step (b), the image of the avatar received in step (a), wherein the ratio is a ratio at which the image of the avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice is not being output in step (b).

A seventeenth invention is a control method for an information processing apparatus, the control method comprising: (a) receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice; (b) outputting the voice received in step (a) to a user-side terminal used by a user who interacts with the operator; (c) calculating, based on the predetermined information received in step (a), a ratio at which an image of an avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received in step (a) is not being output to the user-side terminal; and (d) outputting to the user-side terminal, when the voice is output in step (b), the image of the avatar drawn at the ratio calculated in step (c).

An eighteenth invention is a control method for an information processing apparatus, the control method comprising: (a) receiving text input by an operator or a voice uttered by the operator; (b) outputting the text or voice received in step (a); and (c) displaying an image of an avatar corresponding to the operator on a display, wherein step (c) displays the image of the avatar so that it fits within a frame image in a normal state in which the text or voice received in step (a) is not being output, and displays the image of the avatar so that it protrudes from the frame image when the text or voice is output in step (b).

A nineteenth invention is an information processing system comprising a server, and a user-side terminal and an operator-side terminal communicably connected to the server, the information processing system comprising: receiving means for receiving, from the operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice; sound output means for outputting the voice received by the receiving means; ratio calculation means for calculating, based on the predetermined information, a ratio at which an image of an avatar corresponding to the operator is enlarged or reduced relative to its size in a normal state in which the voice received by the receiving means is not being output; and image display means for displaying on a display of the user-side terminal, when the sound output means outputs the voice, the image of the avatar drawn at the ratio calculated by the ratio calculation means.
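The first, fifth, and sixth inventions differ mainly in what the receiving side is given: the raw predetermined information, an already calculated ratio, or an avatar image already drawn at that ratio. The following is a minimal sketch of that distinction, not the patent's implementation. The message layout, the names `OperatorMessage`, `ratio_from_info`, and `plan_display`, and the placeholder linear volume mapping are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class OperatorMessage:
    """Hypothetical payload received together with the operator's voice."""
    voice: bytes
    volume: Optional[float] = None        # raw information (first invention)
    ratio: Optional[float] = None         # precomputed ratio (fifth invention)
    avatar_image: Optional[bytes] = None  # pre-rendered avatar (sixth invention)


def ratio_from_info(volume: float) -> float:
    """Placeholder mapping (an assumption): linear in volume, clamped."""
    return max(0.5, min(2.0, 1.0 + (volume - 0.5)))


def plan_display(msg: OperatorMessage) -> Tuple[str, Union[float, bytes]]:
    """Decide how the receiver should produce the avatar for this message."""
    if msg.avatar_image is not None:
        return ("show_image", msg.avatar_image)   # display the image as received
    if msg.ratio is not None:
        return ("draw_at_ratio", msg.ratio)       # draw at the received ratio
    if msg.volume is not None:
        return ("draw_at_ratio", ratio_from_info(msg.volume))  # calculate locally
    return ("draw_at_ratio", 1.0)                 # nothing received: normal size
```

The design point this illustrates is only where the ratio calculation runs; in each case the avatar is displayed while the received voice is being output.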

According to the present invention, an avatar with increased presence can be displayed.
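One way to picture this effect is the frame behavior of the second and eighth inventions: the avatar fits inside a frame image in the normal state and may protrude from it when enlarged. The sketch below is a simplified geometric illustration under stated assumptions. Treating the avatar and frame as axis-aligned rectangles, scaling about the avatar's center, and the names `Rect`, `scale_about_center`, and `protrudes` are all hypothetical choices not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen coordinates (origin at top-left)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def scale_about_center(rect: Rect, ratio: float) -> Rect:
    """Enlarge or reduce a rectangle while keeping its center fixed (assumed)."""
    cx = rect.left + rect.width / 2
    cy = rect.top + rect.height / 2
    w, h = rect.width * ratio, rect.height * ratio
    return Rect(cx - w / 2, cy - h / 2, w, h)


def protrudes(avatar: Rect, frame: Rect) -> bool:
    """True if any edge of the avatar lies outside the frame."""
    return (avatar.left < frame.left or avatar.top < frame.top
            or avatar.right > frame.right or avatar.bottom > frame.bottom)


# Draw order: the frame first, then the avatar in front of it, so the part
# of an enlarged avatar that extends past the frame remains visible.
DRAW_ORDER = ("frame", "avatar")
```

For example, with a 100 x 100 frame and an 80 x 80 avatar centered in it, a ratio of 1.5 gives a 120 x 120 avatar that extends 10 pixels past every edge of the frame, while a ratio of 0.5 keeps it well inside.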

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of the embodiments, given with reference to the drawings.

FIG. 1 is a diagram showing an information processing system according to one embodiment of the present invention.
FIG. 2 is a block diagram showing the electrical configuration of the user-side terminal shown in FIG. 1.
FIG. 3 is a block diagram showing the electrical configuration of the operator-side terminal shown in FIG. 1.
FIG. 4 is a diagram showing an example of a screen displayed on the display device of the user-side terminal.
FIG. 5 is a diagram showing another example of a screen displayed on the display device of the user-side terminal.
FIG. 6 is a diagram showing yet another example of a screen displayed on the display device of the user-side terminal.
FIG. 7(A) is a diagram showing an example of a talk screen displaying the avatar image at its normal size, FIG. 7(B) is a diagram showing an example of a talk screen displaying an enlarged avatar image, and FIG. 7(C) is a diagram showing an example of a talk screen displaying a reduced avatar image.
FIG. 8 is a diagram showing an example of a memory map of the RAM of the user-side terminal shown in FIG. 2.
FIG. 9 is a flowchart showing a first part of an example of control processing of the CPU of the user-side terminal shown in FIG. 2.
FIG. 10 is a flowchart showing a second part of the example of control processing of the CPU of the user-side terminal shown in FIG. 2, following FIG. 9.
FIG. 11 is a flowchart showing a third part of the example of control processing of the CPU of the user-side terminal shown in FIG. 2, following FIG. 9.
FIG. 12 is a flowchart showing ratio calculation processing of the CPU of the user-side terminal shown in FIG. 2.
FIG. 13 is a block diagram showing the electrical configuration of the operator-side terminal of the second embodiment.
FIG. 14(A) is a diagram for explaining the movement distance of the reference plane when the operator nods, and FIG. 14(B) is a diagram for explaining the movement distance of the reference plane when the operator shakes his or her head.
FIG. 15 is a flowchart showing ratio calculation processing of the CPU of the user-side terminal in the second embodiment.
FIG. 16 is a flowchart showing part of the control processing of the CPU of the user-side terminal in the third embodiment.
FIG. 17 is a flowchart showing an example of avatar image generation processing of the CPU of the operator-side terminal in the third embodiment.
FIG. 18 is a flowchart showing a first part of an example of control processing of the CPU of the server in the fourth embodiment.
FIG. 19 is a flowchart showing a second part of the example of control processing of the CPU of the server in the fourth embodiment, following FIG. 18.
FIG. 20 is a flowchart showing a third part of the example of control processing of the CPU of the server in the fourth embodiment, following FIG. 18.
FIG. 21 is a diagram showing an example of an avatar image displayed so that it protrudes from the frame image.

<First embodiment>
Referring to FIG. 1, the information processing system 10 of the first embodiment includes a user-side terminal 12, which is communicably connected to an operator-side terminal 16 and a server 18 via a network 14.

The user-side terminal 12 is used by a user of a predetermined service provided by the server 18, and the operator-side terminal 16 is used by an operator who responds to the user.

The user-side terminal 12 is an information processing apparatus; as one example, it is a smartphone with a browser function. In other embodiments, a general-purpose terminal such as a tablet PC, a notebook PC, or a desktop PC can also be used as the user-side terminal 12.

The network 14 is composed of an IP network including the Internet and an access network for accessing the IP network. A public telephone network, a mobile telephone network, a wired LAN, a wireless LAN, CATV (Cable Television), or the like can be used as the access network.

The operator-side terminal 16 is another information processing apparatus, different from the user-side terminal 12. As one example, it is a notebook PC or a desktop PC, but in other embodiments a general-purpose terminal such as a smartphone or a tablet PC can also be used.

The server 18 is yet another information processing apparatus, different from the user-side terminal 12 and the operator-side terminal 16, and a general-purpose server can be used. Accordingly, the server 18 includes a CPU 18a and a storage unit 18b (including an HDD, a ROM, and a RAM), as well as components such as a communication interface and an input/output interface. In the first embodiment, the server 18 is provided to operate a site that provides the predetermined service.

FIG. 2 is a block diagram showing the electrical configuration of the user-side terminal 12 shown in FIG. 1. As shown in FIG. 2, the user-side terminal 12 includes a CPU 20, and the CPU 20 is connected via an internal bus to a storage unit 22, a communication interface (hereinafter, "communication I/F") 24, and an input/output interface (hereinafter, "input/output I/F") 26.

The CPU 20 is responsible for overall control of the user-side terminal 12. However, instead of the CPU 20, an SoC (System-on-a-Chip) that combines multiple functions, such as a CPU function and a GPU (Graphics Processing Unit) function, may be provided. The storage unit 22 includes an HDD, a ROM, and a RAM. However, a non-volatile memory such as an SSD may be used instead of the HDD, or in addition to the HDD, ROM, and RAM.

The communication I/F 24 has a wired interface for transmitting and receiving control signals and data to and from external computers, such as the operator-side terminal 16 and the server 18, via the network 14 under the control of the CPU 20. However, a wireless interface such as a wireless LAN or Bluetooth (registered trademark) can also be used as the communication I/F 24.

An input device 28, a display device 30, a microphone 32, and a speaker 34 are connected to the input/output I/F 26. The input device 28 consists of a touch panel and hardware buttons. The touch panel is a general-purpose touch panel of any type, such as capacitive, electromagnetic induction, resistive film, or infrared. The same applies to the operator-side terminal 16 described later.

However, when a notebook PC or a desktop PC is used as the user-side terminal 12, a keyboard and a computer mouse are used as the input device 28.

The display device 30 is an LCD or an organic EL display. The touch panel described above may be provided on the display surface of the display device 30, or a touch display in which the touch panel is formed integrally with the display device 30 may be provided. The same applies to the operator-side terminal 16 described later.

The input/output I/F 26 converts the user's voice detected by the microphone 32 into digital voice data and outputs it to the CPU 20, and converts voice data output by the CPU 20 into an analog voice signal and outputs it from the speaker 34. The voice data output from the CPU 20 is the voice data received from the operator-side terminal 16. The input/output I/F 26 also outputs operation data (or operation information) input from the input device 28 to the CPU 20, and outputs image data generated by the CPU 20 to the display device 30 so that the corresponding screen or image is displayed on the display device 30. In some cases, however, the CPU 20 outputs image data received from an external computer (for example, the operator-side terminal 16 or the server 18).

なお、図２に示す利用者側端末１２の電気的な構成は一例であり、限定される必要はない。他の実施例では、利用者側端末１２はカメラを備えていてもよい。 Note that the electrical configuration of the user-side terminal 12 shown in FIG. 2 is an example, and does not need to be limited. In another embodiment, the user terminal 12 may be equipped with a camera.

また、利用者側端末１２がスマートフォンである場合には、携帯電話通信網、または、携帯電話網および公衆電話網を介して、通話するための通話回路を備えるが、第１実施例では、そのような通話は行わないため、図示は省略してある。このことは、後述する操作者側端末１６がスマートフォンである場合についても同じである。 Further, when the user-side terminal 12 is a smart phone, it is provided with a call circuit for making calls via a mobile phone communication network, or a mobile phone network and a public telephone network. Since such a call is not made, illustration is omitted. This is the same when the operator-side terminal 16, which will be described later, is a smart phone.

図３は図１に示した操作者側端末１６の電気的な構成を示すブロック図である。図３に示すように、操作者側端末１６はＣＰＵ５０を含み、ＣＰＵ５０は、内部バスを介して、記憶部５２、通信Ｉ／Ｆ５４および入出力Ｉ／Ｆ５６に接続される。 FIG. 3 is a block diagram showing the electrical configuration of the operator-side terminal 16 shown in FIG. As shown in FIG. 3, the operator-side terminal 16 includes a CPU 50, and the CPU 50 is connected to a storage section 52, a communication I/F 54 and an input/output I/F 56 via an internal bus.

ＣＰＵ５０は、操作者側端末１６の全体的な制御を司る。ただし、ＣＰＵ５０に代えて、ＣＰＵ機能、ＧＰＵ機能等の複数の機能を含むＳｏＣを設けてもよい。記憶部５２は、ＨＤＤ、ＲＯＭおよびＲＡＭを含む。ただし、ＨＤＤに代えて、または、ＨＤＤ、ＲＯＭおよびＲＡＭに加えて、ＳＳＤ等の不揮発性メモリが使用されてもよい。 The CPU 50 is in charge of overall control of the operator-side terminal 16 . However, instead of the CPU 50, an SoC including multiple functions such as a CPU function and a GPU function may be provided. Storage unit 52 includes an HDD, a ROM, and a RAM. However, a non-volatile memory such as an SSD may be used instead of the HDD or in addition to the HDD, ROM and RAM.

通信Ｉ／Ｆ５４は、ＣＰＵ５０の制御の下、ネットワーク１４を介して、利用者側端末１２およびサーバ１８などの外部のコンピュータとの間で、制御信号およびデータの送受信を行うために有線インタフェースを有する。ただし、通信Ｉ／Ｆ５４としては、無線ＬＡＮまたはBluetooth（登録商標）等の無線インタフェースを使用することもできる。 The communication I/F 54 has a wired interface for transmitting and receiving control signals and data, under the control of the CPU 50, to and from external computers such as the user-side terminal 12 and the server 18 via the network 14. However, a wireless interface such as a wireless LAN or Bluetooth (registered trademark) can also be used as the communication I/F 54.

入出力Ｉ／Ｆ５６には、入力装置５８および表示装置６０、マイク６２およびスピーカ６４が接続されている。マイク６２およびスピーカ６４は、操作者が利用者との間で音声通話するために使用するマイク付きのヘッドセットを構成する。 An input device 58 , a display device 60 , a microphone 62 and a speaker 64 are connected to the input/output I/F 56 . Microphone 62 and speaker 64 constitute a microphone-equipped headset used by the operator for voice communication with the user.

また、入力装置５８としては、キーボードおよびコンピュータマウスが用いられる。ただし、操作者側端末１６として、スマートフォンまたはタブレットＰＣが用いられる場合には、入力装置５８として、タッチパネルおよびハードウェアのボタンが設けられる。また、表示装置６０は、ＬＣＤまたは有機ＥＬディスプレイである。 A keyboard and a computer mouse are used as the input device 58. However, when a smartphone or a tablet PC is used as the operator-side terminal 16, a touch panel and hardware buttons are provided as the input device 58. The display device 60 is an LCD or an organic EL display.

入出力Ｉ／Ｆ５６は、マイク６２で検出された操作者の音声をデジタルの音声データに変換してＣＰＵ５０に出力するとともに、ＣＰＵ５０によって出力される音声データをアナログの音声信号に変換してスピーカ６４から出力させる。ただし、第１実施例では、ＣＰＵ５０から出力される音声データは、利用者側端末１２から受信した音声データである。また、入出力Ｉ／Ｆ５６は、入力装置５８から入力された操作データ（または、操作情報）をＣＰＵ５０に出力するとともに、ＣＰＵ５０によって生成された画像データを表示装置６０に出力して、画像データに対応する画像を表示装置６０に表示させる。 The input/output I/F 56 converts the operator's voice detected by the microphone 62 into digital voice data and outputs it to the CPU 50, and converts the voice data output by the CPU 50 into an analog voice signal and outputs it from the speaker 64. Note that, in the first embodiment, the voice data output from the CPU 50 is the voice data received from the user-side terminal 12. The input/output I/F 56 also outputs operation data (or operation information) input from the input device 58 to the CPU 50, and outputs image data generated by the CPU 50 to the display device 60 so that the image corresponding to the image data is displayed on the display device 60.

なお、図３に示す操作者側端末１６の電気的な構成は一例であり、限定される必要はない。他の実施例では、操作者側端末１６はカメラを備えていてもよい。 Note that the electrical configuration of the operator-side terminal 16 shown in FIG. 3 is an example, and does not need to be limited. In another embodiment, operator-side terminal 16 may be equipped with a camera.

このような情報処理システム１０では、利用者が利用者側端末１２を使用して、サーバ１８が提供する所定のサービスのウェブ画面１００を見ている場合に、所定の条件を満たすと、操作者（オペレータ）とチャットまたはトークでコミュニケーションできる、アプリケーション（以下、単に「アプリ」という）が起動される。 In such an information processing system 10, when a user is using the user-side terminal 12 to view the web screen 100 of a predetermined service provided by the server 18 and a predetermined condition is satisfied, an application (hereinafter simply referred to as the "app") that allows the user to communicate with an operator by chat or talk is launched.

一例として、所定のサービスは、オンラインショッピングであるが、チャットまたはトークで、利用者の問い合わせに対して対応（応答）することができる、任意のオンラインサービスである。 As an example, the predetermined service is online shopping, but it may be any online service in which user inquiries can be answered by chat or talk.

図４は、ウェブ画面１００の前面に、アプリの選択画面１１０が表示された場合の一例を示す。ただし、ウェブ画面１００は、ウェブブラウザを起動し、所定のＵＲＬを入力することにより、表示装置３０に表示される。ウェブ画面１００は、所定のサービスのウェブサイト（または、ウェブページ）の画面である。図４では、或るオンラインショッピングのウェブ画面１００の例が示される。また、選択画面１１０は、アプリが起動されたときなどに表示される初期画面である。 FIG. 4 shows an example when an application selection screen 110 is displayed in front of the web screen 100 . However, the web screen 100 is displayed on the display device 30 by activating a web browser and entering a predetermined URL. The web screen 100 is a screen of a website (or web page) of a given service. In FIG. 4, an example of a web screen 100 of some online shopping is shown. Also, the selection screen 110 is an initial screen displayed when the application is activated.

上述したように、アプリは、所定の条件を満たす場合に、起動される。この第１実施例では、所定の条件は、所定のサービスのウェブ画面（第１実施例では、ウェブ画面１００）を表示した状態において、利用者がアプリの起動（または、実行）を指示したこと、利用者の操作が第１所定時間（この第１実施例では、３０秒）以上無いこと、当該ウェブ画面において同じ位置または似たような場所（近くの位置）を指示していること、所定のサービスにおいて複数回（たとえば、３回）同じウェブ画面に戻ってくることである。 As described above, the app is launched when a predetermined condition is satisfied. In this first embodiment, the predetermined conditions, evaluated while the web screen of the predetermined service (the web screen 100 in the first embodiment) is displayed, are: the user has instructed the app to be launched (or executed); the user has performed no operation for a first predetermined time (30 seconds in this first embodiment) or longer; the user is pointing at the same position or a similar (nearby) position on the web screen; and the user has returned to the same web screen multiple times (for example, three times) within the predetermined service.
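The launch conditions listed above can be sketched as a simple predicate. This is a minimal illustration and not the implementation described in this publication: it assumes that any one of the conditions is sufficient, and the names `BrowsingState`, `points_are_clustered`, `should_launch_app`, as well as the radius and repeat count used for the "nearby position" test, are hypothetical. Only the 30-second idle time and the three revisits come from the text.

```python
from dataclasses import dataclass

IDLE_THRESHOLD_S = 30.0    # first predetermined time given in the text
REVISIT_THRESHOLD = 3      # "for example, three times" in the text
NEARBY_RADIUS_PX = 40.0    # hypothetical: the text gives no distance
REPEAT_POINT_COUNT = 3     # hypothetical: the text gives no count


@dataclass
class BrowsingState:
    """Hypothetical snapshot of what the user is doing on the web screen."""
    launch_requested: bool = False
    idle_seconds: float = 0.0
    recent_points: tuple = ()          # most recent pointed-at (x, y) positions
    visits_to_current_screen: int = 0


def points_are_clustered(points, radius=NEARBY_RADIUS_PX, count=REPEAT_POINT_COUNT):
    """True if the last `count` points all lie within `radius` of the first of them."""
    if len(points) < count:
        return False
    recent = points[-count:]
    x0, y0 = recent[0]
    return all(((x - x0) ** 2 + (y - y0) ** 2) ** 0.5 <= radius for x, y in recent)


def should_launch_app(state):
    """Assumption: satisfying any single condition triggers the launch."""
    return (
        state.launch_requested
        or state.idle_seconds >= IDLE_THRESHOLD_S
        or points_are_clustered(state.recent_points)
        or state.visits_to_current_screen >= REVISIT_THRESHOLD
    )
```

For example, `should_launch_app(BrowsingState(idle_seconds=31.0))` evaluates to `True` because the idle condition alone is met.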

なお、図４では、ウェブ画面１００が、利用者側端末１２がスマートフォンである場合の表示装置３０に表示された例を示してある。また、ウェブ画面１００は一例であり、オンラインショッピング以外の他のサービスについての画面が表示される場合もある。 Note that FIG. 4 shows an example in which the web screen 100 is displayed on the display device 30 when the user-side terminal 12 is a smartphone. The web screen 100 is merely an example, and screens for services other than online shopping may be displayed.

選択画面１１０には、画面の上部に、表示枠１１２が設けられ、表示枠１１２の下方に、ボタン１１４、ボタン１１６およびボタン１１８が縦に並んで設けられる。 A selection screen 110 is provided with a display frame 112 in the upper part of the screen, and buttons 114 , 116 and 118 are vertically arranged below the display frame 112 .

表示枠１１２はアバターの画像１２０を表示するための枠画像である。この第１実施例では、アバターの画像１２０はチャットまたはトークの相手（対話の相手）である操作者の分身となるキャラクタであり、予め設定されている。したがって、アバターの画像１２０は、人間を模したキャラクタであり、この第１実施例では、頭部および首についての画像である。 A display frame 112 is a frame image for displaying the avatar image 120 . In the first embodiment, the avatar image 120 is a preset character that is the alter ego of the operator who is the partner of the chat or talk (dialogue partner). Thus, the avatar image 120 is a human-like character, and in this first embodiment is an image of the head and neck.

ただし、アバターの画像１２０は、動物またはロボットを模したキャラクタ、アニメキャラクタ、ゲームキャラクタなどの画像でもよい。また、アバターの画像１２０は、キャラクタの上半身または全身についての画像でもよい。 However, the avatar image 120 may be an image of a character imitating an animal or a robot, an animation character, a game character, or the like. Also, the avatar image 120 may be an image of the character's upper body or whole body.

また、この第１実施例では、アバターの画像１２０は、チャットまたはトークする場合に、操作者のチャットにおける応答内容の表示またはトークにおける応答内容（または、発話内容）の音声の出力に合せて発話するまたは発話動作を行う。この第１実施例では、アバターの画像１２０は頭部および首が表示されるため、発話動作では、アバターの画像１２０の口唇部がチャットにおける応答内容のテキストの表示またはトークにおける応答内容の音声の出力に合わせて動かされる。したがって、アバターが実際にしゃべっているように表現される。 In this first embodiment, when chatting or talking, the avatar image 120 speaks or performs a speaking motion in time with the display of the operator's response content in the chat or the output of the voice of the response content (or utterance content) in the talk. Since the head and neck of the avatar image 120 are displayed in this first embodiment, the speaking motion moves the lips of the avatar image 120 in time with the display of the response text in the chat or the output of the response voice in the talk. The avatar is therefore rendered as if it were actually speaking.

アバターの画像１２０は、応答内容の音声を出力していない状態、すなわち、アバターが発話していないまたは発話動作を行っていない状態（以下、「通常時」という）において、表示枠１１２に収まる大きさで表示（または、描画）される。 In a state in which the voice of the response content is not being output, that is, a state in which the avatar is neither speaking nor performing a speaking motion (hereinafter referred to as the "normal time"), the avatar image 120 is displayed (or drawn) at a size that fits within the display frame 112.

また、選択画面１１０においては、アバターは、自然の動作（以下、「無意識動作」という）を行う。無意識動作の代表的な例としては、瞬きや呼吸が該当する。また、このような生理的な動作のみならず、癖による動作も無意識動作に含まれる。たとえば、癖による動作としては、髪の毛を触る動作、顔を触る動作および爪を噛む動作などが該当する。ただし、選択画面１１０が表示されると、アバターの画像１２０は、最初に、静止した状態で表示され、続いて、利用者に対して挨拶する（たとえば、お辞儀する）ように表示される。 Also, on the selection screen 110, the avatar performs a natural action (hereinafter referred to as "unconscious action"). Typical examples of unconscious actions include blinking and breathing. In addition to such physiological actions, habitual actions are also included in unconscious actions. For example, habitual motions include motions of touching hair, motions of touching the face, and motions of biting nails. However, when the selection screen 110 is displayed, the avatar image 120 is first displayed stationary and then displayed greeting (eg, bowing) to the user.

したがって、選択画面１１０においては、静止した状態のアバターの画像１２０が表示された後に、無意識動作または挨拶の動作を行うアバターの画像１２０が表示される。本願発明の本質的な内容ではないため、詳細な説明は省略するが、一例として、静止した状態のアバターの画像１２０の表示は、予め記憶され静止した状態の画像データを出力（または、再生）することにより行われる。また、無意識動作および挨拶するときの動作を行うアバターの画像１２０の表示については、予め記憶された動画（アニメーション）データを再生することにより行われる。 Therefore, on the selection screen 110, the avatar image 120 is first displayed in a stationary state and is then displayed performing the unconscious action or the greeting action. A detailed description is omitted because it is not essential to the present invention, but as an example, the stationary avatar image 120 is displayed by outputting (or reproducing) pre-stored still image data. The avatar image 120 performing the unconscious action or the greeting action is displayed by reproducing pre-stored moving image (animation) data.

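ボタン１１４は、利用者が操作者とチャットするためのボタンである。チャットとは、テキストのやり取りによってリアルタイムに話をすることを意味し、この第１実施例では、利用者および操作者の操作によって、利用者側端末１２と操作者側端末１６の間で、テキストデータが送受信される。 The button 114 is a button for the user to chat with the operator. Chat means conversing in real time by exchanging text; in this first embodiment, text data is transmitted and received between the user-side terminal 12 and the operator-side terminal 16 in response to operations by the user and the operator.

ボタン１１６は、利用者が操作者とトークするためのボタンである。トークとは、音声のやり取りによってリアルタイムに話をすることを意味し、この第１実施例では、利用者および操作者の発話によって、利用者側端末１２と操作者側端末１６の間で、音声データが送受信される。 The button 116 is a button for the user to talk with the operator. Talk means conversing in real time by exchanging voice; in this first embodiment, voice data is transmitted and received between the user-side terminal 12 and the operator-side terminal 16 in response to utterances by the user and the operator.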

ボタン１１８は、アプリを終了するためのボタンである。ボタン１１８がオンされると、アプリが終了され、選択画面１１０が非表示される。 A button 118 is a button for ending the application. When the button 118 is turned on, the application is ended and the selection screen 110 is hidden.

図５はウェブ画面１００の前面にチャット画面１３０が表示された状態の一例を示す。選択画面１１０においてボタン１１４がオンされると、図５に示すようなチャット画面１３０が表示される。 FIG. 5 shows an example of a state in which a chat screen 130 is displayed in front of the web screen 100. When the button 114 is turned on in the selection screen 110, the chat screen 130 as shown in FIG. 5 is displayed.

チャット画面１３０には、画面の上部に、表示枠１３２が設けられ、表示枠１３２内にアバターの画像１２０が表示される。また、表示枠１３２の下方には、表示枠１３４および表示枠１３６が縦に並んで設けられ、表示枠１３６の下方に、ボタン１３８が設けられる。 A chat screen 130 is provided with a display frame 132 in the upper part of the screen, and an avatar image 120 is displayed within the display frame 132 . A display frame 134 and a display frame 136 are vertically arranged below the display frame 132 , and a button 138 is provided below the display frame 136 .

表示枠１３４は、操作者の応答内容を表示するための枠画像である。操作者の応答内容は、利用者の問い合わせ（または、質問）に対して応答（または、回答）する具体的な内容であるが、利用者に対する挨拶および利用者に対する問いかけなども含まれる。つまり、操作者側端末１６から送信されたテキストデータが表示枠１３４に表示される。詳細な説明は省略するが、チャットの場合には、操作者側端末１６から送信されたテキストデータは、操作者が入力装置５８を用いてキー入力した内容についてのデータである。 The display frame 134 is a frame image for displaying the content of the operator's response. The content of the operator's response is the specific content of the response (or answer) to the user's inquiry (or question), but includes greetings to the user and questions to the user. That is, the text data transmitted from the operator-side terminal 16 is displayed in the display frame 134 . Although detailed description is omitted, in the case of chat, the text data transmitted from the operator-side terminal 16 is data about the contents of key input by the operator using the input device 58 .

表示枠１３６は、利用者の問い合わせ（質問）内容を表示するための枠画像である。チャットの場合には、利用者の質問内容は、利用者側端末１２の入力装置２８を用いて入力される。図５では省略するが、スマートフォンやタブレットＰＣでは、チャット画面１３０とは別にソフトウェアキーボードが表示され、ソフトウェアキーボードを用いて文字（テキスト）をタッチ入力したり、文字の入力が完了したこと（質問内容の送信）の指示をタッチ入力したりすることができる。質問内容の送信が指示されると、質問内容のテキストデータが操作者側端末１６に送信される。したがって、操作者側端末１６の表示装置６０に質問内容のテキストが表示される。図示は省略するが、チャットにおいては、操作者側端末１６の表示装置６０に、応答内容と質問内容を表示可能なチャット画面が表示される。 The display frame 136 is a frame image for displaying the content of the user's inquiry (question). In the case of chat, the user's question is input using the input device 28 of the user-side terminal 12. Although omitted in FIG. 5, on a smartphone or tablet PC a software keyboard is displayed separately from the chat screen 130, and the user can use it to enter characters (text) by touch and to enter, also by touch, an instruction that character input is complete (that is, an instruction to send the question). When sending of the question is instructed, the text data of the question is transmitted to the operator-side terminal 16, and the text of the question is therefore displayed on the display device 60 of the operator-side terminal 16. Although not shown, during chat the display device 60 of the operator-side terminal 16 displays a chat screen capable of showing both the response content and the question content.

ボタン１３８は、チャットを終了するためのボタンである。ボタン１３８がオンされると、チャットを終了し、チャット画面１３０が非表示され、選択画面１１０がウェブ画面１００の前面に表示される。 A button 138 is a button for ending the chat. When the button 138 is turned on, the chat ends, the chat screen 130 is hidden, and the selection screen 110 is displayed in front of the web screen 100 .

図６はウェブ画面１００の前面にトーク画面１５０が表示された状態の一例を示す。選択画面１１０においてボタン１１６がオンされると、図６に示すようなトーク画面１５０が表示される。 FIG. 6 shows an example of a state in which a talk screen 150 is displayed in front of the web screen 100. When the button 116 is turned on in the selection screen 110, the talk screen 150 as shown in FIG. 6 is displayed.

トーク画面１５０には、画面の上部に、表示枠１５２が設けられ、表示枠１５２内にアバターの画像１２０が表示される。また、トーク画面１５０には、表示枠１５２の下方であり、画面の下部に、ボタン１５４が設けられる。 A display frame 152 is provided in the upper part of the screen of the talk screen 150 , and an avatar image 120 is displayed within the display frame 152 . Further, on the talk screen 150, a button 154 is provided below the display frame 152 and at the bottom of the screen.

詳細な説明は省略するが、トークの場合には、操作者側端末１６から送信される音声データは、操作者がマイク６２を通して入力した音声についてのデータである。操作者側端末１６から送信された音声データは、利用者側端末１２で受信され、スピーカ３４から出力される。 Although detailed description is omitted, in the case of talk, the voice data transmitted from the operator-side terminal 16 is data about voice input by the operator through the microphone 62 . The voice data transmitted from the operator-side terminal 16 is received by the user-side terminal 12 and output from the speaker 34 .

また、トークの場合には、利用者側端末１２から送信される音声データは、利用者がマイク３２を通して入力した音声についてのデータである。また、利用者側端末１２から送信された音声データは、操作者側端末１６で受信され、スピーカ６４から出力される。 In the case of talk, the voice data transmitted from the user-side terminal 12 is data about the voice input by the user through the microphone 32 . Also, voice data transmitted from the user-side terminal 12 is received by the operator-side terminal 16 and output from the speaker 64 .

ボタン１５４は、トークを終了するためのボタンである。ボタン１５４がオンされると、トークを終了し、トーク画面１５０が非表示され、選択画面１１０がウェブ画面１００の前面に表示される。 A button 154 is a button for ending the talk. When the button 154 is turned on, the talk ends, the talk screen 150 is hidden, and the selection screen 110 is displayed in front of the web screen 100 .

上記のように、操作者の音声に対応する音声データは、利用者側端末１２のスピーカ３４から出力されるが、このとき、アバターは発話動作を行う。この第１実施例では、アバターの画像１２０は、スピーカ３４から出力される音声にリップシンクされる。したがって、アバターの画像１２０が喋っているように表現される。 As described above, the voice data corresponding to the operator's voice is output from the speaker 34 of the user-side terminal 12, and at this time the avatar performs a speaking motion. In this first embodiment, the avatar image 120 is lip-synced to the voice output from the speaker 34. The avatar image 120 is therefore rendered as if it were speaking.

また、操作者側端末１６では、操作者がマイク６２を通して音声を入力したときに、その音声の音量を検出し、検出した音量についてのデータ（以下、「音量データ」）を、音声データに付加して、利用者側端末１２に送信する。 When the operator inputs voice through the microphone 62, the operator-side terminal 16 detects the volume of the voice, and adds data about the detected volume (hereinafter referred to as "volume data") to the voice data. and transmits it to the user-side terminal 12 .

ただし、音量データは、マイク６２で検出された音声の音量の第２所定時間（この第１実施例では、１／１０秒程度）分の平均値についてのデータであり、第２所定時間毎に算出される。ただし、平均値は一例であり、第２所定時間における音量の最大値でもよい。 Note that the volume data is data on the average value of the volume of the voice detected by the microphone 62 over a second predetermined time (about 1/10 second in this first embodiment), and is calculated every second predetermined time. The average value is only an example; the maximum value of the volume within the second predetermined time may be used instead.
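The windowed volume calculation described above can be sketched as follows. This is a hedged illustration only: the text does not define how "volume" is measured, so the sketch assumes it is the absolute amplitude of each audio sample, and the function name `window_volumes` and its parameters are hypothetical. The 0.1-second window and the choice between the average and the maximum come from the text.

```python
def window_volumes(samples, sample_rate, window_s=0.1, mode="mean"):
    """Return one volume value per window of `window_s` seconds.

    samples     -- sequence of audio samples (assumed amplitudes, e.g. floats in [-1, 1])
    sample_rate -- samples per second
    mode        -- "mean" for the average volume, "max" for the maximum volume
    """
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")
    size = max(1, round(sample_rate * window_s))  # samples per window
    volumes = []
    for start in range(0, len(samples), size):
        # Assumption: the volume of a sample is its absolute amplitude.
        window = [abs(s) for s in samples[start:start + size]]
        if mode == "max":
            volumes.append(max(window))
        else:
            volumes.append(sum(window) / len(window))
    return volumes
```

Each returned value corresponds to one piece of volume data that the operator-side terminal would attach to the voice data for the same window.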

利用者側端末１２は、操作者側端末１６から受信した音声データに付加された音量データに基づいてアバターの画像１２０の大きさを決定するための比率ｐを算出する。ただし、比率ｐは、通常時におけるアバターの画像１２０の大きさを１（１００％）とした場合の変化後の大きさの割合である。この第１実施例では、比率ｐは操作者の音声の音量が所定値よりも大きい場合に数１に従って算出される。また、第１実施例では、操作者の音声の音量が所定値よりも大きい場合において、音量が大きくなるにつれてアバターの画像１２０の大きさが大きくされる。ただし、音量が小さく、比率ｐが１よりも小さい場合には、アバターの画像１２０の大きさが通常時よりも小さくされる。 The user-side terminal 12 calculates a ratio p for determining the size of the avatar image 120 based on the volume data added to the voice data received from the operator-side terminal 16 . However, the ratio p is the ratio of the size after the change when the size of the avatar image 120 in the normal state is set to 1 (100%). In this first embodiment, the ratio p is calculated according to Equation 1 when the volume of the operator's voice is greater than a predetermined value. Further, in the first embodiment, when the volume of the voice of the operator is higher than the predetermined value, the size of the avatar image 120 is increased as the volume increases. However, when the volume is low and the ratio p is less than 1, the size of the avatar image 120 is made smaller than normal.

ただし、ｍは操作者の音声の音量であり、Ｍは予め設定した音量の最大値であり、Ｐは音量が最大値である場合の比率（たとえば、１．４）である。また、比率ｐの最小値は０．８に設定され、この最小値よりも小さい値になる場合の音量が所定値以下である。なお、比率ｐの初期値は１であり、比率ｐが算出されない場合には、初期値のままである。 Here, m is the volume of the operator's voice, M is the preset maximum volume, and P is the ratio (for example, 1.4) when the volume is at the maximum value. Also, the minimum value of the ratio p is set to 0.8, and the volume when the value is smaller than this minimum value is equal to or less than the predetermined value. Note that the initial value of the ratio p is 1, and remains the initial value when the ratio p is not calculated.

［数１］ [Equation 1]

$$p = P \cdot \frac{m}{M}$$

図７（Ａ）は通常時におけるアバターの画像１２０を表示したトーク画面１５０の一例を示し、図７（Ｂ）は比率ｐ＝１．４で通常時から拡大したアバターの画像１２０を表示したトーク画面１５０の一例を示し、図７（Ｃ）は比率ｐ＝０．８で通常時から縮小したアバターの画像１２０を表示したトーク画面１５０の一例を示す。 FIG. 7(A) shows an example of the talk screen 150 displaying the avatar image 120 at the normal time, FIG. 7(B) shows an example of the talk screen 150 displaying the avatar image 120 enlarged from the normal time at a ratio p=1.4, and FIG. 7(C) shows an example of the talk screen 150 displaying the avatar image 120 reduced from the normal time at a ratio p=0.8.
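The ratio calculation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of this publication: the function name `compute_ratio` is hypothetical, the predetermined volume is taken to be the volume at which Equation 1 yields exactly the minimum ratio 0.8, and capping p at P for volumes above M is an added assumption that the text does not state.

```python
def compute_ratio(m, M, P=1.4, p_min=0.8, p_initial=1.0):
    """Return the display ratio p for operator voice volume m.

    m         -- detected volume of the operator's voice
    M         -- preset maximum volume
    P         -- ratio used when the volume is at its maximum (1.4 in the text)
    p_min     -- minimum ratio (0.8 in the text)
    p_initial -- ratio kept when p is not calculated (1 in the text)
    """
    if M <= 0:
        raise ValueError("M must be positive")
    # Assumption: the predetermined volume is where Equation 1 gives p_min.
    threshold = p_min * M / P
    if m <= threshold:
        return p_initial          # p is not calculated; the initial value is kept
    # Equation 1: p = P * (m / M). Capping at P is an assumption for m > M.
    return min(P * m / M, P)
```

With M = 100, a volume of 100 gives p = 1.4 (the enlarged avatar of FIG. 7(B)), a volume of 60 gives p of about 0.84 (a slightly reduced avatar), and a volume of 50 is at or below the threshold of about 57.1, so p stays at 1.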

詳細な説明は省略するが、トーク画面１５０（選択画面１１０およびチャット画面１３０も同様）では、アバターの画像１２０とアバターの画像１２０以外の画像（画面の表示枠、画面の背景、画面内の表示枠およびボタンの画像）は別のレイヤーで描画され、アバターの画像１２０が描画されたレイヤーが、アバターの画像１２０以外の画像が描画されたレイヤーの前面に配置される。 Although a detailed description is omitted, on the talk screen 150 (and likewise on the selection screen 110 and the chat screen 130), the avatar image 120 and the images other than the avatar image 120 (the frame of the screen, the background of the screen, and the images of the display frames and buttons within the screen) are drawn on separate layers, and the layer on which the avatar image 120 is drawn is placed in front of the layer on which the other images are drawn.

また、仮想空間において、通常時における、仮想カメラ（視点）の位置およびアバターの位置は予め決定されており、アバターの画像１２０を拡大または縮小する場合には、比率ｐに応じて、仮想カメラの位置または／およびアバターの位置が移動され、仮想カメラとアバターの距離が変更される。 In the virtual space, the position of the virtual camera (viewpoint) and the position of the avatar at the normal time are determined in advance. When the avatar image 120 is enlarged or reduced, the position of the virtual camera and/or the position of the avatar is moved according to the ratio p, so that the distance between the virtual camera and the avatar changes.

ただし、他の実施例では、アバターの画像１２０を拡大または縮小する場合には、描画するアバターの画像１２０の大きさを拡大または縮小してもよいし、仮想カメラの画角を拡大または縮小してもよい。 However, in other embodiments, the avatar image 120 may be enlarged or reduced by enlarging or reducing the size of the drawn avatar image 120 itself, or by widening or narrowing the angle of view of the virtual camera.
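The two camera-based ways of realizing the ratio p can be related to p with standard pinhole-camera geometry. The sketch below is an illustration under that assumption (an ideal perspective projection in which on-screen size is proportional to 1 / (distance × tan(angle of view / 2))); the publication does not specify the projection model, and the function names are hypothetical.

```python
import math


def camera_distance_for_ratio(base_distance, p):
    """Camera-to-avatar distance that scales the on-screen avatar by p.

    Under a pinhole projection, on-screen size is inversely proportional
    to distance, so the normal-time distance is divided by p.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return base_distance / p


def fov_for_ratio(base_fov_deg, p):
    """Angle of view (degrees) that scales the on-screen avatar by p.

    On-screen size is inversely proportional to tan(fov / 2), so
    tan(new_fov / 2) = tan(base_fov / 2) / p.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    half = math.radians(base_fov_deg) / 2.0
    return math.degrees(2.0 * math.atan(math.tan(half) / p))
```

Under this model, enlarging the avatar (p greater than 1) corresponds to moving the camera closer or narrowing the angle of view, and reducing it (p less than 1) corresponds to the opposite.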

図７（Ａ）に示すように、通常時では、アバターの画像１２０は、上述したように、表示枠１５２に収まる所定の大きさで表示される。通常時では、アバターの頭部と首が表示される。 As shown in FIG. 7A, normally, the avatar image 120 is displayed in a predetermined size that fits within the display frame 152, as described above. Normally, the avatar's head and neck are displayed.

図７（Ｂ）に示すように、アバターの画像１２０が拡大して表示されると、アバターの画像１２０は表示枠１５２からはみ出すことがある。したがって、２次元の画面に表示されたアバターの画像１２０が３次元の現実空間に飛び出そうとしているように見える。 As shown in FIG. 7B , when the avatar image 120 is enlarged and displayed, the avatar image 120 may protrude from the display frame 152 . Therefore, the avatar image 120 displayed on the two-dimensional screen looks like it is about to jump out into the three-dimensional real space.

図７（Ｃ）に示すように、アバターの画像１２０が縮小して表示されると、アバターの画像１２０は利用者から離れる（または、遠ざかる）ように見える。 As shown in FIG. 7(C), when the avatar image 120 is displayed in a reduced size, the avatar image 120 appears to move away (or go away) from the user.

図示は省略するが、比率ｐは０．８以上１．４以下の間で算出されるため、アバターの画像１２０は、表示枠１５２からはみ出さないで、表示枠１５２内で拡大される場合もある。 Although illustration is omitted, since the ratio p is calculated between 0.8 and 1.4, the avatar image 120 may be enlarged within the display frame 152 without protruding from the display frame 152. be.
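Whether an enlarged avatar stays inside the display frame or protrudes from it depends on the ratio and on the sizes involved. The following sketch illustrates this with hypothetical pixel sizes; it assumes the avatar image is centered in the frame and scaled about its center, which the publication does not state, and the function names are also hypothetical.

```python
def scaled_size(base_w, base_h, p):
    """Size of the avatar image after applying the ratio p."""
    return base_w * p, base_h * p


def protrudes_from_frame(base_w, base_h, frame_w, frame_h, p):
    """True if the scaled avatar is larger than the frame in either dimension.

    Assumption: the avatar is centered in the frame and scaled about its
    center, so it protrudes exactly when it exceeds the frame size.
    """
    w, h = scaled_size(base_w, base_h, p)
    return w > frame_w or h > frame_h
```

For a hypothetical 200 x 240 avatar in a 260 x 300 frame, p = 1.2 keeps the avatar inside the frame, while p = 1.4 makes it wider than the frame so that it protrudes.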

このように、アバターの画像１２０が発話動作を行う場合には、音量に応じて算出した比率ｐに応じてアバターの画像１２０を拡大または縮小するので、奥行き感を表現することができ、２次元の画面に表示されているにも関わらず、立体感が得られる。つまり、存在感を増したアバターの画像１２０を表示することができる。 In this way, when the avatar image 120 performs a speaking motion, the avatar image 120 is enlarged or reduced according to the ratio p calculated from the volume, so a sense of depth can be expressed and a three-dimensional effect is obtained even though the image is displayed on a two-dimensional screen. In other words, the avatar image 120 can be displayed with an increased sense of presence.

また、拡大したアバターの画像１２０が表示枠１５２（枠画像）からはみ出すように表示される場合には、３次元の現実空間に飛び出そうとしているように見える。この場合にも、存在感を増したアバターの画像１２０を表示することができる。 Also, when the enlarged avatar image 120 is displayed so as to protrude from the display frame 152 (frame image), it looks like it is about to jump out into a three-dimensional real space. In this case also, the avatar image 120 with increased presence can be displayed.

図８は利用者側端末１２に内蔵される記憶部（ここでは、ＲＡＭ）２２のメモリマップ３００の一例を示す。ＲＡＭは、ＣＰＵ２０のワーク領域およびバッファ領域として使用される。図８に示すように、記憶部２２は、プログラム記憶領域３０２およびデータ記憶領域３０４を含む。プログラム記憶領域３０２には、この実施例の制御プログラムが記憶されている。 FIG. 8 shows an example of a memory map 300 of the storage unit (here, the RAM) 22 built into the user-side terminal 12. The RAM is used as a work area and a buffer area for the CPU 20. As shown in FIG. 8, the storage unit 22 includes a program storage area 302 and a data storage area 304. The program storage area 302 stores the control program of this embodiment.

制御プログラムは、起動判断プログラム３０２ａ、メイン処理プログラム３０２ｂ、操作検出プログラム３０２ｃ、通信プログラム３０２ｄ、画像生成プログラム３０２ｅ、画像出力プログラム３０２ｆ、アバター制御プログラム３０２ｇ、比率算出プログラム３０２ｈ、音検出プログラム３０２ｉおよび音出力プログラム３０２ｊなどを含む。上述した第１実施例のアプリは、メイン処理プログラム３０２ｂ、操作検出プログラム３０２ｃ、通信プログラム３０２ｄ、画像生成プログラム３０２ｅ、画像出力プログラム３０２ｆ、アバター制御プログラム３０２ｇ、比率算出プログラム３０２ｈ、音検出プログラム３０２ｉおよび音出力プログラム３０２ｊを含む。 The control program includes an activation determination program 302a, a main processing program 302b, an operation detection program 302c, a communication program 302d, an image generation program 302e, an image output program 302f, an avatar control program 302g, a ratio calculation program 302h, a sound detection program 302i, a sound output program 302j, and the like. The app of the first embodiment described above includes the main processing program 302b, the operation detection program 302c, the communication program 302d, the image generation program 302e, the image output program 302f, the avatar control program 302g, the ratio calculation program 302h, the sound detection program 302i, and the sound output program 302j.

ただし、アプリは、利用者側端末１２が端末本体の機能として備える、操作検出プログラム、通信プログラム、画像生成プログラム、画像出力プログラム、音検出プログラムおよび音出力プログラムを利用することもできる。 However, the application can also use an operation detection program, a communication program, an image generation program, an image output program, a sound detection program, and a sound output program that the user-side terminal 12 has as functions of the terminal body.

起動判断プログラム３０２ａは、この第１実施例のアプリを起動するかどうかを判断するためのプログラムである。メイン処理プログラム３０２ｂは、この第１実施例のアプリのメインルーチンの処理（全体的な処理）を実行するためのプログラムである。 The activation determination program 302a is a program for determining whether to activate the application of the first embodiment. The main processing program 302b is a program for executing the main routine processing (overall processing) of the application of the first embodiment.

操作検出プログラム３０２ｃは、利用者の操作に従って入力装置２８から入力される操作データ３０４ａを検出し、データ記憶領域３０４に記憶するためのプログラムである。 The operation detection program 302c is a program for detecting operation data 304a input from the input device 28 in accordance with the user's operation and storing it in the data storage area 304.

通信プログラム３０２ｄは、外部の機器、この第１実施例では、所定のサービスを提供するサイトを運営するためのサーバおよび操作者側端末１６と有線または無線で通信（データの送信および受信）するためのプログラムである。 The communication program 302d is a program for wired or wireless communication (transmission and reception of data) with external devices, which in this first embodiment are the server that operates the site providing the predetermined service and the operator-side terminal 16.

画像生成プログラム３０２ｅは、表示装置３０に表示するための各種の画面の全部または一部に対応する画像データを、画像生成データ３０４ｄを用いて生成するためのプログラムである。 The image generation program 302e is a program for generating image data corresponding to all or part of various screens to be displayed on the display device 30 using the image generation data 304d.

画像出力プログラム３０２ｆは、画像生成プログラム３０２ｅに従って生成した画像データを表示装置３０に出力するためのプログラムである。 The image output program 302f is a program for outputting to the display device 30 image data generated according to the image generation program 302e.

アバター制御プログラム３０２ｇは、アバターを動作させるためのプログラムである。この第１実施例では、ＣＰＵ２０は、アバター制御プログラム３０２ｇに従って、アバターに発話動作をさせたり、アバターに無意識動作をさせたり、アバターに挨拶の動作（挨拶の音声出力を含む）をさせたりする。 The avatar control program 302g is a program for making the avatar move. In this first embodiment, the CPU 20, in accordance with the avatar control program 302g, causes the avatar to perform a speaking motion, an unconscious action, or a greeting action (including voice output of the greeting).

比率算出プログラム３０２ｈは、操作者の音声の音量に基づいて比率ｐを算出するためのプログラムである。また、第１実施例では、比率算出プログラム３０２ｈは、操作者の音声の音量が所定値よりも大きいかどうかを判断し、音量が所定値よりも大きい場合に、比率ｐを算出することを決定するためのプログラムでもある。 The ratio calculation program 302h is a program for calculating the ratio p based on the volume of the operator's voice. In the first embodiment, the ratio calculation program 302h is also a program for determining whether the volume of the operator's voice is greater than the predetermined value and, when it is, deciding that the ratio p is to be calculated.

音検出プログラム３０２ｉは、マイク３２から入力される利用者の音声を検出するためのプログラムである。 The sound detection program 302i is a program for detecting the user's voice input from the microphone 32.

音出力プログラム３０２ｊは、受信した操作者の音声データを出力するためのプログラムである。 The sound output program 302j is a program for outputting the received voice data of the operator.

図示は省略するが、プログラム記憶領域３０２には、利用者側端末１２のオペレーティングシステムなどのミドルウェア、ブラウザ機能を実行するためのプログラム、本願のアプリ以外の他のアプリケーション・プログラムも記憶される。 Although not shown, the program storage area 302 also stores middleware such as the operating system of the user terminal 12, programs for executing browser functions, and application programs other than the application of the present application.

また、データ記憶領域３０４には、操作データ３０４ａ、送信データ３０４ｂ、受信データ３０４ｃ、画像生成データ３０４ｄおよび比率データ３０４ｅなどが記憶される。 The data storage area 304 also stores operation data 304a, transmission data 304b, reception data 304c, image generation data 304d, ratio data 304e, and the like.

操作データ３０４ａは、操作検出プログラム３０２ｃに従って検出された操作データである。送信データ３０４ｂは、操作者側端末１６に送信するデータであり、チャットにおける利用者の質問内容についてのテキストデータおよびトークにおける利用者の質問内容についての音声データである。受信データ３０４ｃは、操作者側端末１６から送信され、受信したデータであり、チャットにおける操作者の応答内容についてのテキストデータおよびトークにおける操作者の応答内容についての音声データである。 The operation data 304a is operation data detected according to the operation detection program 302c. The transmission data 304b is data to be transmitted to the operator-side terminal 16, and is text data about the content of the user's question in chat and voice data about the content of the user's question in talk. The received data 304c is data transmitted and received from the operator-side terminal 16, and is text data about the content of the operator's response to the chat and voice data about the content of the operator's response to the talk.

画像生成データ３０４ｄは、利用者側端末１２の表示装置３０に表示される各種の画面を生成するためのデータであり、アバターの画像１２０を生成するためのデータを含む。また、アバターの画像１２０を生成するためのデータは、アバターの画像１２０の静止した状態の画像データ、無意識動作および挨拶の動作についてのアニメーションデータを含む。比率データ３０４ｅは、比率ｐについてのデータである。比率ｐの初期値は１であり、比率算出プログラム３０２ｈに従って算出された比率ｐで更新される。また、操作者の音声を出力していないとき、すなわち、アバターが発話または発話動作を行っていないとき、比率ｐはリセットされ、初期値に戻される。 The image generation data 304d is data for generating various screens displayed on the display device 30 of the user-side terminal 12, and includes data for generating the image 120 of the avatar. In addition, the data for generating the avatar image 120 includes still image data of the avatar image 120 and animation data about unconscious actions and greeting actions. The ratio data 304e is data about the ratio p. The initial value of the ratio p is 1, and is updated with the ratio p calculated according to the ratio calculation program 302h. Also, when the operator's voice is not being output, that is, when the avatar is not speaking or speaking, the ratio p is reset to its initial value.

図示は省略するが、データ記憶領域３０４には、制御処理を実行するために必要な他のデータが記憶されたり、タイマ（カウンタ）およびフラグが設けられたりする。 Although not shown, the data storage area 304 stores other data necessary for executing control processing, and is provided with timers (counters) and flags.

また、図示は省略するが、操作者側端末１６は利用者側端末１２との間でチャットまたはトークを行うため、操作者側端末１６の記憶部（ここでは、ＲＡＭ）５２には、利用者側端末１２の記憶部２２に記憶されるプログラムおよびデータと同様のプログラムおよびデータが記憶される。 Although not shown, the operator-side terminal 16 chats or talks with the user-side terminal 12, so the storage unit (here, the RAM) 52 of the operator-side terminal 16 stores programs and data similar to those stored in the storage unit 22 of the user-side terminal 12.

具体的には、操作者側端末１６の記憶部５２のプログラム記憶領域には、メイン処理プログラム、操作検出プログラム、通信プログラム、画像生成プログラム、画像出力プログラム、音検出プログラム、音量検出プログラムおよび音出力プログラムなどが記憶される。 Specifically, the program storage area of the storage unit 52 of the operator-side terminal 16 stores a main processing program, an operation detection program, a communication program, an image generation program, an image output program, a sound detection program, a volume detection program, a sound output program, and the like.

メイン処理プログラムは、チャットまたはトークでコミュニケーションを行う操作者側端末１６のアプリケーションのメインルーチンの処理（全体的な処理）を実行するためのプログラムである。 The main processing program is a program for executing the main routine processing (overall processing) of the application of the operator side terminal 16 that communicates by chat or talk.

操作検出プログラムは、操作者の操作に従って入力装置５８から入力される操作データを検出し、記憶部５２のデータ記憶領域に記憶するためのプログラムである。 The operation detection program is a program for detecting operation data input from the input device 58 according to the operation of the operator and storing it in the data storage area of the storage unit 52 .

通信プログラムは、外部の機器、この第１実施例では、利用者側端末１２およびサーバ１８と有線または無線で通信するためのプログラムである。 The communication program is a program for wired or wireless communication with external devices, in this first embodiment, the user terminal 12 and the server 18 .

画像生成プログラムは、表示装置６０に表示するための各種の画面に対応する画像データを、画像生成データを用いて生成するためのプログラムである。 The image generation program is a program for generating image data corresponding to various screens to be displayed on the display device 60 using image generation data.

画像出力プログラムは、画像生成プログラムに従って生成した画像データを表示装置６０に出力するためのプログラムである。 The image output program is a program for outputting image data generated according to the image generation program to the display device 60 .

ただし、操作者側端末１６では、チャットまたはトークを選択したり、アバターの画像を表示したりする必要はない。このため、選択画面１１０のような画面は表示されず、チャット画面１３０およびトーク画面１５０のような画面では、利用者のアバターの画像は表示されない。ただし、利用者のアバターの画像が表示されるようにしてもよい。 However, the operator-side terminal 16 does not need to select chat or talk, or display an avatar image. Therefore, a screen such as selection screen 110 is not displayed, and an image of the user's avatar is not displayed on screens such as chat screen 130 and talk screen 150 . However, an image of the user's avatar may be displayed.

音検出プログラムは、マイク６２から入力される操作者の音声を検出するためのプログラムである。 The sound detection program is a program for detecting the operator's voice input from the microphone 62 .

音量検出プログラムは、音検出プログラムに従って検出された音声の音量を検出するためのプログラムである。上述したように、音量は、マイク６２で検出された音声の音量の第２所定時間（この第１実施例では、１／１０秒程度）分の平均値であり、第２所定時間毎に算出される。 The volume detection program is a program for detecting the volume of the voice detected according to the sound detection program. As described above, the volume is the average value of the volume of the voice detected by the microphone 62 over a second predetermined time (about 1/10 second in this first embodiment), and is calculated once every second predetermined time.
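
The averaging described above can be sketched as follows. This is a minimal illustration rather than the implementation in the specification: the function name `windowed_volume`, the use of mean absolute amplitude as the "volume", and the sample-rate parameter are all assumptions introduced for the example.

```python
from typing import List, Sequence


def windowed_volume(samples: Sequence[float], sample_rate: int,
                    window_sec: float = 0.1) -> List[float]:
    """Return one average volume value per window of window_sec seconds."""
    # Number of samples in one window (the "second predetermined time").
    n = max(1, round(sample_rate * window_sec))
    volumes = []
    for start in range(0, len(samples), n):
        window = samples[start:start + n]
        # Assumption: volume is the mean absolute amplitude of the window.
        volumes.append(sum(abs(s) for s in window) / len(window))
    return volumes
```

Each returned value corresponds to one piece of volume data that would be attached to the voice data for the same window.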

音出力プログラムは、受信した利用者の音声データを出力するためのプログラムである。 The sound output program is a program for outputting received voice data of the user.

また、記憶部５２のデータ記憶領域には、送信データ、受信データ、画像生成データ、音声データおよび音量データなどが記憶される。 The data storage area of the storage unit 52 stores transmission data, reception data, image generation data, audio data, volume data, and the like.

操作データは、操作検出プログラムに従って検出された操作データである。送信データは、利用者側端末１２に送信するデータであり、チャットにおける操作者の応答についてのテキストデータおよびトークにおける操作者の応答についての音声データである。第１実施例では、音声データに音量データが付加される。受信データは、利用者側端末１２から送信され、受信したデータであり、チャットにおける利用者の質問についてのテキストデータおよびトークにおける利用者の質問についての音声データである。 The operation data is operation data detected according to the operation detection program. The transmission data is data to be transmitted to the user-side terminal 12, and is text data about the operator's response to the chat and voice data about the operator's response to the talk. In the first embodiment, volume data is added to audio data. The received data is data transmitted and received from the user-side terminal 12, and is text data about the user's question in the chat and voice data about the user's question in the talk.

画像生成データは、操作者側端末１６の表示装置６０に表示される各種の画面を生成するためのデータである。音声データは、音検出プログラムに従って検出された操作者の音声に対応するデータである。音量データは、音量検出プログラムに従って検出された音量に対応するデータである。 The image generation data is data for generating various screens displayed on the display device 60 of the operator-side terminal 16 . The voice data is data corresponding to the operator's voice detected according to the sound detection program. Volume data is data corresponding to the volume detected according to the volume detection program.

なお、記憶部５２には、利用者とチャットまたはトークを実行するために必要な他のプログラムおよびデータも記憶される。 Note that the storage unit 52 also stores other programs and data necessary for chatting or talking with the user.

図９－図１１は、利用者側端末１２のＣＰＵ２０の制御処理を示すフロー図である。図示は省略するが、ＣＰＵ２０は、制御処理と並行して、操作データの検出処理を実行するとともに、操作者側端末１６からのデータを受信する処理を実行する。 FIGS. 9 to 11 are flowcharts showing the control processing of the CPU 20 of the user-side terminal 12. Although not shown, in parallel with the control processing, the CPU 20 executes processing for detecting operation data and processing for receiving data from the operator-side terminal 16.

図９に示すように、利用者側端末１２のＣＰＵ２０は、制御処理を開始すると、ステップＳ１で、アプリの起動条件を満たすかどうかを判断する。上述したように、ＣＰＵ２０は、所定のサービスのウェブ画面（第１実施例では、ウェブ画面１００）を表示した状態において、利用者がアプリの起動（または、実行）を指示した場合、利用者の操作が第１所定時間（たとえば、３０秒）以上無い場合、当該ウェブ画面において同じ位置または似たような場所（近くの位置）を指示している場合、所定のサービスにおいて複数回（たとえば、３回）同じウェブ画面に戻ってくる場合に、アプリの起動条件を満たすと判断する。 As shown in FIG. 9, when the control processing is started, the CPU 20 of the user-side terminal 12 determines in step S1 whether the application activation condition is satisfied. As described above, with the web screen of a predetermined service (the web screen 100 in the first embodiment) displayed, the CPU 20 determines that the application activation condition is satisfied in any of the following cases: the user instructs activation (or execution) of the application; there is no user operation for a first predetermined time (for example, 30 seconds) or longer; the user points to the same position or a similar (nearby) position on the web screen; or the user returns to the same web screen multiple times (for example, three times) in the predetermined service.
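
The step S1 decision can be sketched as a single predicate over the four conditions listed above. The class `WebScreenState`, its field names, and the way each condition is recorded are hypothetical. Only the example thresholds (30 seconds, three returns) come from the text.

```python
from dataclasses import dataclass

FIRST_PREDETERMINED_TIME = 30.0  # seconds, example value from the text
REVISIT_THRESHOLD = 3            # example value from the text


@dataclass
class WebScreenState:
    """Hypothetical snapshot of user activity on the web screen."""
    launch_requested: bool = False   # user instructed activation of the app
    idle_seconds: float = 0.0        # time since the last user operation
    same_spot_pointed: bool = False  # same or nearby position pointed to
    revisit_count: int = 0           # times the user returned to this screen


def activation_condition_met(state: WebScreenState) -> bool:
    """Return True when any one of the step S1 conditions holds."""
    return (state.launch_requested
            or state.idle_seconds >= FIRST_PREDETERMINED_TIME
            or state.same_spot_pointed
            or state.revisit_count >= REVISIT_THRESHOLD)
```

A caller would evaluate this repeatedly, mirroring the loop back to step S1 while the result is "NO".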

ステップＳ１で“ＮＯ”であれば、つまり、アプリの起動条件を満たしていない場合には、ステップＳ１に戻る。一方、ステップＳ１で“ＹＥＳ”であれば、つまり、アプリの起動条件を満たしていれば、ステップＳ３で、アプリを起動する。なお、制御処理において、ステップＳ３以降がアプリのメインルーチンの処理である。 If "NO" in step S1, that is, if the application activation condition is not satisfied, the process returns to step S1. On the other hand, if "YES" in step S1, that is, if the application activation condition is satisfied, the application is activated in step S3. In the control process, steps after step S3 are the process of the main routine of the application.

続くステップＳ５では、図４に示したような選択画面１１０をウェブ画面１００の前面に表示する。ステップＳ５では、ＣＰＵ２０は、選択画面１１０についての画像データを生成し、生成した画像データを表示装置３０に出力する。以下、画面を表示する場合について同様である。 In subsequent step S5, the selection screen 110 as shown in FIG. 4 is displayed in front of the web screen 100. In step S5, the CPU 20 generates image data for the selection screen 110 and outputs the generated image data to the display device 30. The same applies hereinafter whenever a screen is displayed.

上述したように、アプリを起動した当初では、すなわち、アプリを起動して最初に選択画面１１０を表示するときに、ＣＰＵ２０は、静止した状態のアバターの画像１２０を表示した後に、アニメーションデータを再生し、アバターに挨拶の動作を行わせる。 As described above, when the application is initially started, that is, when the selection screen 110 is displayed for the first time after the application is started, the CPU 20 displays the static avatar image 120 and then reproduces the animation data. and let the avatar perform a greeting action.

次のステップＳ７では、アバターの画像１２０に待機動作を実行させる。上述したように、ＣＰＵ２０は、無意識動作を行わせる。ただし、ＣＰＵ２０は、選択画面１１０において利用者の操作が無い場合において、数秒または数十秒毎にアバターに無意識動作を実行させる。 In the next step S7, the avatar image 120 is caused to perform a standby action. As described above, the CPU 20 causes the unconscious action to be performed. However, when there is no user's operation on the selection screen 110, the CPU 20 causes the avatar to perform an unconscious action every several seconds or several tens of seconds.

続いて、ステップＳ９で、ボタン操作が有るかどうかを判断する。ここでは、ＣＰＵ２０は、操作データ３０４ａを参照して、選択画面１１０のボタン１１４またはボタン１１６がオンされたかどうかを判断する。なお、図示は省略するが、ボタン１１８がオンされた場合には、選択画面１１０を閉じて（非表示して）、アプリを終了する。 Subsequently, in step S9, it is determined whether or not there is a button operation. Here, CPU 20 refers to operation data 304a to determine whether button 114 or button 116 on selection screen 110 has been turned on. Although illustration is omitted, when the button 118 is turned on, the selection screen 110 is closed (hidden) and the application is terminated.

ステップＳ９で“ＮＯ”であれば、つまり、ボタン操作が無ければ、ステップＳ７に戻る。一方、ステップＳ９で“ＹＥＳ”であれば、つまり、ボタン操作が有れば、ステップＳ１１で、チャットかどうかを判断する。ここでは、ＣＰＵ２０は、ボタン１１４のオンであるかを判断する。 If "NO" in step S9, that is, if there is no button operation, the process returns to step S7. On the other hand, if "YES" in step S9, that is, if there is a button operation, it is determined in step S11 whether chat has been selected. Here, the CPU 20 determines whether the button 114 has been turned on.

ステップＳ１１で“ＮＯ”であれば、つまり、ボタン１１６のオンであれば、図１１に示すステップＳ３１に進む。一方、ステップＳ１１で“ＹＥＳ”であれば、つまり、ボタン１１４のオンであれば、図１０に示すステップＳ１３で、図５に示したようなチャット画面１３０をウェブ画面１００の前面に表示する。 If "NO" in step S11, that is, if the button 116 has been turned on, the process proceeds to step S31 shown in FIG. 11. On the other hand, if "YES" in step S11, that is, if the button 114 has been turned on, the chat screen 130 as shown in FIG. 5 is displayed in front of the web screen 100 in step S13 shown in FIG. 10.

なお、利用者側端末１２にハードウェアのキーボードが接続されていない場合には、ソフトウェアキーボードも表示される。また、チャット画面１３０が表示されるときに、選択画面１１０が非表示される。 If a hardware keyboard is not connected to the user terminal 12, a software keyboard is also displayed. Also, when the chat screen 130 is displayed, the selection screen 110 is hidden.

次のステップＳ１５では、質問内容の入力かどうかを判断する。ここでは、ＣＰＵ２０は、質問内容（テキスト）のキー入力があるかどうかを判断する。ステップＳ１５で“ＹＥＳ”であれば、つまり、質問内容の入力であれば、ステップＳ１７で、入力した質問内容を表示枠１３６に表示して、ステップＳ１５に戻る。一方、ステップＳ１５で“ＮＯ”であれば、つまり、質問内容の入力でなければ、ステップＳ１９で、質問内容の送信かどうかを判断する。ここでは、ＣＰＵ２０は、質問内容が確定されたかどうかを判断する。 In the next step S15, it is determined whether or not the question content is input. Here, the CPU 20 determines whether or not there is a key input of question content (text). If "YES" in step S15, that is, if the content of the question is input, the content of the input question is displayed in the display frame 136 in step S17, and the process returns to step S15. On the other hand, if "NO" in step S15, that is, if question content is not input, it is determined in step S19 whether or not question content is to be sent. Here, the CPU 20 determines whether or not the content of the question has been finalized.

ステップＳ１９で“ＹＥＳ”であれば、つまり、質問内容の送信であれば、ステップＳ２１で、質問内容のテキストデータを操作者側端末１６に送信して、ステップＳ２３に進む。一方、ステップＳ１９で“ＮＯ”であれば、つまり、質問内容の送信でなければ、ステップＳ２３に進む。 If "YES" in the step S19, that is, if the content of the question is to be transmitted, the text data of the content of the question is transmitted to the operator-side terminal 16 in a step S21, and the process proceeds to a step S23. On the other hand, if "NO" in step S19, that is, if the content of the question is not transmitted, the process proceeds to step S23.

ステップＳ２３では、応答内容のテキストデータを受信したかどうかを判断する。ステップＳ２３で“ＮＯ”であれば、つまり、応答内容のテキストデータを受信していなければ、ステップＳ２７に進む。一方、ステップＳ２３で“ＹＥＳ”であれば、つまり、応答内容のテキストデータを受信すれば、ステップＳ２５で、応答内容の表示に合せてアバターを発話させて、ステップＳ２７に進む。ステップＳ２５では、ＣＰＵ２０は、応答内容を表示枠１３４に一文字ずつテキストで表示し、その表示に合せてアバターが喋るように口唇部を変化させて発話動作を行うアバターの画像１２０を表示枠１３２に表示する。なお、応答内容をすべて表示枠１３４に表示すると、質問内容を入力可能とするために、表示枠１３６の文字列がすべて消去（つまり、非表示）される。 In step S23, it is determined whether text data of the response content has been received. If "NO" in step S23, that is, if text data of the response content has not been received, the process proceeds to step S27. On the other hand, if "YES" in step S23, that is, if text data of the response content has been received, the avatar is made to speak in step with the display of the response content in step S25, and the process proceeds to step S27. In step S25, the CPU 20 displays the response content as text in the display frame 134 one character at a time and, in step with that display, displays in the display frame 132 the avatar image 120 performing a speech action in which the lips change as if the avatar were speaking. When the entire response content has been displayed in the display frame 134, all the character strings in the display frame 136 are erased (that is, hidden) so that question content can be entered.

ステップＳ２７では、チャットの終了かどうかを判断する。ここでは、ＣＰＵ２０は、ボタン１３８がオンされたり、操作者側端末１６からチャットの終了が指示されたりしたかどうかを判断する。 In step S27, it is determined whether or not the chat has ended. Here, the CPU 20 determines whether the button 138 has been turned on or whether the operator terminal 16 has instructed to end the chat.

ステップＳ２７で“ＮＯ”であれば、つまり、チャットの終了でなければ、ステップＳ１５に戻る。一方、ステップＳ２７で“ＹＥＳ”であれば、つまり、チャットの終了であれば、ステップＳ２９で、チャット画面１３０を閉じて、図９に示したステップＳ５に戻る。 If "NO" in step S27, that is, if the chat has not ended, the process returns to step S15. On the other hand, if "YES" in step S27, that is, if the chat has ended, the chat screen 130 is closed in step S29, and the process returns to step S5 shown in FIG. 9.

また、上述したように、ステップＳ１１で“ＮＯ”であれば、図１１に示すステップＳ３１で、図６に示したようなトーク画面１５０をウェブ画面１００の前面に表示する。なお、トーク画面１５０が表示されるときに、選択画面１１０が非表示される。また、トーク画面１５０が表示されるときに、すなわち、トークが開始されるときに、比率ｐが初期値（ｐ＝１）に設定される。 As described above, if "NO" in step S11, the talk screen 150 as shown in FIG. 6 is displayed in front of the web screen 100 in step S31 shown in FIG. 11. Note that the selection screen 110 is hidden when the talk screen 150 is displayed. Also, when the talk screen 150 is displayed, that is, when the talk is started, the ratio p is set to the initial value (p=1).

続くステップＳ３３では、音声の入力かどうかを判断する。ここでは、ＣＰＵ２０は、マイク３２で音声を検出したかどうかを判断する。ステップＳ３３で“ＮＯ”であれば、つまり、音声の入力でなければ、ステップＳ３７に進む。一方、ステップＳ３３で“ＹＥＳ”であれば、つまり、音声の入力であれば、ステップＳ３５で、入力された音声に対応する音声データ（すなわち、質問内容の音声データ）を操作者側端末１６に送信して、ステップＳ３７に進む。 In the subsequent step S33, it is determined whether or not there is voice input. Here, the CPU 20 determines whether or not the microphone 32 has detected sound. If "NO" in step S33, that is, if there is no voice input, the process proceeds to step S37. On the other hand, if "YES" in step S33, that is, if it is a voice input, voice data corresponding to the input voice (that is, voice data of question content) is sent to the operator side terminal 16 in step S35. After transmitting, the process proceeds to step S37.

ステップＳ３７では、応答内容の音声データを受信したかどうかを判断する。ステップＳ３７で“ＹＥＳ”であれば、つまり、応答内容の音声データを受信すれば、ステップＳ３９で、後述する比率算出処理(図１２参照)を実行して、ステップＳ４１で、応答内容の音声データを出力し、ステップＳ４３で、比率ｐに応じた大きさで、応答内容の音声データに合せて発話動作を行うアバターの画像１２０の画像データを生成し、出力して、ステップＳ３３に戻る。したがって、スピーカ３４から操作者の音声が出力されるとともに、トーク画面１５０において、比率ｐに応じた大きさで、喋っているように表現されるアバターの画像１２０が表示される。 In step S37, it is determined whether voice data of the response content has been received. If "YES" in step S37, that is, if voice data of the response content has been received, the ratio calculation process described later (see FIG. 12) is executed in step S39, the voice data of the response content is output in step S41, image data of the avatar image 120 performing a speech action in accordance with the voice data of the response content is generated and output at a size corresponding to the ratio p in step S43, and the process returns to step S33. Therefore, the operator's voice is output from the speaker 34, and the avatar image 120, expressed as if speaking, is displayed on the talk screen 150 at a size corresponding to the ratio p.

また、ステップＳ３７で“ＮＯ”であれば、つまり、応答内容の音声データを受信していなければ、ステップＳ４５で、比率ｐをリセットし、つまり、比率データ３０４ｅが示す比率ｐを初期値（１）に設定し、ステップＳ４７で、通常の大きさでアバターを表示して、ステップＳ４９に進む。つまり、操作者の音声を出力しない場合には、アバターの画像１２０の大きさが通常時の大きさに戻される。 If "NO" in step S37, that is, if voice data of the response content has not been received, the ratio p is reset in step S45, that is, the ratio p indicated by the ratio data 304e is set to the initial value (1), the avatar is displayed at its normal size in step S47, and the process proceeds to step S49. In other words, when the operator's voice is not output, the avatar image 120 is returned to its normal size.

ステップＳ４９では、トークの終了かどうかを判断する。ここでは、ＣＰＵ２０は、ボタン１５４がオンされたり、操作者側端末１６からトークの終了が指示されたりしたかどうかを判断する。 In step S49, it is determined whether or not the talk has ended. Here, the CPU 20 determines whether the button 154 has been turned on or whether the operator terminal 16 has instructed to end the talk.

ステップＳ４９で“ＮＯ”であれば、つまり、トーク終了でなければ、ステップＳ３３に戻る。一方、ステップＳ４９で“ＹＥＳ”であれば、つまり、トーク終了であれば、ステップＳ５１で、トーク画面１５０を閉じて、ステップＳ５に戻る。 If "NO" in step S49, that is, if the talk has not ended, the process returns to step S33. On the other hand, if "YES" in step S49, that is, if the talk is finished, the talk screen 150 is closed in step S51, and the process returns to step S5.
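
One pass through the receive branch of the talk loop (steps S37 to S47) can be sketched as below. The names `RatioData`, `talk_step`, and the `calc_ratio` callback are assumptions made for illustration. Actual voice output and drawing are left as comments because the text does not specify the underlying APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RatioData:
    """Stands in for the ratio data 304e; the initial value of p is 1."""
    p: float = 1.0


def talk_step(ratio: RatioData,
              received: Optional[Tuple[bytes, float]],
              calc_ratio: Callable[[float, float], float]) -> Tuple[float, bool]:
    """Run one pass of S37-S47; return (display scale, avatar is speaking)."""
    if received is not None:                   # S37: voice data received
        _voice, volume = received
        ratio.p = calc_ratio(volume, ratio.p)  # S39: ratio calculation
        # S41 (voice output) and S43 (drawing at scale p) would happen here.
        return ratio.p, True
    ratio.p = 1.0                              # S45: reset to initial value
    return ratio.p, False                      # S47: draw at normal size
```

Keeping the reset in the "no voice data" branch is what returns the avatar to its normal size whenever the operator is silent.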

図１２は、図１１に示したステップＳ３９の比率算出処理を示すフロー図である。図１２に示すように、ＣＰＵ２０は、比率算出処理を開始すると、ステップＳ７１で、音量が所定値よりも大きいかどうかを判断する。ここでは、ＣＰＵ２０は、受信した音声データに付加された音量データが示す音量が所定値を超えているかどうかを判断する。 FIG. 12 is a flowchart showing the ratio calculation process of step S39 shown in FIG. 11. As shown in FIG. 12, when starting the ratio calculation process, the CPU 20 determines in step S71 whether the volume is greater than a predetermined value. Here, the CPU 20 determines whether the volume indicated by the volume data added to the received voice data exceeds the predetermined value.

ステップＳ７１で“ＮＯ”であれば、つまり、音量が所定値以下であれば、比率算出処理を終了して、図９－図１１に示した制御処理にリターンする。一方、ステップＳ７１で“ＹＥＳ”であれば、つまり、音量が所定値よりも大きければ、ステップＳ７３で、数１に従って比率ｐを算出する。 If "NO" in step S71, that is, if the sound volume is equal to or less than the predetermined value, the ratio calculation process is ended and the process returns to the control process shown in FIGS. 9-11. On the other hand, if "YES" in step S71, that is, if the volume is greater than the predetermined value, in step S73, the ratio p is calculated according to Equation 1.

続いて、ステップＳ７５で、算出した比率ｐを記憶し、つまり、算出した比率ｐで比率データ３０４ｅを更新し、比率算出処理を終了して、制御処理にリターンする。 Subsequently, in step S75, the calculated ratio p is stored, that is, the ratio data 304e is updated with the calculated ratio p, the ratio calculation process is terminated, and the process returns to the control process.
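
The flow of steps S71 to S75 can be sketched as follows. Equation 1 is defined elsewhere in the specification and is not reproduced here, so it is passed in as a function. The name `ratio_calculation` and its parameter names are assumptions made for illustration.

```python
from typing import Callable


def ratio_calculation(volume: float, threshold: float, current_p: float,
                      equation_1: Callable[[float], float]) -> float:
    """Return the ratio p to store after one run of the FIG. 12 process."""
    if volume <= threshold:
        # S71 "NO": the stored ratio is left unchanged.
        return current_p
    # S73: compute p from the volume; S75: the caller stores the result.
    return equation_1(volume)
```

For example, with a purely hypothetical stand-in `lambda v: 1.0 + 0.01 * (v - 50)` for Equation 1 and a threshold of 50, a volume of 40 leaves p unchanged while a volume of 70 yields a ratio of about 1.2.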

第１実施例によれば、アバターの画像を通常時よりも拡大または縮小して表示することで、奥行き感を表現することができ、２次元の画面に表示されているにも関わらず、立体感が得られる。つまり、存在感を増したアバターを表示することができる。 According to the first embodiment, by displaying the avatar image enlarged or reduced more than usual, it is possible to express a sense of depth. you get a feeling. That is, it is possible to display an avatar with increased presence.

また、第１実施例によれば、拡大したアバターの画像が枠画像からはみ出すように表示される場合には、３次元の現実空間に飛び出そうとしているように見える。つまり、存在感を増したアバターを表示することができる。 Further, according to the first embodiment, when the enlarged avatar image is displayed so as to protrude from the frame image, it looks like it is about to jump out into the three-dimensional real space. That is, it is possible to display an avatar with increased presence.

上記の第１実施例では、利用者側端末１２で比率ｐを算出するようにしたが、これに限定される必要はない。操作者側端末１６で比率ｐを算出し、比率ｐのデータを音声データに付加して利用者側端末１２に送信するようにしてもよい。この場合、操作者側端末１６では、ＣＰＵ５０は、操作者の音声を検出したときに、図１２に示した比率算出処理を実行し、音声データに比率ｐのデータを付加して利用者側端末１２に送信する。一方、利用者側端末１２では、比率ｐを算出する必要がないため、ステップＳ３９の処理が省略され、ステップＳ４３では、受信した音声データに付加された比率ｐのデータが示す比率ｐに応じた大きさで、応答内容の音声データに合せて発話動作を行うアバターの画像１２０の画像データを生成し、出力する。 In the first embodiment described above, the ratio p is calculated by the user-side terminal 12, but the configuration need not be limited to this. The operator-side terminal 16 may calculate the ratio p, add the data of the ratio p to the voice data, and transmit it to the user-side terminal 12. In this case, in the operator-side terminal 16, when the operator's voice is detected, the CPU 50 executes the ratio calculation process shown in FIG. 12, adds the data of the ratio p to the voice data, and transmits it to the user-side terminal 12. On the other hand, since the user-side terminal 12 does not need to calculate the ratio p, the process of step S39 is omitted, and in step S43, image data of the avatar image 120 performing a speech action in accordance with the voice data of the response content is generated and output at a size corresponding to the ratio p indicated by the data of the ratio p added to the received voice data.
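
The two variants (ratio computed on the user side from attached volume data, or computed on the operator side and attached directly) can be sketched with one message type. The `VoiceMessage` class, its field names, and `display_ratio` are hypothetical. The text does not define a wire format.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VoiceMessage:
    """Hypothetical payload sent from the operator side to the user side."""
    voice: bytes
    volume: Optional[float] = None  # attached in the first embodiment
    ratio: Optional[float] = None   # attached when the operator side computes p


def display_ratio(msg: VoiceMessage, current_p: float,
                  calc_from_volume: Callable[[float, float], float]) -> float:
    """Pick the ratio p used to draw the avatar for this message."""
    if msg.ratio is not None:
        # Operator side already computed p, so step S39 is skipped.
        return msg.ratio
    if msg.volume is not None:
        # User side computes p from the attached volume (step S39).
        return calc_from_volume(msg.volume, current_p)
    return current_p
```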

なお、第１実施例では、操作者の音声データの音量に基づいてアバターを拡大または縮小するようにしたが、これに限定される必要はない。他の実施例では、操作者が発話するときの目の開き具合に基づいてアバターを拡大または縮小するようにしてもよい。ただし、操作者の目の開き具合は、操作者の顔画像を撮影し、撮影した顔画像から抽出した複数の特徴点のうち、操作者の目の上瞼と下瞼についての特徴点の距離を算出することにより、検出することができる。たとえば、操作者が発話していないときの目の開き具合と、操作者が発話しているときの目の開き具合との差に基づいて比率ｐが算出される。 In addition, in the first embodiment, the avatar is enlarged or reduced based on the volume of the voice data of the operator, but it is not necessary to be limited to this. In another embodiment, the avatar may be scaled up or down based on how the operator's eyes open when speaking. However, the degree of opening of the operator's eyes is determined by photographing the operator's face image, and out of a plurality of feature points extracted from the photographed face image, the distance between the feature points for the upper eyelid and lower eyelid of the operator's eye. can be detected by calculating For example, the ratio p is calculated based on the difference between the eye openness when the operator is not speaking and the eye openness when the operator is speaking.
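
The eye-opening measurement can be sketched as the distance between an upper-eyelid and a lower-eyelid feature point. The text states only that the ratio p is based on the difference between the silent and speaking openness. The linear mapping and the `gain` parameter below are therefore assumptions, as are all function names.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def eye_openness(upper_lid: Point, lower_lid: Point) -> float:
    """Distance between the upper-eyelid and lower-eyelid feature points."""
    return math.dist(upper_lid, lower_lid)


def ratio_from_openness(silent: float, speaking: float,
                        gain: float = 0.5) -> float:
    """Hypothetical mapping: scale p by the relative change in openness."""
    return 1.0 + gain * (speaking - silent) / silent
```

With this stand-in mapping, eyes that open 25% wider while speaking give p = 1.125 at the default gain.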

また、第１実施例では、操作者の音声の音量が所定値よりも大きい場合において、音量が大きくなるにつれてアバターの画像１２０の大きさが大きくされるようにしたが、音量が大きくなるにつれてアバターの画像１２０の大きさが小さくされるようにしてもよい。この場合、音量が小さく、比率ｐが１よりも小さい場合に、アバターの画像１２０の大きさが通常時よりも大きくされる。 In addition, in the first embodiment, when the volume of the voice of the operator is higher than a predetermined value, the size of the avatar image 120 increases as the volume increases. may be reduced in size. In this case, when the volume is low and the ratio p is less than 1, the size of the avatar image 120 is made larger than normal.

さらに、第１実施例では、チャットおよびトークにおいては、利用者側端末１２と操作者側端末１６がネットワーク１４を介して通信するようにしたが、サーバ１８を介して通信するようにしてもよい。かかる場合には、サーバ１８が操作者の音声データに付加された音量データが示す音量に基づいて比率ｐを算出し、サーバ１８は、音声データに算出した比率ｐのデータを付加して、利用者側端末１２に送信するようにしてもよい。 Furthermore, in the first embodiment, the user-side terminal 12 and the operator-side terminal 16 communicate via the network 14 in chat and talk, but they may instead communicate via the server 18. In that case, the server 18 may calculate the ratio p based on the volume indicated by the volume data added to the operator's voice data, add the data of the calculated ratio p to the voice data, and transmit it to the user-side terminal 12.

＜第２実施例＞ <Second embodiment>
第２実施例は、トークにおいて、操作者の音声の音量に基づいて比率ｐを算出することに変えて、操作者が発話するときの操作者の首の動きに基づいて比率ｐを算出するようにした以外は、第１実施例と同じであるため、重複した説明は省略する。 The second embodiment is the same as the first embodiment except that, in the talk, the ratio p is calculated based on the movement of the operator's neck when the operator speaks instead of on the volume of the operator's voice. Redundant description is therefore omitted.

図１３は第２実施例の操作者側端末１６の電気的な構成を示すブロック図である。図１３に示すように、第２実施例の操作者側端末１６は、センサインタフェース（センサＩ／Ｆ）６６および慣性センサ６８をさらに備えている。 FIG. 13 is a block diagram showing the electrical configuration of the operator-side terminal 16 of the second embodiment. As shown in FIG. 13 , the operator-side terminal 16 of the second embodiment further includes a sensor interface (sensor I/F) 66 and an inertial sensor 68 .

センサＩ／Ｆ６６には、慣性センサ６８が接続されている。この第２実施例では、慣性センサ６８として、角速度センサが用いられる。慣性センサ６８は、マイク６２およびスピーカ６４で構成するヘッドセットに設けられ、操作者の首の縦方向および横方向の動き（この第２実施例では、頷き動作および首振り動作）を検出する。したがって、操作者の首の縦方向の動きを検出するための軸周りと、操作者の首の横方向の動きを検出するための軸周りの角速度を検出可能な角速度センサが用いられる。一例として、操作者の首の縦方向の動きを検出するための軸は、操作者の両耳を通る直線に平行な軸である。また、一例として、操作者の首の横方向の動きを検出するための軸は、操作者の頭頂部を通り延長方向に延びる軸である。 An inertial sensor 68 is connected to the sensor I/F 66 . In this second embodiment, an angular velocity sensor is used as the inertial sensor 68 . An inertial sensor 68 is provided in the headset consisting of the microphone 62 and the speaker 64, and detects vertical and horizontal movements of the operator's neck (in this second embodiment, nodding and shaking movements). Therefore, an angular velocity sensor capable of detecting angular velocity around an axis for detecting vertical movement of the operator's neck and around an axis for detecting horizontal movement of the operator's neck is used. As an example, the axis for detecting vertical movement of the operator's neck is an axis parallel to a straight line through the operator's ears. Also, as an example, the axis for detecting the lateral movement of the operator's neck is the axis extending in the extension direction through the top of the operator's head.

ただし、慣性センサ６８としては、３軸の加速度センサを用いるようにしてもよい。この場合、操作者の顔の正面方向、頭部の横方向および頭部の縦方向のそれぞれに延びる軸の加速度が検出される。 However, as the inertial sensor 68, a triaxial acceleration sensor may be used. In this case, the acceleration of the axes extending in the front direction of the operator's face, the horizontal direction of the head, and the vertical direction of the head is detected.

第２実施例では、トークにおいては、操作者の音声に対応する音声データに、操作者が発話するときに、慣性センサ６８で検出された角速度のデータ（後述する「首の動きデータ」）が付加され、利用者側端末１２に送信される。 In the second embodiment, in the talk, data on the angular velocity detected by the inertial sensor 68 while the operator speaks ("neck movement data", described later) is added to the voice data corresponding to the operator's voice and transmitted to the user-side terminal 12.

ただし、首の動きデータは、慣性センサ６８で検出された第３所定時間（この第２実施例では、１／１０秒程度）分の複数の角速度の平均値についてのデータであり、第３所定時間毎に算出される。ただし、平均値は一例であり、第３所定時間における角速度の最大値でもよい。また、第３所定時間は第２所定時間と同じでなくてもよい。 Note that the neck movement data is data on the average value of the plurality of angular velocities detected by the inertial sensor 68 over a third predetermined time (about 1/10 second in this second embodiment), and is calculated once every third predetermined time. The average value is only an example, and the maximum value of the angular velocity in the third predetermined time may be used instead. The third predetermined time need not be the same as the second predetermined time.

利用者側端末１２は操作者側端末１６から音声データを受信すると、受信した音声データに付加された首の動きデータに応じてアバターの画像１２０を拡大または縮小する。 When the user-side terminal 12 receives the voice data from the operator-side terminal 16, it enlarges or reduces the avatar image 120 according to the neck movement data added to the received voice data.

この第２実施例では、操作者の首の縦方向の動きに基づいてアバターの画像１２０が拡大され、操作者の首の横方向の動きに基づいてアバターの画像１２０の大きさが縮小される。ただし、これは一例であり、操作者の首の縦方向の動きに基づいてアバターの画像１２０の大きさが縮小され、操作者の首の横方向の動きに基づいてアバターの画像１２０の大きさが拡大されてもよい。 In this second embodiment, the avatar image 120 is enlarged based on the vertical movement of the operator's neck, and the size of the avatar image 120 is reduced based on the horizontal movement of the operator's neck. . However, this is only an example, and the size of the avatar image 120 is reduced based on the vertical movement of the operator's neck, and the size of the avatar image 120 is reduced based on the horizontal movement of the operator's neck. may be expanded.

図１４（Ａ）は操作者が頷く場合（つまり、顔を下に向けるように操作者の首が動いた場合）の比率ｐの算出方法を説明するための図であり、図１４（Ｂ）は操作者が首を振る場合（つまり、顔を右に向けるように操作者の首が動いた場合）の比率ｐの算出方法を説明するための図である。 FIG. 14(A) is a diagram for explaining the method of calculating the ratio p when the operator nods (that is, when the operator's neck moves so that the face turns downward), and FIG. 14(B) is a diagram for explaining the method of calculating the ratio p when the operator shakes his or her head (that is, when the operator's neck moves so that the face turns to the right).

この第２実施例では、操作者の首の動きについてのデータ（以下、「首の動きデータ」という）を用いて、仮想空間においてアバターの首を動かし、それによって得られる数値（パラメータ）に基づいて比率ｐが算出される。図１４（Ａ）および図１４（Ｂ）では、アバターの頭部および首の画像を示してあるが、実際には、計算のみが実行され、比率ｐを算出するためにアバターの画像１２０が描画される必要はない。 In this second embodiment, data about the movement of the operator's neck (hereinafter referred to as "neck movement data") is used to move the avatar's neck in the virtual space, and the ratio p is calculated based on the resulting numerical values (parameters). Although FIGS. 14(A) and 14(B) show images of the avatar's head and neck, in practice only the calculation is performed, and the avatar image 120 need not be drawn in order to calculate the ratio p.

したがって、計算においては、アバターの頭部のモデルは、球または楕円球で設定され、球または楕円球において、アバターの目の位置に相当する位置に、アバターの眼球に相当する大きさの球体が設定される。首については、頷く場合の回転軸Ｘと首を振る場合の回転軸Ｙのみが設定される。 Therefore, in the calculation, the model of the avatar's head is set as a sphere or an ellipsoid, and spheres of a size corresponding to the avatar's eyeballs are set on it at the positions corresponding to the avatar's eyes. For the neck, only the rotation axis X for nodding and the rotation axis Y for shaking the head are set.

図１４（Ａ）および図１４（Ｂ）では、左側に記載したアバターの頭部および首の画像は、首を動かしていない状態、すなわち、アバターが仮想カメラに対して正対した状態を示す。ただし、図１４（Ａ）では、アバターを横から見た図であり、図１４（Ｂ）では、アバターを上から見た図である。図示は省略するが、図１４（Ａ）および図１４（Ｂ）では、仮想カメラは、アバターの正面方向であり、所定距離だけ隔てた位置に配置される。また、仮想カメラおよびアバターの上下方向の位置は、仮想カメラの視線がアバターの頭部の中心を通るように設定される。 In FIGS. 14A and 14B, the images of the avatar's head and neck shown on the left side show a state in which the neck is not moved, that is, the avatar faces the virtual camera. However, FIG. 14A is a side view of the avatar, and FIG. 14B is a top view of the avatar. Although illustration is omitted, in FIGS. 14(A) and 14(B), the virtual camera is in the front direction of the avatar and is arranged at a position separated by a predetermined distance. Also, the vertical positions of the virtual camera and the avatar are set so that the line of sight of the virtual camera passes through the center of the avatar's head.

図１４（Ａ）に示すように、操作者が頷く場合には、回転軸Ｘを中心に、アバターの頭部および眼球が前方に（仮想カメラ側に）回転される。アバターの眼球のうち、仮想カメラ側に最も突出した部分（点）を含み、仮想カメラの視線と直交する面を基準面とし、首の動きの前後における基準面の移動距離ｄを用いて比率ｐを算出する。図１４（Ａ）に示すように、操作者が頷く場合には、基準面は移動距離ｄだけ仮想カメラ側に近づく。 As shown in FIG. 14(A), when the operator nods, the head and eyeballs of the avatar are rotated forward (toward the virtual camera) about the rotation axis X. The plane that contains the part (point) of the avatar's eyeball protruding most toward the virtual camera and that is orthogonal to the line of sight of the virtual camera is taken as the reference plane, and the ratio p is calculated using the moving distance d of the reference plane before and after the neck movement. As shown in FIG. 14(A), when the operator nods, the reference plane approaches the virtual camera by the moving distance d.

また、図１４（Ｂ）に示すように、操作者が首を振る場合には、回転軸Ｙを中心に、アバターの頭部および眼球が右向き（図示しないが、左向きでもよい）に回転される。アバターの両目の眼球のうち、仮想カメラ側に最も突出した部分（点）を結ぶ直線の中点を含み、仮想カメラの視線と直交する面を基準面とし、首の動きの前後における移動距離ｄを用いて比率ｐを算出する。図１４（Ｂ）に示すように、操作者が首を振る場合には、基準面は移動距離ｄだけ仮想カメラから遠ざかる。 As shown in FIG. 14(B), when the operator shakes his or her head, the head and eyeballs of the avatar are rotated to the right (or to the left, although not shown) about the rotation axis Y. The plane that contains the midpoint of the straight line connecting the parts (points) of the avatar's two eyeballs protruding most toward the virtual camera and that is orthogonal to the line of sight of the virtual camera is taken as the reference plane, and the ratio p is calculated using the moving distance d of the reference plane before and after the neck movement. As shown in FIG. 14(B), when the operator shakes his or her head, the reference plane moves away from the virtual camera by the moving distance d.

ただし、移動距離ｄは、仮想カメラに対する基準面の移動量の絶対値である。 However, the movement distance d is the absolute value of the movement amount of the reference plane with respect to the virtual camera.
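
The moving distance d of the reference plane can be sketched with simple rotation geometry. Everything here is an illustrative assumption: the coordinate convention (z toward the virtual camera, y upward), the eyeball-center offsets measured from the rotation axis, and the rotation angle `theta` as input. The text does not state how the angle is obtained from the neck movement data. Because the eyeball is modeled as a sphere, its most protruding point toward the camera is always the center plus the radius along z, so the plane moves by the change in the center's z coordinate.

```python
import math


def nod_distance(theta: float, eye_y: float, eye_z: float) -> float:
    """Reference-plane shift when the head rotates forward by theta about axis X.

    eye_y, eye_z: eyeball-center height above and depth in front of axis X.
    """
    new_z = eye_z * math.cos(theta) + eye_y * math.sin(theta)
    return abs(new_z - eye_z)


def shake_distance(theta: float, eye_z: float) -> float:
    """Reference-plane shift when the head rotates by theta about axis Y.

    The midpoint between the two eyeballs lies on the z axis, so only its
    depth eye_z in front of axis Y matters.
    """
    return abs(eye_z * math.cos(theta) - eye_z)
```

Under these assumptions, a quarter-turn shake gives d equal to the full depth `eye_z`, which would correspond to the maximum moving distance for the head-shake case.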

第２実施例では、操作者が頷く場合には、数２に従って比率ｐが算出され、操作者が首を振る場合には、数３に従って比率ｐが算出される。ただし、数２および数３において、Ｄは基準面の最大移動距離であり、Ｐは最大移動距離の場合の比率（拡大率：１．４）であり、Ｑは最大距離の場合の比率（縮小率：０．８）である。ただし、最大移動距離Ｄは、頷く場合には、操作者の顔が水平になるまで頷いたときの移動距離ｄであり、首を振る場合には、操作者の顔が真横になるまで首を振ったときの移動距離ｄである。ただし、操作者が頷く場合には、アバターの画像１２０を拡大する（すなわち、ｐ＞１である）ため、数２では、移動距離ｄはＤ／Ｐよりも大きい。また、操作者が首を振る場合には、アバターの画像１２０を縮小する（すなわち、ｐ＜１である）ため、数３では、移動距離ｄはＤＱよりも大きい。 In the second embodiment, the ratio p is calculated according to Equation 2 when the operator nods, and according to Equation 3 when the operator shakes his or her head. In Equations 2 and 3, D is the maximum moving distance of the reference plane, P is the ratio at the maximum moving distance (enlargement ratio: 1.4), and Q is the ratio at the maximum moving distance (reduction ratio: 0.8). The maximum moving distance D is, for nodding, the moving distance d when the operator nods until the face is horizontal and, for head shaking, the moving distance d when the operator turns the head until the face is fully sideways. When the operator nods, the avatar image 120 is enlarged (that is, p>1), so in Equation 2 the moving distance d is greater than D/P. When the operator shakes his or her head, the avatar image 120 is reduced (that is, p<1), so in Equation 3 the moving distance d is greater than DQ.

［数２］ [Equation 2]
ｐ＝Ｐ（ｄ／Ｄ） p = P(d/D)
ただし、Ｄ／Ｐ＜ｄ≦Ｄである。 where D/P < d ≦ D.

［数３］ [Equation 3]
ｐ＝Ｑ（Ｄ／ｄ） p = Q(D/d)
ただし、ＤＱ＜ｄ≦Ｄである。 where DQ < d ≦ D.

このように、第２実施例では、音量データに代えて、操作者の音声データに操作者の首の動きデータが付加される点と、第１実施例で示した比率算出処理の一部が第１実施例とは異なる。ただし、首の動きデータには、頷きか首振りかを識別する情報も含まれている。 Thus, the second embodiment differs from the first embodiment in that the operator's neck movement data, instead of the volume data, is added to the operator's voice data, and in part of the ratio calculation process shown in the first embodiment. Note that the neck movement data also includes information identifying whether the movement is a nod or a head shake.
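
The neck movement data (a windowed average of angular velocities plus a nod or shake identifier) can be sketched as below. The `NeckMovementData` class and the rule that the axis with the larger average magnitude decides the identifier are assumptions. The text says only that identifying information is included, not how it is derived.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence, Tuple


@dataclass(frozen=True)
class NeckMovementData:
    kind: str                # "nod" or "shake"
    angular_velocity: float  # average over the third predetermined time


def summarize_window(samples: Sequence[Tuple[float, float]]) -> NeckMovementData:
    """Average (pitch_rate, yaw_rate) samples from one window and label them."""
    pitch = fmean(s[0] for s in samples)  # rotation about the ear-to-ear axis
    yaw = fmean(s[1] for s in samples)    # rotation about the vertical axis
    # Assumption: the dominant axis decides whether this is a nod or a shake.
    if abs(pitch) >= abs(yaw):
        return NeckMovementData("nod", pitch)
    return NeckMovementData("shake", yaw)
```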

したがって、第２実施例では、操作者側端末１６の記憶部５２のプログラム記憶領域では、音量検出プログラムに代えて首の動きを検出するための動き検出プログラムが記憶される。また、第２実施例では、操作者側端末１６の記憶部５２のデータ記憶領域では、音量データに代えて首の動きデータが記憶される。 Therefore, in the second embodiment, a motion detection program for detecting neck motion is stored in the program storage area of the storage unit 52 of the operator-side terminal 16 instead of the volume detection program. Further, in the second embodiment, in the data storage area of the storage unit 52 of the operator-side terminal 16, neck movement data is stored in place of volume data.

The motion detection program is a program for storing, in the data storage section of the storage unit 52, angular velocity data corresponding to the angular velocity detected by the inertial sensor 68 while the user's voice is being detected according to the sound detection program. The neck movement data is data about the movement of the operator's neck detected according to the motion detection program.

図１５に示すように、第２実施例の比率算出処理では、ステップＳ７１の処理に代えて、ステップＳ７１ａの処理が実行される。ＣＰＵ２０は、ステップＳ７１ａで、操作者の首の動きを示すパラメータ（この第２実施例では、移動距離ｄ）が所定の範囲内であるかどうかを判断する。つまり、操作者が頷く場合には、移動距離ｄが数２に記載した範囲内であるかどうかを判断する。また、操作者が首を振る場合には、移動距離ｄが数３に記載した範囲内であるかどうかを判断する。 As shown in FIG. 15, in the ratio calculation process of the second embodiment, the process of step S71a is executed instead of the process of step S71. In step S71a, the CPU 20 determines whether or not the parameter indicating the movement of the operator's neck (movement distance d in the second embodiment) is within a predetermined range. That is, when the operator nods, it is determined whether or not the moving distance d is within the range described in Equation (2). Also, when the operator shakes his/her head, it is determined whether or not the moving distance d is within the range described in Equation (3).

ステップＳ７１ａで“ＮＯ”であれば、つまり、操作者の首の動きを示すパラメータが所定の範囲内でなければ、比率算出処理を終了して、制御処理にリターンする。一方、ステップＳ７１ａで“ＹＥＳ”であれば、つまり、操作者の首の動きを示すパラメータが所定の範囲内であれば、ステップＳ７３で、比率ｐを算出する。ただし、操作者が頷く場合には、ＣＰＵ２０は、数２に従って比率ｐ（第２実施例では、拡大率）を算出する。また、操作者が首を振る場合には、ＣＰＵ２０は、数３に従って比率ｐ（第２実施例では、縮小率）を算出する。 If "NO" in step S71a, that is, if the parameter indicating the motion of the operator's neck is not within the predetermined range, the ratio calculation process is terminated and the process returns to the control process. On the other hand, if "YES" in step S71a, that is, if the parameter indicating the motion of the operator's neck is within a predetermined range, the ratio p is calculated in step S73. However, when the operator nods, the CPU 20 calculates the ratio p (the enlargement ratio in the second embodiment) according to Equation (2). Further, when the operator shakes his/her head, the CPU 20 calculates the ratio p (the reduction ratio in the second embodiment) according to Equation (3).
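The range check of step S71a and the calculation of step S73 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the specification: the function name `neck_ratio`, its parameters, and the use of `None` to signal "outside the range" are all hypothetical, and only Equations 2 and 3 and the values P = 1.4 and Q = 0.8 come from the text.

```python
from typing import Optional

P_MAX = 1.4  # enlargement ratio at the maximum movement distance (from the text)
Q_MAX = 0.8  # reduction ratio at the maximum movement distance (from the text)


def neck_ratio(d: float, D: float, nodding: bool,
               P: float = P_MAX, Q: float = Q_MAX) -> Optional[float]:
    """Return the ratio p, or None when d is outside the allowed range.

    d       : movement distance of the reference plane (absolute value)
    D       : maximum movement distance
    nodding : True for a nod (Equation 2), False for a head shake (Equation 3)
    """
    if nodding:
        # Step S71a for a nod: Equation 2 applies only when D/P < d <= D.
        if D / P < d <= D:
            return P * (d / D)      # step S73, Equation 2 (p > 1)
        return None                 # step S71a "NO": ratio is not calculated
    # Step S71a for a head shake: Equation 3 applies only when DQ < d <= D.
    if D * Q < d <= D:
        return Q * (D / d)          # step S73, Equation 3 (p < 1)
    return None
```

A caller would treat a `None` result as "keep the current ratio", mirroring the flow in which the ratio calculation process simply returns to the control process when step S71a is "NO".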

In the second embodiment as well, displaying the avatar image enlarged or reduced relative to its normal size expresses a sense of depth, so that a three-dimensional effect is obtained even though the image is displayed on a two-dimensional screen. That is, an avatar with increased presence can be displayed.

また、第２実施例においても、拡大したアバターの画像が枠画像からはみ出すように表示される場合には、３次元の現実空間に飛び出そうとしているように見える。つまり、存在感を増したアバターを表示することができる。 Also in the second embodiment, when the enlarged avatar image is displayed so as to protrude from the frame image, it looks like it is about to jump out into the three-dimensional real space. That is, it is possible to display an avatar with increased presence.

In the second embodiment, a head model of the avatar is set in a three-dimensional virtual space, the avatar's neck is moved based on the movement of the operator's neck, and the ratio p is calculated based on the numerical value (parameter) obtained thereby; however, the calculation need not be limited to this. In another example, the avatar's head model may be set as a circle or an ellipse in a two-dimensional virtual space, and the calculation may be performed in two dimensions separately for nodding and for shaking the head sideways. That is, as shown in FIGS. 14(A) and 14(B), the two cases are calculated separately. In this case, a reference line is set instead of the reference plane, and the movement distance d is calculated.

In the second embodiment, the ratio p is calculated separately for nodding and for head shaking, but in another embodiment the ratio p may be calculated simply from the movement of the operator's neck, that is, from the magnitude (maximum value or average value) of the angular velocity around each axis.
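The text does not say how the angular velocity magnitude would be turned into a ratio, so the following is only a speculative sketch. It assumes that the per-axis angular velocities are combined into a single angular speed by the Euclidean norm, and that the summary value is mapped linearly onto the range from 1 to P; both choices, along with the names `angular_speed_summary`, `ratio_from_speed`, and `speed_full`, are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple

GyroSample = Tuple[float, float, float]  # angular velocity around the x, y, z axes


def angular_speed_summary(samples: Sequence[GyroSample],
                          use_max: bool = True) -> float:
    """Summarise gyro samples as the maximum or the mean angular speed."""
    speeds = [math.sqrt(wx * wx + wy * wy + wz * wz) for wx, wy, wz in samples]
    return max(speeds) if use_max else fmean(speeds)


def ratio_from_speed(speed: float, speed_full: float, P: float = 1.4) -> float:
    """Hypothetical linear mapping: speed 0 -> 1.0, speed_full or more -> P."""
    t = min(max(speed / speed_full, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 + (P - 1.0) * t
```

Under this sketch a larger or faster head movement yields a larger avatar, but a real design would need its own calibration of `speed_full` and its own rule for when to reduce rather than enlarge.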

In the second embodiment, the user-side terminal 12 and the operator-side terminal 16 communicate via the network 14 in chat and talk, but they may instead communicate via the server 18. In that case, the server 18 may calculate the ratio p based on the operator's neck movement indicated by the neck movement data added to the operator's voice data, add data of the calculated ratio p to the voice data, and transmit the result to the user-side terminal 12.

Furthermore, in the second embodiment, the movement of the operator's neck is detected by the inertial sensor, but detection need not be limited to this. In another example, the neck movement may be detected based on the orientation of the operator's face. In that case, instead of the inertial sensor, an image sensor (a CCD camera, that is, a web camera) is provided at a position where it can capture the operator's face. As an example, a CCD camera is provided above the display device 60; taking as a reference the orientation of the face image of the operator directly facing the CCD camera (or the display device 60), the current face orientation is calculated from the current face image, and the movement of the operator's neck is estimated from the current face orientation. The face orientation can be detected from the movement of a plurality of facial feature points extracted from the face image.

<Third embodiment>
The third embodiment is the same as the first embodiment except that the operator-side terminal 16 generates the image data corresponding to the avatar image 120, so redundant description is omitted.

Briefly, in the third embodiment, at least in talk, the operator-side terminal 16 generates image data corresponding to the avatar image 120 and transmits the generated image data to the user-side terminal 12, and the user-side terminal 12 displays the avatar image 120 on the talk screen 150 using the received image data.

In the third embodiment, when the operator speaks during talk, the operator-side terminal 16 detects the operator's voice and its volume, and calculates the ratio p according to Equation 1 when the volume of the operator's voice is greater than a predetermined value. As described above, when the volume of the operator's voice is equal to or less than the predetermined value, the ratio p is not calculated and remains at its initial value (p = 1).

続いて、操作者側端末１６は、比率ｐに応じた大きさで、操作者の応答内容の音声に合せて発話動作を行うアバターの画像１２０の画像データを生成する。 Subsequently, the operator-side terminal 16 generates image data of an avatar image 120 that performs a speaking action in accordance with the voice of the response contents of the operator, with a size corresponding to the ratio p.

なお、アバターの画像１２０の画像データを生成する方法は、第１実施例で説明した方法と同じである。 The method of generating the image data of the avatar image 120 is the same as the method described in the first embodiment.

The operator-side terminal 16 transmits the voice data of the detected voice and the generated image data to the user-side terminal 12. The user-side terminal 12 receives the voice data and the image data and, in time with the output of the voice data, displays the avatar image 120 on the talk screen 150 using the image data. That is, in the user-side terminal 12, the operator's voice is output from the speaker 34, and the avatar image 120, rendered as if speaking, is displayed on the talk screen 150 at a size corresponding to the ratio p.

Therefore, in the third embodiment, the same programs as the avatar control program 302g and the ratio calculation program 302h shown in FIG. 8 are additionally stored in the storage unit (RAM) 52 of the operator-side terminal 16. Accordingly, in the third embodiment, the ratio calculation program 302h and the ratio data 304e are removed from the user-side terminal 12. In addition, the same data as the ratio data 304e shown in FIG. 8 is additionally stored in the data storage area of the storage unit 52 of the operator-side terminal 16.

また、第３実施例の操作者側端末１６では、トークにおいて、利用者側端末１２から送信された音声データを受信して、出力したり、操作者の音声データおよび音量データを検出して、音声データを利用者側端末１２に送信したりする処理に加えて、アバターの画像１２０の画像データを生成する処理（以下、「アバターの画像生成処理」という）が実行される。 Further, in the operator-side terminal 16 of the third embodiment, during talk, voice data transmitted from the user-side terminal 12 is received and output, and voice data and volume data of the operator are detected, In addition to the process of transmitting the voice data to the user-side terminal 12, the process of generating the image data of the avatar image 120 (hereinafter referred to as "avatar image generation process") is executed.

Specifically, the operator-side terminal 16 calculates the ratio p based on the volume of the voice uttered by the operator, and generates image data of the avatar image 120 that performs a speaking action in time with the voice data of the response content, at a size corresponding to the calculated ratio p.

操作者側端末は、検出した音声データと生成した画像データを、利用者側端末１２に送信する。また、第３実施例では、操作者側端末１６がアバターの画像１２０の画像データを生成するため、音量データは音声データに付加されない。 The operator-side terminal transmits the detected audio data and the generated image data to the user-side terminal 12 . Further, in the third embodiment, since the operator-side terminal 16 generates the image data of the avatar image 120, volume data is not added to the voice data.

以下、具体的な処理について説明する。図１６は、第３実施例における利用者側端末１２のＣＰＵ２０の制御処理の一部を示すフロー図である。図１７は、第３実施例における操作者側端末１６のＣＰＵ５０のアバターの画像生成処理を示すフロー図である。以下、ＣＰＵ２０の制御処理について説明するとともに、ＣＰＵ５０のアバターの画像生成処理について説明するが、既に説明した処理については説明を省略する。 Specific processing will be described below. FIG. 16 is a flowchart showing part of the control processing of the CPU 20 of the user-side terminal 12 in the third embodiment. FIG. 17 is a flowchart showing avatar image generation processing of the CPU 50 of the operator-side terminal 16 in the third embodiment. Hereinafter, the control processing of the CPU 20 will be explained, and the avatar image generation processing of the CPU 50 will be explained, but the explanation of the already explained processing will be omitted.

As shown in FIG. 16, when "NO" in step S33, or after executing the process of step S35, the CPU 20 of the user-side terminal 12 determines in step S91 whether voice data and image data of the response content have been received.

ステップＳ９１で“ＮＯ”であれば、つまり、応答内容の音声データおよび画像データを受信していない場合には、ステップＳ４７に進む。一方、ステップＳ９１で“ＹＥＳ”であれば、つまり、応答内容の音声データおよび画像データを受信した場合には、ステップＳ９３で、応答内容の音声データの出力に合せて画像データを出力して、ステップＳ３３に戻る。したがって、操作者の音声が利用者側端末１２で出力されるとともに、出力された音声に合せて、比率ｐに応じた大きさのアバターが喋る動作を行う画像が表示される。 If "NO" in step S91, that is, if voice data and image data of the response content have not been received, the process proceeds to step S47. On the other hand, if "YES" in step S91, that is, if voice data and image data of the response content have been received, in step S93, the image data is output in accordance with the output of the voice data of the response content, Return to step S33. Therefore, the operator's voice is output from the user-side terminal 12, and an image in which the avatar speaks with a size corresponding to the ratio p is displayed according to the output voice.

Next, the avatar image generation process of the CPU 50 of the operator-side terminal 16 will be described with reference to FIG. 17, but description of processing already explained is omitted. The avatar image generation process is executed when the operator's voice is detected by the microphone 62.

図１７に示すように、ＣＰＵ５０は、アバターの画像生成処理を開始すると、ステップＳ１１１で、音量が所定値よりも大きいかどうかを判断する。ステップＳ１１１で“ＮＯ”であれば、つまり、音量が所定値以下であれば、ステップＳ１１５に進む。一方、ステップＳ１１１で“ＹＥＳ”であれば、つまり、音量が所定値よりも大きければ、ステップＳ１１３で、数１に従って比率ｐを算出して、ステップＳ１１５に進む。 As shown in FIG. 17, when the avatar image generation process is started, the CPU 50 determines in step S111 whether or not the sound volume is greater than a predetermined value. If "NO" in step S111, that is, if the volume is equal to or less than the predetermined value, the process proceeds to step S115. On the other hand, if "YES" in step S111, that is, if the volume is greater than the predetermined value, in step S113, the ratio p is calculated according to Equation 1, and the process proceeds to step S115.

ステップＳ１１５では、比率ｐに応じた大きさで、応答内容の音声データに合せて発話動作を行うアバターの画像１２０の画像データを生成して、アバターの画像生成処理を終了する。 In step S115, the image data of the image 120 of the avatar performing a speech action in accordance with the voice data of the response content is generated with a size corresponding to the ratio p, and the avatar image generation processing ends.

このように生成された画像データが、応答内容の音声データとともに、利用者側端末１２に送信される。 The image data generated in this manner is transmitted to the user-side terminal 12 together with the voice data of the content of the response.
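Steps S111 to S115 on the operator-side terminal can be sketched as below. This is an illustration only: Equation 1 is defined elsewhere in the specification and is therefore passed in as a caller-supplied function `equation1`, and the `AvatarFrame` record is a hypothetical stand-in for the rendered image data, not a structure named in the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AvatarFrame:
    """Hypothetical stand-in for the generated avatar image data."""
    scale: float   # ratio p applied to the normal-size avatar image
    audio: bytes   # operator voice data the speaking action is matched to


def generate_avatar_image(volume: float, threshold: float, audio: bytes,
                          equation1: Callable[[float], float]) -> AvatarFrame:
    """Sketch of the avatar image generation process (steps S111 to S115)."""
    p = 1.0                      # initial value, kept when step S111 is "NO"
    if volume > threshold:       # step S111: is the volume above the predetermined value?
        p = equation1(volume)    # step S113: ratio p according to Equation 1
    # Step S115: generate image data at a size corresponding to p.
    return AvatarFrame(scale=p, audio=audio)
```

The returned frame would then be sent to the user-side terminal together with the voice data; no volume data needs to accompany it, because the scaling has already been applied on the operator side.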

In the third embodiment as well, displaying the avatar image enlarged or reduced relative to its normal size expresses a sense of depth, so that a three-dimensional effect is obtained even though the image is displayed on a two-dimensional screen. That is, an avatar with increased presence can be displayed.

また、第３実施例においても、拡大したアバターの画像が枠画像からはみ出すように表示される場合には、３次元の現実空間に飛び出そうとしているように見える。つまり、存在感を増したアバターを表示することができる。 Also in the third embodiment, when the enlarged avatar image is displayed so as to protrude from the frame image, it looks like it is about to jump out into the three-dimensional real space. That is, it is possible to display an avatar with increased presence.

なお、第３実施例では、操作者の音声の音量に基づいてアバターの画像１２０を拡大または縮小するようにしたが、これに限定される必要はない。操作者の目の開き具合に基づいてアバターの画像１２０を拡大または縮小するようにしてもよい。 In addition, in the third embodiment, the avatar image 120 is enlarged or reduced based on the volume of the voice of the operator, but it is not necessary to be limited to this. The avatar image 120 may be enlarged or reduced based on the degree of eye opening of the operator.

In the third embodiment, the avatar image 120 may also be enlarged or reduced based on the movement of the operator's neck, as shown in the second embodiment. In this case, in the avatar image generation process shown in FIG. 17, not only the voice data of the operator's speech but also the operator's neck movement data is detected. In place of step S111 shown in FIG. 17, a process is executed that determines whether the parameter indicating the movement of the operator's neck (the movement distance d in the third embodiment) is within a predetermined range. That is, when the operator nods, it is determined whether the movement distance d is within the range given in Equation 2; when the operator shakes his or her head, it is determined whether the movement distance d is within the range given in Equation 3. If the parameter indicating the movement of the operator's neck is not within the predetermined range, the process proceeds to step S115; if it is within the predetermined range, the ratio p is calculated according to Equation 2 or Equation 3 in step S113, and the process proceeds to step S115.

<Fourth embodiment>
The fourth embodiment is the same as the first embodiment except that the control process is executed on the server 18 side, so redundant description is omitted.

As described above, in the fourth embodiment, the server 18 executes the control process, so with respect to the control process the user-side terminal 12 functions as an input/output device. Therefore, in the control process, the user-side terminal 12 transmits to the server 18 operation data corresponding to the user's operation or input and voice data corresponding to the user's voice, and outputs the image data, text data, and voice data transmitted by the server 18.

第４実施例の情報処理システム１０では、第１実施例で説明したアプリはサーバ１８に記憶されており、サーバ１８で実行される。 In the information processing system 10 of the fourth embodiment, the application explained in the first embodiment is stored in the server 18 and executed by the server 18 .

Therefore, in the fourth embodiment, the program storage area of the storage unit (RAM) 18b of the server 18 stores the same programs as the activation determination program 302a, main processing program 302b, communication program 302d, image generation program 302e, image output program 302f, avatar control program 302g, ratio calculation program 302h, and sound output program 302j shown in FIG. 8. In the fourth embodiment, however, the image output program 302f outputs (or transmits) the image data generated according to the image generation program 302e to the user-side terminal 12, and the sound output program 302j outputs (or transmits) the received voice data of the operator's response content to the user-side terminal 12.

The data storage area of the storage unit (RAM) 18b of the server 18 stores transmission data, reception data, image generation data, and ratio data. The transmission data is data to be transmitted to the user-side terminal 12: image data of the web screen 100, the selection screen 110, the chat screen 130, and the talk screen 150; image data of the avatar image 120; text data of the user's question content in chat; text data of the operator's response content in chat; voice data of the user's question content in talk; and voice data of the operator's response content. The reception data is data transmitted from the operator-side terminal 16 and received by the server 18: text data of the operator's response content in chat and voice data of the operator's response content in talk (in the fourth embodiment, voice data to which volume data is added).

画像生成データは、利用者側端末１２の表示装置３０に表示される各種の画面を生成するためのデータであり、アバターの画像１２０を生成するためのデータを含む。また、アバターの画像１２０を生成するためのデータは、アバターの画像１２０についての静止した状態の画像データ、無意識動作および挨拶の動作についてのアニメーションデータを含む。比率データは、比率ｐについてのデータである。 The image generation data is data for generating various screens displayed on the display device 30 of the user-side terminal 12 and includes data for generating the avatar image 120 . In addition, the data for generating the avatar image 120 includes static image data for the avatar image 120 and animation data for unconscious actions and greeting actions. Ratio data is data about the ratio p.

また、第４実施例では、サーバ１８が制御処理を実行するため、利用者側端末１２では、起動判断プログラム３０２ａ、アバター制御プログラム３０２ｇおよび比率算出プログラム３０２ｈを省略することができる。同様に、利用者側端末１２には、比率データ３０４ｅは記憶されない。 In addition, in the fourth embodiment, since the server 18 executes the control processing, the user terminal 12 can omit the activation determination program 302a, the avatar control program 302g, and the ratio calculation program 302h. Similarly, the user-side terminal 12 does not store the ratio data 304e.

Specifically, the CPU 18a of the server 18 executes the control process shown in FIGS. 18 to 20. The control process executed by the CPU 18a of the server 18 is described below with reference to FIGS. 18 to 20; content that overlaps with what has already been described is explained only briefly.

図１８に示すように、サーバ１８のＣＰＵ１８ａは、制御処理を開始すると、ステップＳ２０１で、アプリの起動条件を満たすかどうかを判断する。ステップＳ２０１で“ＮＯ”であれば、ステップＳ２０１に戻る。一方、ステップＳ２０１で“ＹＥＳ”であれば、ステップＳ２０３で、アプリを起動する。 As shown in FIG. 18, when the control process is started, the CPU 18a of the server 18 determines in step S201 whether or not the application activation condition is satisfied. If "NO" in step S201, the process returns to step S201. On the other hand, if "YES" in step S201, the application is activated in step S203.

In the following step S205, the selection screen 110 as shown in FIG. 4 is displayed on the user-side terminal 12. That is, the CPU 18a generates image data of the selection screen 110 as shown in FIG. 4 and outputs (or transmits) it to the user-side terminal 12. Accordingly, the selection screen 110 is displayed in front of the web screen 100 on the display device 30 of the user-side terminal 12. The same applies hereinafter whenever a screen is displayed on the user-side terminal 12.

ただし、アプリを起動した当初では、ＣＰＵ１８ａは、静止した状態のアバターの画像１２０を表示枠１１２に表示する選択画面１１０の画像データを生成して利用者側端末１２に送信し、次いで、アバターに挨拶の動作を行わせるためのアニメーションデータを利用者側端末１２に出力する。 However, when the application is first started, the CPU 18a generates the image data of the selection screen 110 that displays the static avatar image 120 in the display frame 112, transmits the image data to the user terminal 12, and then displays the avatar. Animation data for performing a greeting action is output to the user-side terminal 12 .

次のステップＳ２０７では、アバターの画像１２０に待機動作を実行させる。ここでは、ＣＰＵ１８ａは、アバターに無意識動作を行わせるためのアニメーションデータを利用者側端末１２に送信する。ただし、ＣＰＵ１８ａは、選択画面１１０において利用者の操作が無い場合において、数秒または数十秒毎にアバターに無意識動作を行わせるためのアニメーションデータを送信する。 In the next step S207, the avatar image 120 is caused to perform a standby action. Here, the CPU 18a transmits to the user terminal 12 animation data for causing the avatar to perform an unconscious action. However, the CPU 18a transmits animation data for causing the avatar to perform an unconscious action every several seconds or several tens of seconds when there is no user operation on the selection screen 110 .

Subsequently, in step S209, it is determined whether there is a button operation. Here, the CPU 18a determines whether operation data indicating that the button 114 or the button 116 of the selection screen 110 has been turned on has been received from the user-side terminal 12. Although not illustrated, when operation data indicating that the button 118 has been turned on is received, the CPU 18a instructs the user-side terminal 12 to close (hide) the selection screen 110 and terminates the application.

ステップＳ２０９で“ＮＯ”であれば、ステップＳ２０７に戻る。一方、ステップＳ２０９で“ＹＥＳ”であれば、ステップＳ２１１で、チャットかどうかを判断する。ここでは、ＣＰＵ１８ａは、操作データがボタン１１４のオンを示すかどうかを判断する。 If "NO" in step S209, the process returns to step S207. On the other hand, if "YES" in step S209, it is determined in step S211 whether or not it is a chat. Here, the CPU 18a determines whether or not the operation data indicates that the button 114 is turned on.

If "NO" in step S211, that is, if the button 116 has been turned on, the process proceeds to step S231 shown in FIG. 20. On the other hand, if "YES" in step S211, that is, if the button 114 has been turned on, the chat screen 130 as shown in FIG. 5 is displayed on the user-side terminal 12 in step S213 shown in FIG. 19. That is, the CPU 18a generates image data of the chat screen 130 as shown in FIG. 5 and transmits it to the user-side terminal 12.

したがって、利用者側端末１２の表示装置３０では、選択画面１１０が非表示され、ウェブ画面１００の前面にチャット画面１３０が表示される。 Therefore, the selection screen 110 is not displayed on the display device 30 of the user-side terminal 12 , and the chat screen 130 is displayed in front of the web screen 100 .

次のステップＳ２１５では、質問内容の入力かどうかを判断する。ここでは、ＣＰＵ１８ａは、質問内容（テキスト）のキー入力を示す操作データを利用者側端末１２から受信したかどうかを判断する。 In the next step S215, it is determined whether or not the content of the question is input. Here, the CPU 18a determines whether operation data indicating key input of question content (text) has been received from the user-side terminal 12 or not.

ステップＳ２１５で“ＹＥＳ”であれば、ステップＳ２１７で、操作データが示すキー入力に対応する文字または文字列を表示枠１３６にテキストで表示する画像データを生成して利用者側端末１２に送信して、ステップＳ２１５に戻る。 If "YES" in step S215, then in step S217 image data for displaying characters or character strings corresponding to the key input indicated by the operation data as text in display frame 136 is generated and transmitted to user-side terminal 12. Then, the process returns to step S215.

Accordingly, on the user-side terminal 12, the characters or character strings of the question content entered by the user are displayed sequentially in the display frame 136 of the chat screen 130.

On the other hand, if "NO" in step S215, it is determined in step S219 whether the question content is to be transmitted. Here, the CPU 18a determines whether operation data indicating that the question content has been finalized (or that the question content is to be transmitted) has been received from the user-side terminal 12.

ステップＳ２１９で“ＹＥＳ”であれば、ステップＳ２２１で、質問内容のテキストデータを操作者側端末１６に送信して、ステップＳ２２３に進む。ただし、質問内容のテキストデータは、今回の質問において、利用者が入力した文字または文字列を時系列に並べたデータである。一方、ステップＳ２１９で“ＮＯ”であれば、ステップＳ２２３に進む。 If "YES" in step S219, then in step S221 the text data of the question content is transmitted to the operator terminal 16, and the process proceeds to step S223. However, the text data of the question content is data in which characters or character strings input by the user are arranged in chronological order in this question. On the other hand, if "NO" in step S219, the process proceeds to step S223.

ステップＳ２２３では、応答内容のテキストデータを操作者側端末１６から受信したかどうかを判断する。ステップＳ２２３で“ＮＯ”であれば、ステップＳ２２７に進む。一方、ステップＳ２２３で“ＹＥＳ”であれば、ステップＳ２２５で、応答内容の表示に合せてアバターを発話させて、ステップＳ２２７に進む。ステップＳ２２５では、ＣＰＵ１８ａは、応答内容を表示枠１３４に一文字ずつテキストで表示する画像データを生成して利用者側端末１２に送信するとともに、その表示に合せて喋るように口唇部を変化させて発話動作を行うアバターの画像１２０の画像データを生成して利用者側端末１２に送信する。 In step S223, it is determined whether the text data of the content of the response has been received from the operator-side terminal 16 or not. If "NO" in step S223, the process proceeds to step S227. On the other hand, if "YES" in step S223, in step S225 the avatar is made to speak in accordance with the display of the response content, and the process proceeds to step S227. In step S225, the CPU 18a generates image data for displaying the content of the response as text in the display frame 134 one character at a time, transmits the image data to the user-side terminal 12, and changes the lips so as to speak according to the display. Image data of the image 120 of the avatar performing the speaking motion is generated and transmitted to the user-side terminal 12 .

Accordingly, on the user-side terminal 12, the response content is displayed one character at a time in the display frame 134 of the chat screen 130, and the avatar image 120 performing a speaking action in time with the display of the response content is displayed in the display frame 132.

In step S227, it is determined whether the chat is to end. Here, the CPU 18a determines whether operation data indicating that the button 138 has been turned on has been received, or whether operation data instructing the end of the chat has been received from the operator-side terminal 16.

If "NO" in step S227, the process returns to step S215. On the other hand, if "YES" in step S227, the CPU 18a instructs the user-side terminal 12 in step S229 to close the chat screen 130, and the process returns to step S205 shown in FIG. 18.
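The chat branch of the server's control process (steps S215 to S229) can be approximated as an event handler. The sketch below is a simplification under stated assumptions: it handles one event per call rather than polling each condition in sequence, the event and action names are hypothetical, and clearing the input buffer after a question is sent is an assumption not stated in the text.

```python
from typing import List, Tuple

Action = Tuple[str, str, str]  # (destination, kind, payload); names are hypothetical


def chat_step(event: dict, typed: List[str]) -> Tuple[List[Action], bool]:
    """Process one chat event on the server; return (actions, chat_finished)."""
    kind = event["type"]
    if kind == "key":
        # S215 "YES" -> S217: echo the typed text into display frame 136.
        typed.append(event["text"])
        return [("user", "show_input", "".join(typed))], False
    if kind == "send":
        # S219 "YES" -> S221: forward the question text to the operator side.
        question = "".join(typed)
        typed.clear()  # assumption: the buffer restarts for the next question
        return [("operator", "question_text", question)], False
    if kind == "response":
        # S223 "YES" -> S225: show the response and make the avatar speak.
        return [("user", "show_response_with_speaking_avatar", event["text"])], False
    if kind == "end":
        # S227 "YES" -> S229: tell the user-side terminal to close the chat screen.
        return [("user", "close_chat_screen", "")], True
    return [], False  # nothing to do; the loop returns to S215
```

When `chat_finished` is true, the caller would go back to showing the selection screen, corresponding to the return to step S205.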

As described above, if "NO" in step S211, the talk screen 150 as shown in FIG. 6 is displayed on the user-side terminal 12 in step S231 shown in FIG. 20. That is, the CPU 18a generates image data of the talk screen 150 as shown in FIG. 6 and transmits it to the user-side terminal 12.

したがって、利用者側端末１２の表示装置３０では、選択画面１１０が非表示され、ウェブ画面１００の前面にトーク画面１５０が表示される。 Therefore, on the display device 30 of the user-side terminal 12 , the selection screen 110 is hidden, and the talk screen 150 is displayed in front of the web screen 100 .

続くステップＳ２３３では、利用者の質問内容の音声データを利用者側端末１２から受信したかどうかを判断する。ステップＳ２３３で“ＮＯ”であれば、つまり、質問内容の音声データを受信していなければ、ステップＳ２３７に進む。一方、ステップＳ２３３で“ＹＥＳ”であれば、つまり、質問内容の音声データを受信すれば、ステップＳ２３５で、質問内容の音声データを操作者側端末１６に送信して、ステップＳ２３７に進む。 In the following step S233, it is determined whether or not voice data of the content of the user's question has been received from the user-side terminal 12 or not. If "NO" in step S233, that is, if voice data of question content has not been received, the process proceeds to step S237. On the other hand, if "YES" in step S233, that is, if the voice data of the question content is received, the voice data of the question content is transmitted to the operator side terminal 16 in step S235, and the process proceeds to step S237.

In other words, the server 18 transmits the received voice data of the question to the operator-side terminal 16. The user's voice is therefore output from the speaker 64 of the operator-side terminal 16. Meanwhile, the operator-side terminal 16 adds volume data to the voice data of the operator's response and transmits it to the server 18.

In step S237, it is determined whether voice data of the operator's response has been received. If "YES" in step S237, that is, if voice data of the operator's response has been received, the ratio calculation process shown in FIG. 12 is executed in step S239, and in step S241 image data is generated for the avatar image 120 that, at a size corresponding to the ratio p, performs a speaking motion in time with the voice data of the response.

Note that the image data of the avatar image 120 generated in step S241 is the same as in the third embodiment (step S115): it is image data for the avatar image 120 that, at a size corresponding to the ratio p, performs a speaking motion lip-synced to the operator's voice.

In the next step S243, the received voice data of the response and the image data generated in step S241 are transmitted to the user-side terminal 12, and the process returns to step S233. Therefore, on the user-side terminal 12, the voice of the response is output from the speaker 34, and on the talk screen 150 the avatar image 120, at a size corresponding to the ratio p and performing a speaking motion lip-synced to the voice of the response, is displayed in the display frame 152.

If "NO" in step S237, the ratio p is reset in step S245, image data of the avatar image 120 at the normal size is transmitted to the user-side terminal 12 in step S247, and the process proceeds to step S249. Therefore, on the user-side terminal 12, when output of the operator's voice ends, the avatar image 120 is returned to its normal size.

In step S249, it is determined whether the talk is to be ended. Here, the CPU 18a determines whether it has received operation data from the user-side terminal 12 indicating that the button 154 has been turned on, or operation data from the operator-side terminal 16 instructing the end of the talk.

If "NO" in step S249, the process returns to step S233. If "YES" in step S249, the user-side terminal 12 is instructed in step S251 to close the talk screen 150, and the process returns to step S205.
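The talk-side branching of steps S233 to S249 can be sketched as a single pass of a server loop. This is only an illustrative sketch, not the implementation of the embodiment: the names `OperatorVoice`, `placeholder_ratio`, and `talk_iteration`, the message tuples, and the normalized volume range are all assumptions, and `placeholder_ratio` merely stands in for the ratio calculation process of FIG. 12, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

NORMAL_RATIO = 1.0  # ratio p for the normal avatar size


@dataclass
class OperatorVoice:
    audio: bytes   # voice data of the operator's response
    volume: float  # volume data attached by the operator-side terminal


def placeholder_ratio(volume: float) -> float:
    """Hypothetical stand-in for the ratio calculation process of FIG. 12."""
    # Assumption: volume is normalized to [0, 1]; the mapping is illustrative only.
    clamped = max(0.0, min(1.0, volume))
    return 0.8 + 0.6 * clamped


def talk_iteration(user_audio: Optional[bytes],
                   operator_voice: Optional[OperatorVoice],
                   end_requested: bool):
    """One pass through steps S233 to S249; returns (messages, finished)."""
    messages = []
    if user_audio is not None:                        # S233: YES
        messages.append(("operator_terminal", "audio", user_audio))  # S235
    if operator_voice is not None:                    # S237: YES
        p = placeholder_ratio(operator_voice.volume)  # S239
        image = {"scale": p, "lip_sync": True}        # S241
        messages.append(("user_terminal", "audio+image",
                         (operator_voice.audio, image)))  # S243
        return messages, False                        # back to S233
    # S237: NO -> reset the ratio (S245) and send the normal-size image (S247)
    messages.append(("user_terminal", "image",
                     {"scale": NORMAL_RATIO, "lip_sync": False}))
    return messages, end_requested                    # S249
```

Note that, as in the flow described above, the end-of-talk check (S249) is reached only on a pass in which no operator voice data was received.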

In the fourth embodiment as well, displaying the avatar image enlarged or reduced relative to its normal size expresses a sense of depth, so that a three-dimensional effect is obtained even though the image is displayed on a two-dimensional screen. That is, an avatar with increased presence can be displayed.

Also in the fourth embodiment, when the enlarged avatar image is displayed so as to protrude from the frame image, the avatar appears to be about to jump out into the three-dimensional real space. That is, an avatar with increased presence can be displayed.

In the fourth embodiment, the voice of the user's question in a talk is transmitted from the user-side terminal 12 to the operator-side terminal 16 via the server 18; however, as in the first embodiment, it may be transmitted from the user-side terminal 12 to the operator-side terminal 16.

Also, in the fourth embodiment, the avatar image 120 may be enlarged or reduced based on the movement of the operator's neck, as shown in the second embodiment. In this case, the operator-side terminal 16 transmits to the server 18 voice data to which neck movement data has been added. Then, in step S45, the ratio calculation process of the second embodiment shown in FIG. 15 is executed.

As shown in the above embodiments, in the information processing system 10 the ratio p can be calculated by any one of the user-side terminal 12, the operator-side terminal 16, and the server 18. Likewise, in the information processing system 10 the avatar image 120 can be generated by any one of the user-side terminal 12, the operator-side terminal 16, and the server 18.

In each of the above embodiments, the avatar image 120 is displayed at a size corresponding to the ratio p and also performs a speaking motion in time with the voice of the response. However, the presence of the avatar can be increased simply by displaying the avatar image 120 at a size corresponding to the ratio p, without the speaking motion.

Furthermore, in each of the above embodiments the avatar image 120 is displayed at a size corresponding to the ratio p; however, without calculating the ratio p, the avatar image 120 may be enlarged to a size that protrudes from the display frame 152 when the operator's voice is output. For example, the avatar image 120 is made 1.4 times its normal size. This is only an example; any size that protrudes from the display frame 152 will do. Specifically, in the first and second embodiments, steps S39 and S45 shown in FIG. 11 are deleted, and in step S43 the CPU 20 generates and outputs image data of the avatar that, at a size enlarged 1.4 times, performs a speaking motion in time with the voice data of the response. In the third embodiment, steps S111 and S113 shown in FIG. 17 are deleted, and in step S115 the CPU 50 generates image data of the avatar that, at a size enlarged 1.4 times, performs a speaking motion in time with the voice data of the response. In the fourth embodiment, steps S239 and S245 shown in FIG. 20 are deleted, and in step S241 the CPU 18a generates image data of the avatar image 120 that, at a size enlarged 1.4 times, performs a speaking motion in time with the voice data of the response.
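The fixed-enlargement variant can be illustrated with a small geometric sketch. The rectangle model, the choice to scale about the center, and the names `Rect`, `scale_about_center`, `protrudes`, and `avatar_rect` are assumptions made for illustration; the text only requires that the enlarged avatar image protrude from the display frame, with 1.4 given as an example factor.

```python
from dataclasses import dataclass

FIXED_SCALE = 1.4  # example factor from the text; any protruding size would do


@dataclass(frozen=True)
class Rect:
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def scale_about_center(rect: Rect, factor: float) -> Rect:
    """Scale a rectangle by `factor` while keeping its center fixed."""
    cx = rect.x + rect.width / 2
    cy = rect.y + rect.height / 2
    w = rect.width * factor
    h = rect.height * factor
    return Rect(cx - w / 2, cy - h / 2, w, h)


def protrudes(avatar: Rect, frame: Rect) -> bool:
    """True if any part of the avatar rectangle lies outside the frame."""
    return (avatar.x < frame.x
            or avatar.y < frame.y
            or avatar.x + avatar.width > frame.x + frame.width
            or avatar.y + avatar.height > frame.y + frame.height)


def avatar_rect(normal: Rect, speaking: bool) -> Rect:
    """Normal size while silent; fixed enlargement while voice is output."""
    return scale_about_center(normal, FIXED_SCALE) if speaking else normal
```

With a hypothetical 100 x 100 frame at the origin and a normal avatar rectangle of 80 x 80 at (10, 10), `avatar_rect(normal, True)` protrudes from the frame while `avatar_rect(normal, False)` fits inside it. A factor that does not cause protrusion for a given layout would have to be increased, since the text treats 1.4 as an example only.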

Thus, when the ratio p is not calculated, the operator-side terminal 16 transmits only the voice data of the operator's voice to the user-side terminal 12 or the server 18; it need not transmit volume data or neck movement data, nor detect them.

Also, when the ratio p is not calculated, the volume of the operator's voice and the movement of the operator's neck at the time the voice is detected are irrelevant. Accordingly, in a chat, the avatar image 120 may be enlarged to a size that protrudes from the display frame 152 when the text of the operator's response is displayed. An avatar with increased presence can be displayed in this way as well. Specifically, in the first, second, and third embodiments, in step S25 shown in FIG. 10 the CPU 20 displays the response in the display frame 134 as text one character at a time, and displays in the display frame 132, at a size enlarged 1.4 times, the avatar image 120 that performs a speaking motion by changing its lips so that the avatar appears to speak in time with the display. Note that the "normal time" in this case includes the state in which the text of the response is not displayed.

Furthermore, when the ratio p is not calculated, instead of enlarging the avatar image 120, the avatar image 120 may be changed (or deformed) so that it is displayed protruding from the display frame 132 or the display frame 152. As an example, as shown in FIG. 21, an avatar image 120 can be displayed in which the avatar's hand and part of its head protrude from the frame image (display frame 152). In this case as well, the avatar appears to be about to jump out into the three-dimensional real space. That is, an avatar with increased presence can be displayed.

However, when the ratio p is calculated and is greater than 1, the avatar image 120 may be changed so that the degree to which it protrudes from the display frame 152 increases in proportion to the ratio p.

As described above, an avatar with increased presence can be displayed by changing the avatar's appearance, whether by enlarging or reducing the avatar image 120 or by changing (or deforming) the avatar image 120.

Also, when the avatar's appearance is changed by enlarging or changing the avatar image 120 without calculating the ratio p, an avatar with increased presence can be displayed without performing the speaking motion, as described above.

In each of the above embodiments, text is exchanged with the operator in a chat; however, the user-side terminal may instead access a chat service server on a network (cloud) and exchange messages with a chatbot.

In each of the above embodiments, the avatar image on the talk screen is an image of the avatar's head and neck, and when the operator's voice is output, the image of the avatar's head and neck is enlarged or reduced according to the volume of the voice or the movement of the operator's neck. When an image of the avatar's upper body or whole body is displayed, that image may be enlarged or reduced instead. In this case, the avatar's speaking motion also includes gestures made with the upper body or the whole body.

Furthermore, in each of the above embodiments, the user-side terminal calculates the ratio based on predetermined information obtained when the operator speaks, such as the volume data or neck movement data included in the voice data transmitted from the operator-side terminal; however, the ratio may instead be calculated by the operator-side terminal. In that case, the operator-side terminal adds the calculated ratio data to the voice data and transmits it to the user-side terminal. The user-side terminal displays (draws) the avatar image at the ratio indicated by the ratio data added to the received voice data. When ratio data is added to the voice data transmitted from the operator-side terminal in this way, the ratio data is the predetermined information.
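The two placements of the ratio calculation can be sketched from the receiving side as follows. This is a hedged illustration only: the packet layout, the field names, and the function `resolve_ratio` are assumptions, and the two ratio functions are passed in as parameters because the actual calculation processes (FIG. 12 and FIG. 15) are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePacket:
    """Hypothetical container for voice data plus its 'predetermined information'."""
    audio: bytes
    volume: Optional[float] = None       # volume data, if attached
    neck_motion: Optional[float] = None  # neck movement data, if attached
    ratio: Optional[float] = None        # ratio data computed by the operator side


def resolve_ratio(packet: VoicePacket,
                  ratio_from_volume: Callable[[float], float],
                  ratio_from_neck: Callable[[float], float]) -> float:
    """Return the ratio p at which the user-side terminal draws the avatar."""
    if packet.ratio is not None:
        # Operator-side terminal already calculated the ratio: use it as is.
        return packet.ratio
    if packet.volume is not None:
        # Ratio calculated on the receiving side from the volume data.
        return ratio_from_volume(packet.volume)
    if packet.neck_motion is not None:
        # Ratio calculated on the receiving side from the neck movement data.
        return ratio_from_neck(packet.neck_motion)
    # Assumption: with no predetermined information, fall back to normal size.
    return 1.0
```

Keeping the ratio as an optional field means the same receiving code can serve both variants, at the cost of the receiver having to decide which field takes precedence; giving precomputed ratio data priority is a design choice of this sketch, not something stated in the text.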

Furthermore, in each of the above embodiments the operator's voice is output as is from the speaker of the user-side terminal; however, a voice obtained by converting the operator's voice may be output instead.

In each of the above embodiments, an application capable of both chat and talk has been described, but the invention need not be limited to this. An application capable only of talk (that is, exchange of voice) may be used. As another example, the invention can also be applied to web conferences or video calls: when an avatar image is displayed in a web conference or video call, it is enlarged or reduced at a ratio p according to the volume of the voice uttered by the corresponding person or the movement of that person's neck when speaking. In other words, the present invention is applicable whenever the voice uttered by one person is output on a terminal used by another person who interacts with that person, and an avatar image corresponding to the first person is displayed on the terminal used by the other person.

Furthermore, in each of the above embodiments the application is started when the activation condition is satisfied; however, the application may instead be started when the web screen is displayed.

Note that the steps of the flowcharts shown in the above embodiments may be processed in a different order as long as the same result is obtained.

The various screens and the specific numerical values, such as angles, given in the above embodiments are merely examples and can be changed as needed. For example, in the case of a talk, instead of displaying the talk screen, only the avatar image and the display frame (frame image) may be displayed.

DESCRIPTION OF SYMBOLS
10 ... Information processing system
12 ... User-side terminal
14 ... Network
16 ... Operator-side terminal
18 ... Server
18a, 20, 50 ... CPU
18b, 22, 52 ... Storage unit
24, 54 ... Communication I/F
26, 56 ... Input/output I/F
28, 58 ... Input device
30, 60 ... Display device
32, 62 ... Microphone
34, 64 ... Speaker
66 ... Sensor I/F
68 ... Inertial sensor

Claims

An information processing apparatus comprising: receiving means for receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice;
sound output means for outputting the voice received by the receiving means;
ratio calculation means for calculating, based on the predetermined information received by the receiving means, a ratio for enlarging or reducing an image of an avatar corresponding to the operator relative to a normal size used when the voice received by the receiving means is not output; and image display means for displaying, on a display, the image of the avatar drawn at the ratio calculated by the ratio calculation means when the voice is output by the sound output means.

2. The information processing apparatus according to claim 1, wherein said image display means further displays a frame image in which said image of said avatar fits in said normal state, and displays said image of said avatar in front of said frame image.

the predetermined information is the volume of the voice uttered by the operator;
3. The information processing apparatus according to claim 1, wherein said ratio calculating means calculates said ratio based on said volume.

the predetermined information is movement of the operator's neck when the operator speaks;
3. The information processing apparatus according to claim 1, wherein said ratio calculating means calculates said ratio based on a movement of said operator's neck.

Receiving means for receiving, from an operator-side terminal, a ratio calculated based on the voice uttered by the operator and predetermined information when the operator uttered the voice;
sound output means for outputting the sound received by the reception means; and an image for displaying on a display the image of the avatar drawn at the ratio received by the reception means when the sound is output by the sound output means. comprising display means,
The information processing apparatus, wherein the ratio is a ratio for enlarging or reducing the image of the avatar corresponding to the operator with respect to a normal size when the voice received by the receiving means is not output.

Receiving means for receiving the voice uttered by the operator and the image of the avatar drawn at a ratio calculated based on predetermined information when the operator uttered the voice;
sound output means for outputting the voice received by the receiving means; and image display means for displaying, on a display, the image of the avatar received by the receiving means when the voice is output by the sound output means,
The information processing apparatus, wherein the ratio is a ratio for enlarging or reducing the image of the avatar corresponding to the operator with respect to a normal size when the voice received by the receiving means is not output.

An information processing apparatus comprising: receiving means for receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice;
sound output means for outputting the voice received by the receiving means to a user-side terminal used by a user who interacts with the operator;
ratio calculation means for calculating, based on the predetermined information received by the receiving means, a ratio for enlarging or reducing an image of an avatar corresponding to the operator relative to a normal size used when the voice received by the receiving means is not being output to the user-side terminal; and image output means for outputting, to the user-side terminal, the image of the avatar drawn at the ratio calculated by the ratio calculation means when the voice is output by the sound output means.

Receiving means for receiving text or voice uttered by an operator;
An output means for outputting the text or the voice received by the receiving means, and an image display means for displaying an image of an avatar corresponding to the operator on a display,
The image display means displays the image of the avatar on the display in a manner that fits within a frame image in a normal time when the text or the voice received by the receiving means is not output, and the output means outputs the text or the voice. An information processing apparatus that displays an image of the avatar on the display in such a manner as to protrude from the frame image when outputting a sound.

A control program executed by an information processing device, the control program causing a processor of the information processing device to execute:
a receiving step of receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice;
a sound output step of outputting the voice received in the receiving step;
a ratio calculation step of calculating, based on the predetermined information received in the receiving step, a ratio for enlarging or reducing an image of an avatar corresponding to the operator relative to a normal size used when the voice received in the receiving step is not output; and an image display step of displaying, on a display, the image of the avatar drawn at the ratio calculated in the ratio calculation step when the voice is output in the sound output step.

A control program executed by an information processing device,
In the processor of the information processing device,
a receiving step of receiving, from the operator-side terminal, the voice uttered by the operator and the ratio calculated based on predetermined information when the operator uttered the voice;
a sound output step of outputting the voice received in the receiving step; and an image display step of displaying, on a display, an image of the avatar drawn at the ratio received in the receiving step when outputting the voice in the sound output step. and
The control program, wherein the ratio is a ratio for enlarging or reducing the image of the avatar corresponding to the operator with respect to a normal size when the voice received in the receiving step is not output.

A control program executed by an information processing device,
In the processor of the information processing device,
a receiving step of receiving the voice uttered by the operator and the image of the avatar drawn at a ratio calculated based on predetermined information when the operator uttered the voice;
a sound output step of outputting the voice received in the receiving step; and an image display step of displaying the image of the avatar received in the receiving step on a display when outputting the voice in the sound output step. ,
The control program, wherein the ratio is a ratio for enlarging or reducing the image of the avatar corresponding to the operator with respect to a normal size when the voice received in the receiving step is not output.

A control program executed by an information processing device, the control program causing a processor of the information processing device to execute:
a receiving step of receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice;
a sound output step of outputting the voice received in the receiving step to a user-side terminal used by a user who interacts with the operator;
a ratio calculation step of calculating, based on the predetermined information received in the receiving step, a ratio for enlarging or reducing an image of an avatar corresponding to the operator relative to a normal size used when the voice received in the receiving step is not being output to the user-side terminal; and an image output step of outputting, to the user-side terminal, the image of the avatar drawn at the ratio calculated in the ratio calculation step when the voice is output in the sound output step.

A control program executed by an information processing device,
In the processor of the information processing device,
a receiving step for receiving the text entered by the operator or the voice spoken by the operator;
executing an output step of outputting the text or the voice received in the receiving step, and an image display step of displaying an image of the avatar corresponding to the operator on a display;
wherein, in the image display step, the image of the avatar is displayed on the display in a manner that fits within a frame image in a normal time when the text or the voice received in the receiving step is not output, and is displayed on the display in a manner that protrudes from the frame image when the text or the voice is output in the output step.

A control method for an information processing device having a display, the method comprising:
(a) receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice;
(b) outputting the voice received in step (a);
(c) calculating, based on the predetermined information received in step (a), a ratio for enlarging or reducing an image of an avatar corresponding to the operator relative to a normal size used when the voice received in step (a) is not output; and (d) displaying, on the display, the image of the avatar drawn at the ratio calculated in step (c) when the voice is output in step (b).

A control method for an information processing device having a display,
(a) a step of receiving from the operator-side terminal a ratio calculated based on the voice uttered by the operator and predetermined information when the operator uttered;
(b) outputting the audio received in step (a); and (c) rendering the avatar at the ratio received in step (a) when outputting the audio in step (b). displaying an image on the display;
The control method, wherein the ratio is a ratio for enlarging or reducing the image of the avatar corresponding to the operator with respect to a normal size when the voice is not output in step (b).

A control method for an information processing device having a display, the method comprising:
(a) receiving a voice uttered by an operator and an image of an avatar drawn at a ratio calculated based on predetermined information obtained when the operator uttered the voice;
(b) outputting the voice received in step (a); and (c) displaying, on the display, the image of the avatar received in step (a) when the voice is output in step (b),
The control method, wherein the ratio is a ratio for enlarging or reducing the image of the avatar corresponding to the operator with respect to a normal size when the sound is not output in step (b).

A control method for an information processing device, the method comprising:
(a) receiving, from an operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice;
(b) outputting the voice received in step (a) to a user-side terminal used by a user who interacts with the operator;
(c) calculating, based on the predetermined information received in step (a), a ratio for enlarging or reducing an image of an avatar corresponding to the operator relative to a normal size used when the voice received in step (a) is not being output to the user-side terminal; and (d) outputting, to the user-side terminal, the image of the avatar drawn at the ratio calculated in step (c) when the voice is output in step (b).

A control method for an information processing device,
(a) receiving operator-inputted text or spoken audio;
(b) outputting the text or the voice received in step (a); and (c) displaying an image of an avatar corresponding to the operator on a display;
The step (c) displays the image of the avatar on the display in such a manner that it fits within a frame image during a normal time when the text or the voice received in the step (a) is not output, and in the step (b) A control method, wherein the image of the avatar is displayed on the display in such a manner as to protrude from the frame image when the text or the voice is output.

An information processing system comprising a server, and a user-side terminal and an operator-side terminal communicably connected to the server, the information processing system comprising:
receiving means for receiving, from the operator-side terminal, a voice uttered by an operator and predetermined information obtained when the operator uttered the voice;
sound output means for outputting the voice received by the receiving means;
ratio calculation means for calculating, based on the predetermined information, a ratio for enlarging or reducing an image of an avatar corresponding to the operator relative to a normal size used when the voice received by the receiving means is not output; and image display means for displaying, on a display of the user-side terminal, the image of the avatar drawn at the ratio calculated by the ratio calculation means when the voice is output by the sound output means.