WO2023073991A1 - Information processing system, server device, information processing method, and program - Google Patents

Information processing system, server device, information processing method, and program Download PDF

Info

Publication number
WO2023073991A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
avatar
information processing
npc
interaction
Prior art date
Application number
PCT/JP2021/040270
Other languages
French (fr)
Japanese (ja)
Inventor
崇裕 松元
誉宗 巻口
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/040270 priority Critical patent/WO2023073991A1/en
Publication of WO2023073991A1 publication Critical patent/WO2023073991A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • One aspect of the present invention relates to an information processing system for human-computer interaction, a server device, an information processing method, and a program used in this system.
  • VR: virtual reality
  • AR: augmented reality
  • MR: mixed reality
  • VR applications are often discussed not only in the context of so-called VR games, but also in connection with the concept of a VR world that provides a new viewing style for broadcast and video content.
  • A high touch (high five) is an example of an interaction based on an emotional expression toward content. Focusing on the high touch, outside the VR space there is a technology that realizes a remote high touch using a device that transmits, by communication, the images and vibrations of two people in distant places. There is also a known technique for achieving a high touch between a person and a telepresence robot remotely controlled by a person (see, for example, Non-Patent Document 1).
  • Non-Patent Document 1 discloses a technology in which avatars high-five each other on the premise that people intervene in real time.
  • However, this technology does not address a high touch between a human-operated avatar and a computer-operated avatar. A technology is therefore required that solves the problem of how a computer-operated avatar should be controlled so that emotional expressions and interactions such as high-five actions can be performed naturally between human-operated avatars and computer-operated avatars.
  • This invention has been made with a focus on the above circumstances, and its purpose is to provide a technology that can naturally convey to people the start of emotional expression and interaction with content.
  • An information processing system according to one aspect of the present invention allows a first character that reflects a person's intentions and a second character different from the first character to interact with each other.
  • This information processing system includes an information processing terminal and a server device capable of communicating with the information processing terminal.
  • the server device includes a determination unit and a requested operation control unit.
  • The determination unit calculates distance information between the first character and the second character and, based on the distance information, determines whether an interaction based on an emotional expression toward the presented content is possible between the first character and the second character.
  • The requested action control unit controls the action of the second character to perform the interaction when it is determined that the interaction is possible.
  • FIG. 1 is a diagram showing the overall configuration of an information processing system according to one embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of data flow in the information processing system according to one embodiment of the present invention.
  • FIG. 3 is a block diagram showing the software configuration of the server device according to one embodiment of the present invention.
  • FIG. 4 is a flow chart showing an example of a processing procedure of the processor 20 relating to high-touch control.
  • FIG. 5 is a diagram for explaining an implementation example of the high touch possible distance determination unit 24.
  • FIG. 6 is a diagram for explaining an implementation example of the high-touch request motion generator 25.
  • FIG. 7 is a flow chart showing an example of a processing procedure by the NPC avatar control section 27.
  • FIG. 8 is a diagram showing an example of pre-embedded data in the content embedding method of the high-touch activity calculation processing unit 28.
  • Emotional expressions and interactions with content are emotional expressions such as fist pumps and smiles.
  • Rather than physical actions that can be performed by a single person, CG avatar, or robot alone, the focus here is on actions such as high fives and hugs that are established only by the body actions of two or more people, CG avatars, or robots.
  • Even if an action, such as a handshake, is established only by the body actions of two or more participants, it is classified as a mere interaction rather than an emotional expression toward the content when it does not generally correspond to an emotional expression.
  • In the embodiment, three elements are used when the NPC agent performs an emotional expression and interaction with respect to the content: (A) the distance information between the human agent and the NPC agent, (B) the eye contact state between the human agent and the NPC agent, and (C) the state of the content being viewed (the activation level of emotional expressions and interactions with respect to the content).
  • FIG. 1 is a diagram showing an example of an information processing system according to an embodiment of the invention.
  • The system shown in FIG. 1 includes a server 10 as a server device and virtual reality (VR) terminals 100-1, 100-2, ... as information processing terminals (hereinafter collectively referred to as the "VR terminals 100"), which can communicate with one another via a network NW.
  • the network NW is composed of, for example, a relay network and a plurality of access networks for accessing this relay network.
  • As the relay network, a public network such as the general Internet, or a closed network controlled so that it can be accessed only from limited devices, is used.
  • As the access network, for example, a wireless LAN (Local Area Network), a mobile phone network, a wired telephone network, an FTTH (Fiber To The Home) network, or a CATV (Cable Television) network is used.
  • the VR terminal 100 is a head-mounted display type information processing terminal, acquires space information representing the real space as 3D information, and allows the user to experience a virtual space constructed based on the acquired 3D information.
  • The server 10 communicates with the VR terminals 100 and generates and manages information for sharing a virtual space among a plurality of VR terminals 100. It is, for example, a server computer operated and managed by an operator that develops and provides games and applications.
  • FIG. 2 is a diagram showing an example of data flow in an information processing system according to one embodiment of the present invention.
  • In the following, a CG avatar operated by a person is referred to as a "human avatar", and a CG avatar operated by a computer is referred to as an "NPC avatar".
  • The VR viewing space 1 is a virtual space in which two or more human-controlled CG avatars (human avatars) and computer-controlled CG avatars (NPC avatars) can view content side by side. Furthermore, the VR space is assumed to be a 3D polygon space in which a moving image is played back.
  • The VR viewing space 1 constantly outputs the entire 3D polygon space information as VR space information to the VR space image presentation unit 21. It also outputs the video playback time t in the VR space to the high-touch activity calculation processing unit 28.
  • The CG avatars (human avatar 2-1, NPC avatar 2-2) are human-shaped CG models controlled by the human avatar control unit 22 or the NPC avatar control unit 27 in the VR viewing space 1. That is, the human avatar 2-1 and the NPC avatar 2-2 each hold anthropomorphic 3D polygon information as well as the current head direction, head position, and hand positions, and are programs that move the head and hands of the 3D polygon within the VR viewing space 1 accordingly.
  • All or part of the head direction, head position, and hand position information held by each avatar is constantly output to the VR space image presentation unit 21, the eye contact action generation/judgment unit 23, the high-touch possible distance determination unit 24, the high-touch request motion generation unit 25, and the high-touch result determination unit 26.
  • The VR display device 101 is a video display device that presents, to the person, the field-of-view video of the human avatar generated by the VR space image presentation unit 21.
  • In other words, the VR display device 101 is a device that outputs the display image output from the VR space image presentation unit 21 to a monitor and presents the image to the person.
  • The head position/direction control device 102 is a device provided in the VR display device 101 and controls the head position and direction of the human avatar 2-1.
  • That is, the head position/direction control device 102 measures the head position and direction of the person in the real world and constantly outputs the measured real-world head position (three dimensions) and direction information (three dimensions) to the human avatar control unit 22.
  • The hand position control device 103 is typically an HMD controller or a body position measuring device such as Kinect, and controls the hand positions of the human avatar 2-1.
  • That is, the hand position control device 103 measures the real-world positions of the person's right and left hands and constantly outputs the measured hand position coordinates (three dimensions) to the human avatar control unit 22.
  • A is the head position/direction information of the CG avatar.
  • B is the head position information of the CG avatar.
  • C is the head position information of the CG avatar.
  • D is hand position information of the CG avatar.
  • E is the head position/direction information of the CG avatar.
  • F is the head position information of the CG avatar.
  • G is the head position information of the CG avatar.
  • H is hand position information of the CG avatar.
  • I is the head direction control information.
  • J is the eye contact correctness determination result.
  • K is propriety determination information (binary).
  • L is hand position information control information.
  • M is high touch completion information.
  • N is control information for the head position/direction and hand position of the CG avatar.
  • O is the estimation result (binary) of the high-touch activity.
  • P is the video playback time t.
  • Q is the VR spatial information.
  • R is the head position/direction information of the CG avatar.
  • S is the display image.
  • T is control information for the head position/direction and hand positions of the CG avatar.
  • U is the real-world position/orientation information of the head.
  • V is the real-world position information of the hand.
  • The VR space image presentation unit 21 uses the entire 3D polygon space input from the VR viewing space 1 and the in-space head position/head direction information input from the human avatar 2-1 to constantly output, as the display image, the image that would be seen if a virtual camera were placed at the head position and oriented in the head direction within the VR viewing space.
  • the human avatar control unit 22 converts the real-world head position, head direction, and hand position coordinates sent from the head position/direction control device 102 and hand position control device 103 into coordinates in the VR viewing space. Then, the converted coordinates are constantly output to the human avatar 2-1.
  • The human avatar control unit 22 is a program that controls the CG of the CG avatars (human avatar 2-1, NPC avatar 2-2) based on the head position, head direction, and both-hand position information given from the head position/direction control device 102 and the hand position control device 103. An algorithm that determines the overall positions of the CG avatar's elbows, shoulders, and so on from the specified head and hand positions can be realized by a general technique. It is assumed that one avatar is controlled by the human avatar control unit 22 and one or more avatars are controlled by the NPC avatar control unit 27.
  • The VR space image presentation unit 21 is a program that generates content, such as the image of the VR viewing space 1 seen from the head position and head direction of a CG avatar (human avatar 2-1, NPC avatar 2-2), and outputs it to the VR display device 101.
  • the NPC avatar control unit 27 is a program that controls the NPC avatar 2-2 by designating the head position, head direction, and position information of both hands.
  • the high-touch activity calculation processing unit 28 is a program that calculates the likelihood of high-touch occurrence (high-touch activity) between the NPC avatar 2-2 and the human avatar 2-1.
  • the eye contact action generation/judgment unit 23 is a program that calculates the head direction of the NPC avatar 2-2 so that the NPC avatar 2-2 makes eye contact. Also, the eye contact behavior generation/determination unit 23 is a program for determining that eye contact has been successful.
  • the high-touchable distance determination unit 24 is a program for determining whether the person and the NPC avatar 2-2 are within a high-touchable distance.
  • the high touch request motion generation unit 25 is a program that controls the positions of both hands of the NPC avatar 2-2 so that the NPC avatar 2-2 generates a motion requesting a high touch.
  • the high touch result determination unit 26 is a program that determines whether or not the high touch was successful.
  • FIG. 3 is a functional block diagram showing an example of the server 10 shown in FIG.
  • the server 10 shown in FIG. 3 includes a processor 20 such as a CPU, a memory 30, and a communication interface 11, for example.
  • the communication interface 11 includes, for example, one or more wired or wireless communication interface units, and communicates with the VR terminal 100 according to the communication protocol defined by the network NW.
  • As the wired interface, for example, a wired LAN (Local Area Network) is used.
  • As the wireless interface, an interface adopting a low-power wireless data communication standard such as wireless LAN or Bluetooth (registered trademark) is used.
  • The memory 30 uses, as storage media, a combination of a non-volatile memory that can be written and read at any time, such as an HDD or SSD, and a volatile memory such as a RAM, and is used to store various data acquired and created in the course of information processing.
  • The processor 20 includes, as processing functions related to the embodiment, the VR space image presentation unit 21, human avatar control unit 22, eye contact action generation/judgment unit 23, high-touch possible distance determination unit 24, high-touch request motion generation unit 25, high-touch result determination unit 26, NPC avatar control unit 27, and high-touch activity calculation processing unit 28 shown in FIG. 2.
  • FIG. 4 is a flowchart showing an example of the processing procedure of the processor 20 related to high-touch control.
  • In FIG. 4, the processor 20 calculates the high-touch activity level h of the VR viewing space and determines whether h is greater than a certain value (step S1). If Yes in step S1, the processor 20 determines whether an NPC avatar exists within the high-touch possible distance of the human avatar (step S2).
  • If Yes in step S2, the processor 20 performs eye contact control on the head of the NPC avatar (step S3).
  • Following step S3, the processor 20 determines whether the eye contact determination result becomes True while the high-touch activity level h is greater than the predetermined determination value T (step S4). If Yes in step S4, the processor 20 causes the NPC avatar to perform a high-touch request action (step S5). Next, the processor 20 determines whether the high-touch determination becomes True within a certain period of time t (step S6). If Yes in step S6, the processor 20 performs high-touch effects (control of the NPC avatar, vibration of the user controller operating the human avatar, a touch sound, and so on) and ends the high-touch control (step S7).
  • Next, an implementation example of the high-touch activity calculation processing unit 28 will be described. As methods of calculating the high-touch activity level h at the playback time t of the content in the VR viewing space 1, for example, the following three methods can be cited: a content embedding method, a content reading method, and a content viewer behavior measurement method.
  • The high-touch activity calculation processing unit 28 defines a threshold T for the high-touch activity, and determines the state as active when the high-touch activity h computed by each method satisfies h > T, and as inactive otherwise.
  • Next, an implementation example of the eye contact action generation/judgment unit 23 will be described. As one way of realizing the eye contact action generation/judgment unit 23, an example is shown of a method of generating eye contact behavior from the head position/direction of the human avatar, determined by the head position/direction control device 102 and the hand position control device 103, and from the head position/direction of the NPC avatar, together with a method of determining that eye contact has succeeded.
  • Here, the generation of eye contact behavior is defined as control that adjusts the head direction of the NPC avatar, based on some processing, so that the head directions (face-front directions) or line-of-sight directions of the human avatar and the NPC avatar face each other.
  • The determination of successful eye contact is defined as control that returns the success or failure of eye contact as a binary value based on the head directions or line-of-sight directions of the human avatar and the NPC avatar.
  • FIG. 5 is a diagram for explaining an implementation example of the high touch possible distance determination unit 24.
  • For the human avatar, let the unit vector of the current head direction (the front direction of the face) be (Ex_u, Ey_u, Ez_u), and let the unit vector from the center of the human avatar's head toward the center of the NPC avatar's head be (Ex'_u, Ey'_u, Ez'_u). The angle formed by the two vectors (Ex_u, Ey_u, Ez_u) and (Ex'_u, Ey'_u, Ez'_u) is defined as θ_u.
  • FIG. 6 is a diagram for explaining an implementation example of the high-touch request motion generation unit 25.
  • Let A be the unit vector from (X_n, Y_n, 0), the head center coordinates of the NPC avatar with the height component set to 0, toward (X_u, Y_u, 0), the head center coordinates of the human avatar with the height component set to 0. The unit vector Ar is obtained by rotating A by θ_r in the right-hand horizontal direction as viewed from the NPC avatar, and the unit vector Al by rotating A by θ_l in the left-hand horizontal direction. Ar and Al are then multiplied by a predefined fixed length R to obtain R·Ar and R·Al.
  • For the NPC avatar, let the unit vector of the current head direction be (Ex_n, Ey_n, Ez_n), and let the unit vector from the center of the NPC avatar's head toward the center of the human avatar's head be (Ex'_n, Ey'_n, Ez'_n). The angle formed by (Ex_n, Ey_n, Ez_n) and (Ex'_n, Ey'_n, Ez'_n) is defined as θ_n. The head direction of the NPC avatar is then controlled so that (Ex''_n, Ey''_n, Ez''_n) becomes the new head direction vector at the head center coordinates of the NPC avatar.
  • In this way, an eye contact behavior in which the NPC avatar turns its head back toward the user avatar can also be implemented.
  • Eye contact is determined to be successful when θ''_n in the eye contact behavior generation method remains smaller than a predefined angle θ_eye_contact for longer than a specified time t_eye_contact.
  • Next, an implementation example of the high-touch possible distance determination unit 24 will be described.
  • An example is shown of a method of determining, from the head position of the human avatar and the head position of the NPC avatar, whether the two are within a distance at which a high touch is possible.
  • Let the human avatar's head center coordinates be (X_u, Y_u, Z_u) and the NPC avatar's head center coordinates be (X_n, Y_n, Z_n); the distance D between the two coordinates is given by formula (1).
  • More generally, the high-touch possible distance determination unit 24 is implemented as a program that returns binary information indicating whether or not the two avatars are within a high-touch possible distance, based on some kind of positional information about the human avatar and some kind of positional information about the NPC avatar.
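  • As a minimal illustration of this determination (not code from the patent), the Python sketch below computes D as the Euclidean distance between the two head centers, which is presumably what formula (1) expresses, and compares it against an assumed threshold:

```python
import math

def within_high_touch_distance(human_head, npc_head, d_touch=1.2):
    """Return True if the head-to-head distance D is within the high-touch
    range. D is computed as the Euclidean distance between the two head
    center coordinates; d_touch (VR-space units) is an assumed threshold."""
    d = math.dist(human_head, npc_head)
    return d <= d_touch

# Hypothetical example: heads roughly 0.8 units apart at the same height.
print(within_high_touch_distance((0.0, 0.0, 1.6), (0.8, 0.2, 1.6)))  # True
```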
  • The high-touch request motion generation unit 25 generates the high-touch request motion of the NPC avatar from the head position and hand positions of the human avatar and the head position and hand positions of the NPC avatar.
  • The high-touch request motion generation unit 25 is defined as outputting the coordinates of both hands of the NPC avatar, and the NPC avatar control unit 27 generates the motion of the NPC avatar according to these output hand coordinates, so that the NPC avatar performs the high-touch request action.
  • The processing of the NPC avatar control unit 27 moves the center coordinates of the NPC avatar's current right and left hands toward TR and TL obtained above. As a result, both hands of the NPC avatar are stretched out toward the human avatar and held still at a certain height, so that a motion requesting a high touch can be generated and executed.
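  • A rough Python sketch of how the hand target coordinates TR and TL could be derived from the quantities above follows; the way R·Ar and R·Al are combined with the NPC avatar's position, and the height at which the hands are held, are not fully specified here, so those details are assumptions:

```python
import math

def hand_targets(npc_head, human_head, theta_r=math.radians(30.0),
                 theta_l=math.radians(30.0), reach=0.6, hand_height=1.4):
    """Sketch of the high-touch request hand targets TR and TL.

    A is the horizontal unit vector from the NPC head toward the human head
    (height components set to 0); Ar and Al are A rotated by theta_r to the
    right and theta_l to the left and scaled by the fixed length R (reach).
    Adding the scaled vectors to the NPC position at an assumed hand_height
    is one plausible way of forming TR and TL.
    """
    ax, ay = human_head[0] - npc_head[0], human_head[1] - npc_head[1]
    norm = math.hypot(ax, ay) or 1.0
    ax, ay = ax / norm, ay / norm                       # unit vector A

    def rotate(x, y, ang):                              # rotation in the horizontal plane
        return (x * math.cos(ang) - y * math.sin(ang),
                x * math.sin(ang) + y * math.cos(ang))

    arx, ary = rotate(ax, ay, -theta_r)                 # toward one side of the NPC
    alx, aly = rotate(ax, ay, theta_l)                  # toward the other side
    tr = (npc_head[0] + reach * arx, npc_head[1] + reach * ary, hand_height)
    tl = (npc_head[0] + reach * alx, npc_head[1] + reach * aly, hand_height)
    return tr, tl

# Hypothetical example: NPC at the origin, human avatar one unit in front.
print(hand_targets((0.0, 0.0, 1.6), (0.0, 1.0, 1.6)))
```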
  • the high touch result determination unit 26 is defined as outputting a binary value indicating whether or not the high touch was successful between the human avatar and the NPC avatar.
  • FIG. 7 is a flowchart showing an example of the processing procedure by the NPC avatar control unit 27.
  • The NPC avatar control unit 27 determines whether the high-touch activity input from the high-touch activity calculation processing unit 28 is active (step S11). If Yes in step S11, the NPC avatar control unit 27 determines whether the determination result from the high-touch possible distance determination unit 24 indicating whether the avatars are within the high-touch possible distance is True (step S12).
  • If Yes in step S12, the NPC avatar control unit 27 performs eye contact control, that is, it controls the head direction (gaze direction) of the NPC avatar toward the human avatar 2-1 based on the head direction data of the NPC avatar output from the eye contact action generation/judgment unit 23 (step S13).
  • Next, the NPC avatar control unit 27 confirms that the high-touch activity input from the high-touch activity calculation processing unit 28 is in the active state (True) and that the eye contact determination result from the eye contact action generation/judgment unit 23 is True (step S14). If Yes in step S14, the NPC avatar control unit 27 receives the position coordinate information of both hands of the NPC avatar from the high-touch request motion generation unit 25. Then, the NPC avatar control unit 27 gradually moves the coordinates of both hands of the NPC avatar 2-2 toward the received coordinates, stops after reaching them, and executes the high-touch request action (step S15).
  • The NPC avatar control unit 27 then determines whether the high-touch determination result input from the high-touch result determination unit 26 becomes True within a certain period of time t after the high-touch request action is completed (step S16). If Yes in step S16, the NPC avatar control unit 27 ends the high-touch control (step S17).
  • FIG. 8 is a diagram showing an example of pre-embedded data for the content embedding method of the high-touch activity calculation processing unit 28. For the high-touch activity calculation processing unit 28, an example based on the content embedding method is shown.
  • The high-touch activity calculation processing unit 28 constantly receives the video playback time t from the VR viewing space 1. It is also assumed that the high-touch activity calculation processing unit 28 holds pre-embedded data that uniquely determines the activity level h at each video time t, as shown in FIG. 8. The high-touch activity calculation processing unit 28 then outputs True to the NPC avatar control unit 27 if the activity h at the current video time t is 1, and False if it is -1.
  • As described above regarding the method of generating eye contact behavior, the eye contact action generation/judgment unit 23 outputs, to the NPC avatar control unit 27, the head direction information of the NPC avatar to be used for control, based on the head position/direction information of the human avatar and the head position/direction information of the NPC avatar.
  • At the timing of outputting this information to the NPC avatar control unit 27, the eye contact action generation/judgment unit 23 also performs the eye contact success/failure judgment described above and outputs the judgment result (True/False) to the NPC avatar control unit 27.
  • The high-touch possible distance determination unit 24 constantly refers to the head position information of the human avatar and of the NPC avatar, constantly determines whether they are within the high-touch possible distance, and outputs the determination result (True/False) to the NPC avatar control unit 27.
  • The high-touch request motion generation unit 25 constantly refers to the head position information of the human avatar and of the NPC avatar, and constantly outputs to the NPC avatar control unit 27 control information for the both-hand positions corresponding to the positions of the NPC avatar's hands at the time of the high-touch request.
  • The high-touch result determination unit 26 constantly refers to the hand position information of the human avatar and of the NPC avatar, and constantly outputs to the NPC avatar control unit 27 the judgment result (True/False) as to whether a high touch has been performed.
  • As described above, the distance between the viewer's avatar and the computer's avatar, the eye contact state, and the likelihood of a high touch occurring are calculated while the content is being viewed, and when these values satisfy certain conditions, the viewer's avatar and the computer's avatar are controlled so that they high-touch each other. This allows a high-five action between a human avatar and an NPC avatar without an explicit presentation action to initiate the high five.
  • As a method for the NPC agent to urge the human agent to perform an emotional expression and interaction with respect to the content, such as a high five, it is conceivable for the NPC agent to present an utterance or text such as "Let's give a high five" or a CG icon associated with high fives.
  • In contrast, it is desirable for the NPC agent to be able to naturally initiate emotional expressions and interactions with respect to the content, such as high fives, using only non-verbal body movements. If the NPC agent cannot naturally convey the intention to start a high five through non-verbal body movements, not only may the person ignore the start of the high five, but the person may also find the NPC agent's movement unnatural and uncomfortable while viewing the content together with the NPC agent.
  • According to the embodiment, the NPC agent can naturally convey to the person, using only non-verbal body movements, the start of an emotional expression and interaction with respect to the content, such as a high touch, and can thus initiate the emotional expression and interaction with the content.
  • the present invention is not limited to the above embodiments as they are.
  • the human avatar 2-1 and the NPC avatar 2-2 can interact not only in a virtual space implemented by a computer (server 10), but also in real space.
  • the human avatar 2-1 and the NPC avatar 2-2 can also interact in an environment in which augmented reality is superimposed on the real space.
  • the human avatar 2-1 may be a person itself, and the NPC avatar 2-2 may be a real-world robot or the like.
  • the human avatar 2-1 is an example of a first character that reflects human intentions
  • the NPC avatar 2-2 is an example of a second character different from the first character.
  • the present invention can be embodied by modifying the constituent elements without departing from the gist of the present invention.
  • various inventions can be formed by appropriate combinations of the plurality of constituent elements disclosed in the above embodiments. For example, some components may be omitted from all components shown in the embodiments.
  • constituent elements of different embodiments may be combined as appropriate.
  • Reference Signs List 1 VR viewing space 2-1 Human avatar 2-2 NPC avatar 10 Server 11 Communication interface 20 Processor 21 VR space video presentation unit 22 Human avatar control unit 23 Eye contact action generation/judgment unit 24 High touch possible distance determination unit 25 High touch request motion generation unit 26 High touch result determination unit 27 NPC avatar control unit 28 High touch activity calculation processing unit 30 Memory 100 VR terminal 100-1 Virtual reality terminal 100-2 Virtual reality terminal 101 VR display device 102 Head position/direction control device 103 Hand position control device.

Abstract

The information processing system according to one aspect of the present invention causes a first avatar operated by a person and a second avatar operated by a computer to interact with each other. This information processing system comprises an information processing terminal and a server device capable of communicating with the information processing terminal. The server device comprises a determination unit and a requested operation control unit. The determination unit calculates information on a distance between a first character and a second character, and determines, on the basis of the distance information, a possibility of an interactive operation based on an emotional expression between the first character and the second character for presented content. When it is determined that it is possible to perform the interactive operation, the requested operation control unit controls the operation of the second character to execute the interactive operation.

Description

Information processing system, server device, information processing method, and program

One aspect of the present invention relates to an information processing system for human-computer interaction, and to a server device, an information processing method, and a program used in this system.
Technologies related to virtual reality (VR), augmented reality (AR), and mixed reality (MR), which integrates the two, are known. There are immersive VR applications that use a head-mounted display (HMD) to make various characters appear in a virtual space for the user to enjoy.

A person, or an avatar operated by a person, is an example of a character that reflects a person's intentions. Another person, an avatar operated by that person, or an avatar operated by a computer is an example of a different character. In recent years, attention has been paid to techniques for allowing a plurality of such characters to interact using a computer.

VR applications are often discussed not only in the context of so-called VR games, but also in connection with the concept of a VR world that provides a new viewing style for broadcast and video content. In this type of technology, it is common to acquire position and direction information of a person's head and hands with a controller that tracks hand positions, a gyro sensor of an HMD, and the like, and to control a CG avatar in the VR space accordingly. For CG avatars controlled by people, content viewing in a VR space in which people sequentially control the non-verbal actions of their CG avatars and naturally initiate and perform high touches has already been realized.
A high touch is an example of an interaction based on an emotional expression toward content. Focusing on the high touch, outside the VR space there is a technology that realizes a remote high touch using a device that transmits, by communication, the images and vibrations of two people in distant places. There is also a known technique for achieving a high touch between a person and a telepresence robot remotely controlled by a person (see, for example, Non-Patent Document 1).

The existing technologies achieve a high touch between two parties who are not in the same space by means of CG avatars, video/vibration transmission devices, and telepresence robots. However, the control of the CG avatars and telepresence robots, and of the images displayed on the video/vibration transmission devices, is premised on a person intervening in real time on both sides. For this reason, when a human agent and an NPC agent are to interact through natural non-verbal actions in a VR space, there is the problem of how the NPC agent should be controlled when the NPC agent prompts the start of an emotional expression toward the content.

Non-Patent Document 1 discloses a technology in which avatars high-five each other on the premise that people intervene in real time. However, this technology does not address a high touch between a human-operated avatar and a computer-operated avatar. A technology is therefore required that solves the problem of how a computer-operated avatar should be controlled so that emotional expressions and interactions such as high-five actions can be performed naturally between human-operated avatars and computer-operated avatars.

This invention has been made in view of the above circumstances, and its purpose is to provide a technology that can naturally convey to a person the start of an emotional expression and interaction with respect to content.

An information processing system according to one aspect of the present invention allows a first character that reflects a person's intentions and a second character different from the first character to interact with each other. This information processing system includes an information processing terminal and a server device capable of communicating with the information processing terminal. The server device includes a determination unit and a requested action control unit. The determination unit calculates distance information between the first character and the second character and, based on the distance information, determines whether an interaction based on an emotional expression toward the presented content is possible between the first character and the second character. The requested action control unit controls the action of the second character to perform the interaction when it is determined that the interaction is possible.

According to one aspect of the present invention, it is possible to provide a technology capable of naturally conveying to a person the start of an emotional expression and interaction with respect to content.
FIG. 1 is a diagram showing the overall configuration of an information processing system according to one embodiment of the present invention.
FIG. 2 is a diagram showing an example of data flow in the information processing system according to one embodiment of the present invention.
FIG. 3 is a block diagram showing the software configuration of the server device according to one embodiment of the present invention.
FIG. 4 is a flowchart showing an example of a processing procedure of the processor 20 relating to high-touch control.
FIG. 5 is a diagram for explaining an implementation example of the high-touch possible distance determination unit 24.
FIG. 6 is a diagram for explaining an implementation example of the high-touch request motion generation unit 25.
FIG. 7 is a flowchart showing an example of a processing procedure by the NPC avatar control unit 27.
FIG. 8 is a diagram showing an example of pre-embedded data in the content embedding method of the high-touch activity calculation processing unit 28.
Embodiments of the present invention will be described below with reference to the drawings. The embodiment discloses a control technique for starting an emotional expression and interaction with respect to content, such as a high touch or a hug, between a person or a human-operated CG avatar or robot (human agent) and a computer-operated CG avatar or robot (NPC agent) when viewing content such as sports videos or live music videos, in which the NPC agent conveys the intention to start the emotional expression and interaction to the human agent using only non-verbal body movements, without an explicit presentation of the intention to start a high touch by means of speech, text, icons, or the like.

Here, a technique is described that encourages natural emotional expressions and interactions between an avatar operated by a viewer and an avatar operated by a computer in an environment where content such as a sports broadcast or a live music video is viewed by multiple people in a VR space. More specifically, an operation method is described for naturally realizing a high touch between a human-operated CG avatar and a computer-operated CG avatar in a VR space that involves video content viewing.

Emotional expressions and interactions with respect to content include emotional expressions such as fist pumps and smiles. Here, rather than physical actions that can be performed by a single person, CG avatar, or robot alone, the focus is on actions such as high fives and hugs that are established only by the body actions of two or more people, CG avatars, or robots. In addition, even an action that is established only by the body actions of two or more participants, such as a handshake, is classified as a mere interaction if it does not generally correspond to an emotional expression.

In the embodiment, three elements (A), (B), and (C) are used when the NPC agent performs an emotional expression and interaction with respect to the content.
(A) Distance information between the human agent and the NPC agent
(B) Eye contact state between the human agent and the NPC agent
(C) State of the content being viewed (the activation level of emotional expressions and interactions with respect to the content)

FIG. 1 is a diagram showing an example of an information processing system according to an embodiment of the invention. The system shown in FIG. 1 includes a server 10 as a server device and virtual reality (VR) terminals 100-1, 100-2, ... as information processing terminals (hereinafter collectively referred to as the "VR terminals 100"), which can communicate with one another via a network NW.
The network NW is composed of, for example, a relay network and a plurality of access networks for accessing the relay network. As the relay network, a public network such as the general Internet, or a closed network controlled so that it can be accessed only from limited devices, is used. As the access network, for example, a wireless LAN (Local Area Network), a mobile phone network, a wired telephone network, an FTTH (Fiber To The Home) network, or a CATV (Cable Television) network is used.

The VR terminal 100 is a head-mounted display type information processing terminal that acquires spatial information representing the real space as 3D information and allows the user to experience a virtual space constructed based on the acquired 3D information.

The server 10 communicates with the VR terminals 100 and generates and manages information for sharing a virtual space among a plurality of VR terminals 100. It is, for example, a server computer operated and managed by an operator that develops and provides games and applications.

FIG. 2 is a diagram showing an example of data flow in the information processing system according to one embodiment of the present invention. A case of a high touch between a CG avatar operated by a person (hereinafter, a human avatar) and a CG avatar operated by a computer (hereinafter, an NPC avatar) in the VR space will be described.

In FIG. 2, the VR viewing space 1 is a virtual space in which two or more human-controlled CG avatars (human avatars) and computer-controlled CG avatars (NPC avatars) can view content side by side. Furthermore, the VR space is assumed to be a 3D polygon space in which a moving image is played back. The VR viewing space 1 constantly outputs the entire 3D polygon space information as VR space information to the VR space image presentation unit 21. It also outputs the video playback time t in the VR space to the high-touch activity calculation processing unit 28.

The CG avatars (human avatar 2-1, NPC avatar 2-2) are human-shaped CG models controlled by the human avatar control unit 22 or the NPC avatar control unit 27 in the VR viewing space 1. That is, the human avatar 2-1 and the NPC avatar 2-2 each hold anthropomorphic 3D polygon information as well as the current head direction, head position, and hand positions, and are programs that move the head and hands of the 3D polygon within the VR viewing space 1 accordingly. All or part of the head direction, head position, and hand position information held by each avatar is constantly output to the VR space image presentation unit 21, the eye contact action generation/judgment unit 23, the high-touch possible distance determination unit 24, the high-touch request motion generation unit 25, and the high-touch result determination unit 26.
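As a rough illustration of the per-avatar state described above (this sketch is not taken from the patent; the class and field names are assumptions), each CG avatar can be thought of as holding its head position, head direction, and both hand positions, which the other processing units continuously read:

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class AvatarState:
    """Hypothetical container for the state each CG avatar holds."""
    head_position: Vec3 = (0.0, 0.0, 0.0)    # head center in the VR viewing space
    head_direction: Vec3 = (0.0, 1.0, 0.0)   # unit vector, face-front direction
    right_hand_position: Vec3 = (0.0, 0.0, 0.0)
    left_hand_position: Vec3 = (0.0, 0.0, 0.0)

human_avatar = AvatarState()   # written by the human avatar control unit 22
npc_avatar = AvatarState()     # written by the NPC avatar control unit 27
```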
In FIG. 2, the VR display device 101 is a video display device that presents, to the person, the field-of-view video of the human avatar generated by the VR space image presentation unit 21. In other words, the VR display device 101 is a device that outputs the display image output from the VR space image presentation unit 21 to a monitor and presents the image to the person.

The head position/direction control device 102 is a device provided in the VR display device 101 and controls the head position and direction of the human avatar 2-1. That is, the head position/direction control device 102 measures the head position and direction of the person in the real world and constantly outputs the measured real-world head position (three dimensions) and direction information (three dimensions) to the human avatar control unit 22.

The hand position control device 103 is typically an HMD controller or a body position measuring device such as Kinect, and controls the hand positions of the human avatar 2-1. That is, the hand position control device 103 measures the real-world positions of the person's right and left hands and constantly outputs the measured hand position coordinates (three dimensions) to the human avatar control unit 22.
In FIG. 2, A is the head position/direction information of the CG avatar, B and C are the head position information of the CG avatar, and D is the hand position information of the CG avatar. These are output from the human avatar 2-1.

E is the head position/direction information of the CG avatar, F and G are the head position information of the CG avatar, and H is the hand position information of the CG avatar. These are output from the NPC avatar 2-2.

I is the head direction control information. J is the eye contact correctness determination result. K is the propriety determination information (binary). L is the hand position control information. M is the high-touch completion information.

N is the control information for the head position/direction and hand positions of the CG avatar. O is the estimation result (binary) of the high-touch activity. P is the video playback time t. Q is the VR space information. R is the head position/direction information of the CG avatar.

S is the display image. T is the control information for the head position/direction and hand positions of the CG avatar. U is the real-world position/direction information of the head. V is the real-world position information of the hands.
The VR space image presentation unit 21 uses the entire 3D polygon space input from the VR viewing space 1 and the in-space head position/head direction information of the human avatar input from the human avatar 2-1 to constantly output, as the display image, the image that would be seen if a virtual camera were placed at the head position and oriented in the head direction within the VR viewing space.

The human avatar control unit 22 converts the real-world head position, head direction, and hand position coordinates sent from the head position/direction control device 102 and the hand position control device 103 into coordinates in the VR viewing space, and constantly outputs the converted coordinates to the human avatar 2-1.

The human avatar control unit 22 is a program that controls the CG of the CG avatars (human avatar 2-1, NPC avatar 2-2) based on the head position, head direction, and both-hand position information given from the head position/direction control device 102 and the hand position control device 103. An algorithm that determines the overall positions of the CG avatar's elbows, shoulders, and so on from the specified head and hand positions can be realized by a general technique. It is assumed that one avatar is controlled by the human avatar control unit 22 and one or more avatars are controlled by the NPC avatar control unit 27.

The VR space image presentation unit 21 is a program that generates content, such as the image of the VR viewing space 1 seen from the head position and head direction of a CG avatar (human avatar 2-1, NPC avatar 2-2), and outputs it to the VR display device 101.
The NPC avatar control unit 27 is a program that controls the NPC avatar 2-2 by specifying the head position, head direction, and position information of both hands.

The high-touch activity calculation processing unit 28 is a program that calculates the likelihood of a high touch occurring (the high-touch activity) between the NPC avatar 2-2 and the human avatar 2-1.

The eye contact action generation/judgment unit 23 is a program that calculates the head direction of the NPC avatar 2-2 so that the NPC avatar 2-2 makes eye contact, and that determines whether eye contact has succeeded.

The high-touch possible distance determination unit 24 is a program that determines whether the person and the NPC avatar 2-2 are within a distance at which a high touch is possible.

The high-touch request motion generation unit 25 is a program that controls the positions of both hands of the NPC avatar 2-2 so that the NPC avatar 2-2 generates a motion requesting a high touch.

The high-touch result determination unit 26 is a program that determines whether or not the high touch has succeeded.
FIG. 3 is a functional block diagram showing an example of the server 10 shown in FIG. 1. The server 10 shown in FIG. 3 includes, for example, a processor 20 such as a CPU, a memory 30, and a communication interface 11.

The communication interface 11 includes, for example, one or more wired or wireless communication interface units and communicates with the VR terminals 100 according to the communication protocol defined for the network NW. As the wired interface, for example, a wired LAN (Local Area Network) is used. As the wireless interface, an interface adopting a low-power wireless data communication standard such as wireless LAN or Bluetooth (registered trademark) is used.

The memory 30 uses, as storage media, a combination of a non-volatile memory that can be written and read at any time, such as an HDD or SSD, and a volatile memory such as a RAM, and is used to store various data acquired and created in the course of information processing.

The processor 20 includes, as processing functions related to the embodiment, the VR space image presentation unit 21, human avatar control unit 22, eye contact action generation/judgment unit 23, high-touch possible distance determination unit 24, high-touch request motion generation unit 25, high-touch result determination unit 26, NPC avatar control unit 27, and high-touch activity calculation processing unit 28 shown in FIG. 2.
 図4は、ハイタッチ制御に係わるプロセッサ20の処理手順の一例を示すフローチャートである。図4において、プロセッサ20は、VR視聴空間のハイタッチ活性度hを計算し、ハイタッチ活性度hが一定値よりも大きいかを判定する(ステップS1)。ステップS1でYesであれば、プロセッサ20は、人アバターのハイタッチ可能距離内にNPCアバターが存在するか否かを判定する(ステップS2)。 FIG. 4 is a flowchart showing an example of the processing procedure of the processor 20 related to high-touch control. In FIG. 4, the processor 20 calculates the high-touch activity level h of the VR viewing space and determines whether the high-touch activity level h is greater than a certain value (step S1). If Yes in step S1, the processor 20 determines whether or not the NPC avatar exists within the high touch possible distance of the human avatar (step S2).
 ステップS2でYesであれば、プロセッサ20は、NPCアバターの頭部に対してアイコンタクト制御を実施する(ステップS3)。次に、プロセッサ20は、ハイタッチ活性度hが既定の判定値Tより大きい間にアイコンタクトの判定結果がTrueになったかを判定する(ステップS4)。ステップS4であれば、プロセッサ20は、NPCアバターに対してハイタッチ要求動作を実行させる(ステップS5)。次に、プロセッサ20は、一定時間tの間にハイタッチ判定が真(True)になったか否かを判定する(ステップS6)。ステップS6でYesであれば、プロセッサ20は、ハイタッチ時の演出(NPCアバターの制御・人アバターの操作ユーザコントローラの振動、タッチ音など)を行って、ハイタッチ制御を終了する(ステップS7)。 If Yes in step S2, the processor 20 performs eye contact control on the head of the NPC avatar (step S3). Next, the processor 20 determines whether the eye contact determination result is True while the high-touch activation level h is greater than the predetermined determination value T (step S4). If it is step S4, the processor 20 causes the NPC avatar to perform a high touch request action (step S5). Next, the processor 20 determines whether or not the high touch determination has become true for a certain period of time t (step S6). If Yes in step S6, the processor 20 performs high-touch effects (control of NPC avatars, operation of human avatars, vibration of user controllers, touch sounds, etc.), and ends high-touch control (step S7).
 By implementing the flowchart of FIG. 4, a natural high touch initiated by the NPC avatar through non-verbal motion can be started.
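 By way of a non-limiting illustration, the following Python sketch traces steps S1 to S7 of FIG. 4. All behaviour is injected as callables; the parameter names (compute_activity, within_touch_distance, and so on) are placeholders introduced here for explanation and do not appear in the embodiment.

import time

def high_touch_control(compute_activity, within_touch_distance, do_eye_contact,
                       eye_contact_ok, request_high_touch, hands_touched,
                       play_effects, T, t_limit):
    """Sketch of the FIG. 4 procedure; every behaviour is passed in as a callable."""
    # Step S1: is the high-touch activity level h above the threshold value T?
    if compute_activity() <= T:
        return False
    # Step S2: is an NPC avatar within the human avatar's high-touch possible distance?
    if not within_touch_distance():
        return False
    # Step S3: eye contact control of the NPC avatar's head.
    do_eye_contact()
    # Step S4: eye contact must become True while h is still greater than T.
    if not (compute_activity() > T and eye_contact_ok()):
        return False
    # Step S5: have the NPC avatar perform the high-touch request motion.
    request_high_touch()
    # Step S6: wait at most t_limit seconds for the high-touch determination.
    deadline = time.time() + t_limit
    while time.time() < deadline:
        if hands_touched():
            # Step S7: effects at the high touch (NPC avatar control, controller
            # vibration, touch sound), then end the high-touch control.
            play_effects()
            return True
    return False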
 Next, an implementation example of the high-touch activity calculation processing unit 28 will be described. As methods of calculating the high-touch activity level h at the playback time t of the content in the VR viewing space 1, the following three methods can be cited, for example.
 [Content embedding method]
 This is a method in which the likelihood of a high touch occurring is described in advance as a high-touch activity level h for each playback time of content such as a video played in the VR viewing space 1. For example, for a sports video, the activity level can be set to h = 1 for Δt seconds from the time a score is made; for a live music performance, h = 1 for Δt seconds after a song ends; and h = 0 for all other time periods.
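 A minimal sketch of this method is shown below, assuming that the pre-embedded data is held as a list of (start, end) time intervals during which the activity level is 1; this data format is an assumption made here only for illustration.

def embedded_activity(t, active_intervals):
    """Content embedding method: h is 1 while t falls inside a pre-described
    interval (e.g. Δt seconds after a score or after a song ends), else 0."""
    return 1.0 if any(start <= t < end for start, end in active_intervals) else 0.0

# Example: a score at 125 s and a song ending at 300 s, each active for Δt = 10 s.
intervals = [(125.0, 135.0), (300.0, 310.0)]
print(embedded_activity(128.0, intervals))  # 1.0
print(embedded_activity(200.0, intervals))  # 0.0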
 [Content reading method]
 This is a method of algorithmically processing content data such as a video played in the VR viewing space and calculating the high-touch activity level h at each time. For sports and the like, the loudness of the audience audio is useful for determining the likelihood of a high touch, so the activity level at video content time t can be defined as h = α*SP_t, where SP_t is the sound pressure of the audience audio channel of the video content at time t and α is a coefficient.
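 The following sketch illustrates one possible reading of this method; the approximation of the sound pressure SP_t by the RMS amplitude of the audience audio channel over a short window starting at time t is an assumption made here for illustration.

import numpy as np

def reading_activity(audience_audio, sample_rate, t, alpha, window=1.0):
    """Content reading method: h = α * SP_t, with SP_t taken here as the RMS
    sound pressure of the audience audio channel around time t."""
    start = int(t * sample_rate)
    end = int((t + window) * sample_rate)
    frame = np.asarray(audience_audio, dtype=float)[start:end]
    sp_t = float(np.sqrt(np.mean(np.square(frame)))) if frame.size else 0.0
    return alpha * sp_t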
 [Content viewer behavior measurement method]
 When there are other people or human agents who are currently viewing the content or have viewed the same content in the past, this method uses the number of high touches between human agents at content time t, treating a larger number of high touches as a higher high-touch activity level. For example, using the program of the high-touch result determination unit 26, if N_touch_t is the number of high touches between human CG avatars that occurred in the VR viewing space 1 from content time t to t + Δt, the activity level can be defined as h = β*N_touch_t, where β is a coefficient.
 The high-touch activity calculation processing unit 28 defines a threshold T for the high-touch activity level and can determine the state as active when the high-touch activity level h obtained by any of these methods satisfies h > T, and as inactive otherwise.
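 A sketch of the viewer behavior measurement method and of this threshold check follows; representing past high touches as a list of event timestamps is an assumption introduced here for illustration.

def viewer_activity(high_touch_times, t, delta_t, beta):
    """Content viewer behavior measurement method: h = β * N_touch_t, where
    N_touch_t counts high touches between human CG avatars in [t, t + Δt)."""
    n_touch_t = sum(1 for event_time in high_touch_times if t <= event_time < t + delta_t)
    return beta * n_touch_t

def is_active(h, threshold):
    """Threshold check of the high-touch activity calculation processing unit 28."""
    return h > threshold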
 An implementation example of the eye contact behavior generation/determination unit 23 will be described. As one way of realizing the eye contact behavior generation/determination unit 23, an example is shown of a method of generating eye contact behavior from the head position/direction control device 102 and the hand position control device 103, which determine the head position/direction of the human avatar, and from the head position/direction of the NPC avatar, together with a method of determining that eye contact has succeeded. Here, eye contact behavior generation is defined as control that adjusts the head direction of the NPC avatar based on some processing so that the head directions (face front directions) or gaze directions of the human avatar and the NPC avatar face each other.
 Control for determining that eye contact has succeeded is defined as control that returns the success or failure of eye contact as a binary value based on the head directions or gaze directions of the human avatar and the NPC avatar.
 (Eye contact behavior generation method)
 FIG. 5 is a diagram for explaining an implementation example of the high-touch possible distance determination unit 24. As shown in FIG. 5, in a coordinate system having the center of the human avatar's head as its origin, let (Ex_u, Ey_u, Ez_u) be the unit vector of the current head direction (the front direction of the face) and (Ex'_u, Ey'_u, Ez'_u) be the unit vector from the center of the human avatar's head toward the center of the NPC avatar's head. The angle formed by the two vectors (Ex_u, Ey_u, Ez_u) and (Ex'_u, Ey'_u, Ez'_u) is defined as θu.
 FIG. 6 is a diagram for explaining an implementation example of the high-touch request motion generation unit 25. In FIG. 6, the unit vector A is calculated from (X_n, Y_n, 0), the head center coordinates of the NPC avatar with the height component set to 0, toward (X_u, Y_u, 0), the head center coordinates of the human avatar with the height component set to 0.
 Next, let Ar be the unit vector obtained by rotating the unit direction vector A horizontally by θr to the right as viewed from the NPC avatar, and Al be the unit vector obtained by rotating it horizontally by θl to the left. RAr and RAl are calculated by multiplying these vectors by a fixed length R defined in advance according to the size of the CG avatar and the like. Let RAr = (X_RAr, Y_RAr, 0) and RAl = (X_RAl, Y_RAl, 0).
 Next, RAr and RAl are added to (X_n, Y_n, 0), the center coordinates of the NPC avatar with the height component set to 0, and the height component is further set to a fixed value z. The resulting coordinates are taken as the right-hand target coordinates TR and the left-hand target coordinates TL, respectively: TR = (X_n + X_RAr, Y_n + Y_RAr, z) and TL = (X_n + X_RAl, Y_n + Y_RAl, z).
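 The computation of TR and TL can be illustrated by the following sketch; the sign convention for the right/left rotations as seen from the NPC avatar and the use of radians are assumptions of this sketch.

import math

def high_touch_targets(npc_head, human_head, theta_r, theta_l, R, z):
    """Compute the right/left hand target coordinates TR and TL of FIG. 6 from
    the (x, y, z) head-centre coordinates of the NPC avatar and the human avatar."""
    ax, ay = human_head[0] - npc_head[0], human_head[1] - npc_head[1]
    norm = math.hypot(ax, ay) or 1.0
    ax, ay = ax / norm, ay / norm                      # unit vector A (height component 0)

    def rotate(x, y, angle):                           # rotation about the vertical axis
        return (x * math.cos(angle) - y * math.sin(angle),
                x * math.sin(angle) + y * math.cos(angle))

    # Rotate A by θr to the NPC avatar's right and by θl to its left, then scale by R.
    x_rar, y_rar = [R * c for c in rotate(ax, ay, -theta_r)]
    x_ral, y_ral = [R * c for c in rotate(ax, ay, theta_l)]

    TR = (npc_head[0] + x_rar, npc_head[1] + y_rar, z)  # right-hand target coordinates
    TL = (npc_head[0] + x_ral, npc_head[1] + y_ral, z)  # left-hand target coordinates
    return TR, TL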
 As shown in FIG. 6, in a coordinate system having the center of the NPC avatar's head as its origin, let (Ex_n, Ey_n, Ez_n) be the unit vector of the current head direction and (Ex'_n, Ey'_n, Ez'_n) be the unit vector from the center of the NPC avatar's head toward the center of the human avatar's head. The angle formed by the two vectors (Ex_n, Ey_n, Ez_n) and (Ex'_n, Ey'_n, Ez'_n) is defined as θn.
 When θu < θn, the head direction vector of the NPC avatar is modified. Letting (Ex''_n, Ey''_n, Ez''_n) be a point on the line connecting (Ex_n, Ey_n, Ez_n) and (Ex'_n, Ey'_n, Ez'_n), (Ex''_n, Ey''_n, Ez''_n) is calculated such that the angle θ''n formed by the two vectors (Ex''_n, Ey''_n, Ez''_n) and (Ex'_n, Ey'_n, Ez'_n) satisfies θ''n = θu.
 Then, the head direction of the NPC avatar is controlled so that (Ex''_n, Ey''_n, Ez''_n) becomes the new head direction vector at the head center coordinates of the NPC avatar.
 With the above processing, an eye contact behavior can be implemented in which, when the human avatar turns its head toward the head of the NPC avatar, the NPC avatar also turns its gaze back toward the head of the user avatar.
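 One possible realization of this head-direction adjustment is sketched below. The point on the line connecting the two vectors at which θ''n = θu is located here by a simple bisection search, which is one way of reading the above definition rather than the only possible implementation.

import numpy as np

def angle_between(u, v):
    """Angle in radians between two non-zero vectors."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def eye_contact_head_direction(human_head_dir, human_to_npc, npc_head_dir, npc_to_human):
    """Return the NPC avatar's new head direction: if θu < θn, a vector E'' on the
    line between the current direction and the direction toward the human avatar
    such that θ''n = θu; otherwise the current direction is kept."""
    theta_u = angle_between(human_head_dir, human_to_npc)
    theta_n = angle_between(npc_head_dir, npc_to_human)
    e_n = np.asarray(npc_head_dir, float)
    e_target = np.asarray(npc_to_human, float)
    if theta_u >= theta_n:
        return e_n
    lo, hi = 0.0, 1.0                        # bisection along the connecting line
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        e_mid = (1.0 - mid) * e_n + mid * e_target
        if angle_between(e_mid, e_target) > theta_u:
            lo = mid                         # angle still larger than θu: move toward the human
        else:
            hi = mid
    e_new = (1.0 - hi) * e_n + hi * e_target
    return e_new / np.linalg.norm(e_new)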
 (Eye contact success determination method)
 Eye contact is determined to have succeeded when the state in which θ''n in the eye contact behavior generation method is smaller than a predefined angle θ_eye_contact continues for longer than a specified time t_eye_contact.
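 This determination can be sketched as a small stateful judge that measures how long the angular condition has been held; resetting the timer whenever the angle grows beyond θ_eye_contact again is an assumption of this sketch.

import time

class EyeContactJudge:
    """Returns True once θ''n has stayed below θ_eye_contact for longer than
    t_eye_contact seconds."""

    def __init__(self, theta_eye_contact, t_eye_contact):
        self.theta_eye_contact = theta_eye_contact
        self.t_eye_contact = t_eye_contact
        self._since = None                   # time at which the condition first held

    def update(self, theta_n2, now=None):
        now = time.time() if now is None else now
        if theta_n2 < self.theta_eye_contact:
            if self._since is None:
                self._since = now
            return (now - self._since) > self.t_eye_contact
        self._since = None                   # condition broken: reset the timer
        return False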
 An implementation example of the high-touch possible distance determination unit 24 will be described. Here, an example is shown of a method of determining, from the head position of the human avatar and the head position of the NPC avatar, whether they are within high-touch possible distance. Letting (X_u, Y_u, Z_u) be the head center coordinates of the human avatar and (X_n, Y_n, Z_n) be the head center coordinates of the NPC character, the distance D between the two coordinates is defined by equation (1).
 D = √((X_u − X_n)^2 + (Y_u − Y_n)^2 + (Z_u − Z_n)^2)    (1)
 If D becomes less than a predetermined fixed distance D_touchable, it is determined that a high touch is possible. In this way, the high-touch possible distance determination unit 24 is implemented as a program that returns, from some positional information about the human avatar and some positional information about the NPC avatar, binary information indicating whether or not the two are within high-touch possible distance of each other.
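 A direct sketch of this determination is as follows.

import math

def within_high_touch_distance(human_head, npc_head, d_touchable):
    """Equation (1): Euclidean distance D between the head centres of the two
    avatars, compared with the predetermined distance D_touchable."""
    d = math.dist(human_head, npc_head)
    return d < d_touchable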
 An implementation example of the high-touch request motion generation unit 25 will be described. The high-touch request motion generation unit 25 generates the high-touch request motion of the NPC from the head position and hand positions of the human avatar and the head position and hand positions of the NPC avatar. The high-touch request motion generation unit 25 is defined as outputting the coordinates of the positions of both hands of the NPC avatar, and the NPC avatar control unit 27 generates the motion of the NPC avatar according to the hand coordinate positions output by the high-touch request motion generation unit 25, thereby causing the NPC avatar to perform the high-touch request motion.
 Based on the above results, the NPC avatar control unit 27 moves the current center coordinates of the NPC avatar's right and left hands toward TR and TL obtained above, respectively. As a result, both hands of the NPC avatar are extended toward the human avatar and come to rest at a fixed height, so that a motion requesting a high touch can be generated and executed.
 An implementation example of the high-touch result determination unit 26 will be described. Here, a method is shown of determining, from the hand positions of the human avatar and the hand positions of the NPC avatar, whether a high touch between the human avatar and the NPC avatar has succeeded. The high-touch result determination unit 26 is defined as outputting, as a binary value, whether or not a high touch between the human avatar and the NPC avatar has succeeded. Letting R1 be the distance between the right-hand center coordinates of the human avatar and the left-hand center coordinates of the NPC avatar, and R2 be the distance between the left-hand center coordinates of the human avatar and the right-hand center coordinates of the NPC avatar, the high-touch result determination unit 26 regards the high touch as successful and returns True when both R1 and R2 become less than a fixed distance R_touched.
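 The corresponding determination can be sketched as follows.

import math

def high_touch_succeeded(human_right, human_left, npc_right, npc_left, r_touched):
    """True when both crossed hand pairs are closer than R_touched:
    R1 = |human right hand - NPC left hand|, R2 = |human left hand - NPC right hand|."""
    r1 = math.dist(human_right, npc_left)
    r2 = math.dist(human_left, npc_right)
    return r1 < r_touched and r2 < r_touched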
 FIG. 7 is a flowchart showing an example of the processing procedure of the NPC avatar control unit 27. In FIG. 7, the NPC avatar control unit 27 determines whether the high-touch activity level input from the high-touch activity calculation processing unit 28 indicates the active state (step S11). If Yes in step S11, the NPC avatar control unit 27 determines whether the determination result of the high-touch possible distance determination unit 24 as to whether the avatars are within high-touch possible distance is True (step S12).
 If Yes in step S12, the NPC avatar control unit 27 performs head direction control with respect to the human avatar 2-1 (eye contact control) based on the head direction data of the NPC avatar output from the eye contact behavior generation/determination unit 23 (step S13).
 Next, the NPC avatar control unit 27 determines whether the high-touch activity level input from the high-touch activity calculation processing unit 28 is in the active state (True) and the eye contact determination result output from the eye contact behavior generation/determination unit 23 has become True (step S14). If Yes in step S14, the NPC avatar control unit 27 receives the positional coordinate information of both hands of the NPC avatar from the high-touch request motion generation unit 25. The NPC avatar control unit 27 then gradually changes the coordinates of both hands of the NPC avatar 2-2 toward the received coordinates, stops them after the coordinates are reached, and thereby executes the high-touch request motion (step S15).
 Next, after the high-touch request motion is completed, the NPC avatar control unit 27 determines whether the high-touch determination result input from the high-touch result determination unit 26 has become True within a certain period of time t (step S16). If Yes in step S16, the NPC avatar control unit 27 ends the high-touch control (step S17).
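 Step S15, in which the hand coordinates are changed gradually toward the received coordinates and then held still, can be sketched as one control tick of the following form; the fixed per-tick step length is an assumption of this sketch.

def step_hands_towards(current_hands, target_hands, step):
    """One control tick of step S15: move each hand centre at most `step` toward
    its target coordinate and hold still once the target is reached."""
    new_hands = []
    for (cx, cy, cz), (tx, ty, tz) in zip(current_hands, target_hands):
        dx, dy, dz = tx - cx, ty - cy, tz - cz
        dist = (dx * dx + dy * dy + dz * dz) ** 0.5
        if dist <= step:                     # target reached: stop there
            new_hands.append((tx, ty, tz))
        else:
            s = step / dist
            new_hands.append((cx + dx * s, cy + dy * s, cz + dz * s))
    return new_hands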
 FIG. 8 is a diagram showing an example of pre-embedded data for the content embedding method in the high-touch activity calculation processing unit 28. For the high-touch activity calculation processing unit 28, an example based on the [content embedding method] is shown.
 The high-touch activity calculation processing unit 28 constantly receives the video playback time t from the VR viewing space 1. It is also assumed that the high-touch activity calculation processing unit 28 holds pre-embedded data that uniquely determines the activity level h at each video time t, as shown in FIG. 8. The high-touch activity calculation processing unit 28 then outputs True to the NPC avatar control unit 27 if the activity level h at the current video time t is 1, and False if it is -1.
 As described in (Eye contact behavior generation method), the eye contact behavior generation/determination unit 23 outputs, to the NPC avatar control unit 27, the head direction information of the NPC avatar during eye contact control, based on the head position/direction information of the human avatar and the head position/direction information of the NPC avatar.
 In addition, at the timing of outputting information to the NPC avatar control unit 27, the eye contact behavior generation/determination unit 23 performs the eye contact success/failure determination described in (Eye contact success determination method) and outputs the determination result, True or False, to the NPC avatar control unit 27.
 The high-touch possible distance determination unit 24 constantly refers to the head position information of the human avatar and the head position information of the NPC avatar. It constantly determines whether they are within high-touch possible distance and outputs the determination result, True or False, to the NPC avatar control unit 27.
 As described in connection with FIG. 5, the high-touch request motion generation unit 25 constantly refers to the head position information of the human avatar and the head position information of the NPC avatar. The high-touch request motion generation unit 25 then constantly outputs, to the NPC avatar control unit 27, control information on the positions of both hands corresponding to the positions of the NPC avatar's hands at the time of the high-touch request.
 The high-touch result determination unit 26 constantly refers to the hand position information of the human avatar and the hand position information of the NPC avatar. It then constantly outputs the determination result of whether a high touch has been performed, as True or False, to the NPC avatar control unit 27.
 As described above, according to the embodiment, the distance between the viewer's avatar and the computer's avatar, eye contact, and the likelihood of a high touch occurring are calculated during content viewing, and when those values satisfy certain conditions, the viewer's avatar and the computer's avatar are controlled so as to exchange a high touch. This makes it possible to perform a high-touch action between the human avatar and the NPC avatar without an explicit presentation act for initiating the high touch.
 In a VR (Virtual Reality) space involving content viewing, systems that allow high touches between human agents have been known. However, when a high touch is performed between an NPC agent and a human agent, it was not clear how the NPC agent should be controlled in order to initiate the high touch naturally.
 When the NPC agent prompts the human agent to perform emotional expression and interaction with respect to the content, such as a high touch, conceivable methods include having the NPC agent present an utterance or text such as "Let's high-five" or a CG icon suggesting a high touch. However, in the real world it is rare for people watching sports or live music together to initiate a high touch by conveying their intention through speech, text, or icons; normally, a high touch is initiated by natural agreement through non-verbal motion. For this reason, such a rare and unnatural initiation of a high touch may impair the sense of empathy and the experience that the high touch brings.
 It has not been clarified how an NPC agent can naturally initiate emotional expression and interaction with respect to content, such as a high touch, using only non-verbal body motion. If the NPC agent cannot naturally convey its intention to initiate a high touch to the person through non-verbal body motion, not only may the person ignore the initiation of the high touch, but the person may also feel that the NPC agent's motion is unnatural or uncomfortable, which may impair the very experience of viewing the content together with the NPC agent.
 In contrast, according to the embodiment, the NPC agent can naturally convey to the person, using only non-verbal body motion, the start of emotional expression and interaction with respect to the content, such as a high touch, and can thereby initiate the emotional expression and interaction with respect to the content.
 Note that the present invention is not limited to the above embodiment as it is. For example, the human avatar 2-1 and the NPC avatar 2-2 can interact not only in a virtual space realized by a computer (the server 10) but also in real space. Furthermore, the human avatar 2-1 and the NPC avatar 2-2 can also interact in an environment in which augmented reality is superimposed on real space. The human avatar 2-1 may also be a person itself, and the NPC avatar 2-2 may be a real-world robot or the like. In short, the human avatar 2-1 is an example of a first character that reflects a person's intention, and the NPC avatar 2-2 is an example of a second character different from the first character.
 In the implementation stage, the present invention can be embodied by modifying the constituent elements without departing from the gist of the invention. Furthermore, various inventions can be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment. For example, some constituent elements may be omitted from all the constituent elements shown in the embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.
 Reference Signs List
 1 … VR viewing space
 2-1 … Human avatar
 2-2 … NPC avatar
 10 … Server
 11 … Communication interface
 20 … Processor
 21 … VR space image presentation unit
 22 … Human avatar control unit
 23 … Eye contact behavior generation/determination unit
 24 … High-touch possible distance determination unit
 25 … High-touch request motion generation unit
 26 … High-touch result determination unit
 27 … NPC avatar control unit
 28 … High-touch activity calculation processing unit
 30 … Memory
 100 … VR terminal
 100-1 … Virtual reality terminal
 100-2 … Virtual reality terminal
 101 … VR display device
 102 … Head position/direction control device
 103 … Hand position control device

Claims (8)

  1.  An information processing system that causes a first character reflecting a person's intention and a second character different from the first character to interact with each other, the information processing system comprising:
     an information processing terminal; and
     a server device capable of communicating with the information processing terminal,
     wherein the server device comprises:
     a determination unit configured to calculate distance information between the first character and the second character and, based on the distance information, determine the possibility of an interaction based on emotional expression between the first character and the second character with respect to presented content; and
     a requested action control unit configured to, when the interaction is determined to be possible, control an action of the second character to cause the interaction to be performed.
  2.  The information processing system according to claim 1, wherein the first character is a first avatar operated by a person, and the second character is a second avatar operated by a computer.
  3.  The information processing system according to claim 2, wherein the content is presented in a virtual space realized by the computer, and the first avatar and the second avatar interact in the virtual space.
  4.  The information processing system according to claim 2, wherein the content is presented in an environment in which augmented reality is superimposed on real space, and the first avatar and the second avatar interact in the environment in which augmented reality is superimposed on the real space.
  5.  The information processing system according to claim 2, wherein the content is presented in real space, and the first avatar and the second avatar interact in the real space.
  6.  A server device provided in an information processing system that causes a first character reflecting a person's intention and a second character different from the first character to interact with each other, the server device comprising:
     a determination unit configured to calculate distance information between the first character and the second character and, based on the distance information, determine the possibility of an interaction based on emotional expression between the first character and the second character with respect to presented content; and
     a requested action control unit configured to, when the interaction is determined to be possible, control an action of the second character to cause the interaction to be performed.
  7.  An information processing method executed by a server device provided in an information processing system that causes a first character reflecting a person's intention and a second character different from the first character to interact with each other, the method comprising:
     calculating, by the server device, distance information between the first character and the second character and, based on the distance information, determining the possibility of an interaction based on emotional expression between the first character and the second character with respect to presented content; and
     controlling, by the server device, when the interaction is determined to be possible, an action of the second character to cause the interaction to be performed.
  8.  A program for a server device provided in an information processing system that causes a first character reflecting a person's intention and a second character different from the first character to interact with each other, the program causing the server device to function as:
     a determination unit that calculates distance information between the first character and the second character and, based on the distance information, determines the possibility of an interaction based on emotional expression between the first character and the second character with respect to presented content; and
     a requested action control unit that, when the interaction is determined to be possible, controls an action of the second character to cause the interaction to be performed.

PCT/JP2021/040270 2021-11-01 2021-11-01 Information processing system, server device, information processing method, and program WO2023073991A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/040270 WO2023073991A1 (en) 2021-11-01 2021-11-01 Information processing system, server device, information processing method, and program


Publications (1)

Publication Number Publication Date
WO2023073991A1 true WO2023073991A1 (en) 2023-05-04

Family

ID=86159670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/040270 WO2023073991A1 (en) 2021-11-01 2021-11-01 Information processing system, server device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2023073991A1 (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004141435A (en) * 2002-10-24 2004-05-20 Namco Ltd Game information, information storage medium and game device
JP2017217352A (en) * 2016-06-10 2017-12-14 任天堂株式会社 Information processing program, information processing device, information processing system, and information processing method
JP2020052993A (en) * 2018-09-25 2020-04-02 未來市股▲ふん▼有限公司 Artificial intelligence system and interactive response method
JP2020096768A (en) * 2018-12-19 2020-06-25 任天堂株式会社 Information processing system, information processor, information processing program, and information processing method
WO2021153367A1 (en) * 2020-01-30 2021-08-05 株式会社ドワンゴ Avatar display device, avatar display system, avatar display method, and avatar display program

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668605A (en) * 2023-05-10 2023-08-29 北京国际云转播科技有限公司 Meta-universe studio system, playing method, storage medium and electronic device

Similar Documents

Publication Publication Date Title
US20180324229A1 (en) Systems and methods for providing expert assistance from a remote expert to a user operating an augmented reality device
RU2621644C2 (en) World of mass simultaneous remote digital presence
WO2020090786A1 (en) Avatar display system in virtual space, avatar display method in virtual space, and computer program
JP5208810B2 (en) Information processing apparatus, information processing method, information processing program, and network conference system
EP3691280B1 (en) Video transmission method, server, vr playback terminal and computer-readable storage medium
CN110996097B (en) VR multimedia experience quality determination method and device
US20220066541A1 (en) Virtual reality system with posture control
Jo et al. Chili: viewpoint control and on-video drawing for mobile video calls
US20210072947A1 (en) Remote interaction via bi-directional mixed-reality telepresence
WO2023073991A1 (en) Information processing system, server device, information processing method, and program
CN114125529A (en) Method, equipment and storage medium for generating and demonstrating video
JP7244450B2 (en) Computer program, server device, terminal device, and method
US20230336689A1 (en) Method and Device for Invoking Public or Private Interactions during a Multiuser Communication Session
CN114746796A (en) Dynamic browser stage
KR20190031805A (en) Coaching system for users participating in virtual reality contents
US11790588B2 (en) Display control device, display control method, and display system
CN114053694A (en) Application server, application service method thereof and computer readable storage medium
WO2023079987A1 (en) Distribution device, distribution method, and program
JP7062126B1 (en) Terminals, information processing methods, programs, and recording media
JP7250721B2 (en) Computer program, server device, terminal device, and method
JP7011746B1 (en) Content distribution system, content distribution method, and content distribution program
US20220383601A1 (en) Virtual avatars and physical object interaction in augmented reality applications
WO2023233849A1 (en) Information processing device, information processing method, and recording medium
Sun Remote Assistance for Repair Tasks Using Augmented Reality
KR20230103135A (en) Operation method for dome display in a metaverse environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21962529

Country of ref document: EP

Kind code of ref document: A1