WO2002058011A1

WO2002058011A1 - Avatar animating method and device for communicating in a non-immersive collaborative virtual environment

Info

Publication number: WO2002058011A1
Application number: PCT/FR2002/000236
Authority: WO
Inventors: Pascal Le Mer; Grégory SAUGIS
Original assignee: France Telecom (Sa)
Priority date: 2001-01-22
Filing date: 2002-01-21
Publication date: 2002-07-25
Also published as: EP1358632A1; FR2819920A1; FR2819920B1

Abstract

The invention concerns a method for animating avatars corresponding to people for communication in a non-immersive virtual environment between people. The invention is characterised in that the method consists in: using means to provide intentional functions (action, perception and expression) of a person and to supply corresponding data; then analysing the behaviour of the person; and finally, representing said virtual environment and the avatars, at each cycle.

Description

METHOD AND DEVICE FOR ANIMATING AVATARS TO PROVIDE COMMUNICATION IN A NON-IMMERSIVE COLLABORATIVE VIRTUAL ENVIRONMENT

The invention relates to an avatar animation method for ensuring communication in a non-immersive collaborative virtual environment. The term Collaborative Virtual Environment (CVE) designates a virtual environment that allows several people to collaborate remotely, the equipment used being linked by a network (this is also called a distributed environment). In many CVEs, avatars are introduced to provide a more or less realistic or lifelike representation of each person. As an example of what is meant by CVE, reference may be made to the project described in reference [3] below.

These techniques raise the problem of designing a human-machine interface (HMI). Indeed, each user must be able to interact with the virtual environment but also identify with his "virtual body" instantiated by an avatar, in order to convey non-verbal information (gestures, facial expressions) relating to communication and action between users.

The invention proposes a new concept of human-machine interface making it possible to interact with the virtual environment and to communicate through one's own avatar (or clone), in an "office" context.

In the following we talk about an avatar. The term clone can also be found in the literature for the same concept.

By "office" context we mean a terminal which does not force the user to use an intrusive system of peripherals to immerse himself in the virtual environment. In other words, the user must retain full possession of his senses in order to act in the real environment (e.g. pick up his phone, move around, etc.). A characteristic of office contexts is that they use suitable peripherals (e.g. mice for Windows™, 3D peripherals for CAD in the automotive sector), whose type or use is not assumed here.

Current state of the art

Currently there are two main categories of systems that mediate communication through an avatar in collaborative environments. Either the user is at the origin of the behavior of the avatar, in which case we speak of intentional animation of avatars; or the system interprets the collaborative activity in order to animate the avatar, in which case we speak of interpreted animation.

* Intentional animation of avatars

There are two approaches to intentionally animating an avatar:

• "Televirtuality" type systems [1] which, for example using video or magnetic capture, make it possible to reproduce through the avatar the behavior of the user facing his terminal (e.g. facial expressions, communication gestures, etc.). Facial expressions can be captured and reproduced through the avatar. Today, the technique also makes it possible to detect [6] and reproduce gestures [4,5] (i.e. of the arms, body, etc.), as can be seen in Figure 1.

• "Body chat" type systems, where each action (interaction with the environment) and each avatar behavior is activated through an avatar command interface [2]. There are many examples of this type of operation in the television world.

* Interpreted animation of avatars

The idea here is to give an avatar behavior that signifies each person's activity in the environment. It is then the system which manages the avatar's behavior by choosing it from a base of pre-calculated behaviors. The HMI allows the user to act in the world, while the behavior of the avatar seen on the terminal screen symbolizes what the user is doing. An example can be found in reference [8].

Reference may be made to the following bibliographical references:

1. INA's Televirtuality Team, http://www.ina.fr/TV/

2. Guye-Vuillème, A., Capin, T., Pandzic, I., Magnenat Thalmann, N., Thalmann, D., Nonverbal Communication Interface for Collaborative Virtual Environments, Proc. CVE 98, 1998.

3. Plenacoste, P., Chabrier, N., Dumas, C., Saugis, G., Chaillou, C., 3D Interfaces: Interaction and Evaluation, HMI 97.

4. Tolani, D., Badler, N.I., Real-Time Inverse Kinematics of the Human Arm, Presence, 1996, Vol. 5(4), pp. 393-401.

5. W. Chin, K., Closed-form and generalized inverse kinematic solutions for animating the human articulated structure, BSCH (Computer Science) of Curtin University of Technology, 1996.

6. Marcel, S., Bernier, O., Collobert, D., EM approach for the construction of homogeneous regions of color: application to the monitoring of a person's face and hands, Coresa 2000, October 2000.

7. Marcel, S., Bernier, O., Viallet, J.E., Collobert, D., Hand Gesture Recognition using Input-Output Hidden Markov Models, in International Conference on Automatic Face and Gesture Recognition, March 2000, pp. 456-461.

8. Benford, N., Greenhalgh, C., Bowers, J., Snowdon, D.E., Fahlen, L., User Embodiment in Collaborative Virtual Environments, Proceedings of CHI'95, 1995, pp. 224-248.

Disadvantages of prior techniques

Existing systems based on the principles presented above, being either intentional or interpreted, do not allow simultaneous use of the user's semiotic, ergotic and epistemic gestures. These different behaviors are described in the reference Cadoz, C., the human-machine communication channel gesture, computer science and techniques, 1994, vol. 13, pp. 31-61.

In other words, these systems do not make it possible, without using immersive technologies, to reproduce simultaneously the user's gestures of expression (for example, designating someone with the arm) and his gestures of action and perception (for example, manipulating an object of the virtual world or changing point of view).

The problem is that if the system is of the televirtuality type (as illustrated in Figure 1) and the user makes an ergotic gesture (for example, he grasps his mouse to interact with the system), then the animation of the avatar is inconsistent with the user's actions. Conversely, if the avatar is animated as described above for interpreted animation, then the avatar cannot reproduce the behavior of the user.

Purpose of the invention

The object of the invention is to offer a user a method and a device allowing him to communicate with other people by expressing himself in a natural way (that is to say verbally and non-verbally, so as to preserve the richness of the co-verbal gestures of the human being), while being able to interact with the common virtual environment through an office terminal.

The device makes it possible to offer a realistic representation of a user in a common virtual environment, in which he can interact and communicate non-verbally without being constrained by an intrusive system. Through this device, one can for example envisage telecommerce, teleconception, distance learning, teleconference, games, etc. services.

In a real collaboration situation, we consider that three behaviors are important for communication: action on the environment, perception of it, and transmission of information to others.

These behaviors are called ergotic, epistemic and semiotic respectively. According to the invention, it is proposed to virtualize these behaviors.

The subject of the invention is more particularly a method of animating avatars corresponding to people, to ensure communication between these people in a non-immersive virtual environment by means of a terminal provided with a display screen and input devices, the terminal comprising means for controlling animation and displaying images corresponding to the virtual environment with avatars, from information received from its peripherals and from a communication network to which it is connected, characterized in that it comprises: the implementation of means for performing the intentional functions (action, perception and expression) of a user and providing corresponding information; then a behavior analysis comprising a cycle consisting of performing the following operations N times:

- scanning for the arrival of information on the intentional functions,

- detection of intentions from the information received,

- determination of the animation mode of the avatar corresponding to the person in the virtual environment, according to the intention detected for this person;

and finally, representation of the virtual environment and the avatars on the terminal screen, at each cycle.
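By way of non-limiting illustration, the cycle described above (scanning, intention detection, mode determination, representation) can be sketched as follows. The mapping from intention to animation mode follows the modes listed in claim 5; the names `Intention`, `AnimationMode`, `detect_intention`, `behavior_analysis` and the keys of the `info` dictionary are assumptions introduced for the sketch and do not appear in this description.

```python
from enum import Enum, auto


class Intention(Enum):
    UNADDRESSED_SEMIOTIC = auto()
    ADDRESSED_SEMIOTIC = auto()
    ERGOTIC_EPISTEMIC = auto()
    UNDEFINED = auto()


class AnimationMode(Enum):
    ISOMORPHIC = auto()  # avatar reproduces the captured behavior
    DIRECTED = auto()    # gaze or hand designation aimed at a target
    SYMBOLIC = auto()    # generated posture symbolizing the activity


# Mapping taken from the animation modes listed in claim 5.
MODE_FOR_INTENTION = {
    Intention.UNADDRESSED_SEMIOTIC: AnimationMode.ISOMORPHIC,
    Intention.ADDRESSED_SEMIOTIC: AnimationMode.DIRECTED,
    Intention.ERGOTIC_EPISTEMIC: AnimationMode.SYMBOLIC,
    Intention.UNDEFINED: AnimationMode.SYMBOLIC,
}


def detect_intention(info):
    """Hypothetical rule set; 'info' is an assumed dict of device readings."""
    if info.get("addressing_button"):
        return Intention.ADDRESSED_SEMIOTIC
    if info.get("action_device_active"):
        return Intention.ERGOTIC_EPISTEMIC
    if info.get("significant_movement"):
        return Intention.UNADDRESSED_SEMIOTIC
    return Intention.UNDEFINED


def behavior_analysis(poll, render, n_cycles):
    """Run the cycle N times: scan, detect intention, choose mode, render."""
    history = []
    for _ in range(n_cycles):
        info = poll()                         # scan for incoming information
        intention = detect_intention(info)    # detect the intention
        mode = MODE_FOR_INTENTION[intention]  # determine the animation mode
        render(mode)                          # represent environment and avatars
        history.append((intention, mode))
    return history
```

Here `poll` and `render` stand in for the terminal's peripherals and display means; passing them as callables keeps the sketch independent of any particular hardware.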

The subject of the invention is also a device for animating avatars corresponding to people, for ensuring communication between people in a non-immersive virtual environment, comprising a terminal provided with a display screen and input devices, the terminal comprising means for displaying the virtual environment and avatars from information received from its devices and from a telecommunications network, characterized in that the terminal comprises:

- means for implementing intentional functions of a user of said device and providing corresponding information,

- means for detecting intentions from the information received and for determining the mode of animation,

- means for animating the corresponding avatar according to the intention detected by the detection means.

Other features and advantages of the invention will become apparent from the following description, given by way of nonlimiting example with reference to the appended figures, which show:

- Figure 1, an image and the virtual representation with motion capture and animation;

- Figure 2, the steps of the method according to the invention;

- Figure 3, the overview of the general operation;

- Figure 4, the diagram of a device according to the invention;

- Figure 5, the details of the behavior analysis and animation algorithm;

- Figure 6, an illustration of a card game;

- Figure 7, an illustration of this game in the case where the players are remote and use the method of the invention.

Detailed description

After a long analysis of existing but unsatisfactory technologies, the inventors have developed a new system (method and device) implementing the communication functions that integrate the behaviors just defined. According to the invention, the method and the device make it possible to exploit four types of intentional behavior:

• Actions, by using specialized interaction devices (for example a force feedback system).

• Unaddressed intentional behaviors (e.g. co-verbal gestures, facial expressions, etc.), by capture (for example by video) of the real behavior of the user.

• Addressed intentional behaviors (looking at someone, speaking, showing something, making a welcome sign, etc.), by a hardware or software designation means (for example a touch screen or software detection of deictic gestures).

• Behaviors unrelated to the collaborative activity (e.g. picking up the phone).

To move from one type of behavior to another, we propose to use a system of interpretation of the user's intention. One can indeed know if a behavior is for example of addressed, symbolic or ergotic type by exploiting the video channel [7].

In the proposed method, illustrated by the diagram in Figure 2, the analysis of user behavior runs continuously: this corresponds to the N loop of the algorithm in Figure 2.

This behavior analysis is interpreted and allows a mode of interaction to be defined. The visual feedback module illustrated in FIG. 3 then lets the user know which mode the system is in, which can serve as a regulator if there is ambiguity about its state. Indeed, depending on the mode of interaction, the user's device (the HMI) displays visual feedback adapted to the type of interaction, and the user's avatar behaves accordingly. These operations are illustrated by steps I to VI of the diagram in FIG. 2.

Figure 3 shows the block diagram of the general operation of the proposed system. The system implements:

- A virtual environment distributed by means of several remote terminals T displaying avatars representing each user. Each user has a point of view on the global environment.

Each terminal T includes:

- behavior capture means A, that is to say means for capturing the movements of the head, body and hands of the user at his terminal (e.g. camera, radar, etc.),

- specialized interaction input devices B (e.g. mouse, SpaceMouse™, etc.) for any collaborative activity (e.g. design of an automobile),

- a software module L for intention detection and animation determination, using the data coming from the input peripherals.

This device provides the following:

- The user is represented by an avatar in a common virtual environment.

- The avatar of a user is sometimes animated from a library of pre-calculated behaviors stored in the system, and sometimes guided by the gestures of the user.

- The device adapts to the user's intentions by giving suitable visual feedback, so that he is aware of his level of "virtualization".

- The mode selection is not binary; that is to say, the device makes it possible to combine semiotic and ergotic behavior (e.g. looking at someone while handling an object).

We will describe below some examples of peripherals used to perform the intentional functions. Figure 4 illustrates these examples of devices. Reference may also be made to the diagram in Figure 3.

Addressed semiotic

We consider two possible cases:

(1) The user makes ergotic gestures to perform the addressed semiotic function of the avatar. The following peripherals can be used, for example:

- A touch screen: allows the user to designate,

- Addressing device: specific devices (mouse, SpaceMouse™) are used to address a behavior,

- Pointing, manipulation and navigation device: these devices can be used to the exclusion of their ergotic and epistemic functions (e.g. by using a particular button).

(2) The user makes semiotic gestures to perform the addressed semiotic function of the avatar. That is to say, the user exploits the movements of the head and hands to the exclusion of their unaddressed semiotic function (e.g. by using an area of influence of the head or hands [7]). Note that in this case there is no need for an addressing device as illustrated in FIG. 3.

Unaddressed semiotic

- A microphone: its use makes it possible to determine the visemes to be reproduced in the animation of the avatar when the user speaks.

- One or more cameras: their use makes it possible to capture the position and orientation of the user's body, head and hands in space.

Ergotic-epistemic

- Specific peripherals are used here for any action in the environment (e.g. mouse and SpaceMouse™ for the design of a motor vehicle; Phantom™ (registered trademark) for a 3D sculpture activity; etc.).

The intention detection and mode determination module is illustrated by the detailed steps in Figure 5. This module works for each minimum unit of intention, namely: hands, body and head. Obviously, the ergotic and epistemic behaviors of the head and the body are limited.

We have not described here all the ways of determining the intention of the user, since the peripherals used are only specified in terms of category (devices for capturing posture, pointing, manipulation and navigation, etc.).

Nevertheless, there are combinations of use of the interaction devices that leave no ambiguity about the mode in which the user wishes to place the system.

Examples:

- If the pointing device is used, then the system is in ergotic-epistemic mode for this hand.

- If the device does not detect the presence of the user, then the system is "inactive".

- Etc.
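By way of non-limiting illustration, the two unambiguous rules given as examples above can be applied per unit of intention (hands, head, body) as in the following sketch. The unit names, the mode labels other than "inactive" and "ergotic-epistemic", and the function `determine_modes` are assumptions introduced for the sketch.

```python
UNITS = ("left_hand", "right_hand", "head", "body")


def determine_modes(user_present, pointing_hand=None):
    """Return one mode per unit of intention.

    user_present  -- whether the capture means detect the user
    pointing_hand -- name of the hand using the pointing device, or None
    """
    if not user_present:
        # No user detected: the whole system is "inactive".
        return {unit: "inactive" for unit in UNITS}
    # Units not covered by an unambiguous rule are left undetermined here;
    # the description leaves their handling to other detection steps.
    modes = {unit: "undetermined" for unit in UNITS}
    if pointing_hand is not None:
        # Pointing device in use: ergotic-epistemic mode for this hand only.
        modes[pointing_hand] = "ergotic-epistemic"
    return modes
```

Because the mode is held per unit, one hand can be in ergotic-epistemic mode while the head and the other hand remain available for other behaviors.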

Switching from one mode to another is done explicitly. There is no high-level semantic analysis on the part of the machine. If there is any ambiguity about the intention, the visual feedback module allows the user to regulate the situation by acting on a device.

At display and rendering time, using an avatar to represent the user makes it possible to perform interpolations between the different postures. The device can therefore make a smooth transition from one mode to another in the rendering.

In order to illustrate the interest of the method, we will describe a real collaborative activity and how it would be carried out remotely using this method. In this description, we will propose a technical realization of the concept, as well as adapted visual metaphors which required arbitrary choices. These choices are made for demonstrative purposes and are not limiting.
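By way of non-limiting illustration, the interpolation between postures mentioned above can be sketched as follows. The sketch assumes that a posture is a set of named joint angles and that linear interpolation is acceptable; neither the joint names nor the interpolation scheme is specified in this description.

```python
def interpolate_posture(start, end, t):
    """Blend two postures given as {joint_name: angle} dictionaries.

    t = 0.0 returns the start posture, t = 1.0 the end posture;
    values outside [0, 1] are clamped.
    """
    t = max(0.0, min(1.0, t))
    return {
        joint: (1.0 - t) * start[joint] + t * end[joint]
        for joint in start
    }


# Example: halfway between a captured posture and a pre-calculated one.
captured = {"elbow": 0.0, "shoulder": 10.0}
precalculated = {"elbow": 90.0, "shoulder": 30.0}
halfway = interpolate_posture(captured, precalculated, 0.5)
```

Stepping `t` from 0 to 1 over several rendering cycles gives a gradual transition when the animation mode changes.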

Description of a real activity: "The card game"

In a real scene shown in Figure 6, four people play cards. During this collaborative activity, they communicate verbally and non-verbally, and they manipulate common objects (cards, a sheet and a pencil to note the points). The example of the card game is well suited to demonstrating the interest of the method because, in real game situations, the players make heavy use of non-verbal communication, especially when forming coalitions during the game. It is on this point in particular that the method offers an advantage over traditional mediated communication tools. Here is a series of player behaviors noted during a card game:

1. Verbal exchanges to find out who will deal the cards;

2. Player 1 raises his hand to say that he is going to do so and takes the deck of cards;

3. Player 1 looks at player 3, who knows the game well, to ask him how many cards to deal;

4. Player 3 responds to player 1;

5. Player 1 deals the cards;

6. Player 1 points to player 2 while looking at him and tells him "it's your turn to start ...";

7. Player 2 places a card and looks at player 3 to indicate that it is up to him to place the next one;

8. Player 3 hesitates; he frowns and looks at his cards to show that he is thinking;

9. Player 3 places a card, making a remark to everyone about the card he has just played;

10. Player 4 places a card;

11. Player 3 successively designates player 4 and the game, saying that he "does not have the right to play this card ...";

12. Player 4 takes back his card and plays again;

13. Player 1 looks insistently at his partner, player 3, and taps on the table to make him understand that he played badly earlier; he then takes a card from the deck, grimacing;

14. The game continues [...];

15. Player 2 places the last card and looks at his partner, player 4, smiling to show him his satisfaction;

16. Player 1 takes the pencil and the sheet and counts the points;

17. The other players lean over the sheet to see the total points;

18. The players discuss the game with each other;

19. Player 2 collects the cards and then distributes them for a new game.

Description of this same activity remotely using the method of the invention.

We imagine that the same players must now take part in the same card game remotely, using the method. Each has the hardware configuration illustrated in Figure 7, and each has a personalized view of the scene, i.e. the game table (with cards, a sheet of paper to note the points and a pencil) and the other players represented by avatars. We will describe in the following how the technical means are used to analyze user behavior and to animate the avatars remotely.

User behavior / use of sensors:

Interactions of the ergotic type (i.e. action on the objects of the virtual environment) are carried out in the following way:

- Use of the pointing device to control the 3D pointer in the usual way.

- Use of the manipulation and navigation device (SpaceMouse™) to act on a selected object.

There are several known approaches for achieving addressed semiotic behaviors, but we will only describe here the method that we call "instrumented semiotics". There is another technique, which is based on the natural movement of the hands and head.

The instrumented addressed semiotic technique makes it possible to control the gaze of one's own avatar and the designation gestures of a hand:

- A pointing device is used, by clicking on a particular button, to "embody" the virtual hand of one's avatar in order to designate something. Two forms of visual feedback let the user know that the designation intention is being carried out and let him control it: a metaphor of his virtual hand and a highlight of the "designated" object by a ray.

- A manipulation and navigation device (SpaceMouse™) is used, by clicking on a particular button, to control the gaze of one's avatar. In the same way as for the designation gesture, there are two forms of visual feedback: a metaphor of his virtual eyes and a highlight of the object "looked at".

Note that the addressed semiotic mode is detected by the pressing of a particular button on each device.

Unaddressed semiotic gestures are detected:

- For the hands, when the peripherals are not activated while significant movements are captured by the behavior capture module.

- For the head, as soon as significant movements are captured by the motion capture module.

- For the body, as soon as significant movements are captured by the motion capture module.

- For facial expressions, when significant expressions are captured by the facial expression capture module.
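By way of non-limiting illustration, the detection rules listed above can be sketched as follows. The description does not define what makes a movement "significant"; the numeric `threshold`, the unit names and the function `unaddressed_units` are assumptions introduced for the sketch.

```python
def unaddressed_units(movement, peripherals_active, threshold=0.2):
    """Return the units showing unaddressed semiotic behavior.

    movement           -- {unit: captured movement amplitude} (assumed scale)
    peripherals_active -- {hand: True if that hand is using a peripheral}
    threshold          -- assumed amplitude above which movement is significant
    """
    detected = set()
    for unit, amplitude in movement.items():
        if amplitude < threshold:
            continue  # not a significant movement or expression
        if unit.endswith("hand") and peripherals_active.get(unit, False):
            continue  # hand busy with a peripheral: not an unaddressed gesture
        detected.add(unit)
    return detected
```

For example, with the right hand on a peripheral and the body nearly still, only the left hand, the head and the face would be reported.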

Note that several behaviors can be combined simultaneously. For example: designating with a hand while looking at something or someone, moving one's head and showing a particular facial expression.

Behavior of avatars / animation mode:

Since the behavior of each user is managed at the level of his own terminal, the exact posture of the avatar is calculated at that terminal. The exception is the direction of addressed gestures and looks, since these depend on the position of the avatar at the remote terminal and on the layout of the scene there. The information received remotely about avatars is of two types:

- A state vector, that is to say the instantaneous postures that the avatars must take at each instant.

- Addressing information on the objects looked at or designated; more precisely, one item of information for the eyes and another for each hand.

It is therefore the remote terminal which, depending on the scene layout and the position of the avatar, uses the addressing information to animate the eyes and hands of the avatar.
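By way of non-limiting illustration, the two types of information received remotely, and the way a remote terminal might turn addressing information into directions, can be sketched as follows. The structure `AvatarUpdate`, its field names, the use of 3D coordinates and the helper functions are assumptions introduced for the sketch; the description does not specify a data format.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AvatarUpdate:
    posture: Dict[str, float]                # state vector (instantaneous posture)
    gaze_target: Optional[str] = None        # addressing information for the eyes
    left_hand_target: Optional[str] = None   # addressing information per hand
    right_hand_target: Optional[str] = None


def direction_to(origin: Vec3, target: Vec3) -> Optional[Vec3]:
    """Unit vector from origin to target, or None if they coincide."""
    delta = tuple(t - o for o, t in zip(origin, target))
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        return None
    return tuple(d / norm for d in delta)


def resolve_addressing(update: AvatarUpdate, avatar_position: Vec3,
                       scene: Dict[str, Vec3]) -> Dict[str, Optional[Vec3]]:
    """Compute gaze and hand directions from the local scene layout."""
    directions = {}
    for channel in ("gaze", "left_hand", "right_hand"):
        target = getattr(update, channel + "_target")
        if target in scene:  # unknown or absent targets are skipped
            directions[channel] = direction_to(avatar_position, scene[target])
    return directions
```

Sending the name of the target rather than a direction lets each terminal aim the eyes and hands correctly for its own arrangement of the scene.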

Table 1 below describes the behaviors that a user will adopt at his terminal if he wants to reproduce as faithfully as possible the sequence of behaviors described above for the real situation (especially the non-verbal behaviors).

[Table 1 is only partly legible in the source. The legible entry concerns the step in which player 1 looks at and designates player 2. User behavior: the user acts on an action device, clicking on the button to activate control of his addressed gesture. Detection: the intention to switch to addressed semiotic mode; a metaphor of the virtual hand appears in interface 1 to give the user the means to control his avatar. Avatar behavior: avatar 1 looks at and designates avatar 2 in all interfaces and, in interface 2, looks at and points to the user (who is player 2). Etc.]

Table 1: Behavior of the avatar in relation to user behavior

Claims

1. Method for animating avatars corresponding to people to ensure communication in a non-immersive virtual environment between said people by means of a terminal provided with a display screen and peripherals, the terminal comprising means for controlling animation and displaying images corresponding to the virtual environment with avatars, from information received from its peripherals and from a communication network to which it is connected, characterized in that it comprises:

- the implementation of means for ensuring intentional functions (action, perception and expression) of a person and the supply of corresponding information, then an analysis of the behavior of the person comprising a cycle consisting of performing the following operations N times:

- scanning for the arrival of information on the intentional functions,

- detection of intentions from the information received,

- determination of the mode of animation of the avatar in the virtual environment corresponding to the person according to the intention detected for this person;

and finally, representation of said virtual environment and the avatars, at each cycle.

2. A method of animating avatars according to claim 1, characterized in that the implementation of intentional functions comprises capturing the behavior of the user and capturing interactions for an action in the environment.

3. Method of animating avatars according to claim 2, characterized in that the capture of the behavior comprises the capture of addressed semiotic and/or unaddressed semiotic, ergotic or epistemic behavior.

4. A method of animating avatars according to any one of the preceding claims, characterized in that the intention detection comprises, by unit of expression, the determination of the predefined type of intention, namely:

- unaddressed semiotic,

- addressed semiotic,

- ergotic or epistemic.

5. A method of animating avatars according to any one of the preceding claims, characterized in that the animation mode of the avatar corresponding to the person corresponds to:

- an isomorphic animation of the avatar in relation to the behavior of the person in the case where the type of intention is unaddressed semiotic,

- a directed animation of the avatar (gaze, designation by the hand) in the case where the type of intention is addressed semiotic,

- a generation of a posture and a behavior symbolizing the action of the person in the case where the type of intention is ergotic or epistemic,

- a generation of a posture and a symbolic behavior in the case where the type of intention is not predefined.

6. A method of animating avatars according to any one of the preceding claims, characterized in that it comprises the implementation of visual feedback on the user's terminal making it possible to regulate the mode of animation.

7. Avatar animation device corresponding to people to ensure communication in a non-immersive virtual environment between people, comprising a terminal (T) provided with a display screen (E) and input devices, the terminal comprising display means (UC) of the virtual environment and avatars from information received from its peripherals and from a telecommunications network, characterized in that the terminal comprises:

- means (A, B) for implementing intentional functions of a user of the device and providing corresponding information,

- means (L) for detecting intentions from the information received and for determining the animation mode,

- means (UC) for animating the corresponding avatar according to the intention detected by the detection means.

8. Avatar animation device according to claim 7, characterized in that the means for implementing the intentional functions include behavior sensors (A) and sensors for actions in the environment (B).

9. Avatar animation device according to claim 8, characterized in that the means for implementing the intentional functions comprise sensors of addressed intentional behavior and/or sensors of unaddressed intentional behavior.

10. Avatar animation device according to claims 8 or 9, characterized in that the various sensors are input devices (A, B) of the terminal (T).

11. Avatar animation device according to any one of claims 8 to 10, characterized in that the behavior is captured by one or more cameras, one or more radars, one or more microphones.

12. Avatar animation device according to any one of claims 7 to 11, characterized in that the detection of an action in the environment is carried out by pointing and manipulation/navigation devices.

13. Avatar animation device according to any one of claims 7 to 12, characterized in that the means for detecting the intentions from the information received comprise a software module for detecting the intention and determining the animation mode.

14. Avatar animation device according to claim 13, characterized in that the software module is capable of detecting the user's non-activity or behavior unrelated to the activity.