WO2007031584A1

WO2007031584A1 - Method for animating a three-dimensional conversational agent including a step of controlling the visual behaviour and corresponding device and computer programme

Info

Publication number: WO2007031584A1
Application number: PCT/EP2006/066428
Authority: WO
Inventors: Gaspard Breton; Christophe Garcia; Danielle Pele
Original assignee: France Telecom
Priority date: 2005-09-16
Filing date: 2006-09-15
Publication date: 2007-03-22

Abstract

The invention concerns a method for animating a three-dimensional conversational agent (15) in a virtual or mixed reality environment, including a step of controlling the visual behaviour of the agent (15), during a conversation with at least two other parties. The invention is characterized in that the controlling step includes a step of selecting a target at which the agent (15) gazes, integrating: at least a history of the visual behaviour, said history incorporating, for at least one of the parties, successive interactions of said agent with said party; and at least one state of the conversation, called conversational state, the conversational state belonging to a set of states including at least one state of dialogue and one state of rest.

Description

A method of animating a three-dimensional conversational agent implementing a visual behavior control step, device and corresponding computer program.

1. Field of the Invention

The field of the invention is that of the animation of three-dimensional (3D) conversational agents in a virtual or mixed reality environment.

In particular, "three-dimensional conversational agent" refers to a three-dimensional representation of a character in a virtual environment. The invention more specifically relates to a technique for controlling the visual behavior of a conversational agent during a conversation with several interlocutors, notably to improve interaction with the interlocutors.

The invention applies in particular, but not exclusively, to virtual reality systems in the context of services, games, discussion forums or computer-assisted collaborative work applications.

2. Prior Art

Many techniques for animating an avatar, in particular for controlling its visual behavior, are well known.

Classically, these techniques require the avatar to look at each of the interlocutors in turn, relying on a vision system capable of detecting these interlocutors.

Other more recent studies have found that the semantic content of a conversation is also important for finely controlling visual servoing.

Thus, O. E. Torres, J. Cassel, and S. Prévost describe, in their article "Modeling Gaze Behavior as a Function of Discourse Structure" (Workshop on Human Computer Conversations, Bellagio, Italy, 1997), an experiment carried out on real people filmed in a dialogue situation. This experiment showed that the nature of a proposition, for example thematic or rhematic, influences the visual behavior of real people.

Recall that, in semantics, the theme is the element of an utterance that is already known to the participants in the conversation (that is to say, it refers to something already mentioned), and the rheme is the new information that the utterance provides.

Thus, this article suggests taking into account the semantic content of the conversation to animate the eyes of a three-dimensional conversational agent. Other techniques of animation of a three-dimensional conversational agent in a virtual environment or mixed reality have also been described in the articles cited in Appendix 1, an integral part of this description.

3. Disadvantages of the Prior Art

Conventional animation techniques based on a vision system capable of detecting the interlocutors present in the avatar's field of vision are generally basic and not very representative of the real behavior of a natural person. Because of this lack of realism, they hamper the interactivity between a conversational agent and a user during a dialogue.

In particular, these techniques generally do not allow fine modeling of the coordination between head movements and eye movements.

More recent animation techniques, based on the study of the semantic content of the conversation, offer better results in terms of modeling human behavior, which gives a more realistic rendering during a conversation between an agent and a user.

However, these techniques are poorly developed and were only considered in the context of face-to-face discussions between an avatar and an interlocutor. They thus provide no solution to the problem of controlling the visual behavior of an avatar who is in a dialogue with a group of several users (which interlocutor to look at, when, with what insistence, ...).

In other words, these techniques do not make it possible to improve the interactivity between the interlocutors during a dialogue situation between an avatar and a group of several users.

4. Objectives of the invention

The invention particularly aims to overcome these disadvantages of the prior art.

More specifically, an object of the invention is to provide a technique for animating a three-dimensional conversational agent that gives it a more realistic and fluid movement than the techniques of the prior art, in the context of a conversation with several interlocutors.

In particular, an objective of the invention is to implement such a technique for controlling the visual behavior of a three-dimensional conversational agent, in the context of such a conversation with two or more interlocutors.

The invention also aims to provide such a technique that is effective, simple to implement and inexpensive.

5. Objective of the Invention

These and other objects, which will appear later, are achieved by a method of animating a three-dimensional conversational agent in a virtual or mixed-reality environment, implementing a step of controlling the visual behavior of the agent during a conversation with at least two interlocutors.

According to the invention, the step of controlling the visual behavior implements a step of selecting a target that the agent looks at, taking into account:

- at least one history of the visual behavior, the history taking into account, for at least one of the interlocutors, the successive interactions of the agent with the interlocutor;
- and at least one state of the conversation, called conversation state, the conversation state belonging to a set of states comprising at least one dialogue state and a state of rest.

Thus, the invention proposes a completely new and inventive approach to the animation of a 3D conversational agent, based on the joint use of information representative of a state of conversation between the agent and at least two interlocutors, and its past visual behavior, to select a target to watch, and more generally control different parameters of the visual behavior of the agent.

In particular, an interlocutor corresponds, in the context of the present invention, to a physical user or another 3D conversational agent.

Thus, the conversational agent can look for a first predetermined duration at a main interlocutor, selected by taking into account the history of the agent's visual behavior and a state of the conversation, and then look at a second interlocutor, or at a target distinct from the interlocutors, for a second predetermined duration, the second duration being shorter than the first. The agent can thus alternately look at the main interlocutor and at an arbitrary target.

Controlling the visual behavior of the conversational agent then makes it possible to improve the interactions between the agent and the group of interlocutors, since the agent can, according to the invention, choose where to look, depending on the information to convey, the history of the conversation, possibly its subject, etc.

Advantageously, the selection step also takes into account a distance from the agent to at least one of the interlocutors. The agent can thus determine at regular time intervals the position of each of the interlocutors, as well as a space in which no interlocutor is located.

Taking into account the distance of the avatar to each interlocutor makes it possible to improve the realism of the movement of his eyes, the inventors having determined that such a criterion was essential in the context of a discussion between real people.

Preferably, the control step determines at least one parameter of the visual behavior belonging to the group comprising:

- a frequency of the glances of the agent;
- a duration of the glances of the agent.

The invention thus makes it possible to create an additional communication channel between the agent and the interlocutors, since it makes it possible to control the gaze of the agent so that the latter looks at an interlocutor with more or less insistence depending, for example, on the message he seeks to convey or receives.

Advantageously, the history takes into account, for at least one of the interlocutors, the time during which the agent has already looked at that interlocutor, called gaze time.

This gaze time is specific to each interlocutor. The agent can thus select the interlocutor he will look at by considering the gaze time he has already granted to each one, and the distance between the interlocutor and the agent.

Preferably, during the selection step, each interlocutor i is associated with a probability interval $I_i$ included in the interval $[0, 1]$, such that the union of the probability intervals $I_i$ is the interval $[0, 1]$, the probability intervals $I_i$ being disjoint.

These intervals of probabilities are determined in particular as a function of the distance of the interlocutor i to the agent, and the history of the visual behavior of the agent, that is to say the viewing time that the agent has already devoted to watching the user i.

Moreover, these probability intervals are regularly updated (for example approximately every 10 ms). Thus, the position of the target is regularly re-evaluated, which makes it possible to control the gaze of the agent according to a possible displacement of the target.

In particular, the length of the probability interval associated with the interlocutor i is equal to a normalized relevance score $S_i$, determined according to the equation:

$$S_i = \frac{a \cdot Sd_i + b \cdot St_i}{\sum_{j=1}^{N} \left( a \cdot Sd_j + b \cdot St_j \right)}$$

where:

$$Sd_i = \frac{d_i}{\sum_{j=1}^{N} d_j}$$

$$St_i = \frac{t_i^{-1}}{\sum_{j=1}^{N} t_j^{-1}} \text{ for } t_i \neq 0, \text{ and } St_i = 1 \text{ for } t_i = 0;$$

with:

- N the number of interlocutors present in the agent's field of vision;
- $d_i$ the distance between the interlocutor i and the agent;
- $t_i$ the gaze time associated with the interlocutor i;
- $a \in [0, 1]$, $b \in [0, 1]$, $a + b = 1$.

The values a and b make it possible in particular to weight the score $Sd_i$ linked to the distance $d_i$ and the score $St_i$ linked to the gaze time $t_i$, and therefore to assign more or less importance to the distance criterion or to the gaze time criterion.

For example, choosing a = 0.5 and b = 0.5 gives equal importance to the distance criterion and the gaze time criterion.

Note that when $t_i = 0$, $St_i = 1$. Therefore, when an interlocutor arrives during a conversation, his gaze time is equal to 0, so his score $St_i$ is high, and this new interlocutor is more likely to be looked at by the conversational agent.

Preferably, when the conversation state corresponds to the state of rest, an arbitrary target is selected, and the control step requires the agent to glance in the direction of the target at a frequency greater than a first predetermined threshold. When the conversation state corresponds to the dialogue state, a target corresponding to one of the interlocutors and a target distinct from each of the interlocutors are selected alternately, and the control step requires the agent to glance in the direction of each of the targets at a frequency greater than a second predetermined threshold.

For example, in a state of rest, the step of controlling the visual behavior of the avatar requires the latter to glance in a random direction at intervals of more than 1 second: the agent can for example look at the floor for 2 seconds, then look at someone for 3 seconds, then look away for 3 seconds, then look at another person for 2 seconds, etc.

In a dialogue state, the agent can, for example, look at the floor for 1 second, then look at an interlocutor for 8 seconds, then look away for 3 seconds, then look at another person for 6 seconds, and so on.

The agent then seems more attentive when the state of conversation corresponds to a state of dialogue, and looks more frequently at the interlocutors. On the other hand, when the state of conversation corresponds to a state of rest, it lets its gaze rest on any target.

The invention thus makes it possible to animate the avatar, and in particular his gaze, in a way that closely resembles human behavior.

Advantageously, when the control step requires the agent to look at one of the interlocutors, a value is drawn at random in the interval $[0, 1]$, and the selected target is the interlocutor whose associated probability interval $I_i$ contains the drawn value.

In other words, when an interlocutor to look at must be selected from the group of interlocutors, according to the state of the conversation, a value between 0 and 1 is drawn at random and the interlocutor whose associated probability interval contains this value is selected: if the value 0.5 is drawn, the selected target is the interlocutor i whose probability interval $I_i$ contains the value 0.5.

Preferably, when the conversation state corresponds to the dialogue state, the control step takes into account dialogue information representative of the semantic content of the conversation, and makes it possible to increase the frequency of the glances when the semantic content is of rheme type, compared with the frequency when the semantic content is of theme type.

For example, at least at the beginning of a rheme state (corresponding to the emission of, or listening to, a proposition containing a new element), the agent can first look at the interlocutor who is listening or speaking, and then glance intermittently at this interlocutor at short intervals (less than a second).

On the other hand, in a theme state (corresponding to the emission of, or listening to, a proposition referring to an element already mentioned), the agent can glance intermittently towards the interlocutor and towards a place where there is no interlocutor, at intervals of more than one second.

The invention also relates to a device for animating a three-dimensional conversational agent in a virtual environment or mixed reality, comprising means for controlling the visual behavior of the agent during a conversation with at least two interlocutors.

According to the invention, the control means comprise means for selecting a target that the agent looks at, from:

- at least one history of the visual behavior, the history taking into account, for at least one of the interlocutors, the successive interactions of the agent with the interlocutor;
- and at least one state of the conversation, the conversation state belonging to a set of states comprising at least one dialogue state and a state of rest.

The invention finally relates to a computer program product downloadable from a communication network and/or stored on a computer-readable medium and/or executable by a microprocessor, comprising program code instructions for carrying out the steps of the method of animating a three-dimensional conversational agent described previously.

6. List of figures

Other features and advantages of the invention will appear more clearly on reading the following description of a preferred embodiment, given as a simple illustrative and non-limiting example, and the appended drawings, among which:

- FIG. 1 shows a device for animating a three-dimensional conversational agent according to a preferred embodiment of the invention;
- FIG. 2 illustrates the different states of conversation of the device according to FIG. 1;
- FIG. 3 describes the principle of the vestibulo-ocular reflex;
- FIG. 4 represents the angular velocity curves used according to the invention for modeling the rotation of the eyes and the neck of a conversational agent;
- FIG. 5 illustrates the movements of the neck and eyes of the agent from an articulated chain;
- FIG. 6 shows the rotation techniques applied to the neck and eyes of the agent;
- FIG. 7 shows the structure of a system for animating a three-dimensional conversational agent according to the invention.

7. Description of an Embodiment of the Invention

The general principle of the invention is based on the joint use of information representative of a state of conversation between a three-dimensional (3D) conversational agent and at least two interlocutors, and of a history of the agent's visual behavior during the conversation, to select a target that the agent must look at and to control his gaze, in order to animate the agent more realistically. In particular, it is considered that the history of the visual behavior of the agent takes into account, for at least one of the interlocutors, successive interactions of the agent with the interlocutor.

Specifically, such a 3D conversational agent can interact with real users and/or other conversational agents, hereinafter called interlocutors.

Fine modeling of the visual behavior of the conversational agent thus makes it possible to improve the interactions with the interlocutors, since the agent can, according to the invention, choose where to look depending on the information to be conveyed, thereby creating an additional communication channel.

With reference to FIG. 1, a preferred embodiment of the invention is presented in which a 3D conversational agent 15 converses with at least two interlocutors, and in which video equipment 11, of the digital camera ("webcam") type, transmits to a vision system 12 information relating to the interlocutors located in the field of view of the agent 15.

According to this embodiment, the behavior of the agent 15 is controlled by a behavior engine 14, fed by the vision system 12 and a dialogue system 13.

The vision system 12 makes it possible in particular to detect the interlocutors and determine their actual position in the field of view of the agent 15, and to calculate the distance of each of the N interlocutors to the agent 15, for example based on a rough calibration taking into account the parameters of the video equipment 11 and the average size of a head. In particular, this vision system 12 can detect the faces of the interlocutors in the images from the video equipment 11 and thus identify the interlocutors, using known face detection techniques, which are not the subject of the present invention and are not described here in more detail. For more information, reference may be made in particular to French patent application FR 05 03047, filed in the name of the same applicant as the present patent application.
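As a rough illustration of such a calibration, the distance can be estimated with a pinhole-camera approximation from the apparent height of a detected face. This is only a sketch: the pinhole model, the focal length expressed in pixels, the 0.24 m average head height and the function name are assumptions made for the example, not values taken from this description.

```python
def estimate_distance_m(face_height_px: float,
                        focal_length_px: float,
                        avg_head_height_m: float = 0.24) -> float:
    """Estimate the camera-to-face distance with a pinhole-camera model.

    distance = focal_length * real_height / apparent_height
    All parameter values are illustrative assumptions.
    """
    if face_height_px <= 0:
        raise ValueError("face_height_px must be positive")
    return focal_length_px * avg_head_height_m / face_height_px


# A face 120 px tall seen by a camera whose focal length is 800 px
distance = estimate_distance_m(face_height_px=120, focal_length_px=800)
print(f"{distance:.2f} m")
```

A smaller detected face yields a larger estimated distance, which is all the relevance score needs: a consistent ordering of the interlocutors rather than a precise measurement.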

The dialogue system 13 provides the behavior engine 14 with indications on the current state of the conversation, namely a state of rest or a state of dialogue. More precisely, the dialogue state corresponds either to a speech transmission state, in which the agent speaks with the interlocutors, or to a listening state, in which the agent listens to at least one interlocutor.

The dialogue system 13 also makes it possible to determine dialogue information representative of the semantic content of the conversation. Thus, this semantic information tells the behavior engine 14 whether the words spoken by the agent 15, or by one of the interlocutors, are of theme or rheme type.

It is recalled that the theme corresponds to an element of a proposition that refers to something that has already been mentioned, while the rheme corresponds to the new element of a proposition.

Again, the operating principle of such a dialogue system 13 is known, and in particular described by D. Sadek, P. Bretier, and F. Panaget, in the document "ARTIMIS: Natural dialogue meets rational agency" (Proceedings of the 15th International Joint Conference on Artificial Intelligence, Nagoya, Japan, 1997), and is therefore not described here in more detail.

The behavior engine 14 thus takes into account, according to the preferred embodiment of the invention, the information from the vision system 12 and the dialogue system 13 to animate the agent 15, and in particular his gaze.

Specifically, when the agent is chatting with a group of interlocutors, he must select the target he is to watch during the conversation. According to the invention, this selection is made according to the state of the conversation and a history of the visual behavior of the agent, the history taking into account, for at least one of the interlocutors, successive interactions of the agent with the interlocutor. We can notably notice that this history of visual behavior takes into account the duration of a conversation with a group of interlocutors since its inception.

Indeed, if the invention is implemented in an interactive terminal, for example, the interlocutors interacting with the agent are likely to change regularly, which resets the history of visual behavior.

More specifically, according to an exemplary embodiment, the vision system 12 detects the interlocutors present in the field of view of the agent 15 and assigns an identifier to each of them. The behavior engine 14 keeps in particular the history of the behavior related to each interlocutor i and assigns him an "age". When the interlocutor i enters the field of vision of the agent, it can be considered that there is no history of the visual behavior related to this interlocutor (the gaze time attributed to the interlocutor i is therefore zero, and the behavior engine gives him an age equal to 0).

Whenever the vision system 12 sends the list of detected interlocutors to the behavior engine 14, the latter sets the age of the detected interlocutors to 0 and ages the others, that is to say, increments their age by one unit. Thus, the interlocutors who remain in the avatar's field of vision keep an age equal to 0, while the interlocutors who are no longer in the avatar's field of vision age, that is to say, have an age greater than 0. When an interlocutor has an age greater than n (with n = 10 for example), the history of the visual behavior related to this interlocutor is destroyed. Thus, if an interlocutor leaves the agent's field of vision for a few moments, the history related to this interlocutor is not reset instantly, but kept for a few seconds (corresponding to n iterations, that is, n detections by the vision system 12).

Thus, if the interlocutor i returns to the agent's field of view after a few moments of absence, the history of the visual behavior related to this interlocutor is preserved. On the other hand, if he leaves the avatar's field of vision for a longer period, the history of the visual behavior related to this interlocutor is destroyed.

According to an alternative embodiment, it can also be considered that the history only takes into account the last five minutes of the conversation.
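The ageing mechanism described above can be sketched as follows. This is a minimal illustration, not the behavior engine itself: the class name, the dictionary layout and the `add_gaze` helper are hypothetical, and the history is reduced here to the accumulated gaze time.

```python
class GazeHistory:
    """Per-interlocutor gaze time and 'age' (illustrative sketch)."""

    def __init__(self, max_age: int = 10):
        self.max_age = max_age   # n in the text
        self.gaze_time = {}      # identifier -> seconds already looked at
        self.age = {}            # identifier -> iterations since last detection

    def update(self, detected_ids):
        """Process one list of detected interlocutors from the vision system."""
        detected = set(detected_ids)
        for ident in detected:
            self.gaze_time.setdefault(ident, 0.0)  # newcomer: no history yet
            self.age[ident] = 0
        for ident in list(self.age):
            if ident not in detected:
                self.age[ident] += 1
                if self.age[ident] > self.max_age:
                    # Absent for too long: the history is destroyed
                    del self.age[ident]
                    del self.gaze_time[ident]

    def add_gaze(self, ident, seconds: float):
        """Record that the agent looked at `ident` for `seconds`."""
        self.gaze_time[ident] += seconds


history = GazeHistory(max_age=2)
history.update(["A", "B"])
history.add_gaze("A", 3.0)
history.update(["B"])        # A is absent: age 1, history kept
history.update(["A", "B"])   # A is back: age 0, gaze time still 3.0
```

Keeping the gaze time for a few iterations means that an interlocutor who is briefly missed by the face detector does not reappear as a newcomer with a zero gaze time.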

In particular, when the conversation state corresponds to a state of rest, the agent alternately looks at at least one interlocutor, then at an arbitrary target. The arbitrary target is chosen at random in the space in front of the agent. Indeed, in such a state of rest, it is natural to let one's eyes wander over the various elements of the scene.

When the conversation state corresponds to a dialogue state, the agent alternately looks at at least one interlocutor and then at a target distinct from each of the interlocutors.

More precisely, when the control step requires the agent to look at one of the interlocutors, a value is drawn at random in the interval $[0, 1]$, and the interlocutor corresponding to the drawn value is selected.

To do this, each interlocutor i is associated with a probability interval $I_i$ included in the interval $[0, 1]$, such that the union of all the probability intervals $I_i$ is the interval $[0, 1]$, the probability intervals $I_i$ being disjoint.

The length of the probability interval $I_i$ associated with the interlocutor i is equal to a normalized relevance score $S_i$, determined by the behavior engine 14 and assigned to each of the N interlocutors present in the avatar's field of vision.

These relevance scores depend in particular on the distance of the interlocutor i to the agent 15, and on the history of the visual behavior of the agent, that is to say, on the gaze time that the agent 15 has already devoted to looking at the interlocutor i.

In particular, this relevance score can be calculated in the following manner, for each interlocutor i among the group of N interlocutors:

$$Sd_i = \frac{d_i}{\sum_{j=1}^{N} d_j}$$

$$St_i = \frac{t_i^{-1}}{\sum_{j=1}^{N} t_j^{-1}} \text{ for } t_i \neq 0, \text{ and } St_i = 1 \text{ for } t_i = 0;$$

with:

- $d_i$ the distance between the interlocutor i and the agent;
- $Sd_i$ the score related to this distance;
- $t_i$ the gaze time already assigned to the interlocutor i;
- $St_i$ the score related to this time;
- N the number of interlocutors present in the agent's field of vision at the time of the evaluation.

The total score $S_i$ is then calculated simply by weighting the two scores $Sd_i$ and $St_i$ by the values a and b, according to the relative importance given to the distance and to the gaze time, and is then normalized:

$$S_i = \frac{a \cdot Sd_i + b \cdot St_i}{\sum_{j=1}^{N} \left( a \cdot Sd_j + b \cdot St_j \right)}$$

with $a \in [0, 1]$, $b \in [0, 1]$, and $a + b = 1$.

In other words, the values a and b make it possible to attribute more or less importance to the criterion of viewing time or distance. We can choose for example a = 0.5 and b = 0.5, which means that we give equal importance to both criteria.

If we want to give more importance to the relative positions of the interlocutors with respect to each other and with respect to the agent (for example in the case of a large group of interlocutors), we increase the value of a, at the expense of the value of b.

We note in particular that when $t_i = 0$, we have $St_i = 1$. Therefore, when an interlocutor arrives during a conversation, his score $St_i$ is high, and this new interlocutor is more likely to be looked at by the conversational agent.
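The score computation can be sketched in a few lines. This is an illustration under stated assumptions: the function name is hypothetical, the distance score is read as $Sd_i = d_i / \sum_j d_j$, and the sum in the gaze-time score is assumed to run only over interlocutors whose gaze time is non-zero.

```python
def relevance_scores(distances, gaze_times, a=0.5, b=0.5):
    """Return the normalized relevance score S_i of each interlocutor.

    distances[i] is d_i and gaze_times[i] is t_i; a and b satisfy a + b = 1.
    """
    # Distance score, read here as Sd_i = d_i / sum_j d_j (assumption).
    total_d = sum(distances)
    sd = [d / total_d for d in distances]

    # Gaze-time score: St_i = (1/t_i) / sum_j (1/t_j), and St_i = 1 when t_i = 0.
    # Assumption: the sum only covers interlocutors with a non-zero gaze time.
    inv_total = sum(1.0 / t for t in gaze_times if t != 0)
    st = [1.0 if t == 0 else (1.0 / t) / inv_total for t in gaze_times]

    # Weighted sum, then normalization so that the scores add up to 1.
    raw = [a * sd_i + b * st_i for sd_i, st_i in zip(sd, st)]
    total = sum(raw)
    return [r / total for r in raw]


# Two interlocutors at the same distance, looked at for 4 s and 2 s
scores = relevance_scores(distances=[2.0, 2.0], gaze_times=[4.0, 2.0])
print([round(s, 2) for s in scores])  # [0.42, 0.58]
```

The final normalization matters as soon as one interlocutor has a zero gaze time, because the weighted scores then no longer sum to 1 on their own.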

The different probability intervals associated with each of the interlocutors i may in particular be determined according to the following algorithm:

MinBound = 0
For each user i
    MinBound_i = MinBound
    MaxBound_i = MinBound_i + S_i
    MinBound = MaxBound_i
End for

For example, if the interlocutor i has been looked at for 4 seconds ($t_i = 4$) and the interlocutor j has been looked at for 2 seconds ($t_j = 2$), and considering that the interlocutors i and j are at exactly the same distance ($d_i = d_j$), we obtain:

$Sd_i = 1/2$, $St_i = 1/3$, $S_i = 0.42$, $Sd_j = 1/2$, $St_j = 2/3$, $S_j = 0.58$, with a = b = 0.5.

The probability interval of the interlocutor i thus determined is the interval $I_i = [0, 0.42)$ and the probability interval of the interlocutor j is the interval $I_j = [0.42, 1]$.
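The interval construction amounts to a running sum over the normalized scores. A minimal sketch, in which the function name and the representation of an interval as a (lower, upper) tuple are choices made for the example:

```python
def build_intervals(scores):
    """Map normalized scores S_i to contiguous intervals (lower_i, upper_i)."""
    intervals = []
    lower = 0.0
    for s in scores:
        upper = lower + s          # the interval length equals the score
        intervals.append((lower, upper))
        lower = upper              # the next interval starts where this one ends
    return intervals


# Scores of the example: S_i = 0.42 and S_j = 0.58
for name, (lo, hi) in zip("ij", build_intervals([0.42, 0.58])):
    print(f"I_{name} = [{lo:.2f}, {hi:.2f}]")
```

Because each interval starts at the upper bound of the previous one, the intervals are disjoint and their union covers the whole range from 0 to 1 as long as the scores sum to 1.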

It can notably be noted that these intervals of probabilities are constantly updated (for example every 10 ms approximately).

Thus, the position of the target is re-evaluated regularly, which makes it possible to control the gaze of the agent according to a possible movement of the target, and further increases the realism of its visual behavior.

As indicated above, when it is necessary to choose one of the interlocutors to look at according to the state of the conversation, one randomly draws a value between 0 and 1 and selects the interlocutor whose associated probability interval contains this value.

In the previous example, if the value 0.5 is drawn, the selected target corresponds to the interlocutor j, whose probability interval $I_j = [0.42, 1]$ contains the value 0.5.
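The draw itself can be sketched with a binary search over the upper bounds of the intervals. The function name and the optional `draw` argument (useful to reproduce a given example) are hypothetical; `random.random()` is used here as the uniform draw, which is an implementation choice.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pick_interlocutor(scores, draw=None):
    """Return the index of the interlocutor whose interval contains the draw."""
    upper_bounds = list(accumulate(scores))   # upper bound of each interval
    if draw is None:
        draw = random.random()                # uniform value in [0, 1)
    index = bisect_right(upper_bounds, draw)
    return min(index, len(scores) - 1)        # guard against rounding at 1.0


# Scores 0.42 (interlocutor i, index 0) and 0.58 (interlocutor j, index 1)
print(pick_interlocutor([0.42, 0.58], draw=0.5))  # 1, i.e. interlocutor j
```

Over many draws, each interlocutor is selected with a probability equal to the length of his interval, that is, his normalized relevance score.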

When the visual behavior control step requires the agent to look at a target distinct from each of the interlocutors, i.e. to look at a place where he is sure there are no interlocutors, an arbitrary target is selected during the selection step, then a check is made to verify that this target is sufficiently far from each of the interlocutors, taken one by one.

If this criterion is not met, a new target is selected, and the distance criterion is checked again, until it is met. To do this, the behavior engine can for example implement the following algorithm:

Found = false
While not Found
    Target = draw a target at random
    Far_enough = true
    For each user i
        If distance(Target, i) < DistanceMin
            Far_enough = false
        End if
    End for
    If Far_enough = true
        Found = true
    End if
End while

Moreover, when the state of the conversation corresponds to a state of dialogue, the step of controlling the visual behavior of the conversational agent takes into account dialogue information representative of the semantic content of the conversation, and makes it possible to increase the frequency of the glances towards an interlocutor when the semantic content is of rheme type, compared with the frequency when the semantic content is of theme type.

Thus, according to an exemplary embodiment of the invention:

- when the conversation state is a state of rest, the conversational agent glances in a random direction at intervals of more than one second;
- when the conversation state is a dialogue state corresponding to a listening state, the agent throws intermittent glances towards the interlocutor and towards a place where there is no interlocutor, at intervals of more than one second;
- when the conversation state is a dialogue state corresponding to a speech transmission state, the agent throws intermittent glances towards the interlocutor and towards a place where there is no interlocutor. More precisely, at least at the beginning of a rheme state (corresponding to the emission of a proposition containing a new element), the agent glances at the interlocutor and then throws closely spaced intermittent glances (less than one second apart) at the interlocutor, and in a theme state (corresponding to the emission of a proposition referring to an element already mentioned), the agent throws intermittent glances towards the interlocutor and towards a place where there is no interlocutor, at intervals of more than one second.
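Returning to the selection of a target away from every interlocutor, the draw-and-check loop can be sketched as a rejection sampler. Several elements are assumptions made for the example: targets and interlocutors are 2D points, the sampling region `bounds` is an arbitrary rectangle in front of the agent, and `max_tries` is added as a safeguard that the algorithm in the text does not have.

```python
import math
import random


def pick_target_away_from_users(user_positions, min_distance,
                                bounds=((-2.0, 2.0), (0.0, 2.0)),
                                rng=random, max_tries=1000):
    """Draw random 2D targets until one is far enough from every user."""
    (x_min, x_max), (y_min, y_max) = bounds
    for _ in range(max_tries):
        target = (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
        # The target is accepted only if no user is closer than min_distance
        far_enough = all(math.dist(target, user) >= min_distance
                         for user in user_positions)
        if far_enough:
            return target
    raise RuntimeError("no valid target found; relax min_distance or bounds")


users = [(0.0, 1.0), (1.0, 1.0)]
target = pick_target_away_from_users(users, min_distance=0.5)
print(target)
```

The bound on the number of tries avoids an endless loop when the interlocutors cover the whole sampling region, a case the unbounded loop would never leave.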

One can thus consider that there are five states of conversation, counting the sub-states of the state of dialogue: state of rest, speech transmission state, listening state, theme state and rheme state.

These five states can in particular be decomposed into sub-states corresponding to the beginning of the state, to the current state, and to the end of the state. These sub-states are as follows:

- Start: the device "enters" the state, that is to say, for example, it goes from a state of rest to a state of dialogue. It must often perform a particular task;
- In_progress: the device "iterates" in the state, making a new decision each time;
- End: the device "exits" the state, that is to say, for example, it returns from a state of dialogue to a state of rest.

As illustrated in FIG. 2, the behavior engine 14 can thus be realized in the form of a controller that makes a decision each time it enters a state, among the listening 21, rest 22, speech transmission 23, rheme 24 or theme 25 states, each state being decomposed into three sub-states: Start 221, In_progress 222 and End 223.
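Such a controller can be sketched as a small state tracker that reports which sub-states are traversed at each iteration. The class and method names are hypothetical, the sub-state names are normalized English renderings, and the enum values simply reuse the reference numerals of FIG. 2 for readability.

```python
from enum import Enum


class State(Enum):
    LISTENING = 21
    REST = 22
    SPEECH = 23
    RHEME = 24
    THEME = 25


class SubState(Enum):
    START = "Start"
    IN_PROGRESS = "In_progress"
    END = "End"


class ConversationController:
    """Track the current state and emit (state, sub-state) events."""

    def __init__(self):
        self.state = None  # no state entered yet

    def step(self, reported_state):
        """Return the events triggered by the state reported for this iteration."""
        if reported_state == self.state:
            return [(self.state, SubState.IN_PROGRESS)]
        events = []
        if self.state is not None:
            events.append((self.state, SubState.END))    # leave the old state
        self.state = reported_state
        events.append((self.state, SubState.START))      # enter the new state
        return events


controller = ConversationController()
for reported in (State.REST, State.REST, State.LISTENING):
    print(controller.step(reported))
```

A gaze decision can then be attached to each emitted event, so that entering, iterating in and leaving a state may each trigger a different action.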

As described above, this decision corresponds to the selection of a target that the conversational agent must look at:
- either the target is an interlocutor, and the behavior engine 14 determines which interlocutor to look at, according to the probability-interval technique described previously. This decision is called "Look_at_the_users" below;
- or the target is arbitrary, and the behavior engine 14 selects a random target in the field of view of the conversational agent. This decision is called "Look_anywhere" below;
- or the target is distinct from every interlocutor, that is to say, the behavior engine 14 chooses as a target a place where it is certain that there is no interlocutor. This decision is called "Look_anywhere_except_a_user" below.

Thus, according to an exemplary embodiment of the invention, the behavior engine 14 controls the 3D conversational agent according to the state of the conversation and the history of the visual behavior of the agent, so that:
- when the conversation state is a rest state, the duration of the glances varies between 4 and 6 seconds:
  o at the beginning of the rest state, the agent performs no action;
  o during the "en_cours" sub-state, the agent alternates between "Look_at_the_users" and "Look_anywhere";
  o at the end of the rest state, the agent performs no action;
- when the conversation state is a listening state, the duration of the glances varies between 5 and 8 seconds:
  o at the beginning of the listening state, the agent performs "Look_at_the_users";
  o during the "en_cours" sub-state, the agent alternates between "Look_at_the_users" and "Look_anywhere_except_a_user";
  o at the end of the listening state, the agent performs no action;
- when the conversation state is a speech emission state, the duration of the glances varies between 4 and 6 seconds:
  o at the beginning of the speech emission state, the agent performs "Look_at_the_users";
  o during the "en_cours" sub-state, the agent alternates between "Look_at_the_users" and "Look_anywhere_except_a_user";
  o at the end of the speech emission state, the agent performs "Look_anywhere_except_a_user";
- when the conversation state is a rheme state, the duration of the glances varies between 0.5 and 1 second:
  o at the beginning of the rheme state, the agent performs no action;
  o during the "en_cours" sub-state, the agent alternates between "Look_at_the_users" and "Look_anywhere_except_a_user";
  o at the end of the rheme state, the agent performs no action;
- when the conversation state is a theme state, the duration of the glances varies between 1 and 4 seconds:
  o at the beginning of the theme state, the agent performs "Look_at_the_users";
  o during the "en_cours" sub-state, the agent alternates between "Look_at_the_users" and "Look_anywhere_except_a_user";
  o at the end of the theme state, the agent performs no action.

The invention thus proposes a technique for animating a three-dimensional agent that produces the behaviors necessary for good management of the dialogue. This technique makes it possible to manage turn-taking correctly, and to emphasize the essential elements of the dialogue by looking more or less intently at the interlocutors depending on the semantic content of the conversation (for example, new elements can be highlighted by a brief glance).
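As a minimal sketch, the per-state gaze rules listed above could be held in a lookup table keyed by conversation state. The Python names (`GazeRule`, `RULES`, `decide`), the English decision identifiers, the strict alternation between the two decisions and the uniform draw of the glance duration are illustrative assumptions, not part of the described embodiment.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

# English renderings of the three decisions (identifiers are assumptions).
LOOK_AT_USERS = "Look_at_the_users"
LOOK_ANYWHERE = "Look_anywhere"
LOOK_AWAY = "Look_anywhere_except_a_user"


@dataclass(frozen=True)
class GazeRule:
    glance_seconds: Tuple[float, float]  # min and max glance duration
    on_beginning: Optional[str]          # decision in the "beginning" sub-state
    alternate: Tuple[str, str]           # decisions alternated in "en_cours"
    on_end: Optional[str]                # decision in the "end" sub-state


# One rule per conversation state, transcribed from the list in the text.
RULES = {
    "rest": GazeRule((4.0, 6.0), None, (LOOK_AT_USERS, LOOK_ANYWHERE), None),
    "listening": GazeRule((5.0, 8.0), LOOK_AT_USERS, (LOOK_AT_USERS, LOOK_AWAY), None),
    "speech_emission": GazeRule((4.0, 6.0), LOOK_AT_USERS, (LOOK_AT_USERS, LOOK_AWAY), LOOK_AWAY),
    "rheme": GazeRule((0.5, 1.0), None, (LOOK_AT_USERS, LOOK_AWAY), None),
    "theme": GazeRule((1.0, 4.0), LOOK_AT_USERS, (LOOK_AT_USERS, LOOK_AWAY), None),
}


def decide(state, substate, iteration=0, rng=random):
    """Return (decision, glance duration in seconds), or None for 'no action'."""
    rule = RULES[state]
    if substate == "beginning":
        decision = rule.on_beginning
    elif substate == "en_cours":
        # Assumption: strict alternation, driven by an iteration counter.
        decision = rule.alternate[iteration % 2]
    elif substate == "end":
        decision = rule.on_end
    else:
        raise ValueError(f"unknown sub-state: {substate!r}")
    if decision is None:
        return None
    return decision, rng.uniform(*rule.glance_seconds)
```

A caller would invoke `decide` once on entering a state, then repeatedly with an increasing `iteration` while the state lasts, and once more on leaving it.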

During a conversation between the conversational agent and a group of interlocutors, the invention thus allows the gaze of the agent to sweep from a main interlocutor to at least one secondary interlocutor, the agent being able to look at the main interlocutor for a predetermined time greater than the time during which it looks at the secondary interlocutor.
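The way a main interlocutor ends up being looked at more often than a secondary one can be illustrated with the probability-interval selection: each interlocutor owns a sub-interval of [0, 1] whose length is its normalized score, and a uniform draw picks the interlocutor whose interval contains the value. The sketch below assumes the relevance scores are already available as plain numbers; the function names and the example scores are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def build_intervals(scores):
    """Return the upper bounds of disjoint intervals covering [0, 1].

    Interval i is [upper[i-1], upper[i]) and its length is scores[i] / sum(scores).
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    upper = list(accumulate(s / total for s in scores))
    upper[-1] = 1.0  # guard against floating-point drift on the last bound
    return upper


def pick_interlocutor(upper_bounds, value):
    """Index of the interlocutor whose interval contains value (0 <= value <= 1)."""
    index = bisect_right(upper_bounds, value)
    return min(index, len(upper_bounds) - 1)  # value == 1.0 falls in the last interval


# Hypothetical scores: one main interlocutor and two secondary ones.
bounds = build_intervals([6.0, 3.0, 1.0])
chosen = pick_interlocutor(bounds, random.random())
```

With these example scores the first interlocutor is selected in roughly 60 % of the draws, which produces the longer cumulative gaze time on the main interlocutor.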

In addition, according to a preferred embodiment of the invention, biological constraints are introduced so as to produce an even more realistic animation of the avatar, in particular by coordinating the head and the eyes in order to recreate the vestibulo-ocular reflex.

Thus, once the target that the avatar must look at has been determined, the behavior engine 14 calls a subsystem of automata that servo-controls the neck and the eyes by modeling the vestibulo-ocular reflex. This reflex implies that the eyes, being much faster than the neck, point at the target before the head has turned. When the head is facing the target after rotation of the neck, the eyes return to their original position.

FIG. 3 thus shows a curve 31 illustrating the degree of rotation of the eyes as a function of time, and a curve 32 illustrating the degree of rotation of the neck as a function of time.

This phenomenon can be modeled in particular by using different angular velocity curves for the rotation of the eyes 41 and the rotation of the neck 42, as illustrated in FIG. 4. More precisely, these velocity curves make it possible to choose, for the current time step, an angular velocity (abscissa) as a function of the angular distance (ordinate) remaining to be traveled.

The functions used are of the form:

with the parameters a, b and c to be set for each curve. For example, for the angular velocity curve 41 associated with the rotation of the eyes, the values of the parameters a, b and c are respectively of the order of 500 deg.s⁻¹, 8 and 1, and for the angular velocity curve 42 associated with the rotation of the neck, the values of the parameters a, b and c are respectively of the order of 20 deg.s⁻¹, 6 and 1.
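The effect of using a fast curve for the eyes and a slow curve for the neck can be illustrated with a small one-axis simulation. This is only a sketch: the velocity law below (proportional to the remaining angle, capped at a peak speed) is an assumption standing in for the function of the text, the peak speeds are hypothetical values of the same order of magnitude as the parameter a quoted above, and the gains and time step are arbitrary.

```python
import math

EYE_PEAK, EYE_GAIN = 500.0, 10.0   # deg/s and 1/s, hypothetical values
NECK_PEAK, NECK_GAIN = 20.0, 5.0   # deg/s and 1/s, hypothetical values


def speed(remaining_deg, peak, gain):
    """Assumed velocity law: proportional to the remaining angle, capped at peak."""
    return min(peak, gain * abs(remaining_deg))


def step_toward(current, goal, peak, gain, dt):
    """Advance current toward goal for one time step without overshooting."""
    remaining = goal - current
    move = speed(remaining, peak, gain) * dt
    if move >= abs(remaining):
        return goal
    return current + math.copysign(move, remaining)


def simulate_gaze_shift(target_deg, dt=0.01, duration=4.0):
    """Return samples (time, neck angle, eye-in-head angle), all in degrees."""
    neck = 0.0  # orientation of the head
    eye = 0.0   # orientation of the eyes relative to the head
    samples = []
    for i in range(round(duration / dt)):
        neck = step_toward(neck, target_deg, NECK_PEAK, NECK_GAIN, dt)
        # The eyes aim at whatever part of the angle the neck has not covered yet.
        eye = step_toward(eye, target_deg - neck, EYE_PEAK, EYE_GAIN, dt)
        samples.append(((i + 1) * dt, neck, eye))
    return samples


samples = simulate_gaze_shift(30.0)
```

In the resulting samples the eye angle rises quickly, peaks while the head is still turning, then decays back toward zero as the neck angle approaches the target, which is the qualitative shape described for curves 31 and 32.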

In particular, if we consider that the animation to be modeled is the positioning of the head of the agent in front of a target, the positioning of the eyes of the agent can be considered as a sub-task, and an automaton servo-controlling the eyes can be called hierarchically by an automaton servo-controlling the head.

In particular, FIG. 5 shows an articulated chain representing the head of the avatar and the articulations between the different elements of the head. This chain comprises at its base a root 51, corresponding for example to the vertebral column, articulated with the neck 52, itself articulated with the center of the eyes 53, the right eye 54 and the left eye 55.

In order to servo-control the avatar, it is therefore necessary to calculate the rotations of the neck and of each eye independently, so as to model convergence when the head of the avatar rotates to direct its gaze at a target 56. In particular, the following notations are defined:
- Root 51: base of the articulated chain;
- RacineCou 511: vector connecting the root 51 to the center of rotation of the neck 52;
- CouCentreYeux 521: vector connecting the center of rotation of the neck 52 to the center of the eyes 53;
- CentreYeuxOeilDroit 531: vector connecting the center of the eyes 53 to the right eye 54;
- CentreYeuxOeilGauche 532: vector connecting the center of the eyes 53 to the left eye 55;
- Target 56: position of the target 56;
- Ray: normalized vector connecting the center of the eyes 53 to the target 56;
- α: angle between the vector opposite to CouCentreYeux 521 and the Ray vector;
- CouCible: vector connecting the center of rotation of the neck 52 to the target 56.

The following is a method for determining the rotation of the neck and rotation of the eyes of an avatar, to animate the agent fluidly and realistically by introducing biological constraints. The rotation of the neck requires moving the facial plane until it is perpendicular to a ray from the center of the eyes towards the target.

Using the notations defined above, we must calculate the rotation that moves the CouCible vector from its starting position (rest position) to an arrival position CouCible', in which the facial plane is perpendicular to the ray from the center of the eyes to the target.

It can be noted in particular that the initial CouCible vector must be recalculated at each instant, since it depends on the distance from the neck 52 to the target 56.

It is therefore proposed to calculate the vector CouCible' that reaches the target 56 (arrival vector), then to calculate the rotation transforming the vector CouCible into CouCible'.

Calculating the CouCible vector in fact amounts to calculating the modulus of the Ray vector.

The modulus of the CouCible vector can also be obtained directly, since this modulus is invariant under rotation: it is equal to the distance from the neck 52 to the target 56.

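Thus, knowing the modulus of the CouCentreYeux vector 521 and the value of the angle α, and using Al-Kashi's theorem (the law of cosines), we can calculate the modulus of the Ray vector:

$$\lVert \mathrm{CouCible} \rVert^2 = \lVert \mathrm{CouCentreYeux} \rVert^2 + \lVert \mathrm{Ray} \rVert^2 - 2\,\lVert \mathrm{CouCentreYeux} \rVert\,\lVert \mathrm{Ray} \rVert \cos\alpha$$

that is:

$$\lVert \mathrm{Ray} \rVert^2 - \lVert \mathrm{Ray} \rVert\,\bigl(2\,\lVert \mathrm{CouCentreYeux} \rVert \cos\alpha\bigr) + \bigl(\lVert \mathrm{CouCentreYeux} \rVert^2 - \lVert \mathrm{CouCible} \rVert^2\bigr) = 0$$

The value of the modulus of Ray is deduced by solving this second-degree equation and taking the appropriate root.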
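As an illustration, the second-degree equation above can be solved numerically as follows. The sketch assumes that the target is farther from the neck than the center of the eyes is, in which case exactly one root is positive and is taken as the "appropriate" one; the function name and the numerical values in the usage lines are hypothetical.

```python
import math


def ray_modulus(neck_to_eyes, neck_to_target, alpha):
    """Solve R^2 - R * (2 c cos(alpha)) + (c^2 - d^2) = 0 for the positive root R.

    neck_to_eyes   -- c, modulus of CouCentreYeux
    neck_to_target -- d, modulus of CouCible (distance from neck to target)
    alpha          -- angle, in radians, between -CouCentreYeux and Ray
    """
    c, d = neck_to_eyes, neck_to_target
    if d <= c:
        # Assumption of this sketch: the target lies beyond the eye center,
        # so that the constant term is negative and one root is positive.
        raise ValueError("target is too close to the neck for this sketch")
    half_b = c * math.cos(alpha)
    discriminant = half_b * half_b - (c * c - d * d)  # reduced discriminant
    return half_b + math.sqrt(discriminant)


# Usage with hypothetical dimensions (meters): build d from a known ray length
# with the law of cosines, then recover that length from the quadratic.
c, alpha, expected = 0.15, math.radians(100.0), 2.0
d = math.sqrt(c * c + expected * expected - 2.0 * c * expected * math.cos(alpha))
recovered = ray_modulus(c, d, alpha)
```

Because the constant term c² − d² is negative under the stated assumption, the product of the two roots is negative, so the root with the plus sign is the only positive one.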

The CouCible vector can then be constructed by vector addition. More precisely, as illustrated in FIG. 6, the rotation that transforms the starting vector CouCible into the arrival vector CouCible' is a composition of two rotations R_y and R_x', about the axes Oy and Ox' respectively, the axis Ox' being the image of the axis Ox under the rotation R_y. These rotations are obtained simply by projecting the vectors CouCible and CouCible' onto the planes Oxz and Oyz.
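The decomposition into a rotation about Oy followed by a rotation about the moved axis Ox' can be sketched as below. The sketch assumes a frame in which Oy is vertical and the rest pose faces +z, so that the starting vector lies in the Oyz plane with a positive z component; the function names and sign conventions are illustrative choices, not those of the behavior engine.

```python
import math


def rot_x(v, angle):
    """Rotate v about the fixed axis Ox by angle (radians)."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (x, y * c - z * s, y * s + z * c)


def rot_y(v, angle):
    """Rotate v about the fixed axis Oy by angle (radians)."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (x * c + z * s, y, -x * s + z * c)


def azimuth(v):
    """Angle of the projection of v onto the Oxz plane, measured from Oz."""
    return math.atan2(v[0], v[2])


def elevation(v):
    """Angle of v above the Oxz plane."""
    return math.atan2(v[1], math.hypot(v[0], v[2]))


def neck_rotation(start, arrival):
    """Angles (about Oy, about Ox') taking start onto arrival (same modulus)."""
    if abs(start[0]) > 1e-9 or start[2] <= 0.0:
        raise ValueError("start must lie in the Oyz plane and face +z")
    yaw = azimuth(arrival)                        # rotation about Oy
    pitch = elevation(start) - elevation(arrival)  # rotation about Ox'
    return yaw, pitch


def apply_neck_rotation(v, yaw, pitch):
    # Rotating about Oy and then about the moved axis Ox' is equivalent to
    # applying rot_x first and rot_y second in the fixed frame.
    return rot_y(rot_x(v, pitch), yaw)
```

With the convention chosen for `rot_x`, a positive pitch lowers the gaze; a different engine convention would only change signs.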

Once these rotations have been calculated, they can be translated into the command format expected by the behavior engine.

In addition, once the rotation of the neck has been determined, the rotation of the eyes is calculated more simply, since this rotation is direct.

Thus, one begins by applying the rotation of the neck thus calculated to the vectors CentreYeuxOeilGauche 532 and CentreYeuxOeilDroit 531 and to the axes Oy and Ox, to transform them into CentreYeuxOeilGauche', CentreYeuxOeilDroit', Oy' and Ox' respectively. More precisely, depending on the eye considered, the vector CentreYeuxOeilGauche' or CentreYeuxOeilDroit' is added to the center of the eyes 53.

Then, similarly, the rotation of each eye is calculated as a composition of two rotations R_y' and R_x'', about the axes Oy' and Ox'' respectively, the axis Ox'' being the image of the axis Ox' under the rotation R_y'.

Again, once these rotations have been calculated for each eye, they can be translated into the command format expected by the behavior engine.

Thus, using these calculated rotations and the angular velocity curves to simulate human behavior, the behavior engine can animate the agent so that its eyes first focus on a selected target, and its whole head then rotates, allowing the eyes to return to their original position.

Finally, the control of the visual behavior of the conversational agent according to the invention makes it possible to improve the interactions between the agent and the group of interlocutors, by realistically reproducing the gaze of real people during a conversation.

We now present, with reference to FIG. 7, the hardware structure of a 3D conversational agent animation system implementing the method described above. Such an animation system comprises a memory M 71 and a processing unit P 70, equipped for example with a microprocessor μP and driven by the computer program Pg 72.

At initialization, the code instructions of the computer program 72 are for example loaded into a RAM memory before being executed by the processor of the processing unit 70. The processing unit 70 receives as input vision information 73, representative of the interlocutors present in the field of view of the agent, and information 74 representative of the content of the conversation. The microprocessor μP of the processing unit 70 implements the steps of the animation method described previously, according to the instructions of the program Pg 72. The processing unit 70 outputs a representation of the conversational agent.

ANNEX 1

O. E. Torres, J. Cassell, and S. Prévost, "Modeling Gaze Behavior as a Function of Discourse Structure", Workshop on Human-Computer Conversations, Bellagio, Italy, 1997

C. Pelachaud, V. Carofiglio, B. De Carolis, and F. De Rosis, "Embodied Contextual Agent in Information Delivering", AAMAS, Bologna, Italy, 2002

D. Robinson, "The mechanics of human saccadic eye movements", Journal of Physiology, vol. 174, pp. 245-264, 1964

S. Park Lee, J. Badler, and N. Badler, "Eyes Alive", Proceedings of Siggraph, San Antonio, USA, 2002

Claims

1. A method of animating a three-dimensional conversational agent (15) in a virtual or mixed reality environment, implementing a step of controlling the visual behavior of said agent (15) during a conversation with at least two interlocutors, characterized in that said control step implements a step of selecting a target at which said agent (15) looks, taking into account:

at least one history of said visual behavior, said history taking into account, for at least one of the interlocutors, successive interactions of said agent with said interlocutor;

and at least one state of said conversation, called a conversation state, said conversation state belonging to a set of states comprising at least one dialogue state and a rest state.

2. Animation method according to claim 1, characterized in that said selection step also takes into account a distance from said agent (15) to at least one of said interlocutors.

3. Animation method according to any one of claims 1 and 2, characterized in that, during said control step, at least one of the parameters of said visual behavior is controlled, said parameter belonging to the group comprising:
- a frequency of the glances of said agent;
- a duration of the glances of said agent.

4. Animation method according to any one of claims 1 to 3, characterized in that said history takes into account, for at least one of said interlocutors, a time during which said agent has already looked at said interlocutor, called gaze time.

5. Animation method according to any one of claims 1 to 4, characterized in that, during said selection step, each of said interlocutors is associated with a probability interval I_i contained in the interval [0, 1], such that the union of said probability intervals I_i corresponds to the interval [0, 1], said probability intervals I_i being disjoint.

6. Animation method according to claim 5, characterized in that the length of said probability interval I_i associated with the interlocutor i is equal to a normalized relevance score S_i, determined according to the equation:

$$\frac{S_i}{\sum_{j=1}^{N} S_j}$$

where:

$$S_i = \frac{a \cdot Sd_i + b \cdot St_i}{a + b}, \qquad Sd_i = \frac{[\ldots]}{\sum_{j=1}^{N} [\ldots]}, \qquad St_i = [\ldots] \ \text{for } t_i \neq 0, \quad St_i = 1 \ \text{for } t_i = 0;$$

with:
- N the number of interlocutors;
- d_i the distance between the interlocutor i and said agent;
- t_i the gaze time associated with the interlocutor i;
- a ∈ [0, 1], b ∈ [0, 1], a + b = 1.

7. Animation method according to any one of claims 1 to 6, characterized in that:
- when said conversation state corresponds to said rest state, an arbitrary target is selected, and said control step requires said agent to glance in the direction of said target at a frequency greater than a first predetermined threshold;
- when said conversation state corresponds to said dialogue state, a target corresponding to one of said interlocutors and a target distinct from each of said interlocutors are selected alternately, and said control step requires said agent to glance in the direction of each of said targets at a frequency greater than a second predetermined threshold.

8. Animation method according to any one of claims 5 to 7, characterized in that, when said control step requires said agent to look at one of said interlocutors, a value in the interval [0, 1] is drawn at random, and in that said selected target is the interlocutor whose associated probability interval I_i contains said value.

9. Animation method according to any one of claims 3 to 8, characterized in that, when said conversation state corresponds to said dialogue state, said control step takes into account dialogue information representative of the semantic content of said conversation, and makes it possible to increase said frequency when said semantic content is of rheme type, relative to the frequency when said semantic content is of theme type.

10. A device for animating a three-dimensional conversational agent (15) in a virtual or mixed reality environment, comprising means for controlling the visual behavior of said agent (15) during a conversation with at least two interlocutors, characterized in that said control means comprise means for selecting a target that said agent (15) looks from:

11. Computer program product downloadable from a communication network and / or stored on a computer readable medium and / or executable by a microprocessor, characterized in that it comprises program code instructions for the implementation of steps of the method of animating a three-dimensional conversational agent (15) according to any one of claims 1 to 9.