WO2024127258A1 - Continuous expressive behaviour in embodied agents - Google Patents

Continuous expressive behaviour in embodied agents

Info

Publication number
WO2024127258A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
node
blend
animation
state
Application number
PCT/IB2023/062570
Other languages
French (fr)
Inventor
Jo HUTTON
Pavel SUMETC
Tiago RIBEIRO
Tim Wu
Hazel WATSON-SMITH
Original Assignee
Soul Machines Limited
Application filed by Soul Machines Limited filed Critical Soul Machines Limited
Publication of WO2024127258A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2213/00 Indexing scheme for animation
    • G06T2213/12 Rule based animation

Definitions

  • a node containing a given triggered gesture may perform the Starting phase in case the avatar’s current pose was at or near the resting pose, or skip it and start performing directly on the Stroke phase in case its pose was away from the resting pose.
  • the connotation of a beat gesture is the performance-based implication.
  • for beat gestures there is no set interpretation; instead, beat choice, size and speed contribute to the perceived personality of the avatar.
  • a straight-forward chop may provide the connotation of confidence and direct/task oriented communication style.
  • the semantic meaning is the set meaning of a symbolic, metaphoric or iconic gesture being communicated. Preserving the semantic meaning is essential for these gestures to facilitate non-verbal communication. For example, a wide two-arm sweep can be symbolic of ‘wider’ or ‘larger’.
  • an electronic computing system utilises the methodology of the invention using various modules and engines.
  • the electronic computing system may include at least one processor, one or more memory devices or an interface for connection to one or more memory devices, input and output interfaces for connection to external devices in order to enable the system to receive and operate upon instructions from one or more users or external systems, a data bus for internal and external communications between the various components, and a suitable power supply.
  • the electronic computing system may include one or more communication devices (wired or wireless) for communicating with external and internal devices, and one or more input/output devices, such as a display, pointing device, keyboard or printing device.
  • the processor is arranged to perform the steps of a program stored as program instructions within the memory device.
  • the program instructions enable the various methods of performing the invention as described herein to be performed.
  • the program instructions may be developed or implemented using any suitable software programming language and toolkit, such as, for example, a C-based language and compiler. Further, the program instructions may be stored in any suitable manner such that they can be transferred to the memory device or read by the processor, such as, for example, being stored on a computer readable medium.
  • the computer readable medium may be any suitable medium for tangibly storing the program instructions, such as, for example, solid state memory, magnetic tape, a compact disc (CD-ROM or CD-R/W), memory card, flash memory, optical disc, magnetic disc or any other suitable computer readable medium.
  • the electronic computing system is arranged to be in communication with data storage systems or devices (for example, external data storage systems or devices) in order to retrieve the relevant data. It will be understood that the system herein described includes one or more elements that are arranged to perform the various functions and methods as described herein.
  • the embodiments herein described are aimed at providing the reader with examples of how various modules and/or engines that make up the elements of the system may be interconnected to enable the functions to be implemented. Further, the embodiments of the description explain, in system related detail, how the steps of the herein described method may be performed.
  • the conceptual diagrams are provided to indicate to the reader how the various data elements are processed at different stages by the various different modules and/or engines. It will be understood that the arrangement and construction of the modules or engines may be adapted accordingly depending on system and user requirements so that various functions may be performed by different modules or engines to those described herein, and that certain modules or engines may be combined into single modules or engines. It will be understood that the modules and/or engines described may be implemented and provided with instructions using any suitable form of technology.
  • the modules or engines may be implemented or created using any suitable software code written in any suitable language, where the code is then compiled to produce an executable program that may be run on any suitable computing system.
  • the modules or engines may be implemented using any suitable mixture of hardware, firmware and software.
  • portions of the modules may be implemented using an application specific integrated circuit (ASIC), a system-on-a-chip (SoC), field programmable gate arrays (FPGA) or any other suitable adaptable or programmable processing device.
  • the methods described herein may be implemented using a general-purpose computing system specifically programmed to perform the described steps.
  • the methods described herein may be implemented using a specific electronic computer system such as a data sorting and visualisation computer, a database query computer, a graphical analysis computer, a data analysis computer, a manufacturing data analysis computer, a business intelligence computer, an artificial intelligence computer system etc., where the computer has been specifically adapted to perform the described steps on specific data captured from an environment associated with a particular field.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A sequencing method for animation sources, based on source segmentation metadata and built around a "stateful blend tree", is disclosed. The sequencing method is based on a multi-layer state machine where each layer corresponds to an animation source and input parameters are used to trigger and drive the animation sources. The sequencing method uses the metadata of the animation sources to drive the individual state machines in a collective operation that allows motion composed from the collection of state-machine-driven animation sources to be delivered, so that the resulting motion is responsive to the driving input parameters while appearing continuously sound and stylistically consistent, and while respecting critical playback time constraints.

Description

CONTINUOUS EXPRESSIVE BEHAVIOUR IN EMBODIED AGENTS
TECHNICAL FIELD
[1] Embodiments of the invention relate to computer generated animation. More particularly but not exclusively, embodiments of the invention relate to animating continuous expressive behaviour in embodied agents.
BACKGROUND ART
[2] Animated digital characters that move and gesticulate appropriately within a given context are useful in a wide range of applications. In general, this class of character movement includes full body motion generated in synchronization with spoken text, music, locomotion direction or other context. Given an input signal, for example spoken text, an animation generation system synthesizes output signals in terms of motion descriptors, for example, joint parameters for a character rig. An Animation Generator (AG) maps input signals to motion descriptors.
[3] There are two main approaches to constructing an AG. The first approach is to build a rule-based system, where expert knowledge is used to specify a computational model that maps input signals to motion descriptors. The rules may take many forms and levels of complexity, including the ability to provide controlled arbitrariness into different aspects of the generated motion, to use classical artificial intelligence methods to solve constraints or generate descriptors, and to use classical computer animation methods to generate motion. The second option is to follow a data-driven approach. Within this latter option, there are also two main approaches. The first is based on deep learning techniques where a trained machine learning model is the core of the AG and, once trained on a dataset, it can generate output animations. The second approach is a motion-matching technique where the AG consists of a preprocessed animation database and a motion matching system which selects particular animation clips from the database and stitches them into a continuous output signal. Both these approaches utilize a training database but consume it differently and have distinct characteristics.
[4] An AG implementing motion matching could be a real-time or offline generator. The main difference is that a real-time AG is capable of generating motion descriptors on the fly (in real time) with high performance, even when an input temporal signal is very short or just a single time step. Offline AGs do not have restrictions on timing and performance and therefore usually have
higher quality. The area of application for offline AGs is animation creation without online interaction components.
[5] Considering the AG as a part of a system that drives embodied agents, its purpose could be different due to the specificity of the generated animations. For example, an AG which synthesizes locomotion animation operates with parameters that define locomotion (direction of movement, feet velocity, etc.) and the animation database should be prepared accordingly, whereas an AG generating arm gesturing motion requires arm-related parameters to be controllable in the animation clips in order to perform motion matching. Locomotion motion matching is considered to be less ambiguous, with far fewer controllable parameters, compared to gesture AG. In other words, the AG structure and functionality depend on the nature of the animation to be synthesized and the context of the animations comprising the animation database. For example, locomotion is by nature periodic motion, while gesturing has another structure characterized by emphasis points. As a result, composing continuous output animation from locomotion clips follows a different logic compared to composing continuous gestural animation.
[6] Prior methods to implement such an AG system define a set of parameters that control motion, annotate the animations in a database with these parameters, and then, during the animation generation phase, given a set of control parameters, search for animation clips with values close to these parameters and compose the resulting animation by stitching these clips together in a continuous way. In the context of gesture animation, for example, gesture motions may include things like wrist trajectory, elbow locations, feet locations, etc. The control parameters here are body pose-related variables, on which the AG internally operates. Drawbacks of this approach are that the set of control parameters is usually large, it requires tedious annotation of every frame in the animation database with these parameters, and gestures generated by stitching the animation clips could become meaningless.
OBJECT OF INVENTION
[7] The proposed method is an improvement to motion matching for real-time AG. It is an object of the invention to improve continuous expressive behaviour in embodied agents, or to at least provide the public or industry with a useful choice.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 shows a high-level overview of the incorporation of the CAG, using an SBT, within a general embodied agent system;
Figure 2 shows an example SBT;
Figure 3 illustrates one same channel of two example animations A and B;
Figure 4 illustrates an example case of transitioning between two animations A and B;
Figure 5 shows an example state machine for an SSN; and
Figure 6 shows a pose buffer mechanism.
SUMMARY OF INVENTION
[8] A first invention provides animation states for dynamically computed motion blend-tree sources. A sequencing method for animation sources, based on source segmentation metadata and built around a “stateful blend tree”, is disclosed. The sequencing method is based on a multi-layer state machine where each layer corresponds to an animation source and input parameters are used to trigger and drive the animation sources. The sequencing method uses the metadata of the animation sources to drive the individual state machines in a collective operation that allows motion composed from the collection of state-machine-driven animation sources to be delivered, so that the resulting motion is responsive to the driving input parameters while appearing continuously sound and stylistically consistent, and while respecting critical playback time constraints.
[9] A second invention provides a pose buffer for motion blend-trees that performs non-finite and non-deterministic composition of motion sources. The pose-buffer can be used as a container for algorithmic and generative methods of computing blended motion between multiple animation sources. It can be used as an independent mechanism, or as a blend-node method of an animation blend-tree or of other methods that compose motion based on sequences of multiple and independently activated motion sources.
[10] Some uses for the pose-buffer are inertialization, motion matching, and other methods of motion and in-between motion generation either based on simulation algorithms or on data-driven approaches. It extends the timeline of blending sources into a non-finite dynamic time frame that is independent of the timelines of the blending sources.
[11]It allows blending of both generative (non-finite) motion sources and pre-designed (finite) motion sources together while using dynamic or generative methods to compute the transitory result of blended motion sources independently of the dimension and finitude of each source’s time-domain.
[12]Discrete input signals are provided at any time to the SBT for a particular subset of motion source identifiers each of which correspond to a Stateful Source Node (SSN). The input signals may contain, among other parameters, play and stop triggers, and modulation weights. For each motion source referenced in the input signals, if a play trigger is present, it triggers a request to the corresponding SSN to start playing the motion source. Depending on the state of the system, the requested motion source may be immediately played from the beginning or placed on a waiting queue with a timeout. A change in any of the SSN’s states may cause a change to the state of other SSNs. A Pose-Buffer is used on a Stateful Blend Node (SBN) which acts as the parent-node of SSNs to automatically generate in-between motion based on any collective state change in order to guarantee continuously correct motion.
DETAILED DESCRIPTION
[13]The presented AG system is a real-time continuous animation generator for an embodied agent. A Continuous expressive behaviour AG (CAG) is a real-time animation generation system operating on a per-frame basis that takes as input a set of both continuous and discrete signals associated with particular target expressive behaviour animations and outputs a continuous stream of motion descriptors that result in a continuous playback of the triggered and modulated expressive behaviour animations with fluid and natural transitions between them. In the case where the CAG is used to drive speech-driven behaviour, i.e., gesticulation, the performed gesture animations are additionally synchronized and matched to the content of a given speech. The input signals may specify triggers of actions (e.g. start, stop) or modulation signals. These are generated elsewhere in the embodied agent’s system and may relate to aspects such as, but not limited to, speech-driven behaviour, locomotion systems, user and world interactive behaviour such as gazing or deictic gesturing, to the agent’s personality, its emotional state, and others.
[14]In the first example embodiment, there is a CAG provided comprising:
an animation database with stroke metadata; a Stateful Blend-Tree system; and a motion transition mechanism, further designated as Pose-Buffer.
[15]The animation database contains a set of named animations and the timeline of each animation may be annotated with time points corresponding to relevant moments of the animation timeline. Such time points may be used to represent a single relevant moment of the animation, or to specify a time range for a given relevant portion of the animation.
[16] Animations may be manually annotated or annotated in any other suitable manner (e.g. partly or fully automated using machine learning). Taking as an example a gesture animation, this corresponds to the gesture motion representation where a single gesture can be described as consisting of a number of consecutive movement phases: preparation phase, stroke phase and release phase. The stroke phase of a gesture is the most relevant portion of the motion to convey its meaning and may contain a specific moment in time corresponding to its stroke point, which is the moment of the gesture that is used to align it in time with emphasis and rhythmic points of the corresponding speech. The regions that precede and succeed the stroke phase consist of predesigned motion that blends into and out of the stroke phase from a resting pose.
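As an editorial illustration only (not part of the original specification), the following sketch shows one plausible way to represent such an annotated animation entry in the database; the field names, such as stroke_start, stroke_point and stroke_end, are hypothetical.

    # Hypothetical sketch of an animation database entry annotated with stroke
    # metadata, as described above. Field names are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class AnnotatedAnimation:
        name: str
        # One list of per-frame values per channel (motion descriptor signal).
        channels: Dict[str, List[float]]
        frame_rate: float    # frames per second
        stroke_start: float  # seconds: start of the stroke phase
        stroke_point: float  # seconds: moment aligned with speech emphasis
        stroke_end: float    # seconds: end of the stroke phase

        def duration(self) -> float:
            n_frames = len(next(iter(self.channels.values())))
            return n_frames / self.frame_rate

    # A tiny example database keyed by animation name.
    database: Dict[str, AnnotatedAnimation] = {
        "wide_sweep": AnnotatedAnimation(
            name="wide_sweep",
            channels={"right_wrist_x": [0.0, 0.1, 0.4, 0.8, 0.4, 0.1, 0.0]},
            frame_rate=30.0,
            stroke_start=0.05,
            stroke_point=0.10,
            stroke_end=0.15,
        )
    }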
[17]The Stateful Blend-Tree system, further designated as SBT, is a system that follows the structure of a blend-tree hierarchy of nodes. Each non-leaf node is a blending node which computes the blended result of the composition of motion that is output from its child nodes using a given blending operation and parameters (e.g. animation blending weights, or parameters required in the Pose-Buffer).
[18]Blending operations may include any operations known in the art of animation such as additive, multiplicative, dynamic average, animation mixing. Blending operations may use techniques disclosed in W02022107087A1 and W02020089817A1 by the present applicant, and incorporated by reference herein.
[19]Each leaf node may contain a source or generator of motion which outputs motion descriptors, which is further designated as a motion source. All nodes may process any number of motion descriptor signals, also referred to as channels, with the condition that the channels of a given node are a subset of the channels of its parent node.
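The following sketch is an editorial assumption, not the patent's implementation; it illustrates leaf source nodes and a weighted-average blend node that enforces the condition that each child's channels are a subset of its parent's channels.

    # Illustrative blend-tree sketch: leaf nodes wrap motion sources, non-leaf
    # nodes blend their children, and each child's channels must be a subset of
    # its parent's channels. This is an assumption, not the patent's code.
    from typing import Callable, Dict, List

    class Node:
        def __init__(self, channels: List[str]):
            self.channels = set(channels)

        def evaluate(self, t: float) -> Dict[str, float]:
            raise NotImplementedError

    class SourceNode(Node):
        """Leaf node wrapping a motion source (here, a simple callable)."""

        def __init__(self, channels: List[str], source: Callable[[float], Dict[str, float]]):
            super().__init__(channels)
            self.source = source

        def evaluate(self, t: float) -> Dict[str, float]:
            return self.source(t)

    class BlendNode(Node):
        """Non-leaf node computing a weighted average of its children."""

        def __init__(self, channels: List[str], children: List[Node], weights: List[float]):
            super().__init__(channels)
            for child in children:
                # The channels of a child node must be a subset of its parent's.
                assert child.channels <= self.channels
            self.children = children
            self.weights = weights

        def evaluate(self, t: float) -> Dict[str, float]:
            out = {c: 0.0 for c in self.channels}
            total = sum(self.weights) or 1.0
            for child, w in zip(self.children, self.weights):
                for c, v in child.evaluate(t).items():
                    out[c] += w * v / total
            return out

    # Example: blend a full-body idle source with an arm-only gesture source.
    idle = SourceNode(["arm", "spine"], lambda t: {"arm": 0.0, "spine": 0.1})
    gesture = SourceNode(["arm"], lambda t: {"arm": 1.0})
    root = BlendNode(["arm", "spine"], [idle, gesture], weights=[0.5, 0.5])
    pose = root.evaluate(0.0)  # {'arm': 0.5, 'spine': 0.05}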
[20]The SBT takes as input a set of named signals which may be continuous (e.g. blending parameters with time-varying property) or discrete (certain triggers or events, e.g. start or stop). These signals may be used as triggers to request a given motion source to start or stop playback. These signals may additionally be used as modulators of the blending operations. An example of an input signal is a weight parameter of a blend node.
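The following minimal sketch (hypothetical names, not from the specification) illustrates how such named input signals could be represented, combining discrete play/stop triggers with a continuous modulation weight.

    # Hypothetical representation of the named input signals: discrete play/stop
    # triggers for motion sources and continuous modulation values such as blend
    # weights. Field names are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class InputSignal:
        target: str                     # name of a motion source or blend node
        play: bool = False              # discrete trigger: request playback start
        stop: bool = False              # discrete trigger: request playback stop
        weight: Optional[float] = None  # continuous modulation, e.g. a blend weight

    # Example: trigger a gesture and modulate a blend node's weight.
    signals = [
        InputSignal(target="wide_sweep", play=True),
        InputSignal(target="upper_body_blend", weight=0.7),
    ]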
[21]The input signals of the SBT are generated externally to the CAG in components that may perform decision-making, action selection and behaviour modulation roles, such as, but not limited to, dialogue systems, emotion systems, gazing systems, personality and behaviour stylization systems, weight balancing systems, and other partial or full body behaviour control systems.
[22]The output of the SBT is the set of motion descriptors that correspond to the output of the Root Blend Node and may be used to compute the pose of an avatar.
[23]Figure 1 shows a high-level overview of the incorporation of the CAG, using an SBT, within a general embodied agent system. An embodiment control system is anything driving the skeleton of the virtual entity or character. Examples include a 3D renderer and visualizer, a game engine, a robot controller, or any other system that takes in motion descriptors and applies them to the rig of a character in order to present it animated in some environment.
[24]The SBT controls both the blending and playback logic of motion sources, namely, when to start, stop or otherwise control the playback of a particular motion source. In addition, it includes the logic to compute the final motion that results from the composition of the motion of individual motion sources based on their playback state, the input modulation signals, the blend-tree hierarchy, and the parameters that configure the blending operations. It oversees the state of all motion sources in the database.
[25]In some embodiments, a plurality of SBTs may be provided to control the blending and playback of different components of a virtual agent or digitally-driven entity.
[26]A leaf node of an SBT may contain a mechanism that acts as a state machine. Such nodes are designated as Stateful Source Nodes or SSNs. Any node that is parent to a SSN is further designated as Stateful Blend Node or SBN.
[27]A leaf node of an SBT may also act as a non-stateful source node, in which case it acts as an SSN with one single and perpetual state and may be designated solely as Source Node.
[28]An SSN contains at least one state, which is further referred to as the Default state, and may define rules that cause it to switch to a different state.
[29]Each SSN may define a state or set of rules that discriminate whether the node is Active or Inactive, or in any other particularly relevant state.
[30]The output of the SBT is computed at a regular or semi-regular interval designated as a timestep. On each timestep, the input parameters are evaluated and applied to each node, and the output of each node is computed following the computation of its children nodes, for example, in the order of a depth-first search or via an alternative mechanism.
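A minimal sketch of one such timestep is given below; it is an assumption for illustration, and the apply_input and compute_output hooks are hypothetical stand-ins for whatever per-node logic an implementation provides.

    # Illustrative sketch of one SBT timestep (an assumption, not the patent's
    # implementation): input signals are applied to the tree, then each node's
    # output is computed after its children, i.e. in depth-first post-order.
    def evaluate_timestep(root, input_signals, dt):
        # 1. Apply input signals (triggers, modulation weights) to the tree.
        for signal in input_signals:
            root.apply_input(signal)  # hypothetical hook that routes the signal

        # 2. Post-order traversal: children first, then the blending node itself.
        def evaluate(node):
            child_outputs = [evaluate(child) for child in getattr(node, "children", [])]
            return node.compute_output(child_outputs, dt)  # hypothetical hook

        # 3. The root's output is the set of motion descriptors for this frame,
        #    which may be used to compute the pose of the avatar.
        return evaluate(root)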
[31]Timesteps may be executed at a rate that is regular and fast enough so that the sequence of output poses may result in the continuous motion of an avatar, for example, within the range of 30 to 100 Hz.
[32]The Blend Tree may contain a Default Source which produces output in case no blend node is Active.
[33]Figure 2 shows an example SBT. This is one single example configuration of an SBT meant to illustrate the wide variety of combinations of different node types within the hierarchy of an SBT.
[34]The Pose-Buffer mechanism, further designated as PB, takes as input a stream of motion descriptors and outputs a stream of motion descriptors based on a set of operations that are configured into the PB.
[35]The PB may be used in any blend node, including SBNs, in which case the PB receives as input the blended motion resulting from the individual motions of the node’s child nodes, and the node’s output corresponds to the output of the PB. We designate such a node as a Pose-Buffer Node or PBN.
[36]The Pose-Buffer may be configured to automatically detect discontinuities in its input stream of motion descriptors and to run a given procedural motion generator that automatically computes in-between motion that smooths it and removes the discontinuity.
[37]The Pose-Buffer mechanism is illustrated in Figure 6. The operations ‘matched’ and ‘transition’ may be implemented in any way such that the ‘matched’ operation tests whether a given motion
signal ‘x’, described by characteristics such as its value and derivatives, corresponds, in some measurable characteristic of the motion, and within a given threshold, to the characteristics of another given signal ‘y’; and that the ‘transition’ operation generates a motion signal between the given characteristics of signals ‘x’ and ‘y’ such that the characteristics of the first signal are smoothly and continuously transitioned into the characteristics of the second signal, based on a multitude of parameters, while both the first and second signals may be varying on each consecutive timestep.
[38]In one example implementation of the embodiment, the ‘matched’ function comprises:
Figure imgf000009_0002
[39]where A corresponds to the value match threshold and B to the velocity match threshold and E to a very small value.
[40] In one example implementation of the embodiment, the ‘transition’ function comprises:
Figure imgf000009_0001
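The formula images referenced in [38] and [40] are not reproduced in this text. The sketch below is only a plausible reconstruction based on the descriptions in [37] and [39]: 'matched' compares value and velocity against the thresholds A and B, and 'transition' moves the output smoothly towards the target on each timestep. The exact expressions in the specification may differ, and the role of the very small value E is not reconstructed here.

    # Plausible reconstruction only; the exact formulas are not reproduced here.
    def matched(x_value, x_velocity, y_value, y_velocity, A, B):
        # True when both the values and the velocities agree within thresholds.
        return abs(x_value - y_value) < A and abs(x_velocity - y_velocity) < B

    def transition(x_value, y_value, alpha):
        # Minimal inbetweening step: move a fraction alpha (0..1) of the way from
        # the current transition value towards the target signal each timestep.
        return x_value + alpha * (y_value - x_value)

    def pose_buffer_step(state, target_value, target_velocity, dt,
                         A=0.01, B=0.05, alpha=0.2):
        # Keep generating transition values until a match is detected, then pass
        # the target source through faithfully (as described for Figure 4).
        value, velocity = state
        if matched(value, velocity, target_value, target_velocity, A, B):
            return (target_value, target_velocity), target_value
        new_value = transition(value, target_value, alpha)
        new_velocity = (new_value - value) / dt
        return (new_value, new_velocity), new_value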
[41] The main purpose of the Pose-Buffer is to support the computation of a continuous blended motion of the PBN’s child nodes upon any type of changes to the collective state of such children. Therefore the individual nodes are not required to compute transition states or transition animations. Having a continuous signal renders a fluid and natural motion.
[42]The SBN may be configured with rules to support and enforce temporary exclusive playback within its child SSNs. One such rule, which we designate as Exclusivity, enforces that only a given subset of the child nodes may be performing at a given time.
[43]The SBN may be configured with rules defining that Exclusivity applies only to a given subset of the states of the child SSNs. These rules include logic that specifies how a child SSN behaves whenever it is active in a non-exclusive state. Namely, while in a non-exclusive state, the output of the source motion may be ignored, or the state may be interrupted, in order to give precedence to another sibling node which is in, or switching into, an exclusive state.
[44]The SSN may be configured with rules specifying that, when it is triggered to start playing while it is already playing, it is allowed to self-interrupt and loop, i.e., to restart its motion source and keep on playing. We designate these rules as Self-Interruption.
[45]As a result of the possible combination of rules of the SBT, of the SBNs and of the SSNs, given a series of input play trigger signals across various SSNs of the SBT over time, the outcome from each SBN would be a multi-channel output with discontinuities in the motion descriptor signals, due to changes in the collective state of its children. These discontinuities may be solved by adding a Pose-Buffer mechanism into the SBN in order to automatically compute and output a continuous stream of multi-channel motion descriptor values.
[46]In one example, a given SSN contains a motion source with metadata specifying a StrokeStart time and a StrokePoint and a StrokeEnd time.
[47] In such an example, the input play trigger signals are generated externally with a timing such that the motion source is requested to start playing at the right moment so that it will reach its StrokePoint in sync with the timing of a given speech-directed marker such as a specific phoneme of a word.
[48]Figure 3 illustrates the same channel of two example animations A and B, each used as the motion source of two sibling SSNs which are children to a PBN along with an arbitrary number of additional siblings. In each of the animations there are 5 relevant points described in the animation metadata, namely, in addition to the trivial AnimStart and AnimEnd, markers for the StrokeStart, StrokePoint, and StrokeEnd. The x-axis in the animation curve represents time, and the y-axis in an animation curve may represent the value or a partial value of a motion descriptor or any suitable animation attribute over time, for example, an object’s position, rotation or other animatable quality.
[49]Figure 4 illustrates an example case where a sentence “This is a sentence with two animations” is performed by the embodied agent who speaks the sentence and performs matching gestures at
the same time. In this example the embodied agent system provides inputs to the CAG that trigger each animation to play with such a timing that A. StrokePoint will align with the moment of the agent’s speech corresponding to the second letter of the word “sentence”, and that B. StrokePoint will align with the fifth letter of the word “animations”, as described in the figure using the markup “s<e,A>ntence” and “anim<a,B>tion”. The portion of the figure immediately below the sentence shows a layered view of both animations, illustrating how the provided timing will cause an overlap between the ending part of animation A and initial part of animation B. In this example, due to exclusivity, the overlapping portion of animation A is interrupted and removed from the output, leaving only animation B to be performed. This results in a discontinuity, illustrated in the plot of the lower portion of Figure 4. Because both SSNs are children to a PBN, the Pose-Buffer mechanism automatically detects the discontinuity and performs a given inbetweening operation to generate a continuous transition between the point of discontinuity and an undetermined point where the transition-generated motion matches the target motion of B in both position and velocity, each within a given matching threshold. Once the match is achieved, the Pose-Buffer disables the transition computation and proceeds to output the SSN corresponding to B faithfully from its motion source.
[50]In the example of the previous paragraph, the transition may either be fully computed in advance, or computed ad hoc, on a per-frame basis. The latter case allows animation B to be continuously modulated, manipulated and warped in any manner, based on additional input signals, while the transition is being computed.
[51]In one example application of SSNs, a given SSN will produce no output while it is in the Inactive state which is used as the default state.
[52]In such an example, a given SSN contains, in addition to the Inactive state, a Starting state during which it outputs the motion corresponding to the region of the motion source between the first frame and the frame corresponding to StrokeStart, a Stroke state in which it outputs the motion corresponding to the region of the motion source between the frames corresponding to StrokeStart and StrokeEnd, and an Ending state in which it outputs the motion corresponding to the region of the motion source between the frame corresponding to StrokeEnd and the last frame.
[53]In such an example, a given SSN may start running an internal clock when it switches out of the Inactive state, and in this case, on each timestep, it will update the internal clock with the corresponding time that has elapsed since the last timestep.
[54]In such an example, a given SSN contains, in addition to the Inactive, Starting, Stroke and Ending states, a Wait state, into which it switches whenever its internal clock is pointing to the region of the timeline prior to the StrokeStart point, and another sibling has taken exclusivity of playback, or precedence by having been triggered first. In the Wait state, the SSN will keep on running its internal clock and advancing the timeline without, however, generating an output.
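The following sketch is an editorial illustration, building on the hypothetical annotated-animation structure above; it summarizes the Inactive, Starting, Stroke, Ending and Wait states and the region of the motion source that each state outputs.

    # Editorial sketch of the example SSN states and the timeline region each one
    # outputs; 'animation' is assumed to expose stroke_start, stroke_end and
    # duration() as in the earlier hypothetical database sketch.
    from enum import Enum, auto

    class SSNState(Enum):
        INACTIVE = auto()
        STARTING = auto()
        STROKE = auto()
        ENDING = auto()
        WAIT = auto()

    def ssn_output_time(state, clock, animation):
        """Return the animation time to sample, or None when nothing is output."""
        if state in (SSNState.INACTIVE, SSNState.WAIT):
            # Inactive produces no output; Wait advances the clock silently.
            return None
        if state is SSNState.STARTING:
            # First frame up to StrokeStart.
            return min(clock, animation.stroke_start)
        if state is SSNState.STROKE:
            # StrokeStart up to StrokeEnd.
            return min(clock, animation.stroke_end)
        # ENDING: StrokeEnd up to the last frame.
        return min(clock, animation.duration())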
[55] In such an example, the SBNs may be configured with rules to solve concurrent starting conflicts in the case where more than one of the SSNs have switched from the Inactive state to another state at approximately the same time.
[56] In such an example, the conflict-solving mechanism may interrupt a currently active SSN, or may disable (switch to Inactive) an SSN that is in the Wait state.
[57]In such an example, the SBNs may be configured with rules to enforce Exclusivity as described above.
[58]In such an example, the SBNs may be configured with rules to enforce Self-Interruption as described above.
[59]The state transitions of each SSN may be driven by rules that are affected by external input parameters, or by its own internal state management, or by a collective state management which reacts to changes in state of any of its sibling nodes.
[60]An example state machine for SSNs targeted at continuous gesturing animation is illustrated in Figure 5. This state machine conveys an example set of states and transitions; however, the SBT may be configured with any alternative state machine, set of states, transitions and rules. Each sibling SSN may use the same state machine, or a different state machine, where the illustrated letters from A to F may represent certain conditions, which may include conditions on the individual and/or collective state of the SSNs and/or their parent SBN.
[61]Some examples of conditions that may be used in the transitions of the example state machine include, but are not limited to: a trigger signal is active; a sibling node is in the Stroke state; a sibling node is in the Starting state; the node's animation clock has hit the StrokeStart position; etc.
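For illustration only, such conditions could be expressed as named predicates over a node and its siblings, as sketched below. The assignment of letters to particular conditions is arbitrary here (Figure 5 does not dictate it), and trigger_active is an assumed attribute rather than one defined in the embodiments.

```python
# Purely illustrative mapping of condition labels to predicates; the letters
# do not correspond to any particular transitions of Figure 5.
TRANSITION_CONDITIONS = {
    "A": lambda node, siblings: getattr(node, "trigger_active", False),
    "B": lambda node, siblings: any(s.state is SSNState.STROKE for s in siblings),
    "C": lambda node, siblings: any(s.state is SSNState.STARTING for s in siblings),
    "D": lambda node, siblings: node.clock >= node.stroke_start,
}

def condition_met(label, node, siblings):
    return TRANSITION_CONDITIONS[label](node, siblings)
```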
[62]The full AG system operates in the following order. Discrete input signals are provided at any time to the SBT for a particular subset of motion source identifiers, each of which corresponds to an SSN. The input signals may contain, among other parameters, play and stop triggers, and modulation weights. For each motion source referenced in the input signals, if a play trigger is present, it triggers a request to the corresponding SSN to start playing the motion source. Depending on the state of the system, the requested motion source may be immediately played or placed on a waiting queue with a timeout. A change in any of the source nodes’ states may cause a change to the state of other motion sources. A Pose-Buffer is used on the parent node of the source nodes to automatically generate in-between motion based on any collective state change, in order to guarantee continuously correct motion.
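The sketch below outlines one hypothetical form of this per-timestep loop, reusing the illustrative classes from the earlier sketches. The command dictionary format and the plain weighted-average blend are assumptions made for the example, and the waiting queue with a timeout mentioned above is omitted for brevity.

```python
import numpy as np

def ag_timestep(inputs, ssn_by_id, pose_buffer, prev_target, dt):
    # 1. Apply the discrete input signals (play/stop triggers, modulation weights).
    for source_id, command in inputs:
        node = ssn_by_id.get(source_id)
        if node is None:
            continue
        if command.get("play"):
            node.trigger()
        if command.get("stop"):
            node.state = SSNState.ENDING
        if "weight" in command:
            node.weight = command["weight"]

    # 2. The parent blend node resolves exclusivity / interruption conflicts,
    #    which may change the state of other source nodes.
    resolve_conflicts(list(ssn_by_id.values()))

    # 3. Blend the outputs of the active children (a plain weighted average
    #    is assumed here purely for illustration).
    weighted = []
    for node in ssn_by_id.values():
        pose = node.step(dt)
        if pose is not None:
            weighted.append((pose, getattr(node, "weight", 1.0)))
    if weighted:
        total = sum(w for _, w in weighted)
        target = sum(w * p for p, w in weighted) / max(total, 1e-6)
    else:
        target = prev_target               # nothing active: hold the last target

    # 4. The Pose-Buffer on the parent node repairs any discontinuity
    #    introduced by the collective state change.
    target_vel = (target - prev_target) / dt
    return pose_buffer.step(target, target_vel, dt), target
```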
ADVANTAGEOUS EFFECTS OF INVENTION
[63]Embodiments described herein provide a solution to the problem of computing a continuous stream of motion that is adequate for expressive behaviour, including social and communicative gestural performance. Advantageously, the resulting animation looks natural, and matches and may be synchronized with the spoken content to be delivered, without requiring a data-based approach or extensive labelling or training of an AG model.
[64]The AG system may operate with parameters that are not linked to body pose characteristics, but rather to the semantic meaning of the animation being performed, and it is capable of composing the output animation from parts containing the portions of animation that retain their semantic meaning, rather than from isolated small clips which individually do not have semantic meaning. Additionally, it does not require training a model or annotating motion characteristics of the animation dataset. Finally, the generated motion is continuous in nature and may be used to represent meaningful expression, including, but not limited to, gesticulation, while maintaining the original naturalistic motion.
[65]When a gesture is triggered to perform during, for example, a speech-directed performance, a node containing the triggered gesture may perform the Starting phase if the avatar’s current pose is at or near the resting pose, or skip it and start performing directly from the Stroke phase if its pose is away from the resting pose.
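A minimal sketch of this phase-skip decision, under the assumptions of the earlier illustrative node class, might look as follows; the resting-pose tolerance is an arbitrary example value, not one taken from the embodiments.

```python
import numpy as np

def start_gesture(node, current_pose, resting_pose, rest_tolerance=0.05):
    node.trigger()
    # If the avatar is already away from rest, jump straight to the stroke
    # phase and let the Pose-Buffer generate the connecting in-between motion.
    if np.linalg.norm(current_pose - resting_pose) > rest_tolerance:
        node.clock = node.stroke_start
        node.state = SSNState.STROKE
```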
[66]One of the advantages of this logic is that the portions of animations referring to the preparation and release phases of gesturing may be skipped, leading to minimal portions of in-between motion generation, and to more faithful generated in-between motions.
[67] One of the advantages of this logic is that when performing consecutive gestural animations, the blend-tree with state-machine nodes can deliver consistent and artistically faithful gestural motion.
[68] One of the advantages of this logic is that it also allows for more gestures per unit time with shorter intervening regions for which in-between motion must be computed.
[69]By maintaining a consistent and artistically faithful performance of the stroke phase of each gesture, the semantic meaning and/or connotation of the gesture is preserved. The connotation of a beat gesture is its performance-based implication: beat gestures have no set interpretation; instead, the choice, size and speed of the beat contribute to the perceived personality of the avatar. For example, a straightforward chop may convey confidence and a direct, task-oriented communication style. The semantic meaning is the set meaning of a symbolic, metaphoric or iconic gesture being communicated. Preserving the semantic meaning is essential for these gestures to facilitate non-verbal communication. For example, a wide two-arm sweep can be symbolic of ‘wider’ or ‘larger’.
INTERPRETATION
[70]The methods and systems described may be utilised on any suitable electronic computing system. According to the embodiments described below, an electronic computing system utilises the methodology of the invention using various modules and engines. The electronic computing system may include at least one processor, one or more memory devices or an interface for connection to one or more memory devices, input and output interfaces for connection to external devices in order to enable the system to receive and operate upon instructions from one or more users or external systems, a data bus for internal and external communications between the various components, and a suitable power supply. Further, the electronic computing system may include one or more communication devices (wired or wireless) for communicating with external and internal devices, and one or more input/output devices, such as a display, pointing device, keyboard or printing device. The processor is arranged to perform the steps of a program stored as program instructions within the memory device. The program instructions enable the various methods of performing the invention as described herein to be performed. The program instructions may be developed or implemented using any suitable software programming language and toolkit, such as, for example, a C-based language and compiler. Further, the program instructions may be stored in any suitable manner such that they can be transferred to
the memory device or read by the processor, such as, for example, being stored on a computer readable medium. The computer readable medium may be any suitable medium for tangibly storing the program instructions, such as, for example, solid state memory, magnetic tape, a compact disc (CD-ROM or CD-R/W), memory card, flash memory, optical disc, magnetic disc or any other suitable computer readable medium. The electronic computing system is arranged to be in communication with data storage systems or devices (for example, external data storage systems or devices) in order to retrieve the relevant data. It will be understood that the system herein described includes one or more elements that are arranged to perform the various functions and methods as described herein. The embodiments herein described are aimed at providing the reader with examples of how various modules and/or engines that make up the elements of the system may be interconnected to enable the functions to be implemented. Further, the embodiments of the description explain, in system related detail, how the steps of the herein described method may be performed. The conceptual diagrams are provided to indicate to the reader how the various data elements are processed at different stages by the various different modules and/or engines. It will be understood that the arrangement and construction of the modules or engines may be adapted accordingly depending on system and user requirements so that various functions may be performed by different modules or engines to those described herein, and that certain modules or engines may be combined into single modules or engines. It will be understood that the modules and/or engines described may be implemented and provided with instructions using any suitable form of technology. For example, the modules or engines may be implemented or created using any suitable software code written in any suitable language, where the code is then compiled to produce an executable program that may be run on any suitable computing system. Alternatively, or in conjunction with the executable program, the modules or engines may be implemented using any suitable mixture of hardware, firmware and software. For example, portions of the modules may be implemented using an application specific integrated circuit (ASIC), a system-on-a-chip (SoC), field programmable gate arrays (FPGA) or any other suitable adaptable or programmable processing device. The methods described herein may be implemented using a general-purpose computing system specifically programmed to perform the described steps. Alternatively, the methods described herein may be implemented using a specific electronic computer system such as a data sorting and visualisation computer, a database query computer, a graphical analysis computer, a data analysis computer, a manufacturing data analysis computer, a business intelligence computer, an artificial intelligence computer system etc., where the computer has
been specifically adapted to perform the described steps on specific data captured from an environment associated with a particular field.

Claims

1. A data structure for generating an animation of a virtual object or digital entity comprising a hierarchical blend tree including: a) a plurality of source nodes, wherein source nodes are leaf nodes of the hierarchical blend tree, and wherein source nodes are associated with a motion source outputting motion descriptors; b) and wherein at least one source node is associated with a node state, wherein the node state may be active or inactive; c) at least one non-leaf blend node, configured to blend the output of its active child nodes; and d) a root node, configured to blend the output of its child nodes to generate a final animation output for the hierarchical blend tree.
2. The data structure of claim 1 wherein at least one node state is associated with a set of rules determining the node state.
3. The data structure of claim 2 wherein the set of rules determining the node state are a state machine.
4. The data structure of claim 1 wherein node states are selected from the group consisting of inactive, starting, stopping, stroke and wait.
5. The data structure of claim 1 wherein at least one of the non-leaf blend nodes is associated with rules overriding the state of source nodes.
6. The data structure of claim 5 wherein at least one rule overrides the state of source nodes to enforce only a single child node to be active at a given time.
7. A computer-implemented method for modifying a stream of motion descriptors including the steps of: a) receiving an input stream of motion descriptors; b) determining a discontinuity in the stream of motion descriptors; c) computing a modified stream of motion descriptors configured to remove the discontinuity; and d) returning the modified stream of motion descriptors.
8. The data structure of claim 1, wherein at least one non-leaf blend node is configured to perform the method of claim 7.
9. A computer-implemented method for animating a virtual object or digital entity using the data structure claimed in any one of claims 1 to 6.
10. The method of claim 9 wherein the hierarchical blend tree is timestepped, wherein at each time step, input parameters are evaluated and applied to each node, and the output of the blend tree is computed.
11. A computer-implemented method for generating an animation of a virtual object or digital entity using a hierarchical blend tree, wherein the hierarchical blend tree is configured with a plurality of leaf nodes representing motion sources and non-leaf blend nodes are configured to blend the motion sources of their respective child nodes, including the steps of: a) receiving an input signal for modifying one or more animation parameters of one or more of the motion sources; b) modifying a node state of one or more leaf nodes associated with the one or more motion sources; c) processing the hierarchical blend tree such that the output of each node is computed following the output of its child nodes; and d) generating an animation using the blended output of the root node, wherein the modification of the node state modifies at least one non-leaf blend node’s blending computation.
12. A computer-implemented method of modifying a stream of motion descriptors including the steps of: a) receiving a first motion signal and a second motion signal with associated timesteps; b) determining whether numerical characteristics of the first motion signal correspond to numerical characteristics of the second motion signal within a given threshold; c) if the first motion signal does not correspond to the second motion signal, generating a modified output signal between the characteristics of the first motion signal and the second motion signal; and d) returning the modified stream of motion descriptors.
13. The method of claim 12 wherein the characteristics are activation value and derivatives.
14. A data processing apparatus comprising means for carrying out the method of any one of claims 9 to 13.
15. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any one of claims 9 to 13.
16. A computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method of any one of claims 9 to 13.
PCT/IB2023/062570 2022-12-14 2023-12-13 Continuous expressive behaviour in embodied agents WO2024127258A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NZ79564022 2022-12-14
NZ795640 2022-12-14

Publications (1)

Publication Number Publication Date
WO2024127258A1 true WO2024127258A1 (en) 2024-06-20

Family

ID=91485203

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2023/062570 WO2024127258A1 (en) 2022-12-14 2023-12-13 Continuous expressive behaviour in embodied agents

Country Status (1)

Country Link
WO (1) WO2024127258A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150199315A1 (en) * 2012-02-13 2015-07-16 Google Inc. Systems and methods for animating collaborator modifications
US20140267239A1 (en) * 2013-03-15 2014-09-18 Dreamworks Animation Llc Generalized instancing for three-dimensional scene data
US20220180586A1 (en) * 2020-02-04 2022-06-09 Tencent Technology (Shenzhen) Company Ltd Animation making method and apparatus, computing device, and storage medium
WO2022108806A1 (en) * 2020-11-18 2022-05-27 Snap Inc. Body animation sharing and remixing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Types of Animation Blend Nodes", SANJEOKDAEWANG (NAVER BLOG), 20 February 2016 (2016-02-20), Retrieved from the Internet <URL:https://blog.naver.com/raveneer/220632681874> *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23902915

Country of ref document: EP

Kind code of ref document: A1