WO2006052750A2

WO2006052750A2 - Asynchronous and parallel execution by physics processing unit

Info

Publication number: WO2006052750A2
Application number: PCT/US2005/040006
Authority: WO
Inventors: Jean-Pierre Bordes; Dilip Sequeira
Original assignee: Ageia Technologies, Inc.
Priority date: 2004-11-08
Filing date: 2005-11-03
Publication date: 2006-05-18
Also published as: WO2006052750A3

Abstract

Asynchronous and parallel execution of a main application on a host system and physics subroutines on a Physics Processing Unit (PPU) is provided in relation to various types of physics subroutines operating upon data types defining physics relationships between objects and features in an animation.

Description

Asynchronous and Parallel Execution by Physics Processing Unit

This application is related to U.S. Patent Application 10/715,459 filed November 19, 2003 and U.S. Patent Application 10/839,155 filed May 6, 2004. The subject matter of these two related applications is hereby incorporated by reference. This application is also related to U.S. Patent Application 10/982,764 filed November 8, 2004, the subject matter of which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

The present invention relates generally to a system running a physics simulation within the context of a main application running on the system. More particularly, the present invention relates to systems, such as Personal Computers (PCs) and game consoles, comprising a physics co-processor, or a so-called Physics Processing Unit (PPU). Several exemplary embodiments of a PPU-enabled system are disclosed in related U.S. Patent Applications 10/715,370 filed November 19, 2003 and 10/839,155 filed May 6, 2004.

The term "PPU-enabled" generally describes any system incorporating a PPU to generate physics data for consumption by a main application running on a Central Processing Unit (CPU), such as a Pentium® or similar microprocessor. "Physics data" comprises any data or data structure related to a mathematical algorithm or logical/mathematical expression adapted to solve a physics problem or express a physics relationship.

Any set of logical computations or algorithms operating upon physics data is termed a physics "simulation." A simulation generally runs on the PPU in cooperation with the CPU to generate a body of physics data that accurately defines the movement and/or interaction of objects and features in an animated scene displayed by a peripheral device associated with the system. In one sense the physics simulation run on the PPU can be said to visually enhance the animation of a scene generated by the main application running on the CPU. Such computationally derived, physics-enhanced animations form an increasingly important aspect of numerous applications. Computer games are an excellent example of applications that benefit from the added realism provided by animations derived from a defined set of physics-based parameters and data. The term "animation" is used here to generally describe any visual representation of an event. The term "physics-based animation" refers to any animation derived, at least in part, from one or more computational processes operating upon physics data defining a physical characteristic or behavior. A simulation is often said to drive the resulting animation. However, the direct relationship between simulation and animation and the fact that the underlying simulation is not apparent to the system user typically results in an alternative use of the terms animation and simulation.

Cutting edge applications generally demand that physics-based animations, and the underlying simulations, run in real time. This requirement poses a significant problem for conventional systems. For example, conventional PCs are able to resolve only a limited amount of physics data in the time allowed by real-time animation frame rates. This disability arises from structural limitations in the CPU architecture, data bandwidth limitations, and the computational workload placed upon the CPU by other processes inherent in the execution of the main application. For clarity of reference, the term "system" subsumes the term "host system."

A system includes a PPU, whereas a host system generally includes a combination of CPU and an associated main memory. This combination of host system elements interacts with the PPU.

System resources typically brought to bear on the problem of a physics-based animation are conceptually illustrated in Figure 1. In Figure 1, a Central Processing Unit (CPU) 1, together with its associated drivers and internal memories, accesses data from a main memory 2 and/or one or more peripheral devices 3. A Graphics Processing Unit (GPU) 4 with its associated memory 4A and a Physics Processing Unit (PPU) 5 with its associated main PPU memory 5A send data to and receive data from main memory 2.

Specific memory architectures are legion in number. The term "main memory" generally refers to any collection of data storage elements associated with the CPU and typically includes Random Access Memory (RAM), Read/Write memory, and related data registers and data buffers. A main application 7 is typically loaded from a peripheral 3 and runs, at least in part, from main memory 2 using CPU resources. Many contemporary applications include significant graphics content and are thus intended to run with the aid of separate GPU 4. GPUs are well known in the industry and are specifically designed to run in cooperation with a CPU to create (or "render") animations having a three dimensional (3-D) quality. As a result, main application 7 accesses one or more graphical rendering subroutines associated with GPU 4 using an Application Programming Interface (API) and related drivers 9. Similarly, one or more physics subroutines associated with PPU 5 are accessed using a PPU API and related drivers 8. An API is a well understood programming technique used to establish a lexicon of command instructions by which one piece of software may call another piece of software. The term "call" as variously used hereafter broadly describes any interaction by which one piece of software causes the retrieval, storage, indexing, update, etc., of another piece of software, or the execution of a computational process in firmware or hardware. The term "run" describes any process in which hardware resources perform an operation under the direction of a software resource.

In response to the growing appetite for physics-based animations, so-called physics engines have conventionally been added to the program code implementing applications. Indeed, a market has recently emerged directed to the development of physics engines or so-called "physics middleware." Companies like HAVOK and MathEngine have developed specialty software that may be called by a main application to better incorporate natural looking, physics-based animations into the application.

Conventional software-based physics engines allow programmers increased latitude to assign virtual mass and coefficients of friction to objects animated within the execution of the main application. Similarly, virtual forces, impulses, and torques may be applied to objects. In effect, software-based physics engines provide programmers with a library of procedures to simplify the visual creation of scenes having physics-based interaction between objects. Unfortunately, the growing appetite for animated realism cannot be met by merely providing additional specialty software, and thereby layering upon the CPU additional processing requirements. This is true regardless of the relative sophistication of the specialty software.

Consider the synchronous execution flow of a conventional application shown in Figure 2. The application is executed using conventional hardware and software resources provided by a host system, including a CPU and a main memory. A PC host system running a PC game is assumed for purposes of illustration. The application includes numerous subroutines implementing the game's functionality. Audio must be provided. Graphics data must be resolved between the CPU and GPU in order to display game scenes on a peripheral display. At some point during execution of the application, physics data is required. Accordingly, the application calls the physics middleware to compute the physics data. Since the physics middleware is executed using CPU resources (e.g., logic units, data registers, memory caches, etc.), the execution of the application stalls for a period of time. Hence, this conventional approach is termed "synchronous" because the execution of the main application is synchronized with the execution of physics middleware. Since there is only a single computational unit, the CPU, at work here, only a single piece of software can be executed during a given time period. Furthermore, should the execution cycle for the physics middleware exceed a threshold related to the animation frame rate, the game running on the PC begins to visually and/or audibly stall. In other words, overly lengthy execution cycles by the physics middleware can destroy the real-time quality of the application being executed. As a result, the quantity and quality of physics content provided by conventional physics middleware remains very limited.

As noted in the referenced applications, conventional CPUs lack the quantity of parallel execution units needed to run complex, physics-based simulations in real time. Within the context of real-time physics simulations, the data bandwidth provided between CPU 1 and main memory 2 is too limited and data latency is too high. Data pipeline flushes are too frequent. Data caches within the CPU are too small and their set-associative nature further limits their usefulness in the computation of physics data. Conventional CPUs have too few registers and lack specialized instructions (e.g., cross product, dot product, vector normalization). In sum, the general purpose architecture and instruction set associated with conventional CPUs are insufficient to execute the number of computational operations required to implement complex, real-time, physics-based animations.

Additionally, the synchronous execution of the application and physics middleware on the same hardware resources proves fatal to the real-time incorporation of sophisticated physics data within the application. While the conventional, synchronous approach is straightforward, it is also very limited in its application. The sequential calculation of highly interdependent sets of physics data using typical resolution cycles is inherently limited by the other computational requirements placed upon the CPU. Further, the extended "wait" periods necessary for synchronous execution are highly inefficient.

SUMMARY OF THE INVENTION

The present invention provides significant performance advantages over conventional systems synchronously running a main application with physics middleware. By executing a main application on a host system in parallel and asynchronously with the execution of physics subroutines on an associated Physics Processing Unit (PPU), the present invention enables a dramatic improvement in the quantity and quality of physics-based data enhancements to the main application.

Accordingly, in one aspect, the present invention provides a system comprising a host system and a PPU. The host system comprises at least a Central Processing Unit (CPU) and a main memory. The main memory stores, at least in part, a main application, and execution of the main application defines a body of simulation data. The PPU is associated with a PPU memory storing, at least in part, one or more physics subroutines. Execution of the one or more subroutines generates physics data in relation to the body of simulation data. With this general system configuration, execution of the main application by the host system occurs, at least in part, asynchronously and in parallel with execution of the one or more physics subroutines by the PPU.

In addition, the host system typically comprises one or more peripherals, including a display, a Graphics Processing Unit (GPU) and an associated GPU memory, where the GPU memory stores, at least in part, graphical rendering subroutines adapted to animate a scene on the display. In a related aspect of the present invention, execution of the graphical rendering subroutines by the GPU occurs, at least in part, asynchronously and in parallel with execution of the one or more physics subroutines by the PPU.

In other related aspects, execution of the main application transfers simulation data stored in the main memory to the PPU memory via a communications channel, and execution of the one or more physics subroutines transfers physics data stored in PPU memory to the main memory via the communications channel. The communications channel preferably enables at least one data transfer protocol compatible with PCI and PCI Express.

In another aspect, the present invention provides a method of operating a system like the one generally described above. The method comprises assigning priority to each one of the plurality of physics subroutines, and upon receiving a command from the main application, executing the plurality of physics subroutines in a sequence defined in accordance with their respective priorities. This priority of physics subroutines is preferably provided by a task list generated by the main application and transferred to the PPU memory.
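As a rough illustration of the priority-ordered execution just described, the following Python sketch sorts a task list by priority and runs each entry in turn. The names `Task` and `run_task_list`, and the convention that a lower number runs first, are assumptions made for this example only; the text does not specify a task list format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """One entry in a hypothetical task list transferred to PPU memory."""
    name: str
    priority: int                  # assumption: lower value runs first
    subroutine: Callable[[], str]


def run_task_list(tasks: List[Task]) -> List[str]:
    """Execute the subroutines in priority order and collect their results."""
    # sorted() is stable, so tasks of equal priority keep their list order.
    ordered = sorted(tasks, key=lambda t: t.priority)
    return [t.subroutine() for t in ordered]


task_list = [
    Task("dynamics", 2, lambda: "dynamics done"),
    Task("collision", 0, lambda: "collision done"),
    Task("effects", 1, lambda: "effects done"),
]
results = run_task_list(task_list)
```

Here the main application would build `task_list`, and the single call to `run_task_list` stands in for the command that triggers execution.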

By operation of the main application, the method defines simulation data comprising inputs, parameters, static data, state data, and/or data structures. Once defined, this simulation data is transferred to the PPU. Upon executing one or more of the physics subroutines, the PPU returns physics data from the PPU to the main memory.

Recognizing that the communications channel may be inadequate to transfer all of the physics data generated by the PPU during a given time frame, the present invention also provides a method of transferring physics data from a PPU and its associated PPU memory to a host system. Thus, assuming a communications channel having a maximum data transfer capacity of "Y" data bits per unit of time, and a total physics data of "X" data bits related to "n" object/features within an animation scene, the method comprises: selecting "m" object/features in the animation scene, where "m" is less than or equal to "n", and transferring only a portion "Z" of the total "X" data bits during a unit of time, wherein the portion "Z" is less than or equal to "Y", and wherein the portion "Z" relates to the "m" selected object/features.
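The selection of "m" out of "n" object/features can be sketched as follows. This is only one possible policy, assuming the application supplies an interest ranking and that a simple greedy pass is acceptable; the function name `select_for_transfer` and the sample sizes are hypothetical.

```python
from typing import Dict, List, Tuple


def select_for_transfer(
    sizes_bits: Dict[str, int],
    interest_order: List[str],
    capacity_bits: int,
) -> Tuple[List[str], int]:
    """Pick m of n object/features whose data fits within capacity Y.

    sizes_bits     -- physics data size per object/feature (sums to X)
    interest_order -- object/features ranked by the application's interest
    capacity_bits  -- channel capacity Y for one unit of time
    Returns the selected names and the transferred size Z, with Z <= Y.
    """
    selected: List[str] = []
    z = 0
    for name in interest_order:
        size = sizes_bits[name]
        if z + size <= capacity_bits:   # skip anything that would exceed Y
            selected.append(name)
            z += size
    return selected, z


sizes = {"crate": 400, "barrel": 300, "debris": 500, "flag": 200}  # X = 1400
chosen, z_bits = select_for_transfer(
    sizes, ["crate", "flag", "debris", "barrel"], capacity_bits=1000
)
```

With these sample values, "debris" is skipped because it would push the total past Y, so three of the four object/features are transferred.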

In yet another aspect, the present invention provides a method of executing a physics subroutine on a system like the one generally described above. The method begins with execution of the main application by the host system. The main application initializes simulation data comprising at least one data structure, input, and parameter, and transfers the simulation data from the main memory to the PPU memory via a communications channel.

The PPU executes the physics subroutine largely in parallel and asynchronously with execution of the main application running on the host system and using the simulation data. Following execution of the physics subroutine, the physics data generated is stored in the PPU memory, and transferred from the PPU memory to the main memory.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings, taken together with the foregoing discussion, the detailed description that follows, and the claims, describe several preferred embodiments of the present invention. The drawings include the following:

Figure 1 is a conceptual illustration of the principal hardware and software components forming a system adapted to the present invention;

Figure 2 is a flowchart illustrating the synchronous nature of conventional application/physics middleware execution;

Figure 3 is a flowchart illustrating the asynchronous, parallel execution of a main application on a host system and one or more physics subroutines on a PPU within the context of one embodiment of the present invention;

Figure 4 is a flowchart illustrating the asynchronous, parallel execution of an exemplary collision detection subroutine;

Figure 5 is a flowchart illustrating the asynchronous, parallel execution of an exemplary rigid body dynamics subroutine;

Figure 6 is a flowchart illustrating the asynchronous, parallel execution of an exemplary smoothed particle hydrodynamics (fluid system) subroutine;

Figures 7A, 7B, and 7C are related sections of a flowchart illustrating the asynchronous, parallel execution of multiple physics subroutines;

Figure 8 is a flowchart illustrating an exemplary static data configuration package;

Figure 9 is a flowchart illustrating an exemplary state data configuration package;

Figure 10 is a flowchart illustrating an exemplary package implementing a data transfer routine between the host system and PPU; and,

Figure 11 conceptually illustrates the use of an interest registration scheme to select physics data for transfer from a PPU to a host system.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

The present invention recognizes that conventional software-based solutions to physics simulations have limits that affect their practical application. Emerging applications, such as PC and console games, would benefit considerably by including many more active objects and related forces than could be reasonably simulated using specialty software run on a general purpose CPU.

Thus, the present invention approaches the problem of generating visually realistic physics animations from an entirely different perspective. Unlike conventional software-based solutions, the present invention preferably relies on a hardware-based Physics Processing Unit (PPU). A PPU implemented in accordance with the present invention may be viewed in one aspect as a specialty co-processor. In cooperation with the general purpose CPU provided by a host system, the PPU provides the enormous, additional, and highly specialized processing capabilities required to implement complex, real-time, physics simulations. However, the term "PPU" as used herein defines a broader class of processors and related system components capable of executing physics subroutines. A PPU configured as specialty co-processor is presently preferred, and therefore forms the context for much of the description that follows. However, it is well within the contemplation of the present invention that an additional, general purpose CPU configured within a host system, or even an additional processing core within a multiple CPU could be used to execute the physics subroutines required to create realistic, real-time animations. Given current capabilities, a second CPU (or second core) would not execute physics subroutines with nearly the efficiency of the preferred specialty, co-processor embodiment, but its use is possible. The term "PPU" should further be read as encompassing not only a single integrated circuit, or chip set logically executing the physics subroutines, but also the associated data transfer circuits, memories, registers, and buffers. Thus in one aspect, the term "PPU" may be understood as shorthand for a "PPU system."

The present invention is characterized in one aspect by the asynchronous and parallel operation of the CPU and PPU. The term "asynchronous" defines any system operating methodology wherein execution of any part of the main application using resources provided by the host system CPU is run in parallel with (1) the execution of computational algorithms adapted to determine physics data using resources provided by the PPU, (2) PPU-based definitions and/or updates to physics data, and/or (3) the transfer and/or storage of physics data from PPU main memory to/from a host system or external memory location. The term "memory" is broadly defined to include all memory types, including associated registers, latches, and buffers. The term "main PPU memory" is used to denote a primary, high-speed memory external to the computational components executing physics subroutines. In contrast, a hierarchy of other memories (e.g., intermediate, secondary, and primary) are preferably associated with the computational components.

In one aspect, the present invention seeks to maximize the run-time efficiency of both the CPU and the PPU. Thus, while it is possible for the main application to wait for physics data being computed by the PPU, such waits are generally unnecessary and/or greatly reduced over the wait times inherent in the synchronous operation of conventional systems. Asynchronous, therefore, describes any execution method capable of any degree of independent operation as between a main application and physics subroutines. Completely asynchronous operation is not required. In this context, the related terms "asynchronous[ly]" and "in parallel" connote a greatly expanded ability by the host system to execute a main application during time periods where physics data is being calculated and/or manipulated by the PPU, as compared with the synchronous mode of execution described above. The reverse is also true, since the PPU is able to execute physics subroutines during time periods where the host system is executing the main application.

Asynchronous operation allows the host system and PPU to be fully and efficiently utilized. Proper execution of the main application, even portions of the main application requiring significant quantities of physics data, may proceed without extended wait cycles that would otherwise threaten the real-time incorporation of physics data within the main application. That is, while the PPU performs the extensive computational algorithms required to generate sophisticated physics data, the CPU is free to execute other aspects of the main application.

This concept is illustrated by a simple example shown in Figure 3. A PC or console game is assumed as a working example of the main application, but this is only an example. Any type of main application adapted to incorporate physics data is susceptible to the benefits afforded by the present invention. In Figure 3, the main application runs on a host system asynchronously with respect to physics subroutines running on a PPU. In the illustrated example, a first artificial intelligence (AI) subroutine 10 generates a call to the PPU. The PPU call may take many specific forms, including for example, a computational request, a data transfer or update, etc. As a result of the PPU call, the PPU receives and/or updates data, transfers data internally, sends (returns) data to a host system memory, and/or executes computational algorithms sufficient to define, update, and/or compute physics data.

In the illustrated example, the PPU is presupposed to have stored certain physics data and executes three physics subroutines to define or update the physics data. Here, collision detection subroutine 20, an effects subroutine 21, and a (rigid body or hydro-) dynamics subroutine 22 are run. Following the execution of these three subroutines, physics data is "returned" to the host system. The term "return(ed)" broadly defines any data transfer from a PPU to the host system, where such data transfer results from a prior PPU call.

As illustrated in Figure 3, during the period of time required for the PPU to execute the three physics subroutines, the host system is able to continue with the execution of the main application. Here, an audio subroutine 11, a graphical rendering subroutine 12, and a second AI subroutine 13 are executed while the PPU computes and transfers the physics data provided by the return. In this manner, CPU resources are fully available for other purposes during periods where the PPU undertakes the computationally intense determination of physics data. That is, the main application and the physics subroutines run, in large part, asynchronously and in parallel.
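The flow of Figure 3 can be approximated in software as shown below. This Python sketch is an illustration under stated assumptions, not the disclosed implementation: a worker thread stands in for the PPU hardware, and the names `ppu_call` and `run_frame` and the toy physics values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def ppu_call(simulation_data: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for the three physics subroutines run on the PPU."""
    data = dict(simulation_data)
    data["collisions"] = 0.0                 # collision detection subroutine 20
    data["effects"] = 1.0                    # effects subroutine 21
    data["height"] = data["height"] - 0.5    # dynamics subroutine 22
    return data                              # the "return" to the host system


def run_frame(simulation_data: Dict[str, float]) -> List[str]:
    """Run one frame of the main application alongside the PPU stand-in."""
    log: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as ppu:
        log.append("ai_1")                               # AI subroutine 10
        pending = ppu.submit(ppu_call, simulation_data)  # PPU call, non-blocking
        log.append("audio")                              # audio subroutine 11
        log.append("render")                             # rendering subroutine 12
        log.append("ai_2")                               # AI subroutine 13
        physics_data = pending.result()                  # wait only when needed
    log.append(f"height={physics_data['height']}")
    return log


frame_log = run_frame({"height": 10.0})
```

The host only blocks at `pending.result()`, after its own subroutines have run, which mirrors the reduced wait described above.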

Naturally, the exemplary subroutines referenced above are greatly limited and oversimplified for purposes of explanation. Software designers will define and arbitrarily designate numerous subroutines defining the main application, as well as the PPU-resident physics subroutines. However, regardless of the type, nature, and number of constituent subroutines, many of the subroutines implicated in the definition and incorporation of physics-based data within a main application will access one or more of the data types commonly used to express physics-based relationships and characteristics. As with the definition of subroutines, many specific data forms and data structures are available to a software designer. However, regardless of the particular form or structure of the data being used, physics data will incorporate one or more of the following data types: static, state, input, and parameters. Examples of these data types are described hereafter in the context of several exemplary physics subroutines.

Static data generally describes the permanent properties of objects and features (hereafter "object/features") included within an animation. An "object" is any article or character, or a part of an article or character, within an animation scene. A "feature" is any attribute or visual quality associated with an animation scene or an object in the animation scene. As will be described in greater detail hereafter, objects/features need not be visible to a viewer of the animation.

State data is any set of information that sufficiently describes at least one object/feature in time, such that future results (e.g., subsequent frames in the animation occurring later in time) can be calculated from present inputs. State data may further be defined as comprising two general subtypes: persistent state data and transient state data.

Persistent state data defines or characterizes one or more object/features from one animation "frame" to the next. The term "frame" refers to any time period defining a predetermined increment in logical, computational, visual, and/or audible progress of a main application or physics subroutine. Persistent state data typically results from execution of a physics simulation (i.e., computations defining physics data underlying an animation). Thus, in the exemplary context of a PC or console game application, persistent state data is consumed by the host system to drive a physics data enhanced, interactive, 3-D experience. Persistent state data at the end of one animation frame is necessary to correctly evolve the physics simulation in a subsequent animation frame.

Transient state data defines or characterizes object/features within a given frame. For example, many physics subroutines run on the PPU are iterative in their nature. Inter-iteration state data is transient as it is updated by subsequent computational iterations. Although the main application may require access to some transient state data to drive a physics animation or update the main application, the data values associated with transient state variables are not required to evolve a physics simulation from frame to frame.

Inputs are variable definitions whose values are specified at various times by the host system (e.g., the main application, or a specific user input to the main application) to control a physics simulation. For example, at a particular time in the simulation, the application may request that a particular force be applied to a particular object for a predetermined period of time.

Parameters are values typically associated with one or more algorithms that globally affect the results of a physics simulation or the computations used to derive physics data. Parameters are not associated with any particular object/feature in the simulation and their values generally remain fixed throughout the duration of the simulation.
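One way to picture the four data types (static, state, input, and parameters) is as separate containers with different lifetimes. The Python sketch below is a hypothetical layout: every class name, field, and default value is an assumption chosen for illustration, not a structure defined by the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class StaticData:
    """Permanent properties of an object/feature."""
    mass: float
    friction: float


@dataclass
class PersistentState:
    """Carried from one frame to the next."""
    position: Vec3
    velocity: Vec3


@dataclass
class TransientState:
    """Meaningful only within a frame; rebuilt every frame."""
    contact_forces: List[Vec3] = field(default_factory=list)


@dataclass
class Inputs:
    """Set by the host system at various times to steer the simulation."""
    applied_forces: Dict[str, Vec3] = field(default_factory=dict)


@dataclass(frozen=True)
class Parameters:
    """Global values, not tied to any object/feature."""
    time_step: float = 1.0 / 60.0
    solver_iterations: int = 8


def begin_frame(transient: TransientState) -> None:
    """Discard transient state; persistent state is left untouched."""
    transient.contact_forces.clear()


crate_static = StaticData(mass=12.0, friction=0.6)
crate_state = PersistentState(position=(0.0, 1.0, 0.0), velocity=(0.0, 0.0, 0.0))
scratch = TransientState(contact_forces=[(0.0, 117.7, 0.0)])
begin_frame(scratch)
```

Marking the static data and parameters as frozen reflects that they are fixed, while the two state containers and the inputs remain mutable.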

Examples of these various data types are best given in the context of several selected physics subroutines commonly associated with physics simulations run on a PPU. A collision detection subroutine is first considered. Collision detection is characterized by geometric calculations to determine whether two or more object/features in an animation overlap, intersect, or inter-penetrate. Such object/features take many geometric forms, including, for example, objects visually represented in an animation as 3-dimensional articles, and objects visually represented as one or more line segments (or rays).

Typical collision detection subroutines make use of so-called "collision models." Collision models generally define the 3-dimensional size and shape of objects/features for the purpose of detecting collisions between the objects/features. Collision models may be categorized as being either mesh models or parametric models.

Mesh models include, for example, general mesh and convex mesh types. Examples of static data associated with mesh types include: a list or an array of three dimensional points commonly called vertices; a value indicating the total number of vertices; a list or an array of indices to a vertex array, etc. Sets of three vertices define surface faces (or triangles) of an object/feature in an animation. Additional examples of static data associated with mesh types include: a value indicating the total number of faces; a list of vector values specifying a geometric normal for each face; a list or array of pairs of vertices that form the edges of each face; and a transformation matrix describing the reference frame relative to which the vertex positions are specified.
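To make the mesh static data concrete, the following Python sketch derives per-face normals and the edge list from a vertex array and an index array. It assumes triangles are stored as consecutive index triples with counter-clockwise winding; the function names and the sample square are hypothetical.

```python
from typing import List, Set, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v: Vec3) -> Vec3:
    length = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    return (v[0] / length, v[1] / length, v[2] / length)


def face_normals(vertices: List[Vec3], indices: List[int]) -> List[Vec3]:
    """One unit normal per triangle (assumes non-degenerate triangles)."""
    normals = []
    for i in range(0, len(indices), 3):
        a, b, c = (vertices[k] for k in indices[i:i + 3])
        normals.append(normalize(cross(sub(b, a), sub(c, a))))
    return normals


def edges(indices: List[int]) -> Set[Tuple[int, int]]:
    """Unique vertex-index pairs forming the edges of every face."""
    found = set()
    for i in range(0, len(indices), 3):
        a, b, c = indices[i:i + 3]
        for u, v in ((a, b), (b, c), (c, a)):
            found.add((min(u, v), max(u, v)))   # store each edge once
    return found


# A unit square in the XY plane, split into two triangles.
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_indices = [0, 1, 2, 0, 2, 3]
mesh_normals = face_normals(square_vertices, square_indices)
mesh_edges = edges(square_indices)
```

Because these values never change for a rigid mesh, they can be computed once and stored as static data rather than recomputed each frame.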

Parametric model types include sphere, capsule, and box. Examples of static data related to parametric collision models include, for example, the radius of a sphere, the radii of a capsule (i.e. length and thickness), and the three dimensions defining the height, width and length of a rectangular box.

State data is also implicated in the execution of a collision detection subroutine. An example of persistent state data is a cache of data describing the animated physical proximity of objects/features in a simulation. Such a cache can be used to accelerate collision detection by exploiting temporal coherence of collisions or proximity of objects/features in a simulation. The phrase "temporal coherence" describes a condition wherein from frame-to-frame the collisions, contacts, and proximity between object/features tend to vary only a little. Examples of transient state data include: the dimensions, position, and orientation of axis-aligned bounding boxes (AABBs) for each collision model taking part in a specific data computation; a list of potentially intersecting object/feature pairs, as determined by conventional AABB overlap tests; a list of intersecting object/feature pairs; the position, normal vector, and penetration depth for each contact formed between intersecting object/feature pairs. An additional example of transient state data accessed by a collision detection subroutine is the so-called "event stream" which is a set of transient state data regarding conditions within a particular collision calculation. The main application is generally notified of the occurrence of all collision events for which it has expressed an interest.

Common inputs to a collision detection subroutine include the position, orientation, velocity, and angular velocity of each object/feature active in a simulation or calculation of physics data.
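The AABB overlap test mentioned among the transient state data can be sketched as follows. This is a brute-force illustration that tests every pair; the function names and sample boxes are assumptions, and a practical broad phase would typically avoid the all-pairs loop.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
AABB = Tuple[Vec3, Vec3]  # (minimum corner, maximum corner)


def aabb_overlap(a: AABB, b: AABB) -> bool:
    """Two boxes overlap only if their extents overlap on every axis."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def potential_pairs(boxes: Dict[str, AABB]) -> List[Tuple[str, str]]:
    """List the potentially intersecting object/feature pairs."""
    return [
        (m, n)
        for m, n in combinations(sorted(boxes), 2)
        if aabb_overlap(boxes[m], boxes[n])
    ]


boxes = {
    "crate": ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    "barrel": ((0.5, 0.5, 0.5), (1.5, 1.5, 1.5)),
    "flag": ((3.0, 0.0, 0.0), (4.0, 1.0, 1.0)),
}
pairs = potential_pairs(boxes)
```

Only the pairs that survive this cheap test need the more expensive exact intersection check that yields contact position, normal, and penetration depth.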

Common parameters to a collision detection subroutine include value(s) defining minimum tolerable separation values between object/features below which the object/features are declared to be intersecting or interpenetrating.
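The role of a minimum separation parameter can be shown with the simplest parametric model, the sphere. The Python sketch below reports a contact (position, normal, penetration depth) whenever the gap between two spheres falls below the tolerance. The function name, the choice of the contact point halfway through the overlap, and the fallback normal for coincident centers are all assumptions made for this example.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Contact = Tuple[Vec3, Vec3, float]  # (position, normal, penetration depth)


def sphere_contact(
    center_a: Vec3,
    radius_a: float,
    center_b: Vec3,
    radius_b: float,
    min_separation: float = 0.0,
) -> Optional[Contact]:
    """Report a contact when the gap between two spheres is below tolerance."""
    delta = (
        center_b[0] - center_a[0],
        center_b[1] - center_a[1],
        center_b[2] - center_a[2],
    )
    distance = math.sqrt(delta[0] ** 2 + delta[1] ** 2 + delta[2] ** 2)
    separation = distance - (radius_a + radius_b)   # negative when overlapping
    if separation >= min_separation:
        return None
    if distance > 0.0:
        normal = (delta[0] / distance, delta[1] / distance, delta[2] / distance)
    else:
        normal = (1.0, 0.0, 0.0)   # coincident centers: arbitrary direction
    depth = -separation            # negative if only within the tolerance band
    reach = radius_a + separation / 2.0   # halfway through the overlap region
    position = (
        center_a[0] + normal[0] * reach,
        center_a[1] + normal[1] * reach,
        center_a[2] + normal[2] * reach,
    )
    return position, normal, depth


origin = (0.0, 0.0, 0.0)
hit = sphere_contact(origin, 1.0, (1.5, 0.0, 0.0), 1.0)
near_miss = sphere_contact(origin, 1.0, (2.05, 0.0, 0.0), 1.0)
within_tolerance = sphere_contact(origin, 1.0, (2.05, 0.0, 0.0), 1.0, min_separation=0.1)
```

The last two calls use the same geometry; only the tolerance parameter decides whether the pair is declared to be in contact.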

A rigid body dynamics subroutine is next considered. A typical rigid body dynamics subroutine determines the animated motion of a system of non-deformable object/features. This determination calculates the forces required to prevent interpenetration of object/features and to maintain relationships between objects/features connected by joint constraints.

Examples of static data accessed by a rigid body dynamics subroutine include the mass and moments of inertia of each body. (Each animated object/feature comprises one or more "bodies.") Additional examples of static data include the elasticity, friction coefficients, and hardness of each body. The position, orientation, type, and limit values for joint constraints between bodies are also examples of static data routinely utilized by a rigid body dynamics subroutine.

Persistent state data accessed by rigid body dynamics subroutines includes, for example, the position, velocity, orientation, angular velocity, and activation status (asleep or awake) of each body. Transient state data includes: the forces of constraint at each contact between rigid bodies as required to prevent interpenetration and/or to simulate friction between two intersecting objects; the displacement of joints with respect to various constraint condition(s) (i.e., joint error); and the movement of joints into limiting regions (i.e., limit error). Common inputs to rigid body dynamics subroutines include: externally applied forces and torques, or vector values provided by the main application to produce a desired effect on bodies involved in the simulation; and externally imposed constraint velocities (e.g., motor control values on certain joints).

Ready examples of parameters applied to rigid body dynamics subroutines include the size of an integration time-step, and the number of iterations performed by a constraint solver algorithm.
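The division of rigid body data into static data, persistent state data, inputs, and parameters can be illustrated with a minimal Python sketch. The class and function names are hypothetical, the semi-implicit Euler update is an assumed integration scheme chosen for brevity, and constraint forces and rotation are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class BodyStatic:
    """Static data: does not change as the simulation advances."""
    mass: float


@dataclass
class BodyState:
    """Persistent state data: carried from one time-step to the next."""
    position: tuple
    velocity: tuple
    awake: bool = True


def integrate(static, state, force, dt):
    """Advance one body by one time-step (assumed semi-implicit Euler).

    force is an externally applied input; dt is the integration
    time-step parameter. Sleeping bodies are returned unchanged.
    """
    if not state.awake:
        return state
    velocity = tuple(
        v + (f / static.mass) * dt for v, f in zip(state.velocity, force)
    )
    position = tuple(p + v * dt for p, v in zip(state.position, velocity))
    return BodyState(position, velocity, state.awake)
```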

A smoothed particle hydrodynamics (SPH) subroutine will next be considered. Smoothed particle hydrodynamics subroutines simulate the motion of a volume of fluid. This is typically done by calculating the positions of many particles forming the fluid system and thereafter approximating the fluid surface from calculated particle positions.

Static data describing a fluid system typically include rest density, viscosity, rest spacing, particle lifetime, surface-to-particle distance, surface blending coefficients, and boundary particle modeling coefficients. Examples of persistent state data include the position and velocity of the fluid particles. Transient state data examples include the pressure and density of the fluid at each particle position.

Typical inputs to a smoothed particle hydrodynamics subroutine include: particle emitter position, flow rate, orientation, size and shape; drain position and orientation; and, a value defining the maximum number of particles in the system.

Typical parameters applied to a smoothed particle hydrodynamics subroutine include various configuration values, a solver time-step, and inter-particle potential (or kernel).
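As a hedged illustration of how the transient density and pressure values relate to the kernel parameter, the following Python sketch evaluates density at each particle position and derives pressure from the rest density. The poly6 kernel and the linear pressure law are common choices in the SPH literature and are assumptions here; this application does not specify a particular kernel or equation of state, and all function names are hypothetical.

```python
import math


def poly6(r, h):
    """Poly6 smoothing kernel (assumed choice) with support radius h."""
    if r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def densities(positions, particle_mass, h):
    """Density at each particle: kernel-weighted sum over all particles."""
    return [
        sum(particle_mass * poly6(math.dist(p, q), h) for q in positions)
        for p in positions
    ]


def pressures(rho, rest_density, stiffness):
    """Pressure from density via an assumed linear equation of state."""
    return [stiffness * (d - rest_density) for d in rho]
```

The all-pairs sum is quadratic in the particle count; it is used only to keep the sketch short.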

The output of a smoothed particle hydrodynamics subroutine (i.e., data returned to the host system following execution of the subroutine) is typically a collection of data vertices defining a mesh that represents the surface of the fluid. A surface extraction algorithm generates the mesh vertices by examining the separation between particles to determine which regions of space contain fluid and which are empty. The mesh data is typically not used by the smoothed particle hydrodynamics subroutine in subsequent frames. Rather the mesh data is transmitted directly to a vertex buffer in main memory preparatory to its use by a graphical rendering subroutine.

A clothing simulation subroutine is a more specialized type of physics subroutine. For example, the physically realistic motion of a graphics mesh animating clothing related to an object in the main application is simulated by assigning the properties of mass and spring-like resistance to the vertices of a control mesh whose shape is applied to deform the graphics mesh. Collision detection is performed between the points of the control mesh and another object/feature representing an underlying character or a background environment.

Static data typically required for a clothing simulation subroutine include: a non-deformed clothing mesh; mass values for the vertices defining the control mesh; spring and damping coefficients values for these vertices; and, collision response coefficients for these vertices. Within the context of a clothing simulation subroutine the position of each control vertex is an example of persistent state data. This particular type of persistent state data is typically retained internal to the PPU. In contrast, the clothing simulation subroutine typically returns a matrix of values defining the displacement of each vertex in the clothing mesh with respect to the non-deformed graphics mesh defining the underlying character object.

Inputs to a clothing simulation subroutine include: kinematic or dynamic properties for the underlying character object (i.e., the position, orientation, velocity, and angular velocity of each body forming the object), and the collision shape of the underlying character object. Parameters typically applied to a clothing simulation subroutine include a time-step value defining a period of time over which the clothing simulation evolves, and convergence conditions.
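The "spring-like resistance" assigned to control mesh vertices can be illustrated with a damped spring between two vertices. This Python sketch is an assumption about one conventional formulation (a linear spring plus damping along the spring axis); the function name and its arguments are hypothetical and are not taken from this application.

```python
import math


def spring_force(p_a, p_b, v_a, v_b, rest_length, stiffness, damping):
    """Force on vertex a from a damped spring joining vertices a and b.

    stiffness and damping play the role of the spring and damping
    coefficients held as static data for the control mesh.
    """
    d = [b - a for a, b in zip(p_a, p_b)]  # vector from a to b
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # direction undefined for coincident vertices
    n = [c / length for c in d]  # unit vector from a to b
    # Rate at which the two vertices separate along the spring axis.
    separation_rate = sum((vb - va) * ni for va, vb, ni in zip(v_a, v_b, n))
    magnitude = stiffness * (length - rest_length) + damping * separation_rate
    return tuple(magnitude * ni for ni in n)
```

A stretched spring pulls vertex a toward vertex b, a compressed spring pushes it away, and the damping term opposes relative motion along the spring.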

The foregoing are but a few selected examples of physics subroutines used to generate the physics data required for a physically realistic animation. The term "subroutine" generally refers to any section of software smaller than a complete application. Thus, subroutines will vary in size, nature, and scope across a wide range.

Each of the functional subroutines described above is resident, at least in part, on the PPU. Each subroutine receives inputs, parameters and/or input data structures, operates upon one or more data sets, and thereafter typically returns physics data to the host system. Subroutines typically access static data provided by the host system, and most commonly provided by definitions found in a main application. Each physics subroutine accesses, defines, calculates, updates, and/or outputs persistent state data. Transient state data is often used to calculate persistent state data or respond to an input.

Regardless of the specific type of physics subroutine implicated in a physics data computation, state data must be coherently maintained between the PPU and the host system. That is, in one aspect of the present invention, the overall utility of a PPU-enabled host system requires effective coordination of state data between the host system and the PPU. Unfortunately, bandwidth limitations in the communications channel(s) between the host system and the PPU restrict the volume of data that can be exchanged in a given time period. As presently contemplated, a PPU will be connected within a system using conventional architectures, such as PCI or PCI Express (PCIe). PCI/PCIe transmission protocols are bandwidth limited. However, even in the contemplation of future generations where a PPU might be combined on the same card as a GPU within a system, combined within the same chip as a GPU, provided as a dedicated co-processor internal to the host system, or combined within the same integrated circuit as the CPU, bandwidth limitations will remain a serious design consideration.

To accommodate these data transmission bandwidth limitations, a PPU-enabled host system maintains local copies of some relevant state data on the PPU (e.g., in the PPU main memory and/or in an internal PPU memory) as well as the host system. Different physics subroutines "mirror" state data between the PPU and host system in different ways. However, the static and state data copied to and periodically updated in the PPU allow the PPU to run asynchronously and in parallel with the host system. The coherent management and update of mirrored state data in the PPU and host system is thus an important design consideration. Several selected subroutines will now be described in some additional detail to further highlight this consideration. In the following discussion, the order of the described method steps is merely illustrative, not limiting. Further, numerous additional, but routine, method steps have been omitted from the description for purposes of clarity.

A collision detection subroutine is first described in conjunction with the partial flowchart shown in Figure 4. As previously noted, a collision subroutine typically operates on static data structures defining 3-D objects/features and rays. (These object/feature animations are actually displayed in two dimensions, but are generally termed "3-D" because to the human eye they appear within an animation to have depth as well as length and height). The static data structures defining the 3-D object/features and rays are termed "collision models," and include at least mesh models and parametric models. The collision detection subroutine performs geometric calculations to determine whether any two or more of the 3-D object/features intersect or overlap in the simulation (and therefore the animation). The tolerance of intersection and/or overlap of object/features is defined by parameters.

Collision models are generally defined by a main application running on the host system. During a general initialization or a subsequent initialization subroutine, the main application extracts collision models and related parameters, typically from a host system peripheral or the Internet (30). Collision model data structures (e.g., data lists, matrixes, vectors, tables, etc.) are generated in accordance with the extracted collision models (31). As presently preferred, collision model data structures are defined using the host system CPU; however, the PPU might optionally be used for this task.

The main application next determines inputs applied to each collision model (32). Following definition of the parameters, collision model data structures, and inputs by the host system, this collection of data is transferred to the PPU via a competent communications channel.

Parameters, inputs, and collision model data structures are stored in one or more memories associated with the PPU (35). Once transferred from the host system, each collision model data structure is retained in PPU main memory, i.e., is stored locally throughout the simulation, unless specifically removed by a command from the main application. Upon receiving an appropriate subroutine call issued from the host system CPU (34), the PPU executes the collision detection subroutine in order to calculate the potential collision and/or overlap of object/features with other object/features (36). Accordingly, the execution of the collision detection subroutine results in the generation of considerable transient state data that is consumed by the PPU without general reference by the host system. The host system often retains only a list of identifiers for the collision models presently "known" to the PPU. However, the host system may optionally retain copies of one or more collision model data structures for purposes and tasks unrelated to the PPU.

Execution of the collision detection subroutine generates output data (37). At a minimum, the output data describes calculation results from the collision detection subroutine indicating the collision and/or overlap of object/features. The output data is returned to the host system using one of several conventional techniques including, for example: a DMA transfer; a register write and flag set; a periodic interrupt of the CPU by the PPU; or a response or data return to a periodic polling by the CPU (38). Output data is often very important to the proper execution of the main application, and thus the returned output data is generally analyzed by the main application as soon as reasonably possible once it becomes available within the host system (40). At some point in time, the collision detection subroutine finishes and updates the relevant persistent state data stored in PPU memory (39). At some point in time, the PPU may receive updated or new values for inputs, parameters, and/or collision model data structures (42). In the illustrated example, the main application, upon analyzing the returned output data (40), determines on the basis of that data whether new collision detection data is required (41). Updates and new data may be received by way of an interrupt, a scheduled fetch cycle, a periodic polling call, or any other conventional data transfer technique (43). The use of commands (instructions and/or messages) defined by one or more conventionally provided Application Programming Interfaces (APIs) is contemplated within the present invention. API commands are used, for example, to call the collision detection subroutine, update existing and/or insert new collision model data structures, and/or transfer data between the PPU and host system.

A rigid body dynamics subroutine is next considered in conjunction with the partial flowchart shown in Figure 5.
The rigid body dynamics subroutine calculates the motion of "active" object/features within an animation. In conjunction with output data received from the collision detection subroutine, the rigid body dynamics subroutine also calculates the forces required to prevent interpenetration of object/features and/or maintain relationships between joint-connected bodies forming an object. The rigid body dynamics subroutine is iterative in nature and computationally intense. Accordingly, it is an ideal candidate for parallel and asynchronous execution by a PPU relative to execution of a main application by a host system.

Object/features are defined by the main application as being either first class or a lower class (e.g., second class, third class, etc.). This definition typically occurs, along with the definition of input and parameter data, during a data initialization step within the main application (50). Object/features are assigned a class designation (or priority) in relation to many factors. One important factor in determining an object/feature class is the relationship of that object/feature to main application subroutines and processes. For example, certain object/features may be directly rendered by the GPU from data provided by the PPU and stored within the host system's main memory. Other object/features require additional processing by one or more subroutines in the main application before consumption by a graphical rendering subroutine. Thus, data related to object/features requiring additional CPU-based processing are designated as first class (i.e., highest priority). First class designation may thus be thought of as indicating longer lead-time data structures requiring CPU processing within a given animation frame, as compared with shorter lead-time data structures that are merely written to main memory by the PPU and subsequently accessed by graphical rendering subroutines.

Corresponding rigid body data structures are generated for both first and lower class objects/features (51). As presently contemplated, the host system will maintain local copies of at least the data structures related to first class objects/features (52). The host system transfers the rigid body data structures, inputs, and parameters to the PPU via a competent communications channel using appropriate API commands (53). This collection of data is thereafter stored in one or more local memories associated with the PPU (55). Upon receiving an appropriate command from the host system CPU (54), the PPU executes the rigid body dynamics calculations. Such calculations are executed on an object/feature by object/feature (or a small group of object/features by small group of object/features) basis during a prescribed period of time. The rigid body dynamics subroutine completes all calculations related to first class objects/features before undertaking calculations related to lower class objects/features. Thus, the rigid body data structures transferred from the host system to the PPU will typically include a data component indicating the class of the object/feature.

Normally, each object/feature implicated in the rigid body dynamics subroutine must be accounted for during each animation frame. Accordingly, the rigid body dynamics subroutine locally updates persistent state data on a class by class basis, beginning with first class objects/features (57). Updates to first class objects/features data are preferably transferred to the host system as soon as available (58), and stored in a designated location in main memory (61). The persistent state data associated with lower class object/features is subsequently updated in the PPU (59), transferred to the host system (60), updated in a designated location in main memory (62), and thereafter made available for use by graphical rendering subroutines (63).
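The class by class ordering just described can be sketched as a Python generator that hands back one batch of updated state per class, first class first, so that the consumer can begin using first class results before lower class work has started. The dictionary key "cls", the step callback, and the function name are hypothetical.

```python
from itertools import groupby


def update_by_class(bodies, step):
    """Yield (class, updated states) batches, lowest class number first.

    bodies is a list of dicts carrying a hypothetical "cls" field
    (1 = first class). step is whatever per-body update is applied.
    Because this is a generator, each batch becomes available to the
    caller before the next class is processed.
    """
    ordered = sorted(bodies, key=lambda b: b["cls"])  # stable within a class
    for cls, group in groupby(ordered, key=lambda b: b["cls"]):
        yield cls, [step(b) for b in group]
```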

The main application may insert new rigid body data structures, inputs, and/or parameters into the rigid body simulation being executed on the PPU (64). Where this occurs, the rigid body dynamics subroutine updates the rigid body data structures, inputs, and/or parameters stored in PPU memory (65) before looping back to again execute the rigid body dynamics calculations.

A Smoothed Particle Hydrodynamics (SPH) subroutine will next be considered in conjunction with the partial flowchart shown in Figure 6. A fluid system with inputs and parameters is first defined by the main application (80). A set of fluid system data structures is initialized (81) and thereafter transferred, together with the inputs and parameters (82), to the PPU.

Once the fluid system data structures, inputs, and parameters are stored in one or more memories associated with the PPU (83), and upon an appropriate command received from the host system CPU (89), the PPU executes the one or more algorithms that characterize the SPH subroutine (84). The SPH-related, persistent state data is updated in the PPU every time-step (85). Also, every time-step, the SPH subroutine or a related subroutine resident on the PPU extracts output data defining, for example, an isosurface mesh from the data provided by the SPH subroutine (86). This output data is returned to the host system (87) and subsequently consumed by one or more graphical rendering subroutines (88). This set of steps is readily accomplished by periodically writing SPH-related data to a vertex buffer resident in the main memory that is capable of being accessed by the GPU and/or the CPU.

In one aspect, as noted above, the present invention provides the ability to prioritize object/features implicated in a physics subroutine. This general ability is extended within the present invention to schemes which prioritize various physics subroutines, or various software packages implementing the physics subroutines. The use of logically partitioned subroutines, and/or enabling packages, within the context of an overall software program is well known. However, the efficient utilization of both CPU and PPU resources is further enabled by performing inter-subroutine, or package call, prioritization. That is, some subroutines will be run on the PPU before others in order to provide the main application with temporally critical state data updates and/or data structures requiring significant additional processing by the CPU before graphics rendering. For example, a rigid body dynamics subroutine should typically be run before an SPH subroutine.

Inter-subroutine, or package call, prioritization may be implemented in many different forms. For example, one subroutine may be allowed to interrupt another based on its higher priority. Alternatively, the main application may sequentially prioritize the commands transferred to the PPU to initiate multiple physics subroutines. Alternatively, the main application may transmit a batch of commands to the PPU, whereupon the PPU discriminates between the commands and sequentially executes a sequence of physics subroutines in a manner most likely to enhance the asynchronous and parallel operation of the PPU and CPU. Several of these related concepts are further illustrated in the partial flowchart shown in related Figures 7A, 7B, and 7C. This exemplary partial flowchart is written from a main application perspective.
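The batch alternative, in which the PPU discriminates between received commands and orders them itself, might be sketched in Python with a priority queue. The PRIORITY table and the command names are hypothetical; the only ordering taken from the text is that rigid body dynamics typically runs before SPH, and the placement of collision detection first is an assumption based on the two-subroutine loop described in this specification.

```python
import heapq

# Hypothetical priority table: lower number runs first.
PRIORITY = {
    "collision_detection": 0,
    "rigid_body_dynamics": 1,
    "sph": 2,
}


def schedule(batch):
    """Return the commands of a batch in execution order.

    Ties are broken by arrival order, so equal-priority commands
    keep the sequence in which the main application issued them.
    """
    heap = [(PRIORITY[name], order, name) for order, name in enumerate(batch)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```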

As conceptually illustrated in Figure 7A, prior to executing a physics simulation, the main application generally "sets-up" the PPU by storing (or updating) any required data values, such as static data, inputs, parameters, and/or state data. This body of physics data, which will vary from application to application and simulation to simulation, is generically referred to in this example as "simulation data." Simulation data preferably includes static data, state data, parameters, and/or inputs, as previously described. Application initialization of the PPU is accomplished in three general steps. First, the application classifies object/features, and their related simulation data (90). Thereafter, a package is called, via one or more calls, to issue one or more commands to initialize (i.e., wake-up) the PPU (91). This step may include a PPU system verification. Once the PPU is up and running, another one or more package call(s) transfers requisite data (including simulation data, PPU computation software modules, etc.) to the PPU (92). Typically these general initialization steps need only be run once prior to execution of the main application, which implicates execution of the physics subroutines on the PPU.

Following simulation data and PPU initialization (both hardware and software), the main physics simulation loop is ready to run. It is within the context of this main simulation loop that the main application calls one or more physics subroutines to generate the desired physics data. The main simulation loop generally runs as a sequence of time-steps and is initiated by one or more calls from the main application (93). The duration of a time-step is usually defined in relation to an animation frame rate (e.g., 1/30th of a second).

As presently preferred, the main simulation loop is functionally defined by a "task list" generated during PPU initialization steps. The task list includes at least a sequence of the physics subroutines and/or their constituent packages to be run on the PPU. Upon first beginning the main simulation loop, the simulation data downloaded to the PPU is complete and correct. Thus, a first determination by the main application as to whether the simulation data needs updating is "no" (94). However, in subsequent time-steps, the simulation data will require update before the main simulation loop continues. Simulation data update is accomplished by calling a package to appropriately modify the simulation data in the host system main memory (95).

Once the simulation data has been properly updated, another package is called to begin the next time-step of the main simulation loop (96). In the present example, a simple loop of two physics subroutines is illustrated. First, a collision detection subroutine is run and thereafter a rigid body dynamics subroutine is run.

Once the next time-step in the main simulation loop is started, the PPU immediately begins to run the collision detection subroutine, whereas the host system is likely to execute other main application modules not directly related to the generation of physics data. (See, e.g., Figure 3). However, at some point in time or at some predetermined interval the main application determines whether the collision detection subroutine is complete (97). Where it is determined that the collision detection subroutine is not yet complete, the main application is able to execute other main application modules (98). For ease of reference, any of the multitude of modules capable of being run on the host system while the collision detection subroutine is being run on the PPU is termed in the flowchart a "non-PPU" module. Where the main application determines that the collision detection subroutine is complete, it calls a package allowing access to the collision detection output data (100). (See, Figure 7B). This package generally transfers output data from the PPU, stores the output data in main memory, and indicates (e.g., sets a flag) to the host system that the output data is available for consumption by the main application.

At this point, an optional sequence of steps may be incorporated into the main simulation loop. (Optional inclusion is determined during PPU initialization). The optional steps allow update of the simulation data prior to running the rigid body dynamics subroutine. The optional steps first determine whether the simulation needs update as a result of the collision detection output data (101). If yes, the current time-step pauses and the main application calls a package to modify the simulation data (102). Once the simulation data is appropriately modified, another package call continues the current simulation time-step (103).

While the PPU executes the rigid body dynamics subroutine, the main application is available to run non-PPU application modules (104). However, at this point in the main simulation loop, the collision detection output data is available for consumption by other main application modules. So long as the main application determines that the rigid body dynamics subroutine is not yet complete for all 1st class objects (105), the main application continues with non-PPU applications.

Once the rigid body dynamics subroutine has completed calculations for all 1st class object/features, the main application calls a package allowing access to output data related to the 1st class object/features (106). Following calculation and transfer of the rigid body dynamics output data related to 1st class object/features, the rigid body dynamics subroutine will continue calculations related to lower class object/features. (See, Figure 7C). During this time period, the main application is again free to execute non-PPU related main application modules (110), until the main application determines that output data related to lower class rigid body object/features is ready, i.e., that the rigid body dynamics subroutine is complete (111). However, continued execution of the main application during this time period may draw upon the rigid body output data related to 1st class objects/features.

Once the rigid body dynamics subroutine is complete, the main application calls a package allowing access to the lower class rigid body output data (112). A final time period is allocated for the execution of non-PPU main application modules requiring lower class rigid body output data as inputs (113). Following this final execution step, the main application determines whether the simulation is complete (114) and, if so, terminates the main simulation loop (115). Where it is determined that the simulation is not yet complete, the main simulation loop returns to begin a next time-step (93).
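The pattern running through the foregoing loop (start a physics subroutine, execute non-PPU modules while checking for completion, then collect the output data) can be sketched in Python, with a worker thread standing in for the PPU. This is an analogy under stated assumptions only: the function name and its arguments are hypothetical, and a thread pool is not a description of the actual host-to-PPU interface.

```python
from concurrent.futures import ThreadPoolExecutor


def run_time_step(executor, ppu_subroutine, non_ppu_module):
    """Start a subroutine on the worker, then do host work until it completes.

    executor stands in for the PPU (hypothetical analogy); ppu_subroutine is
    the physics work; non_ppu_module is any host work unrelated to physics.
    Returns the subroutine's output and how many host iterations ran.
    """
    future = executor.submit(ppu_subroutine)  # begin the time-step
    host_iterations = 0
    while not future.done():  # "is the subroutine complete?"
        non_ppu_module()  # host executes a non-PPU module
        host_iterations += 1
    return future.result(), host_iterations  # access the output data
```

In a two-subroutine loop such as the one described above, this function would be invoked once for collision detection and again for rigid body dynamics within each time-step.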

At various points during the execution of a main simulation loop, modifications to the simulation data are required. This is most common in relation to state data, but as has been described, static data, parameters, and inputs must also be updated during the execution of a main simulation loop. Figures 8 and 9 illustrate exemplary methods by which static and state data may be modified during execution of a main simulation loop. (See, as examples, steps (95) and (102) in the foregoing discussion).

The modification of static data, a method that might well also be applied to the PPU initialization routine discussed above in the context of steps (91) and (92), begins with a main application call to a package (120). This package call causes the host system CPU to update static data entries stored in main memory (121). Once static data is properly updated in main memory, the host system CPU preferably creates a command message describing the "new" static data entries (122). The CPU then transfers the command message and associated data from the main memory to PPU memory (123). A Direct Memory Access (DMA) operation is one presently preferred technique for transferring command messages and/or data between the host system and PPU. (Other alternate techniques include writing a command or command message to shared memory or to a shared register, etc.). DMA operations are conventional and controlled by a separate DMA Controller (DMAC). Upon receipt, the command message is processed in the PPU, preferably by the PCE (124). In response to the command message, static data entries stored in the PPU are updated to correspond with the new static data already stored in the main memory (125).

As shown in Figure 9, the update of state data by the main application proceeds similarly. It begins with a main application call to a package (130). This package call causes the host system CPU to update state data entries stored in main memory (131). Once state data is properly updated in main memory, the host system CPU preferably creates a command message describing the new state data entries (132). The CPU then transfers the command message and associated data from the main memory to PPU memory (133). Upon receipt, the command message is processed in the PPU, preferably by the PCE (134). In response to the command message, state data entries stored in the PPU are updated to correspond with the new state data already stored in the main memory (135).
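The update sequence common to static and state data (update main memory, create a command message, transfer it, and apply it on the PPU) can be sketched in Python as follows. The JSON message layout, the MirroredStore class, and the host_update function are hypothetical stand-ins; a direct method call replaces the DMA transfer, and no actual command format is implied.

```python
import json


class MirroredStore:
    """Stand-in for the PPU-side copy of static or state data."""

    def __init__(self):
        self.entries = {}

    def apply(self, message):
        """Process a command message and update the local entries."""
        command = json.loads(message)
        self.entries.update(command["entries"])


def host_update(main_memory, ppu, kind, changes):
    """Mirror a host-side update onto the PPU copy.

    kind is "static" or "state" (hypothetical labels).
    """
    main_memory.update(changes)  # update entries in main memory
    message = json.dumps({"kind": kind, "entries": changes})  # create message
    ppu.apply(message)  # stands in for the transfer and PPU-side processing
    return message
```

Because the message is serialized, values should be JSON-compatible; for example, vectors are best passed as lists rather than tuples so that both copies compare equal afterwards.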

The interaction between the main application running on the host system and the PPU-resident computational processes is further described in relation to the exemplary method shown in Figure 10. With simulation data and related PPU computational modules already loaded and ready to execute on the system, the main application calls a package to advance one time-step (140). In response, the CPU creates a command message (141) and thereafter issues a DMA to send the command message to the PPU (142). The PPU processes the command message using conventional techniques (143) and thereafter advances the simulation one time-step. Advancing the simulation one time-step typically involves executing a sequence of packages associated with a series of physics subroutines forming the simulation. The sequence of packages is identified in a task list communicated from the host system to the PPU during system initialization. By advancing the simulation one time-step, the PPU generates at least a set of updated state data values that must be fed back to the main application such that the main simulation loop may continue properly.

Accordingly, the PPU modifies the state data entries in the PPU memory (145), creates one or more message(s) describing the state data updates (146), and transfers the state update message to the host system, preferably by means of a DMA operation (147). Upon receiving and processing the state data update message (148), the CPU modifies the state data entries in main memory (149), and calls a package to query the updated state data (150). The main application may execute one or more subroutines on the host system, or command execution of one or more physics subroutines on the PPU in response to its query of the updated state data.

As described above, near-term embodiments of a PPU-enhanced host system are likely to be characterized by the presence of more physics data (such as state data), as generated by the PPU, than can be practically transferred via conventional communications channels, such as PCI and PCIe, during prescribed time periods related to animation frame rates. Accordingly, in a related aspect, the present invention provides a scheme whereby only the physics data related to selected object/features or events is communicated from the PPU to the host system. The set of object/features or events for which data is transferred from the PPU to the host system is termed "registered events," since it is typical for the main application to designate or register interest in a subset of the object/features or events implicated in any given frame of an ongoing animation.

Registered events relate to data derived from the execution of one or more physics subroutines. Registered events are typically defined in relation to objects/features or events apparent in a selected portion of an animated scene. This concept is further explained in relation to the diagram shown in Figure 11. In Figure 11, the main application is assumed to be a PC or console game program in which a character 160 moves through an animated world space 161. Character 160 possesses a field of vision which can be controlled by the game user. Thus, the game user can only "see" a portion of the animated world space 161 defined by the animated character's field of vision. This field of vision is typically frustum shaped to approximate a human field of vision, but other geometric shapes might be used. In Figure 11, a frustum shaped field of vision 162 moves across the animated world space as character 160 is moved and rotated within the animated world space. In one selected embodiment of the present invention, this frustum shaped field of vision is used by the main application to define registered events during a predetermined period of time, such as one animation frame. Accordingly, in the illustrated example, objects/features (a), (b), and (c) are designated as registered events, because they appear at least partially within the frustum shaped field of vision 162. In contrast, objects/features (d), (e), and (f) located "behind" character 160 in the animated world space are not "visible" to the game user and are therefore unnecessary to the proper animation of this particular scene.

Within this exemplary context, the main application registers interest in objects (a), (b), and (c) which become registered events during the frame. Physics data related to these registered events is calculated and transferred to the host system. In contrast, object/features (d), (e), and (f) may become visible to the game user in subsequent animation frames, if character 160 turns around, thereby casting the frustum shaped field of vision over these presently "unseen" objects/features. In such a situation it is advantageous for the PPU-resident subroutines to update physics data for all objects/features whether they are presently associated with a registered event or not.
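By way of illustration only, the selection of registered events from a field of vision might be sketched as follows. The sketch is not taken from the disclosure: it assumes a two-dimensional wedge (a heading, a half-angle, and a far distance) as a stand-in for the frustum shaped field of vision 162, it treats each object/feature as a single point rather than testing for partial overlap, and the names `in_field_of_vision` and `select_registered`, as well as the object coordinates, are hypothetical.

```python
import math

def in_field_of_vision(char_pos, char_heading, obj_pos, half_angle, far_dist):
    """Return True if obj_pos lies inside a 2D wedge standing in for the frustum.

    Assumption: objects are points; a real test would check partial overlap
    of the object's bounds with a 3D frustum.
    """
    dx = obj_pos[0] - char_pos[0]
    dy = obj_pos[1] - char_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True          # object coincides with the character
    if dist > far_dist:
        return False         # beyond the far limit of the wedge
    # Signed angle between the heading and the direction to the object,
    # wrapped into the range [-pi, pi).
    bearing = math.atan2(dy, dx)
    delta = (bearing - char_heading + math.pi) % (2 * math.pi) - math.pi
    return abs(delta) <= half_angle

def select_registered(char_pos, char_heading, objects,
                      half_angle=math.radians(45), far_dist=50.0):
    """Return the names of the object/features the main application registers."""
    return {name for name, pos in objects.items()
            if in_field_of_vision(char_pos, char_heading, pos, half_angle, far_dist)}

# Hypothetical layout: (a), (b), (c) in front of the character, (d), (e), (f) behind.
objects = {"a": (10.0, 2.0), "b": (20.0, -5.0), "c": (5.0, 3.0),
           "d": (-10.0, 1.0), "e": (-8.0, -3.0), "f": (-20.0, 8.0)}

registered = select_registered((0.0, 0.0), 0.0, objects)          # facing +x
after_turn = select_registered((0.0, 0.0), math.pi, objects)      # turned around
```

With the character facing the positive x direction, the sketch registers (a), (b), and (c); after the character turns around, the registered set becomes (d), (e), and (f), which mirrors the situation described for Figure 11.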

In this manner, the present invention greatly reduces the demand for communications bandwidth between the PPU and the host system. The present invention also provides efficient partitioning between the expenditure of CPU resources necessarily drawn to the physically realistic animation of "visible" objects/features, and the expenditure of PPU resources to track "non-visible" objects/features in a scene. The CPU need not accept, store, and/or maintain the volumes of physics data associated with non-registered objects/features. However, an accurate real time rendering of such non-registered objects/features is readily available in subsequent animation frames because the PPU faithfully calculates, updates, and maintains physics data for ALL objects/features within the scene. This is true for first class as well as lower class objects/features.
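The division of labor described above, in which the PPU updates every object/feature but returns data only for registered ones, can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the class `PpuSketch`, the type `BodyState`, and the one-dimensional constant-velocity update are all hypothetical simplifications chosen to keep the example short.

```python
from dataclasses import dataclass

@dataclass
class BodyState:
    """Hypothetical per-object state data: 1D position and velocity."""
    position: float
    velocity: float

class PpuSketch:
    """Stand-in for PPU-resident subroutines and PPU memory (illustrative only)."""

    def __init__(self, states):
        self.states = dict(states)   # state data for ALL object/features
        self.registered = set()      # names the main application registered

    def register(self, names):
        """Replace the set of registered object/features for coming frames."""
        self.registered = set(names)

    def step(self, dt):
        """Advance every object/feature, registered or not."""
        for state in self.states.values():
            state.position += state.velocity * dt

    def collect_registered(self):
        """Return only the data that would cross the channel to the host."""
        return {name: (self.states[name].position, self.states[name].velocity)
                for name in self.registered if name in self.states}

states = {name: BodyState(position=0.0, velocity=2.0) for name in "abcdef"}
ppu = PpuSketch(states)

ppu.register({"a", "b", "c"})
ppu.step(0.5)
frame1 = ppu.collect_registered()    # data for (a), (b), (c) only

ppu.register({"d", "e", "f"})        # character turns around
ppu.step(0.5)
frame2 = ppu.collect_registered()    # (d), (e), (f) reflect both steps
```

In this sketch, objects (d), (e), and (f) were never transferred during the first frame, yet their state in the second frame already reflects both simulation steps, because the update loop does not consult the registered set.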

Stated in other terms, the PPU is able to generate a volume of "X" data bits of physics data related to "n" object/features within a scene. However, communication channel bandwidth is limited to "Y" data bits per unit of time, where "Y" is less than "X". This communication channel bandwidth limitation is readily accommodated in a PPU-enabled host system by defining a set of registered events related to "m" object/features, where "m" is less than or equal to "n." Thereafter, the PPU returns during a unit of time only a portion "Z" of the physics data to the host system, where this portion comprises physics data related to the "m" object/features designated by registered events, so long as the "Z" data bits per unit time remains less than "Y."
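The relationship among "X", "Y", "Z", "m", and "n" can be illustrated with a short calculation. The sketch below is an assumption-laden example rather than part of the disclosure: the function name `plan_transfer`, the per-object size of 4096 bits, and the channel capacity of 16384 bits per unit of time are hypothetical values chosen only to make the arithmetic concrete.

```python
def plan_transfer(bits_per_object, registered, channel_bits_per_unit):
    """Compute n, m, X, Y, Z and whether Z fits within the channel budget.

    bits_per_object: mapping of object/feature name to physics data size in bits
    registered: names of the object/features designated by registered events
    channel_bits_per_unit: channel capacity "Y" in bits per unit of time
    """
    x_total = sum(bits_per_object.values())                       # "X"
    z_selected = sum(bits_per_object[name] for name in registered)  # "Z"
    return {
        "n": len(bits_per_object),
        "m": len(registered),
        "X": x_total,
        "Y": channel_bits_per_unit,
        "Z": z_selected,
        # Condition as stated in the description: Z remains less than Y.
        "fits": z_selected < channel_bits_per_unit,
    }

# Hypothetical figures: six object/features of 4096 bits each.
sizes = {name: 4096 for name in "abcdef"}
plan = plan_transfer(sizes, {"a", "b", "c"}, channel_bits_per_unit=16384)
```

Under these assumed figures, "X" is 24576 bits and exceeds "Y" at 16384 bits, so the total physics data could not be returned in one unit of time, whereas the registered portion "Z" of 12288 bits can.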

The present invention has been described above in relation to several presently preferred embodiments. The preferred embodiments, whether related to specific system or hardware related aspects or software aspects, are presented as examples teaching the making and use of the present invention. The scope of the present invention is not limited to just these examples. Rather, it is defined by the attached claims.

Claims

What is claimed is:

1. A system comprising: a host system comprising at least a Central Processing Unit (CPU) and a main memory, wherein the main memory stores, at least in part, a main application, and wherein execution of the main application defines a body of simulation data; and, a Physics Processing Unit (PPU) and a PPU memory, wherein the PPU memory stores, at least in part, one or more physics subroutines, and wherein execution of the one or more subroutines generates physics data in relation to the body of simulation data; wherein execution of the main application by the host system occurs, at least in part, asynchronously and in parallel with execution of the one or more physics subroutines by the PPU.

2. The system of claim 1, wherein the host system further comprises: one or more peripherals comprising a display; a Graphics Processing Unit (GPU) and GPU memory, wherein the GPU memory stores, at least in part, graphical rendering subroutines adapted to animate a scene on the display; wherein execution of the graphical rendering subroutines by the GPU occurs, at least in part, asynchronously and in parallel with execution of the one or more physics subroutines by the PPU.

3. The system of claim 2, wherein at least part of the main application is initially defined from data accessed by one of the peripherals.

4. The system of claim 2, wherein the CPU, the PPU and the GPU are independently capable of writing data to and receiving data from the main memory.

5. The system of claim 4, wherein the main memory further comprises a Direct Memory Access (DMA) controller executing DMA operations in response to commands received from the main application.

6. The system of claim 4, wherein execution of the main application transfers simulation data stored in the main memory to the PPU memory via a communications channel, and wherein execution of the one or more physics subroutines transfers physics data stored in PPU memory to the main memory via the communications channel.

7. The system of claim 6, wherein the simulation data comprises at least one selected from a group consisting of: static data, state data, inputs, and parameters.

8. The system of claim 7, wherein the static data comprises at least one selected from a group consisting of: collision model data structures, rigid body data structures, and fluid system data structures.

9. The system of claim 6, wherein the physics data comprises at least one selected from a group consisting of: persistent state data, transient state data, collision detection output data, mesh data, and vertex data.

10. The system of claim 6, wherein the communications channel is compatible with at least one of PCI and PCI Express.

11. The system of claim 6, wherein execution of the graphical rendering subroutines transfers physics data stored in the main memory to the GPU memory.

12. The system of claim 1, wherein the one or more physics subroutines comprises at least one selected from a group of subroutines consisting of: a collision detection subroutine, a rigid body dynamics subroutine, a smooth particle hydrodynamics subroutine, and a clothing simulation subroutine.

13. A method of operating a system, the system comprising: a host system comprising at least a Central Processing Unit (CPU) and a main memory, wherein the main memory stores, at least in part, a main application; and, a Physics Processing Unit (PPU) and a PPU memory, wherein the PPU memory stores, at least in part, a plurality of physics subroutines; the method comprising: assigning a priority to each one of the plurality of physics subroutines; and, upon receiving a command from the main application, executing the plurality of physics subroutines in a sequence defined in accordance with their respective priorities.

14. The method of claim 13, wherein assigning a priority to each of the plurality of physics subroutines comprises: defining a task list by execution of the main application and transferring the task list to the PPU memory.

15. The method of claim 13, further comprising: by operation of the main application, defining simulation data comprising at least one selected from a group consisting of: inputs, parameters, static data, state data, and data structures; and thereafter, transferring the simulation data to the PPU.

16. The method of claim 13, further comprising: returning physics data from the PPU to the main memory following execution of the plurality of physics subroutines.

17. A method of transferring physics data from a Physics Processing Unit (PPU) and its associated PPU memory to a host system comprising a Central Processing Unit (CPU) and a main memory via a communications channel having a maximum data transfer capacity of "Y" data bits per unit of time, wherein a total physics data comprises "X" data bits related to "n" object/features within an animation scene, the method comprising: selecting "m" object/features in the animation scene, where "m" is less than or equal to "n"; and, transferring only a portion "Z" of the total "X" data bits during a unit of time, wherein the portion "Z" is less than or equal to "Y", and wherein the portion "Z" relates to the "m" selected object/features.

18. The method of claim 17, further comprising: by operation of the PPU, executing one or more physics subroutines and generating the total physics data; and thereafter, storing the total physics data in the PPU memory.

19. The method of claim 18, wherein the total physics data comprises state data related to the "n" object/features.

20. The method of claim 19, wherein the "m" object/features are selected in relation to a geometric region defined within the animation scene.

21. The method of claim 20, wherein the geometric region is a frustum shaped region associated with an animated character's field of view.

22. A method of executing a physics subroutine on a system, the system comprising: a host system comprising a Central Processing Unit (CPU) and a main memory, wherein the main memory stores, at least in part, a main application; and, a Physics Processing Unit (PPU) and a PPU memory, wherein the PPU memory stores, at least in part, a physics subroutine; the method comprising: beginning execution of the main application by the host system; by operation of the main application, initializing simulation data comprising at least one data structure, input, and parameter; transferring the simulation data from the main memory to the PPU memory via a communications channel; executing the physics subroutine on the PPU using the simulation data while continuing parallel execution of the main application on the CPU; storing physics data generated by the execution of the physics subroutine in the PPU memory; and, returning the physics data from the PPU memory to the main memory.

23. The method of claim 22, wherein the physics subroutine comprises a collision detection subroutine, wherein the simulation data comprises a collision model data structure, and wherein the returned physics data comprises collision detection output data.

24. The method of claim 22, wherein the physics subroutine comprises a rigid body dynamics subroutine, wherein the simulation data comprises at least one rigid body data structure, and wherein the returned physics data comprises persistent state data related to the at least one rigid body data structure.

25. The method of claim 23, further comprising: designating object/features corresponding to the at least one rigid body data structure as being either first class or a lower class.

26. The method of claim 25, wherein executing the physics subroutine further comprises: first executing the rigid body dynamics subroutine in relation to first class object/features; and thereafter, executing the rigid body dynamics subroutine in relation to lower class object/features.

27. The method of claim 26, wherein the step of returning the physics data further comprises: first returning physics data related to first class object/features; and thereafter, returning physics data relating to lower class object/features.

28. The method of claim 22, wherein the physics subroutine comprises a smooth particle hydrodynamics subroutine, wherein the simulation data comprises a fluid system data structure, and wherein the returned physics data comprises mesh data.

29. A method of executing a main simulation loop operating on a body of simulation data on a system, the system comprising: a host system comprising a Central Processing Unit (CPU) and a main memory, wherein the main memory stores, at least in part, a main application; and, a Physics Processing Unit (PPU) and a PPU memory; the method comprising: by operation of the main application, defining the body of simulation data and defining a plurality of PPU computational modules corresponding to a plurality of physics subroutines; and, initializing the PPU and PPU memory.

30. The method of claim 29, wherein initializing the PPU and PPU memory comprises: storing at least a portion of the simulation data in main memory; and, transferring the simulation data and the plurality of PPU computational modules to the PPU memory.

31. The method of claim 30, wherein the PPU computational modules are defined in relation to a task list generated by the main application.

32. The method of claim 31, further comprising: executing a first physics subroutine, as defined by the plurality of PPU computational modules, on the PPU while, at least in part, asynchronously and in parallel executing a first portion of the main application on the host system, until execution of the first physics subroutine is complete.

33. The method of claim 32, further comprising: returning a first set of physics data from the PPU to the host system upon completion of the first physics subroutine; and, allowing access to the first set of physics data in the host system.

34. The method of claim 33, further comprising: executing a second physics subroutine, as defined by the plurality of PPU computational modules, on the PPU while, at least in part, asynchronously and in parallel executing a second portion of the main application on the host system, until execution of the second physics subroutine is complete; wherein proper execution of the second portion of the main application requires access to the first set of physics data.

35. The method of claim 34, wherein the first physics subroutine is a collision detection subroutine, and the second physics subroutine is one selected from a group consisting of: a rigid body dynamics subroutine, and a smooth particle hydrodynamics subroutine.

36. The method of claim 34, wherein the first physics subroutine is a rigid body dynamics subroutine as applied to first class object/features, and the second physics subroutine is the rigid body dynamics subroutine as applied to lower class object/features.

37. The method of claim 29, further comprising: modifying the body of simulation data stored in main memory; creating in the host system a command message describing a modified body of simulation data; transferring the command message to the PPU; and, modifying simulation data stored in the PPU memory in response to the command message.

38. The method of claim 37, wherein the modified body of simulation data comprises modified static data.

39. The method of claim 37, wherein the modified body of simulation data comprises modified state data.