GB2312141A - Generating areas of shadow in 2d image of 3d world - Google Patents

Generating areas of shadow in 2d image of 3d world

Info

Publication number
GB2312141A
GB2312141A GB9706004A
Authority
GB
United Kingdom
Prior art keywords
shadow
plane
edge
characteristic
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9706004A
Other versions
GB2312141B (en)
GB9706004D0 (en)
Inventor
Karl-Heinz Klotz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Discreet Logic Inc
Original Assignee
Discreet Logic Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB9607542.9A external-priority patent/GB9607542D0/en
Application filed by Discreet Logic Inc filed Critical Discreet Logic Inc
Priority to GB9706004A priority Critical patent/GB2312141B/en
Publication of GB9706004D0 publication Critical patent/GB9706004D0/en
Publication of GB2312141A publication Critical patent/GB2312141A/en
Application granted granted Critical
Publication of GB2312141B publication Critical patent/GB2312141B/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/2224Studio circuitry; Studio devices; Studio equipment related to virtual studio applications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

Image data representing a three dimensional view is processed to produce a rendered two dimensional image plane. Data including a definition of a light source 1003 is considered and shadow planes 1007 are calculated with respect to said light source. A first characteristic and a second characteristic are identified for an edge. These characteristics may be references to the relative positions with respect to viewing directions and an indication as to whether the edge is shared with another polygon, e.g. 1008. In response to these identifications a data plane is incremented or decremented in an amount responsive to the identified combination on a pixel-by-pixel basis. Each shadow plane 1007 is designated as positive if the viewing vector goes into the shadow volume and negative for an exiting vector. A vector through point 1006 on polygon 1008 intercepts the positive plane 1007 only, designating a shadow pixel. When the shadow volume 1004 does not intercept the polygon, e.g. at 1005 or at the far side of the volume, either no plane is intercepted or the positive and negative planes cancel out, and a non-shadow pixel is designated. See Figures 13 and 14 for the full algorithm.

Description

Title: PROCESSING IMAGE DATA
The present invention relates to generating shadows for virtual objects.
INTRODUCTION
Techniques for generating realistic three-dimensional synthetic images are becoming established in increasingly diverse applications due to the steady decrease in cost of high performance processing components, and the continuing advance in the art of graphic manipulation procedures. As the realism of synthetic images improves, a clear goal has been identified, which is to produce synthetic images which are indistinguishable from real images. While this goal may be attainable when a single image is to be generated, the rapid generation of picture frames which represent complicated moving and interacting objects in real-time requires considerable computational resources.
Although several techniques for generating shadows of virtual objects are known, performing these operations in real time requires considerable processing power. A classic example of the need to generate shadows in real time is the virtual studio, where real images from a camera are combined with a model of a virtual world in order to give the illusion of a larger and visually appealing studio set. Any delay in calculating the image of the virtual world is highly undesirable, and there is a strong temptation to take mathematical short-cuts, for example not bothering to do any shadowing, in order to keep costs of graphics processing to a minimum. This results in a less than convincing virtual studio appearance.
The advantage of the virtual studio is that only a small real studio space is required, upon which the image of a much larger virtual studio area may be imposed, including the various three-dimensional stage props and logos specific to a program. Once a recording for a particular program has been completed, the entire virtual set may be replaced instantly, so the studio is ready for use in a completely different television program. In a traditional studio, different hardware, in the form of stage props and so on, may be needed for each different program. In the course of a week, many dozens of different television programs with different stage props may be required, which would either have to be stored carefully, or alternatively constructed from scratch.
It is an aim of the present invention to provide an improved method for calculating shadows of virtual objects.
SUMMARY OF THE INVENTION
According to an aspect of the present invention, there is provided a method of processing image data to identify a shadow surface in a projection of three dimensional objects represented by said data, including a light source, comprising the steps of: identifying a shadow plane cast by an edge of an object; identifying a first characteristic of said edge; identifying a second characteristic of said edge; and incrementing or decrementing a data plane in an amount responsive to an identified combination of said first and said second characteristic on a pixel-by-pixel basis for a rendered image plane.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described by way of example only, with reference to the accompanying drawings, in which: Figure 1 shows a real set in a virtual studio, including a television monitor; Figure 2 shows the combined image shown on the monitor shown in Figure 1; Figure 3 details control equipment used to generate the combined image shown in Figure 2, including a graphics processor; Figure 4 details connections between the graphics processor shown in Figure 3 and other equipment used in a virtual studio; Figure 5 details the graphics processor shown in Figure 3 and Figure 4, including a rendering processor and shared memory; Figure 6 details processes for combining live camera signals with virtual set images which are performed by the rendering processor shown in Figure 5; Figure 7 details data structures stored in the shared memory shown in Figure 5, including a scene tree, executable scripts and object animation functions; Figure 8 details processes and relationships for modifying the scene tree shown in Figure 7, including a process of constructing a display list; Figure 9 details the process of constructing a display list shown in Figure 8; Figures 10A, 10B and 10C detail stages of determining a shadow area; Figure 11 details processes for determining shadow areas, including calculating shadow areas for each object and updating stencil plane pixels; Figure 12 details a known method for the process of calculating shadow areas for each object and updating stencil plane pixels; Figure 13 details modifications to the method shown in Figure 12, including defining a variable; and Figure 14 details incrementing a stencil plane using the variable defined in Figure 13.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The invention will now be described by way of example only with reference to the previously identified drawings.
A virtual studio is shown in Figure 1. The studio set includes a presenter 101 against a blue background 102. A television camera 103, fitted with a zoom lens 104, is rotatably mounted on a fixed tripod 108.
The camera 103 generates a video signal which is supplied to processing equipment along a video cable 105. Sensors mounted on the camera 103 and between the camera 103 and the tripod 108, generate signals which define the pan, rotation and tilt of the camera 103, and the zoom and focus of the zoom lens 104. These signals are combined in interface and processing circuitry mounted with the camera, and are supplied over an RS432 serial data cable 106, to processing equipment.
The presenter 101 is able to view the resulting combined real and virtual images on a video monitor 107, mounted at the side of the studio set. In some circumstances, it will be necessary for the presenter to be aware of the location of virtual objects not physically located within the real set, in order to maintain a convincing illusion of their presence.
Thus, the presenter may point to a virtual object which does not physically exist, by co-ordinating their movements with the resulting image shown on the video monitor 107.
The image displayed on the video monitor 107, shown in Figure 1, is detailed in Figure 2. The presenter 101 is the only part of the displayed image included in the combined image. All the other areas 102 of the real studio within the field of view of the camera 103 are coloured blue, and are thus replaced by a synthesized virtual set. The components of the virtual set include a pedestal 202, upon which is a statue 203. In the background there is a two dimensional backdrop 204 consisting of moving images from a film.
Thus the virtual set includes both three-dimensional and two dimensional objects, which are viewed by a virtual camera. The virtual location of the virtual camera is arranged to follow the real location of the real camera, so that a change in view of the presenter 101 will result in an appropriate shift in view of the objects in the virtual set. For example, the real camera 103 may pan to the left and zoom in slightly, so that the centre of the field of view shifts from the presenter 101 to the statue 203. Because all the virtual objects are accurately modelled in three dimensions, the parallax between the statue 203 and the background shifts accordingly. Furthermore, the two dimensional film clip shown on the virtual backdrop 204 is projected differently, so as to maintain coherence between real and virtual images.
Control over the virtual studio environment, including selection of virtual objects which are to be included in the overall image produced, is performed using the equipment shown in Figure 3. A high quality graphics terminal 301, such as that manufactured by Silicon Graphics Incorporated, displays the combined real and virtual images produced by the virtual studio. A graphics processor 302 provides the processing capability for generating the virtual set. The graphics processor 302 also receives video signals from the real camera 103 and combines these with the synthesised image of the virtual set. The graphics processor 302 is an SGI Onyx Reality Engine Two, manufactured by Silicon Graphics Incorporated.
An editing terminal 303 is used to control the set up of the virtual studio using a text editor. The editing terminal 303 is connected to an SGI Indigo workstation 304, which provides storage and editing facilities. The workstation 304 communicates with the graphics processor 302 via an ethernet connection. Thus, an operator may control the graphics environment which is synthesized by the graphics workstation 302 and displayed on the high quality graphics monitor 301, using the terminal 303 which is connected to the workstation 304.
Typical operations carried out by operators using the equipment shown in Figure 3 relate to the particular requirements of operating a virtual studio. Firstly, it is essential that the locations of the real and virtual cameras should be matched. Thus, having positioned the camera 103 on its tripod 108, and perhaps selecting a suitable type of lens 104 for the program which is to be broadcast or recorded, it is necessary to determine the exact physical location of the camera. This is done in two stages. Firstly the optical centre of the lens is located.
When mounting a lens on a camera, although the lens is mounted firmly, its precise location cannot be predicted with absolute accuracy.
Thus, when zooming in and out, the part of the video image which remains stationary is typically slightly out of alignment with the centre of the image as it is measured electronically.
For example, in a video camera which uses charge coupled devices (CCD) as its image sensors, the image comprises a matrix of pixels, with each pixel comprising three sub-pixels defining the red, green and blue components, as produced by three separate CCD sensors. The image has a precise number of pixels in the horizontal and vertical dimensions. Typically this number may be in the region of six hundred vertical pixels by eight hundred horizontal pixels. The electronic centre of the image is located at the pixel co-ordinates (400,300).
Having mounted a lens, the camera operator zooms in and out in order to determine which part of the image remains stationary. It is this location which is then considered to be the optical centre of the camera and lens combination. Having calibrated the optical centre, the camera operator need not measure the physical location of the camera. Indeed, this would not be a useful measurement, since the measurements that are required must be made with respect to the precise location of an image focused onto the CCD plane, which may be located at an unknown, or at least not sufficiently precisely known, location within the casing of the camera 103.
Thus, in order to accurately calibrate the physical location of the camera, or more correctly, to match the location of the focused image in the real camera with that produced by the virtual camera, sightings of several known points in the real studio set are made. Thus, in order to define the location of the camera in three dimensions, sightings of three points in the studio are made by matching the optical centre, now marked by a cross on a monitor, with markers in the studio. The locations of these points in three dimensions are precisely known, and are fixed. Better accuracy may be achieved by sighting four or more known points, with inconsistency between the combined results being averaged to provide a reading of improved accuracy. For example, if five points are sighted, these five are subdivided into all possible combinations of three. The position of the camera is calculated for each combination, and then the average of the results is used to define the camera position.
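The averaging step just described can be sketched in C as follows. This is a minimal illustration only: the routine passed in as solve is a hypothetical stand-in for the three-point position calculation, which is not detailed here, and the code simply enumerates every combination of three sighted points and averages the resulting camera positions.

    #include <stddef.h>

    typedef struct { double x, y, z; } Vec3;

    /* Hypothetical solver: returns the camera position implied by one
     * combination of three sighted reference points. */
    typedef Vec3 (*TripleSolver)(const Vec3 *a, const Vec3 *b, const Vec3 *c);

    /* Average the positions obtained from every combination of three points. */
    Vec3 average_camera_position(const Vec3 *points, size_t n, TripleSolver solve)
    {
        Vec3 sum = {0.0, 0.0, 0.0};
        size_t count = 0;

        for (size_t i = 0; i + 2 < n; i++)
            for (size_t j = i + 1; j + 1 < n; j++)
                for (size_t k = j + 1; k < n; k++) {
                    Vec3 p = solve(&points[i], &points[j], &points[k]);
                    sum.x += p.x;
                    sum.y += p.y;
                    sum.z += p.z;
                    count++;
                }

        if (count > 0) {
            sum.x /= count;
            sum.y /= count;
            sum.z /= count;
        }
        return sum;
    }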
Thus a sequence of calibrations is performed by the camera operator making various sightings, and a terminal operator, using the terminal 303, supplies appropriate control instructions to the system such that data received from the camera's rotation, pan, tilt, focus and zoom sensors, is combined in the appropriate way during these calibration procedures.
The camera 103 shown in Figure 1 supplies two types of electrical signals. The first type of signal is video, an electrical representation of the image focused onto the CCD sensors in the camera. The second type of electrical signal defines the position of the camera and its lens settings. A typical zoom lens 104 mounted on a television camera includes rings for zoom, focus and aperture. Of these, the zoom and focus are required in order to define realistic real-time behaviour of the virtual camera. Thus, rotary sensors are mounted on the camera lens. These rotary sensors contain twin optical emitters and detectors, separated by a serrated disc. The disc is mechanically coupled to the movement of a lens ring, such that the passage of light between one emitter-sensor pair occurs in precedence to the passage of light between the other emitter-sensor pair. Thus, the direction of rotation of the serrated disc may be detected by the precedence of an electrical signal from either of the optical sensors. Furthermore, rotation of the serrated disc results in repeated blocking and unblocking of the light reaching each sensor, and this may be used to determine a change in position. This technique is known as optical quadrature detection, and generates electrical pulses which are particularly suitable for interfacing to digital electronic circuitry.
Thus, each of the zoom and focus rings has a rotary sensor, which supplies electrical signals which may be interpreted as providing a relative indication of the respective ring position. By calibrating the absolute position of the lens rings with reference to a known visual target, the relative incrementing and decrementing electrical signals from the rotary sensors can be used to derive an absolute position of the zoom and focus rings, in conjunction with appropriate calibration instructions issued from the terminal 303 shown in Figure 3.
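The optical quadrature detection and relative-to-absolute conversion described above can be illustrated with the following C sketch, which is an assumption made for illustration rather than code from the patent: a signed step counter is updated from the two optical channels using a standard quadrature transition table, and the absolute ring position is recovered by adding a calibration offset obtained from the known visual target.

    #include <stdint.h>

    typedef struct {
        uint8_t prev;   /* previous 2-bit state: (A << 1) | B */
        int32_t count;  /* relative ring position in encoder steps */
    } QuadDecoder;

    /* Transition table indexed by (previous state << 2) | new state.
     * +1 = one step in one direction, -1 = one step in the other,
     * 0 = no change or an invalid (skipped) transition. */
    static const int8_t QUAD_STEP[16] = {
         0, +1, -1,  0,
        -1,  0,  0, +1,
        +1,  0,  0, -1,
         0, -1, +1,  0
    };

    void quad_update(QuadDecoder *d, int channel_a, int channel_b)
    {
        uint8_t state = (uint8_t)(((channel_a ? 1 : 0) << 1) | (channel_b ? 1 : 0));
        d->count += QUAD_STEP[(d->prev << 2) | state];
        d->prev = state;
    }

    /* Absolute ring position = calibrated offset + steps * angle per step. */
    double ring_position(const QuadDecoder *d, double offset, double step_angle)
    {
        return offset + (double)d->count * step_angle;
    }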
Additional rotary sensors are provided on the camera and its associated camera head mount, which is a multi-dimensional fixture providing freedom of movement of the entire camera in pan (rotation about a vertical axis) and tilt (rotation about a horizontal axis). The absolute values of these sensors are determined during the sighting calibration procedure described above.
Connections between the camera 103 and other studio equipment are summarised in Figure 4. The camera 103 supplies signals from its lens and position sensors to an interface unit 402 which is mounted on the camera tripod. The interface unit 402 receives separate signals from each of the sensors, and translates received pulses into counter values. The interface also receives a time signal from a synchronising source, such as a video mixer 404. This time signal enables each frame of video supplied from the camera to be correlated with accompanying position and lens data. Thus, as each new frame of video is transmitted to the graphics processor over the video connection 105, a packet of data is supplied from the interface 402 over an RS432 serial data connection to the graphics processor.
This packet of data includes a time code defining the frame to which the data relates, and the relative positions of all the optical sensors used by the camera lens and mounting system. The graphics processor is thereby able to combine images and positional and lens data received over different connections and types of interface circuit. This highly organised approach also facilitates upwards compatibility with future generations of virtual studio equipment.
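One possible layout for such a packet is sketched below in C. The field names and widths are assumptions for illustration; the text specifies only that the packet carries a time code and the relative positions of the optical sensors on the lens and mounting system.

    #include <stdint.h>

    /* Illustrative per-frame packet sent over the RS432 serial connection. */
    typedef struct {
        uint32_t timecode;    /* identifies the video frame this data relates to   */
        int32_t  pan_count;   /* mount sensor: rotation about the vertical axis    */
        int32_t  tilt_count;  /* mount sensor: rotation about the horizontal axis  */
        int32_t  zoom_count;  /* lens sensor: zoom ring encoder count              */
        int32_t  focus_count; /* lens sensor: focus ring encoder count             */
    } CameraDataPacket;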
The graphics processor 302 receives video signals and data from the camera 103. Video signals are also generated internally from a database which defines the virtual environment with which the real video image is to be combined. The perspective viewpoint of the virtual camera model is calculated to define a projection of the internal three-dimensional virtual environment in such a way as to match the view of the real camera in the real studio. Thus movements of the real camera, and manipulations of the lens optics, are measured and transmitted to the graphics processor, where appropriate mathematical operations are performed in order to generate a synthetic projection of the virtual world which appears, as nearly as is possible, the same as if the virtual world were being viewed by the real camera looking at real objects.
Combined virtual and real views are supplied to the high resolution video monitor 301, and, in a typical studio environment, to a video mixer 404. The video mixer may receive signals from a variety of video sources 405, such as other cameras, computer generated images, video playback and recording equipment, live satellite receivers and so on. The video mixer supplies video signals to a variety of equipment 406, such as studio monitors, recording equipment, and so on. Thus, in a typical studio production environment, images from the virtual studio may be selected on cue after a title sequence for a television program has been generated by a video playback device 405.
Simultaneous monitoring of several video channels is possible using monitors 406, so that the producer may view images prior to selecting a particular source for live transmission, or recording.
In the case of a live program transmission, the video mixer may also supply signals to a transmitter uplink. Typically this comprises a digitally modulated microwave radio link to a network of broadcast transmitters, or to a geo-stationary television broadcast satellite.
The graphics processor 302 shown in Figure 4 is detailed in Figure 5. Four main processors, CPU1 501, CPU2 502, CPU3 503 and CPU4 504, perform the various calculations and data manipulation procedures necessary to create and mix the virtual set with images from the real camera. Each processor has high speed local memory 505, 506, 507 and 508. CPU4 504 is connected directly to a rendering processor 509, which is specifically designed to perform three-dimensional image rendering at high speed. The rendering processor receives video signals from the real camera 103 over the video connection 105. An analogue to digital converter circuit translates each frame of video from the camera into a digital form suitable for digital processing as video data.
The video signal is split into three components, one each for red, green and blue. Each component is supplied to its own analogue to digital converter. Each red, green and blue converter supplies digital values to a frame store connected to the rendering processor 509. The frame store is part of the high speed local memory 510 which is connected to the rendering processor 509. The rendering processor is also able to generate video signals. Each red, blue and green digital value for each pixel within a frame is supplied to a digital to analogue converter, and a video signal is derived from the outputs of this conversion process in accordance with accepted studio standards.
Thus the rendering processor 509 is able to receive video signals, process them, and generate modified video signals. The rendering processor is highly sophisticated in its capabilities, and may build complex images in response to instructions and data supplied to it from processor CPU4 504.
All four main processors 501, 502, 503 and 504 are connected via a common parallel interface. Thus it is possible to split an application into logical processing tasks. Initial conditions and end conditions for each task may be made available to all processors, but computations performed within each task are done independently. This makes it possible for each task to be performed at high speed, as there is no need to communicate with other tasks on other processors until an allocated task is complete. Furthermore, local high speed memory 505, 506, 507 or 508 may be used to store data and instructions for each task, reducing the need to communicate over a global communications bus 511.
Communication over the bus 511 is considerably slower than that which may occur between a processor and its local memory. Firstly, when communicating over the bus 511, it is necessary to ensure that only one processor attempts to control the bus 511 at any one time.
This requires time consuming bus arbitration protocols. Furthermore, if there are four processors, the maximum data bandwidth of the bus is theoretically divided by four. In practice the reduction in bandwidth is greater than this, due to the arbitration protocols.
A further speed restriction is inherent in bus designs which connect several processors. The speed at which signals may be communicated over electrical connections is to some extent dictated by the distance over which the signals must travel. If processors are distributed over several circuit boards, the speed of the bus 511 is restricted, especially compared with the speed of data transfers between digital components communicating on a single or closely adjacent circuit board.
Thus, wherever possible, processes are split into specific tasks, which may take advantage of the particular processor architecture which is in use. For certain types of task, data may be shared between processors. Shared memory 512 is provided for this. Communications with external devices over ethernet and RS432, and with high resolution monitors, computer keyboards and so on, are provided by the input/output interface 513.
Choosing the most efficient task mapping for processors is not straightforward, as several different trade-offs exist between speed, memory, cost and program complexity. In the preferred embodiment, CPU1 501 is allocated operating system tasks, CPU2 502 is allocated camera interface tasks, CPU3 503 is allocated matrix calculation and culling tasks and CPU4 504 is allocated the tasks of editing, script execution, object importation, object animation and rendering. The purpose of each of these tasks will be explained later.
As described above, the real camera 103 views a set which has a blue background. Areas of the image which are blue are replaced with images of the virtual set generated in the graphics processor 302 detailed in Figure 5. The process of selecting an image source for each point in the video image is known as chroma-keying. This process is summarised in Figure 6.
Camera position data is supplied to a calibration process 601.
Fixed calibration data for the real camera, stored on a hard disk 602, is combined with measurement information supplied from the sensors on the real camera to determine real time parameters of the virtual camera, such as view angle, plane of focus and so on, which are necessary to calculate the correct projection of objects in the virtual world for each video frame.
The updated model of the virtual camera is supplied to a virtual set construction process 603, which also receives external control signals 604 and video texture definitions 605. The virtual set is constructed, and its projection calculated according to the parameters for the model of the virtual camera, resulting in a single frame of video data. This virtual frame is supplied to a chroma key process 606.
The digitised video signal from the real camera 103 is supplied to a chroma suppress circuit, which sets all pixels of the real image containing a particular colour to a value of zero. A zero value pixel supplied to the chroma key process 606 signifies that, for that particular pixel, the corresponding pixel of the virtual video frame is valid, and this is supplied to a video mixer process 608. Otherwise the chroma key process supplies pixels of value zero. The opposite process is performed by the chroma suppress process 607, which supplies all non-blue pixel data to the video mixing process 608. Thus, the video mixing process 608 superimposes the non-blue parts of the real video data onto the image generated of the virtual set. The video mixing process 608 also includes anti-aliasing processes in order to ensure that visible distortion does not occur on the boundaries between real and virtual images.
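A minimal per-pixel sketch of the suppress, key and mix stages described above is given below in C; it assumes that the chroma suppress stage has already set keyed (blue) real pixels to zero, and it ignores the anti-aliasing of boundaries mentioned in the text.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint8_t r, g, b; } Pixel;

    /* The chroma suppress stage has already set keyed pixels to zero. */
    static int is_keyed(const Pixel *p)
    {
        return p->r == 0 && p->g == 0 && p->b == 0;
    }

    /* Where the real pixel was keyed, take the virtual set pixel;
     * otherwise keep the real camera pixel. */
    void mix_frames(const Pixel *real, const Pixel *virtual_set,
                    Pixel *out, size_t npixels)
    {
        for (size_t i = 0; i < npixels; i++)
            out[i] = is_keyed(&real[i]) ? virtual_set[i] : real[i];
    }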
The process of virtual set construction shown in Figure 6 is defined by data stored in the shared memory 512 shown in Figure 5.
The virtual set is defined by a data structure known as a scene tree. A representation of the scene tree and other key data structures stored in shared memory is shown in Figure 7. The scene tree 701 comprises a number of objects, which are defined recursively. Thus object 702 represents the stage backdrop 204 shown in Figure 2, and an object defined within the backdrop is a link object 703 to a film clip which is supplied from some external real time video source.
Other simple objects are defined non-recursively, such as the pedestal 202, shown in Figure 2, represented by the non-recursive object 704. Complex objects, such as the statue 203 which is also shown in Figure 2, are defined by many layers of recursive objects within an overall object 705 defining the statue. As the scene tree is analyzed, the further down the level of recursion one goes, the simpler the object. Thus, at the lowest level of recursion, objects are defined as primitives, in other words a shape, such as a polygon, whose basic structure is understood by the rendering processor 509, and need not be further defined.
Repeated references to a single instance of a primitive object such as a polygon enables complex three-dimensional structures to be constructed from simpler ones, to whatever level of detail is required.
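A recursively defined object of this kind could be represented as in the following C sketch; the exact data structure is not given in the text, so the field names and the form of the primitive reference are assumptions made for illustration.

    #include <stddef.h>

    typedef struct SceneObject SceneObject;

    struct SceneObject {
        const char    *name;          /* e.g. "backdrop", "pedestal", "statue" */
        double         transform[16]; /* local 4 x 4 transformation            */
        SceneObject  **children;      /* recursive sub-objects, or NULL        */
        size_t         num_children;
        const void    *primitive;     /* non-NULL only at the lowest level of
                                         recursion, e.g. a polygon definition  */
    };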
Also included in the shared memory are executable scripts 711, which are executed at the beginning of each frame and perform manipulations on data structures defined within the scene tree 701. Object animation functions 712 enable objects within the scene tree to be manipulated in the form of an animation, for example the rotation of a propeller on a virtual aeroplane object as it flies across a virtual set.
Manipulation of the scene tree 701 is summarised in Figure 8.
The scene tree is a file which may be viewed and manipulated, though not in real time, by a text editor 801. The text editor 801 is also able to perform manipulations of the executable scripts 711. These are written in the C programming language, and are compiled so that they may be automatically executed at the beginning of each virtual set frame construction process.
A control interface supplies control data to the scene tree 701 and to the animation functions 712. The purpose of this is to enable real time control, or possibly synchronisation over various aspects of the virtual set. For example, it may be desired that a virtual aeroplane should fly through the virtual set, not at a predetermined time, but rather in response to a cue from the program producer, which could be provided over a serial RS432 link from the video mixer 404.
The camera interface 803 controls the way in which the scene tree 701 is manipulated, in that data from the calibrated real camera is used to define the perspective projection of the real world onto a two dimensional plane.
Three-dimensional modelling is a time consuming task. For example, the statue 203 shown in Figure 2 is a highly complex shape, and may even have been determined by three dimensional white laser scanning of a real object. Thus three dimensional models may be incorporated into the scene tree, via a three dimensional model import process 804. This provides access to a rich library of three dimensional shapes from a wide variety of sources.
Thus, before the scene tree 701 is interpreted as a description of a particular instance in time of the virtual set, various data and/or electrical signals may be used to determine conditional aspects of its structure. Once these external influences have been taken into account, the scene tree is optimised in an optimisation process 805. The optimisation process 805 attempts to ensure that the structure of the scene tree that is supplied to the rendering process is as efficient as possible. After optimisation, the scene tree is converted into a display list in process 806.
The display list generating process 806 breaks down the scene tree into vertices of object primitives which may then be supplied to the rendering processor 509. The rendering processor can then connect vertices with lines, fill polygons or other primitives with surfaces and textures, and perform other tasks related to three-dimensional graphics rendering of object primitives.
The process 806 of generating a display list is detailed in Figure 9. In process 901, the next object is selected.
In process 902, object transformations are concatenated. Each object, whether it is a primitive or not, may be manipulated in a number of ways in order to perform animation or related functions. These manipulations are combinations of translation (movement), stretching and rotation. These basic transformations are known as affine transformations. Each such manipulation is performed arithmetically by evaluating a transformation matrix multiplied by the points which define the vertices of an object. Given a set of points in three-dimensional virtual space, generally referred to as vertices in world space, each vertex may be multiplied sequentially by any number of transformation matrices, thus enabling complex manipulations to be performed, without having to calculate a unique equation for any one of an infinite variety of possible geometric transformations.
Furthermore, by sequentially multiplying by several transformation matrices, in the form of a list of transformations, it becomes possible to remove transformation matrices from the list, and so undo effects which turn out to be undesirable. This is the general approach adopted in most two dimensional and three dimensional graphics systems. The process of multiplying by a list of matrices is known as matrix concatenation. Matrices may be used for special operations, other than modifying position or shape in world space, including projecting a view of a three dimensional model into a two dimensional plane, such as that of a video frame.
A non-intuitive aspect of transformation matrices is that matrices for use in two dimensions are defined as three-by-three matrices, whereas three dimensional transformations are accomplished using four-by-four transformation matrices. The co-ordinate system used in a four-by-four matrix system is not x, y, z, but x/w, y/w, z/w and w. w is not a physically measurable quantity, but provides a mathematical trick that makes the general technique of matrix concatenation possible. Details of matrices and their applications may be found in Computer Graphics, by Edward Angel, ISBN 0-201-13548-5.
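Matrix concatenation of the kind described above can be sketched in C as follows. The row-major storage and row-vector convention are assumptions made for the example; with that convention, multiplying two matrices together gives a single matrix whose application is equivalent to applying the first and then the second.

    typedef struct { double m[4][4]; } Mat4;
    typedef struct { double v[4]; } Vec4;   /* (x, y, z, w), with w = 1 initially */

    /* r = a * b: applying r is equivalent to applying a, then b,
     * under the row-vector convention used here. */
    Mat4 mat4_concatenate(const Mat4 *a, const Mat4 *b)
    {
        Mat4 r;
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++) {
                r.m[i][j] = 0.0;
                for (int k = 0; k < 4; k++)
                    r.m[i][j] += a->m[i][k] * b->m[k][j];
            }
        return r;
    }

    /* Transform a vertex by a concatenated matrix: p' = p * m. */
    Vec4 vec4_transform(const Vec4 *p, const Mat4 *m)
    {
        Vec4 r;
        for (int j = 0; j < 4; j++) {
            r.v[j] = 0.0;
            for (int i = 0; i < 4; i++)
                r.v[j] += p->v[i] * m->m[i][j];
        }
        return r;
    }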
As objects may be defined recursively, in process 902 the object is analyzed into its lowest constituent objects. Then, working back up the recursive data structure, transformations at each level are concatenated onto the list of vertices which are defined as making up the object at the current level of recursion. In this way, for example, the propeller of a virtual model aeroplane may rotate. This propeller is itself part of a larger object, the aeroplane, which flies from one side of the studio to the other. Thus a transformation of rotation is concatenated for the propeller object, and then transformations defining the path of flight are concatenated for the plane object. Considering a single vertex on the propeller, this will have the rotation and the various path of flight transformations concatenated to it, while other parts of the aeroplane will have only the path of flight transformations. This, therefore, is the highly structured approach to three-dimensional modelling which is adopted when defining objects for use in a virtual studio.
In process 903, a viewing matrix is concatenated, in addition to whatever other transformations have already been concatenated. The viewing matrix is a special matrix, defined by the location of the real camera, and is required in order to simplify projection of the three-dimensional world space into a two dimensional plane, which will be performed in process 904.
The world space in which objects are defined by the scene tree may be considered as a fixed volume, with any point in it defined by an x, y, z co-ordinate. In fact four co-ordinates are used, as explained above: x/w, y/w, z/w and w. The initial non-transformed state of any vertex has the value w equal to unity, so x/w, y/w and z/w are in fact equal to x, y and z before transformations have been applied. At some stage in the rendering process, it will be necessary to project an image onto a two-dimensional plane, which may be considered as the plane of the image focused in the virtual camera, and the image of the virtual world which would be displayed on a monitor.
This two-dimensional projection has a variable angle with respect to the x, y and z axes of the virtual world space. An equation may be used to define this plane, in terms of the x, y, z co-ordinates of world space. Then it might be possible to project the three dimensional model onto this plane using basic geometrical equations. In three dimensions, this approach requires considerable calculation, and a simpler solution is to rotate and move all objects in world space so that the projection plane is defined by the x and y axes, and is perpendicular to the z axis. Then projection is simplified considerably. Thus, concatenation of the viewing matrix, performed in process 903, rotates and moves any object in world space so that the system of co-ordinates is normalised to the location of the projection plane. Another way of viewing this is that the virtual camera remains still while the virtual world moves around it, corresponding to a fixed real world that is viewed by a moving real camera. The relative movements are identical.
In process 904, perspective projection of the currently selected object onto the projection plane is performed by concatenating a projection matrix. Note however, that the z co-ordinate is not discarded or set to zero, as this is required in order to perform hidden surface removal.
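The perspective step can be sketched as a simple divide by w, keeping the depth term, as below in C. The Vec4 type is repeated from the earlier sketch so that the fragment is self-contained; the guard against a zero w is an assumption made for illustration.

    #include <math.h>

    typedef struct { double v[4]; } Vec4;          /* (x, y, z, w) */
    typedef struct { double x, y, depth; } ScreenPoint;

    /* Perspective divide: two dimensional screen coordinates, with the
     * z term retained for hidden surface removal rather than discarded. */
    ScreenPoint project_vertex(const Vec4 *p)
    {
        ScreenPoint s;
        double w = (fabs(p->v[3]) > 1e-12) ? p->v[3] : 1e-12;
        s.x = p->v[0] / w;
        s.y = p->v[1] / w;
        s.depth = p->v[2] / w;
        return s;
    }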
In process 905 object culling is performed. Objects which lie outside the x and y coordinate range of the projection plane are discarded, as are objects which are too close to or too far from the virtual camera; for example, objects which are behind the virtual camera might otherwise be displayed as being inverted, when they should not be displayed at all.
In process 907 the resulting vertices are added to the display list, along with a reference to the object primitives which they define, and other details, such as the type of surface, texture, specular reflectivity and so on. This information will later be used by the graphics rendering processor 509 which has highly optimised circuits for translating this information into frame pixel data in real time.
In process 908, a question is asked as to whether any other objects remain to be added to the display list. If no other objects remain, the display list is supplied to the graphics pipeline of the rendering processor 509. Construction of the display list takes a variable amount of time, depending on the number and complexity of the objects and transformations which it defines. Thus the display list may be produced well in advance of the next frame, or possibly take longer than one frame to calculate. The graphics pipeline is a concept which synchronises display lists with video frame outputs. Thus, when a display list is early, it is stored in the pipeline until it is needed. If the display list cannot be generated in time for the next frame, the previous display list is used, thereby minimising the visible effects. Clearly, though, this is a situation which is avoided if at all possible, as it reduces the realism of the resulting image.
Due to the amount of parallel processing which occurs in the system, a delay of a few frames is incurred. Thus the image of the combined virtual world and the real world is noticeably delayed in time by a fraction of a second with respect to the real time. This delay is related to the processing capacity of the computer hardware used to render the virtual images, and may be expected to decrease as more processing power becomes available.
The sequence of steps shown in Figure 9 results in an image being drawn by the rendering processor 509. In order for a realistic image to be produced, it is necessary for the rendering processor to calculate any shadows which are cast by objects in the path of a light source. In order for shadowing to be effectively calculated, four types of lighting effect are defined for an object surface which affect the intensity of the red green and blue components which are emitted by that surface in the final rendered image frame.
These four types of lighting effect are: emission, ambient, diffuse and specular. Emission is the amount of light, in its red, green and blue components, which is emitted by the object even in the absence of ambient lighting. For example, an image of a bulb filament will have an emission characteristic, such as bright yellow in the case of a tungsten bulb filament. Ambient light is the colour of the object in response to ambient lighting, for example the colour of an object which is in shadow. Typically an image will have some degree of ambient lighting specified which does not have any particular localised direction, and therefore does not result in any shadows being cast.
Diffuse light is the colour emitted by an object in response to direct lighting, for example, the colour emitted by an object which is lit by a localised light source, which may or may not be visible in the final rendered frame, but which nevertheless exists at some local point in the three dimensional virtual world. Specular lighting is the reflection of a light source, where the actual image of the light source is identifiable, for example when the surface of an object is highly polished or a mirror. By adjusting the ratio between specular and diffuse light colour for a surface it is possible to define a range of surface reflectivities, such as might be expected in the real world.
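How the four surface properties might be combined for a single pixel is sketched below in C: emission and ambient contributions are always accumulated, while diffuse and specular contributions are added only when the pixel is not in shadow. The particular weighting terms are assumptions made for illustration rather than a formula taken from the text.

    typedef struct { double r, g, b; } Colour;

    static Colour add_scaled(Colour a, Colour b, double s)
    {
        Colour c = { a.r + b.r * s, a.g + b.g * s, a.b + b.b * s };
        return c;
    }

    /* Combine the four lighting components for one pixel of one surface. */
    Colour shade_pixel(Colour emission, Colour ambient,
                       Colour diffuse, Colour specular,
                       double diffuse_term,   /* geometry of surface and light  */
                       double specular_term,  /* highlight/reflection weighting */
                       int in_shadow)
    {
        Colour c = add_scaled(emission, ambient, 1.0);
        if (!in_shadow) {
            if (diffuse_term > 0.0)
                c = add_scaled(c, diffuse, diffuse_term);
            c = add_scaled(c, specular, specular_term);
        }
        return c;
    }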
Given these four primary surface properties, it becomes possible to determine how a surface might appear in or out of shadow. A problem remains, however, in that it is not obvious how to determine which surfaces are in shadow and which are not.
A known approach to the problem is shown in Figures 10A, 10B and 10C. In Figure 10A the image area is represented by a square 1001.
A polygon 1002 in three dimensions is projected onto the image plane 1001. In Figure 10B a light source 1003, defined in three dimensions such that it does not appear on the screen 1001, casts a shadow volume 1004 behind the polygon 1002.
In Figure 10C, another polygon 1005 intersects the shadow volume 1004, such that the shadow 1006 of the polygon 1002 is partially cast upon it.
A simple known approach to determining the area of the polygon 1005 which is in shadow proceeds in the following way. The shadow volume 1004 is considered as having a number of sides, in this case three, because the polygon 1002 is a triangle. Each of these sides is called a shadow plane. The scene is rendered initially for emission and ambient characteristics, which are those which do not relate to shadowing. Importantly, this fills the z buffer of the rendering processor 509 with a value for each image pixel indicating that pixel's z co-ordinate. Next, each pixel is considered individually. If there are any shadow planes that project across a pixel and which are in front of the pixel which is displayed, in other words the shadow plane at that pixel has a higher z value than the z buffer value, then each such shadow plane is counted.
Shadow planes are counted positively if the shadow plane is facing forward, in other words one looks through it into shadow, or they are counted negatively if the shadow plane is facing backwards. Thus, considering the point 1006 in Figure 10C, the front shadow plane of the shadow volume 1004 is the only shadow plane encountered in front of the object 1005. The count from the method outlined would be one, indicating that the pixel should be in shadow. Considering point 1007, the count, using the above method, would be one, going into the front shadow plane, followed by minus one, going out of the back shadow plane, resulting in a total of zero. Thus the pixel at point 1007 is not in shadow.
The count values made in this way for each pixel are stored in an area of video memory called a stencil plane. Having completed entering values for all pixels in the stencil plane, it is then possible to render the objects for their diffuse and specular lighting characteristics. This is only done for those pixels which have a zero stencil value, and which are therefore not in shadow. As the diffuse and specular red, green and blue components are calculated, these values are accumulated for each pixel with the values already stored when rendering the objects in emission and ambient light.
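The per-pixel counting just described can be sketched in C as follows; the data layout is an assumption made for the example, and the comparison follows the convention used above, in which a shadow plane with a higher z value than the z buffer value lies in front of the visible surface.

    #include <stddef.h>

    typedef struct {
        double z_at_pixel;   /* z of this shadow plane at the pixel             */
        int    front_facing; /* 1 = looking through it into shadow, 0 = out of  */
    } ShadowPlaneSample;

    /* Returns the stencil value for one pixel: zero means not in shadow. */
    int stencil_count_for_pixel(double z_buffer_value,
                                const ShadowPlaneSample *planes, size_t n)
    {
        int count = 0;
        for (size_t i = 0; i < n; i++) {
            /* Only shadow planes in front of the visible surface are counted. */
            if (planes[i].z_at_pixel > z_buffer_value)
                count += planes[i].front_facing ? +1 : -1;
        }
        return count;
    }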
This technique may be extended for multiple light sources, with shadow volumes and their respective planes calculated for each light source, and accumulating resulting red, green and blue components for those pixels which are not shaded from that light source. This process is summarised in Figure 11.
In process 1101, the red, green and blue pixel planes in the rendering buffer are set to zero. In process 1102, objects are drawn into the red, green and blue, and z pixel planes in accordance with the emission and ambient lighting characteristics of their visible surfaces. In process 1103, a light source is selected.
In process 1104, the shadow planes for the objects are calculated, and the stencil plane is updated. In process 1105 the diffuse and specular lighting characteristics for the visible surfaces of each object are accumulated to each red, green and blue pixel, for pixels which have their respective stencil plane pixel equal to zero. In process 1106 a question is asked as to whether another light source needs to be considered. If the answer to this question is yes, control is directed to process 1104, and the same processes are repeated for the new light source. Alternatively, the outcome of decision process 1106 is negative, and control is directed to process 1107, where it is known that the frame is ready for rendering.
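The overall per-light structure of Figure 11 can be summarised in the following C sketch; the function names are placeholders for the numbered processes described above and are not part of any real API.

    #include <stddef.h>

    typedef struct Scene Scene;               /* objects, surfaces and lighting data    */
    typedef struct Light Light;               /* one localised light source             */
    typedef struct FrameBuffers FrameBuffers; /* red, green, blue, z and stencil planes */

    /* Placeholders for the processes of Figure 11. */
    void clear_colour_planes(FrameBuffers *fb);                          /* 1101 */
    void draw_emission_and_ambient(FrameBuffers *fb, const Scene *s);    /* 1102 */
    void update_stencil_for_light(FrameBuffers *fb, const Scene *s,
                                  const Light *l);                       /* 1104 */
    void accumulate_diffuse_specular(FrameBuffers *fb, const Scene *s,
                                     const Light *l);   /* 1105: stencil == 0 only */

    void render_frame(FrameBuffers *fb, const Scene *scene,
                      const Light *const *lights, size_t num_lights)
    {
        clear_colour_planes(fb);
        draw_emission_and_ambient(fb, scene);
        for (size_t i = 0; i < num_lights; i++) {       /* processes 1103 and 1106 */
            update_stencil_for_light(fb, scene, lights[i]);
            accumulate_diffuse_specular(fb, scene, lights[i]);
        }
        /* The frame is now ready (process 1107). */
    }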
The process 1104 for calculating shadow areas, shown in Figure 11, is detailed in Figure 12. In process 1201 the shadow planes are calculated. In process 1202 the stencil bit planes are set to zero.
Several bits are used to represent each pixel in the stencil plane. In process 1203 the first pixel in the picture frame is considered.
In process 1204 a question is asked as to whether there are any shadow planes in front of the z value stored for the currently considered pixel. If the outcome of this question is no, control is directed to process 1207. Alternatively, control is directed to process 1205. In process 1205 a count is made of the number of shadow planes crossed going into shadow before the z value of the object displayed at the currently considered pixel is encountered. In process 1206 a count is made of the number of shadow planes encountered going out of shadow, and for each of these the stencil plane is decremented.
In process 1207 a question is asked as to whether there are any remaining pixels on the display which are to be considered. If the answer to this question is yes, control is directed to process 1204, and the processes are repeated for the next pixel. Alternatively, the result of this question is negative, and control is directed to process 1105 in Figure 11.
The method which has been described is simple, but has difficulty when shadows are complex. For example, objects which cast shadows may themselves be partly in shadow, and this is not fully taken into account. An improved approach to calculating shadow volumes is detailed in Figures 13 and 14.
Processes 1204, 1205 and 1207 shown in Figure 12 are replaced by the processes shown in Figure 13. In process 1301 a question is asked as to whether there are any shadow planes in front of the object whose z value is stored in the z buffer at the currently considered pixel.
If the result of this question is negative, control is directed to process 1207 in Figure 12. Alternatively, control is directed to process 1302, where the first shadow plane encountered as one moves into the screen is considered.
In process 1303 a question is asked as to whether the shadow plane is cast from a double edge. A double edge is where two polygons, for example in a mesh, are side by side, with no gap. If the result of this question is in the negative, control is directed to process 1306. Alternatively, if the currently considered shadow plane is cast from a double edge, control is directed to process 1304. In process 1304 a question is asked as to whether the polygon which is casting the shadow plane has mixed single and double edges. For example, a polygon at the edge of a mesh will have mixed single and double edges. If the polygon does not have mixed edges, control is directed to process 1306. Alternatively control is directed to process 1305.
In process 1305 a condition has been reached where it is known that the shadow plane is being cast from a double edge and that the polygon from which it is cast has mixed edges. A variable, N, is set to the value two in this condition. In all other conditions, encountered at process 1306, N is set to the value of one. After the value of N has been set in process 1305 or process 1306, control is directed to process 1307.
In process 1307 a question is asked as to whether the polygon which is casting the shadow plane is itself in light. If the result of this question is yes, control is directed to process 1401 in Figure 14.
Alternatively control is directed to process 1404 in Figure 14. After the processes in Figure 14 have been executed, control is directed to process 1308, where a question is asked as to whether there is another shadow plane in front of the object displayed at the current pixel. If the result of this question is yes, control is directed to process 1302.
Alternatively, control is directed back to process 1207, in Figure 12.
In Figure 14, the stencil plane is incremented or decremented according to various conditions. In process 1401 a question is asked as to whether the shadow plane under consideration is front facing. If it is, control is directed to process 1402, where the stencil plane is incremented by the value N. Alternatively control is directed to process 1403, where the stencil plane is decremented by the value N. Thereafter control is directed to process 1407, where control is returned to process 1308 in Figure 13.
In process 1404 a question is asked as to whether the currently considered shadow plane is front facing. If it is, control is directed to process 1405, where the value of the stencil plane is decremented by N. Alternatively control is directed to process 1406, where the stencil plane is incremented by N. Thereafter control is directed to process 1407, and then to process 1308 in Figure 13.
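The decision logic of Figures 13 and 14 for a single shadow plane at a single pixel can be summarised in the following C sketch. The data layout is an assumption made for the example; only the choice of N and of the increment or decrement follows the figures: N is two when the plane is cast from a double edge of a polygon with mixed single and double edges, one otherwise, and the sense of the update is reversed when the casting polygon is not itself in light.

    /* Conditions established for one shadow plane at the current pixel. */
    typedef struct {
        int front_facing;           /* 1 if one looks through it into shadow       */
        int cast_from_double_edge;  /* edge shared between two polygons            */
        int caster_has_mixed_edges; /* casting polygon has single and double edges */
        int caster_in_light;        /* casting polygon is itself in light          */
    } ShadowPlaneInfo;

    void update_stencil_for_plane(int *stencil_pixel, const ShadowPlaneInfo *p)
    {
        /* Processes 1303 to 1306: choose the magnitude N. */
        int n = (p->cast_from_double_edge && p->caster_has_mixed_edges) ? 2 : 1;

        /* Process 1307 and processes 1401 to 1406: choose the sign. */
        if (p->caster_in_light)
            *stencil_pixel += p->front_facing ? +n : -n;   /* 1402 / 1403 */
        else
            *stencil_pixel += p->front_facing ? -n : +n;   /* 1405 / 1406 */
    }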

Claims (12)

1. A method of processing image data to identify a shadow surface in a projection of three dimensional objects represented by said data, including a light source, comprising steps of: identifying a shadow plane cast by an edge; identifying a first characteristic of said edge; identifying a second characteristic of said edge; and incrementing or decrementing a data plane in an amount responsive to an identified combination of said first and said second characteristic on a pixel-by-pixel basis for a rendered image plane.
2. A method according to Claim 1, including selecting said incrementing or decrementing in response to a third characteristic of said shadow plane at said edge of said object.
3. A method according to Claim 1, wherein said first characteristic is an indication as to whether a view is outward or inward facing with respect to its shadow volume.
4. A method according to Claim 1, wherein said second characteristic is whether the object is a polygon having at least one edge shared with another polygon and at least one edge not shared with another polygon.
5. A method according to Claim 2, wherein said third characteristic is whether said edge is in light or not.
6. Apparatus for processing image data to identify a shadow surface in a projection of three dimensional objects, including a light source, comprising means for rendering said three dimensional data to produce a two dimensional array of pixels; and processing means configured to identify a shadow plane cast by an edge of said object, identify a first characteristic of said edge, identify a second characteristic of said edge, and to increment or decrement a data plane in an amount responsive to an identified combination of said first and second characteristics on a pixel-by-pixel basis for a rendered image plane.
7. Apparatus according to Claim 6, wherein said processing means is arranged to select said incrementing or decrementing in response to a third characteristic of said shadow plane at an edge of said object.
8. Apparatus according to Claim 6, wherein said first characteristic is an indication as to whether a view is outward or inward facing with respect to the shadow plane.
9. Apparatus according to Claim 6, wherein said second characteristic is an indication as to whether the object is a polygon having at least one edge shared with another polygon and at least one edge not shared with another polygon.
10. Apparatus according to Claim 7, wherein said third characteristic is an indication as to whether said edge is in light or not.
11. A method of processing image data substantially as herein described with reference to the accompanying drawings.
12. Apparatus for processing image data substantially as herein described with reference to the accompanying drawings.
GB9706004A 1996-04-11 1997-03-22 Processing image data Expired - Fee Related GB2312141B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB9706004A GB2312141B (en) 1996-04-11 1997-03-22 Processing image data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB9607542.9A GB9607542D0 (en) 1996-04-11 1996-04-11 Processing image data
GB9706004A GB2312141B (en) 1996-04-11 1997-03-22 Processing image data

Publications (3)

Publication Number Publication Date
GB9706004D0 GB9706004D0 (en) 1997-05-07
GB2312141A true GB2312141A (en) 1997-10-15
GB2312141B GB2312141B (en) 1998-04-22

Family

ID=26309104

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9706004A Expired - Fee Related GB2312141B (en) 1996-04-11 1997-03-22 Processing image data

Country Status (1)

Country Link
GB (1) GB2312141B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2350764A (en) * 1998-11-24 2000-12-06 Sega Enterprises Kk Image processing

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0553973A2 (en) * 1992-01-29 1993-08-04 International Business Machines Corporation Computer graphics display method and system with shadow generation
EP0622748A2 (en) * 1993-04-30 1994-11-02 Scitex Corporation Ltd. A method for generating artificial shadow

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0553973A2 (en) * 1992-01-29 1993-08-04 International Business Machines Corporation Computer graphics display method and system with shadow generation
EP0622748A2 (en) * 1993-04-30 1994-11-02 Scitex Corporation Ltd. A method for generating artificial shadow

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2350764A (en) * 1998-11-24 2000-12-06 Sega Enterprises Kk Image processing
GB2350764B (en) * 1998-11-24 2003-03-12 Sega Enterprises Kk Image processing device, image processing method and recording medium

Also Published As

Publication number Publication date
GB2312141B (en) 1998-04-22
GB9706004D0 (en) 1997-05-07

Similar Documents

Publication Publication Date Title
CA2201680C (en) Processing image data
US6396495B1 (en) Producing image data in a virtual set
US6930685B1 (en) Image processing method and apparatus
US6983082B2 (en) Reality-based light environment for digital imaging in motion pictures
Grau et al. A combined studio production system for 3-D capturing of live action and immersive actor feedback
US5748199A (en) Method and apparatus for converting a two dimensional motion picture into a three dimensional motion picture
US6034739A (en) System for establishing a three-dimensional garbage matte which enables simplified adjusting of spatial relationships between physical and virtual scene elements
JP5340952B2 (en) 3D projection display
US6081273A (en) Method and system for building three-dimensional object models
US5355174A (en) Soft edge chroma-key generation based upon hexoctahedral color space
JPH09231404A (en) Picture processing method for displaying object, and device therefor
GB2465792A (en) Illumination Direction Estimation using Reference Object
CN114255315A (en) Rendering method, device and equipment
CN116758208A (en) Global illumination rendering method and device, storage medium and electronic equipment
KR20170091710A (en) Digital video rendering
WO2023097109A1 (en) Increasing resolution and luminance of a display
GB2312125A (en) Virtual studio with zoom control
Law et al. Projector placement planning for high quality visualizations on real-world colored objects
GB2312141A (en) Generating areas of shadow in 2d image of 3d world
El-Hakim et al. An approach to creating virtual environments using range and texture
Grau Studio production system for dynamic 3D content
JP2973413B2 (en) Illuminance calculation method and display device for computer graphics
US20230090732A1 (en) System and method for real-time ray tracing in a 3d environment
WO2022244131A1 (en) Image data generation device, display device, image display system, image data generation method, image display method, and data structure of image data
Klose et al. Automatic Multi-Projector Calibration

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20100322