WO2006024873A2

WO2006024873A2 - Image rendering

Info

Publication number: WO2006024873A2
Application number: PCT/GB2005/003415
Authority: WO
Inventors: Paul Brooke
Original assignee: Sony Computer Entertainment Europe Limited
Priority date: 2004-09-03
Filing date: 2005-09-05
Publication date: 2006-03-09
Also published as: GB0419593D0; GB2417846A; WO2006024873A3

Abstract

Real-time image rendering apparatus for rendering an image of a display object comprises means for receiving successive captured images from a video camera; means for deriving a reflection vector, in respect of a point of interest on an object to be displayed, in dependence on the local orientation at the point of interest and a direction of view; means for referencing a position in a current captured image using the reflection vector, to detect a captured image property at that captured image position; and means for varying the appearance of the object at the point of interest in dependence on the detected captured image property and an object property at the point of interest.

Description

IMAGE RENDERING

This invention relates to image rendering.

Many applications of image rendering use the virtual environment surrounding a displayed object to contribute to the surface appearance of the object. This generally increases the realism of the rendered object. Two examples are the use of lighting effects, so that the object varies in appearance depending on its relative position and surface orientation with respect to a light source, and the use of environmental reflections so that the object's appearance can vary in dependence on its reflectivity and surface orientation as well as its surroundings.

Various techniques have been proposed to achieve this, such as cube mapping, spherical mapping and dual paraboloid mapping. These techniques have some process steps in common. In particular, once a viewing direction is defined, a "reflection vector" is calculated for a small area of the surface of an object, based on the viewing direction with respect to the object and the surface normal at that position on the object. This reflection vector points to a position in a "map" of the environment which approximates to the view of the surroundings from the object's point of view. The image properties of the environment at the mapped position are then applied to the object's surface at that position. The degree to which they modify the object depends on the notional reflectivity of the object.

To use any of these mapping techniques, the environment can be pre-prepared and static. But in some applications such as, for example, a real-time representation of a car race by a computer games machine, the environment depends on the instantaneous position of the car being viewed. In this type of situation, the environment texture data in any of the above arrangements is derived at the time it is needed.

Deriving the environment texture data for use in any of these mapping processes is very processor-intensive. To derive the data for the cube mapping processes, a six pass process is used. For spherical mapping four passes of two major processing steps are needed. The processing requirements to generate a dual paraboloid map are twice those of the spherical map.

This invention provides real-time image rendering apparatus for rendering an image of a display object, the apparatus comprising: means for receiving successive captured images from a video camera; means for referencing a position in a current captured image in dependence on the local orientation at the point of interest and the direction of view, to detect the captured image property at that captured image position; and means for varying the appearance of the object at the point of interest in dependence on the detected captured image property and an object property at the point of interest.

The invention potentially avoids the need for a substantial part of the map processing described above, and provides an advantageously realistic user interface, by using the input from a video camera associated with the apparatus as an environment map. In this way, if the camera is directed at a user watching a display screen on which the rendered image are displayed, the user can see his own reflection rendered onto displayed objects.

Preferably the current image is a most recently captured image, though earlier images (for example the third most recently captured image) could of course be used for artistic effect.

The invention also provides computer software having program code for carrying out a method as above. The computer software is preferably provided by a providing medium such as a transmission medium or a storage medium.

Further respective aspects and features of the invention are defined in the appended claims.

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

Figure 1 schematically illustrates the overall system architecture of the PlayStation2; Figure 2 schematically illustrates the architecture of an Emotion Engine;

Figure 3 schematically illustrates the configuration of a Graphic synthesiser;

Figure 4 schematically illustrates an environmental mapping process;

Figure 5 schematically illustrates the generation of reflection vectors in respect of three object points; Figure 6 schematically illustrates a user operating a PlayStation 2 video game with an attached EyeToy camera;

Figure 7 schematically illustrates a captured image;

Figures 8 to 10 schematically illustrate reflection vectors;

Figure 11 schematically illustrates a spherical mapping process; Figure 12 is a schematic flowchart representing one way of applying reflections as textures; and

Figures 13A to 13C schematically illustrate PlayStation 2 screen views with the reflections applied. Figure 1 schematically illustrates the overall system architecture of the

PlayStation2. A system unit 10 is provided, with various peripheral devices connectable to the system unit.

The system unit 10 comprises: an Emotion Engine 100; a Graphics Synthesiser 200; a sound processor unit 300 having dynamic random access memory (DRAM); a read only memory (ROM) 400; a compact disc (CD) and digital versatile disc (DVD) reader 450; a Rambus Dynamic Random Access Memory (RDRAM) unit 500; an input/output processor (IOP) 700 with dedicated RAM 750. An (optional) external hard disk drive (HDD) 800 may be connected.

The input/output processor 700 has two Universal Serial Bus (USB) ports 715A and 715B and an iLink or IEEE 1394 port (iLink is the Sony Corporation implementation of IEEE 1394 standard). The IOP 700 handles all USB, iLink and game controller data traffic. For example when a user is playing a game, the IOP 700 receives data from the game controller and directs it to the Emotion Engine 100 which updates the current state of the game accordingly. The IOP 700 has a Direct Memory Access (DMA) architecture to facilitate rapid data transfer rates. DMA involves transfer of data from main memory to a device without passing it through the CPU. The USB interface is compatible with Open Host Controller Interface (OHCI) and can handle data transfer rates of between 1.5 Mbps and 12 Mbps. Provision of these interfaces mean that the PlayStation2 is potentially compatible with peripheral devices such as video cassette recorders (VCRs), digital cameras, set-top boxes, printers, keyboard, mouse and joystick.

Generally, in order for successful data communication to occur with a peripheral device connected to a USB port 715A or 715B, an appropriate piece of software such as a device driver should be provided. Device driver technology is very well known and will not be described in detail here, except to say that the skilled man will be aware that a device driver or similar software interface may be required in the embodiment described here.

In the present embodiment, a video camera 730 with an associated microphone 735 and an LED indicator 740 is connected to a USB port 715A. Although various types of video camera may be used, a particularly suitable type of video camera 730 is a so- called "webcam", that is, a medium-resolution camera based on a single charge-coupled device (CCD) element and including a basic hardware-based real-time data compression and encoding arrangement, so that compressed video and audio data are transmitted by the camera 730 to the USB port 715A in an appropriate format, such as an intra-image based MPEG (Motion Picture Expert Group) standard, for decoding at the PlayStation 2 system unit 10.

The camera LED indicator 740 is arranged to receive control data via the USB data connection to the system unit 10. The CPU can send a control signal via this route to set the LED to an "off mode, a "steady on" mode and a "flashing" mode in which the LED flashes at a rate of between, say, 1 and 3 flashes per second. The logic required to cause the LED to flash is provided in the camera circuitry, so it is not necessary for the system unit 10 to instruct each individual flash of the LED.

In an alternative embodiment of the invention, rather than using a microphone built into the webcam, a stand-alone microphone 745 is provided. A stand-alone microphone may be placed closer to a user than a built in webcam microphone, thus providing improved quality sound input to the system. Where a stand-alone microphone is provided, it can be coupled to the system via a second USB port 715B.

Apart from the USB ports, two other ports 705, 710 are proprietary sockets allowing the connection of a proprietary non-volatile RAM memory card 720 for storing game-related information, a hand-held game controller 725 or a device (not shown) mimicking a hand-held controller, such as a dance mat.

The Emotion Engine 100 is a 128-bit Central Processing Unit (CPU) that has been specifically designed for efficient simulation of 3 dimensional (3D) graphics for games applications. The Emotion Engine components include a data bus, cache memory and registers, all of which are 128-bit. This facilitates fast processing of large volumes of multi-media data. Conventional PCs, by way of comparison, have a basic 64-bit data structure. The floating point calculation performance of the PlayStation2 is 6.2 GFLOPs. The Emotion Engine also comprises MPEG2 decoder circuitry which allows for simultaneous processing of 3D graphics data and DVD data. The Emotion Engine performs geometrical calculations including mathematical transforms and translations and also performs calculations associated with the physics of simulation objects, for example, calculation of friction between two objects. It produces sequences of image rendering commands which are subsequently utilised by the Graphics Synthesiser 200. The image rendering commands are output in the form of display lists. A display list is a sequence of drawing commands that specifies to the Graphics Synthesiser which primitive graphic objects (e.g. points, lines, triangles, sprites) to draw on the screen and at which coordinates. Thus a typical display list will comprise commands to draw vertices, commands to shade the faces of polygons, render bitmaps and so on. The Emotion Engine 100 can asynchronously generate multiple display lists.

The Graphics Synthesiser 200 is a video accelerator that performs rendering of the display lists produced by the Emotion Engine 100. The Graphics Synthesiser 200 includes a graphics interface unit (GIF) which handles, tracks and manages the multiple display lists. The rendering function of the Graphics Synthesiser 200 can generate image data that supports several alternative standard output image formats, i.e., NTSC/PAL, High Definition Digital TV and VESA. In general, the rendering capability of graphics systems is defined by the memory bandwidth between a pixel engine and a video memory, each of which is located within the graphics processor. Conventional graphics systems use external Video Random Access Memory (VRAM) connected to the pixel logic via an off-chip bus which tends to restrict available bandwidth. However, the Graphics Synthesiser 200 of the PlayStation2 provides the pixel logic and the video memory on a single high-performance chip which allows for a comparatively large 38.4 Gigabyte per second memory access bandwidth. The Graphics Synthesiser is theoretically capable of achieving a peak drawing capacity of 75 million polygons per second. Even with a full range of effects such as textures, lighting and transparency, a sustained rate of 20 million polygons per second can be drawn continuously. Accordingly, the Graphics Synthesiser 200 is capable of rendering a film-quality image.

The Sound Processor Unit (SPU) 300 is effectively the soundcard of the system which is capable of recognising 3D digital sound such as Digital Theater Surround (DTS®) sound and AC-3 (also known as Dolby Digital) which is the sound format used for digital versatile disks (DVDs).

A display and sound output device 305, such as a video monitor or television set with an associated loudspeaker arrangement 310, is connected to receive video and audio signals from the graphics synthesiser 200 and the sound processing unit 300. The main memory supporting the Emotion Engine 100 is the RDRAM (Rambus

Dynamic Random Access Memory) module 500 produced by Rambus Incorporated. This RDRAM memory subsystem comprises RAM, a RAM controller and a bus connecting the RAM to the Emotion Engine 100. Figure 2 schematically illustrates the architecture of the Emotion Engine 100 of Figure 1. The Emotion Engine 100 comprises: a floating point unit (FPU) 104; a central processing unit (CPU) core 102; vector unit zero (VUO) 106; vector unit one (VUl) 108; a graphics interface unit (GIF) 110; an interrupt controller (INTC) 112; a timer unit 114; a direct memory access controller 116; an image data processor unit (IPU) 118; a dynamic random access memory controller (DRAMC) 120; a sub-bus interface (SIF) 122; and all of these components are connected via a 128-bit main bus 124.

The CPU core 102 is a 128-bit processor clocked at 300 MHz. The CPU core has access to 32 MB of main memory via the DRAMC 120. The CPU core 102 instruction set is based on MIPS III RISC with some MIPS IV RISC instructions together with additional multimedia instructions. MIPS III and IV are Reduced Instruction Set Computer (RISC) instruction set architectures proprietary to MIPS Technologies, Inc. Standard instructions are 64-bit, two-way superscalar, which means that two instructions can be executed simultaneously. Multimedia instructions, on the other hand, use 128-bit instructions via two pipelines. The CPU core 102 comprises a 16KB instruction cache, an 8KB data cache and a 16KB scratchpad RAM which is a portion of cache reserved for direct private usage by the CPU.

The FPU 104 serves as a first co-processor for the CPU core 102. The vector unit 106 acts as a second co-processor. The FPU 104 comprises a floating point product sum arithmetic logic unit (FMAC) and a floating point division calculator (FDIV). Both the FMAC and FDIV operate on 32-bit values so when an operation is carried out on a 128- bit value ( composed of four 32-bit values) an operation can be carried out on all four parts concurrently. For example adding 2 vectors together can be done at the same time. The vector units 106 and 108 perform mathematical operations and are essentially specialised FPUs that are extremely fast at evaluating the multiplication and addition of vector equations. They use Floating-Point Multiply-Adder Calculators (FMACs) for addition and multiplication operations and Floating-Point Dividers (FDIVs) for division and square root operations. They have built-in memory for storing micro-programs and interface with the rest of the system via Vector Interface Units (VIFs). Vector Unit Zero 106 can work as a coprocessor to the CPU core 102 via a dedicated 128-bit bus 124 so it is essentially a second specialised FPU. Vector Unit One 108, on the other hand, has a dedicated bus to the Graphics synthesiser 200 and thus can be considered as a completely separate processor. The inclusion of two vector units allows the software developer to split up the work between different parts of the CPU and the vector units can be used in either serial or parallel connection.

Vector unit zero 106 comprises 4 FMACS and 1 FDIV. It is connected to the CPU core 102 via a coprocessor connection. It has 4 Kb of vector unit memory for data and 4 Kb of micro-memory for instructions. Vector unit zero 106 is useful for performing physics calculations associated with the images for display. It primarily executes non- patterned geometric processing together with the CPU core 102.

Vector unit one 108 comprises 5 FMACS and 2 FDIVs. It has no direct path to the

CPU core 102, although it does have a direct path to the GIF unit 110. It has 16 Kb of vector unit memory for data and 16 Kb of micro-memory for instructions. Vector unit one 108 is useful for performing transformations. It primarily executes patterned geometric processing and directly outputs a generated display list to the GIF 110.

The GIF 110 is an interface unit to the Graphics Synthesiser 200. It converts data according to a tag specification at the beginning of a display list packet and transfers drawing commands to the Graphics Synthesiser 200 whilst mutually arbitrating multiple transfer. The interrupt controller (INTC) 112 serves to arbitrate interrupts from peripheral devices, except the DMAC 116.

The timer unit 114 comprises four independent timers with 16-bit counters. The timers are driven either by the bus clock (at 1/16 or 1/256 intervals) or via an external clock. The DMAC 116 handles data transfers between main memory and peripheral processors or main memory and the scratch pad memory. It arbitrates the main bus 124 at the same time. Performance optimisation of the DMAC 116 is a key way by which to improve Emotion Engine performance. The image processing unit (IPU) 118 is an image data processor that is used to expand compressed animations and texture images. It performs I-PICTURE Macro-Block decoding, colour space conversion and vector quantisation. Finally, the sub-bus interface (SIF) 122 is an interface unit to the IOP 700. It has its own memory and bus to control I/O devices such as sound chips and storage devices.

Figure 3 schematically illustrates the configuration of the Graphic Synthesiser 200. The Graphics Synthesiser comprises: a host interface 202; a set-up / rasterizing unit 204; a pixel pipeline 206; a memory interface 208; a local memory 212 including a frame page buffer 214 and a texture page buffer 216; and a video converter 210.

The host interface 202 transfers data with the host (in this case the CPU core 102 of the Emotion Engine 100). Both drawing data and buffer data from the host pass through this interface. The output from the host interface 202 is supplied to the graphics synthesiser 200 which develops the graphics to draw pixels based on vertex information received from the Emotion Engine 100, and calculates information such as RGBA value, depth value (i.e. Z-value), texture value and fog value for each pixel. The RGBA value specifies the red, green, blue (RGB) colour components and the A (Alpha) component represents opacity of an image object. The Alpha value can range from completely transparent to totally opaque. The pixel data is supplied to the pixel pipeline 206 which performs processes such as texture mapping, fogging and Alpha-blending and determines the final drawing colour based on the calculated pixel information. Alpha blending is a process involving mixing image properties of two images to produce a composite image, the mixing being dependent on the parameter alpha; often this is used in respect of a foreground and a background image, with alpha representing the transparency of the foreground image. If a particular pixel or region of the foreground image has an alpha value indicating 100% non-transparency then the composite image at that position will depend entirely on the foreground image; none of the background image will "show through". On the other hand, an alpha value at that point indicative of complete transparency will mean that the composite image at that point will be entirely dependent on the background image. An alpha value between these two extremes will lead to the composite image, at that position, being dependent on a mixing of the foreground and background images.

The pixel pipeline 206 comprises 16 pixel engines PEl, PE2 .. PEl 6 so that it can process a maximum of 16 pixels concurrently. The pixel pipeline 206 runs at 150MHz with 32-bit colour and a 32-bit Z-buffer. The memory interface 208 reads data from and writes data to the local Graphics Synthesiser memory 212. It writes the drawing pixel values (RGBA and Z) to memory at the end of a pixel operation and reads the pixel values of the frame buffer 214 from memory. These pixel values read from the frame buffer 214 are used for pixel test or Alpha-blending. The memory interface 208 also reads from local memory 212 the RGBA values for the current contents of the frame buffer. The local memory 212 is a 32 Mbit (4MB) memory that is built-in to the Graphics Synthesiser 200. It can be organised as a frame buffer 214, texture buffer 216 and a 32- bit Z-buffer 215. The frame buffer 214 is the portion of video memory where pixel data such as colour information is stored.

The Graphics Synthesiser uses a 2D to 3D texture mapping process to add visual detail to 3D geometry. Each texture may be wrapped around a 3D image object and is stretched and skewed to give a 3D graphical effect. The texture buffer is used to store the texture information for image objects. The Z-buffer 215 (also known as depth buffer) is the memory available to store the depth information for a pixel. Images are constructed from basic building blocks known as graphics primitives or polygons. When a polygon is rendered with Z-buffering, the depth value of each of its pixels is compared with the corresponding value stored in the Z-buffer. If the value stored in the Z-buffer is greater than or equal to the depth of the new pixel value then this pixel is determined visible so that it should be rendered and the Z-buffer will be updated with the new pixel depth. If however the Z-buffer depth value is less than the new pixel depth value the new pixel value is behind what has already been drawn and will not be rendered.

The local memory 212 has a 1024-bit read port and a 1024-bit write port for accessing the frame buffer and Z-buffer and a 512-bit port for texture reading. The video converter 210 is operable to display the contents of the frame memory in a specified output format. Figure 4 schematically illustrates an environmental mapping process. An object

1000 lies within a virtual 3D environment generated by the game being played. In this example, it represents a reflective sphere, but it could be taken to represent any object with a reflective or at least partially reflective surface, such as a car bonnet or a teapot. The object 1000 is to be rendered as if viewed from a viewing position 1010. This is not generally a notional position of a viewer of the screen on which the image is to be displayed; rather, it is more generally taken to be the position of the screen itself (in the virtual space). As the object 1000 has a reflective surface, the rendering involves calculating the reflections of the environment off the object 1000 as seen from the viewing position 1010. These reflections are dependent on the environment, the topology of the object 1000 and the location of the object 1000 and the view position 1010 within the environment. For example, a cube 1020 also exists within the environment and may or may not be seen from the viewing position 1010 as a reflection off the object 1000 depending on the relative locations of the object 1000, the cube 1020 and the viewing position 1010. To calculate the appearance, as seen from the viewing position 1010, of a surface point 1030 located on the surface of the object 1000, a viewing vector 1040 from the viewing position 1010 to the surface point 1030 is calculated. This can be normalised to form a vector U of unit length. A normal vector 1050 is shown that represents the normal to the surface of the object 1000 at the surface point 1030. This can be normalised to form a vector N of unit length. A reflection vector 1060 can be calculated from N and U using Equation 1 below:

Equation 1 : ^~R = TJ - 2(U ^■ N)N

The reflection vector 1060 points to the position in the environment that would be seen from the viewing position 1010 as a reflection at the surface point 1030 off the surface of the object 1000. Whilst these vectors could be calculated within any coordinate system, it is common to use a coordinate system with the viewing position 1010 located at the origin.

In an environmental mapping process, the reflection vector 1060 is used to index a point in a texture map. A texture map is a 2D representation of the 3D environment as seen from a particular mapping-point located within the environment and viewed in a particular direction. This mapping -point may or may not be the same as the surface point 1030. The generation of texture maps will be described later. As the texture map is generated from the mapping-point facing in a particular direction, some sections of the texture map may relate to the environment in front of the mapping-point whilst others may relate to the environment behind the mapping-point. An example mapping point 1070 and an example viewing direction 1080 are shown in Figure 4.

Figure 5 schematically illustrates the generation of three reflection vectors, R₀ , R_x and R₂ in respect of three surface points 1100, 1110 and 1120 given a viewing position 1130. The three reflection vectors R₀ , R_x and R₂ are calculated according to Equation 1 , using normalised versions of three viewing vectors U₀ , U_x and U₂ and three unit normal vectors N₀ , N₁ and N₂ respectively. A reflection vector could be calculated in a similar manner for any other surface point. Alternatively, if a relatively flat surface topology is assumed within a triangle 1140, formed from the three surface points 1100, 1110 and 1120, then the reflection vector for a point within the triangle 1140 could be calculated by interpolating the three reflection vectors R₀ , R_x and R₂ . Figure 6 schematically illustrates a user operating a PlayStation 2 video game with an attached EyeToy™ camera. "EyeToy" is a Sony proprietary name for a camera connectable to a PlayStation 2 machine in the manner illustrated in respect of the webcam 730 of Figure 1. In particular a user 1200 is watching a television display 1210 while operating a hand-held controller 1220 connected to a PlayStation 2 video game apparatus 1230. The PlayStation 2 1230 provides a video output to the television 1210 and also receives as an input video frames captured by an EyeToy camera 1240.

Figure 7 schematically illustrates a captured image, which shows the user 1200 substantially facing the EyeToy camera 1240. Other image details have been omitted for clarity.

Figures 8 to 10 schematically illustrate reflection vectors. The three figures illustrate a viewing direction, signified by a schematic eye 1300, viewing a small portion 1310 of an object of interest for display. A captured image 1320 from the camera 1240 is used as an environment map to form a reflection on the surface of the object of interest. Accordingly, the reflection vector r is used to provide a mapping between the local orientation of the portion 1310 and a position 1330 in the environment map 1320. Another example is shown in Figure 9 where the portion of the object of interest 1310' is angled such that the reflection vector r points to a different position 1340 on the environment map 1320. Figure 9 also illustrates a normal 1350 to the plane of the portion 1310'.

Used in this way, the image 1320 applies an environment map rather like one face of a cubic environment map, and the operation of the system is realistic to the extent that the object displayed on the screen appears to have at least partial qualities similar to those of a mirror and the user 1200 observing screen sees his own reflection on the object of interest.

Figure 10 illustrates a situation where the local orientation of the portion 1310" is such that the reflection vector does not intercept the environment map 1320 and no reflection is applied.

In applying the reflection in the case of Figures 8 and 9, account must be taken of the initial or base colour of the portion 1310, 1310', and also of a degree of reflectivity of that portion. If the portion is classed as highly reflective, then the environment map will contribute more to the ultimate appearance of that portion than if its reflectivity is low. The image properties of the points 1330, 1340 etc at the intersection of the reflection vectors and the environment map 1320 are combined with the image properties of the portion of interest 1310, 1310' etc using standard techniques. In particular, one technique is to apply the environment map data as a "texture", which is a standard technique in PlayStation 2 game software. This technique will be referenced below in the description of Figure 12. Another possibility is simply to execute a mixing of the image representing the viewed object and the relevant part of the environment map using (for example) alpha blending between the object and the environment map as described above. Here, if the object being viewed is classified as dull (non-reflective) and dark, alpha could be set to a value which causes little or none of the environment map to be applied to that object. If the object is classified as bright and shiny (reflective) then alpha could be set to a value causing the environment map to have a significant influence on the displayed view of that object.

It will be appreciated that multiple objects can be rendered in dependence on the captured images acting as an environment map. The positions of the multiple objects in virtual space may differ and so, as shown in Figure 5, the reflection vectors (and the referenced positions in the environment map) will generally differ. It will also be appreciated that other items in the scene (e.g. sky) may be rendered with no dependence at all on the reflection maps. Further, other objects in the scene may be deemed to be non-reflective — an example being trees. For these objects, to save on processing requirements, again no account will be taken of the reflection maps.

As an alternative to this partial-cubic environmental mapping, it is possible to obtain an environmental map from the captured image via spherical environmental mapping. In spherical environmental mapping, an infinitely small reflective sphere is assumed to be inside the environment, with each reflection vector coming from the same point in the environment, i.e. this sphere. The coordinate system used places the camera (or viewer) at the origin with the z-axis is pointing out of the camera, the y-axis directed upwards from the camera and the x-axis directed to the right of the camera.

Given a reflection vector R = \f_x,f'_y,r_z ) in this coordinate system, it can be shown that spherical environmental mapping associates R with the point (s,t) in the environmental map, where:

In such a spherical environmental map, s and t lie in the range from 0 to 1. Note that not every point [s,t) in the spherical environmental map corresponds to a reflection vector. Instead, only points within a circle 1380 (Figure 11) inside the spherical environmental map 1320 correspond to reflection vectors. This arrangement would not give a reflection similar to the user looking into a mirror; it would give the user a distorted, "fish eye" view of himself. It is mentioned merely to illustrate that other types of mapping using real-time captured images are possible.

Figure 12 is a schematic flow chart illustrating the handling of image data from the camera, including the application of the captured image as an environment map.

The steps illustrated in Figure 12 are carried out by various different parts of the system. In general terms these are: the IOP 700, the Emotion Engine (IPU) 118, the Emotion Engine (CPU) 102 and the graphics synthesiser 200. Figure 12 is arranged as four columns, each column corresponding to operations carried out by one of these parts. The steps shown in Figures 4 are carried out under control of software stored on a

DVD disk and read by the reader 450, although software received over a network connection such as an internet connection (not shown) may be used instead. They are repeated for each image (e.g., a progressive-scanned frame) received from the camera 730. The image rate may be set within the operating software of the PlayStation 2 system unit 10. An example image rate which may be suitable is a rate of 50 frames per second.

At a step 1382, the IOP 700 receives data from the camera 730 corresponding to one frame. As mentioned above, this data is in a compressed form such as an intra-image MPEG format. At a step 1384, the Emotion Engine 100 reads the frame's worth of image data from the IOP and routes it to the IPU 118. At a step 1386, the IPU 118 decodes the MPEG-encoded image data into a luminance-chrominance (Y, Cb, Cr) format. The Y, Cb, Cr representation of the image is then handled in four different ways by the Emotion Engine's CPU 102.

At a step 1388, the CPU 102 converts the Y, Cb, Cr format data into component (red, green, blue, or RGB) data. The RGB data is passed to the GS 200 which stores the frame in the texture buffer 216. Applying an environment map as a texture, via the texture buffer 216, is a known technique used in PlayStation 2 game software as an efficient way of varying the surface appearance of an object of interest. The reflectivity of the object relates to the amount by which the appearance of the object is modified by the influence of the environment map, and so the reflectivity can simply be represented by the degree by which the texture is applied to the object. Here, a parameter such as alpha is defined (as described above) which classifies the object between the extremes of dark, rough surface, non-reflective (in which case little or none of the environment map is applied) and light, smooth surface, substantially fully reflective (in which case a large contribution from the environment map is applied). In between these extremes, the object's colour will also influence the effect of the reflection to be generated.

The remaining part of the environment mapping, namely the selection of which part of the environment map (the captured image) relates to which part of the object for display, can be achieved at the time that the captured image is stored in the texture buffer, so that the appropriate portions of the captured image (according to the mapping scheme selected) are stored at appropriate image locations in the texture buffer.

In the scheme described with reference to Figure 12, a most recently captured (complete) image is used. However, if (for example) the decoding, conversion and mapping stages took, say, three image periods, the "current image" used for mapping would be a slightly less recently captured image. The image used for mapping is, however, updated in real-time, preferably once for each captured image received from the camera, although the map could be updated every n images, where n is an integer greater than 1. Another alternative is for an m-image delay (between image capture and display via an environment map) to be introduced for artistic effect.

Finally, Figures 13A to 13C schematically illustrate PlayStation 2 screen views, with the reflections applied. In each case, the only reflective objection which is shown on the screen is a substantially flat fronted object 1400. In Figure 13A the object 1400 is shown substantially in the plane of the display screen, in Figure 13B it is shown with the upper portion slanting away from the user and it is shown at a skew orientation in the Figure 13C. The mapping process described above alters the reflected view of the user 1200 according to the orientation of the object of interest.

Some possible uses of this technology include, for example, applying reflections of the user to the inside of helmet visors "worn" during motor racing or flight games; or applying reflections of the user to dashboards in cars, controls in aeroplanes etc.

In so far as the embodiments of the invention described above are implemented, at least in part, using software-controlled data processing apparatus, it will be appreciated that a computer program providing such software control and a storage or transmission medium by which such a computer program is stored are envisaged as aspects of the present invention.

References

Environment mapping techniques, including the use of textures, are described in (for example):

"View Independent Environment Maps", Heidrich et al, Proceedings of Siggraph/Eurographics Workshop on Graphics Hardware, 1998, ISBN 0-89791-097-X, pp 39-ff

"Advanced Environment Mapping in VR Applications", Kautz et al, Eurographics Open SG Symposium 2003.

Claims

1. Real-time image rendering apparatus for rendering an image of a display object, the apparatus comprising: means for receiving successive captured images from a video camera; means for referencing a position in a current captured image in dependence on the local orientation at the point of interest and the direction of view, to detect the captured image property at that captured image position; and means for varying the appearance of the object at the point of interest in dependence on the detected captured image property and an object property at the point of interest.

2. Apparatus according to claim 1, in which the referencing means is operable: to derive a reflection vector, in respect of the point of interest on the object to be displayed, in dependence on the local orientation at the point of interest and the direction of view; and to reference a position in a current captured image using the reflection vector, to detect the captured image property at that captured image position.

3. Apparatus according to claim 1 or claim 2, comprising a video camera for generating the captured images.

4. Apparatus according to any one of the preceding claims, in which the current captured image is a most recently captured image.

5. Apparatus according to any one of the preceding claims, in which the object property represents object reflectivity.

6. Apparatus according to any one of the preceding claims, in which the object property represents an object colour property.

7. Apparatus according to any one of the preceding claims, in which the object property represents an object surface texture property.

8. Apparatus according to any one of the preceding claims in which: the direction of view represents a viewing direction of a display screen if the image were displayed on the display screen; and the referencing means is arranged to reference a position in the captured image so that, if the video camera were directed in a direction substantially normal to the display screen, the object properties are modified so as to represent a reflection of the scene captured by the video camera.

9. Apparatus according to any one of the preceding claims, the apparatus being operable to render more than one object for display; a subset of the objects being rendered in dependence on the captured image properties.

10. Apparatus according to any one of the preceding claims, in which the varying means is arranged to vary the colour and/or brightness and/or contrast at the point of interest.

11. Apparatus according to any one of the preceding claims, in which the local orientation at the point of interest is represented by a normal vector relating to the point of interest.

12. Apparatus according to any one of the preceding claims, comprising a display for displaying the rendered image.

13. A video game apparatus comprising image rendering apparatus according to any one of the preceding claims.

14. A real-time image rendering method for rendering an image of a display object, the method comprising the steps of: receiving successive captured images from a video camera; referencing a position in a current captured image in dependence on the local orientation at the point of interest and the direction of view, to detect the captured image property at that captured image position; and varying the appearance of the object at the point of interest in dependence on the detected captured image property and an object property at the point of interest.

15. Computer software comprising program code which, when run on a computer, executes the method steps according to claim 14.

16. A medium by which software according to claim 15 is provided.

17. A medium according to claim 16, the medium being a storage medium.

18. A medium according to claim 16, the medium being a transmission medium.