WO2013179275A2 - Method and system for generating an interactive display - Google Patents


Info

Publication number
WO2013179275A2
WO2013179275A2 (PCT/IB2013/054557)
Authority
WO
WIPO (PCT)
Prior art keywords
operator
system
user
avatar
display
Prior art date
Application number
PCT/IB2013/054557
Other languages
French (fr)
Other versions
WO2013179275A3 (en)
Inventor
Simon Philip PILBEAM
Original Assignee
Donald, Heather June
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US 61/654,140 (US201261654140P), critical
Application filed by Donald, Heather June filed Critical Donald, Heather June
Publication of WO2013179275A2
Publication of WO2013179275A3

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/40 Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
    • A63F13/42 Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
    • A63F13/424 Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving acoustic input signals, e.g. by using the results of pitch or rhythm extraction or voice recognition
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/65 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor automatically by game devices or servers from real world data, e.g. measurement in live racing competition
    • A63F13/655 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor automatically by game devices or servers from real world data, e.g. measurement in live racing competition by importing photos, e.g. of the player
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/70 Game security or game management aspects
    • A63F13/79 Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • A63F13/792 Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories for payment purposes, e.g. monthly subscriptions
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/50 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
    • A63F2300/55 Details of game data or player data management
    • A63F2300/5546 Details of game data or player data management using player registration data, e.g. identification, account, preferences, game history
    • A63F2300/5553 Details of game data or player data management using player registration data, e.g. identification, account, preferences, game history user representation in the game field, e.g. avatar
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/69 Involving elements of the real world in the game world, e.g. measurement in live races, real video
    • A63F2300/695 Imported photos, e.g. of the player

Abstract

A system for generating an interactive display of an avatar includes a display unit for displaying an image of an operator-controlled avatar. The system has a camera for capturing an image of a user of the system. The system includes a control centre having a display on which the image of the user can be displayed to an operator. An input device is usable by the operator to generate control signals to control the behavior of the avatar. An animation engine responds to the control signals to animate the image of the avatar appropriately. A communication link is provided between the display unit and the control centre. The system permits the operator to observe the user and to manipulate the input device to cause the image of the avatar to respond to the user. Microphones and loudspeakers are provided to allow two-way communication between the operator and the user, and a transaction can be carried out between the user and the operator, as if the user was interacting with the avatar itself.

Description

Method and system for generating an interactive display

BACKGROUND OF THE INVENTION

THIS invention relates to a method and a system for generating an interactive display. In particular, the method and system are intended to generate an interactive display of an avatar which is controlled by an operator.

For the purposes of this patent specification, an avatar is understood to be an animated graphic representation of a virtual creature (human or otherwise) which can be generated by computer hardware and software. Such avatars are well known in the context of role playing computer games, where a participant in a game can generate a customized avatar which is then controlled by the participant in the virtual world of the game. In so-called massively multiplayer online role playing games (MMORPGs) the virtual world can accommodate numerous avatars which interact with one another.

SUMMARY OF THE INVENTION

According to the invention there is provided a system for generating an interactive display of an avatar, the system comprising: a display unit for displaying an image of an operator-controlled avatar; a camera for capturing an image of a user of the system; a control centre including: a display on which the image of the user can be displayed to an operator; and at least one input device operable by the operator to generate control signals to control the behavior of the avatar; an animation engine responsive to the control signals to animate the image of the avatar accordingly; and a communication link between the display unit and the control centre, the system being operable to permit the operator to observe the user and to manipulate said at least one input device to cause the image of the avatar to respond to the user.

The system preferably includes a microphone and an audio output device associated with the visual display unit, and a microphone and an audio output device at the control centre usable by the operator, to enable real time audio communication between the operator and the user.

The system preferably includes an audio processing module arranged to modify the operator's voice to correspond to the displayed avatar. The system may include a voice recognition engine arranged to detect words spoken by a user, to compare detected words with a set of predetermined words, and to initiate an automated action when certain words are detected.

For example, the voice recognition engine may be arranged to display an object on the display unit which is related to a detected word.

The voice recognition engine may further be arranged to transcribe the words spoken by said user for real time or future analysis.

Similarly the voice recognition engine may be arranged to transcribe the words spoken by the operator for real time or future analysis.

The voice recognition engine may be arranged to detect predetermined keywords and details of user demographics and to generate statistics relating thereto.

The system may include a face recognition engine operable in conjunction with the voice recognition engine to identify users who have previously interacted with the system.

In a preferred embodiment the animation engine includes a graphics server networked with the control centre and the display unit.

The system may include a virtual world client component on a computer arranged to drive the display unit, and on a computer of the operator, to allow the operator to control activity within a virtual environment on the display.

The system may include a printer operable to print information related to an interaction between the user and the operator, including personal information of the user. The system may further include a card processing device for reading a payment card presented by the user and processing a payment relating to a transaction entered into between the user and the operator.

According to another aspect of the invention, a method of generating an interactive display of an avatar includes: generating an image of an avatar on a display; capturing an image of a user observing the image of the avatar on the display, and displaying the captured image to an operator in real time or near real time; and generating control signals in response to operation of an input device by the operator to control the behavior of said avatar in real time or near real time and to animate the image of the avatar accordingly, thereby to cause the image of the avatar to interact with the user.

The method may include capturing the voices of the user and the operator to enable real time audio communication between the operator and the user.

The method may include processing the operator's captured voice to modify at least one characteristic thereof to correspond to the displayed avatar.

The method may include detecting words spoken by the user, comparing detected words with a set of predetermined words, and initiating an automated action when certain words are detected.

For example, the method may include displaying an object on the display unit which is related to a detected word. The method may include generating a printout for the user of information related to an interaction between the user and the operator, including personal information of the user.

The method may include reading a payment card presented by the user and processing a payment relating to a transaction entered into between the user and the operator.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a simplified schematic diagram illustrating major components of an example embodiment of a system for generating an interactive display of an avatar according to the invention;

Figures 2(a) to (d) are respective portions of a flow chart, illustrating major steps in the operation of the system of Figure 1; and

Figure 3 is a simplified schematic diagram of major hardware components of the system of Figure 1.

DESCRIPTION OF PREFERRED EMBODIMENTS

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of an embodiment of the present disclosure. It will be evident, however, to one skilled in the art that the present disclosure may be practiced without these specific details.

Unlike known systems which generate a display of an avatar in a virtual computer-generated environment, the present invention generates a display of an avatar in a real-world setting. A visual display is combined with a duplex audio channel to allow an asymmetric interaction in which one side is a real person and the other a graphical avatar, the two interacting via a graphical display, sound, and various accessories.

Figure 1 illustrates an example embodiment of the system schematically, wherein one or more real people 10 (referred to herein as users of the system) are within sight of a visual display 12 such as a television or computer monitor. The display could also be, for example, a projected image or hologram. The display 12 can comprise a plurality of display devices. For example, in a more advanced embodiment a first display screen can be positioned facing the person 10, with another display screen on either side of the first screen, and angled at 90 degrees to the first screen to create side-views on each side into a virtual 3D environment displayed on the screens, and another display screen at the rear facing backwards to represent a rear view.

The physical layout of an embodiment of the system is illustrated in Figure 3. A kiosk-style housing 80 contains audience-facing components including a display 12, microphone(s) 18, speaker(s) 20, camera(s) 22, a printer 40 and a credit card unit 42, as well as a computer (not shown in Figure 3). The housing 80 is located near a possible audience, for example in a shopping mall.

This kiosk unit 80 is connected via a network 82 such as the Internet to one or more graphics world servers 88 as well as one or more system back end servers 86, and to an operator 84 with an associated computer.

The operator 84 is situated in a location hidden from the audience, which location can be geographically distant from the kiosk 80.

Returning to Figure 1, the avatar(s) 14 may have any desired appearance, such as a person-like appearance, an animal-like appearance, a ghost-like appearance, or an object-like appearance, for example. The avatar(s) 14 are three-dimensional graphical characters in a preferred embodiment, but can also be two-dimensional.

The avatar(s) 14 are, in a preferred embodiment, capable of moving so as to cause the audience members 10 to perceive that the movement is live. They are not permanently static.

The background scenes and other objects shown within the display 12 can be of almost infinite variety, since they are graphical in nature, which allows the possibility of graphics design or the use of existing imagery.

On the display 12, an animated graphics character (avatar) 14 appears, rendered by the system in the appropriate orientation(s) and visible to the real person(s) 10. A plurality of avatars is also supported. Some assisting avatars (not the main avatar) can be partially or fully automated, although not necessarily so as to respond to stimuli from the audience 10; in the preferred embodiment, however, a main avatar 14 is ultimately controlled by a hidden/remote operator 48 (corresponding to the operator 84 in Figure 3).

In a preferred embodiment of the system, the display device 12 is touch-sensitive and, when touched by audience member(s) 10, sends feedback to the operator 48 and also to the graphics server(s) 44 which generate the data to be displayed.

A computer 16, or embedded processing unit, preferably with a powerful graphics processing unit, is used in a preferred embodiment to support the operation of the display 12, render the avatar(s) 14 and support a plurality of other functions for the operator 48, and for the overall system. In a preferred embodiment the computer 16 runs an operating system such as Microsoft Windows (trade mark).

One or more microphones 18 are positioned relative to the display device 12 to enable sound to be captured from the audience 10, and ultimately transmitted to the operator 48 to enable the operator to listen to and interact with the audience, but also to meet the needs of other items within the system as described below.

The microphone(s) 18 can be positioned so as to assist the operator to determine the physical positions of the audience members 10 relative to the display device 12.

One or more speakers 20 are positioned relative to the display device 12 to enable the audience members 10 to hear sounds generated as part of the simulated environment being represented on the display 12, including sounds from the avatar(s) 14.

The speaker(s) 20 are positioned to create, for the audience, the aural perception that the sounds originate from their imaginary sources within the graphical environment represented on the display 12, depending on the position of the audience member(s) 10 relative to the display 12.

One or more cameras 22 are positioned to capture live or near real-time images of the audience member(s) 10 around the display 12, as well as to capture the real physical environment around the display 12.

In a preferred embodiment, dual wide-angle video cameras or webcams 22 are used, positioned to capture the audience member(s) 10 as they approach or look at the display 12. The video stream from the cameras is ultimately fed to the operator 48 in or near real time, in a manner that eases the operator's perception of the relative positions and viewing directions of the audience member(s) 10 with respect to the display 12.

In the preferred embodiment the computer 16 is connected to a Local Area Network or a Wide Area Network 24. In other embodiments it is also possible to merge some or all of the components, such as the computer 16, the graphics server(s) 44, the server side of the system 46 and the operator's computer 50, with a corresponding merge of the software components that become redundant in such an embodiment.

To generate the graphics 14 and related graphical environment on the display 12, the software used in a preferred embodiment is a 3D graphical environment such as a Virtual World like SecondLife or configurable computer game or animation software. Custom-built graphics rendering can be used in other embodiments of the system.

In a preferred embodiment the graphics engine has multiple software components, including the graphics server(s) 44 and a client-side component 26, which can be used on the computer 16 driving the display 12 and on the computer of the operator 48, allowing the operator to interact with and drive the activity within the perceived graphics environment, and therefore ultimately on the display 12, with related and synchronized outputs such as the sound from the speaker(s) 20.

In a preferred embodiment, all traces of the graphics engine 44 and the client-side component 26 are hidden and not visible on the display 12.

The virtual world client-side component 26 is configured to show an independent perspective within the imaginary virtual graphics world on the display 12. This can be thought of as a camera position for each perspective onto the display 12 that could be witnessed by an audience 10.

The camera 22 images are captured in real time or near real time, and are blended or processed 28 and transmitted 30 by any capable software component(s) to generate imagery to be seen by the operator 48, either independently or blended into the graphics environment of the virtual world and presented to the operator there. In a preferred embodiment Adobe Media Encoder is used together with Adobe Media Server to transmit and make available a captured media stream from a webcam 22 to the virtual world server(s) 44 for near real-time blending with the graphics environment; the result ultimately becomes visible to the operator 48 through the virtual world client-side software 52 on the operator's computer 50.

In another embodiment of the system, two webcam streams are blended by a software component 28 in near real time to create a very wide-angle view of the physical environment around the display 12. This also gives the operator 48 greater visibility of the audience member(s) 10.
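The blending step above can be illustrated schematically. The sketch below is not the patented component 28 itself, merely a toy version of the idea: two equally sized frames, modelled as lists of pixel rows, are joined row by row into one wide frame. A real implementation would operate on decoded video frames (e.g. numpy arrays) and correct for lens overlap and distortion.

```python
def blend_side_by_side(left_frame, right_frame):
    """Join two equally sized frames row by row into one wide frame."""
    if len(left_frame) != len(right_frame):
        raise ValueError("frames must have the same height")
    # Concatenate each pair of rows to double the frame width.
    return [l_row + r_row for l_row, r_row in zip(left_frame, right_frame)]

left = [[1, 2], [3, 4]]    # 2x2 dummy "frame"
right = [[5, 6], [7, 8]]
wide = blend_side_by_side(left, right)
# wide is [[1, 2, 5, 6], [3, 4, 7, 8]]
```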

In a preferred embodiment the system continuously captures sound uttered by the audience member(s) 10 from the microphone(s) 18, and processes the sound with suitable software 32 to recognize speech in multiple languages and convert it to text.

In a preferred embodiment the speech recognition software component 32 used is the Dragon SDK whereby the Dragon recognition engine does the recognition and the SDK allows for the recognized text stream to be recorded locally on the computer 16 and also transmitted to the system's back end server 46 for near real-time analysis.
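The dual handling of the recognized text stream described above (recorded locally on the computer 16 and transmitted to the back end server 46) can be sketched as follows. This is an illustrative stand-in, not the Dragon SDK's actual interface, which is not reproduced here; the class and method names are invented.

```python
from collections import deque

class TranscriptSink:
    """Receives recognized utterances, records them locally and queues them upstream."""

    def __init__(self):
        self.local_log = []     # local record kept on the kiosk computer 16
        self.outbox = deque()   # queued for transmission to the back-end server 46

    def on_recognized(self, utterance, speaker="audience"):
        entry = {"speaker": speaker, "text": utterance}
        self.local_log.append(entry)
        self.outbox.append(entry)

    def drain_outbox(self):
        """Return and clear everything waiting to be sent upstream."""
        batch = list(self.outbox)
        self.outbox.clear()
        return batch

sink = TranscriptSink()
sink.on_recognized("where can I buy this?")
batch = sink.drain_outbox()
```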

In a preferred embodiment a software component 34 of the system takes periodic snapshots of the captured video stream arising from the camera(s) 22 or from any of the intermediate media processing stages such as the image blending 28, for the purposes of future recall, but also for onward transmission to assist with audience member recognition.

In a preferred embodiment this is a security software application acquiring the feed using a webcam 22 stream splitter and transmitting the resultant images at predetermined intervals (e.g. every 2 seconds) to the system back end server 46 for recognition processing.
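The periodic-snapshot behaviour above (e.g. one frame every 2 seconds) amounts to a simple interval gate on the video feed. A minimal sketch, with timestamps passed in explicitly so the logic is testable without a live camera:

```python
class SnapshotGate:
    """Decides whether a frame should be captured, given a configured interval."""

    def __init__(self, interval_s=2.0):
        self.interval_s = interval_s
        self._last = None  # timestamp of the last captured snapshot

    def should_capture(self, now_s):
        """Return True when at least interval_s has elapsed since the last snapshot."""
        if self._last is None or now_s - self._last >= self.interval_s:
            self._last = now_s
            return True
        return False

gate = SnapshotGate(interval_s=2.0)
decisions = [gate.should_capture(t) for t in [0.0, 1.0, 2.0, 2.5, 4.1]]
# decisions -> [True, False, True, False, True]
```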

In a preferred embodiment the system allows supervisory staff or the operator 48 to connect remotely and see what is shown on the display 12 and visible to the audience member(s) 10, by electronically capturing a simultaneous copy of the imagery on the display 12 using a software component 36 such as VNC. Uses include the ability to ensure the system is still working as expected.

In a preferred embodiment of the system an active monitoring software component 38 is used to check certain operating parameters, such as the connection latency to the graphics server(s) 44. Upon detecting that any operating parameter is beyond a configured level, the component 38 will temporarily display other media on the display 12, to hide the avatar(s) 14 and the virtual world environment, and alert the operator 48.

In a preferred embodiment the failure detection component 38 causes other advertising media to be shown on the display 12, with a message stating the equivalent of "We'll be right back after this break".
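The monitoring and fallback logic of component 38 can be sketched as a small watchdog. The threshold value and the content labels below are illustrative assumptions; the patent only specifies that exceeding a configured level hides the virtual world, shows other media and alerts the operator.

```python
class DisplayWatchdog:
    """Switches the kiosk to fallback media when a watched parameter exceeds its limit."""

    def __init__(self, max_latency_ms=500):
        self.max_latency_ms = max_latency_ms
        self.operator_alerts = []  # messages raised for the hidden operator 48

    def check(self, latency_ms):
        """Return which content the display should show for this latency reading."""
        if latency_ms > self.max_latency_ms:
            self.operator_alerts.append(f"latency {latency_ms} ms over limit")
            return "fallback_media"   # e.g. "We'll be right back after this break"
        return "virtual_world"

dog = DisplayWatchdog(max_latency_ms=500)
states = [dog.check(ms) for ms in (120, 900, 80)]
# states -> ["virtual_world", "fallback_media", "virtual_world"]
```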

In a preferred embodiment of the system one or more printer(s) 40 are situated relative to the display 12 to ultimately enable the operator 48 or the back end server 46 to cause items such as vouchers, receipts, coupons and maps to be printed for audience member(s) 10 to receive physically.

In a preferred embodiment of the system a credit card swipe unit 42 is positioned relative to the display 12 to enable audience member(s) 10 to swipe or insert their credit card or loyalty card or similar, for the purposes of assisting with recognition or identification of audience member(s) 10 and also to enable the processing of purchase transactions by the system.

In a preferred embodiment of the system, graphics servers 44 for the graphical virtual world environment are situated remotely. These server(s) integrate the various media streams and sound, process all the operator 48 inputs and other back end server 46 inputs, and add any scenery and autonomous graphical objects, allowing the combined output effect 14 to ultimately be rendered onto the display 12. This also allows more than one operator 48 to participate if desired.

Various enhancements of the system are possible. To give some examples, a QR code can be generated in the graphical environment and displayed on the display 12 for the audience to scan with their mobile phones. The system can simulate an "in-world" phone (the avatar 14 can appear to originate and receive phone calls or messages), i.e. the avatar can act out a wide variety of simulated actions. Similarly, the avatar can appear to actually take a "photo" of the audience and either show it to them via the display, send it to them electronically, or post it in social media. The operator can also update a social media site to continue the impersonation and expand the relationship channel, and therefore expand the exposure of brands being promoted. The audience can interact with the avatar via social media.
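The QR-code enhancement mentioned above needs a payload to encode. A minimal sketch of building a session-specific URL for the code; the URL, host and parameter names here are invented for illustration, and rendering the actual QR image would be delegated to a dedicated library.

```python
from urllib.parse import urlencode

def build_qr_payload(base_url, kiosk_id, session_id, campaign=None):
    """Compose the URL that the displayed QR code would encode."""
    params = {"kiosk": kiosk_id, "session": session_id}
    if campaign:
        params["campaign"] = campaign
    return f"{base_url}?{urlencode(params)}"

payload = build_qr_payload("https://example.com/visit", "mall-01", "abc123",
                           campaign="spring-promo")
```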

The operator 48 in a preferred embodiment of the system is situated remotely, connected via a wide area network 24, using their computer 50 and a headset 74 with built-in microphone, through which the audio stream initially captured from the audience-facing microphone(s) 18 can ultimately be heard when processed through the virtual world 44 and received via the operator's virtual world client software 52.

The operator's voice is captured via the headset 74 and relayed via the operator's virtual world client 52 to the virtual world server(s) 44 for integration into the graphical scene media which, via the virtual world client 26 on the computer 16 driving the display 12, is reproduced synchronized with the activities of the avatar(s) 14 shown on the display 12 to the audience member(s) 10.

In a preferred embodiment of the system the operator's captured voice from the headset 74 is routed first through a voice morphing software application 54 to deliberately alter the voice qualities to suit the intended imaginary characteristics of the avatar(s) 14, before the stream reaches the virtual world client 52, so that the projected voice perceived by the audience member(s) 10 from the avatar(s) 14 within the graphical environment is resultantly altered. In a preferred embodiment the software application used for voice morphing 54 is Voice Changer Diamond. (Some virtual world graphics environments contain a voice morphing component 54.)
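The voice morphing above is performed by a commercial tool (Voice Changer Diamond is named); the sketch below only illustrates one basic idea behind pitch alteration, shifting pitch by naively resampling a waveform. Real voice morphers preserve duration and formants, which this toy function deliberately does not.

```python
def naive_pitch_shift(samples, factor):
    """Resample by `factor`: >1 raises perceived pitch (and shortens), <1 lowers it."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    out_len = int(len(samples) / factor)
    # Pick every factor-th sample (clamped to the last index).
    return [samples[min(int(i * factor), len(samples) - 1)] for i in range(out_len)]

tone = [0, 1, 0, -1] * 4                 # 16-sample dummy waveform
higher = naive_pitch_shift(tone, 2.0)    # half the samples -> perceived higher pitch
lower = naive_pitch_shift([1, 2, 3, 4], 0.5)  # samples repeated -> lower pitch
```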

In a preferred embodiment of the system the operator 48 uses a separate computer 50 connected over a wide area network 24, containing a virtual world client 52 to interact with the graphical environment managed by the virtual world servers 44 and control the movements and actions of the avatar(s) 14.

In a preferred embodiment of the system, the operator 48 also uses the virtual world client 52 to perceive the live visual and audio media feed initially captured of the audience member(s) 10 by the cameras 22 situated relative to the display 12, as the media feed(s) are integrated into the graphics world by the virtual world server(s) 44.

In a preferred embodiment of the system the operator's speech captured via the headset 74 is also continuously processed, recognized and transcribed into text by the recognition software 56, for real-time and future analysis by the system back end server(s) 46.

In a preferred embodiment of the system the text transcribed by the speech recognition software 56 is sent to the transcription analyzer 64 on the system back end server(s) 46, to be merged in parallel with the transcribed text captured by the speech recognition software 32 from the audience member(s) 10.

This enables continuous integrated analysis of the conversational interaction between the audience member(s) 10 and the speech resulting from the avatar(s) 14, for purposes including keyword statistics identified by time, avatar 14, operator 48 and audience 10 demographics. This is useful, for example, as evidence to advertisers of brand promotion activities and consumer response, especially where a revenue stream depends on a requirement to discuss a specific product with the audience, i.e. pay per "hit", like pay per click but for spoken words originating either from the promoting avatar or the audience. This is also useful for management supervision.

In a preferred embodiment of the system, the transcription analyzer 64 also uses the identified keyword statistics, as configured, to inform the media insertion engine 68 which relevant media to show or insert into the graphical environment via the virtual world server(s) 44, as desired.

For example, if the system detects a person 10 speaking about a competitive product, it can automatically put an advert for the benefits of a sponsored product somewhere into the graphical background being witnessed on the display. Or, when hearing about a brand, the system can generate brand named fireworks and the brand products can parade themselves through the displayed world.
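The keyword-driven behaviour above can be sketched as a small analyzer that keeps per-keyword hit counts (the "pay per hit" statistics) and emits a media action when a sponsored or competing brand is mentioned. The brand names, keyword sets and action strings below are invented; the patent leaves the concrete rule format open.

```python
from collections import Counter

SPONSOR_KEYWORDS = {"fizzcola"}      # hypothetical sponsored brand
COMPETITOR_KEYWORDS = {"rivalpop"}   # hypothetical competing brand

def analyze_utterance(text, stats):
    """Update per-keyword hit counts and return any media actions to trigger."""
    actions = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in SPONSOR_KEYWORDS:
            stats[word] += 1
            actions.append("show_fireworks:" + word)   # brand-named fireworks
        elif word in COMPETITOR_KEYWORDS:
            stats[word] += 1
            actions.append("insert_sponsor_advert")    # counter the competitor
    return actions

stats = Counter()
a1 = analyze_utterance("I usually drink RivalPop.", stats)
a2 = analyze_utterance("Tell me about FizzCola!", stats)
```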

In a preferred embodiment, the operator's computer 50 also runs a software application 58 which prompts the operator 48 visually on the operator's computer screen, as to what could be said to the audience member(s) via the avatar(s) 14 in the virtual world shown on the display 12.

In a preferred embodiment this prompter 58 is in contact with the transcription analyzer 64, the demographic detection 66 and the face recognition engine 62, which inform it of input criteria to put through a configurable rule set; the rules trigger proposed configurable actions to be prompted to the operator 48.

For example: we have seen this audience member before (face recognition); they last visited on Tuesday afternoon (face recognition timeline); at that time the avatar discussed XYZ (from the transcription timeline); and the audience member stated a preference (recorded in the CRM toolset); so now try to discuss alternative products which match their identified demographic (either from the CRM toolset or from demographic detection), and make specific offers.
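The configurable rule set of the prompter 58 can be sketched as a list of condition/prompt pairs evaluated against the facts gathered from recognition components. The fact keys and prompt wording below are assumptions for illustration only.

```python
def suggest_prompts(facts, rules):
    """Return the prompt for every rule whose condition holds for `facts`."""
    return [rule["prompt"] for rule in rules if rule["when"](facts)]

# Hypothetical rules fed by face recognition, transcription analysis and demographics.
rules = [
    {"when": lambda f: f.get("returning_visitor"),
     "prompt": "Welcome the visitor back by name."},
    {"when": lambda f: f.get("last_topic") == "XYZ",
     "prompt": "Follow up on the XYZ discussion and offer alternatives."},
    {"when": lambda f: f.get("age_group") == "teen",
     "prompt": "Mention the student discount."},
]

facts = {"returning_visitor": True, "last_topic": "XYZ", "age_group": "adult"}
prompts = suggest_prompts(facts, rules)
# prompts -> the first two rule prompts, but not the teen-specific one
```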

These prompts can also be integrated into the virtual world environment to instruct the virtual world to include certain objects or change certain graphics, but also to allow the operator to see the prompts within the virtual world via the virtual world client.

The prompter 58 in a preferred embodiment also interacts directly with the virtual world server(s) 44 to insert certain graphical or sound effects into the graphical environment, to be witnessed by the audience member(s) 10.

The prompter 58 in a preferred embodiment also reminds the operator 48 when to say certain things or to cause the avatar(s) 14 to perform certain actions, such as repeating a slogan, to be heard or seen ultimately by the audience member(s) 10.

This is useful for ensuring that certain brand sponsors get exposure, i.e. a revenue stream, by repeating "Have a Coke and a Smile", for example, at 5-minute intervals.

In a preferred embodiment of the system, a Customer Relationship Management (CRM) software tool 60 is situated on the operator's computer 50. This is linked, in a preferred embodiment, to the recognition tools including the face recognition engine 62 and the transcription analyzer 64. This supports the operator 48 in recognizing returning audience members 10 (even if previously encountered by other operators 48), in recording details of interactions with new audience member(s) 10 including conversations, key terms mentioned and the audience member's preferences, and generally in allowing the impression that the avatar(s) 14 remember the audience member(s) 10, so that an ongoing relationship can be sustained and simulated between the avatar(s) 14 and the audience member(s) 10.

This is useful for dealing with the volume of audience member(s) 10 likely to be encountered, reducing reliance on the operator's 48 own memory, and allowing specific avatar(s) 14 to appear to behave consistently towards new and returning audience member(s) 10, even though those avatar(s) 14 may have been controlled by separate operators 48 over time.

In a preferred embodiment of the system, a demographic detection software component 66 is situated on the server side of the system 46. This software 66 interacts with the face recognition engine 62, which receives continuous snapshots from the cameras 22 facing the audience members 10, and attempts to assess approximate age group and other factors such as gender, in order to assist the operator 48, via the prompter 58, in saying predefined things, or to have the graphical environment engine 44 or the avatar(s) 14 enact specific effects which may be suitable for the audience member(s) 10 in the context of their demographics.
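The demographic-to-prompt mapping could be as simple as the following sketch (the rule format and attribute names are illustrative assumptions, not the patent's implementation):

```python
def demographic_prompts(demographics, prompt_rules):
    """Return the prompt texts whose rules match the detected demographics.

    demographics: detected attributes, e.g. {"age_group": "adult", "gender": "male"}
    prompt_rules: list of (required_attributes, prompt_text) pairs; a rule
    matches when every required attribute is present with the same value.
    """
    matched = []
    for required, text in prompt_rules:
        if all(demographics.get(k) == v for k, v in required.items()):
            matched.append(text)
    return matched
```

In the mall scenario later in the document, a rule keyed on "adult male" would produce the "Outdoor Products" suggestion that the prompter shows the operator.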

A face recognition software component 62 is, in a preferred embodiment of the system, situated on the back end server of the system 46, receiving continuous image snapshots from the cameras 22 facing the audience member(s) 10, via the other intermediate components on the computer 16 connected to the cameras 22 including the camera image capture software 28 or the snapshot capture software 34.

This face recognition component 62 attempts to identify new and returning audience member(s) 10 from its local or remote list of identified faces, and the output of the currently identified face(s) is made available on a near real-time basis to other components of the system. In the prototype system, this is achieved by interfacing with the API for Face.com.

In a preferred embodiment of the system, an advertising insertion software component 68 is situated on the back end server for the system 46. This software component is preconfigured to insert pre-specified media, for example advertisements, into the graphical environment being generated by the graphics server(s) 44 for ultimate display 12 to the audience 10. The configuration of the media and other effects to be inserted includes scheduling rules and real-time adaptation to inputs from the other components in the system, including recognition of demographics 66 and returning audience members 62, the recognized audience 10 preferences, and previous and current detected 64 conversations.

In a preferred embodiment a local or remote server 46 is connected to the network 24 to conveniently house various components of the system.

In a preferred embodiment of the system, a transcription analyzer software component 64 is situated on the back end server 46 which receives streams of recognized spoken words as text from either or both of the audience-side 10 speech recognition component 32, and the operator-side speech recognition component 56.

The transcription analyzer 64 merges the streams of text onto a compatible parallel timeline, covering one or both sides of the ongoing interactions between the operator(s) 48 and the audience member(s) 10, and detects pre-configured keywords and terminology contained in the text streams, for output to other components of the system, which can then connect these detections with other attributes such as a specific audience member or demographic. It is also used for other purposes, including statistical analysis of pre-configured terms and keyword frequency over time, as well as unspecified keyword and terminology frequency over time.
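The merge-and-detect step might look like the following sketch (the function signature and stream format are illustrative assumptions; real speech recognition output would carry richer timing and confidence data):

```python
def merge_transcripts(audience_stream, operator_stream, keywords):
    """Merge two timestamped utterance streams onto one timeline and flag
    pre-configured keywords.

    Each stream is a list of (timestamp, text) pairs. Returns the merged
    timeline as (timestamp, side, text) tuples, plus a list of
    (timestamp, side, keyword) detections for downstream components.
    """
    tagged = ([(t, "audience", txt) for t, txt in audience_stream] +
              [(t, "operator", txt) for t, txt in operator_stream])
    timeline = sorted(tagged)   # chronological parallel timeline
    detections = [(t, side, kw)
                  for t, side, txt in timeline
                  for kw in keywords if kw.lower() in txt.lower()]
    return timeline, detections
```

In the scenario below, the keyword "snowboarding" detected on the audience side is exactly the kind of output that triggers the media insertion engine.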

This enables statistical reporting for advertisers as to when and how often their brands were discussed/exposed, to which demographics, at which geographies (even down to a recognized person), including the audience's responses to that exposure. This also enables the system to generate revenue based on the advertiser "paying per mention". The transcription analyzer 64 is also used by the prompter 58, which, with or without any other inputs from other components, can prompt the operator 48 on what to speak about.

For example: With this audience member who has returned, last time the avatar was controlled by a previous operator to appear to discuss brand XYZ, now instead speak about brand ABC since XYZ has already been discussed/promoted.

A printing control software component 70 is, in a preferred embodiment of the system, available on the operator's computer 50 to allow the operator to cause printed outputs to be printed on the printer(s) 40 for the audience member(s) 10 to receive. Such printed outputs include coupons, vouchers, receipts, personalized maps, and purchased products.

In a preferred embodiment of the system, the operator 48 has a transaction payment software component 72 available to process payments including credit cards. This control 72 interacts with the credit card swipe unit 42 when a physical card is present. This allows the system to take payments for transactions.

Other machines, including fingerprint readers, can be attached to the system to identify audience member(s) 10 and to facilitate certain types of transactions. For example, a passport scanner can be connected for identifying an audience member and letting them through an electronically locked door.

The system can be used to allow recognized audience member(s) 10 to earn points/money/discounts for viewing or discussing specific brands/offers/advertisements.

The flow chart of Figure 2 illustrates the operation of a preferred embodiment of the system in an exemplary scenario. In this example, the display 12 and related accessories are located in a shopping mall corridor, and one avatar 14 is configured and is being rendered within the graphics world 44, and is therefore visible on the display 12.

100 An audience member 10 walks near the display 12 in the shopping mall corridor.

102 The operator 48 controls the avatar 14 to wave, and speaks into the microphone 74 saying "Hello".

104 The audience member 10 notices the waving avatar 14 on the display 12 and perceives hearing it call out "Hello", from the sound captured from the operator 48, processed through the voice morphing 54 and ultimately output at the speaker(s) 20, and so decides to approach the display 12.

106 The facial recognition engine 62 detects a face on the latest image sent to it 34 from the camera(s) 22, verifies that it has not been witnessed by the system before, and informs the operator on their console by overlaying a graphical question mark over the media image 28 being captured by the camera(s) 22, blended for the operator's 48 viewing in the graphical world 44, which the operator 48 can see through their virtual world client 52 rendered in the virtual world.

108 The operator 48 sees, as captured via the camera(s) 22, the audience member 10 approaching the display, and also sees that the face recognition 62 is indicating that it cannot identify the person, so the operator 48 asks, via their microphone 74, the person 10 what their name is.

110 The audience member 10, looking at the avatar 14 on the display 12, perceives that the avatar 14 appears to be asking what their name is. The virtual world graphics servers 44, together with the virtual world client software 26, have ensured that the avatar's lips move in synchronization with the sound of the operator's speech to enhance this perception. The audience member 10 responds "Bob" to the avatar 14.
The audience member's uttered response is picked up by the microphone(s) 18 and relayed ultimately to the operator 48.

112 The demographic module 66 detects that the audience member 10 is probably an adult male, and informs the operator 48 by adding this fact as further descriptive text overlaid within the graphical world 44 the operator 48 is viewing.

114 The operator 48 records the audience member's name in the CRM toolset 60, which is also linked informationally to the identified face (supplied automatically with a serial number by the facial recognition engine 62), and sees that the demographic information is already complete in the CRM module 60, supplied automatically by the demographic detection module 66.

116 The prompter software module 58, having been informed by the demographic module 66, advises the operator 48 visually through the graphical world 44 that "Outdoor Products" has many specials on today relevant to adult males, and provides a helpful phrase example for the operator 48 to initiate conversation.

118 Seeing what the prompter 58 is suggesting, the operator 48 says into their microphone 74: "Today would be a great day to visit the Outdoor Products store here in the mall, there are lots of specials today there. Do you have any plans soon for a trip anywhere?", and controls the avatar 14 to gesture animatedly on the display 12.

120 The continuous speech recognition 56, having transmitted these utterances from the operator 48 as text to the transcription analyzer 64, allows the analyzer 64 to note that the operator 48 mentioned the store name. The analyzer 64 also adds this fact to the store representation statistics, with attributes including demographic, location, and time of day. This allows the system to accumulate a summary of such statistics, including the underlying detail, and also to generate real-time reporting directly transmitted, by electronic means including email, to the store, which is the advertiser in this example.
122 The audience member 10 hears the question, perceiving that the avatar 14 is asking him, and replies to the avatar 14 that he plans a snowboarding trip in two weeks.

124 The operator 48 records the trip dates in the CRM toolset 60, linked to the audience member's other details, while saying "Wow, that's exciting!" into the microphone 74.

126 The transcription analyzer 64 records that the audience member 10 said "snowboarding", as a result of this having been one of the keywords the analyzer 64 was pre-configured, or dynamically configured, to watch for.

128 The media insertion engine 68, having been informed by the transcription analyzer 64, which was configured to invoke an action based on detection of this term, advises the operator 48 that media insertion of snowboarding scenes is taking place, and instructs the graphics server 44 to immediately add snow effects into the graphical world and a snowboarding video clip on part of the background scene within the graphical world being ultimately shown on the display 12.

130 The prompter 58 advises the operator 48 to mention specials on snowboards and to discuss the model range by brand name, to cause the avatar 14 to appear to promote the brands to the audience member 10.

132 The operator 48 says: "I've heard there are specials on snowboards at Outdoor Products, you should consider checking out: Brand A, Brand B, Brand C." The audience member 10 perceives that the avatar 14 is suggesting this.

136 In order to create further conversation and to modify the scene to keep the audience member 10 entertained by the avatar 14, the operator 48 causes the avatar 14 to additionally wear a graphical furry winter hat, by using the virtual world client 52 to instruct the virtual world servers 44 to include this graphical effect.

(Graphics effects and objects, which can be automated to a high degree, are typically pre-built for the virtual world graphical environment. Objects can be realistic and can be used, for example, to allow the avatar to appear to be demonstrating a product, such as a kitchen utensil.)

138 The audience member 10 witnesses the scene on the display 12 changing, with the avatar 14 wearing a furry hat and snow falling gently in the scene with an action clip of a snowboarder in the background, and hears the brand suggestions being made as if they are being uttered by the avatar.

140 The audience member 10 replies to the avatar 14 saying: "OK, but where is the store?"

142 Hearing the reply, the operator 48 speaks "Here, I'll print you a map, one moment", and uses the printing control component 70 to generate a map to the Outdoor Products store from the current location, showing the brands on special and a customized message to the store's staff with the audience member's name: "Please show Bob the snowboards". The map also shows specials or discount promotions at other stores on the route.

144 The printing control component 70 communicates with the printer 40 to output the paper map. The audience member 10 receives the printout.

146 While the initial audience member 10, Bob, is examining the printed map, an additional new audience member 10 walks near the display 12.

148 The facial recognition component 62 detects another face, and verifies that it is "Sue", who was last present the previous week at this specific display 12 (not another geographically situated display elsewhere in a branch of the system), and visually alerts the operator 48 through the graphical environment 44 with the name, indicating to the operator which face is "Sue" on the media, with other known attributes including the last encounter date and time.

150 The operator 48 sees, via the cameras 22, someone approaching and says: "Bob, please come back and tell me if you found a great snowboard?
This is Sue approaching. Hi Sue, what do you think of my winter hat?"

152 The returning audience member 10 (Sue), perceiving the avatar 14 asking her, smiles at the avatar 14 and says "That's so funny!"

154 The prompter 58 is informed about the recognized face by the facial recognition engine component 62, interacts with the CRM toolset 60 to see what was last discussed, and advises the operator 48 visually to verify whether a brand was tried.

156 The operator 48 says: "Did you try that soft drink, MegaJuice, I mentioned last week?", resulting in the avatar 14 appearing to ask the audience member 10 (Sue) this.

158 The transcription analyzer 64 notes that the operator mentioned the specific soft drink brand, and adds this fact to the brand representation statistics, together with Sue's demographic, location and time of day, for summary reporting to the brand owner, who is another advertiser in this example.

It will be appreciated that the detection and recordal of references to a particular brand by the speech recognition module of the system make it possible to implement a "pay per mention" advertising revenue model.
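A minimal sketch of such a pay-per-mention ledger follows; the class, the per-mention pricing, and the context attributes (demographic, location, hour) are illustrative assumptions built on the statistics the document says the transcription analyzer accumulates:

```python
from collections import Counter

class MentionLedger:
    """Sketch: accumulate brand mentions with their context so an
    advertiser can be billed per detected mention."""

    def __init__(self, rate_per_mention):
        self._rates = dict(rate_per_mention)   # brand -> price per mention
        self._mentions = []                    # (brand, demographic, location, hour)

    def record(self, brand, demographic, location, hour):
        self._mentions.append((brand, demographic, location, hour))

    def counts(self):
        # Mention totals per brand, for summary reporting.
        return Counter(brand for brand, *_ in self._mentions)

    def invoice(self, brand):
        # Revenue owed by this advertiser under pay-per-mention.
        return self.counts()[brand] * self._rates.get(brand, 0)
```

The retained context tuples also support the per-demographic, per-location, per-time reporting described above.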

[The interaction continues in this manner.]

160 A known audience member 10 approaches the display 12 and asks the avatar 14 if they can buy a cinema ticket for a specific film at a specific date and showing.

162 The operator 48, hearing this on their headset, controls the avatar 14 to nod its head and smile, and verbally replies "Sure, let me see if that's available", causing the audience member to perceive that the avatar 14 has understood.

The operator 48 looks up the availability of the performance ticket in the transaction payment control software module 72, which has been pre-configured and kept automatically updated with product availability. The transaction software module 72 electronically tells the media insertion engine 68 that the performance name, as a product, is being searched for by the operator 48.

164 The media insertion engine 68 reacts by electronically instructing the virtual world graphics server(s) 44 to play a pre-configured trailer of the proposed film in the background of the graphical environment being witnessed by the audience member 10 around the avatar 14 on the display 12.

166 The operator 48 determines via the transaction software module 72 that the performance, i.e. the product for sale, is available, so the operator 48 prepares the transaction software 72 to take payment from the audience member 10, and says into their microphone 74: "Yes, it's available, you can pay by credit card now to secure a ticket". The media insertion engine 68 is instructed electronically by the transaction software 72 to display the transaction details, including the price total, on the display 12 via the graphical world server(s) 44. The transaction software 72 also advises the credit card reader 42 of the transaction details so that payment can be taken.

168 The audience member 10 perceives that the avatar 14 has said the ticket is available, and sees the price total on the screen. The audience member 10 inserts their credit card into the card reader 42, and enters their credit card PIN code as required into the card reader keypad 42 to pay for the product.

170 The card reader unit 42 electronically advises the transaction software 72 that the payment has succeeded, and the transaction software advises the operator 48. For this product, the operator 48 is made aware by the transaction software 72 that a signature capture is also required to obtain a VIP pass to the performance.
The operator 48 speaks: "If you give me your signature, I can also issue a VIP pass for you; use the stylus attached to the display and sign in the box which will appear now on the display here".

172 The audience member 10 perceives that the avatar 14 is asking for a signature, takes the stylus which was previously attached to the display 12 and, upon seeing a demarcated area appear on the display near the avatar 14, signs on the glass of the display, which is sensitive to the stylus. The signature appears in real time on the display 12 and is ultimately relayed electronically to the transaction software module 72 for storage with the transaction.

In other embodiments signatures and other identification types can be captured using appropriate attachments to the system including finger print readers, signature pads, and RFID readers and token readers.

Other data, such as telephone numbers, can be captured from audience member(s) 10 using a similar technique via an attachment or a touch-sensitive display 12.

174 The operator 48 completes the transaction using the transaction software 72, which then automatically prints, at the audience-facing printer 40, the purchased ticket (i.e. the product), the receipt, and a customized VIP voucher containing the captured signature, by electronically informing the print control software 70.

176 The CRM toolset 60, being electronically informed of the transaction by the transaction software 72, records the transaction event, connecting it with the other known and recognized details of the audience member 10.

178 The audience member 10 receives the ticket, receipt and VIP voucher from the printer 40.

Possible suitable locations for the described system include retail stores, shopping malls, entertainment venues, tourism venues, or other venues where staff could find access difficult (e.g. airside in airports). These are merely examples.

In summary, the described invention provides a graphical display of an avatar or virtual creature on a display device, observed by a person who can then interact with the avatar by talking to it or exhibiting any form of gesture or movement. The operator of the avatar is able to see the person's body language and hear their voice, and can control the avatar to respond on the display accordingly, resulting in the person perceiving the avatar to be responding, thereby forming a two-way interaction. The interaction is asymmetric, in the sense that one side of the interaction is a real person and the other a graphical avatar (controlled by a human operator); the two parties interact via the display and various accessories including cameras, microphones and loudspeakers.

It is believed that the described invention may have at least some of the following advantages:

1. Creation of a high degree of audience attention/attraction (e.g. in a retail environment).

2. High dwell time (unlike normal walk-by mass media).

3. Deep infusion and retention of marketing messages, causing high recall of the content exposed to.

4. The avatar is not pre-programmed, leading to authentic interactions with audience members, leading to higher attention from audience since the next moment is always unique to a degree (unlike repetition in a typical pre-recorded loop).

5. The avatars are not fully automated, so response isn't "robotic", leading to a warmer perception of a relationship interaction with the audience member.

6. The system allows the employment of a geographically wider and more capable workforce into the audience's environment (e.g. retail).

7. The system allows employment of persons who would otherwise not be able or suitable to attend retail style employment, due to their appearance or disabilities.

8. The system allows multiple brands to be exposed simultaneously, e.g. through the avatar's clothing, what the avatar is demonstrating (object models), the sayings the avatar is using (promotional slogans), jewelry, and background scenery (e.g. billboards in the background, or signs hanging from virtual trees).

9. The avatars can be any kind of creature or object, to attract different audience demographics, e.g. rabbits, mermaids, dancing boxes, a realistic person, etc.

10. The system allows more "visual real estate" in a small space, due to the perceived depth of the 3D display (e.g. a virtual shop with shelves lining the sides with products), saving space and therefore also the cost of the space.

The following are considered to be possible applicable end uses and sectors:

a) Advertising

b) Brand/Product/Service Advocacy

c) Brand and other evangelism.

d) Marketing and promotion

e) Brand/product information


The above list is not exhaustive.

Claims

1. A system for generating an interactive display of an avatar, the system comprising: a display unit for displaying an image of an operator-controlled avatar; a camera for capturing an image of a user of the system; a control centre including: a display on which the image of the user can be displayed to an operator; and at least one input device operable by the operator to generate control signals to control the behavior of the avatar; an animation engine responsive to the control signals to animate the image of the avatar accordingly; and a communication link between the display unit and the control centre, the system being operable to permit the operator to observe the user and to manipulate said at least one input device to cause the image of the avatar to respond to the user.

2. The system of claim 1 including a microphone and an audio output device associated with the display unit, and a microphone and an audio output device at the control centre useable by the operator, to enable real time audio communication between the operator and the user.

3. The system of claim 2 including an audio processing module arranged to modify the operator's voice to correspond to the displayed avatar.

4. The system of claim 2 including a voice recognition engine arranged to detect words spoken by the user, to compare detected words with a set of predetermined words, and to initiate an automated action when certain words are detected.

5. The system of claim 4 in which the voice recognition engine is arranged to display an object on the display unit which is related to a detected word.

6. The system of claim 4 wherein the voice recognition engine is further arranged to transcribe the words spoken by the user for real time or future analysis.

7. The system of claim 4 wherein the voice recognition engine is arranged to transcribe the words spoken by the operator for real time or future analysis.
8. The system of claim 6 wherein the voice recognition engine is arranged to detect predetermined keywords and details of user demographics and to generate statistics relating thereto.

9. The system of claim 4 including a face recognition engine operable in conjunction with the voice recognition engine to identify users who have previously interacted with the system.

10. The system of claim 1 wherein the animation engine includes a graphics server networked with the control centre and the display unit.

11. The system of claim 1 including a virtual world client component on a computer arranged to drive the display unit, and on a computer of the operator, to allow the operator to control activity within a virtual environment on the display.

12. The system of claim 1 including a printer operable to print information related to an interaction between the user and the operator, including personal information of the user.

13. The system of claim 1 including a card processing device for reading a payment card presented by the user and processing a payment relating to a transaction entered into between the user and the operator.

14. A method of generating an interactive display of an avatar including: generating an image of an avatar on a display; capturing an image of a user observing the image of the avatar on the display, and displaying the captured image to an operator in real time or near real time; and generating control signals in response to operation of an input device by the operator to control the behavior of said avatar in real time or near real time and to animate the image of the avatar accordingly, thereby to cause the image of the avatar to interact with the user.

15. The method of claim 14 including capturing the voices of the user and the operator to enable real time audio communication between the operator and the user.

16. The method of claim 15 including processing the operator's captured voice to modify at least one characteristic thereof to correspond to the displayed avatar.
17. The method of claim 15 including detecting words spoken by the user, comparing detected words with a set of predetermined words, and initiating an automated action when certain words are detected.

18. The method of claim 17 including displaying an object on the display unit which is related to a detected word.

19. The method of claim 14 including generating a printout for the user of information related to an interaction between the user and the operator, including personal information of the user.

20. The method of claim 14 including reading a payment card presented by the user and processing a payment relating to a transaction entered into between the user and the operator.
PCT/IB2013/054557 2012-06-01 2013-06-03 Method and system for generating an interactive display WO2013179275A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US201261654140P true 2012-06-01 2012-06-01
US61/654,140 2012-06-01

Publications (2)

Publication Number Publication Date
WO2013179275A2 true WO2013179275A2 (en) 2013-12-05
WO2013179275A3 WO2013179275A3 (en) 2014-02-06

Family

ID=49673989

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/054557 WO2013179275A2 (en) 2012-06-01 2013-06-03 Method and system for generating an interactive display

Country Status (1)

Country Link
WO (1) WO2013179275A2 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6480819B1 (en) * 1999-02-25 2002-11-12 Matsushita Electric Industrial Co., Ltd. Automatic search of audio channels by matching viewer-spoken words against closed-caption/audio content for interactive television
US20090063156A1 (en) * 2007-08-31 2009-03-05 Alcatel Lucent Voice synthesis method and interpersonal communication method, particularly for multiplayer online games
US20100115114A1 (en) * 2008-11-03 2010-05-06 Paul Headley User Authentication for Social Networks
US20100125182A1 (en) * 2008-11-14 2010-05-20 At&T Intellectual Property I, L.P. System and method for performing a diagnostic analysis of physiological information
US20100146421A1 (en) * 2004-08-24 2010-06-10 Darren New Systems, methods and apparatus for receipt printing and information display in a personal identification number delivery system
US20110004576A1 (en) * 2002-07-03 2011-01-06 Sean Colbath Systems & methods for improving recognition results via user-augmentation of a database
US20110035287A1 (en) * 2009-07-27 2011-02-10 Barbara Ann Fox Apparatus and method for providing media commerce platform
US20110257985A1 (en) * 2010-04-14 2011-10-20 Boris Goldstein Method and System for Facial Recognition Applications including Avatar Support
US20120072219A1 (en) * 2010-09-22 2012-03-22 At & T Intellectual Property I, L.P. System and method for enhancing voice-enabled search based on automated demographic identification


Also Published As

Publication number Publication date
WO2013179275A3 (en) 2014-02-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13797742

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 13797742

Country of ref document: EP

Kind code of ref document: A2