US20040001065A1 - Electronic conference program - Google Patents

Electronic conference program

Info

Publication number
US20040001065A1
US20040001065A1 (application US10/439,926; also published as US 2004/0001065 A1)
Authority
US
United States
Prior art keywords
generating
text
computers
commands
animation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/439,926
Inventor
George Grayson
James Bell
French Hickman
Douglas Gillespie
Trent Wyatt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Learn com LLC
Original Assignee
Learn com LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Learn com LLC
Priority to US10/439,926
Publication of US20040001065A1
Assigned to SILICON VALLEY BANK. SECURITY AGREEMENT. Assignors: LEARN.COM, INC.
Assigned to LEARN2 CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: 7THSTREET.COM, INC.
Assigned to 7TH LEVEL, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BELL, JAMES W., GILLESPIE, DOUGLAS W., GRAYSON, GEORGE D., HICKMAN, FRENCH E., WYATT, TRENT M.
Assigned to 7THSTREET.COM, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: 7TH LEVEL, INC.
Assigned to LEARN.COM, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LEARN2 CORPORATION
Assigned to SILICON VALLEY BANK. SECURITY AGREEMENT. Assignors: LEARN.COM, INC.
Assigned to LEARN.COM INC. RELEASE. Assignors: SILICON VALLEY BANK
Assigned to LEARN.COM INC. RELEASE. Assignors: SILICON VALLEY BANK

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/02 Details
    • H04L 12/16 Arrangements for providing special services to substations
    • H04L 12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L 12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L 12/1822 Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/205 3D [Three Dimensional] animation driven by audio data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N 19/27 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving both synthetic and natural picture components, e.g. synthetic natural hybrid coding [SNHC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440236 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/0018 Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10 Transforming into visible information
    • G10L 2021/105 Synthesis of the lips movements from speech, e.g. for talking heads

Definitions

  • A way to make a presentation non-repetitive is to randomly select predefined sections or to select sections based on user profiles. For example, a presentation of a company's goods may randomly select which product to present on each visit, so that the user does not receive the same promotion every time. The presentation could further choose which products to promote (and thus which sections to download) based on user profile information, such as the age and gender of the user.
  • Chat and meeting sessions can be greatly enhanced by communicating with streams of text and explicit audio and graphics commands.
  • An example of a chat interface is shown in FIG. 7.
  • Each participant computer 14 is assigned an “avatar” 70, which is a graphic identifier for the user.
  • The avatars 70 are generally fanciful, although realistic depictions could also be used. Further, the avatars 70 can appear two dimensional, as shown, or three dimensional. In the embodiment of FIG. 7, each avatar 70 is viewed in a defined space 72; in an alternative embodiment, the avatars could move about using VRML (Virtual Reality Modeling Language) technology.
  • While the chat session interface shown in FIG. 7 is directed towards leisure use, more serious graphics could be used for business use. Further, while the embodiment shown has a total of four users, any number of users could be supported.
  • An alias space 74 is provided for the user's name or nickname; thus, users may use their real name or provide a nickname.
  • The center of the interface 68 is divided into two sections, a graphic display section 76 and a text section 78. Text input by the participant computers 14 is displayed in the text section 78, while user-input graphics are displayed in the graphics section 76.
  • A drawing toolbar 80 is displayed over the graphics section 76 and provides the tools for drawing in the graphics section 76.
  • A flag icon 82 is used to define the voice inflection desired by each user. For example, the user at the participant computer 14 shown in FIG. 7 would be using an American accent; other accents could be used by clicking on the flag icon 82.
  • The flag icon 82 represents explicit audio commands which will be sent as part of the text stream.
  • Each user participating in the chat/meeting session chooses an avatar (or has the host computer 12 automatically choose an avatar), which is the user's graphical depiction to all other participants in the chat session.
  • The user can also choose voice characteristics (such as the accent, male/female, adult/child, and so on).
  • The communication is performed by transferring text with embedded explicit commands between the host computer 12 and the participant computers 14.
  • Text and explicit commands are initiated at the participant computers 14 and uploaded to the host computer 12.
  • When the host computer 12 receives a data stream from a participant computer 14, it forwards that stream to all computers in the particular chat/meeting session.
  • The text is printed in the text window and transformed into audible speech by the text-to-speech processor 46 in each participant computer 14.
  • As the speech is generated, the phonemes are identified and the associated avatar is animated responsive to the phoneme identifiers.
  • The avatars are animated not only by the implicit gesture commands from the text-to-speech processor 46 in the form of phoneme identifiers, but also by explicit commands such as <angry>, <happy>, <look left> or <look down>.
  • Other implicit commands can also be derived from punctuation in the text, such as the “!” or “?” signs.
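  • As a rough illustration only (the function and table names below are invented, not from the patent), a participant computer might derive implicit gesture commands from punctuation in the received text while pulling explicit commands out of the angle-bracket markers:

    import re

    # Hypothetical mapping from punctuation cues to implicit gesture commands;
    # the patent names only "!" and "?" as examples.
    PUNCTUATION_GESTURES = {"!": "emphatic", "?": "questioning"}

    def implicit_gestures(text):
        """Emit an implicit gesture identifier for each punctuation cue found."""
        return [PUNCTUATION_GESTURES[ch] for ch in text if ch in PUNCTUATION_GESTURES]

    def explicit_commands(stream):
        """Pull explicit commands, written between angle brackets, out of the stream."""
        return re.findall(r"<([^<>]+)>", stream)

    message = '<happy> "That was great! Where did they go?"'
    print(explicit_commands(message))   # ['happy']
    print(implicit_gestures(message))   # ['emphatic', 'questioning']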
  • Additional gestures, such as raising arms to request an opportunity to speak, can be supported.
  • Explicit commands can be chosen from a menu or, alternatively, typed in manually.
  • For a chat session, the participant computers are structured similarly to those shown in FIGS. 2 and 4.
  • The communications subsystem 40 not only receives and distributes data streams from the host computer 12, but also generates data streams to upload to the host computer 12.
  • Each participant computer 14 separately stores the scene playback files (which would contain the graphics needed to animate each avatar) and the lip synch animation files.
  • A state diagram for operation of the host computer 12 during a chat session is shown in FIG. 8.
  • Initially, the host computer 12 is in a wait state (state 90), waiting for a communication from a participant computer 14.
  • When a new participant connects, the host computer and the new participant exchange the information necessary for communication, along with the audio/visual properties of the new participant, in state 92.
  • The user can define his or her avatar 70 by choosing specific characteristics, such as head, hat, nose, lips and voice type.
  • The host computer 12 then passes information regarding the new participant computer 14 to all of the current participant computers 14, each of which should have the graphic files to output the chosen avatar. If any of the assets needed to reproduce a participant are not available, they can be downloaded from the host computer 12 or default characteristics can be used. Upon completion of the setup routine, the host computer 12 returns to the wait state 90.
  • When a message is received from a participant computer 14, the state shifts to state 96, where the host computer receives and stores the message and then forwards it to all computers participating in the chat session. The host computer 12 then returns to the wait state 90.
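  • A minimal sketch of the host behaviour described above (wait state 90, participant setup, message forwarding in state 96); the class and method names are illustrative only and not taken from the patent:

    class ChatHost:
        """Hypothetical host loop: wait, set up new participants, forward messages."""

        def __init__(self):
            self.participants = []   # currently connected participant computers
            self.log = []            # stored copy of every forwarded message

        def on_new_participant(self, participant, avatar_properties):
            # Setup: exchange the new avatar's properties, then tell every participant
            # about it so each can load (or default) the needed graphic files.
            self.participants.append(participant)
            for p in self.participants:
                p.receive_setup(avatar_properties)

        def on_message(self, sender, data_stream):
            # State 96: store the message, then forward it to all computers in the
            # session (whether the sender also receives a copy is an assumption here).
            self.log.append((sender, data_stream))
            for p in self.participants:
                p.receive_stream(data_stream)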
  • FIG. 9 shows a state diagram of the operation of the participant computers with regard to communication during a chat session.
  • State 100 is the wait state, where no messages are currently being sent or received.
  • When a message is received, the text is sent to the text-to-speech processor 46, along with any explicit audio commands, to generate an audible voice.
  • Explicit graphics commands from a received message are sent to the gesture processor/interpreter 44 along with implicit graphics commands from the text-to-speech processor 46 . These commands are used to animate the avatar corresponding to the received message.
  • The participant computer 14 then returns to the wait state 100.
  • When the user sends a message, the state shifts to state 104, where the participant computer 14 uploads the message to the host computer 12 for broadcast to the group of participant computers 14 participating in the chat session.
  • The host computer may modify the user input; for example, "<grin>" could be modified to "%G", which is smaller and easily identified as a command.
  • Alternatively, since the bandwidth savings are minimal, the entire text of a command could be sent to the host computer.
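  • A small sketch of that optional abbreviation step; "<grin>" to "%G" is the example given in the text, while the other table entries and the helper names are assumptions:

    # "<grin>" -> "%G" is the example given; the other rows are invented.
    ABBREVIATIONS = {"<grin>": "%G", "<angry>": "%A", "<happy>": "%H"}
    EXPANSIONS = {short: full for full, short in ABBREVIATIONS.items()}

    def compress(stream):
        for full, short in ABBREVIATIONS.items():
            stream = stream.replace(full, short)
        return stream

    def expand(stream):
        for short, full in EXPANSIONS.items():
            stream = stream.replace(short, full)
        return stream

    packed = compress('<grin> "Hello there"')
    print(packed)           # %G "Hello there"
    print(expand(packed))   # <grin> "Hello there"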
  • The present invention provides significant advantages over the prior art.
  • The invention allows audio conversations or presentations without using significant amounts of bandwidth over the network.
  • Applications such as chat programs are enhanced with animation and audible speech at low bandwidth. These capabilities make the conversations much more interesting and allow participants to listen to the conversation without constant viewing of the screen, which is necessary where only text is provided.
  • Meeting programs, which normally transfer digital audio over the network, can greatly reduce their bandwidth requirements. Accordingly, audio conversations and presentations can be almost instantaneously received and output on the participating computers with audio and graphics. Presentations can be generated with very little production time or storage requirements.
  • Graphics can enhance communications by allowing gestures which are fanciful or otherwise incapable of being communicated through live transmissions.

Abstract

Electronic conferencing is provided over a computer network, such as the Internet, by passing streams of text with embedded explicit audio and graphics commands. Text is translated to audible speech at the end-user computers by a text-to-speech processor to reduce the amount of data transferred between computers. Implicit commands are generated from the text at the end-user computers as the audible speech is generated. Implicit commands may control, for example, the animation of lips to provide a realistic image of the words of the text being spoken. Explicit commands can be used to control the voice characteristics used by the text-to-speech processor or to control animation.

Description

    BACKGROUND OF THE INVENTION
  • 1. Technical Field [0001]
  • This invention relates in general to computer software and, more particularly, to electronic conference software. [0002]
  • 2. Description of the Related Art [0003]
  • The popularity of computer networks and, in particular, the Internet, has changed the ways in which people communicate. The Internet has made electronic mail (e-mail) and electronic conferencing available to the masses. Whereas the telephone was the only means for real-time communication several years ago, many people now use the Internet to communicate for both personal and business purposes. [0004]
  • The Internet is a large network which connects millions of users world-wide. The number of current Internet subscribers greatly exceeds the number of subscribers envisioned by the designers of the Internet. Further, the amount of data transferred over the Internet has exploded over the last few years, due in major part to the World Wide Web (WWW). The WWW provides a graphical interface to the Internet. Accordingly, almost all Web sites are rich in graphics and sound which are automatically downloaded to users as they connect to a site. More recently, video files, such as MPEG (Motion Picture Experts Group) and AVI (Audio Video Interleaved, also known as MICROSOFT Video for Windows), are being added to Web sites to provide motion pictures and digital audio for downloading. [0005]
  • With each added feature, the amount of data communicated over the Internet increases, causing delays and frustration to users. Some experts contend that the backbone of the Internet will become overburdened in the near future due to the increase in the number of users and the amount of data being transferred during a typical session. [0006]
  • One type of electronic conferencing program which is becoming increasingly useful in business and personal matters is meeting software. A meeting program allows two or more users to communicate aurally and visually. The aural portion is performed by digitizing each participant's voice and sending the audio packets to each of the other participants. The video portion may, for example, send graphic images of selected participants to each participant of the meeting and/or allow users to share a drawing program. [0007]
  • The audio and video portions take significant bandwidth. Aside from burdening the Internet infrastructure, such activity can be frustrating to the meeting participants, since the audio and video information will take a significant amount of time to transfer to each participant. [0008]
  • Another type of electronic conferencing program is the chat program. A chat program allows one or more participants to communicate through text typed in at the keyboard of each participant of the chat session. The video portion of a chat session can be accomplished through various techniques. Some chat rooms have no video portion and therefore only display the text of messages from the participants, while others use graphics to represent each user. Eliminating the video portion reduces the needed bandwidth relative to meeting software, but also eliminates some of the functionality. [0009]
  • Therefore, a need has arisen to provide effective communication through the Internet or other network without using excessive bandwidth. [0010]
  • SUMMARY OF THE INVENTION
  • The present invention communicates over a network by transferring a data stream of text and explicit commands from a host computer to one or more participant computers. The participant computers generate audible speech and implicit commands responsive to the text, and generate animation responsive to the implicit and explicit commands. [0011]
  • The present invention provides significant advantages over prior art electronic conferencing programs, particularly with regard to the Internet and other on-line services. Most importantly, the bandwidth needed to transfer digital audio over a network is greatly reduced, because text is transferred between computers and is translated into audible speech at the participating computers. Similarly, animation can be provided by storing graphic image files for repurposed animation at the participating computers and animating them responsive to the explicit commands, thereby reducing the bandwidth needed to produce animation at the participating computers. [0012]
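  • As a rough, back-of-the-envelope illustration of this bandwidth argument (the speech duration and audio format below are assumptions, not figures from the patent), a short sentence sent as ASCII text is orders of magnitude smaller than the same sentence sent as uncompressed telephone-quality digital audio:

    sentence = "Hi, how are you today."
    text_bytes = len(sentence.encode("ascii"))               # 22 bytes of text

    # Assume the sentence takes about 2 seconds to speak and is digitized as
    # 8 kHz, 8-bit mono PCM (telephone quality) with no compression.
    seconds, sample_rate, bytes_per_sample = 2, 8000, 1
    audio_bytes = seconds * sample_rate * bytes_per_sample   # 16,000 bytes

    print(text_bytes, audio_bytes, audio_bytes // text_bytes)   # 22 16000 727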
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which: [0013]
  • FIG. 1 illustrates a block diagram of an embodiment of a network which can be used in conjunction with the present invention; [0014]
  • FIG. 2 illustrates a block diagram of a computer used in the network of FIG. 1; [0015]
  • FIG. 3 illustrates a state diagram describing operation of a host computer in generating a presentation; [0016]
  • FIG. 4 illustrates a functional block diagram of a participant computer; [0017]
  • FIGS. 5 a, 5 b and 5 c illustrate an example of a presentation; [0018]
  • FIG. 6 illustrates a programming interface for programming presentations; [0019]
  • FIG. 7 illustrates a user interface for a chat session; [0020]
  • FIG. 8 illustrates a state diagram for operation of a host computer in a chat session; and [0021]
  • FIG. 9 illustrates a state diagram for operation of a participant computer in a chat session. [0022]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • The present invention is best understood in relation to FIGS. 1-9 of the drawings, like numerals being used for like elements of the various drawings. [0023]
  • FIG. 1 illustrates an embodiment of a network of computers which can be used as described herein to allow a plurality of users to communicate with one another using low bandwidth. The network 10 could be, for example, the Internet, an Intranet (a private network using Internet protocols), a private network, such as a peer-to-peer network or a client-server network, or other publicly or privately available network. The network 10 shown in FIG. 1 includes a plurality of computers 11. The computers 11 could be wired together (such as in a private intra-site network), connected through the telephone lines (for example, through the Internet or through another on-line service provider), or connected through wireless communication. An electronic conference may be configured between a host computer 12 and one or more participant computers 14. [0024]
  • Each of the computers 11 can be of conventional hardware design as shown in FIG. 2. The network connection is coupled to an interface 16 (for example, a modem coupled to the computer's serial port or a network interface card). A display 18 and speakers 20 are coupled to processing circuitry 22, along with storage 24. [0025]
  • Processing circuitry 22 includes the processor, typically a microprocessor, video/graphics circuitry, such as a VGA display controller, audio processing circuitry, and input/output circuitry. Storage 24 typically includes high-speed semiconductor memory, such as DRAMs (dynamic random access memory) and SRAMs (static random access memory), along with non-volatile memory, such as CD-ROMs (compact disk read only memory), DVDs (digital versatile disk), hard drives, floppy drives, magneto-optical drives and other fixed or removable media. [0026]
  • In operation, the network 10 of FIG. 1 allows communication between computers at low bandwidth. Each participant computer 14 has the following resources: (1) graphic files for displaying animated characters, (2) a text-to-speech processor for converting text (typically in ASCII form) to audio speech, (3) a graphics processor to generate animation using the graphic image files responsive to graphics control information which is either implicit (from text) or explicit, and (4) a communication processor controlling the flow of data between the various computers 11. The text-to-speech processor could be, for example, SOFTVOICE by SoftVoice, Inc., a software program which translates text to speech. [0027]
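  • A sketch of how the four per-participant resources listed above might be organized; the Python names and stubbed behaviour are illustrative only, not taken from the patent:

    from typing import Dict, List

    class ParticipantComputer:
        """Hypothetical wiring of the four resources named in the text."""

        def __init__(self):
            self.graphic_files: Dict[str, bytes] = {}   # (1) stored character-part images

        def text_to_speech(self, text: str) -> List[str]:
            # (2) text-to-speech processor: would drive the sound hardware and
            # return phoneme identifiers as implicit graphics commands; stubbed here.
            return []

        def animate(self, phonemes: List[str], explicit_commands: List[str]) -> List[str]:
            # (3) graphics processor: map implicit (phoneme) and explicit commands
            # to the names of the graphic image files to display next.
            return [f"lip_{p}" for p in phonemes] + [f"gesture_{c}" for c in explicit_commands]

        def communicate(self, stream: str) -> None:
            # (4) communication processor: exchange the text/command stream with
            # the other computers 11; stubbed here.
            pass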
  • Repurposed Animation [0028]
  • In the preferred embodiment, graphics are produced using repurposed animation. In repurposed animation, a scene is composed of a background and one or more characters. Each character may be composed of a plurality of graphic image files, each of which can be independently positioned and displayed. Animation is generated through manipulation of the graphic image files. [0029]
  • For example, a first character may have several graphic image files depicting different head positions. Corresponding to each head position, a set of graphic files depict different lip positions. To display the character talking, the various files depicting the lip positions are displayed in a sequence synchronized to the speech so that the lips appear to be moving in a natural pattern as the speech is output through the speakers 20. Because the files depicting the lip movements can be manipulated separately from the files displaying the head positions, only a small file need be accessed to change a lip position from one state to another, rather than changing a large file depicting the entire character. [0030]
  • Repurposed animation is well known in the art, and is described in additional detail in U.S. Pat. No. 5,093,907, which is incorporated by reference herein. [0031]
  • An additional benefit of repurposed animation is that the various character parts can be reused to create new animation. Hence, once the participant computer has stored the various graphic image files, an unlimited number of animation sequences can be generated using the graphic image files by changing the sequence and positions of the files. Further, new files can be added to each participant computer 14 as desired. [0032]
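  • A sketch of the repurposed-animation idea: a character is assembled from independently positioned image parts, so changing a lip frame touches only one small file; all class and file names below are invented for illustration:

    class Character:
        """A character composed of independently positioned graphic image files."""

        def __init__(self, head_files, lip_files):
            self.head_files = head_files    # head position -> image file
            self.lip_files = lip_files      # (head position, phoneme) -> small lip image file
            self.head = "front"             # current head position

        def frame(self, phoneme):
            """Return the (head, lip) files to composite for the current frame."""
            return self.head_files[self.head], self.lip_files[(self.head, phoneme)]

    character_1 = Character(
        head_files={"front": "head_front.bmp", "side": "head_side.bmp"},
        lip_files={("front", "AA"): "lips_front_aa.bmp", ("front", "M"): "lips_front_m.bmp",
                   ("side", "AA"): "lips_side_aa.bmp", ("side", "M"): "lips_side_m.bmp"},
    )
    # Only the small lip file changes from frame to frame while the head file is reused.
    print(character_1.frame("AA"))   # ('head_front.bmp', 'lips_front_aa.bmp')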
  • Presentations [0033]
  • In a first embodiment of the present invention, the host generates presentations on one or more participant computers. The capability is used, for example, to communicate with users as they connect to a particular site on the Internet as an alternative to high bandwidth movie files, such as MPEG and AVI files. [0034]
  • A state diagram showing the basic operation of a presentation from the viewpoint of the host computer 12 is shown in FIG. 3. When a new participant computer 14 connects to the site offering the presentation, the host computer 12 sends context information in state 32. The context information is used by the participant computer to set the initial scenario. The context information may define, for example, the background for the display, the locations of “hot spots” in the background which may be used by the user of the participant computer to navigate to different sites or to obtain different services, and the characters in the presentation. [0035]
  • In state 34, the host computer 12 begins sending a stream of text and explicit graphics and speech commands to the participant computer. The text, typically in ASCII form (although other forms could be used), defines the audio and also contains implicit graphics commands, since the text itself is used to generate the lip positions in the various characters. [0036]
  • For example, the following stream could be sent to a participant computer 14: [0037]
  • <move character_1 to position_1> <set character_1 voice, English> "Hi, how are you today." <move character_1 to position_2> "I'd like to introduce some of my friends." <move character_1 to position_3> <set character_1 voice, deep> "Where did they go?" [0038]
  • In the example above, explicit commands are set forth within <> and text is set forth between quotes. The command <move character_1 to position_1>, for example, would be interpreted by the participant computer 14 to show an animation routine in which a particular character, character_1, moves from its present position to a position defined as position_1. It should be noted that while the graphics commands are shown herein as text strings, numeric code strings may be sent from the host computer 12 to the participant computers for more space efficiency; however, the programming interface, shown in greater detail hereinbelow, would use text streams to represent explicit commands for ease of programming. [0039]
  • Explicit commands may also be used for the text-to-speech processor. For example, <set character_1 voice, deep> could be used to give a character a desired inflection. [0040]
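  • A sketch of how a participant computer might split such a stream into explicit commands (between angle brackets) and spoken text (between quotes); the exact wire format beyond what the example shows is an assumption:

    import re

    # Tokens are either <explicit command> or "text to speak".
    TOKEN = re.compile(r'<([^<>]+)>|"([^"]*)"')

    def tokenize(stream):
        """Yield ('command', value) or ('text', value) pairs in stream order."""
        for command, text in TOKEN.findall(stream):
            yield ("command", command) if command else ("text", text)

    stream = '<move character_1 to position_1> <set character_1 voice, English> "Hi, how are you today."'
    for kind, value in tokenize(stream):
        print(kind, ":", value)
    # command : move character_1 to position_1
    # command : set character_1 voice, English
    # text : Hi, how are you today.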
  • Upon receiving the stream, the participant computer 14 would begin the multimedia presentation. Thus, in response to the command <move character_1 to position_1>, a participant computer 14 would begin an animation sequence defined by the command and by the present state of the animation. The command <set character_1 voice> would direct the text-to-speech processor to output speech in a certain predefined profile defined for character_1. The text "Hi, how are you today" would be output, using the text-to-speech processor 46, in audio form to the user of a participant computer 14. As the audio was output, the text-to-speech processor would output implicit control signals which indicate which phoneme is currently being output. The implicit control information is used by the graphics processor to generate lip movements. The lip movements are based not only on the particular phoneme being output, but also on other contextual information, such as the current position of the character which is speaking and other explicit graphics commands. For example, a "mad" gesture command could designate one set of lip positions mapped to the various phonemes while a "whisper" gesture command could designate a second set of lip positions mapped to the phonemes. [0041]
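  • A sketch of the lip-selection step described above: the lip file chosen depends both on the phoneme being output and on context such as the active gesture; the table contents are invented for illustration:

    # (gesture, phoneme) -> lip image file; entries are illustrative only.
    LIP_SETS = {
        ("normal", "AA"): "lips_open.bmp",
        ("normal", "M"): "lips_closed.bmp",
        ("mad", "AA"): "lips_mad_open.bmp",
        ("mad", "M"): "lips_mad_closed.bmp",
        ("whisper", "AA"): "lips_whisper_open.bmp",
        ("whisper", "M"): "lips_whisper_closed.bmp",
    }

    def lip_file(gesture, phoneme):
        """Pick the lip frame for this phoneme, falling back to the normal set."""
        return LIP_SETS.get((gesture, phoneme), LIP_SETS[("normal", phoneme)])

    # As the text-to-speech processor reports each phoneme, swap in the matching lip file.
    for phoneme in ["M", "AA", "M"]:
        print(lip_file("whisper", phoneme))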
  • In state 34, the host computer stops sending the text and control information if the user of the participant computer has exited or if the presentation has completed. The user may exit to another site or simply disconnect. [0042]
  • In some instances, the user may generate an input which causes the presentation to be suspended or terminated pending another function. For example, a user may move to another site or initiate execution of a program, such as a JAVA (an Internet programming language by Sun Microsystems) applet or an ActiveX (an Internet programming language by Microsoft Corporation) applet, by clicking on a background object. In state 36, the requested function would be performed. After the requested function was completed, control would return to state 34, where the presentation was continued or restarted. [0043]
  • FIG. 4 illustrates a functional block diagram of a participant computer 14. The participant computer 14 receives communications from the host computer 12 through communications interface 40. The information stream received from the host computer 12 may be sent to one of three subsystems for processing: the scenario setup subsystem 42, the gesture processor/interpreter 44 or the text-to-speech processor 46. The scenario setup subsystem 42 receives header information from the information stream sent by the host computer 12 to generate the background from the background database 48. The text-to-speech processor 46 receives text and explicit audio commands (such as the voice characteristic commands) from the information stream and generates an audio information stream for the computer's sound processor to generate an audible voice. The text-to-speech processor also sends phoneme identifiers to the gesture processor/interpreter 44 in real-time as the audio is generated. [0044]
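  • A sketch of that routing step: header information goes to the scenario setup subsystem, explicit commands go to the gesture processor/interpreter or the text-to-speech processor, and text goes to the text-to-speech processor; the dispatch rules and subsystem interfaces below are assumptions, not the patent's own:

    def route(items, scenario_setup, gesture_processor, tts_processor):
        """Dispatch parsed stream items to the three subsystems of FIG. 4."""
        for kind, value in items:
            if kind == "header":                          # context information
                scenario_setup(value)
            elif kind == "command" and value.startswith("set "):
                tts_processor.set_voice(value)            # explicit audio command
            elif kind == "command":
                gesture_processor.on_explicit(value)      # explicit graphics command
            else:                                         # spoken text
                phonemes = tts_processor.speak(value)     # returns phoneme identifiers
                gesture_processor.on_phonemes(phonemes)   # implicit graphics commands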
  • The gesture processor/interpreter 44 receives explicit graphics commands from the information stream. The gesture processor/interpreter 44, based on the explicit graphics commands and the implicit graphics commands, such as phoneme information, generates the animation using character parts in the scene playback and lip synch animation databases 50 and 52. [0045]
  • In operation, the background, scene playback and lip synch animation databases 48-52 store graphic image files to produce animation sequences. The graphic image files can be obtained by the participant computer 14 through any number of means, such as downloading from the host computer 12 or another computer or loading from a removable media source, such as a floppy disk, CD-ROM or DVD. The databases 48-52 can be updated by the same means. [0046]
  • Using the graphic image files, an unlimited number of animations can be produced using repurposed animation techniques. In the preferred embodiment, at least some of the animation sequences are predefined and stored in the participant computers 14. For example, “<move character_1 to position_1>” defines a particular animation sequence based on the current state of the animation. Rather than download a large number of commands setting forth the sequence from the host computer, a single command would be downloaded and interpreted by the gesture processor/interpreter 44 at the participant computers 14. As with the graphic files, new animation sequences can be added to a participant computer through downloading or loading through a removable medium. [0047]
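  • A sketch of how one short downloaded command could expand into a locally stored animation sequence, keyed by the command and the character's current state; the stored sequences are invented for illustration:

    # (current position, command) -> locally stored frame sequence; contents invented.
    SEQUENCES = {
        ("position_2", "move character_1 to position_1"): ["walk_a.bmp", "walk_b.bmp", "walk_c.bmp"],
        ("position_3", "move character_1 to position_1"): ["turn_a.bmp", "turn_b.bmp"],
    }

    def expand_command(current_position, command):
        """One short command stands in for a whole pre-stored frame sequence."""
        return SEQUENCES.get((current_position, command), [])

    print(expand_command("position_2", "move character_1 to position_1"))
    # ['walk_a.bmp', 'walk_b.bmp', 'walk_c.bmp']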
  • The lip animation is dependent not only on the phoneme being output from the text-to-speech processor 46, but also on the position of the character. For example, a character facing forward would have different lip movements than a character facing sideways. Thus, if character_1 is in position_1, the lip files for position_1 are used, while position_2 may correspond to a different set of lip files. Consequently, there is a mapping between the scene playback database and the lip synch animation database. [0048]
  • FIGS. 5 a-c illustrate a sample animation which could be generated using the network described above. The depiction shown in FIG. 5 a includes a background of non-animated objects 54 (i.e. objects which will not be animated dynamically responsive to the data stream from the host computer 12, but which may be moving on screen as part of the background) and a pair of characters “U2” and “ME2” which are animated as a single character 56 (hereinafter “U2ME2”). The background could be selected by header information in the data stream from the host computer 12. Some of the non-animated objects 54 may be hot spots for jumping to another site or performing a function, such as a file download or a JAVA script. [0049]
  • In FIG. 5 a, U2ME2 is in a first position, position_1. It should be noted that a position is not necessarily a physical location on the screen, but could also refer to a particular orientation of a character. Thus position_1 and position_8 could be physically located at the same area of the screen, with U2ME2 facing towards the user in position_1 and facing towards one another in position_8. [0050]
  • In position_1, the characters may speak using the text and audio commands in the data stream from the host computer. As the audio is output, the phonemes are identified by the text-to-speech processor 46. The phoneme identifiers are received by the gesture processor/interpreter 44 and used to generate natural lip movements by mapping each phoneme identifier to a lip synch file (which, as described above, is also determined by the current state of the animation). [0051]
  • FIG. 5 b illustrates U2ME2 at a second position, position_2. The movement from position_1 to position_2 would normally be a predetermined animation sequence which would be used each time the U2ME2 character moved from position_1 to position_2. At position_2, more speech could be processed from text and audio control commands from the host computer 12. [0052]
  • In FIG. 5 c, U2ME2 is in a third position, position_3. Once again, the movement from position_2 to position_3 would be a smooth animation between the two positions. Additional speech may be processed at this position. [0053]
  • The power of the presentation system described above lies in its small size, since the animation and graphics are pre-stored in the participant computer, and in its ease in programming new presentations. FIG. 6 illustrates an example of a screen which could be used to program presentations using the characters described above. [0054]
  • The presentation programming screen 58 of FIG. 6 has a command area 60 which lists the possible explicit graphic and audio commands which could be used in a presentation. The list of commands can be scrolled up or down using the “actions up” or “actions down” buttons 62 a or 62 b, respectively. To the left of the command area is the playlist area 64, which lists the entered commands for a particular presentation. The playlist can be scrolled up or down using the scroll up or scroll down buttons 66 a or 66 b. A work area 68 allows text to be entered, alone or in conjunction with chosen explicit commands. [0055]
  • A list of the commands which could be used in the example presentation set forth above is given below. [0056]
    COMMAND COMMENT
    U2 speak set voice for U2
    ME2 speak set voice for ME2
    Move U2ME2 Pos1 Move U2ME2 to Position_1
    Move U2ME2 Pos2 Move U2ME2 to Position_2
    Move U2ME2 Pos3 Move U2ME2 to Position_3
    Move U2ME2 Pos4 Move U2ME2 to Position_4
    Move U2ME2 Pos5 Move U2ME2 to Position_5
    Move U2ME2 Pos6 Move U2ME2 to Position_6
    Move U2ME2 Pos7 Move U2ME2 to Position_7
    Move U2ME2 Pos8 Move U2ME2 to Position_8
    Enter screen U2ME2 enter screen
    Exit screen U2ME2 exit screen
    U2 mouth ON show U2's mouth
    ME2 mouth ON show ME2's mouth
    U2 mouth OFF don't show U2's mouth
    ME2 mouth OFF don't show ME2's mouth
    U2 talk to ME2 U2 turns to ME2
    ME2 talk to U2 ME2 turns to U2
    U2 talk to screen U2 faces screen
    ME2 talk to screen ME2 faces screen
    ME2 attitude U2 ME2 talks to U2 with attitude
    U2 attitude ME2 U2 talks to ME2 with attitude
    ME2 look attitude U2 ME2 looks at U2 with attitude
    U2 look attitude ME2 U2 looks at ME2 with attitude
  • A presentation can be generated quickly with very few keystrokes. For example, the presentation set forth above could be generated as follows (a sketch of the resulting data stream appears after this table): [0057]
    Command                                                                 Action in Presentation
    press <enter screen>                                                    U2ME2 enter screen
    press <U2 speak>                                                        sets text-to-speech processor to output audio in the pattern defined for U2
    type "I'm U 2. Welcome to our home"                                     provides text for text-to-speech processor
    press <ME2 speak>                                                       sets text-to-speech processor to output audio in the pattern defined for ME2
    type "I'm ME 2. I'd like to show you around"                            provides text for text-to-speech processor
    press <move U2ME2 Pos3>                                                 moves the U2ME2 character to the position defined as position_3
    type "We would like to tell you more about ourselves."                  provides text for text-to-speech processor
    press <move U2ME2 Pos1>                                                 animates movement from position_3 to position_1
    press <U2 speak>                                                        sets text-to-speech processor to output audio in the pattern defined for U2
    type "If you would rather hear a story, press on the satellite dish"    provides text for text-to-speech processor
    press <ME2 look attitude U2>                                            animates movement of ME2 looking at U2 in position_1
    press <ME2 attitude U2>                                                 sets text-to-speech processor to output audio in the pattern defined for ME2
    type "Hey, that was my line."                                           provides text for text-to-speech processor
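  • A sketch of how such a playlist might be flattened into the compact text-plus-commands data stream is shown below. The angle-bracket token syntax and the serialize helper are assumptions made for illustration; the byte count printed at the end simply shows how small the result is.

    def serialize(playlist):
        """Flatten (kind, value) playlist entries into one text stream with embedded commands."""
        parts = []
        for kind, value in playlist:
            if kind == "command":
                parts.append("<" + value + ">")   # explicit graphic/audio command
            else:
                parts.append(value)               # text for the text-to-speech processor
        return " ".join(parts)

    example = [
        ("command", "enter screen"),
        ("command", "U2 speak"),
        ("text", "I'm U 2. Welcome to our home"),
        ("command", "ME2 speak"),
        ("text", "I'm ME 2. I'd like to show you around"),
        ("command", "move U2ME2 Pos3"),
    ]
    stream = serialize(example)
    print(stream, "(" + str(len(stream.encode("utf-8"))) + " bytes)")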
  • In practice, a presentation could be much longer, with many more characters. However, the time spent animating the characters for a new presentation would be minimal. Further, the size of the data stream for a 90-minute presentation with full audio and animation would be less than 100 kilobytes and would take about a minute to load at a modem speed of 14.4 kbps (kilobits per second). Using current-day methods of sending animation, such as an MPEG or AVI file, a 100-kilobyte presentation with animation and audio would last only about one second (depending upon resolution and frame rate). Moreover, the image of the MPEG or AVI file would occupy only about one-eighth of the screen, rather than the full screen which can be produced by the invention. [0058]
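  • The download-time figure is easy to verify. Treating the modem rate as raw throughput (real-world modem overhead would add a little), 100 kilobytes at 14.4 kbps works out to roughly a minute:

    size_bits = 100 * 1024 * 8      # 100 kilobytes expressed in bits
    modem_bps = 14400               # 14.4 kilobits per second
    print(size_bits / modem_bps)    # about 57 seconds, i.e. roughly one minute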
  • While an entire presentation can be downloaded and then performed on the participant computers, in the preferred embodiment the presentation is downloaded using progressive downloading techniques, whereby a section of the data stream is executed on the participant computer while the subsequent section is being downloaded. By downloading sections of the data stream while previous sections are executing, the effective download time for the presentation is reduced. [0059]
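  • A minimal sketch of that progressive-download loop is given below, assuming hypothetical fetch_section and perform_section callbacks supplied by the communications subsystem and the playback engine respectively.

    import threading

    def progressive_play(fetch_section, perform_section, num_sections):
        """Perform section i while section i+1 downloads in the background."""
        current = fetch_section(0)                      # initial download
        for index in range(1, num_sections + 1):
            result = {}
            def prefetch():
                if index < num_sections:
                    result["next"] = fetch_section(index)
            worker = threading.Thread(target=prefetch)
            worker.start()                              # download the next section...
            perform_section(current)                    # ...while this one executes
            worker.join()
            current = result.get("next")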
  • Further, a presentation may be designed to execute in an interactive or random manner by downloading sections of a data stream in response to a user action or by random selection. An example of an interactive presentation would be a story in which the user picks which door to open. Subsequent sections would be downloaded to the user depending upon which door was opened. Several such selections could be provided to make the story more interesting. [0060]
  • A way to make a presentation non-repetitive would be to randomly select predefined sections or to select sections based on user profiles. For example, a presentation of a company's goods may select which product to present to a user on a random basis, so that the user does not receive the same promotion on each visit to the site. The presentation could further choose which products to promote (and thus which sections to download) based on user profile information, such as the age and gender of the user. [0061]
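  • The selection logic for interactive, random or profile-driven downloads can be sketched as below; the section names and the age threshold are illustrative assumptions only.

    import random

    SECTIONS = {
        "door_left": "story_left_door.str",
        "door_right": "story_right_door.str",
        "promo_teen": "product_line_a.str",
        "promo_adult": "product_line_b.str",
    }

    def next_section(user_choice=None, profile=None):
        """Pick which data-stream section to download next."""
        if user_choice:                                   # interactive: which door was opened
            return SECTIONS[user_choice]
        if profile:                                       # profile-driven promotion
            key = "promo_teen" if profile.get("age", 0) < 20 else "promo_adult"
            return SECTIONS[key]
        return random.choice(list(SECTIONS.values()))     # non-repetitive random pick

    print(next_section(user_choice="door_left"))
    print(next_section(profile={"age": 35, "gender": "F"}))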
  • Chat/Meeting Sessions [0062]
  • Chat and meeting sessions can be greatly enhanced by communicating with streams of text and explicit audio and graphics commands. An example of a chat interface is shown in FIG. 7. [0063]
  • Each participant computer 14 is assigned an “avatar” 70, which is a graphic identifier for the user. As shown in FIG. 7, the avatars 70 are generally fanciful, although it would be possible for realistic depictions to be used. Further, the avatars 70 can appear two dimensional, as shown, or appear three dimensional. In the embodiment of FIG. 7, each avatar 70 is viewed in a defined space 72; in an alternative embodiment, the avatars could move about using VRML (Virtual Reality Modeling Language) technology. [0064]
  • It should be noted that the particular embodiment of the chat session interface shown in FIG. 7 is directed towards leisure use; more serious graphics could be used for business applications. Further, while the embodiment shown has a total of four users, any number of users could be supported. [0065]
  • Adjacent each avatar, an alias space 74 is provided for the user's name or nickname. Thus, users may use their real name or provide a nickname. The center of the interface 68 is divided into two sections, a graphic display section 76 and a text section 78. Text input by the participant computers 14 is displayed in the text section 78, while user-input graphics are displayed in the graphics section 76. A drawing toolbar 80 is displayed over the graphics section 76. The drawing toolbar 80 provides the tools for drawing in the graphics section 76. A flag icon 82 is used to define the voice inflection desired by each user. For example, the user at the participant computer 14 shown in FIG. 7 would be using an American accent; other accents could be selected by clicking on the flag icon 82. The flag icon 82 represents explicit audio commands which will be sent as part of the text stream. [0066]
  • In operation, each user participating in the chat/meeting session chooses an avatar (or has the host computer 12 automatically choose an avatar) which is the user's graphical depiction to all other participants in the chat session. In the preferred embodiment, the user can also choose voice characteristics (such as accent, male/female, adult/child, and so on). As described in connection with Presentations, supra, the communication is performed by transferring text with embedded explicit commands between the host computer 12 and the participant computers 14. In the case of a chat or meeting session, text and explicit commands are initiated at the participant computers 14 and uploaded to the host computer 12. When the host computer 12 receives a data stream from a participant computer 14, it forwards that stream to all computers in the particular chat/meeting session. The text is printed in the text window and transformed into audible speech by the text-to-speech processor 46 in each participant computer 14. As the speech is output, the phonemes are identified and the associated avatar is animated responsive to the phoneme identifiers. [0067]
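  • A sketch of the data stream one participant might upload for a single utterance is shown below. The field names and token syntax are assumptions made for illustration; the point is that one short text string carries the spoken text plus the chosen voice and gesture commands.

    def compose_message(alias, text, accent="US", gestures=()):
        """Build the text-plus-commands stream uploaded to the host for one utterance."""
        tokens = ["<from " + alias + ">", "<voice " + accent + ">"]
        tokens += ["<" + g + ">" for g in gestures]       # e.g. "happy", "look left"
        tokens.append(text)
        return " ".join(tokens)

    msg = compose_message("Rocketman", "Did anyone read the agenda?",
                          accent="UK", gestures=("happy",))
    print(msg)   # the host forwards this string unchanged to every session participant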
  • In the preferred embodiment, the avatars are animated not only by the implicit gesture commands from the text-to-speech processor 46 in the form of phoneme identifiers, but also by explicit commands such as <angry>, <happy>, <look left> or <look down>. Other implicit commands can also be derived from punctuation in the text, such as the "!" or "?" signs. For meeting software, additional gestures, such as raising arms to request an opportunity to speak, can be supported. [0068]
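  • Deriving those implicit gestures from punctuation, alongside the explicit angle-bracket commands, might look like the sketch below; the gesture names are illustrative, not taken from the patent.

    import re

    def explicit_commands(stream):
        """Pull out commands embedded in the text stream, e.g. <angry>, <look down>."""
        return re.findall(r"<([^>]+)>", stream)

    def implicit_gestures(text):
        """Derive gesture commands from punctuation in the text itself."""
        gestures = []
        if "!" in text:
            gestures.append("emphatic gesture")
        if "?" in text:
            gestures.append("questioning look")
        return gestures

    stream = "<angry> You said WHAT?!"
    print(explicit_commands(stream) + implicit_gestures(stream))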
  • As in the Presentation section, explicit commands can be chosen from a menu or, alternatively, typed in manually. [0069]
  • The participant computers are structured similarly to those shown in FIGS. 2 and 4. In the case of a chat/meeting session, the communications subsystem 40 not only receives and distributes data streams from the host computer 12, but also generates data streams to upload to the host computer 12. As described in connection with the Presentation section, each participant computer 14 separately stores the scene playback files (which would contain the graphics needed to animate each avatar) and the lip synch animation files. [0070]
  • A state diagram for operation of the host computer 12 during a chat session is shown in FIG. 8. In state 90, the host computer 12 is in a wait state, where it is waiting for a communication from a participant computer 14. When a new computer requests to become a participant in the chat session, the host computer and the new participant exchange the information necessary for communication and the audio/visual properties of the new participant in state 92. This involves, for example, identifying the user by Internet address (or other network address) and assigning avatar graphics and default voice properties. In the preferred embodiment, the user can define its avatar 70 by choosing specific characteristics, such as head, hat, nose, lips and voice type. In state 94, the host computer 12 passes information regarding the new participant computer 14 to all of the current participant computers 14, each of which should have the graphic files to output the chosen avatar. If any of the assets needed to reproduce a participant are not available, they can be downloaded from the host computer 12 or default characteristics can be used. Upon completion of the setup routine, the host computer 12 returns to the wait state 90. [0071]
  • When a message is received from a participant computer 14, the state shifts to state 96, where the host computer receives and stores the message and then forwards the message to all computers participating in the chat session. The host computer 12 then returns to the wait state 90. [0072]
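  • The host loop of FIG. 8 reduces to a few lines of code. The class and method names below are hypothetical stand-ins for the wait, setup and forward states; the print call stands in for the actual network send.

    class ChatHost:
        def __init__(self):
            self.participants = {}                  # network address -> avatar/voice properties

        def on_join(self, address, avatar, voice):
            # States 92/94: record the newcomer and describe it to every current participant.
            self.participants[address] = {"avatar": avatar, "voice": voice}
            self.broadcast("<new participant " + address +
                           " avatar=" + avatar + " voice=" + voice + ">")

        def on_message(self, sender, stream):
            # State 96: store and forward the text-plus-commands stream to the whole session.
            self.broadcast(stream)

        def broadcast(self, stream):
            for address in self.participants:
                print("send to " + address + ": " + stream)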
  • FIG. 9 shows a state diagram of the operation of the participant computers with regard to communication during a chat session. State 100 is the wait state, where no messages are currently being sent or received. As a new message is received in state 102, the text is sent to the text-to-speech processor 46 along with any explicit audio commands to generate an audible voice. Explicit graphics commands from a received message are sent to the gesture processor/interpreter 44 along with implicit graphics commands from the text-to-speech processor 46. These commands are used to animate the avatar corresponding to the received message. After the message is processed, the participant computer 14 returns to the wait state 100. [0073]
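  • On the receiving side, the dispatch of FIG. 9 can be sketched as below; text_to_speech and gesture_processor are hypothetical stand-ins for the text-to-speech processor 46 and the gesture processor/interpreter 44, and the speak/animate methods are assumed interfaces, not the patent's own.

    import re

    def handle_incoming(stream, text_to_speech, gesture_processor):
        """Route one received message to the speech and gesture subsystems."""
        commands = re.findall(r"<([^>]+)>", stream)          # explicit audio/graphics commands
        text = re.sub(r"<[^>]+>", "", stream).strip()        # plain text to print and speak
        phoneme_ids = text_to_speech.speak(text, commands)   # audible voice; returns phonemes
        gesture_processor.animate(commands, phoneme_ids)     # avatar gestures and lip synch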
  • When the user of a participant computer has prepared a message to send, the state shifts to state 104, where the participant computer 14 uploads the message to the host computer 12 for broadcast to the group of participant computers 14 participating in the chat session. In uploading the message, the participant computer may modify the user input; for example, "<grin>" could be modified to "%G", which is smaller and easily identified as a command. Alternatively, because the bandwidth savings are minimal, the entire text of a command could be sent to the host computer. [0074]
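  • The optional command abbreviation mentioned above amounts to a small substitution table applied before upload. Only the "<grin>" to "%G" pairing comes from the text; the other entries are invented for illustration.

    ABBREVIATIONS = {"<grin>": "%G", "<angry>": "%A", "<look left>": "%LL"}

    def compress(stream):
        """Replace long-form commands with shorter tokens before uploading to the host."""
        for long_form, short_form in ABBREVIATIONS.items():
            stream = stream.replace(long_form, short_form)
        return stream

    print(compress("<grin> Nice to see everyone again"))   # -> "%G Nice to see everyone again"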
  • The present invention provides significant advantages over the prior art. The invention allows audio conversations or presentations without using significant amounts of bandwidth over the network. Applications such as chat programs are enhanced with animation and audible speech at low bandwidth. These capabilities make the conversations much more interesting and allow participants to listen to the conversation without constantly viewing the screen, which is necessary where only text is provided. Meeting programs, which normally transfer digital audio over the network, can greatly reduce their bandwidth requirements. Accordingly, audio conversations and presentations can be almost instantaneously received and output on the participating computers with audio and graphics. Presentations can be generated with very little production time or storage requirements. [0075]
  • Additionally, the use of graphics can enhance communications by allowing gestures which are fanciful or which otherwise could not be communicated through live transmissions. [0076]
  • Although the Detailed Description of the invention has been directed to certain exemplary embodiments, various modifications of these embodiments, as well as alternative embodiments, will be suggested to those skilled in the art. The invention encompasses any modifications or alternative embodiments that fall within the scope of the claims. [0077]

Claims (21)

What is claimed is:
1. A method of communicating over a network, comprising the steps of:
transferring a data stream of text and explicit commands from a transmitting computer to one or more receiving computers;
generating audible speech at the one or more receiving computers responsive to said text;
generating implicit commands responsive to said text; and
generating animation at said one or more receiving computers responsive to said implicit and explicit commands.
2. The method of claim 1 wherein said step of generating implicit commands includes the step of generating lip synch commands for generating lip movements corresponding to the audible speech.
3. The method of claim 2 wherein said lip synch commands comprise phoneme identifiers corresponding to the audible speech.
4. The method of claim 2 wherein said step of generating implicit commands further comprises the step of generating gesture commands for animating gestures responsive to punctuation.
5. The method of claim 1 wherein step of transferring a data stream includes the step of transferring explicit animation commands and explicit speech commands.
6. The method of claim 5 wherein said explicit speech commands define voice characteristics and said step of generating audible speech comprises the step of generating audible speech responsive to said text and said explicit speech commands.
7. The method of claim 1 where said one or more receiving computers comprise at least two receiving computers and further comprising the step of transferring said data stream from one of said receiving computers to said transmitting computer and transferring said data stream from said transmitting computer to receiving computers to allow communication between said receiving computers.
8. The method of claim 1 and further comprising the steps of storing graphic image files in said receiving computers prior to transferring said data stream.
9. The method of claim 8 wherein said step of generating animation comprises the step of manipulating said graphic image files responsive to said explicit commands.
10. The method of claim 8 and further comprising the step of storing background files in said receiving computers.
11. A method of generating a presentation on a plurality of participant computers from host computer over a network, comprising the steps of:
downloading a data stream including text and animation control signals from said host to said participant computers over a network connection, said animation control signals defining an animation sequence using a plurality of image files stored on the participant computers;
generating animation on said participant computers by displaying said image files responsive to said animation control signals;
generating audible speech on said participant computers responsive to said text; and
generating additional animation on said participant computers responsive to said text.
12. The method of claim 11 wherein said step of generating additional animation on said participating computers comprises the step of generating lip movement animation.
13. The method of claim 12 and further comprising the step of generating phoneme information as the audible speech is generated.
14. The method of claim 12 wherein said step of generating additional animation on said participating computers comprises the step of generating facial expressions responsive to punctuation in said text.
15. The method of claim 11 wherein said step of downloading includes downloading speech control signals for defining voice characteristics associated with said text.
16. A method of enabling two or more participant computers to communicate over a network, comprising the steps of:
transferring a data stream including text from one of said participant computers to others of said participant computers;
generating audible speech on said other participant computers responsive to said text; and
generating animation on said other participant computers responsive to said data stream.
17. The method of claim 16 wherein said step of generating animation comprises the steps of:
generating phoneme identifiers corresponding to the audible speech; and
mapping said phoneme identifiers to image files stored on said other participant computers.
18. The method of claim 16 wherein said uploading step comprises the step of uploading a data stream including text and explicit commands from said one participant computers to said other participant computers.
19. The method of claim 18 and further comprising the step of generating animation responsive to one or more of said explicit commands.
20. The method of claim 19 wherein said step of generating audible speech comprises the step of generating audible speech on said other participant computers responsive to said text and one or more of said explicit commands as the data stream is received.
21. The method of claim 16 wherein said transferring step comprises the step of transferring a data stream including text from said one participant computers to said other participant computers via a host computer.
US10/439,926 1996-11-18 2003-05-16 Electronic conference program Abandoned US20040001065A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/439,926 US20040001065A1 (en) 1996-11-18 2003-05-16 Electronic conference program

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US08/751,506 US5963217A (en) 1996-11-18 1996-11-18 Network conference system using limited bandwidth to generate locally animated displays
US41219099A 1999-10-05 1999-10-05
US10/439,926 US20040001065A1 (en) 1996-11-18 2003-05-16 Electronic conference program

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US41219099A Continuation 1996-11-18 1999-10-05

Publications (1)

Publication Number Publication Date
US20040001065A1 true US20040001065A1 (en) 2004-01-01

Family

ID=25022290

Family Applications (2)

Application Number Title Priority Date Filing Date
US08/751,506 Expired - Lifetime US5963217A (en) 1996-11-18 1996-11-18 Network conference system using limited bandwidth to generate locally animated displays
US10/439,926 Abandoned US20040001065A1 (en) 1996-11-18 2003-05-16 Electronic conference program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US08/751,506 Expired - Lifetime US5963217A (en) 1996-11-18 1996-11-18 Network conference system using limited bandwidth to generate locally animated displays

Country Status (1)

Country Link
US (2) US5963217A (en)

Families Citing this family (106)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7859551B2 (en) 1993-10-15 2010-12-28 Bulman Richard L Object customization and presentation system
US6584498B2 (en) 1996-09-13 2003-06-24 Planet Web, Inc. Dynamic preloading of web pages
US6377978B1 (en) 1996-09-13 2002-04-23 Planetweb, Inc. Dynamic downloading of hypertext electronic mail messages
JP3012560B2 (en) * 1997-06-25 2000-02-21 日本電気ソフトウェア株式会社 Computer-based electronic dialogue method, computer-to-computer electronic dialogue device, and computer-readable recording medium recording computer-based electronic dialogue program
US6567779B1 (en) * 1997-08-05 2003-05-20 At&T Corp. Method and system for aligning natural and synthetic video to speech synthesis
US7366670B1 (en) * 1997-08-05 2008-04-29 At&T Corp. Method and system for aligning natural and synthetic video to speech synthesis
US6542923B2 (en) 1997-08-21 2003-04-01 Planet Web, Inc. Active electronic mail
US7325077B1 (en) 1997-08-21 2008-01-29 Beryl Technical Assays Llc Miniclient for internet appliance
CA2227361A1 (en) * 1998-01-19 1999-07-19 Taarna Studios Inc. Method and apparatus for providing real-time animation utilizing a database of expressions
JPH11212934A (en) * 1998-01-23 1999-08-06 Sony Corp Information processing device and method and information supply medium
US6636219B2 (en) 1998-02-26 2003-10-21 Learn.Com, Inc. System and method for automatic animation generation
US6684211B1 (en) * 1998-04-01 2004-01-27 Planetweb, Inc. Multimedia communication and presentation
US6351267B1 (en) * 1998-12-10 2002-02-26 Gizmoz Ltd Fast transmission of graphic objects
JP2000176168A (en) * 1998-12-18 2000-06-27 Konami Co Ltd Message preparation game machine and message preparation method
US6370597B1 (en) * 1999-08-12 2002-04-09 United Internet Technologies, Inc. System for remotely controlling an animatronic device in a chat environment utilizing control signals sent by a remote device over the internet
US6647417B1 (en) 2000-02-10 2003-11-11 World Theatre, Inc. Music distribution systems
US7647618B1 (en) 1999-08-27 2010-01-12 Charles Eric Hunter Video distribution system
US8090619B1 (en) 1999-08-27 2012-01-03 Ochoa Optics Llc Method and system for music distribution
US7209900B2 (en) 1999-08-27 2007-04-24 Charles Eric Hunter Music distribution systems
US6952685B1 (en) 1999-08-27 2005-10-04 Ochoa Optics Llc Music distribution system and associated antipiracy protection
US8656423B2 (en) 1999-08-27 2014-02-18 Ochoa Optics Llc Video distribution system
US6557026B1 (en) * 1999-09-29 2003-04-29 Morphism, L.L.C. System and apparatus for dynamically generating audible notices from an information network
USRE42904E1 (en) 1999-09-29 2011-11-08 Frederick Monocacy Llc System and apparatus for dynamically generating audible notices from an information network
US6766299B1 (en) * 1999-12-20 2004-07-20 Thrillionaire Productions, Inc. Speech-controlled animation system
US6404438B1 (en) * 1999-12-21 2002-06-11 Electronic Arts, Inc. Behavioral learning for a visual representation in a communication environment
US7124167B1 (en) 2000-01-19 2006-10-17 Alberto Bellotti Computer based system for directing communications over electronic networks
US9252898B2 (en) 2000-01-28 2016-02-02 Zarbaña Digital Fund Llc Music distribution systems
JP2001325195A (en) * 2000-03-06 2001-11-22 Sony Computer Entertainment Inc Communication system, entertainment device, recording medium and program
DE10018143C5 (en) * 2000-04-12 2012-09-06 Oerlikon Trading Ag, Trübbach DLC layer system and method and apparatus for producing such a layer system
JP4547768B2 (en) * 2000-04-21 2010-09-22 ソニー株式会社 Information processing apparatus and method, and recording medium
US6453294B1 (en) * 2000-05-31 2002-09-17 International Business Machines Corporation Dynamic destination-determined multimedia avatars for interactive on-line communications
US7159008B1 (en) 2000-06-30 2007-01-02 Immersion Corporation Chat interface with haptic feedback functionality
US6788949B1 (en) 2000-09-21 2004-09-07 At&T Corp. Method and system for transfer of mobile chat sessions
WO2002025595A1 (en) * 2000-09-21 2002-03-28 The Regents Of The University Of California Visual display methods for use in computer-animated speech production models
US7120583B2 (en) * 2000-10-02 2006-10-10 Canon Kabushiki Kaisha Information presentation system, information presentation apparatus, control method thereof and computer readable memory
WO2002037803A2 (en) * 2000-10-30 2002-05-10 Sonexis, Inc. Method and system for providing audio conferencing services using streaming audio
US7039676B1 (en) 2000-10-31 2006-05-02 International Business Machines Corporation Using video image analysis to automatically transmit gestures over a network in a chat or instant messaging session
US6963839B1 (en) 2000-11-03 2005-11-08 At&T Corp. System and method of controlling sound in a multi-media communication application
US7035803B1 (en) 2000-11-03 2006-04-25 At&T Corp. Method for sending multi-media messages using customizable background images
US7091976B1 (en) 2000-11-03 2006-08-15 At&T Corp. System and method of customizing animated entities for use in a multi-media communication application
US6990452B1 (en) 2000-11-03 2006-01-24 At&T Corp. Method for sending multi-media messages using emoticons
US7203648B1 (en) 2000-11-03 2007-04-10 At&T Corp. Method for sending multi-media messages with customized audio
US6976082B1 (en) 2000-11-03 2005-12-13 At&T Corp. System and method for receiving multi-media messages
US20080040227A1 (en) 2000-11-03 2008-02-14 At&T Corp. System and method of marketing using a multi-media communication system
US20020122391A1 (en) * 2001-01-12 2002-09-05 Shalit Andrew L. Method and system for providing audio conferencing services to users of on-line text messaging services
US20020095465A1 (en) * 2001-01-16 2002-07-18 Diane Banks Method and system for participating in chat sessions
US8112311B2 (en) 2001-02-12 2012-02-07 Ochoa Optics Llc Systems and methods for distribution of entertainment and advertising content
CA2442195A1 (en) * 2001-03-27 2002-10-03 Interlego Ag Method, system and storage medium for an iconic language communication tool
US20020194006A1 (en) * 2001-03-29 2002-12-19 Koninklijke Philips Electronics N.V. Text to visual speech system and method incorporating facial emotions
US7085259B2 (en) * 2001-07-31 2006-08-01 Comverse, Inc. Animated audio messaging
US7960005B2 (en) 2001-09-14 2011-06-14 Ochoa Optics Llc Broadcast distribution of content for storage on hardware protected optical storage media
US7671861B1 (en) 2001-11-02 2010-03-02 At&T Intellectual Property Ii, L.P. Apparatus and method of customizing animated entities for use in a multi-media communication application
US7221654B2 (en) * 2001-11-13 2007-05-22 Nokia Corporation Apparatus, and associated method, for selecting radio communication system parameters utilizing learning controllers
US7401020B2 (en) * 2002-11-29 2008-07-15 International Business Machines Corporation Application of emotion-based intonation and prosody to speech in text-to-speech systems
US7177286B2 (en) 2002-02-25 2007-02-13 Sonexis, Inc. System and method for processing digital audio packets for telephone conferencing
US7505423B2 (en) * 2002-02-25 2009-03-17 Sonexis, Inc. Telephone conferencing system and method
US7145883B2 (en) 2002-02-25 2006-12-05 Sonexis, Inc. System and method for gain control of audio sample packets
US7917581B2 (en) 2002-04-02 2011-03-29 Verizon Business Global Llc Call completion via instant communications client
AU2003223408A1 (en) * 2002-04-02 2003-10-20 Worldcom, Inc. Communications gateway with messaging communications interface
US8856236B2 (en) 2002-04-02 2014-10-07 Verizon Patent And Licensing Inc. Messaging response system
US7689649B2 (en) * 2002-05-31 2010-03-30 Aol Inc. Rendering destination instant messaging personalization items before communicating with destination
US7779076B2 (en) * 2002-05-31 2010-08-17 Aol Inc. Instant messaging personalization
AU2003237261A1 (en) * 2002-05-31 2003-12-19 America Online, Inc. Instant messaging personalization
US20030225847A1 (en) * 2002-05-31 2003-12-04 Brian Heikes Sending instant messaging personalization items
US20030232245A1 (en) * 2002-06-13 2003-12-18 Jeffrey A. Turak Interactive training software
US20040085259A1 (en) * 2002-11-04 2004-05-06 Mark Tarlton Avatar control using a communication device
US20050083851A1 (en) * 2002-11-18 2005-04-21 Fotsch Donald J. Display of a connection speed of an on-line user
AU2003291042A1 (en) 2002-11-18 2004-06-15 America Online, Inc. Enhanced buddy list interface
WO2004049113A2 (en) * 2002-11-21 2004-06-10 America Online, Inc. Multiple personalities
US7636755B2 (en) 2002-11-21 2009-12-22 Aol Llc Multiple avatar personalities
US20070113181A1 (en) * 2003-03-03 2007-05-17 Blattner Patrick D Using avatars to communicate real-time information
US7913176B1 (en) 2003-03-03 2011-03-22 Aol Inc. Applying access controls to communications with avatars
US20040260770A1 (en) * 2003-06-06 2004-12-23 Bruce Medlin Communication method for business
US6954522B2 (en) * 2003-12-15 2005-10-11 International Business Machines Corporation Caller identifying information encoded within embedded digital information
US7689543B2 (en) * 2004-03-11 2010-03-30 International Business Machines Corporation Search engine providing match and alternative answers using cumulative probability values
US20060075449A1 (en) * 2004-09-24 2006-04-06 Cisco Technology, Inc. Distributed architecture for digital program insertion in video streams delivered over packet networks
US7870590B2 (en) * 2004-10-20 2011-01-11 Cisco Technology, Inc. System and method for fast start-up of live multicast streams transmitted over a packet network
US7468729B1 (en) 2004-12-21 2008-12-23 Aol Llc, A Delaware Limited Liability Company Using an avatar to generate user profile information
US9652809B1 (en) 2004-12-21 2017-05-16 Aol Inc. Using user profile information to determine an avatar and/or avatar characteristics
US7680047B2 (en) * 2005-11-22 2010-03-16 Cisco Technology, Inc. Maximum transmission unit tuning mechanism for a real-time transport protocol stream
WO2007092629A2 (en) * 2006-02-09 2007-08-16 Nms Communications Corporation Smooth morphing between personal video calling avatars
US7965771B2 (en) 2006-02-27 2011-06-21 Cisco Technology, Inc. Method and apparatus for immediate display of multicast IPTV over a bandwidth constrained network
US8218654B2 (en) 2006-03-08 2012-07-10 Cisco Technology, Inc. Method for reducing channel change startup delays for multicast digital video streams
US7694002B2 (en) * 2006-04-07 2010-04-06 Cisco Technology, Inc. System and method for dynamically upgrading / downgrading a conference session
US20070263824A1 (en) * 2006-04-18 2007-11-15 Cisco Technology, Inc. Network resource optimization in a video conference
US8326927B2 (en) * 2006-05-23 2012-12-04 Cisco Technology, Inc. Method and apparatus for inviting non-rich media endpoints to join a conference sidebar session
US8526336B2 (en) * 2006-08-09 2013-09-03 Cisco Technology, Inc. Conference resource allocation and dynamic reallocation
US8358763B2 (en) 2006-08-21 2013-01-22 Cisco Technology, Inc. Camping on a conference or telephony port
US8031701B2 (en) 2006-09-11 2011-10-04 Cisco Technology, Inc. Retransmission-based stream repair and stream join
US8120637B2 (en) 2006-09-20 2012-02-21 Cisco Technology, Inc. Virtual theater system for the home
US7847815B2 (en) * 2006-10-11 2010-12-07 Cisco Technology, Inc. Interaction based on facial recognition of conference participants
US7693190B2 (en) * 2006-11-22 2010-04-06 Cisco Technology, Inc. Lip synchronization for audio/video transmissions over a network
US8121277B2 (en) * 2006-12-12 2012-02-21 Cisco Technology, Inc. Catch-up playback in a conferencing system
US8149261B2 (en) * 2007-01-10 2012-04-03 Cisco Technology, Inc. Integration of audio conference bridge with video multipoint control unit
US8769591B2 (en) 2007-02-12 2014-07-01 Cisco Technology, Inc. Fast channel change on a bandwidth constrained network
US8208003B2 (en) * 2007-03-23 2012-06-26 Cisco Technology, Inc. Minimizing fast video update requests in a video conferencing system
US20080253369A1 (en) 2007-04-16 2008-10-16 Cisco Technology, Inc. Monitoring and correcting upstream packet loss
US8315652B2 (en) 2007-05-18 2012-11-20 Immersion Corporation Haptically enabled messaging
US8289362B2 (en) * 2007-09-26 2012-10-16 Cisco Technology, Inc. Audio directionality control for a multi-display switched video conferencing system
US8787153B2 (en) 2008-02-10 2014-07-22 Cisco Technology, Inc. Forward error correction based data recovery with path diversity
US8484293B2 (en) 2010-12-30 2013-07-09 International Business Machines Corporation Managing delivery of electronic meeting content
US9015555B2 (en) 2011-11-18 2015-04-21 Cisco Technology, Inc. System and method for multicast error recovery using sampled feedback
US11169655B2 (en) * 2012-10-19 2021-11-09 Gree, Inc. Image distribution method, image distribution server device and chat system
WO2017137948A1 (en) * 2016-02-10 2017-08-17 Vats Nitin Producing realistic body movement using body images
US10529115B2 (en) * 2017-03-20 2020-01-07 Google Llc Generating cartoon images from photos
US10586369B1 (en) * 2018-01-31 2020-03-10 Amazon Technologies, Inc. Using dialog and contextual data of a virtual reality environment to create metadata to drive avatar animation

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BE793543A (en) * 1971-12-30 1973-04-16 Ibm MECHANISM POSITION CODING METHODS
US4884972A (en) * 1986-11-26 1989-12-05 Bright Star Technology, Inc. Speech synchronized animation
US5111409A (en) * 1989-07-21 1992-05-05 Elon Gasper Authoring and use systems for sound synchronized animation
US5544317A (en) * 1990-11-20 1996-08-06 Berg; David A. Method for continuing transmission of commands for interactive graphics presentation in a computer network
US5630017A (en) * 1991-02-19 1997-05-13 Bright Star Technology, Inc. Advanced tools for speech synchronized animation
AU4538093A (en) * 1992-06-15 1994-01-04 Bunn, Daniel W. Audio communication system for a computer network
US5471318A (en) * 1993-04-22 1995-11-28 At&T Corp. Multimedia communications network
US5544315A (en) * 1993-05-10 1996-08-06 Communication Broadband Multimedia, Inc. Network multimedia interface
US5557724A (en) * 1993-10-12 1996-09-17 Intel Corporation User interface, method, and apparatus selecting and playing channels having video, audio, and/or text streams
US5475738A (en) * 1993-10-21 1995-12-12 At&T Corp. Interface between text and voice messaging systems
US5347306A (en) * 1993-12-17 1994-09-13 Mitsubishi Electric Research Laboratories, Inc. Animated electronic meeting place
GB2284968A (en) * 1993-12-18 1995-06-21 Ibm Audio conferencing system
US5502694A (en) * 1994-07-22 1996-03-26 Kwoh; Daniel S. Method and apparatus for compressed data transmission

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659692A (en) * 1992-01-13 1997-08-19 Massachusetts Institute Of Technology Computer method and apparatus for video conferencing
US5608839A (en) * 1994-03-18 1997-03-04 Lucent Technologies Inc. Sound-synchronized video system
US5491743A (en) * 1994-05-24 1996-02-13 International Business Machines Corporation Virtual conference system and terminal apparatus therefor
US5657426A (en) * 1994-06-10 1997-08-12 Digital Equipment Corporation Method and apparatus for producing audio-visual synthetic speech
US5880731A (en) * 1995-12-14 1999-03-09 Microsoft Corporation Use of avatars with automatic gesturing and bounded interaction in on-line chat session
US5923337A (en) * 1996-04-23 1999-07-13 Image Link Co., Ltd. Systems and methods for communicating through computer animated images

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040002325A1 (en) * 1997-07-22 2004-01-01 Evans Michael Paul Mobile handset with browser application to be used to recognize textual presentation
US20160294742A1 (en) * 2002-05-31 2016-10-06 Microsoft Technology Licensing, Llc Multiple personalities in chat communications
US10291556B2 (en) 2002-11-21 2019-05-14 Microsoft Technology Licensing, Llc Multiple personalities
US10504266B2 (en) 2003-03-03 2019-12-10 Microsoft Technology Licensing, Llc Reactive avatars
US10616367B2 (en) 2003-03-03 2020-04-07 Microsoft Technology Licensing, Llc Modifying avatar behavior based on user action or mood
US20050131677A1 (en) * 2003-12-12 2005-06-16 Assadollahi Ramin O. Dialog driven personal information manager
US20060109273A1 (en) * 2004-11-19 2006-05-25 Rams Joaquin S Real-time multi-media information and communications system
WO2008111085A2 (en) * 2007-03-13 2008-09-18 Oren Cohen A method and system for blind dating in an electronic dating service
US20110047267A1 (en) * 2007-05-24 2011-02-24 Sylvain Dany Method and Apparatus for Managing Communication Between Participants in a Virtual Environment
US8082297B2 (en) * 2007-05-24 2011-12-20 Avaya, Inc. Method and apparatus for managing communication between participants in a virtual environment
WO2008111085A3 (en) * 2008-03-13 2010-02-25 Oren Cohen A method and system for blind dating in an electronic dating service
US11654085B2 (en) 2018-05-18 2023-05-23 Baxter International Inc. Method of making dual chamber flexible container

Also Published As

Publication number Publication date
US5963217A (en) 1999-10-05

Similar Documents

Publication Publication Date Title
US5963217A (en) Network conference system using limited bandwidth to generate locally animated displays
EP1451672B1 (en) Rich communication over internet
US7788323B2 (en) Method and apparatus for sharing information in a virtual environment
US8115772B2 (en) System and method of customizing animated entities for use in a multimedia communication application
US8421805B2 (en) Smooth morphing between personal video calling avatars
US20100083324A1 (en) Synchronized Video Playback Among Multiple Users Across A Network
US20040128350A1 (en) Methods and systems for real-time virtual conferencing
US20090044112A1 (en) Animated Digital Assistant
TW200303519A (en) Method and apparatus for controlling the visual presentation of data
US20140282000A1 (en) Animated character conversation generator
CN114979682A (en) Multi-anchor virtual live broadcasting method and device
Agamanolis et al. Multilevel scripting for responsive multimedia
US20230334743A1 (en) Integrated input/output (i/o) for a three-dimensional (3d) environment
JP4625058B2 (en) Virtual space broadcasting device
JP2009048302A (en) Virtual space information summary preparation device
Leung et al. Creating a multiuser 3-D virtual environment
KR102510892B1 (en) Method for providing speech video and computing device for executing the method
JP3987483B2 (en) Multimedia content distribution system
KR102509106B1 (en) Method for providing speech video and computing device for executing the method
KR20230078204A (en) Method for providing a service of metaverse based on based on hallyu contents
KR100359389B1 (en) chatting system by ficture communication using distributted processing on internet
KR20230120940A (en) method for executing video chatting between 3D avatars of rendering an user&#39;s facial expression
CN115904159A (en) Display method and device in virtual scene, client device and storage medium
Goncalves et al. Expressive Audiovisual Message Presenter for Mobile Devices
Arons et al. Speech and audio in window systems: when will they happen?

Legal Events

Date Code Title Description
AS Assignment

Owner name: SILICON VALLEY BANK, GEORGIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:LEARN.COM, INC.;REEL/FRAME:018015/0782

Effective date: 20060728

AS Assignment

Owner name: 7TH LEVEL, INC., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GRAYSON, GEORGE D.;BELL, JAMES W.;HICKMAN, FRENCH E.;AND OTHERS;REEL/FRAME:018288/0239;SIGNING DATES FROM 19961115 TO 19961118

Owner name: LEARN2 CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:7THSTREET.COM, INC.;REEL/FRAME:018292/0318

Effective date: 20020314

Owner name: 7THSTREET.COM, INC., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:7TH LEVEL, INC.;REEL/FRAME:018289/0066

Effective date: 19990510

AS Assignment

Owner name: LEARN.COM, INC., FLORIDA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LEARN2 CORPORATION;REEL/FRAME:018305/0697

Effective date: 20020809

AS Assignment

Owner name: SILICON VALLEY BANK, CALIFORNIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:LEARN.COM, INC.;REEL/FRAME:021998/0981

Effective date: 20081125

STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION

AS Assignment

Owner name: LEARN.COM INC, FLORIDA

Free format text: RELEASE;ASSIGNOR:SILICON VALLEY BANK;REEL/FRAME:023003/0449

Effective date: 20090723

Owner name: LEARN.COM INC, FLORIDA

Free format text: RELEASE;ASSIGNOR:SILICON VALLEY BANK;REEL/FRAME:023003/0462

Effective date: 20090723