WO2009092060A2 - Scalable techniques for providing real-time per-avatar streaming data in virtual reality systems that employ per-avatar rendered environments - Google Patents

Scalable techniques for providing real-time per-avatar streaming data in virtual reality systems that employ per-avatar rendered environments

Info

Publication number
WO2009092060A2
Authority
WO
WIPO (PCT)
Prior art keywords
avatar
emission
filter
segments
segment
Prior art date
Application number
PCT/US2009/031361
Other languages
English (en)
Other versions
WO2009092060A3 (fr)
Inventor
James E. Toga
Ken Cox
Sidd Gupta
Rafal Boni
Original Assignee
Vivox Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivox Inc. filed Critical Vivox Inc.
Priority to CN200980110115.3A (CN102186544B)
Priority to EP09701763A (EP2244797A4)
Priority to CA2712483A (CA2712483A1)
Priority to JP2010543299A (JP2011510409A)
Publication of WO2009092060A2
Publication of WO2009092060A3


Classifications

    • A63F13/10
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50Controlling the output signals based on the game progress
    • A63F13/52Controlling the output signals based on the game progress involving aspects of the displayed game scene
    • A63F13/12
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/30Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/45Controlling the progress of the video game
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/50Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
    • A63F2300/53Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers details of basic data processing
    • A63F2300/534Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers details of basic data processing for network load management, e.g. bandwidth optimization, latency reduction
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/50Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
    • A63F2300/55Details of game data or player data management
    • A63F2300/5526Game data structure
    • A63F2300/5533Game data structure using program state or machine event data, e.g. server keeps track of the state of multiple players on in a multiple player game
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/50Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
    • A63F2300/55Details of game data or player data management
    • A63F2300/5546Details of game data or player data management using player registration data, e.g. identification, account, preferences, game history
    • A63F2300/5553Details of game data or player data management using player registration data, e.g. identification, account, preferences, game history user representation in the game field, e.g. avatar
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/6009Methods for processing data by generating or executing the game program for importing or creating game content, e.g. authoring tools during game development, adapting content to different platforms, use of a scripting language to create content
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/66Methods for processing data by generating or executing the game program for rendering three dimensional images
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/80Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
    • A63F2300/8082Virtual reality

Definitions

  • Virtual Reality Systems that employ per-Avatar Rendered Environments
  • the techniques disclosed herein relate to virtual reality systems and more particularly to the rendering of streaming data in multi-avatar virtual environments.
  • A virtual environment (VE) refers in this context to an environment created by a computer system that behaves in ways that conform to the expectations a user of the computer system has for a real-world environment.
  • the computer system that produces the virtual environment is termed in the following a virtual reality system and creation of the virtual environment by the virtual reality system is termed rendering the virtual environment
  • a virtual environment may include an avatar, in this context an entity belonging to the virtual environment that has a point of perception in the virtual environment.
  • the virtual reality system may render the virtual environment for the avatar as perceived from the avatar's point of perception.
  • a user of a virtual environment system may be associated with a particular avatar in the virtual environment.
  • a user who is associated with an avatar can interact with the virtual environment via the avatar: the user can not only perceive the virtual environment from the avatar's point of perception, but can also change the avatar's point of perception in the virtual environment and otherwise change the relationship between the avatar and the virtual environment or change the virtual environment itself.
  • Such virtual environments are termed in the following interactive virtual environments.
  • virtual environments - and in particular multi-avatar interactive virtual environments in which avatars for many users are interacting with the virtual environment at the same time - have moved from engineering laboratories and specialized application areas into widespread use.
  • multi-avatar virtual environments include environments with substantial graphical and visual content like those of massively-multiplayer on-line games - MMOGs, such as World of Warcraft ® - and user-defined virtual environments - such as Second Life ® .
  • each user of the virtual environment is represented by an avatar of the virtual environment, and each avatar has a point of perception in the virtual environment based on the avatar's virtual location and other aspects in the virtual environment.
  • Users of the virtual environment control their avatars and interact within the virtual environment via client computers such as PC or workstation computers.
  • the virtual environment is further implemented using server computers. Renderings for a user's avatar are produced on a user's client computer according to data sent from the server computers. Data is transmitted between the client computers and server computers of the virtual reality system over the network in data packets.
  • the appearance and actions of the avatar for a user are what other avatars in the virtual environment perceive - see or hear, etc. - as representing the user's appearance and action.
  • there is no requirement for the avatar to appear or be perceived as resembling any particular entity, and an avatar for a user may intentionally appear quite different from the user's actual appearance - this is one of the appealing aspects, to many users, of interaction in a virtual environment in comparison to interactions in the "real world".
  • the virtual reality system must render the virtual environment differently for different avatars in a multi-avatar virtual environment. What a first avatar perceives - e.g. "sees", etc. - will be from one point of perception, and what a second avatar perceives will be different.
  • the avatar “Ivan” might "see” avatars “Sue” and “David” and a virtual table from a particular location and virtual direction, but not see the avatar “Lisa” as that avatar is "behind” Ivan in the virtual environment and thus "out of view”.
  • a different avatar "Sue" might, at the same time, see the avatars Ivan, Sue, Lisa and David and two chairs from a completely different angle.
  • Another avatar “Maurice” might be at that moment in a completely different virtual location in the virtual environment, and not see any of the avatars Ivan, Sue, Lisa or David (nor do they see Maurice), but instead Maurice sees other avatars that are near the same virtual location as Maurice.
  • renderings that differ for different avatars are termed per-avatar renderings.
  • FIG. 2 shows an example of a per-avatar rendering for a particular avatar in an example virtual environment
  • FIG. 2 is a static image from the rendering - in actuality the virtual environment would render the scene dynamically and in color.
  • the point of perception in this example of rendering is that of the avatar for which the virtual reality system is making the rendering shown in FIG. 2.
  • a group of avatars for eight users have "gone" to a particular locale in the virtual environment - the locale contains two tiered platforms at 221 and 223.
  • the users - who may be in real-world locations very far apart - have arranged to "meet" (via their avatars) in the virtual environment for a conference to discuss something, and thus their avatars represent their presence in the virtual environment
  • the avatar for which the virtual reality system is making the rendering is not visible, as the rendering is made from the point of perception of that avatar.
  • the avatar for which the rendering is made is referred to in FIG. 2 as 299.
  • the figure contains an unattached label 299 with a brace encompassing the entire image to indicate that the rendering was made from the point of the avatar indicated by "299".
  • Four avatars are visible standing on platform 221, including avatars labeled 201, 209 and 213. The three remaining avatars are visible standing between the two platforms, including the avatar labeled 205.
  • the avatar 209 is standing behind the back of avatar 213.
  • in a rendering made for avatar 213, neither of the avatars 209 or 299 would be visible, as they would be "out of view" for avatar 213.
  • FIG. 2 is for a virtual reality system in which users may interact via their avatars, but the avatars cannot emit speech. Instead, in this virtual reality system, users make their avatars "speak" by typing text on keyboards: the virtual environment renders the text in a "text balloon" above the avatar for the user; optionally, a bubble with the name of the user's avatar is rendered the same way.
  • One example for the avatar 201 is shown at 203.
  • users can cause their avatars to move or walk from one virtual location to another, or to turn to face a different direction, by using the arrow keys on a keyboard.
  • Two examples of gesturing are visible: avatar 205 is gesturing, as can be seen from the raised hands and arms circled at 207, and avatar 209 is gesturing, as shown by the position of the hands and arms circled at 211.
  • Users can thus move, gesture, and converse with each other via their avatars. Users can, via their avatars, move to other virtual locations and places, meet with other users, hold meetings, make friends, and engage in many aspects of a "virtual life" within the virtual environment.
  • Problems in implementing large multi-avatar rendered environments
  • an emission is an output in the virtual environment which is produced by an entity in the virtual environment and which is perceivable by avatars in the virtual environment.
  • An example of such an emission is speech produced by one avatar in the virtual environment that is audible to other avatars in the virtual environment.
  • a characteristic of emissions is that they are represented in the virtual reality system by streaming data.
  • Streaming data in the present context is any data that has high data rates and changes unpredictably in real time. Because streaming data is constantly changing, it must be sent all the time, in a continual stream. In the context of a virtual environment, there may be many sources emitting streaming data at once. Further, the virtual location for the emission and the points of perception for possibly-perceiving avatars may change in real time.
  • Examples of kinds of emissions in a virtual environment include audible emissions that can be heard, visible emissions that can be seen, haptic emissions that can be felt by touch, olfactory emissions that can be smelled, taste emissions that can be tasted, and emissions peculiar to the virtual environment, such as virtual telepathic or force-field emissions.
  • a property of most emissions is intensity. The kind of intensity will of course depend on the kind of emission. With emissions of sound, for example, intensity is expressed as loudness.
  • Examples of streaming data are data representing sound (audio data), data representing moving images (video data), and also data representing continuous force or touch. New kinds of streaming data are constantly being developed. Emissions in a virtual environment may come from real-world sources, such as speech from the user associated with an avatar, or from generated or recorded sources.
  • the source of an emission in a virtual environment can be any entity of the virtual environment. Taking sound as an example, examples of audible emissions in a virtual environment include sounds made by entities in the virtual environment - e.g. an avatar emitting what the avatar's user speaks into a microphone, a generated gurgling sound emitted by a virtual waterfall, a blast sound emitted by a virtual bomb, a clicky-clack sound emitted by virtual high-heels on a virtual floor - and background sounds - e.g. a background sound of a virtual breeze or wind emitted by a region of the virtual environment, or background sound emitted by a virtual herd of chewing animals.
  • the sounds in a sequence of sounds, the relative locations of the emitting sources and avatars, the quality of the sounds emitted by the sources, the audibility and apparent loudness of the sounds to an avatar, and the orientation of each potentially-perceiving avatar, may in fact all change in real time. The same is the case with other kinds of emissions and kinds of streaming data.
  • whether a potentially-perceiving avatar can actually perceive the sequence of sounds emitted by a source at a given moment depends at least on the volume of the sounds emitted by the source at each moment. Further, it depends on the distance in the virtual environment between the source and the potentially-perceiving avatar at each moment.
  • sounds that are "too soft" relative to a point of perception in the virtual environment will not be audible to an avatar at that point of perception. Sounds that come from "far away” are heard or perceived as softer than when they come from a lesser distance.
  • the degree to which the sound is heard as softer with distance is termed a distance-weight factor in this context.
  • the intensity of a sound at the source is termed the intrinsic loudness of the sound.
  • the intensity of a sound at the point of perception is termed the apparent loudness.
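  • As a rough illustration of how these three terms relate (this compact formulation is an assumption drawn from the definitions above, not an explicit formula in the text), apparent loudness can be modeled as the intrinsic loudness scaled by the distance-weight factor:

```python
def apparent_loudness(intrinsic_loudness: float, distance_weight_factor: float) -> float:
    """Apparent loudness at a point of perception: the intrinsic loudness at
    the source scaled by a distance-weight factor (near 1.0 close to the
    source, smaller at greater virtual distance, 0.0 when out of range)."""
    return intrinsic_loudness * distance_weight_factor
```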
  • whether an emitted sound is audible to a particular avatar may also be determined by other aspects of the particular avatar's location relative to the source, the sounds the perceiving avatar is hearing concurrently from other sources, or by the quality of the sounds.
  • the principles of psychoacoustics include the fact that louder sounds in the real world can mask, or make inaudible, sounds that are less loud (based on apparent loudness for the individual listener). This is referred to as the relative loudness or volume of the sounds, where the apparent loudness of one sound is greater in relation to the apparent loudness of another sound.
  • Further psychoacoustic effects include that sounds of some qualities tend to be heard over other sounds: for example, humans may be especially good at noticing or hearing the sound of a baby crying, even when the sound is soft and there are other louder sounds at the same time.
  • Directionality thus depends not only on the virtual location of the avatar for which the sounds are audible, but also on the location of every source of potentially audible sound in the virtual environment, and further on the orientation in which the avatar is "facing" in the virtual environment.
  • a virtual reality system of the existing art that might perform acceptably for rendering emissions to and from a small handful of sources and avatars may simply be unable to cope with the tens of thousands of sources and avatars in a large multi-avatar rendered environment. In other words, such a system is not scalable to deal with large numbers of sources and avatars.
  • per-avatar rendering of emissions from multiple sources in a virtual environment such as audible emissions from multiple sources presents special problems, in that the streaming data representing the emissions from each source:
  • a virtual environment may support only "text chat” or “instant messages” in a broadcast or point-to-point fashion, and not have audio interaction between users via their avatars, because providing audio interaction is too difficult or costly.
  • a virtual environment implementation may only allow up to a low maximum number of avatars for the virtual environment, or partition the avatars so that only a low maximum number can be present at any time in a given "scene" in the virtual environment or permit only a limited number of users at a time to interact using emissions of streaming data.
  • Avatars may be limited to speaking and listening only on an open “party line”, with all sounds, or all sounds from the “scene” in the virtual environment, present all the time, and all avatars being given the same rendering of all the sounds.
  • Avatars may be able only to interact audibly when the avatars' users join an optional "chat session", for example a virtual intercom, with the speech of the avatars' users rendered at the original volumes and without direction, regardless of the virtual locations of the avatars in the environment.
  • chat session for example a virtual intercom
  • environmental media such as background sound for a waterfall may only be supported as sound generated locally in a client component for each user, such as playing a digital recording in a repeating loop, rather than as an emission in the virtual environment.
  • a separate control protocol is used in the network to manage the flow of streaming data.
  • One side effect is, due in part to the known problem of transmission delays on a network, a control event to change the flow of streaming data - such as to "mute" streaming data from a particular source, or to change the delivery of streaming data from being delivered to a first avatar to being delivered to a second avatar - may result in the change not taking place until after a noticeable delay: the control and delivery operations are not sufficiently synchronized.
  • an object of the invention is achieved by a filter in a system that renders an emission represented by a segment of streaming data.
  • the emission is rendered by the system as perceived at a point in time from a point of perception from which the emission is potentially perceivable
  • Characteristics of the filter include:
  • the filter has access to:
    • current emission information for the emission represented by the segment of streaming data at the point in time; and
    • current point of perception information for the filter's point of perception at the point in time represented by the segment of streaming data.
  • the filter makes a determination from the current point of perception information and the current emission information whether the emission represented by the segment's streaming data is perceptible at the filter's point of perception.
  • the system does not use the segment in rendering the emission at the filter's point of perception when the determination indicates that the emission represented by the segment's streaming data is not perceptible at that point in time at the filter's point of perception.
  • the filter is a component of a virtual reality system that provides a virtual environment in which sources in the virtual environment emit emissions which are potentially perceived by avatars in the virtual environment
  • the filter is associated with an avatar and determines whether an emission represented by a segment is perceptible in the virtual environment by the avatar at the avatar's current point of perception. If it is not, the segment representing the emission is not used in rendering the virtual environment for the avatar's point of perception.
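  • A minimal sketch of such a per-avatar filter, assuming hypothetical container types for the current emission information and the current point-of-perception information (the field names, the distance test, and the intensity test are illustrative choices, not taken from the specification):

```python
from dataclasses import dataclass

@dataclass
class EmissionInfo:
    """Current emission information for a segment (illustrative fields only)."""
    source_location: tuple   # virtual location of the emitting source
    intensity: float         # intensity of the emission at its source

@dataclass
class PerceptionInfo:
    """Current point-of-perception information for the filter's avatar."""
    location: tuple          # virtual location of the avatar's perceiving organ
    audible_range: float     # maximum virtual distance at which emissions are perceptible

def is_perceptible(emission: EmissionInfo, perception: PerceptionInfo) -> bool:
    """Return True if the emission represented by a segment is perceptible at
    the filter's point of perception; segments for which this is False are
    simply not used in rendering for that point of perception."""
    distance = sum((a - b) ** 2 for a, b in
                   zip(emission.source_location, perception.location)) ** 0.5
    return emission.intensity > 0 and distance <= perception.audible_range
```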
  • FIG. 1 shows a conceptual overview of the filtering techniques.
  • FIG. 2 shows a scene in an exemplary virtual environment.
  • users of the virtual environment who are represented by avatars are having a conference by having their avatars meet at a particular location in the virtual environment
  • FIG. 3 shows a conceptual view of the contents of a segment of streaming data in a preferred embodiment.
  • FIG. 4 shows a specification of a portion of the SIREN14-3D V2 RTP payload format.
  • FIG. 6 shows greater detail of Stage 2 filtering.
  • FIG. 7 illustrates an adjacency matrix.
  • reference numbers in the drawings have three or more digits: the two right-hand digits are reference numbers in the drawing indicated by the remaining digits. Thus, an item with the reference number 203 first appears as item 203 in FIG. 2.
  • the following Detailed Description of the invention discloses an embodiment in which the virtual environment includes sources of audible emissions and the audible emissions are represented by streaming audio data.
  • a virtual reality system, such as the kind exemplified by Second Life, is implemented in a networked computer system.
  • the techniques of this invention are integrated into the virtual reality system.
  • Streaming data representing sound emissions from sources of the virtual environment are communicated as segments of streaming audio data in data packets.
  • Information about the source of a segment relevant to determining perceptibility of the segment of the emission to an avatar is associated with each segment
  • the virtual reality system does per-avatar rendering on a rendering component, such as a client computer.
  • the rendering for the avatar is done on a client computer, and only the segments that would be audible to the avatar are sent via the network to the client computer. There, the segments are converted to audible output through headphones or speakers for the avatar's user.
  • an avatar need not be associated with a user, but may be any entity for which the virtual reality system makes a rendering.
  • an avatar may be a virtual microphone in the virtual environment.
  • a recording made using the virtual microphone would be a rendering of the virtual environment that consisted of those audio emissions in the virtual environment that were audible at the virtual microphone.
  • FIG. 1 shows a conceptual overview of the filtering techniques.
  • segments of streaming data representing emissions from different sources in the virtual environment are received to be filtered.
  • Each segment is associated with information about the source of the emission such as the location of the emission's source in the virtual environment and how intense the emission is at the source.
  • the emissions are audible emissions and the intensity is the loudness of the emission at the source.
  • segment routing component 105 has a segment stream combiner component 103 that combines the segments into an aggregated stream, as illustrated at 107.
  • the aggregated stream (consisting of all the sound streams' segments) is sent to a number of filter components.
  • Two examples of the filter components are shown at 111 and 121 - others are indicated by ellipses.
  • the filter component 111 is the filter component for the rendering for avatar(i). Details for filter 111 are shown at 113, 114, 115, and 117: the other filters operate in a similar fashion.
  • the filter component 111 filters the aggregated stream 107 for those segments of streaming data for a given kind of emission that are needed to render the virtual environment appropriately for avatar(i).
  • the filtering is based on current avatar information 113 of avatar (i) and current streaming data source information 114.
  • Current avatar information 113 is any information about the avatar which affects avatar(i)'s ability to perceive the emission. What the current avatar information is depends on the nature of the virtual environment. For example, in a virtual environment which has a notion of location, current avatar information may include the location in the virtual environment of the avatar's organ for detecting the emission. In the following, a location in a virtual environment will often be termed a virtual location. Of course, where there are virtual locations, there are also virtual distances between those locations.
  • Current streaming data source information is current information about the sources of streaming data that affects avatar (i)'s ability to perceive an emission from a particular source.
  • One example of current streaming data source information 114 is the virtual location of the source's emission generation component. Another is the intensity of the emission at the source.
  • the segments with streaming data that is perceptible to avatar (i) and therefore needed for rendering the virtual environment for avatar(i) at 119 are output from filter 111.
  • perceptibility may be based on the virtual distance between the source and the perceiving avatar and/or on the relative loudness of the perceptible segments.
  • the segments that remain after filtering by filter 111 are provided as input to a rendering component 117, which renders the virtual environment for the current point of perception of avatar(i) in the virtual environment.
  • the emissions of the sources are audible sounds and the virtual reality system is a networked system in which the rendering of sound for an avatar is done in a client computer used by a user who is represented by an avatar.
  • a user's client computer digitizes streaming sound input, and sends segments of the streaming data in packets over the network.
  • Packets for transmitting data over a network are known in the art
  • The following describes the content, also called the payload, of the streaming audio packets in the preferred embodiment; this discussion illustrates aspects of the techniques of this invention.
  • FIG. 3 shows in conceptual form the payload of a streaming audio segment.
  • an avatar may not only perceive audible emissions, but also be a source for them.
  • the virtual location of the avatar's speech generator may be different from the virtual location of the avatar's sound detector. Consequently, an avatar may have a different virtual location as a source of sound than it has as a perceiver of sound.
  • Element 300 shows in conceptual form the payload of a streaming data segment which is employed in the preferred embodiment.
  • the braces at 330 and 340 show respectively the two main portions of the segment payload, namely a header with metadata information about the streaming audio data represented by the segment, and the streaming audio data itself.
  • the metadata includes information such as the speaker location and the intensity.
  • the segment's metadata is part of current streaming data source information 114 for the source of the emission represented by the streaming data.
  • metadata 330 includes:
  • a userID value 301 that identifies the entity that is the source that emitted the sound represented by the streaming data in the segment. For a source that is an avatar, this identifies the avatar.
  • a sessionID value 302 identifying a session.
  • a session is a set of sources and avatars.
  • a set of flags 303 indicating further information, such as information about the source's state at the time of the emission represented by this segment of streaming data.
  • One flag indicates the nature of the location value 305, "speaker” or "listener” location.
  • the intensity value 307 for audible emissions is computed from the intrinsic loudness of the sound, according to principles known in the relevant arts.
  • Other kinds of emissions may employ other values to express the intensity of the emission. For example, for an emission that appeared as text in the virtual environment, an intensity value may be input separately by a user, or text that is all UPPER-CASE may be given an intensity value which is greater than text that is Mixed-Case or all lowercase.
  • intensity values may be chosen as a matter of design such that the intensity of different kinds of emissions can be compared with each other, such as in filtering.
  • the streaming data segment is shown at 340 and the associated brace. In the segment, the data portion of the segment is shown as starting at 321, continuing with all the data in the segment, and ending at 323.
  • the data in the streaming data portion 340 represents the emitted sound in a compressed format: the client software that creates the segments also converts the audio data to a compressed representation, so that less data (and thus fewer or smaller segments) need to be sent over the network.
  • a compressed format based on a Discrete Cosine Transform is used to transform the signal data from the time domain into the frequency domain, and to quantize a number of sub-bands according to psychoacoustic principles.
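  • A sketch of how the payload just described might be modeled in code; the field names follow FIG. 3, but the concrete Python types and layout are assumptions for illustration, not the on-the-wire format:

```python
from dataclasses import dataclass

@dataclass
class SegmentPayload:
    """One streaming-audio segment as described above; the reference numbers
    in the comments refer to FIG. 3."""
    # header / metadata portion (330)
    user_id: int        # 301: identifies the source that emitted the sound
    session_id: int     # 302: identifies the session the segment belongs to
    flags: int          # 303: e.g. whether location 305 is a "speaker" or "listener" location
    location: tuple     # 305: virtual (x, y, z) location
    intensity: int      # 307: intrinsic loudness of the emitted sound
    # streaming data portion (340)
    audio: bytes        # compressed audio data (e.g. DCT-based, psychoacoustically quantized)
```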
  • the representation may be in a different representation domain, and further the emission may be rendered in a different domain: speech emissions may be represented or rendered as text using speech-to-text algorithms or vice versa, sound emissions may be represented or rendered visually or vice versa, virtual telepathic emissions may be represented or rendered as a different kind of streaming data, and so forth.
  • FIG. 5 is a system view of the preferred embodiment, showing the operation of Stage I and Stage 2 filtering. FIG. 5 will now be described in overview.
  • a segment has a field for a sessionID 302.
  • Each segment which contains streaming data belongs to a session and carries an identifier for the session the segment belongs to in sessionID field 302.
  • a session identifies a group of sources and avatars, referred to as the members of the session.
  • the set of sessions of which a source is a member is included in current source information 114 for that source.
  • the set of sessions of which an avatar is a member is included in current avatar information 113 for that avatar.
  • Techniques for representing and managing the members of a group and implementing systems to do so are familiar in the relevant arts.
  • the representation of session membership is referred to in the preferred embodiment as the session table.
  • a positional session is a session whose members are sources of emissions and avatars for which the emissions from the sources are at least potentially detectable in the virtual environment
  • a given source of an audible emission and any avatar which can potentially hear an audible emission from the given source must be a member of the same positional session.
  • the preferred embodiment has only a single positional session. Other embodiments may have more than one positional session.
  • a static session is a session whose membership is determined by users of the virtual reality system. Any audible emission made by an avatar belonging to a static session is heard by every other avatar belonging to that static session, regardless of the locations of the avatars in the virtual environment. Static sessions thus work like telephone conference calls.
  • the virtual reality system of the preferred embodiment provides a user interface which permits a user to specify the static sessions that their avatar belongs to.
  • Other embodiments of filter 111 may involve different kinds of sessions or no sessions at all.
  • One extension to the implementation of sessions in the presently-preferred embodiment would be a set of session ID special values which would indicate not a single session, but a group of sessions.
  • the kind of session that is specified by a segment's sessionID determines how the segment is filtered by filter 111. If the sessionID specifies a positional session, the segments are filtered to determine whether the avatar for the filter can perceive the source in the virtual environment. Segments which the avatar for the filter can perceive are then filtered by the relative loudness of the sources. In the latter filter, the segments from the positional session that are perceptible by the filter's avatar are filtered together with the segments from the static sessions of which the avatar is a member.
  • every source of an audible emission in the virtual environment makes segments for the audible emission which have the sessionID for the positional session; if the source is also a member of a static session and the emission is also audible in the static session, the source further makes a copy of each of the segments for the audible emission which have the sessionID for the static session.
  • An avatar to which the audible emission is perceptible in the virtual environment and which is also a member of a static session in which the emission is audible may thus receive more than one copy of the segment in its filter.
  • the filter detects the duplicates and passes only one of the segments on to the avatar.
  • Elements 501 and 509 are two of a number of client computers.
  • the client computers are generally 'personal' computers, with hardware and software for the integrated system implementation with the virtual environment: for example, the client computer has an attached microphone, keyboard, display, and headphones or speakers, and has software for performing client operations of the integrated system.
  • the client computers are connected to a network, as shown at 502 and 506 respectively.
  • Each client may control an avatar as directed by a user of the client
  • the avatar can emit sounds in the virtual environment and/or hear sounds emitted by sources.
  • the streaming data that represents the emissions in the virtual reality system is produced in the client when the client's avatar is a source of the emissions and is rendered in the client when the client's avatar can perceive the emissions. This is illustrated by the arrows in both directions between client computers and networks, such as between client 501 and network 502, and between client 509 and network 506.
  • network connections for segments and streaming data between components such as client 501 and the filtering system 517 employ standard network protocols such as the RTP and SIP network protocols for audio data - RTP and SIP protocols and many other techniques for network connections and connection management that are suitable are known in the art.
  • RTP supports management of data by its arrival time, and upon a request for data which includes a time value, can return that data which have an arrival time which is the same or less recent than the time value.
  • the networks at 502 and 506 are shown as separate networks in FIG. 5, but of course may be the same network or interconnected networks.
  • filtering system 517 is in a server stack in the integrated system, separate from the server stacks of the unintegrated virtual reality system.
  • the filtering system has per-avatar filters 512 and 516 for the clients' avatars.
  • Each per-avatar filter filters streaming data representing audible emissions from a number of sources in the virtual environment.
  • the filtering determines the segments of streaming data representing audible emissions that are audible to a particular client's avatar, and sends the streaming audio for the audible segments over the network to the avatar's client. As shown at 503, segments that are audible to an avatar representing the user of client 501 are sent over the network 502 to client 501.
  • Associated with each source of emissions is current emission source information: current information about the emission and its source and/or information about its source, where the information may vary in real time. Examples are the quality of the emission at its source, the intensity of the emission at the source, and the location of the emission source.
  • current emission source information 114 is obtained from metadata in segments representing emissions from the source.
  • filtering is performed in two stages.
  • the filtering process employed in filtering system 517 is broadly as follows.
  • Stage 1 filtering (positional session): For a segment and an avatar, the filtering process determines the virtual distance separating the source of the segment from the avatar, and whether the source of the segment would be within a threshold virtual distance of the avatar.
  • the threshold distance defines the audible vicinity for the avatar; emissions from sources outside this vicinity are not audible to the avatar. Segments which are outside the threshold are not passed on to Stage 2 filtering. This determination is done efficiently by considering metadata information for the segment such as the sessionID described above, current source information 114 for the source, and current avatar information 113 for the avatar. This filtering generally reduces the number of segments that must be filtered as described for Stage 2 filtering below.
  • Stage 1 filtering (static sessions): For a segment and an avatar, the filtering process determines whether the filter's avatar is a member of the session identified by the sessionID of the segment. If the filter's avatar is a member of the session, the segment is passed on to Stage 2 filtering. This filtering generally reduces the number of segments to be filtered as described for Stage 2 filtering below.
  • Stage 2 filtering: The filtering process determines the apparent loudness of all segments for this avatar which are passed by the Stage 1 filtering. The segments are then sorted by their apparent loudness, duplicate segments from different sessions are removed, and a subset consisting of the three segments with the greatest apparent loudness is sent to the avatar for rendering. The size of the subset is a matter of design choice. The determination is done efficiently by considering the metadata. Duplicate segments are ones that have the same userID and different sessionIDs.
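  • Taken together, the two stages for a single avatar might be outlined as follows; the helper callables and the segment attribute names are assumptions, while the subset size of three follows the preferred embodiment:

```python
def filter_for_avatar(current_segments, avatar, in_audible_vicinity,
                      is_static_member, apparent_loudness, subset_size=3):
    """Outline of the two-stage, per-avatar filtering described above.

    Stage 1: keep a segment if its source is inside the avatar's audible
    vicinity (positional session) or the avatar is a member of the segment's
    static session.
    Stage 2: rank the survivors by apparent loudness, drop duplicates that
    arrived through different sessions, and keep the loudest subset."""
    stage1 = [s for s in current_segments
              if in_audible_vicinity(s, avatar) or is_static_member(s, avatar)]
    ranked = sorted(stage1, key=lambda s: apparent_loudness(s, avatar), reverse=True)
    seen, kept = set(), []
    for s in ranked:
        if s.user_id not in seen:     # duplicates share a userID but differ in sessionID
            seen.add(s.user_id)
            kept.append(s)
    return kept[:subset_size]
```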
  • the components of filter system 517 that filter only segments belonging to the positional session are indicated by the upper brace on the right at 541, and the components that filter only segments belonging to static sessions are indicated by the lower brace at 542.
  • the components that do Stage 1 filtering are indicated by the brace at the bottom on the left at 551.
  • the components that do Stage 2 filtering are indicated by the brace at the bottom on the right at 552.
  • filter system component 517 is located on a server in the virtual reality system of the preferred embodiment
  • a filter for an avatar may however in general be located at any point in the path between the source of the emission and the rendering component for the avatar the filter is associated with.
  • Session manager 504 receives all incoming packets and provides them to segment routing 540, which performs Stage 1 filtering by directing the segments that are perceptible to a given avatar, either via the positional session or a static session, to the appropriate per-avatar filters for Stage 2 filtering.
  • sets of segments output from segment routing component 540 are input to representative per-avatar filters 512 and 516 for each avatar.
  • Each avatar that can perceive the kind of emission represented by the streaming data has a corresponding per- avatar filter.
  • Each per-avatar filter selects from the segments belonging to each source those segments that are audible to the destination avatar, sorts them in terms of their apparent loudness, removes any duplicate segments, and sends the loudest three of the remaining segments to the avatar's client over the network.
  • FIG. 4 shows a more detailed description of the relevant aspects of the payload format for these techniques.
  • the payload format may also include non- streaming data used by the virtual reality system.
  • the integrated system of the preferred embodiment is exemplary of some of the many ways in which the techniques can be integrated with a virtual reality system or other application.
  • the format used in this integration is referred to as the SIREN14-3D format
  • the format makes use of encapsulation to carry multiple payloads in one network packet.
  • Element 401 states that this part of the specification concerns the preferred SIREN14-3D V2 RTP version of this format, and that one or more encapsulated payloads are carried by a network packet that is transmitted across the network using the RTP network protocol.
  • a SIREN14-3D V2 RTP payload consists of an encapsulated media payload with audio data, followed by zero or more other encapsulated payloads.
  • the content of each encapsulated payload is given by headerFlags flag bits 414, described below.
  • Element 410 describes the header portion of an encapsulated payload in the V2 format. Details of element 410 describe individual elements of metadata in the header 410.
  • the first value in the header is a userID value that is 32 bits in size - this value identifies the source of the emission for this segment.
  • smoothedEnergyEstimate 413 is the metadata value for the intensity value for the intrinsic loudness of the segment of audio data that follows the header: the value is an integer value in units of the particular system implementation.
  • the smoothedEnergyEstimate value 413 is a long-term "smoothed" value determined by smoothing together a number of original or "raw" values from the streaming sound data. This prevents undesirable filter results that could otherwise result from sudden moments of noise (such as "clicks") or from data artifacts caused by the digitizing process for sound data in the client computer that may be present in the audio data.
  • the value in this preferred embodiment is computed for a segment using techniques known in the art for computing the audio energy reflected by the sound data of the segment.
  • an IIR (Infinite Impulse Response) filter, for example, may be used for this smoothing, as sketched below.
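  • A single-pole IIR smoother is one simple way such a smoothed estimate could be computed from raw per-segment energy values; the exact filter and coefficient used by the preferred embodiment are not spelled out here, so the sketch below is an assumption:

```python
def smoothed_energy_estimates(raw_energies, alpha=0.125):
    """First-order IIR ("exponential") smoothing of raw per-segment energy
    values, so that clicks and digitizing artifacts do not dominate the
    intensity metadata. The coefficient alpha is an arbitrary example."""
    estimate = 0.0
    out = []
    for raw in raw_energies:
        estimate = (1.0 - alpha) * estimate + alpha * raw
        out.append(estimate)
    return out
```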
  • Element 413 is followed by headerFlags 414, consisting of 32 flag bits. A number of these flag bits are used to indicate the kind of data and format that follows the header in the payload.
  • Element 428 describes the flag for an AUDIO_ONLY payload, with the numeric flag value of 0x1: this flag indicates the payload data consists of 80 bytes of audio data in a compressed format for a segment of streaming audio.
  • Element 421 describes the flag for a SPEAKER_POSITION payload, with the numeric flag value of 0x2: this flag indicates that the payload data includes metadata consisting of the current virtual location of the "mouth" or speaking part of the source avatar. This may be followed by 80 bytes of audio data in a compressed format for a segment of streaming audio.
  • the location update data consist of three values for the X, Y and Z location in co-ordinates of the virtual environment.
  • each source which is an avatar sends a payload with SPEAKER_POSITION information 2.5 times a second.
  • Element 422 describes the flag for a LISTENER_POSITION payload, with the numeric flag value of 0x4: this flag indicates that the payload data includes metadata consisting of the current virtual location of the "ears" or listening part of the avatar. This may be followed by 80 bytes of audio data.
  • the location information allows the filter implementation to determine which sources are in the particular avatar's "audible vicinity". In the preferred embodiment, each source which is an avatar sends a payload with LISTENER_POSITION information 2.5 times a second.
  • Element 423 describes the flag for a LISTENER_ORIENTATION payload, with the numeric flag value of 0x10: this flag indicates that the payload data includes metadata consisting of the current virtual orientation or facing direction of the listening part of the user's avatar. This information allows the filter implementation and the virtual environment to extend the virtual reality so that an avatar can have "directional hearing" or a special virtual anatomy for hearing, like the ears of a rabbit or a cat.
  • Element 424 describes the flag for a SILENCE_FRAME payload, with the numeric flag value of 0x20: this flag indicates that the segment represents silence.
  • the source sends SILENCE_FRAME payloads as necessary in order to send SPEAKER_POSITION and LISTENER_POSITION payloads with location metadata as described above.
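  • The header flag values above can be collected into constants for filtering and routing code; this is a sketch only (the byte-level SIREN14-3D V2 layout is not reproduced), with the numeric values taken from the elements just described:

```python
# Flag values follow the numeric values given above; the flag names and the
# helper function are illustrative, not the exact SIREN14-3D V2 definitions.
AUDIO_ONLY           = 0x01  # 80 bytes of compressed audio follow the header
SPEAKER_POSITION     = 0x02  # current virtual location of the source's "mouth"
LISTENER_POSITION    = 0x04  # current virtual location of the avatar's "ears"
LISTENER_ORIENTATION = 0x10  # current facing direction of the listening part
SILENCE_FRAME        = 0x20  # segment represents silence (used to carry position updates)

def payload_kinds(header_flags: int) -> list:
    """Return the payload kinds indicated by a headerFlags word (414)."""
    names = {AUDIO_ONLY: "audio only",
             SPEAKER_POSITION: "speaker position",
             LISTENER_POSITION: "listener position",
             LISTENER_ORIENTATION: "listener orientation",
             SILENCE_FRAME: "silence frame"}
    return [name for bit, name in names.items() if header_flags & bit]
```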
  • audio emissions from an avatar are never rendered for that same avatar, and do not enter into any filtering of streaming audio data for that avatar: this is a matter of design choice. This choice is in keeping with the known practice of suppressing or not rendering "side-tone" audio or video signals in digital telephony and video communications.
  • An alternative embodiment may process and may filter emissions from a source that is also an avatar when determining what is perceptible for that same avatar.
  • the filtering techniques described here can be integrated with management functions of the virtual environment to achieve greater efficiency both in filtering streaming data, and in the management of the virtual environment
  • The operation of filtering system 517 will now be described in detail.
  • the session manager 504 reads a time value from an authoritative master clock.
  • the session manager then obtains from the connections for incoming segments all those segments that have an arrival time the same as that time value or earlier. If more than one segment from a given source is returned, the less recent segments from that source are discarded. The segments remaining are referred to as the set of current segments.
  • Session manager 504 then provides the set of current segments to segment routing component 540, which routes the current segments to specific per-avatar filters. The operation of the segment routing component will be described below. Segments which are not provided to segment routing component 540 are not filtered and are thus not delivered for rendering to an avatar.
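  • A sketch of how the session manager's selection of the set of current segments might look, assuming each incoming segment exposes an arrival time and a source identifier (both attribute names are assumptions):

```python
def select_current_segments(incoming_segments, clock_time):
    """Keep, for each source, only the most recent segment whose arrival time
    is at or before the master-clock time; less recent segments from the same
    source are discarded."""
    latest = {}
    for seg in incoming_segments:
        if seg.arrival_time <= clock_time:
            prev = latest.get(seg.user_id)
            if prev is None or seg.arrival_time > prev.arrival_time:
                latest[seg.user_id] = seg
    return list(latest.values())
```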
  • Segment routing component 540 does stage 1 filtering on segments belonging to the positional session using adjacency matrix 535, which is a data table that records which sources are within the audibility vicinity of which avatars: the audibility vicinity of an avatar is the portion of the virtual environment that is within a specific virtual distance of the hearing part of the avatar. In the preferred embodiment, this virtual distance is 80 units in the virtual coordinate units of the virtual reality system. Sound emissions that are farther away from the hearing part of an avatar than this virtual distance are not audible to the avatar.
  • Adjacency matrix 535 is illustrated in detail in FIG. 7.
  • Adjacency matrix 535 is a two-dimensional data table. Each cell represents a source/avatar combination and contains a distance-weight value for the source-avatar combination.
  • the distance weight value is a factor for adjusting the intrinsic loudness or intensity value for a segment according to the virtual distance between the source and the avatar: the distance-weight factor is less at greater virtual distance.
  • the distance weight value is computed by a clamped formula for roll-off as a linear function of distance.
  • Other formulae may be used instead: for example, a formula may be chosen that is approximate for more efficient operation, or that includes effects such as clamping, minimum and maximum loudness, more dramatic or less dramatic roll-off effects, or other effects. Any formula appropriate to the particular application may be used as a matter of design choice.
  • the adjacency matrix has one row for each source, shown in FIG. 7 along the left side at 710 as A, B, C, etc. There is one column for each destination or avatar, as shown across the top at 720 as A, B, C, and D.
  • an avatar is also a source: accordingly, for an avatar B there is a column B at 732 as well as a row B at 730, but there may be more or fewer sources than avatars, and sources which are not avatars and vice versa.
  • Each cell in the adjacency matrix is at the intersection of a row and column (source, avatar). For example, row 731 is the row for source D, and column 732 is the column for avatar B.
  • Each cell in the adjacency matrix contains either a distance weight value of 0, indicating that the source is not within the audibility vicinity of the avatar or is not audible to the avatar, or a distance weight value between 0 and 1: this value is the distance-weight factor computed according to the formula described above, which is the factor by which an intensity value should be multiplied to determine the apparent loudness for an emission from that source at that destination.
  • the cell 733 at the intersection of the row and the column holds the value of the weight factor for (D, B), which is shown in this example as 0.5.
  • the weight factor is computed using the current virtual location of the source represented by the cell's row and the current virtual location of the "ears" of the avatar represented by the column.
  • the cell for each avatar and itself is set to zero and is not changed, in keeping with treatment for side-tone audio known in the art of digital communications, that sound from an entity which is a source is not transmitted to the entity as a destination.
  • the values in the cells along diagonal 735 are shown in bold text for better readability.
  • the sources and other avatars send segments of streaming data with position data for their virtual locations 2.5 times a second.
  • the session manager 504 passes the location values and the userID of the segment 114 to the adjacency matrix updater 530 to update the location information associated with the segment's source or other avatar in the adjacency matrix 535, as indicated at 532.
  • the adjacency matrix updater 530 periodically updates the distance weight factors in all cells of the adjacency matrix 535. In the preferred embodiment, this is done 2.5 times per second, as follows:
  • the adjacency matrix updater 530 obtains the associated location information for each row of the adjacency matrix 535 from the adjacency matrix 535. After obtaining this location information for a row, the adjacency matrix updater 530 obtains the location information for the hearing part of the avatar for each column of the adjacency matrix 535. Obtaining the location information is indicated at 533.
  • the adjacency matrix updater 530 determines the virtual distance between the source location and the location of the hearing part of the avatar. If the distance is greater than the threshold distance for the audibility vicinity, the distance weight for the cell corresponding to the row of the source and the column of the avatar in adjacency matrix 535 is set to zero, as shown. If the source and the avatar are the same, the value is left unchanged as zero as noted above. Otherwise, the virtual distance between source X and destination Y is computed, and a distance weight value computed according to the formula described above: the distance weight value for the cell is set to this value. Updating the distance weight value is illustrated at 534.
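  • The periodic update of adjacency matrix 535 might be sketched as follows; the clamped linear roll-off and the 80-unit threshold come from the description above, while the particular formula and the dictionary-of-dictionaries representation are assumptions for illustration:

```python
AUDIBLE_DISTANCE = 80.0   # audibility-vicinity threshold, in virtual coordinate units

def distance_weight(source_location, ear_location, max_distance=AUDIBLE_DISTANCE):
    """Clamped linear roll-off: 1.0 at zero distance, 0.0 at or beyond the
    audibility threshold. The exact roll-off formula is a design choice."""
    d = sum((a - b) ** 2 for a, b in zip(source_location, ear_location)) ** 0.5
    return 0.0 if d >= max_distance else 1.0 - d / max_distance

def update_adjacency_matrix(matrix, source_locations, ear_locations):
    """Recompute every (source, avatar) cell from the current locations; the
    diagonal cells (a source that is the same entity as the avatar) stay zero,
    matching the side-tone suppression described above."""
    for src, src_loc in source_locations.items():
        row = matrix.setdefault(src, {})
        for avatar, ear_loc in ear_locations.items():
            row[avatar] = 0.0 if src == avatar else distance_weight(src_loc, ear_loc)
```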
  • if segment routing component 540 determines that a source is outside the audibility vicinity of an avatar, segment routing component 540 does not route segments from the source to the Stage 2 filter for the avatar, and thus these segments will not be rendered for the avatar.
  • session manager 504 also provides the current segments belonging to static sessions to segment routing component 540, for potential delivery to Stage 2 filter components such as those illustrated at 512 and 516.
  • the segment routing component 540 determines the set of avatars to which a particular segment for an emission should be sent and sends the segment to the Stage 2 filters for those avatars.
  • the segments from a particular source which are sent to a particular stage 2 filter during a particular time slice may include segments from different sessions and may include duplicate segments.
  • For segments belonging to a static session, the segment routing component accesses the session table, described below, to determine the set of all avatars that are members of that session. This is shown at 525. The segment routing component then sends the segment to each of the Stage 2 filters associated with those avatars.
  • For segments belonging to the positional session, the segment routing component accesses adjacency matrix 535. From the row of the adjacency matrix corresponding to the source of the packet, the segment routing component determines all the columns of the adjacency matrix that have a distance weight factor which is not zero, and the avatars of each such column. This is shown at 536, labeled "Adjacent avatars". The segment routing component then sends the segment to each of the Stage 2 filters associated with those avatars. A sketch of this routing decision follows.
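  • The sketch reuses the AdjacencyMatrix sketch above. The names route_segment, POSITIONAL_SESSION_ID, the segment fields, and the accept() call on a Stage 2 filter are hypothetical; session_table is assumed to offer a members() lookup, sketched after the description of the session table below.

        POSITIONAL_SESSION_ID = "positional"   # assumed identifier for the positional session

        def route_segment(segment, session_table, adjacency, stage2_filters):
            # send a segment to the Stage 2 filters of every avatar that should receive it
            if segment.session_id == POSITIONAL_SESSION_ID:
                # positional session: avatars whose distance weight for this source is non-zero
                destinations = [avatar for (src, avatar), weight in adjacency.cells.items()
                                if src == segment.source_id and weight > 0.0]
            else:
                # static session: every member of the session receives the segment
                destinations = session_table.members(segment.session_id)
            for avatar in destinations:
                stage2_filters[avatar].accept(segment)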
  • Session table 521 defines membership in sessions.
  • the session table is a two-column table: the first column contains a session ID value, and the second column contains an entity identifier such as an identifier for a source or avatar.
  • An entity is a member of all sessions identified by the session ID value in all rows for which the entity's identifier is in the second column.
  • the members of a session are all the entities appearing in the second column of all rows that have the session's session ID in the first column.
  • the session table is updated by a session table updater component 520, which responds to changes in static session membership by adding rows to or removing rows from the session table. A sketch of the session table follows.
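  • In the sketch, the class name SessionTable and its methods are hypothetical, with a set of (session ID, entity ID) pairs standing in for the two-column table just described.

        class SessionTable:
            # two-column table of (session ID, entity ID) rows defining static session membership
            def __init__(self):
                self.rows = set()   # each row is a (session_id, entity_id) pair

            def add_member(self, session_id, entity_id):
                self.rows.add((session_id, entity_id))

            def remove_member(self, session_id, entity_id):
                self.rows.discard((session_id, entity_id))

            def members(self, session_id):
                # all entities appearing in a row whose first column holds this session ID
                return [entity for (sid, entity) in self.rows if sid == session_id]

            def sessions_of(self, entity_id):
                # all sessions in whose rows this entity's identifier appears
                return [sid for (sid, entity) in self.rows if entity == entity_id]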
  • In either case, segment router 540 routes the segment to the Stage 2 filter for the avatar.
  • FIG. 6 shows the operation of a Stage 2 filtering component such as 512 of the preferred embodiment.
  • Each Stage 2 filtering component is associated with a single avatar.
  • 600 shows a set of current segments 505 delivered to the Stage 2 filtering component.
  • Representative segments 611, 612, 613, 614 and 615 are shown. Ellipses illustrate that there may be any number of segments.
  • the start of Filtering 2 processing is shown at 620.
  • the next set of current segments 505 is obtained as input.
  • the steps of elements 624, 626, 628 and 630 are performed for each segment in the set of current segments obtained in step 622.
  • 624 shows the step of getting, from each segment, the energy value and the source ID of the segment.
  • the sessionID value is obtained. If the session ID value is that of the positional session, the next step is 628, as shown. If the session ID value is that of a static session, the next step is 632.
  • 630 shows the step of multiplying the energy value of the segment by the distance weight obtained from the adjacency matrix cell for the segment's source and the avatar, to adjust the energy value of the segment.
  • 632 shows the step of sorting all the segments obtained in step 622 by the energy value of each segment. After the segments have been sorted, all but one of any set of duplicate segments are removed.
  • 634 shows the step of outputting a subset of the segments obtained in step 622 as the output of the Stage 2 filtering. In the preferred embodiment, the subset is the three segments with the greatest energy values as determined by the sorting in step 632. A sketch of these steps is given below, after the description of the output.
  • the output is represented at 690, showing representative segments 611, 614, and 615.
  • selection of the segments to be output to the avatar may include sorting and selection criteria different from those employed in the preferred embodiment.
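  • The sketch of the Stage 2 filtering steps reuses the names from the earlier sketches; the function name stage2_filter is hypothetical, and seg.sequence_number is a hypothetical field used here only to recognize duplicate segments delivered via different sessions.

        MAX_OUTPUT_SEGMENTS = 3   # the preferred embodiment outputs the three loudest segments

        def stage2_filter(avatar_id, current_segments, adjacency):
            # adjust, sort, de-duplicate, and truncate the current segments for one avatar
            adjusted = []
            for seg in current_segments:
                energy = seg.energy
                if seg.session_id == POSITIONAL_SESSION_ID:
                    # positional segments: weight the energy by the (source, avatar) cell value
                    energy *= adjacency.cells.get((seg.source_id, avatar_id), 0.0)
                adjusted.append((energy, seg))
            # sort by adjusted energy, loudest first
            adjusted.sort(key=lambda pair: pair[0], reverse=True)
            # remove all but one of any set of duplicates, keeping the loudest copy
            seen, output = set(), []
            for energy, seg in adjusted:
                key = (seg.source_id, seg.sequence_number)
                if key not in seen:
                    seen.add(key)
                    output.append((energy, seg))
            return output[:MAX_OUTPUT_SEGMENTS]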
  • segments representing audio emissions that are perceptible for a given avatar are rendered for that avatar according to the avatar's point of perception.
  • the rendering is performed on the user's client computer, and streams of audio data are rendered at an appropriate apparent volume and stereophonic or binaural direction according to the virtual distance and relative direction for the source and the user's avatar.
  • Because the segments sent to the renderer include the segments' metadata, the metadata that was used for filtering can also be used in the renderer.
  • For example, the segment's energy value, which may have been adjusted during Stage 2 filtering, can be used in the rendering process. There is thus no need to transcode or modify the encoded audio data originally sent by the source, and the rendering therefore does not suffer any loss of fidelity or intelligibility.
  • Rendering is of course also greatly simplified by the reduction in the number of segments to be rendered that has resulted from the filtering.
  • the rendered sound is output for the user by playing the sound over headphones or speakers of the client computer.
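  • The client-side rendering step can be sketched as follows. The function render_parameters, the fields used for the source location, and the simple pan computation are illustrative assumptions about how the filtered metadata might be applied, not a description of any particular audio API; the encoded audio data itself is decoded and played unchanged.

        import math

        def render_parameters(segment, ear_location, ear_facing, adjusted_energy):
            # derive an apparent volume and a simple stereo pan from the segment's metadata
            sx, sy, _sz = segment.source_location      # virtual location carried in the metadata
            ax, ay, _az = ear_location
            gain = min(1.0, adjusted_energy)           # apparent loudness from the filtered energy value
            bearing = math.atan2(sy - ay, sx - ax) - ear_facing
            pan = math.sin(bearing)                    # -1.0 = fully left, +1.0 = fully right
            return gain, pan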
  • the filtering may be implemented in a distributed embodiment, in a parallel fashion, or employing virtualization of computer resources. Further, filtering according to the techniques can be performed in various combinations and at various points in a system, with choices being made as required to best utilize the virtual reality system's network bandwidth and/or processing power.
  • any kind of filtering techniques may be employed that will separate segments that represent emissions that are perceptible to a particular avatar from segments that represent emissions that are not perceptible to the particular avatar.
  • many kinds of filtering can be employed singly, in sequence, or in combination using techniques of this invention.
  • filtering according to the techniques of this invention can be used with any kind of emission and in any kind of virtual environment in which relationships between the source of an emission and the perceivers of an emission may vary in real time.
  • the preferred embodiment's use of relative loudness filtering with segments belonging to static sessions is an example of the use of the techniques in a situation where filtering is not dependent on location.
  • the technique used with the static segments may, for example be used in telephone conference call applications.
  • The techniques may also be applied to text messaging communications, such as when streams of text messaging data from a number of avatars must be displayed or rendered concurrently in a virtual environment. This is one of many possible examples of streaming visual data to which the techniques may be applied.
  • the kinds of information needed to filter the emissions of the sources will depend on the properties of the virtual environment, and the properties of the virtual environment may in turn depend on the application for which it is intended. For example, in a virtual environment for a conferencing system, the positions of the conferees relative to each other may not be important; in such a situation, filtering might be done only on the basis of information such as the relative intrinsic loudness of the conferees' audio emissions and the association of a conferee with a particular session, as sketched below.
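  • The sketch of such a non-positional filter reuses the SessionTable sketch above; the function name conference_filter and the choice to keep the three loudest talkers are illustrative assumptions.

        def conference_filter(conferee_id, current_segments, session_table, max_streams=3):
            # keep only the loudest few segments from sessions this conferee belongs to;
            # no positional information is consulted at all
            my_sessions = set(session_table.sessions_of(conferee_id))
            audible = [seg for seg in current_segments
                       if seg.session_id in my_sessions and seg.source_id != conferee_id]
            audible.sort(key=lambda seg: seg.energy, reverse=True)
            return audible[:max_streams]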
  • Filtering may also be combined with other processing to good effect.
  • certain streams of media data may be identified in a virtual environment as "background sounds", such as the sound of flowing water of a virtual fountain in the virtual environment.
  • the designers of the virtual environment may prefer that background sounds not be filtered identically to other streaming audio data and not cause other data to be filtered out; instead, the data for background sounds may be filtered and processed so that it is rendered at a lesser apparent loudness whenever there are other streaming data that would otherwise have been masked and filtered out.
  • Such an application of the filtering techniques permits background sounds to be generated by a server component in a virtual environment system, instead of being generated locally by a rendering component in a client component.
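  • One way such background-sound handling might be layered on the Stage 2 filter is sketched below; the is_background flag and the attenuation factor are illustrative assumptions rather than features named by the embodiment, and stage2_filter is the sketch given earlier.

        BACKGROUND_ATTENUATION = 0.25   # assumed factor for reducing background-sound loudness

        def filter_with_background(avatar_id, current_segments, adjacency):
            # filter foreground segments normally; background segments never mask other data,
            # and are ducked whenever any foreground segments survive the filter
            foreground = [s for s in current_segments if not getattr(s, "is_background", False)]
            background = [s for s in current_segments if getattr(s, "is_background", False)]
            kept = stage2_filter(avatar_id, foreground, adjacency)
            if kept:
                kept += [(s.energy * BACKGROUND_ATTENUATION, s) for s in background]
            else:
                kept += [(s.energy, s) for s in background]
            return kept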
  • a filter may filter according to metadata and current avatar information such as source location, intensity, and avatar location for two different kinds of emissions, without regard to the emissions being of different kinds. All that is required is that the intensity data be comparable.
  • the techniques of this invention can be used to reduce the amount of data that must be rendered, and it thus becomes much more practical to move rendering of real-time streaming data to the "edges" of a networked virtual reality system, that is, to render on the destination clients rather than adding to the burden of rendering on a server component.
  • a design may employ these techniques to reduce the amount of data to the extent that functionality previously implemented on the client, such as recording, can be performed on server components, thus allowing the designer of a particular application to choose to reduce the cost of clients, or to provide virtual functionality not supported on the client computer or its software.
  • the current emission source information, such as that provided by metadata relating to location and orientation, may be further useful for rendering streaming media data stereophonically or binaurally at the final point of rendering, so that the rendered sounds are perceived as coming from the appropriate relative direction: from the left, from the right, from above, and so forth.
  • this associated information for filtering may thus have further synergistic advantages in rendering, in addition to those already mentioned.
  • the filtering techniques are particularly useful where the streaming data represents emissions from sources in a virtual environment and is being rendered as required for different points of perception in the virtual environment.
  • the basis on which the filtering is done will of course depend on the nature of the virtual environment and on the nature of the emissions.
  • the psychoacoustic filtering techniques disclosed herein are further useful not just in virtual environments, but in any situation in which audio from multiple sources is rendered.
  • the technique of using metadata in the segments containing the streaming data both in the filtering and in rendering the streaming data at the renderer results in substantial reduction in both network bandwidth requirements and processing resources.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Information Transfer Between Computers (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to scalable techniques for rendering emissions represented by means of streaming data segments, where the emissions are potentially perceptible from points of perception and the relationships between the emissions and the points of perception vary in real time. The techniques filter the segments by determining, for a time slice, whether a given emission is perceptible at a given point of perception; if it is not, the streaming data segments representing the emission are not used in rendering the emissions perceived at the given point of perception. The techniques are employed in networked virtual environments to render audio emissions for clients in a networked virtual reality system. With audio emissions, one factor determining the perceptibility of a given emission at a given point of perception is whether the psychoacoustic properties of other emissions mask the given emission. The segments carrying the streaming data also contain metadata which is used both in filtering the streaming data and in rendering it at a point of perception at which the emission is perceived.
PCT/US2009/031361 2008-01-17 2009-01-17 Techniques adaptables pour fournir des données de transmission en continu en temps réel par avatar dans des systèmes de réalité virtuelle qui emploient des environnements rendus par avatar WO2009092060A2 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN200980110115.3A CN102186544B (zh) 2008-01-17 2009-01-17 用于在采用每个具象的渲染环境的虚拟现实系统中提供实时的每个具象的流数据的可扩展技术
EP09701763A EP2244797A4 (fr) 2008-01-17 2009-01-17 Techniques adaptables pour fournir des données de transmission en continu en temps réel par avatar dans des systèmes de réalité virtuelle qui emploient des environnements rendus par avatar
CA2712483A CA2712483A1 (fr) 2008-01-17 2009-01-17 Techniques adaptables pour fournir des donnees de transmission en continu en temps reel par avatar dans des systemes de realite virtuelle qui emploient des environnements rendus par avatar
JP2010543299A JP2011510409A (ja) 2008-01-17 2009-01-17 アバタ別にレンダリングされる環境を用いる仮想現実システムにおいてリアルタイムのアバタ別のストリーミングデータを提供するスケーラブルな技法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US2172908P 2008-01-17 2008-01-17
US61/021,729 2008-01-17

Publications (2)

Publication Number Publication Date
WO2009092060A2 true WO2009092060A2 (fr) 2009-07-23
WO2009092060A3 WO2009092060A3 (fr) 2010-01-14

Family

ID=40885910

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/031361 WO2009092060A2 (fr) 2008-01-17 2009-01-17 Techniques adaptables pour fournir des données de transmission en continu en temps réel par avatar dans des systèmes de réalité virtuelle qui emploient des environnements rendus par avatar

Country Status (7)

Country Link
EP (1) EP2244797A4 (fr)
JP (3) JP2011510409A (fr)
KR (1) KR20110002005A (fr)
CN (1) CN102186544B (fr)
CA (1) CA2712483A1 (fr)
TW (1) TW200941271A (fr)
WO (1) WO2009092060A2 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8994782B2 (en) 2011-01-04 2015-03-31 Telefonaktiebolaget L M Ericsson (Publ) Local media rendering
US10433096B2 (en) 2016-10-14 2019-10-01 Nokia Technologies Oy Audio object modification in free-viewpoint rendering
US20190306651A1 (en) 2018-03-27 2019-10-03 Nokia Technologies Oy Audio Content Modification for Playback Audio
US10531219B2 (en) 2017-03-20 2020-01-07 Nokia Technologies Oy Smooth rendering of overlapping audio-object interactions
US11074036B2 (en) 2017-05-05 2021-07-27 Nokia Technologies Oy Metadata-free audio-object interactions
US11096004B2 (en) * 2017-01-23 2021-08-17 Nokia Technologies Oy Spatial audio rendering point extension
US11395087B2 (en) 2017-09-29 2022-07-19 Nokia Technologies Oy Level-based audio-object interactions

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8860720B1 (en) * 2014-01-02 2014-10-14 Ubitus Inc. System and method for delivering graphics over network
US20150371295A1 (en) * 2014-04-29 2015-12-24 Socialplay Inc. System and method for cross-application virtual goods management
JP6217682B2 (ja) * 2015-03-27 2017-10-25 ブラザー工業株式会社 情報処理装置及びプログラム
CN105487657A (zh) * 2015-11-24 2016-04-13 小米科技有限责任公司 声音响度的确定方法及装置
CN106899860B (zh) * 2015-12-21 2019-10-11 优必达公司 通过网络传送媒体的系统及方法
WO2018135304A1 (fr) * 2017-01-18 2018-07-26 ソニー株式会社 Dispositif de traitement d'information, procédé de traitement d'informations et programme
KR102317134B1 (ko) 2017-10-31 2021-10-25 에스케이텔레콤 주식회사 증강현실용 컨텐츠 저작 장치 및 방법
KR102461024B1 (ko) 2017-10-31 2022-10-31 에스케이텔레콤 주식회사 헤드 마운티드 디스플레이 및 이를 이용하여 가상 공간에서 동작이 수행되도록 하는 방법
EP3499917A1 (fr) * 2017-12-18 2019-06-19 Nokia Technologies Oy Activation du rendu d'un contenu spatial audio pour consommation par un utilisateur
JP7189360B2 (ja) * 2019-08-20 2022-12-13 日本たばこ産業株式会社 コミュニケーション支援方法、プログラムおよびコミュニケーションサーバ
WO2021033258A1 (fr) * 2019-08-20 2021-02-25 日本たばこ産業株式会社 Procédé d'aide à la communication, programme et serveur de communication
WO2021033261A1 (fr) * 2019-08-20 2021-02-25 日本たばこ産業株式会社 Procédé d'aide à la communication, programme, et serveur de communication
US11188902B1 (en) * 2020-05-20 2021-11-30 Louise Dorothy Saulog Sano Live time connection application method and devices
KR102523507B1 (ko) * 2021-12-20 2023-04-19 전광표 사운드 맵 서비스 제공 장치 및 방법

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5347306A (en) * 1993-12-17 1994-09-13 Mitsubishi Electric Research Laboratories, Inc. Animated electronic meeting place
JPH10207684A (ja) * 1996-11-19 1998-08-07 Sony Corp 3次元仮想現実空間共有システムにおける情報処理装置、情報処理方法および媒体
US6106399A (en) * 1997-06-16 2000-08-22 Vr-1, Inc. Internet audio multi-user roleplaying game
US6241612B1 (en) * 1998-11-09 2001-06-05 Cirrus Logic, Inc. Voice communication during a multi-player game
US20040225716A1 (en) * 2000-05-31 2004-11-11 Ilan Shamir Methods and systems for allowing a group of users to interactively tour a computer network
US6935959B2 (en) * 2002-05-16 2005-08-30 Microsoft Corporation Use of multiple player real-time voice communications on a gaming device
JP2004267433A (ja) * 2003-03-07 2004-09-30 Namco Ltd 音声チャット機能を提供する情報処理装置、サーバおよびプログラム並びに記録媒体
JP3740518B2 (ja) * 2003-07-31 2006-02-01 コナミ株式会社 ゲーム装置、コンピュータの制御方法及びプログラム
JP2005322125A (ja) * 2004-05-11 2005-11-17 Sony Corp 情報処理システム、情報処理方法、プログラム
GB2415392B (en) * 2004-06-25 2008-11-05 Sony Comp Entertainment Europe Game processing
JP2006201912A (ja) * 2005-01-19 2006-08-03 Nippon Telegr & Teleph Corp <Ntt> 3次元仮想オブジェクト情報提供サービス処理方法と3次元仮想オブジェクト提供システムおよびプログラム
JP3863165B2 (ja) * 2005-03-04 2006-12-27 株式会社コナミデジタルエンタテインメント 音声出力装置、音声出力方法、ならびに、プログラム
US20060277466A1 (en) * 2005-05-13 2006-12-07 Anderson Thomas G Bimodal user interaction with a simulated object
EP2194509A1 (fr) * 2006-05-07 2010-06-09 Sony Computer Entertainment Inc. Procédé permettant de conferer des caracteristiques affectives à un avatar informatique au cours d'un jeu

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of EP2244797A4 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8994782B2 (en) 2011-01-04 2015-03-31 Telefonaktiebolaget L M Ericsson (Publ) Local media rendering
US9560096B2 (en) 2011-01-04 2017-01-31 Telefonaktiebolaget Lm Ericsson (Publ) Local media rendering
US10433096B2 (en) 2016-10-14 2019-10-01 Nokia Technologies Oy Audio object modification in free-viewpoint rendering
US11096004B2 (en) * 2017-01-23 2021-08-17 Nokia Technologies Oy Spatial audio rendering point extension
US10531219B2 (en) 2017-03-20 2020-01-07 Nokia Technologies Oy Smooth rendering of overlapping audio-object interactions
US11044570B2 (en) 2017-03-20 2021-06-22 Nokia Technologies Oy Overlapping audio-object interactions
US11074036B2 (en) 2017-05-05 2021-07-27 Nokia Technologies Oy Metadata-free audio-object interactions
US11442693B2 (en) 2017-05-05 2022-09-13 Nokia Technologies Oy Metadata-free audio-object interactions
US11604624B2 (en) 2017-05-05 2023-03-14 Nokia Technologies Oy Metadata-free audio-object interactions
US11395087B2 (en) 2017-09-29 2022-07-19 Nokia Technologies Oy Level-based audio-object interactions
US20190306651A1 (en) 2018-03-27 2019-10-03 Nokia Technologies Oy Audio Content Modification for Playback Audio
US10542368B2 (en) 2018-03-27 2020-01-21 Nokia Technologies Oy Audio content modification for playback audio

Also Published As

Publication number Publication date
JP2011510409A (ja) 2011-03-31
CA2712483A1 (fr) 2009-07-23
JP2013254501A (ja) 2013-12-19
EP2244797A4 (fr) 2011-06-15
KR20110002005A (ko) 2011-01-06
TW200941271A (en) 2009-10-01
CN102186544A (zh) 2011-09-14
WO2009092060A3 (fr) 2010-01-14
EP2244797A2 (fr) 2010-11-03
CN102186544B (zh) 2014-05-14
JP2015053061A (ja) 2015-03-19

Similar Documents

Publication Publication Date Title
WO2009092060A2 (fr) Techniques adaptables pour fournir des données de transmission en continu en temps réel par avatar dans des systèmes de réalité virtuelle qui emploient des environnements rendus par avatar
US20120016926A1 (en) Scalable techniques for providing real-time per-avatar streaming data in virtual reality systems that employ per-avatar rendered environments
JP5723905B2 (ja) 共用仮想領域通信環境における自動化されたリアルタイム・データ・ストリーム切換え
JP5232239B2 (ja) 共用仮想領域通信環境における自動化されたリアルタイム・データ・ストリーム切換え
US7574474B2 (en) System and method for sharing and controlling multiple audio and video streams
JP5563014B2 (ja) オーディオシーン作成用装置および方法
US20060008117A1 (en) Information source selection system and method
JP2009043274A (ja) 対話型立体的オーディオビジュアル・システム
CN1574870A (zh) 人际通信系统
US8577060B2 (en) Method and apparatus for dynamically determining mix sets in an audio processor
CN117219096A (zh) 一种在实时云渲染环境下多用户语音空间音频的实现方法
US11825026B1 (en) Spatial audio virtualization for conference call applications
JP7143874B2 (ja) 情報処理装置、情報処理方法およびプログラム
JP7191146B2 (ja) 配信サーバ、配信方法、及びプログラム
WO2023243375A1 (fr) Terminal d'informations, procédé de traitement d'informations, programme, et dispositif de traitement d'informations
JP7508586B2 (ja) 没入型テレカンファレンスおよびテレプレゼンスのためのマルチグルーピングの方法、装置、およびコンピュータプログラム
KR20000054437A (ko) 화상 채팅 처리 방법
WO2021235173A1 (fr) Dispositif de traitement d&#39;informations, procédé de traitement d&#39;informations et programme
JP7399549B2 (ja) リモート端末向けの遠隔会議およびテレプレゼンスにおいてオーディオミキシングゲインをシグナリングする手法
JP5602688B2 (ja) 音像定位制御システム、コミュニケーション用サーバ、多地点接続装置、及び音像定位制御方法
WO2023042671A1 (fr) Procédé de traitement de signal sonore, terminal, système de traitement de signal sonore et dispositif de gestion
Wuolio et al. On the potential of spatial audio in enhancing virtual user experiences
Divjak et al. Visual and audio communication between visitors of virtual worlds
Rimell Immersive spatial audio for telepresence applications: system design and implementation
Kanada Simulated virtual market place by using voiscape communication medium

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200980110115.3

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2712483

Country of ref document: CA

Ref document number: 2010543299

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2660/KOLNP/2010

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 2009701763

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20107018261

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09701763

Country of ref document: EP

Kind code of ref document: A2

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)