WO2003015884A1

WO2003015884A1 - Massively online game comprising a voice modulation and compression system

Info

Publication number: WO2003015884A1
Application number: PCT/CH2002/000436
Authority: WO
Inventors: Olivier Morgan
Original assignee: Komodo Entertainment Software Sa
Priority date: 2001-08-13
Filing date: 2002-08-12
Publication date: 2003-02-27

Abstract

The invention concerns a massively online game incorporating a voice compression and modulation system for enhancing the player's sensations when he is immersed in the game's virtual environment.

Description

MASSIVELY ONLINE GAMES INCLUDING A VOICE MODULATION AND COMPRESSION SYSTEM

1. Field of the invention

The present invention relates to Massively Online Games (hereinafter abbreviated MOG). To understand the context, it is important to understand what this term means. MOGs are, as their name suggests, computer games in which a large number of players connect to a single server to play together in a virtual environment defined by a computer program. The growing success of this kind of game can be explained by the fact that, for the first time, the player interacts directly with other players instead of being confronted with the limited artificial intelligence of a computer program. It is therefore easy to understand that the key element of this kind of game is communication between the players.

2. Discussion of the context

The first known MOG is "Ultima Online", produced by Origin and published by Electronic Arts. Following its success, more recent titles have appeared, such as "EverQuest", produced by Verant Interactive and published by Sony Online Entertainment, and "Anarchy Online", produced and published by Funcom. All these MOGs were designed to work on a single platform: the PC. They therefore naturally opted for "chat", an inter-player communication system that has already proven itself. This communication system is based on the exchange of text over the Internet. A user types a text on the keyboard of his PC and sends it either to another user or to several other users connected to the Internet. These companies chose this solution because it is easy to implement and reliable. However, this system does not convey the voice; it conveys only the content of the message, in text form.

3. Summary of the invention

The object of the present invention is to provide an improved communication system for MOGs. Instead of communicating with text, we offer players a system that allows them to send a voice message to other players. For this system to work, the player needs a system to capture the sound of his voice as well as another system to emit the sound coming from the game. The player utters a sound, a word or a sentence into his sound input system. The amount of information that can be captured is limited to a period of the order of ten seconds. The player is warned when he reaches the maximum recording capacity. Once the message has been entered, the system processes the information.
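The recording limit described above can be sketched as a bounded buffer. This is only an illustration under stated assumptions: the `Recorder` class, the 8000 Hz sampling rate and the exact 10-second cap are hypothetical, since the text specifies only "of the order of ten seconds".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recorder:
    """Bounded voice buffer (hypothetical helper, not named in the source)."""
    sample_rate: int = 8000      # assumed sampling rate in Hz
    max_seconds: float = 10.0    # assumed cap; the text says "of the order of ten seconds"
    samples: List[float] = field(default_factory=list)

    @property
    def capacity(self) -> int:
        # Maximum number of samples the buffer may hold.
        return int(self.sample_rate * self.max_seconds)

    def append(self, chunk: List[float]) -> bool:
        """Store as much of `chunk` as fits; return True once the limit is reached."""
        room = self.capacity - len(self.samples)
        self.samples.extend(chunk[:room])
        return len(self.samples) >= self.capacity

    def seconds_used(self) -> float:
        # Time already recorded, e.g. for an on-screen indicator.
        return len(self.samples) / self.sample_rate
```

A caller would use the boolean returned by `append` to warn the player that the maximum recording capacity has been reached.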

The first step of the invention consists in isolating the player's voice from the audio signal (Fig. 1). There are two possible scenarios here: the sound of the game comes either from loudspeakers 10 or from headphones 11. If the player wears headphones, the system will recognize the absence of the sounds produced by the game. If he listens to the sounds through a loudspeaker, the system will recognize the presence of the sounds that it emitted a few hundredths of a second earlier and subtract them from the input signal. In both cases, the result is a signal that includes only the player's voice.
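The loudspeaker case can be sketched as a simple subtraction of the delayed game audio from the microphone signal. This is a deliberately simplified illustration: the function name `isolate_voice` is hypothetical, and the delay and gain of the loudspeaker-to-microphone path are assumed to be known and constant, whereas a practical system would have to estimate them (for example with an adaptive echo canceller).

```python
def isolate_voice(mic, game_out, delay, gain=1.0):
    """Subtract the delayed, scaled game audio from the microphone signal.

    mic      -- samples captured by the microphone (voice + game echo)
    game_out -- samples the game sent to the loudspeaker
    delay    -- assumed loudspeaker-to-microphone delay, in samples
    gain     -- assumed attenuation of the game sound at the microphone
    """
    voice = []
    for n, sample in enumerate(mic):
        src = n - delay
        # Game sound that reaches the microphone at time n, if any.
        echo = gain * game_out[src] if 0 <= src < len(game_out) else 0.0
        voice.append(sample - echo)
    return voice
```

In the headphone case no game sound reaches the microphone, which corresponds to calling the function with `gain=0.0`: the input is returned unchanged.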

The second step is compressing the voice to a size less than 4 bps. This compression can be done in several ways:

- Direct voice compression system, such as MPEG, WAVE, etc.

- Detection of the phonemes contained in the voice message (Fig. 2): a first system detects each phoneme in the message (note: the system treats blanks or pauses as a particular phoneme, which is recognized like any other and to which the following parameters also apply); identifies it with the most similar known phoneme, using a dictionary of phonemes 110 and simple grammar rules 112 applied as a function of the previously recognized phonemes 111 (with a specific dictionary and specific grammar rules for each language); and records the duration of the phoneme 113 (the position of the phoneme in the signal also gives its duration: t_end - t_start), as well as the intonation component of the voice in the time interval defined by the phoneme 120. The audio signal can then be transcribed into a chain of symbols, each comprising an indication of duration and an indication of intonation 130. This chain of symbols is much smaller than the original voice message while retaining its characteristic features.
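The chain of symbols described in the second option can be sketched as a small data structure with a compact byte encoding. Everything below is an assumption made for illustration: the `Symbol` type, the toy `PHONEMES` dictionary, the three-bytes-per-symbol layout, the 10 ms duration step and the 0..255 intonation scale are not specified in the source, and phoneme detection itself is not shown.

```python
import struct
from dataclasses import dataclass

# Toy phoneme dictionary (assumed); "_" stands for a pause, treated as a phoneme.
PHONEMES = ["_", "a", "f", "t", "e", "r"]


@dataclass(frozen=True)
class Symbol:
    phoneme: str
    duration_ms: int   # t_end - t_start of the phoneme in the signal
    intonation: int    # assumed coding: pitch level from 0 to 255


def encode(chain):
    """Pack each symbol into 3 bytes: phoneme index, duration / 10 ms, intonation."""
    out = bytearray()
    for s in chain:
        out += struct.pack(
            "BBB",
            PHONEMES.index(s.phoneme),
            min(s.duration_ms // 10, 255),   # durations are quantized to 10 ms steps
            s.intonation,
        )
    return bytes(out)


def decode(data):
    """Rebuild the chain of symbols from its packed form."""
    return [Symbol(PHONEMES[p], d * 10, i) for p, d, i in struct.iter_unpack("BBB", data)]
```

With this layout a message costs three bytes per phoneme, which illustrates why the chain is much smaller than the sampled voice signal; durations that are not multiples of 10 ms are rounded down on encoding.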

The third step is the synthesis of the voice. As for compression, several synthesis systems can be applied:

- Direct voice synthesis: if a direct voice compression system has been used, synthesis is provided by the decoder of the compression system, whether MPEG, WAVE or any other system.

- Synthesis of the voice from a chain of phonemes (Fig. 3): this system only works if the voice has been compressed in the form of a chain of phonemes with the corresponding duration and intonation information, as described above. The system produces a sound for each phoneme 210 using a library of sounds 211 (a specific library for each language). The duration of the sound 212 and its intonation 213 are defined by the factors that accompany each phoneme. The chain of symbols is thus transcribed back into an understandable message carrying the emotional characteristics of the original message.
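Synthesis from a chain of phonemes can be sketched as a lookup in a sound library followed by duration and pitch adjustment. This is a toy stand-in, not the described library of sounds: here each phoneme maps to a single sine-tone frequency, the `SOUND_LIBRARY` contents, the 8000 Hz rate and the use of a multiplicative pitch factor for intonation are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 8000  # assumed output rate in Hz

# Stand-in sound library (assumed): one base frequency per phoneme, 0.0 = silence.
SOUND_LIBRARY = {"_": 0.0, "a": 220.0, "o": 180.0}


def synthesize(chain):
    """Turn (phoneme, duration_ms, pitch_factor) symbols into audio samples."""
    samples = []
    for phoneme, duration_ms, pitch_factor in chain:
        freq = SOUND_LIBRARY[phoneme] * pitch_factor   # intonation shifts the pitch
        count = SAMPLE_RATE * duration_ms // 1000      # duration sets the sample count
        for n in range(count):
            samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples
```

A realistic library would hold recorded or modelled waveforms for each phoneme of a given language; the sketch only shows how the duration and intonation factors attached to each symbol drive the output.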

The fourth step is the modulation of the voice. The synthetic voice as it stands does not yet correspond to the character embodied by the player in the game. For each character, a range of modulation is available to give it, for example, the higher voice of a woman or the lower voice of a man. The voice is synthesized with a default modulation value chosen within the modulation range authorized by the game for each character. The player can then listen to his message and modify the modulation within the authorized range until he is satisfied. This operation is an initialization of the voice components of his character, and the player does not have to repeat it each time he sends a message. Once the player has chosen the modulation, it is recorded by the program and reused at each message synthesis.
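The per-character modulation range can be sketched as a value clamped between two bounds and remembered once chosen. The `Character` class, the representation of modulation as a single number and the choice of the midpoint as the default are assumptions for illustration; the source says only that a default is chosen within the authorized range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Character:
    """Voice settings of a game character (hypothetical structure)."""
    name: str
    min_mod: float                       # lower bound authorized by the game
    max_mod: float                       # upper bound authorized by the game
    modulation: Optional[float] = None   # value reused at each message synthesis

    def __post_init__(self):
        if self.modulation is None:
            # Assumed default: the middle of the authorized range.
            self.modulation = (self.min_mod + self.max_mod) / 2

    def set_modulation(self, value: float) -> float:
        """Store the player's choice, clamped to the authorized range."""
        self.modulation = max(self.min_mod, min(self.max_mod, value))
        return self.modulation
```

Because the chosen value is kept on the character, later messages can be synthesized without asking the player again.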

The steps described above constitute the processing chain necessary to transform and transport a player's voice to the other connected players (Fig. 4). However, once the player has recorded his message, he is not obliged to send it immediately. He can, as we have seen in the processing chain, listen to it to make sure that the modulation suits him and that the content will be understandable by other players 310. His message is thus stored 311 until the player decides to send it 312.
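The store-then-send behaviour can be sketched as a small draft object that holds the compressed message until the player confirms. The `Draft` class and the idea of passing the transport as a callable are assumptions for the example; the source does not describe how storage or transport are implemented.

```python
class Draft:
    """Holds one compressed message until the player decides to send it."""

    def __init__(self):
        self.payload = None   # compressed message awaiting a decision
        self.sent = False

    def store(self, payload: bytes) -> None:
        # Keep the compressed message; nothing is transmitted yet.
        self.payload = payload
        self.sent = False

    def preview(self) -> bytes:
        # Return the stored message so it can be synthesized and played locally.
        if self.payload is None:
            raise ValueError("no message recorded")
        return self.payload

    def send(self, transport) -> None:
        # Hand the message to the transport (any callable taking bytes).
        if self.payload is None:
            raise ValueError("no message recorded")
        transport(self.payload)
        self.sent = True
```

The player may call `preview` as many times as needed while adjusting the modulation; only `send` passes the message on.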

4. Description of the invention

As previously indicated, this invention applies to massively online games. The best way to illustrate the application of this invention is to describe a phase of one of these games. A player X sits in front of his game system (PC, console, set-top box or other). He does not wear headphones and therefore hears the sounds emitted by the game through a speaker system. On his screen, the computer program displays the setting in which he must move, as well as the incarnation of the other players in the form of one or more characters (a single character or a small team of characters for role-playing games, an entire army for real-time strategy games, etc.). Player X can then communicate with other players either by moving his character in the virtual world (nodding, waving, etc.) or by speaking into a microphone.

For the purposes of this example, let's say that this is a massively online role-playing game. Player X has chosen to embody a cheerful sexagenarian woman who lavishes her sayings on anyone who wants to hear them. In the virtual world, the character is in a park where many birds sing in the rain. Player X sees another character on his screen. He approaches it in the virtual world, indicates to the system that he will start a recording, for example by pressing a button, and utters into his microphone one of her favorite sayings: "after the rain always comes the good weather". While he is recording his sentence, an indicator displayed on the screen shows the maximum duration of the message, for example 10 seconds, as well as the time used, in this case 3 seconds. His saying is recorded by the system, but it is mixed with the various sounds emitted by the game (falling rain, birds singing, noise of footsteps, etc.). His sentence is processed to keep only the sound of his voice, then compressed and stored pending a decision by the player. Player X wants to give the voice of his sexagenarian a personal touch, so he asks the system to play his message on his speakers. The message is automatically modulated with a default value that transforms his voice into that of an elderly woman. At each listening, he adjusts the modulation within the range allowed by the game until he is satisfied. Once the modulation is determined, it is saved and the player does not have to adjust it again. He then decides to send his saying to the other character. That character, as well as all the characters that the computer program allows to hear it (nearby characters or other conditions fulfilled), will receive the saying and hear it on their loudspeakers, added to the ambient noise.
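The rule deciding which characters may hear a message can be sketched, for the proximity case only, as a distance test. The function name `audible_to`, the two-dimensional positions and the fixed hearing radius are assumptions; the source leaves the exact conditions to the game program ("proximity or other").

```python
import math


def audible_to(sender_pos, others, hearing_radius):
    """Return the names of characters close enough to hear the sender.

    sender_pos     -- (x, y) position of the speaking character
    others         -- mapping of character name to (x, y) position
    hearing_radius -- assumed maximum hearing distance in world units
    """
    return [
        name
        for name, pos in others.items()
        if math.dist(sender_pos, pos) <= hearing_radius
    ]
```

For example, with the speaker at the origin and a radius of 10, a character at (3, 4) would receive the message while one at (30, 40) would not.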

5. List of drawings

Fig. 1: this drawing represents the sound environment in which the player finds himself when he is recording sounds, words or sentences intended for other players. In case 10, the player listens to the game sounds on a loudspeaker; his messages will therefore be recorded with the ambient noise of the game before being processed. In case 11, the player listens to the game sounds on headphones; his messages will therefore be recorded without the ambient noise of the game.

Fig. 2: this drawing represents one of the possible means of compressing the voice, in this case by transforming the voice into a chain of symbols. Each symbol is composed of a phoneme detected in the original voice message (by comparison with a dictionary of phonemes 110 and by applying simple grammar rules 112 that establish, among other things, which phonemes may follow one another), to which are added the intonation of the phoneme 120 and its duration 113.

Fig. 3: this drawing represents one of the possible means of synthesizing the voice following the compression carried out in Fig. 2. The signal coded as a chain of symbols is divided into its phoneme 210, duration 212 and intonation 213 components. Using a library of sounds 211 comprising, in particular, a sound for each phoneme, the system synthesizes an artificial voice incorporating the duration of each phoneme and its intonation.

Fig. 4: this drawing represents an overview of the invention. A player records his voice message, listens to it and adjusts the modulation within a range predefined by the game 310. The compressed message is stored 311 pending the player's decision to send it. The message is transported 312 to the other players authorized to hear it (authorization defined by the computer program of the game according to certain parameters: proximity or other). The message is then synthesized and modulated with the modulation component chosen by the sending player.

Claims

1. A massively online game incorporating the following features:

- means for separating the player's voice from the sound emitted by the game;

- means of compression of the player's voice;

- means allowing the player to choose the modulation of his character's voice within a range determined by the computer program of the game;

- storage and transport of compressed information;

- means of voice synthesis;

- the modulation of the synthetic voice according to the modulation factor corresponding to the character and/or the choice of the player.