WO2003071520A1

WO2003071520A1 - Parameter-controlled voice synthesis

Info

Publication number: WO2003071520A1
Application number: PCT/DE2003/000049
Authority: WO
Inventors: Marian Trinkel; Uwe Nettelroth
Original assignee: Deutsche Telekom Ag
Priority date: 2002-02-19
Filing date: 2003-01-10
Publication date: 2003-08-28
Also published as: DE10207875A1

Abstract

The invention relates to a method for automatically converting a message, which is created as a text file by a sender, into a voice message that can be output to an addressee via a voice output device, particularly a loudspeaker. According to the invention, a conversion program implemented on a computer controls a speech generator ("voice") for creating the voice message on the basis of the text file, whereby at least one control command is assigned to the text file by the sender. This control command is recognized by the conversion program, and the program modifies, according to the control command, the characteristics of the voice that speaks the voice message, particularly with regard to its timbre and/or melody.

Description

Parameter-controlled speech synthesis

The present invention relates to a method for automatically converting a message created by a sender as a text file into a voice message which can be output to a recipient via a voice output device, in particular a loudspeaker, wherein a conversion program implemented on a computer controls a voice generator ("voice") for generating the voice message on the basis of the text file. The invention also relates to a system for implementing the method.

Such methods for speech synthesis are known and have already been implemented. For example, a message can be sent as a text file via the "Short Message System" (SMS) from a terminal, for example a cell phone or a computer, to another terminal over a telecommunications network. A computer integrated in the network then converts this message into a voice message (text-to-speech) using a synthetic voice, so that the recipient no longer has to read the SMS message in the usual way but is addressed directly and personally, with the content of the message, by the synthetic voice. These voices have names like "Dagmar" or "Detlef" and present the message to the addressee.

A disadvantage of the methods used hitherto is that the conversion uses only the single voice that is usually available, so the voice message has only the characteristic coloration of that voice. The available synthetic voices simulate the human voice quite well in terms of emphasis, but they lack the ability to modulate the emphasis individually and precisely. In some cases it is possible to choose between several voices, for example for different languages, but it is not possible to vary the expression within a message.

The object of the invention is therefore to provide a method which can be implemented with simple and inexpensive means and which allows individual variation of the expression even within a message. A further object of the invention is to provide a system for implementing the method.

These objects are achieved by a method according to claim 1 and a system according to claim 11.

The essential basic idea of the invention is to give the sender of a text message the possibility, by marking the text file, of influencing the conversion of the message with regard to desired nuances of emphasis when the message content is presented. For this purpose, one or more control commands are assigned to the text file; the computer recognizes them as such and associates them with the sender's wish to give the voice message a special characteristic. According to the invention, the sender assigns at least one control command to the text file, which is recognized by the conversion program, and the program modifies the characteristic of the voice speaking the voice message, in particular with regard to its timbre and/or its melody, in accordance with the control command. The assignment can be made by prefixing, appending or inserting the control command into the text file, which usually consists of a header and subsequent data.
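The three ways of assigning a control command to a text file (prefixing, appending, inserting) and the recognition step in the conversion program can be sketched as follows. The patent does not define a concrete notation, so the `<cmd:NAME>` marker syntax, the function names, and the command names are hypothetical choices made only for illustration.

```python
import re

# Hypothetical marker syntax; the patent leaves the notation of control commands open.
CMD_PATTERN = re.compile(r"<cmd:([a-z_]+)>")


def marker(command: str) -> str:
    """Render a control command as an in-text marker."""
    return f"<cmd:{command}>"


def prefix_command(text: str, command: str) -> str:
    """Assign a command by prefixing it to the message."""
    return marker(command) + text


def append_command(text: str, command: str) -> str:
    """Assign a command by appending it to the message."""
    return text + marker(command)


def insert_command(text: str, command: str, position: int) -> str:
    """Insert a command at a character position inside the message."""
    return text[:position] + marker(command) + text[position:]


def recognize_commands(text: str) -> tuple[str, list[tuple[int, str]]]:
    """Strip the markers and report (offset, command) pairs.

    Offsets refer to positions in the cleaned text, i.e. the points at which
    the voice characteristic should change.
    """
    commands: list[tuple[int, str]] = []
    clean_parts: list[str] = []
    last = 0
    clean_len = 0
    for match in CMD_PATTERN.finditer(text):
        chunk = text[last:match.start()]
        clean_parts.append(chunk)
        clean_len += len(chunk)
        commands.append((clean_len, match.group(1)))
        last = match.end()
    clean_parts.append(text[last:])
    return "".join(clean_parts), commands
```

For example, prefixing `cheerful` to "Hello Anna. Bad news today." and inserting `sad` before "Bad" lets `recognize_commands` return the plain text together with `[(0, "cheerful"), (12, "sad")]`, so the synthesizer knows where each change takes effect.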

The control command can in particular be assigned to a specific component of the text file, in particular a sentence, a text sequence, a word or a letter. Using a linguistic model as a basis, an individual and above all "human" emphasis of the message can be achieved. One or more individual control characters ("short cuts") or a complete program instruction can be used as the control command. The synthetic voice is then modified in accordance with the control command or commands, for example with regard to its timbre. For example, a sentence with a control character such as "I'm looking forward to school ☺" can indicate that the voice should express honest delight and no sarcastic undertone. In this way, "emoticons" or "irony marks" used in a message can convey the written feelings through a corresponding sound design. The ability to change the characteristics of the synthetic voice independently of the content of the voice message has several advantages. The obvious advantage is that the meaning of the content can be modified via a changing characteristic and that the message acquires a certain undertone. It is thus possible to deliver a sad message in a correspondingly quiet and somber manner, or to give the voice a sarcastic undertone in the case of "good" news. In addition, the pronunciation and, in particular, the gender of the voice can be adapted to the circumstances. A further advantage is that this flexibility gives SMS, a medium that is already attractive to young people, additional appeal. Ultimately, the sender can use the invention to convey exactly what he actually wants to express. According to the invention, a synthetic reading voice is given a more human touch.
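Assigning a control character to a sentence, where a smiley at the end of a sentence such as "I'm looking forward to school" marks honest delight rather than sarcasm, can be sketched as below. The symbols, the undertone labels, and the use of ";-)" as an irony sign are hypothetical; the patent names emoticons and irony signs in general without fixing a vocabulary.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from control characters to an undertone label.
UNDERTONES = {
    "☺": "sincere_joy",  # honest delight, no sarcastic undertone
    "☹": "sad",          # quiet, somber delivery
    ";-)": "ironic",     # assumed "irony sign"
}
NEUTRAL = "neutral"

# Longest symbols first so multi-character signs are matched whole.
_PATTERN = re.compile(
    "(" + "|".join(re.escape(s) for s in sorted(UNDERTONES, key=len, reverse=True)) + ")"
)


@dataclass
class Segment:
    text: str
    undertone: str


def annotate(message: str) -> list[Segment]:
    """Assign each control character to the text component preceding it."""
    parts = _PATTERN.split(message)  # text, symbol, text, symbol, ..., text
    segments = []
    for i in range(0, len(parts), 2):
        text = parts[i].strip()
        symbol = parts[i + 1] if i + 1 < len(parts) else None
        if text:
            segments.append(Segment(text, UNDERTONES[symbol] if symbol else NEUTRAL))
    return segments
```

Text that is not followed by any control character keeps the neutral undertone, so an unmarked message is read exactly as before.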

The variability within a message can be achieved by using different available voices, the choice between the individual voices being made on the basis of the control commands. A control character "♀" can mean that the female voice "Dagmar" is used, while "♂" means that the text should be read by "Detlef". Alternatively, variation can be achieved by varying the characteristics of the only available "neutral" voice through its accessible setting parameters, such as timbre, pitch, emphasis, voice stretching or volume. Thus, the character "♀" can give the "neutral" voice a feminine touch and the character "♂" a masculine one. The control commands are advantageously placed at those points in the text file where a change in the characteristic is desired. In this way, multiple voices can be used within one message, which can lead to an attractive and unique way of expression.
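Both variants, selecting among several available voices or adjusting the parameters of a single neutral voice, can be sketched in one function. The ♀ and ♂ symbols are used here as stand-ins for the female and male control characters; the `VoiceSettings` fields, the pitch factors, and the function name are hypothetical, since a real speech engine exposes its own parameter set.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceSettings:
    voice: str = "neutral"  # name of the synthetic voice
    pitch: float = 1.0      # relative pitch factor
    rate: float = 1.0       # relative speaking rate ("voice stretching")
    volume: float = 1.0     # relative loudness


# Variant 1: control characters select one of several available voices.
VOICE_SELECTION = {"♀": "Dagmar", "♂": "Detlef"}

# Variant 2: the same characters shift parameters of the single neutral voice.
# The factors are illustrative assumptions, not values from the patent.
PARAMETER_SHIFTS = {"♀": {"pitch": 1.3}, "♂": {"pitch": 0.8}}


def segment(message: str, select_voices: bool) -> list[tuple[str, VoiceSettings]]:
    """Split a message at control characters; each one changes the settings
    used for the text that follows it."""
    settings = VoiceSettings()
    segments: list[tuple[str, VoiceSettings]] = []
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered text with the settings in effect before the change.
        text = "".join(buffer).strip()
        buffer.clear()
        if text:
            segments.append((text, settings))

    for ch in message:
        if ch in VOICE_SELECTION:
            flush()
            if select_voices:
                settings = replace(settings, voice=VOICE_SELECTION[ch])
            else:
                settings = replace(settings, **PARAMETER_SHIFTS[ch])
        else:
            buffer.append(ch)
    flush()
    return segments
```

Because the control characters are placed where the change is desired, each one closes the previous segment and opens a new one; a message without control characters yields a single segment in the default settings.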

In order to make the handling and use of the control characters convenient, it is advantageous to use so-called "speaking" symbols, for example smilies or "short cuts", which in themselves have no further meaning for the computer but whose meaning is easily accessible to the user. Smilies (☺) with different facial expressions can be used for the undertone of the voice, for example a "☺" for a particularly cheerful expression and a "☹" for a grave voice. A number of such speaking control characters can be offered, for example, in the menu of the telephone from which the message is being sent. When the message is read out, the mood and the desired undertone of the sender are reproduced. Emotions are thus taken into account in speech synthesis (text-to-speech), and the mood of the sender is passed on.
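One way to hand such "speaking" symbols to a speech engine is to translate them into markup the engine already understands. The sketch below emits W3C SSML `<prosody>` elements; SSML is a swapped-in target format chosen for illustration, not something the patent prescribes, and the symbol-to-attribute mapping is a hypothetical assumption.

```python
import re
from xml.sax.saxutils import escape

# Assumed mapping from "speaking" symbols to SSML <prosody> attributes.
# The values are illustrative; the patent does not specify them.
MOODS = {
    "☺": {"pitch": "high", "rate": "fast"},                   # cheerful
    "☹": {"pitch": "low", "rate": "slow", "volume": "soft"},  # grave
}

SPEAK_OPEN = ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
              'xml:lang="en-US">')

# Capturing group so re.split keeps the symbols in its result.
_SYMBOLS = re.compile("(" + "|".join(map(re.escape, MOODS)) + ")")


def to_ssml(message: str) -> str:
    """Wrap each phrase in a <prosody> element chosen by the symbol after it.

    Phrases without a trailing symbol are left in the neutral voice.
    """
    parts = _SYMBOLS.split(message)  # text, symbol, text, symbol, ..., text
    body = []
    for i in range(0, len(parts), 2):
        text = escape(parts[i].strip())  # escape &, <, > for XML
        if not text:
            continue
        symbol = parts[i + 1] if i + 1 < len(parts) else None
        if symbol is None:
            body.append(text)
        else:
            attrs = " ".join(f'{name}="{value}"' for name, value in MOODS[symbol].items())
            body.append(f"<prosody {attrs}>{text}</prosody>")
    return SPEAK_OPEN + " ".join(body) + "</speak>"
```

Keeping the symbol table separate from the conversion logic matches the idea of offering a fixed set of symbols in the telephone menu: extending the menu only means adding an entry to the mapping.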

As already explained, an advantageous area of application of the invention is the Short Message System (SMS). The voice message is then sent as text via SMS and, after conversion, is output via the loudspeaker of a telephone or a computer. A similar field of application is e-mail sent over the Internet, which is likewise output after conversion via the loudspeaker of a telephone or a computer. The new service adds playful appeal and increased enjoyment for users. The invention provides a new feature for natural communication between man and machine, and each sender can have his own sound design. As explained, in an advanced embodiment a linguistic model can be implemented by means of embedded control commands, giving the voice a higher degree of naturalness. Ultimately, a control command according to the invention can be assigned to each syllable or each letter.

The invention is advantageously implemented with a system comprising a computer in a communication network on which a program for speech synthesis is implemented. This so-called "voice" converts a message in the form of a text file into spoken text and sends it over a voice line to a terminal that is also connected to the network. The spoken text is output via a loudspeaker of the terminal. A module implemented in the program recognizes a control command embedded in the text file and modifies the characteristics of the voice speaking the voice message, in particular with regard to its timbre or melody, in accordance with the control command.
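The division of the system into a speech synthesis program, a control-command module assigned to it, and a terminal with voice output can be sketched as follows. The `Synthesizer` and `Terminal` interfaces, the class names, and the fake implementations are hypothetical stand-ins for a real speech engine and a real voice line.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Synthesizer(Protocol):
    """Hypothetical interface of the speech engine ("voice")."""

    def synthesize(self, text: str, style: str) -> bytes: ...


class Terminal(Protocol):
    """Hypothetical interface of the receiving terminal's voice output."""

    def play(self, audio: bytes) -> None: ...


@dataclass
class ControlCommandModule:
    """Recognizes control characters in the text file; each character
    sets the style for the text that follows it."""

    commands: dict[str, str]
    default_style: str = "neutral"

    def parse(self, message: str) -> list[tuple[str, str]]:
        segments: list[tuple[str, str]] = []
        buffer: list[str] = []
        style = self.default_style
        for ch in message:
            if ch in self.commands:
                self._flush(buffer, style, segments)
                style = self.commands[ch]
            else:
                buffer.append(ch)
        self._flush(buffer, style, segments)
        return segments

    @staticmethod
    def _flush(buffer: list[str], style: str, segments: list[tuple[str, str]]) -> None:
        text = "".join(buffer).strip()
        buffer.clear()
        if text:
            segments.append((text, style))


@dataclass
class ConversionProgram:
    """Speech synthesis program with the control-command module assigned to it."""

    synthesizer: Synthesizer
    module: ControlCommandModule

    def deliver(self, message: str, terminal: Terminal) -> None:
        # Synthesize each segment in its style and send the audio to the terminal.
        for text, style in self.module.parse(message):
            terminal.play(self.synthesizer.synthesize(text, style))


# Stand-ins for a real engine and a real voice line, for demonstration only.
class FakeSynthesizer:
    def synthesize(self, text: str, style: str) -> bytes:
        return f"[{style}]{text}".encode()


@dataclass
class RecordingTerminal:
    received: list[bytes] = field(default_factory=list)

    def play(self, audio: bytes) -> None:
        self.received.append(audio)
```

For example, delivering "Hi there. ☺Good news!" through `ConversionProgram(FakeSynthesizer(), ControlCommandModule({"☺": "cheerful"}))` sends two audio chunks to the terminal, the second rendered in the "cheerful" style. Keeping the module separate from the engine mirrors the system described here, where the recognition of control commands is an addition to an existing synthesis program.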

Claims


1. A method for automatically converting a message created by a sender as a text file into a voice message that can be output to an addressee via a voice output device, in particular a loudspeaker, wherein a conversion program implemented on a computer controls a voice generator ("voice") for generating the voice message on the basis of the text file, characterized in that the sender assigns at least one control command to the text file, which is recognized by the conversion program, the program modifying the characteristic of the voice speaking the voice message, in particular with regard to its timbre and/or its melody, in accordance with the control command.

2. The method according to claim 1, characterized in that the control command is assigned to a specific component of the text file, in particular a sentence, a text sequence, a word or a letter.

3. The method according to claim 1 or 2, characterized in that a single control character or a program instruction is used as a control command.

4. The method according to any one of the preceding claims, characterized in that the control command or commands are placed at the points in the text file where a change in the characteristic is desired.

5. The method according to any one of the preceding claims, characterized in that different available voices are used to vary the characteristic, the selection being made on the basis of the control commands.

6. The method according to any one of the preceding claims, characterized in that an available voice is used and the characteristics thereof are varied on the basis of accessible setting parameters such as timbre, pitch, emphasis, voice stretching or volume.

7. The method according to any one of the preceding claims, characterized in that voices of different languages are used in a voice message.

8. The method according to any one of the preceding claims, characterized in that the voice message is sent as text via the Short Message System (SMS) and is output after conversion via the voice output device of a telephone or a computer.

9. The method according to any one of the preceding claims, characterized in that the voice message is sent as an e-mail over the Internet and, after conversion, is output via the voice output device of a telephone or a computer.

10. The method according to any one of the preceding claims, characterized in that "speaking" symbols, for example "smilies" or "short cuts", which in themselves have no further meaning for the content, are used as control commands.

11. A system for implementing the method according to any one of the preceding claims, comprising a computer that is implemented in a telecommunications network and on which a program for voice synthesis ("voice") is implemented, which converts a message in the form of a text file into spoken text and sends it over a voice line to a terminal device, also implemented in the network, which has a voice output device for outputting the spoken text, characterized by a module that is assigned to the program and recognizes control characters implemented in the text file, the module modifying the characteristics of the voice speaking the voice message, in particular with regard to its timbre and melody, in accordance with the control character.