WO2004055779A1 - Device for generating speech, apparatus connectable to or incorporating such a device, and computer program product therefor - Google Patents

Publication number
WO2004055779A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
readable data
control unit
display
sending
Application number
PCT/EP2003/012879
Other languages
French (fr)
Inventor
Nercivan Kerimovska
Gunnar Klinghult
Anna Tomasson
Original Assignee
Sony Ericsson Mobile Communications Ab
Priority claimed from EP03011580.2A (EP1431958B1)
Application filed by Sony Ericsson Mobile Communications Ab
Priority to AU2003279398A (AU2003279398A1)
Priority to US10/539,238 (US8340966B2)
Publication of WO2004055779A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems

Definitions

  • the present invention relates to a device for generating speech associated with information shown on a display, especially displays on portable devices such as mobile telephones and the like.
  • a conversion circuit converts the data shown to audible speech helping the user to operate the apparatus.
  • the invention also relates to an apparatus arranged to cooperate with such a device or incorporating such a device, and a computer program product therefor.
  • the displays are used to display menus controlling the operation and settings of the device or other information relating to messages or games.
  • the displays are often small, which may be a problem for the user, especially if he is visually impaired. Also for other reasons, there is a need for an audible version of the display.
  • the present invention solves this problem by transforming the information displayed to audible speech.
  • the invention provides a device for generating speech, wherein a microcontroller is connectable to an apparatus for receiving data to be converted to speech, and sending the data to a conversion circuit; and a conversion circuit connectable to a speaker system for converting the data to a speech signal.
  • the data is supplied as ASCII characters.
  • the conversion circuit supports various selectable languages and the conversion circuit is capable of downloading languages via the connected apparatus.
  • the conversion circuit supports various selectable voices and the conversion circuit is capable of downloading voices via the connected apparatus.
  • the speed of the speech signal is adjustable.
  • the microcontroller is connectable to a memory containing language information, such as various languages, abbreviation lists and dictionaries.
  • the microcontroller is connectable to a memory containing voice settings.
  • the microcontroller is connectable to the apparatus by means of a system connector having an interface for audio signals, serial channels, power leads and analog and digital ground leads.
  • the device may be implemented as a functional cover, comprising a shell covering the front of the apparatus and a microprocessor cooperating with the processor of the apparatus.
  • the connectable apparatus may be a portable telephone, a pager, a communicator or an electronic organiser.
  • the invention provides an apparatus having a display for showing various readable data, wherein a control unit is arranged to extract readable data for sending to a device for generating speech as mentioned above.
  • the readable data may include texts from menus, text messages, help information, calendars or confirmation of actions taken with the apparatus.
  • the control unit is arranged to extract a part of the readable data, such as a line or a word, at a time from the display and sending it automatically to the speech generating device at a fixed or controllable rate, and/or the control unit is arranged to extract a line at a time from the display and sending it to the speech generating device in dependence of scrolling in the display.
  • the control unit is also arranged to extract a part of the readable data, such as a character, a line or a word, at a time from the display and sending it to the speech generating device in dependence of inputting characters to the apparatus.
  • the control unit may be arranged to send readable data as triggered by the input of definite characters, such as letters, signs, spaces or punctuation marks.
  • the control unit is arranged to extract readable data from a selected file and sending it automatically to the speech generating device at a fixed or controllable rate.
  • the invention provides an apparatus having a display for showing various readable data, including a control unit and a device for generating speech comprising a conversion circuit for converting data to a speech signal and connectable to a speaker system, wherein the control unit is arranged to extract readable data for sending to the speech generating device.
  • the speaker system may be integrated with the apparatus.
  • the data is supplied as ASCII characters.
  • the conversion circuit supports various selectable languages, and is capable of downloading languages.
  • the conversion circuit supports various selectable voices, and is capable of downloading voices.
  • the speed of the speech signal is adjustable.
  • the apparatus is connectable to a memory containing language information, such as various languages, abbreviation lists and dictionaries.
  • the apparatus is connectable to a memory containing voice settings.
  • the readable data includes texts from menus, text messages, help information, calendars or confirmation of actions taken with the apparatus.
  • the control unit is arranged to extract a part of the readable data, such as a line or a word, at a time from the display and sending it automatically to the speech generating device at a fixed or controllable rate, and/or the control unit is arranged to extract a line at a time from the display and sending it to the speech generating device in dependence of scrolling in the display.
  • the control unit is arranged to extract a part of the readable data, such as a character, a line or a word, at a time from the display and sending it to the speech generating device in dependence of inputting characters to the apparatus.
  • the control unit may be arranged to send readable data as triggered by the input of definite characters, such as letters, signs, spaces or punctuation marks.
  • the control unit is arranged to extract readable data from a selected file and sending it automatically to the speech generating device at a fixed or controllable rate.
  • the apparatus may be a portable telephone, a pager, a communicator or an electronic organiser.
  • the invention provides a computer program product loadable into the internal memory of an apparatus having a display for showing various readable data, wherein the computer program product comprises software code portions to achieve the functionality of the apparatus as mentioned above.
  • the computer program product may be embodied on a computer readable medium.
  • fig. 1 is a block diagram of the main blocks of the invention
  • fig. 2 is a perspective view of a system connector
  • fig. 3 is a data flow diagram
  • fig. 4 is an example of a mobile phone using the present invention.
  • the invention will be described in relation to a mobile phone including text-to-speech conversion.
  • the invention is also applicable in many other devices, e.g. pagers, communicators, electronic organisers and the like portable devices.
  • Text-to-speech conversion is a feature that is of interest in many different areas and applications. One of the more interesting is the use in mobile phones.
  • the text-to-speech conversion is done in hardware with a text-to-speech circuit.
  • a highlighted menu label, an SMS or other readable data are sent to a microcontroller.
  • the data may be received as ASCII characters and these are forwarded to the text-to-speech circuit by the microcontroller.
  • the text-to-speech circuit converts the characters to audio signals and sends them to a loudspeaker system.
  • the invention makes the mobile telephone more user-friendly by reading messages and menus to help the user locate himself while browsing the menus system.
  • Fig. 1 shows an embodiment of the invention in which the speech generating device is implemented as an accessory.
  • the accessory is to be attached to a mobile phone 1 via its system connector.
  • the accessory may be implemented as a so called active or functional cover, that is a shell covering e.g. the front of the phone and also connected to the phone's system connector.
  • the functional cover contains a microprocessor holding additional functions and cooperating with the processor of the telephone.
  • the actual outer shape of the accessory depends on the mobile phone and is not shown here.
  • the speech generating device 5 is shown within the dashed square and includes a microcontroller 6 receiving the data to be converted from the mobile phone and passing it to a text-to-speech (TTS) circuit 7.
  • the TTS circuit 7 converts the text to audio signals and sends them via an (optional) amplifier 8 to a loudspeaker 9.
  • the speech generating device is built into the mobile phone and may use the internal hardware, software and speaker system 11, see figure 4.
  • Existing telephones are usually provided with a microprocessor and a digital signal processor capable of being programmed to perform the required text to speech conversion.
  • the text to speech conversion may be embodied as a software product, e.g. a computer program on a readable medium or deliverable through the Internet.
  • the microcontroller may for example be a commercially available circuit comprising a programmable flash memory, general purpose input/output lines and working registers, internal and external interrupts, a programmable serial universal asynchronous receiver and transmitter (UART) and a port for a serial peripheral interface.
  • the registers are programmed to control the behaviour of the microcontroller in the desired way.
  • the microcontroller is responsible for receiving the data to be converted to speech and sending the data to the TTS circuit.
  • the TTS circuit 7 may be a commercially available circuit.
  • the circuit should have an output designed to drive a speaker, and preferably also a telesocket for headphone or an external loudspeaker.
  • a general amplifier 8 could be used, e.g. a fully differential audio power amplifier.
  • the TTS circuit should also support SMS (Short Message Service) and preferably a modifiable abbreviation list.
  • the TTS circuit also should support various languages. In a preferred embodiment it is possible to program other languages through a serial port allowing the user to download different languages.
  • a standard speaker voice is built-in, but preferably it is also possible to download different speaker voices or connect external memories, for instance so called memory sticks, containing voice data.
  • when the speech generating device is connected or integrated in a mobile phone or communicator, databases could be downloaded via the telecommunication network or the Internet.
  • the TTS circuit receives data to be read through its input port, e.g. ASCII characters, converts it into spoken audio and sends it to an analog output.
  • a typical circuit comprises a text processor, a smoothing filter and multilevel memory storage array.
  • the voice and audio signals are stored in the memory in their natural, uncompressed form, which provides a good voice reproduction quality.
  • the speech conversion is conventional and is not described in detail here.
  • the text-to-speech mechanism comprises text normalisation, word to phoneme conversion and phoneme mapping.
  • the text normalisation is the process of translating the incoming text to pronounceable words. It expands abbreviations and translates numeric strings to spoken words.
  • the abbreviation list can be modified. This enables flexibility of adding abbreviations specifically for the text, either by the developer or by the end user to customise the device.
  • Even the unique characters of SMS are supported, meaning that icons such as smilies ;-) will be replaced by its corresponding true spoken meaning. This means that an SMS containing abbreviations and icons will be correctly recited.
  • the TTS circuit should have an internal input buffer that could hold at least 256 characters in order to receive an entire SMS consisting of 160 characters.
  • the microcontroller 6 preferably is connected to a volume control to adjust the volume of a speaker system connected. For instance, two buttons could be provided, one to increase the volume and one to decrease the volume. The buttons are suitably connected to the interrupt pins of the microcontroller.
  • the speech generating device is provided with an interface for connecting the device to the phone via its system connector.
  • the system connector interface comprises audio signals, two serial channels, power leads and the analog and digital ground leads.
  • a typical system connector interface 10 is shown in fig. 2.
  • the mobile telephone is arranged to extract texts and characters from the data shown on the display and to send it to the speech generating device.
  • the extracted text string may be sent to the device to place the data on the system bus. All text strings are stored in a list and a text ID is a pointer used to point out the different text strings.
  • Fig. 3 shows the data flow diagram between the blocks in the system.
  • the different blocks need the right interfaces to communicate properly with each other.
  • the interface between the phone 1 and the microcontroller 6 consists of a universal asynchronous receiver and transmitter (UART), while the microcontroller 6 and the TTS circuit 7 communicate via a serial peripheral interface.
  • the UART may form part of a commercial microcontroller.
  • Fig. 4 shows an example of the operation of the present invention.
  • the mobile phone 1 includes a display 2 currently showing part of a message, e.g. an SMS.
  • the keypad includes scroll buttons 3 for moving in the display.
  • one line 4 of the display is marked by highlighting the text.
  • the control unit extracts one line or word after another at a fixed or adjustable rate and sends it automatically to the speech generating device for translating into spoken audio signals. It is preferably possible to pause, rewind and move fast forward in the text. The speed of the speech reading the text can be adjusted to suit each individual.
  • the user scrolls in the display by means of the buttons 3 to select one line for sending to conversion circuit and reading aloud.
  • the user may also select a whole text or a file, such as a message or downloaded article. The selected text is sent to the conversion circuit.
  • the text to speech conversion is active when the user is writing a message, such as an SMS. After inputting a letter or sign, this is read aloud.
  • when a whole word is finished, e.g. as triggered by the input of a space, the word is sent to the conversion circuit and read aloud.
  • when a punctuation mark is input the whole last sentence may be read, and finally the whole message may be read before it is sent.
  • the control unit sends the text to be read automatically in dependence of a definite set of characters, such as spaces and punctuation marks, and also, optionally, each input sign or letter.
  • the text-to-speech conversion in the phone is not only an aid for the visually impaired and car drivers but also a step further in personalising the phone.
  • a voice command from the user can be used to control functions in the phone, like make a call or navigating in menus, and the speech function can then confirm the commands and possibly add help messages.
  • Extended help functions giving spoken explanations to a selected topic, like a step-by-step instruction on how to install an e-mail account.
  • the whole instruction manual can be accessed in this way.
  • This function can be activated and controlled by a shortcut or by voice recognition.
  • By saving texts on memory sticks connectable to the device or the mobile phone, it is possible to have huge text masses like books read.
  • Reading reminders and alerts from a calendar.
  • Reading pages and articles downloaded from the Internet or by WAP.
  • Use as a navigation aid together with GPS (Global Positioning System) and the Yellow Pages route service.
  • Different voices are possible. It is contemplated that popular voices like film stars etc. could be available for downloading or sold as connectable memory sticks.
  • the spoken audio signal could also be combined with music files, e.g. MIDI (Musical Instrument Digital Interface) files.
  • the invention may be implemented as a separate accessory connectable to an apparatus, or an apparatus incorporating such a device.
  • the invention also relates to an apparatus connectable to such a device.
  • the invention may be implemented by hardware or by software included in a self-contained apparatus or various combinations thereof. The scope of the invention is only limited by the claims below.

Abstract

The invention relates to a device for generating speech associated with information shown on a display (2), especially displays on portable devices such as mobile telephones (1). A conversion circuit converts the data shown to audible speech helping the user to operate the apparatus. The invention also relates to an apparatus arranged to cooperate with such a device or incorporating such a device, and a computer program product therefor.

Description

DEVICE FOR GENERATING SPEECH, APPARATUS CONNECTABLE TO OR INCORPORATING SUCH A DEVICE, AND COMPUTER PROGRAM PRODUCT THEREFOR
Field of invention
The present invention relates to a device for generating speech associated with information shown on a display, especially displays on portable devices such as mobile telephones and the like. A conversion circuit converts the data shown to audible speech helping the user to operate the apparatus. The invention also relates to an apparatus arranged to cooperate with such a device or incorporating such a device, and a computer program product therefor.
State of the art
In portable devices such as mobile telephones etc. the displays are used to display menus controlling the operation and settings of the device or other information relating to messages or games. The displays are often small, which may be a problem for the user, especially if he is visually impaired. Also for other reasons, there is a need for an audible version of the display.
The present invention solves this problem by transforming the information displayed to audible speech.
Summary of the invention
In a first aspect, the invention provides a device for generating speech, wherein a microcontroller is connectable to an apparatus for receiving data to be converted to speech, and sending the data to a conversion circuit; and a conversion circuit connectable to a speaker system for converting the data to a speech signal.
Preferably, the data is supplied as ASCII characters.
Suitably, the conversion circuit supports various selectable languages and the conversion circuit is capable of downloading languages via the connected apparatus.
Suitably, the conversion circuit supports various selectable voices and the conversion circuit is capable of downloading voices via the connected apparatus.
Preferably, the speed of the speech signal is adjustable.
Preferably, the microcontroller is connectable to a memory containing language information, such as various languages, abbreviation lists and dictionaries.
Preferably, the microcontroller is connectable to a memory containing voice settings.
Suitably, the microcontroller is connectable to the apparatus by means of a system connector having an interface for audio signals, serial channels, power leads and analog and digital ground leads.
The device may be implemented as a functional cover, comprising a shell covering the front of the apparatus and a microprocessor cooperating with the processor of the apparatus.
The connectable apparatus may be a portable telephone, a pager, a communicator or an electronic organiser.
In a second aspect, the invention provides an apparatus having a display for showing various readable data, wherein a control unit is arranged to extract readable data for sending to a device for generating speech as mentioned above.
The readable data may include texts from menus, text messages, help information, calendars or confirmation of actions taken with the apparatus.
Suitably, the control unit is arranged to extract a part of the readable data, such as a line or a word, at a time from the display and sending it automatically to the speech generating device at a fixed or controllable rate, and/or the control unit is arranged to extract a line at a time from the display and sending it to the speech generating device in dependence of scrolling in the display.
Suitably, the control unit is also arranged to extract a part of the readable data, such as a character, a line or a word, at a time from the display and sending it to the speech generating device in dependence of inputting characters to the apparatus.
Then, the control unit may be arranged to send readable data as triggered by the input of definite characters, such as letters, signs, spaces or punctuation marks.
Preferably, the control unit is arranged to extract readable data from a selected file and sending it automatically to the speech generating device at a fixed or controllable rate.
In a third aspect, the invention provides an apparatus having a display for showing various readable data, including a control unit and a device for generating speech comprising a conversion circuit for converting data to a speech signal and connectable to a speaker system, wherein the control unit is arranged to extract readable data for sending to the speech generating device.
The speaker system may be integrated with the apparatus.
Suitably, the data is supplied as ASCII characters.
Suitably, the conversion circuit supports various selectable languages, and is capable of downloading languages.
Suitably, the conversion circuit supports various selectable voices, and is capable of downloading voices.
Preferably, the speed of the speech signal is adjustable.
Suitably, the apparatus is connectable to a memory containing language information, such as various languages, abbreviation lists and dictionaries.
Suitably, the apparatus is connectable to a memory containing voice settings.
Preferably, the readable data includes texts from menus, text messages, help information, calendars or confirmation of actions taken with the apparatus.
Suitably, the control unit is arranged to extract a part of the readable data, such as a line or a word, at a time from the display and sending it automatically to the speech generating device at a fixed or controllable rate, and/or the control unit is arranged to extract a line at a time from the display and sending it to the speech generating device in dependence of scrolling in the display.
Suitably, the control unit is arranged to extract a part of the readable data, such as a character, a line or a word, at a time from the display and sending it to the speech generating device in dependence of inputting characters to the apparatus.
Then, the control unit may be arranged to send readable data as triggered by the input of definite characters, such as letters, signs, spaces or punctuation marks.
Preferably, the control unit is arranged to extract readable data from a selected file and sending it automatically to the speech generating device at a fixed or controllable rate.
The apparatus may be a portable telephone, a pager, a communicator or an electronic organiser.
In a fourth aspect, the invention provides a computer program product loadable into the internal memory of an apparatus having a display for showing various readable data, wherein the computer program product comprises software code portions to achieve the functionality of the apparatus as mentioned above.
The computer program product may be embodied on a computer readable medium.
Brief description of the drawings
Embodiments of the invention will be described in detail below with reference to the accompanying drawings, of which: fig. 1 is a block diagram of the main blocks of the invention, fig. 2 is a perspective view of a system connector, fig. 3 is a data flow diagram, and fig. 4 is an example of a mobile phone using the present invention.
Detailed description of preferred embodiments
The invention will be described in relation to a mobile phone including text-to-speech conversion. The invention is also applicable in many other devices, e.g. pagers, communicators, electronic organisers and the like portable devices.
Text-to-speech conversion is a feature that is of interest in many different areas and applications. One of the more interesting is the use in mobile phones.
Today mobile phones are used by almost everyone and a feature like this can be an important aid, especially for the visually impaired and for users who need to focus on other things while using the phone, for instance car drivers using hands-free equipment. The text-to-speech conversion is done in hardware with a text-to-speech circuit. A highlighted menu label, an SMS or other readable data are sent to a microcontroller. The data may be received as ASCII characters and these are forwarded to the text-to-speech circuit by the microcontroller. The text-to-speech circuit converts the characters to audio signals and sends them to a loudspeaker system. The invention makes the mobile telephone more user-friendly by reading messages and menus to help the user locate himself while browsing the menus system.
Fig. 1 shows an embodiment of the invention in which the speech generating device is implemented as an accessory. The accessory is to be attached to a mobile phone 1 via its system connector. The accessory may be implemented as a so called active or functional cover, that is a shell covering e.g. the front of the phone and also connected to the phone's system connector. The functional cover contains a microprocessor holding additional functions and cooperating with the processor of the telephone. Thus, the actual outer shape of the accessory depends on the mobile phone and is not shown here.
The speech generating device 5 is shown within the dashed square and includes a microcontroller 6 receiving the data to be converted from the mobile phone and passing it to a text-to-speech (TTS) circuit 7. The TTS circuit 7 converts the text to audio signals and sends them via an (optional) amplifier 8 to a loudspeaker 9.
In another embodiment, the speech generating device is built into the mobile phone and may use the internal hardware, software and speaker system 11, see figure 4. Existing telephones are usually provided with a microprocessor and a digital signal processor capable of being programmed to perform the required text to speech conversion. Thus, the text to speech conversion may be embodied as a software product, e.g. a computer program on a readable medium or deliverable through the Internet.
The microcontroller may for example be a commercially available circuit comprising a programmable flash memory, general purpose input/output lines and working registers, internal and external interrupts, a programmable serial universal asynchronous receiver and transmitter (UART) and a port for a serial peripheral interface. The registers are programmed to control the behaviour of the microcontroller in the desired way. The microcontroller is responsible for receiving the data to be converted to speech and sending the data to the TTS circuit.
The TTS circuit 7 may be a commercially available circuit. The circuit should have an output designed to drive a speaker, and preferably also a telesocket for headphone or an external loudspeaker. To get a higher volume a general amplifier 8 could be used, e.g. a fully differential audio power amplifier. The TTS circuit should also support SMS (Short Message Service) and preferably a modifiable abbreviation list. The TTS circuit also should support various languages. In a preferred embodiment it is possible to program other languages through a serial port allowing the user to download different languages. A standard speaker voice is built-in, but preferably it is also possible to download different speaker voices or connect external memories, for instance so called memory sticks, containing voice data. When the speech generating device is connected or integrated in a mobile phone or communicator, databases could be downloaded via the telecommunication network or the Internet. The TTS circuit receives data to be read through its input port, e.g. ASCII characters, converts it into spoken audio and sends it to an analog output. A typical circuit comprises a text processor, a smoothing filter and multilevel memory storage array. The voice and audio signals are stored in the memory in their natural, uncompressed form, which provides a good voice reproduction quality. The speech conversion is conventional and is not described in detail here.
Briefly, the text-to-speech mechanism comprises text normalisation, word to phoneme conversion and phoneme mapping. The text normalisation is the process of translating the incoming text to pronounceable words. It expands abbreviations and translates numeric strings to spoken words. The abbreviation list can be modified. This enables flexibility of adding abbreviations specifically for the text, either by the developer or by the end user to customise the device. Even the unique characters of SMS are supported, meaning that icons such as smilies ;-) will be replaced by their corresponding true spoken meaning. This means that an SMS containing abbreviations and icons will be correctly recited. The TTS circuit should have an internal input buffer that could hold at least 256 characters in order to receive an entire SMS consisting of 160 characters. This means that no extra memory is needed in the connecting apparatus.
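The text normalisation step described above can be illustrated with a minimal Python sketch. It is not the circuit's actual algorithm, which the description leaves as conventional. The abbreviation and icon tables, the function names and the limit to numbers below 1000 are all assumptions made for illustration.

```python
# Hypothetical, modifiable tables; the description only says such lists exist.
ABBREVIATIONS = {"btw": "by the way", "asap": "as soon as possible"}
ICONS = {";-)": "wink", ":-)": "smile"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")


def normalise(text: str, abbreviations=ABBREVIATIONS, icons=ICONS) -> str:
    """Replace icons, abbreviations and short numeric strings with spoken words."""
    words = []
    for token in text.split():
        if token in icons:
            words.append(icons[token])
            continue
        core = token.rstrip(".,!?")   # keep trailing punctuation aside
        tail = token[len(core):]
        if core.lower() in abbreviations:
            words.append(abbreviations[core.lower()] + tail)
        elif core.isascii() and core.isdigit() and int(core) < 1000:
            words.append(number_to_words(int(core)) + tail)
        else:
            words.append(token)       # ordinary word: pass through unchanged
    return " ".join(words)
```

Because both tables are passed as parameters, a developer or end user could supply a customised list, which mirrors the modifiable abbreviation list mentioned in the text.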
The microcontroller 6 preferably is connected to a volume control to adjust the volume of a speaker system connected. For instance, two buttons could be provided, one to increase the volume and one to decrease the volume. The buttons are suitably connected to the interrupt pins of the microcontroller.
The speech generating device is provided with an interface for connecting the device to the phone via its system connector. The system connector interface comprises audio signals, two serial channels, power leads and the analog and digital ground leads. A typical system connector interface 10 is shown in fig. 2.
The mobile telephone is arranged to extract texts and characters from the data shown on the display and to send it to the speech generating device. The extracted text string may be sent to the device to place the data on the system bus. All text strings are stored in a list and a text ID is a pointer used to point out the different text strings.
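The text list and text ID mentioned above could look roughly as follows. The list contents and the function names are hypothetical; the description only states that the strings are kept in a list and addressed by a text ID.

```python
# Hypothetical menu strings; a real phone would hold its own text list.
TEXT_STRINGS = ["Messages", "Call list", "Settings", "Calendar"]


def text_for_id(text_id: int) -> str:
    """Resolve a text ID to the string it points at."""
    if not 0 <= text_id < len(TEXT_STRINGS):
        raise KeyError(f"unknown text ID: {text_id}")
    return TEXT_STRINGS[text_id]


def encode_for_tts(text_id: int) -> bytes:
    """Encode the string as ASCII bytes, the input format the description names."""
    # Characters outside ASCII are replaced by '?' in this sketch.
    return text_for_id(text_id).encode("ascii", errors="replace")
```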
Fig. 3 shows the data flow diagram between the blocks in the system. The different blocks need the right interfaces to communicate properly with each other. The interface between the phone 1 and the microcontroller 6 consists of a universal asynchronous receiver and transmitter UART, while the microcontroller 6 and the TTS circuit 7 communicate via a serial peripheral interface. The UART may form part of a commercial microcontroller.
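The data flow of fig. 3 can be mimicked in software as a simple relay: bytes arrive from the phone on the UART side and the microcontroller forwards them to the TTS circuit over the serial peripheral interface. The sketch below only simulates that flow. The class names, the chunk size and the stub TTS circuit are assumptions, and no real UART or SPI driver is involved.

```python
from collections import deque


class FakeTtsCircuit:
    """Stand-in for the TTS circuit 7; it only records what it receives."""

    def __init__(self):
        self.received = bytearray()
        self.writes = 0

    def spi_write(self, chunk: bytes) -> None:
        self.received.extend(chunk)
        self.writes += 1


class Microcontroller:
    """Relays bytes from a UART receive queue to the TTS circuit."""

    def __init__(self, tts: FakeTtsCircuit, chunk_size: int = 16):
        self.tts = tts
        self.chunk_size = chunk_size  # assumed transfer size
        self.uart_rx = deque()        # bytes received from the phone

    def uart_receive(self, data: bytes) -> None:
        """Called when the phone sends data over the UART."""
        self.uart_rx.extend(data)

    def pump(self) -> int:
        """Forward everything queued so far; returns the byte count."""
        forwarded = 0
        while self.uart_rx:
            n = min(self.chunk_size, len(self.uart_rx))
            chunk = bytes(self.uart_rx.popleft() for _ in range(n))
            self.tts.spi_write(chunk)
            forwarded += len(chunk)
        return forwarded
```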
Fig. 4 shows an example of the operation of the present invention. The mobile phone 1 includes a display 2 currently showing part of a message, e.g. an SMS. The keypad includes scroll buttons 3 for moving in the display. Currently one line 4 of the display is marked by highlighting the text. In an automatic mode, the control unit extracts one line or word after another at a fixed or adjustable rate and sends it automatically to the speech generating device for translating into spoken audio signals. It is preferably possible to pause, rewind and move fast forward in the text. The speed of the speech reading the text can be adjusted to suit each individual.
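The automatic mode can be pictured as a cursor that moves through the displayed lines once per time interval. The following sketch assumes a hypothetical `AutoReader` class whose `tick` method is called by a timer; the class, its method names and the default interval are illustrative and not taken from the description.

```python
class AutoReader:
    """Steps through lines of text at an adjustable rate."""

    def __init__(self, lines, interval_s: float = 2.0):
        self.lines = list(lines)
        self.position = 0             # index of the next line to read
        self.interval_s = interval_s  # assumed default time between lines
        self.paused = False

    def tick(self):
        """Called once per interval; returns the next line or None."""
        if self.paused or self.position >= len(self.lines):
            return None
        line = self.lines[self.position]
        self.position += 1
        return line

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def rewind(self, count: int = 1) -> None:
        self.position = max(0, self.position - count)

    def fast_forward(self, count: int = 1) -> None:
        self.position = min(len(self.lines), self.position + count)

    def set_rate(self, interval_s: float) -> None:
        """Change how often a new line is sent."""
        if interval_s <= 0:
            raise ValueError("interval must be positive")
        self.interval_s = interval_s
```

Each value returned by `tick` would be sent to the speech generating device; a return value of `None` means there is nothing to send in that interval.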
In another mode, the user scrolls in the display by means of the buttons 3 to select one line to be sent to the conversion circuit and read aloud. The user may also select a whole text or a file, such as a message or a downloaded article. The selected text is sent to the conversion circuit.
In a further mode, the text to speech conversion is active when the user is writing a message, such as an SMS. After inputting a letter or sign, this is read aloud. When a whole word is finished, e.g. as triggered by the input of a space, the word is sent to the conversion circuit and read aloud. Further, when a punctuation mark is input the whole last sentence may be read, and finally the whole message may be read before it is sent. The control unit sends the text to be read automatically in dependence of a definite set of characters, such as spaces and punctuation marks, and also, optionally, each input sign or letter.
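The trigger logic of this writing mode can be sketched as a small state machine. The class and method names are hypothetical, the set of sentence-ending marks is an assumption, and the choice to read only the sentence (not the last word separately) when a punctuation mark arrives is a simplification of the behaviour described above.

```python
SENTENCE_END = ".!?"  # assumed set of sentence-ending punctuation marks


class TypingReader:
    """Decides what to send to the conversion circuit after each key press."""

    def __init__(self, read_characters: bool = True):
        self.read_characters = read_characters  # optional per-character reading
        self.word = ""
        self.sentence = ""
        self.message = ""

    def on_key(self, ch: str) -> list:
        """Return the text fragments to be read aloud for this input."""
        out = []
        self.message += ch
        self.sentence += ch
        if ch == " ":
            # A space finishes the current word, if there is one.
            if self.word:
                out.append(self.word)
            self.word = ""
        elif ch in SENTENCE_END:
            # A punctuation mark triggers reading of the whole last sentence.
            out.append(self.sentence.strip())
            self.word = ""
            self.sentence = ""
        else:
            if self.read_characters:
                out.append(ch)
            self.word += ch
        return out

    def before_send(self) -> str:
        """The whole message, to be read before it is sent."""
        return self.message
```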
The text-to-speech conversion in the phone is not only an aid for the visually impaired and car drivers but also a step further in personalising the phone. Some of the possibilities with the text-to-speech function in a mobile telephone are:
- Interaction with voice control. A voice command from the user can be used to control functions in the phone, like making a call or navigating in menus, and the speech function can then confirm the commands and possibly add help messages.
- Extended help functions, giving spoken explanations to a selected topic, like a step-by-step instruction on how to install an e-mail account. The whole instruction manual can be accessed in this way. This function can be activated and controlled by a shortcut or by voice recognition.
- By saving texts on memory sticks connectable to the device or the mobile phone, it is possible to have huge text masses like books read.
- Reading reminders and alerts from a calendar.
- Reading pages and articles downloaded from the Internet or by WAP.
- Use as a navigation aid together with GPS (Global Positioning System) and the Yellow Pages route service.
- Different voices are possible. It is contemplated that popular voices like film stars etc. could be available for downloading or sold as connectable memory sticks.
- The spoken audio signal could also be combined with music files, e.g. MIDI (Musical Instrument Digital Interface) files.
The invention may be implemented as a separate accessory connectable to an apparatus, or an apparatus incorporating such a device. The invention also relates to an apparatus connectable to such a device. The invention may be implemented by hardware or by software included in a self-contained apparatus or various combinations thereof. The scope of the invention is only limited by the claims below.

Claims

1. A device (5) for generating speech, characterised by: a microcontroller (6) connectable to an apparatus for receiving data to be converted to speech, and sending the data to a conversion circuit (7); a conversion circuit (7) connectable to a speaker system (9) for converting the data to a speech signal.
2. A device according to claim 1, characterised in that the data is supplied as ASCII characters.
3. A device according to claim 1 or 2, characterised in that the conversion circuit (7) supports various selectable languages.
4. A device according to claim 3, characterised in that the conversion circuit (7) is capable of downloading languages via the connected apparatus.
5. A device according to any one of claims 1 to 4, characterised in that the conversion circuit (7) supports various selectable voices.
6. A device according to claim 5, characterised in that the conversion circuit (7) is capable of downloading voices via the connected apparatus (1).
7. A device according to any one of claims 1 to 6, characterised in that the speed of the speech signal is adjustable.
8. A device according to any one of claims 1 to 7, characterised in that the microcontroller (6) is connectable to a memory containing language information, such as various languages, abbreviation lists and dictionaries.
9. A device according to any one of claims 1 to 8, characterised in that the microcontroller (6) is connectable to a memory containing voice settings.
10. A device according to any one of claims 1 to 9, characterised in that the microcontroller (6) is connectable to the apparatus (1) by means of a system connector having an interface (10) for audio signals, serial channels, power leads and analog and digital ground leads.
11. A device according to claim 10, characterised in that the device is implemented as a functional cover, comprising a shell covering the front of the apparatus (1) and a microprocessor cooperating with the processor of the apparatus (1).
12. A device according to any one of claims 1 to 11, characterised in that the connectable apparatus (1) is a portable telephone, a pager, a communicator or an electronic organiser.
13. An apparatus (1) having a display (2) for showing various readable data, characterised by a control unit arranged to extract readable data for sending to a device (5) for generating speech in accordance with any one of the preceding claims.
14. An apparatus according to claim 13, characterised in that the readable data includes texts from menus, text messages, help information, calendars or confirmation of actions taken with the apparatus (1).
15. An apparatus according to claim 13 or 14, characterised in that the control unit is arranged to extract a part of the readable data, such as a line or a word, at a time from the display (2) and sending it automatically to the speech generating device (5) at a fixed or controllable rate.
16. An apparatus according to claim 13, 14 or 15, characterised in that the control unit is arranged to extract a part of the readable data, such as a line or a word, at a time from the display (2) and sending it to the speech generating device (5) in dependence of scrolling in the display (2).
17. An apparatus according to claim 13, 14, 15 or 16, characterised in that the control unit is arranged to extract a part of the readable data, such as a line or a word or a character, at a time from the display (2) and sending it to the speech generating device (5) in dependence of inputting characters to the apparatus.
18. An apparatus according to claim 17, characterised in that the control unit is arranged to send readable data as triggered by the input of definite characters, such as letters, signs, spaces or punctuation marks.
19. An apparatus according to any one of claims 13 to 18, characterised in that the control unit is arranged to extract readable data from a selected file and sending it automatically to the speech generating device (5) at a fixed or controllable rate.
20. An apparatus (1) having a display for showing various readable data, characterised by including a control unit and a device for generating speech comprising a conversion circuit for converting data to a speech signal and connectable to a speaker system (9; 11), wherein the control unit is arranged to extract readable data for sending to the speech generating device.
21. An apparatus according to claim 20, characterised in that the speaker system (11) is integrated with the apparatus.
22. An apparatus according to claim 20 or 21, characterised in that the data is supplied as ASCII characters.
23. An apparatus according to claim 20, 21 or 22, characterised in that the conversion circuit supports various selectable languages.
24. An apparatus according to claim 23, characterised in that the apparatus (1) is capable of downloading languages.
25. An apparatus according to any one of claims 20 to 24, characterised in that the conversion circuit supports various selectable voices.
26. An apparatus according to claim 25, characterised in that the apparatus (1) is capable of downloading voices.
27. An apparatus according to any one of claims 20 to 26, characterised in that the speed of the speech signal is adjustable.
28. An apparatus according to any one of claims 20 to 27, characterised in that the apparatus (1) is connectable to a memory containing language information, such as various languages, abbreviation lists and dictionaries.
29. An apparatus according to any one of claims 20 to 28, characterised in that the apparatus (1) is connectable to a memory containing voice settings.
30. An apparatus according to any one of claims 20 to 29, characterised in that the readable data includes texts from menus, text messages, help information, calendars or confirmation of actions taken with the apparatus (1).
31. An apparatus according to any one of claims 20 to 29, characterised in that the control unit is arranged to extract a part of the readable data, such as a line or a word, at a time from the display and sending it automatically to the speech generating device at a fixed or controllable rate.
32. An apparatus according to any one of claims 20 to 31, characterised in that the control unit is arranged to extract a part of the readable data, such as a line or a word, at a time from the display and sending it to the speech generating device in dependence of scrolling in the display (2).
33. An apparatus according to any one of claims 20 to 32, characterised in that the control unit is arranged to extract a part of the readable data, such as a character, a line or a word, at a time from the display (2) and sending it to the speech generating device (5) in dependence of inputting characters to the apparatus.
34. An apparatus according to claim 33, characterised in that the control unit is arranged to send readable data as triggered by the input of definite characters, such as letters, signs, spaces or punctuation marks.
35. An apparatus according to any one of claims 20 to 34, characterised in that the control unit is arranged to extract readable data from a selected file and sending it automatically to the speech generating device (5) at a fixed or controllable rate.
36. An apparatus according to any one of claims 13 to 35, characterised in that the apparatus is a portable telephone, a pager, a communicator or an electronic organiser.
37. A computer program product loadable into the internal memory of an apparatus (1) having a display for showing various readable data, characterised by comprising software code portions to achieve the functionality of the apparatus in accordance with any one of claims 20 to 36.
38. A computer program product according to claim 37, embodied on a computer readable medium.
PCT/EP2003/012879 2002-12-16 2003-11-14 Device for generating speech, apparatus connectable to or incorporating such a device, and computer program product therefor WO2004055779A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2003279398A AU2003279398A1 (en) 2002-12-16 2003-11-14 Device for generating speech, apparatus connectable to or incorporating such a device, and computer program product therefor
US10/539,238 US8340966B2 (en) 2002-12-16 2003-11-14 Device for generating speech, apparatus connectable to or incorporating such a device, and computer program product therefor

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
EP02445177.5 2002-12-16
EP02445177 2002-12-16
EP03011580.2 2003-05-22
EP03011580.2A EP1431958B1 (en) 2002-12-16 2003-05-22 Apparatus connectable to or incorporating a device for generating speech, and computer program product therefor
US47402503P 2003-05-29 2003-05-29
US60/474,025 2003-05-29

Publications (1)

Publication Number Publication Date
WO2004055779A1 (en) 2004-07-01

Family

ID=32600621

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2003/012879 WO2004055779A1 (en) 2002-12-16 2003-11-14 Device for generating speech, apparatus connectable to or incorporating such a device, and computer program product therefor

Country Status (2)

Country Link
AU (1) AU2003279398A1 (en)
WO (1) WO2004055779A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011004207A1 (en) * 2009-07-10 2011-01-13 Metáll-Print Kft. Method and system for compressing short messages, computer program and computer program product therefor
US8515760B2 (en) * 2005-01-19 2013-08-20 Kyocera Corporation Mobile terminal and text-to-speech method of same

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0776097A2 (en) * 1995-11-23 1997-05-28 Wireless Links International Ltd. Mobile data terminals with text-to-speech capability
WO2001057851A1 (en) * 2000-02-02 2001-08-09 Famoice Technology Pty Ltd Speech system
US20020034956A1 (en) * 1998-04-29 2002-03-21 Fisseha Mekuria Mobile terminal with a text-to-speech converter
WO2002069320A2 (en) * 2001-02-28 2002-09-06 Vox Generation Limited Spoken language interface
US20020143534A1 (en) * 2001-03-29 2002-10-03 Koninklijke Philips Electronics N.V. Editing during synchronous playback

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SMITH M ET AL: "FlexVoice DiSP - Text To Speech Distributed Speech Processing", WHITE PAPER, MINDMAKER INC., 6 February 2002 (2002-02-06), XP002262275, Retrieved from the Internet <URL:http://www.flexvoice.com/DiSP_wp.pdf> [retrieved on 20031120] *

Also Published As

Publication number Publication date
AU2003279398A1 (en) 2004-07-09

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 20038A63436

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2006217981

Country of ref document: US

Ref document number: 10539238

Country of ref document: US

122 Ep: pct application non-entry in european phase
WWP Wipo information: published in national office

Ref document number: 10539238

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP