GB2406983A

GB2406983A - Digital radio receiver with speech synthesis

Info

Publication number: GB2406983A
Application number: GB0422412A
Authority: GB
Inventors: Gavin Robert Ferris
Original assignee: RadioScape Ltd
Current assignee: RadioScape Ltd
Priority date: 2003-10-08
Filing date: 2004-10-08
Publication date: 2005-04-13
Anticipated expiration: 2024-10-08
Also published as: GB0323551D0; GB0422412D0; GB2406983B; WO2005036786A1

Abstract

A digital radio receiver comprises a speech synthesis module able to generate synthesised speech, the synthesised speech conveying information ordinarily shown on a display of the receiver. The present invention therefore uses audio as the primary source of feedback in place of the LCD used in radios available today. These techniques are predominantly aimed at making radios easier to use for the visually impaired but are also applicable to application areas where visual feedback may be distracting, such as automotive applications. The present invention can be implemented in any digital radio system that provides additional textual information and hence includes DAB, DRM, RDS, IBOC and others.

Description


DIGITAL RADIO RECEIVER WITH SPEECH SYNTHESIS

FIELD OF THE INVENTION

The present invention provides a number of methods and techniques for making a digital radio (e.g. a Digital Audio Broadcasting (DAB) radio) usable without visual feedback.

DESCRIPTION OF THE PRIOR ART

Digital radio receivers can decode digital radio signals conforming to one or more different formats, such as DAB, DRM or IBOC. One of the interesting features of digital radios is that they have displays that can show textual information broadcast with a radio channel (e.g. the name of the song being broadcast on that channel). Because digital radios generally have displays that can show text, and in some cases graphics, much of the user interaction development for digital radios has centred on new graphical user interfaces, or on other ways of presenting information to the end-user that take advantage of the display and of the ability of a digital radio channel to carry an information payload.

SUMMARY OF THE PRESENT INVENTION

In a first aspect of the invention, there is a digital radio receiver comprising a speech synthesis module able to generate synthesised speech, the synthesised speech conveying information ordinarily shown on a display of the receiver.

Hence, the present invention moves away from the established bias in digital radio user interaction design by ignoring the display, or at least supplementing it, with synthesised speech.

For example, the synthesised speech could confirm a control operation performed by an end-user (e.g. switching to preset station 1 could be accompanied by a synthesised voice saying "preset one"; the operation of tuning could be accompanied by the synthesised voice saying "tuning", etc.).

The receiver may include a display, in which case the information conveyed by the synthesised speech is also shown on the display. In other cases, the synthesised speech need not correspond to anything actually shown on a display.

In addition, the synthesised speech could be generated by synthesising textual information broadcast with a digital radio channel received by the receiver. For example, the textual information could be DLS, or the name of the station to which the receiver is currently tuned. The synthesised speech could also be an alert relating to a newly received DLS.

The textual information could be stored in memory (e.g. to facilitate rapid replay at a future time, or to enable the device to cycle through several discrete items of synthesised speech under the control of the end-user). To improve the quality of the speech used, compressed speech versions of the textual information could also be downloaded via transport protocols such as MOT.

In addition to being able to synthesise speech, the receiver could also be able to process a broadcast data file, such as a DLS announcement or a radio station jingle. The data file could also include information that improves the accuracy of the speech synthesis.

The receiver could activate a radio pause or rewind function, in which live radio is temporarily stored in memory, while synthesised speech is output from a speaker in the receiver.

In another aspect, there is a method of enabling a digital radio receiver to provide information, comprising the step of synthesising speech in a speech synthesis module in the receiver, the synthesised speech conveying information ordinarily shown on a display of the receiver.

The present invention therefore uses audio as the primary source of feedback in place of the LCD used in radios available today. These techniques are predominantly aimed at making radios easier to use for the visually impaired but are also applicable to application areas where visual feedback may be distracting, such as automotive applications. The present invention can be implemented in any radio system that provides additional textual information and hence includes DAB, DRM, IBOC and others.

DETAILED DESCRIPTION

Base implementation

In the base implementation, the receiver confirms every operation performed by the user with a spoken equivalent of the operation's result. For example, changing the volume would cause a spoken message of the form 'Volume now 10'. For a button that has multiple functions, each press would announce the new mode; for example, this might cycle between 'Volume mode', 'Bass mode', 'Treble mode', 'Balance mode' and 'Playback resumed'.
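A minimal sketch of the base implementation follows, assuming a receiver that can hand any phrase to its TTS module. The class and method names (`ConfirmingRadio`, `speak`, `press_mode_button`) are hypothetical, and `speak` only records the phrase instead of producing audio.

```python
class ConfirmingRadio:
    """Hypothetical sketch: confirm each control operation with a spoken phrase.
    speak() stands in for the receiver's TTS module and only records phrases."""

    # Modes cycled by a single multi-function button, as in the example above.
    MODES = ["Volume mode", "Bass mode", "Treble mode", "Balance mode", "Playback resumed"]

    def __init__(self, volume=9):
        self.volume = volume
        self.mode_index = -1   # no mode selected yet
        self.spoken = []       # everything "said" so far, in order

    def speak(self, phrase):
        # A real receiver would pass the phrase to its TTS engine here.
        self.spoken.append(phrase)

    def volume_up(self):
        self.volume += 1
        self.speak(f"Volume now {self.volume}")

    def volume_down(self):
        self.volume = max(0, self.volume - 1)
        self.speak(f"Volume now {self.volume}")

    def press_mode_button(self):
        # Each press advances to the next mode and announces it, wrapping round.
        self.mode_index = (self.mode_index + 1) % len(self.MODES)
        self.speak(self.MODES[self.mode_index])
```

Starting from volume 9, a single `volume_up()` call produces 'Volume now 10', and six presses of the mode button announce the five modes in turn and then wrap back to 'Volume mode'.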

As DAB provides textual information about services, this can also be communicated to the user using a spoken equivalent. For example, consider a radio that has 'Select', 'Up' and 'Down' buttons for selecting a new station. The interaction to select a new channel might be:

1. User presses the 'Select' button on the digital radio.
2. Audio for the current service is muted.
3. Radio says 'Select station mode' (i.e. synthesised speech).
4. User presses the 'Up' button on the digital radio.
5. Radio looks for the next station and outputs a spoken representation of its name, e.g. 'BBC Radio four'.
6. User presses the 'Up' button again.
7. Radio says 'BBC Five live'.
8. User presses the 'Select' button.
9. Radio says 'BBC Five live selected'.
10. Radio starts playing Five live.
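The interaction above can be sketched as a small state machine. This is illustrative only: the station list, the `StationSelector` class and its method names are assumptions, and spoken output is simply recorded in a list.

```python
class StationSelector:
    """Hypothetical sketch of the Select/Up/Down station selection interaction."""

    def __init__(self, stations, current=0):
        self.stations = stations
        self.current = current    # index of the station being played
        self.cursor = current     # index being browsed while in select mode
        self.selecting = False
        self.muted = False
        self.spoken = []

    def speak(self, phrase):
        # Stand-in for the TTS module.
        self.spoken.append(phrase)

    def press_select(self):
        if not self.selecting:
            # Enter select mode: mute the current service and announce the mode.
            self.selecting = True
            self.muted = True
            self.cursor = self.current
            self.speak("Select station mode")
        else:
            # Confirm the browsed station, announce it and resume audio.
            self.selecting = False
            self.current = self.cursor
            self.muted = False
            self.speak(f"{self.stations[self.current]} selected")

    def press_up(self):
        if self.selecting:
            self.cursor = (self.cursor + 1) % len(self.stations)
            self.speak(self.stations[self.cursor])

    def press_down(self):
        if self.selecting:
            self.cursor = (self.cursor - 1) % len(self.stations)
            self.speak(self.stations[self.cursor])
```

With the stations `["BBC Radio three", "BBC Radio four", "BBC Five live"]` and the first one playing, the key sequence Select, Up, Up, Select reproduces the spoken dialogue listed above.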

The radio makes use of a text-to-speech (TTS) module that converts into speech the text that would normally be fed back to the user via the LCD. The LCD can still be used to provide a second source of information. TTS is a well-understood problem, and there are a number of vendors selling off-the-shelf TTS libraries suitable for the embedded devices typically found in today's digital radios. High-quality TTS can consume large amounts of memory and processor time, so different radio models will have different TTS modules, each chosen as the best trade-off between speech quality and use of the resources available.

Experienced users of the TTS facility may find that the radio is frustratingly slow to operate.

There are a number of techniques that can be employed to improve this. If the radio is in the middle of speaking and the user presses a key, then the current phrase should be interrupted and the phrase corresponding to the new key press should be started immediately. For example, when changing the volume the user might press the 'Up' button a number of times. Instead of hearing 'Volume now 5', 'Volume now 6' and 'Volume now 7', the user might hear something like 'Vol', 'Vol', 'Volume now 7'.
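The interruption behaviour can be modelled with a simple timing sketch. It assumes, purely for illustration, that speech is produced at a fixed number of characters per second; a real receiver would instead cancel the TTS engine's current utterance when a key is pressed. The function name `heard_phrases` and the speaking rate are hypothetical.

```python
CHARS_PER_SECOND = 12.0   # assumed speaking rate, for illustration only


def heard_phrases(key_presses):
    """key_presses: list of (time_in_seconds, phrase), sorted by time.
    Returns what the listener actually hears: each phrase is cut off
    when the next key press starts a new phrase."""
    heard = []
    for i, (start, phrase) in enumerate(key_presses):
        if i + 1 < len(key_presses):
            # Time available before the next key press interrupts this phrase.
            available = max(0.0, key_presses[i + 1][0] - start)
            heard.append(phrase[:int(available * CHARS_PER_SECOND)])
        else:
            heard.append(phrase)   # the last phrase plays to completion
    return heard
```

Three 'Up' presses a quarter of a second apart yield `['Vol', 'Vol', 'Volume now 7']`, matching the example in the text.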

The radio can have an advanced mode in which confirmations are minimised and abbreviated. 'Select mode' could be abbreviated to 'Select', and confirmations like 'Station X selected' would be omitted. Radio station names can be abbreviated using contracted-form hints transmitted with the station name. In some cases this leads to station names that cannot easily be pronounced; in these cases the TTS module would return an error indicating that the station name is unpronounceable, and the normal, longer name would be used instead.
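A minimal sketch of the advanced-mode phrasing rules, under stated assumptions: the abbreviation table, the `short_name` argument (standing in for the transmitted contracted-form hint) and `check_pronounceable` are all hypothetical. In particular, the "no vowels means unpronounceable" test is a crude placeholder for whatever error a real TTS module would report.

```python
ABBREVIATIONS = {"Select station mode": "Select", "Select mode": "Select"}


class UnpronounceableError(Exception):
    pass


def check_pronounceable(text):
    # Placeholder for the TTS module's own check: text with no vowels is
    # treated as unpronounceable (an illustrative heuristic only).
    if not any(c in "aeiouyAEIOUY" for c in text):
        raise UnpronounceableError(text)
    return text


def prompt(phrase, advanced):
    # Advanced mode uses the abbreviated prompt where one exists.
    return ABBREVIATIONS.get(phrase, phrase) if advanced else phrase


def station_name(full_name, short_name, advanced):
    if not advanced or not short_name:
        return full_name
    try:
        return check_pronounceable(short_name)
    except UnpronounceableError:
        return full_name   # fall back to the normal, longer name


def selection_confirmation(name, advanced):
    # Advanced mode omits 'Station X selected' style confirmations.
    return None if advanced else f"{name} selected"
```

Here a contracted form such as 'BBC R4' is rejected and the full name is spoken instead, while 'Classic' would be accepted as the short form of 'Classic FM'.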

If the radio supports a pause/rewind facility, then the current station could be paused while the radio is providing spoken feedback. This eliminates the chance of parts of the programme being missed because spoken feedback interrupts it. The user would then be able to 'catch up' with the live programme at a time of their choosing.
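The pause-while-speaking behaviour can be sketched with a time-shift buffer. The sketch assumes decoded audio arrives as discrete frames; the class name, method names and buffer size are hypothetical.

```python
from collections import deque


class PausableLiveRadio:
    """Hypothetical sketch: live frames are stored while spoken feedback plays,
    so playback resumes from where it stopped rather than skipping ahead."""

    def __init__(self, max_frames=10_000):
        # Bounded store; if it overflows, the oldest frames are dropped.
        self.backlog = deque(maxlen=max_frames)
        self.paused = False

    def start_feedback(self):
        self.paused = True

    def end_feedback(self):
        self.paused = False

    def on_frame(self, frame):
        """Receive one live frame; return the frame to play now (None while paused)."""
        self.backlog.append(frame)
        if self.paused:
            return None
        # Play the oldest stored frame, i.e. running behind live if anything was held.
        return self.backlog.popleft()

    def catch_up(self):
        """Discard the backlog and jump back to live at the user's choosing."""
        self.backlog.clear()

    def delay_frames(self):
        return len(self.backlog)
```

The design choice is that playback after the feedback continues from the held frames, so the listener runs slightly behind live until they choose to call `catch_up()`.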

DAB also sends textual information associated with a programme, known as Dynamic Label Segments (DLS). A user may want to be able to hear these along with the actual on-air audio.

By pressing a 'DLS' button, the DLS text can be spoken by the radio after first muting the current service. This would use the same TTS functionality described above.

The difficulty for a user is knowing when the DLS has changed, and whether it contains something the user has not heard before. Typically, a radio station will cycle through a series of DLS labels for a given programme. To make DLS more usable, the radio can cache a set of the most recent DLS texts and remember which of them the user has listened to. When the user presses the DLS button, they would be able to cycle through the cached DLS, with the most recent DLS not yet listened to being presented first. If the user wants to be notified of new DLS content, the radio can have a mode that beeps or says something like 'New DLS available' when new DLS is received.
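One way to hold the cached DLS texts is sketched below, assuming labels arrive as complete strings. The cache size, the `DlsCache` class and its methods are hypothetical; a label the station repeats while cycling is treated as already known rather than as new content.

```python
from collections import OrderedDict


class DlsCache:
    """Hypothetical sketch: cache recent DLS texts, remember which ones the user
    has heard, and present the most recent unheard label first."""

    def __init__(self, capacity=8, notify_new=False):
        self.capacity = capacity       # assumed cache size
        self.notify_new = notify_new   # the optional 'New DLS available' mode
        self.items = OrderedDict()     # text -> heard?, ordered oldest to newest
        self.alerts = []

    def receive(self, text):
        if text in self.items:
            # Stations cycle through a set of labels; a repeat is not new
            # content, but it becomes the most recently received entry.
            self.items.move_to_end(text)
            return
        self.items[text] = False
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the oldest label
        if self.notify_new:
            self.alerts.append("New DLS available")

    def mark_heard(self, text):
        if text in self.items:
            self.items[text] = True

    def playback_order(self):
        # Newest first, with labels not yet heard ahead of those already heard.
        newest_first = list(reversed(self.items))
        unheard = [t for t in newest_first if not self.items[t]]
        heard = [t for t in newest_first if self.items[t]]
        return unheard + heard
```

Each press of the DLS button would speak the next entry of `playback_order()` and call `mark_heard` on it.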

It is envisaged that sophisticated receivers might support synthesis of text in multiple languages, not just English.

Extensions to the invention

Broadcast 'Speech Tags'

To improve the quality of the spoken feedback, the broadcaster could transmit spoken versions of the station name and DLS for the receiver to pick up, store and subsequently use. The receiver then uses the broadcaster-supplied 'speech tags' rather than relying on synthesis in the receiver itself. These could make use of modern vocoding technology for high compression, or may simply provide additional intonation information to make the speech synthesis more realistic. The format would have to be agreed between broadcasters and radio manufacturers. The trade-off is between high compression, and therefore low cost for broadcasters, and higher cost for receiver manufacturers due to the need to support another audio speech decoder.

The only difficulty comes with voicing dynamic text such as DLS, since its content cannot be known ahead of time. One advantage of simple text synthesis is that it does not require any modification to existing digital radio broadcasts. On the other hand, the use of explicitly coded 'tags' (downloaded into the receiver using MOT, for example) would allow stations to perform better 'branding' (for example, by adding a jingle for the station name rather than simple flat speech).

The Multimedia Object Transport (MOT) protocol is the most natural choice for distributing these speech tags. The MOT carousel could be sent in Programme Associated Data (PAD) or as a separate data service.

For the standard control audio (such as 'Plus One Bass'), the receiver manufacturer could build in a high-quality set of voice tags to be used in each case.
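The choice between broadcast speech tags, the manufacturer's built-in voice tags and on-device synthesis could be expressed as a simple priority order. This is a sketch under assumptions: the key scheme, the dictionaries of stored tags and the `synthesise` callable are hypothetical, and decoding of MOT objects or vocoded audio is not shown.

```python
def choose_audio(key, text, broadcast_tags, builtin_prompts, synthesise):
    """Return (source, audio) for one piece of spoken feedback.

    key: hypothetical identifier, e.g. a station or a control prompt name.
    broadcast_tags: speech tags received from the broadcaster (e.g. via MOT).
    builtin_prompts: high-quality voice tags built in by the manufacturer.
    synthesise: callable wrapping the receiver's TTS module."""
    if key in broadcast_tags:
        return ("broadcast", broadcast_tags[key])   # e.g. a branded jingle
    if key in builtin_prompts:
        return ("builtin", builtin_prompts[key])    # standard control audio
    return ("tts", synthesise(text))                # fall back to synthesis
```

Dynamic text such as DLS usually has no stored tag, so it naturally falls through to TTS, which matches the difficulty noted above.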

Standardised User Interface

In another envisaged embodiment, the whole layout of the receiver's physical controls, the control surface design (e.g., encompassing Braille) and the expected semantics of operation of the receiver (e.g., when a control is first touched, it announces its function but does not operate, and only on a second touch within a fixed time interval does the control actuate) are standardised so as to provide the maximum assistance to a particular target group of users, for example those with visual impairments. It is expected that any such standardisation would provide a set of HMI (human-machine interface) 'idioms' for routine tasks such as navigating in and out of submenus, indicating a selection and indicating a list of options, all provided through a standardised control set and requiring no visual feedback.
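The 'announce on first touch, actuate on second touch' idiom can be sketched as follows. The 1.5 second window is an assumed value, not one given in the text, and the class and event names are hypothetical; an announce event is where the receiver would speak the control's function.

```python
class AnnounceThenActuate:
    """Hypothetical sketch: the first touch of a control announces its function;
    a second touch within window_s seconds actuates it."""

    def __init__(self, window_s=1.5):
        self.window_s = window_s   # assumed fixed interval
        self.pending = {}          # control -> time of the announcing touch
        self.events = []

    def touch(self, control, now):
        t0 = self.pending.get(control)
        if t0 is not None and now - t0 <= self.window_s:
            del self.pending[control]
            self.events.append(("actuate", control))
        else:
            # Touching any control starts a fresh announcement for that control only.
            self.pending = {control: now}
            self.events.append(("announce", control))
```

A second touch that comes too late, or a touch on a different control in between, simply produces another announcement, so nothing is actuated by accident.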

User recognition

In another envisaged embodiment, the radio could contain a microphone and a voice recognition module. This could be used to provide custom settings for different users of a shared radio. Different users may have different preset stations, preferred listening volumes and mode-of-use preferences. The radio could also remember the last station each user was listening to. An example application is a digital radio in a family car. On power-up, the radio would ask for the user's name and then load their custom settings. If the radio is unable to recognise the user, or nothing is heard within a preset timeout, it would default to the last user.
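A minimal sketch of per-user settings with the fallback to the last user. The `Profile` fields and the `recognise` callable are assumptions: `recognise` stands in for the voice recognition module and is taken to return a name, or `None` if nothing is heard before the timeout.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    presets: list
    volume: int
    last_station: str = ""


class SharedRadio:
    """Hypothetical sketch of custom settings for different users of a shared radio."""

    def __init__(self, profiles, last_user):
        self.profiles = profiles   # name -> Profile
        self.last_user = last_user

    def power_up(self, recognise):
        name = recognise()
        if name not in self.profiles:
            # Unrecognised speaker, or silence until the timeout: use the last user.
            name = self.last_user
        self.last_user = name
        return name, self.profiles[name]

    def remember_station(self, station):
        # Remember the last station the current user was listening to.
        self.profiles[self.last_user].last_station = station
```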

Voice control

The receiver can be further extended to enable voice-based control, allowing control without physical contact or visual feedback and making it potentially useful for the elderly or those with serious physical impairment. Alternatively, voice control could be combined with visual feedback if so desired. Voice control would use a voice recognition module to understand the user's request. There are varying levels of sophistication of implementation. The simplest implementation would have a predefined set of possible commands that closely match the names of the buttons on an equivalent 'physical' interface. For example, commands might include 'Select', 'Volume', 'Up', 'Down', etc. Restricting the set to a known set of commands makes the recognition task easier and hence translates into a cheaper end-unit price.

The end-user experience can be made easier by also enabling the radio to understand station names. This would enable a user to select a station by saying its name rather than having to go through a sequence like 'Select', 'Up', 'Up', 'Up', 'Select'. This would, of course, require more powerful speech recognition than in the previous example.
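The two levels of voice control could be layered as in the sketch below. It assumes the recognition module has already produced text; the vocabulary set and the `interpret` function are hypothetical, and with no station names supplied it behaves like the simple, button-only recogniser.

```python
# Hypothetical vocabulary mirroring the physical button names.
BUTTON_COMMANDS = {"select", "volume", "up", "down"}


def interpret(utterance, station_names=()):
    """Map recogniser output (already text) to an action."""
    spoken = " ".join(utterance.lower().split())   # normalise case and spacing
    if spoken in BUTTON_COMMANDS:
        return ("button", spoken)
    # Richer mode: accept a station name and tune to it directly.
    for name in station_names:
        if spoken == " ".join(name.lower().split()):
            return ("tune", name)
    return ("unknown", utterance)
```

Saying 'BBC Five live' is only understood when the list of station names is supplied, which replaces the Select, Up, Up, Up, Select sequence with a single utterance.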

Electronic Programme Guide

DAB includes the facility to transmit an Electronic Programme Guide (EPG), which includes a list of the programmes that are on for various stations. A voiced-feedback receiver could enable the user to browse the EPG and listen to the details of each programme. These details include the name, time, duration and description. Each of these could be spoken in turn as the user navigates through the EPG. If the radio supports setting up timed recordings based on the EPG, then these could also be spoken to the user so that they can be reminded of what they have scheduled to be recorded.
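The spoken EPG browsing can be sketched as turning each entry into a short sequence of phrases. The `Programme` fields follow the details listed above (name, time, duration, description), but the data structure itself is an assumption; parsing of the broadcast EPG is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Programme:
    name: str
    start: datetime
    duration: timedelta
    description: str


def spoken_details(p):
    """Phrases read out in turn for one EPG entry: name, time, duration, description."""
    minutes = int(p.duration.total_seconds() // 60)
    return [
        p.name,
        f"Starts at {p.start:%H:%M}",
        f"Lasts {minutes} minutes",
        p.description,
    ]


def recording_reminders(scheduled):
    """Phrases reminding the user of timed recordings set up from the EPG."""
    return [f"Recording {p.name} at {p.start:%H:%M}" for p in scheduled]
```

Each phrase would be passed to the TTS module as the user moves through the guide.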

Claims

1. A digital radio receiver comprising a speech synthesis module able to generate synthesised speech, the synthesised speech conveying information ordinarily shown on a display of the receiver.

2. The receiver of Claim 1 in which the synthesised speech confirms a control operation performed by an end-user.

3. The receiver of Claim 1 or 2 in which the receiver includes a display and the information conveyed by the synthesised speech is also shown on the display.

4. The receiver of any preceding Claim in which the synthesised speech is generated by synthesising textual information broadcast with a digital radio channel received by the receiver.

5. The receiver of Claim 4 in which the textual information is DLS.

6. The receiver of Claim 4 in which the textual information is a station name.

7. The receiver of Claim 4 in which the textual information is from an Electronic Programme Guide.

8. The receiver of any preceding Claim in which the textual information is stored in memory to enable the device to cycle through several discrete items of synthesised speech under the control of the end-user.

9. The receiver of Claim 4 in which the synthesised speech is an alert relating to a newly received DLS or other textual information.

10. The receiver of any of Claims 4 - 9 in which the spoken representation of the textual information is downloaded using MOT.

11. The receiver of any preceding Claim further able to receive and process a data file.

12. The receiver of Claim 11 in which the data file is a spoken representation of a DLS or radio station jingle.

13. The receiver of Claim 11 in which the data file includes information that improves the accuracy of the speech synthesis.

14. The receiver of any preceding Claim in which a radio pause or rewind function is activated, in which live radio is temporarily stored in memory, whilst synthesized speech outputs from a speaker in the receiver.

15. The receiver of any preceding Claim, able to process digital radio signals in one or more of the following formats: DAB, DRM, IBOC.

16. A method of enabling a digital radio receiver to provide information, comprising the step of synthesising speech in a speech synthesis module in the receiver, the synthesised speech conveying information ordinarily shown on a display of the receiver.