GB2361556A - Text-to-speech browser - Google Patents

Info

Publication number
GB2361556A
Authority
GB
United Kingdom
Prior art keywords
information
output
unit
speech
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0009767A
Other versions
GB0009767D0 (en)
Inventor
Trevor Douglas Shonfeld
David Bending
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Roundpoint Inc
Original Assignee
Roundpoint Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Roundpoint Inc filed Critical Roundpoint Inc
Priority to GB0009767A
Publication of GB0009767D0
Publication of GB2361556A
Application status: Withdrawn

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M3/00 - Automatic or semi-automatic exchanges
    • H04M3/42 - Systems providing special services or facilities to subscribers
    • H04M3/487 - Arrangements for providing information services, e.g. recorded voice services, time announcements
    • H04M3/493 - Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 - Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML

Abstract

A browser (130) is operable to receive character information from an information source (32) such as a web-page, and to generate an audio speech output on the basis thereof. The browser is capable of responding to user input actions, either by manual actuation or by spoken commands, to navigate through units of information, in accordance with links between the units, to select a particular unit or units for output, the level of detail being selectable. The browser may be used in an entertainment system in a vehicle, or with e.g. a computer, mobile telephone or TV set-top box.

Description

ELECTRONIC BROWSER

This invention relates to a browser for receiving data in the form of mark-up language, and converting the data into a form presentable to a user. The invention is particularly applicable to implementation in a motor vehicle, or for use by a visually impaired person.

Mark-up languages, and particularly hypertext mark-up language (HTML), are used extensively for describing information stored and accessible via the Internet. Data defined in a mark-up language can be retrieved by a browser, such as Microsoft Internet Explorer, for conversion into a graphical display on a personal computer. Mark-up languages are so called because they comprise sets of mark-up tags. Tags are configurational characters embedded in data, which a browser intercepts and uses to configure the presentation of the data in the intended manner. HTML is popular because it is substantially platform independent, and defines a set of tags which provide a useful range of data presentation options.

There is now a demand for access to information in many different situations. For example, browsers are being developed which can be implemented on mobile telephones or other hand held devices. It would be desirable to provide a unit in a motor vehicle which could be used to access information.

However, visual output can be distracting to a driver, and in some jurisdictions it is illegal to place a display output in a vehicle where it can be viewed by the driver. Further, it is likely that the size of any display provided in a motor vehicle would be limited, and would not be suitable for the display of a large quantity of graphical or textual information.

An aspect of the invention provides a browser capable of generating an audio output, and of receiving user inputs to cause navigation through a plurality of units of information, up and down and across levels of a hierarchical structure.

The user input commands may be spoken or caused by activation of a manual input.

Another aspect of the invention provides a browser capable of receiving a mark-up text input, and of analysing the mark-up text to provide a spoken output.

The audio output may be modulated by tags in the mark-up text input, and the browser may include means for monitoring tags, and making modulating changes to the corresponding audio output in accordance with selected tags.

The browser may include means for mapping between tags intended for modifying textual output, such as tags from the basic HTML set, and tags for modulating a speech generating means of the browser. Common speech engines also operate on the basis of text-based input with tags inserted therein, to control the nature of the speech output. These speech tags have no natural correspondence with tags for configuring displayed information such as HTML tags. In one embodiment, the browser of the invention includes means for mapping between display tags and speech tags. The mapping means may be preconfigured, so that the user does not have to make any modifications to the browser in order to produce a suitable speech output, with the speech output modulated in line with changes to display characteristics inserted throughout the output data. The mapping means may be a look-up table.
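
By way of illustration only, such a look-up table could be as simple as the following sketch. The speech-side tag names are invented for the example; each commercial speech engine defines its own control tags, so the mapping below is an assumption, not the tag set of any particular product.

    # Illustrative only: maps HTML display tags to hypothetical speech tags.
    TAG_MAP = {
        "<h1>":  "<voice style='announcement'>",
        "</h1>": "</voice>",
        "<em>":  "<emphasis>",
        "</em>": "</emphasis>",
        "<p>":   "<pause length='medium'/>",
    }

    def map_tag(text_tag):
        # Display tags with no speech counterpart are dropped from the stream.
        return TAG_MAP.get(text_tag, "")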

Another aspect of the invention provides a plug-in capable of execution alongside a browser and a speech engine, for intercepting mark-up output from the browser and converting mark-up tags in said output into tags for configuration of the speech engine, prior to delivery of the converted output into the speech engine.

Another aspect of the invention provides apparatus for browsing information received from a remote location, the apparatus including means for automatically converting text configuration signals in said information into speech engine configuration signals, for configuring a speech engine to generate a corresponding speech output.

Preferably the apparatus comprises a controller switchable to receive an audio input, to recognise spoken commands in said input, and to control operation of said apparatus to deliver a speech output.

Further aspects and advantages of the invention may become apparent from the following description of specific embodiments of the invention, provided by way of example only, with reference to the accompanying drawings, in which:

Figure 1 is a schematic diagram of a system incorporating a vehicle entertainment unit of a first embodiment of the present invention;
Figure 2 is a schematic diagram of the entertainment unit illustrated in Figure 1;
Figure 3 is a schematic diagram of a tag mapping unit of an audio browser of the entertainment unit illustrated in Figure 2;
Figure 4 is a schematic diagram of a speech generation unit of the entertainment unit illustrated in Figure 2;
Figure 5 is a schematic diagram of an abridging unit of the speech generation unit illustrated in Figure 4;
Figure 6 is a schematic diagram of a fundamental file structure for use with an information product in the first specific embodiment of the invention;
Figure 7 is a schematic diagram of an information product having the file structure illustrated in Figure 6;
Figure 8 is a flow diagram showing response of the specific embodiment to user actions;
Figure 9 is a flow diagram illustrating the mode of operation of a browser unit of the entertainment system illustrated in Figure 2;
Figure 10 is a flow diagram illustrating the mode of operation of the tag mapping unit illustrated in Figure 3;
Figure 11 is a flow diagram illustrating the mode of operation of a speed processing unit of the abridging unit illustrated in Figure 5;
Figure 12 is a flow diagram illustrating the mode of operation of a hierarchy sorter of the abridging unit illustrated in Figure 5;
Figure 13 is a schematic diagram of a system incorporating a vehicle entertainment unit of a second specific embodiment of the invention;
Figure 14 is a schematic diagram of the vehicle entertainment unit illustrated in Figure 13;
Figure 15 is a schematic diagram of a radio scanning browser unit of the vehicle entertainment unit illustrated in Figure 14; and
Figure 16 is a schematic diagram of a system incorporating a vehicle entertainment unit of a third specific embodiment of the invention.

In a first embodiment, a system 10 illustrated in Figure 1 comprises a vehicle entertainment unit 100, for installation in a motor vehicle, preferably in the dashboard thereof. The entertainment unit 100 is operable to receive operator inputs by a control panel 102 and/or a microphone 104, and is operable to produce graphical and/or textual outputs on a display 106 and audio output via a loudspeaker 108. The entertainment unit 100 comprises a memory card reader 120, for receiving a memory card 20 storing information in mark-up language form. The memory card reader 120 reads the information from the memory card 20, and delivers this information to a text-to-speech browser 130, which converts the information into signals to drive the display 106 and the loudspeaker 108.

The information stored on the memory card 20 is retrieved from a web server 30 storing a web page 32. The web server 30 is accessible via the Internet 40 through a local server 42 and telephone network 44, by means of a personal computer 50 having a browser 52 and a corresponding memory card reader/writer 54 into which the memory card 20 can be installed.

The personal computer 50 accesses the web server 30 by means of a modem 56, and retrieves information from the web page 32, to be stored on the memory card 20. A suitable format for the memory card 20 is the Memory Stick (trade mark) by Sony Electronics Limited. Memory Stick storage media are presently available in different capacities up to 64 MB.

The storage of the information can be by means of a standard "save" instruction provided on the desktop of the PC 50. This causes the creation of a standard HTML file on the memory card. Several files can be retrieved and saved at the same time.

A user can retrieve a copy of the page 32, which, for example, is a news-based web page, onto the personal computer 50. Thereafter, the copy of the web page 32 can be stored on the memory card 20. Then the memory card 20 can be transferred to the vehicle entertainment unit 100.

In response to browsing commands received from the user, the vehicle entertainment unit 100 will then convert information held in one or more files on the memory card in mark-up language form, into a speech output signal to drive the loudspeaker 108. The control panel 102 and the microphone 104 are receptive of user commands, for example, to speed up or slow down delivery of the information as speech output, to increase/decrease the volume of the speech output, and browsing commands to select a different set of information to be presented to the user.

The text-to-speech browser 130 comprises a browser unit 132, which refers to the information on the memory card via the memory card reader 120. The browser unit 132 passes information to a tag mapping unit 134 which prepares the information for submission to a speech generation unit 136.

Further, the browser unit 132 generates a display output, corresponding to the mark-up language retrieved from the memory card 20, directly to a display driver 140, for driving the display 106. It will be understood that the display output and therefore the display 106 and the display driver 140 are optional features, which may be dispensed with, particularly in the case of legal restriction against the use of graphical displays in motor vehicles.

The tag mapping unit 134, as illustrated in further detail in Figure 3, includes a text tag recognition unit 160 which receives mark-up text in a stream from the browser unit 132. The text tag recognition unit 160 refers to a tag map 162, comprising a text tag list 164 and a speech tag list 166, to find a speech tag corresponding to a recognised text tag. Then, a speech tag insertion unit 168 inserts the appropriate speech tag in the information.

As illustrated further in Figure 4, the speech generation unit 136 includes an abridging unit 170 and a speech engine 190. The speech engine 190 can be an off-the-shelf product, such as products by Lernout & Hauspie, Lucent, IBM and others.

The abridging unit 170 is responsive to user input commands to modify the text, with speech tags, delivered to the speech generation unit 136, to deliver speech-tagged text into the speech engine 190 in accordance with a required level of detail. The operation of the abridging unit 170 will now be described with reference to Figures 5 and 8.

The abridging unit 170 has a hierarchy sorter 172, which receives speech tagged text and sorts the tagged text into levels 174a, b, c, d of detail. The levels of detail are determined by monitoring sub-headings in the web page. While this may be an approximation of the actual level of detail within the web page, it provides a useful way of distinguishing between general statements, which may be provided within the header or title and extremely high levels of detail which are likely to be placed several levels of sub-headings below the title level.

Accordingly, a title will be placed in level 1 (174a), an introductory statement will be held in level 2 (174b), a passage beneath a sub-heading will be placed in level 3 (174c), and level 4 (174d) will contain further statements contained in a passage beneath a sub-heading in the web page, perhaps such as text in smaller font or in italics.
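
A minimal sketch of this level assignment, assuming detail is inferred purely from HTML heading depth as just described, might read as follows. The regex tokenising is a simplification; a production implementation would use a real HTML parser.

    import re

    def sort_into_levels(html, max_level=4):
        # Heading text sits at its own depth; the passage after a closing
        # heading tag is treated as one level more detailed.
        levels = {n: [] for n in range(1, max_level + 1)}
        current = 1
        for token in re.split(r"(<[^>]+>)", html):
            m = re.match(r"</?h([1-6])>", token)
            if m:
                depth = min(int(m.group(1)), max_level)
                current = depth if token[1] != "/" else min(depth + 1, max_level)
            elif not token.startswith("<") and token.strip():
                levels[current].append(token.strip())
        return levels

    # e.g. "<h1>Title</h1>Intro<h2>Sub</h2>Detail" yields
    # level 1: Title; level 2: Intro, Sub; level 3: Detail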

The level of abridgement is requested indirectly, by a user requesting a particular output speed. Current requested speed is held in a current requested speed storage unit 178. This is compared with a predetermined maximum speed 176, by a speed processing unit 180, which then sets an abridgement level, held in an abridgement level storage unit 182, and a current speed for the speech engine, held in a current speed storage unit 184.

The text is released to the speech engine 190 in accordance with the level of the abridgement requested by the user.

The entertainment system 100 is to be used in conjunction with information presented in the form of a file structure of units of information as illustrated in Figures 6 and 7. Figure 6 illustrates the general arrangement of an information file structure, containing a plurality of interlinked files, each containing information held in HTML format. A category file 1 lists a series of categories, each of which is a hypertext link to a corresponding sub-category file 2A to 2D.

Each sub-category file 2A to 2D contains a series of sub-categories with item lists, with each item list containing a list of hyperlinks to item files 3.

A particular example of an information product, containing a file structure as shown in Figure 6, is illustrated in Figure 7. Reference numerals in Figure 7 correspond with reference numerals in Figure 6, qualified by a prime mark ('). Such a product as shown in Figure 7 could be provided on a memory stick 20 as previously described, either downloaded from a remote location by the user or purchased as a complete product. When output of the information is commenced, the contents of file 1' are read out. Each of the items in that file is a category heading, which is also a link to a second-level file 2A', 2B' etc. If the user selects the "news" category, either by pressing a predetermined button while the "news" category heading is being read out by the text-to-speech browser, or by speaking a predetermined command at the same time, then the text-to-speech browser will commence reading the contents of file 2A'. This file lists sub-categories and corresponding item lists, the items being news stories. If, in that case, the user requested that the "next item" be read out, the text-to-speech browser would pass to file 2B', to read the headings contained in the "sport" category.

Returning to the contents of file 2A', if the user then selects the reading of local news story 1, which is item 1 in list A1' in file 2A', then file 3A(1)' is read out.

The reading of the contents of that file is governed by the abridging unit 170, which analyses section headings and sub-section headings to establish which parts of the document should be read out. Once the reading of that news story is complete, the text-to-speech browser passes to file 3A(2)', which is the next file in the third level of the file hierarchy.

Operation in relation to the file structure illustrated in Figures 6 and 7 will be described with reference to Figure 8.

Operation commences on installation of information into the vehicle entertainment unit 100. In the illustrated example, operation commences automatically, but this could also be initialised by receipt of a user input.

Operation commences with step S8-2, where the highest level file is identified and the text-to-speech browser starts reading the contents thereof.

The user input monitor 110 then carries out a series of input monitoring steps, on both the control panel 102 and speech commands received from the microphone 104. If a valid user input is recognised, then an appropriate action is taken. Firstly, in step S8-4, a check is made as to whether a request has been received for reading to stop. If so, then in step S8-6, the text-to-speech browser 130 stops reading. Then, in step S8-8, the user input monitor monitors for a user input requesting the text-to-speech browser 130 to start reading again. This step continues to monitor until such a request is received, when the routine returns to step S8-2.

If a stop reading request is not received in step S8-4, then the user input monitor checks whether an item selection request has been received, in relation to an item link being read at that time. Thus, in the example shown in Figure 7, if the text-to-speech browser 130 is reading the words "NEWS" from unit 1', and such a request is received, then an item selection request is deemed to have been made while that item is being read. If so, then in step S8-12, the contents of the linked item are read out. In that case, file 2A' in Figure 7 would be read out, comprising lists of news items held in sub-categories. Following step S8-12, the user input monitor continues to monitor the inputs from S8-4 onwards.

If no item selection request is received while a link to an item is being read, then in step S8-14, the user input monitor checks whether a request has been received for the attention of the text-to-speech browser to move up one level in the file structure. This means that if file 2A' is currently being read out, the user input directs the browser to return to file 1', which is the level directly above file 2A'. If so, then in step S8-16, the contents of the item one level above the present item are read out, and the operation continues from step S8-4 onwards.

If no "up one level" request is received, or if such a request is received but no level above the present level exists, then in step S8-18, the user input monitor monitors for a "previous item" request. A previous item is one in the same level, such as in the level containing f iles 2A, 2B, 2C and 2D in Figure 6 or in the level containing files 3A(l), 3A(2), 3A(3), 3B(l), 3B(2), etc.

Thus, if the text-to-speech browser is currently reading item B2.1, which is held in file 3B(2), and the user input monitor 110 receives a "previous item" request, this is interpreted as a request that the text-to-speech browser 130 commences reading item B1.1 held in file 3B(1).

The lists of items held in the series of files 3A, 3B, 3C and 3D can be incorporated into a single list with four separate entry points. In that way, file 3B(1) has a corresponding previous file, which is the last file in the series of files 3A. Thus, if a "previous item" request is received when item B1.1 is being read out, the decision of step S8-18 would be made in the affirmative. If the decision of S8-18 is made in the affirmative, then that previous item is read out in step S8-20. Operation then continues from step S8-4 onwards. If not, then in step S8-22 the user input monitor monitors the input for a "next item" request. If such a request is received, and a next item in the same level exists, then in step S8-24 the contents of that next item are read out. The concept of a "next" item is interpreted in the same way as the concept of a "previous" item, all of the files in a given level being placed in a single list, such that all items and information are held in one sequence. After step S8-24, the operation continues from step S8-4 onwards.

If step S8-22 is answered in the negative, the operation proceeds to step S8-26, whereby a check is made as to whether the speech engine has completed reading the present item. If not, then the operation returns to step S8-4 to continue monitoring user inputs while the present item is being read. If the reading of the present item has been completed then, in step S8-28, a check is made as to whether a next item exists. If not, then the operation has ended, coming to the end of the list of items of information on that particular level. If a next item exists, then that next item is read out in step S8-24. Operation then continues from step S8-4 as before.
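
The control flow of Figure 8 could be realised along the following lines. The command names, and the reader and file-node interfaces, are assumptions made for illustration only; get_command is taken to return None when no input is pending.

    def browse(root, reader, get_command):
        current = root                  # highest-level file (step S8-2)
        reader.start(current)
        while True:
            cmd = get_command()         # from control panel or microphone
            if cmd == "stop":                                 # S8-4, S8-6
                reader.stop()
                while get_command() != "start":               # S8-8
                    pass
                reader.start(current)
            elif cmd == "select":                             # S8-10, S8-12
                linked = reader.link_being_read()             # link being read now
                if linked is not None:
                    current = linked
                    reader.start(current)
            elif cmd == "up" and current.parent is not None:  # S8-14, S8-16
                current = current.parent
                reader.start(current)
            elif cmd == "previous" and current.prev is not None:  # S8-18, S8-20
                current = current.prev      # single same-level list
                reader.start(current)
            elif cmd == "next" or (cmd is None and reader.finished()):
                if current.next is None:                      # S8-26, S8-28
                    break                   # end of items on this level
                current = current.next                        # S8-22, S8-24
                reader.start(current)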

The above method of operation of the text-to-speech browser is merely one of many different ways in which user input actions can influence the operation of the browser. For instance, additional functionality could be provided, so that the user does not need to progress to the lowest level of information in order to retrieve all news stories. A single spoken command, such as "read all stories", could be provided for a user, receipt of which by the user input monitor could cause the text-to-speech browser to commence at file 3A(1), and to continue reading stories until the last of the files in that level is completed. Further, spoken input commands of higher complexity could be provided, to allow a user to use more natural language input. For example, it is conceivable that a user might request all stories relating to a particular topic, and then the text-to-speech browser searches for stories in that category and then reads them out in sequence.

The sequence of steps exemplifying operation of the text-to-speech browser 130 is set out in Figure 9. In step S9-2, the browser unit 132 monitors for a browsing command input by the user. If a browsing command is input, then in step S9-4, the browser unit 132 retrieves information from the memory card 20, on the basis of the browsing command received from the user. Instructions can be received as selections of buttons on the control panel 102, or as spoken instructions received in the microphone 104. The user input monitor 110 includes a facility for translating instructions received by either means into messages for the browser unit 132 or for configuration of the speech generation unit 136. The user input monitor may be given the facility of learning the characteristics of the voices of habitual users of the entertainment unit 100, enabling more accurate recognition of spoken commands and conversion thereof into messages.

The browser unit 132 passes a stream of mark-up language data to the tag mapping unit 134 which, in step S9-6, intercepts mark-up tags in the mark-up language data output from the browser unit 132, and converts those mark-up tags into tags for driving a speech generation unit 136. The speech generation unit 136 receives the converted mark-up language from the tag mapping unit 134 and, in step S9-8, generates a speech signal, which is passed to an amplifier 150 in step S9-10 for driving the loudspeaker 108. Characteristics of the speech signal, such as voice quality and amplitude, are controllable by means of speech tags included in the input to the speech generation unit 136.
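
Taken together, the S9 sequence amounts to a short pipeline. The following sketch assumes illustrative interfaces for the card reader and speech engine; none of the names is taken from the embodiment itself.

    def handle_browse_command(command, card_reader, map_tags, speech_engine):
        markup = card_reader.retrieve(command)     # S9-4: fetch mark-up from card
        tagged = map_tags(markup)                  # S9-6: display tags -> speech tags
        signal = speech_engine.synthesise(tagged)  # S9-8: generate speech signal
        return signal                              # S9-10: passed on to the amplifier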

Operation of the tag mapping unit 134 is illustrated in Figure 10. In step S10-2, the tag mapping unit 134 waits to receive a message from the browser unit 132 to commence reading received text in step S10-4. In step S10-6, the text tag recognition unit 160 monitors the stream of mark-up text until it identifies a text tag.

Text tags are used in mark-up language as instructions to a browser to generate a display. When a text tag is identified, in step S10-8 the tag is passed to the tag map 162 of the tag mapping unit 134. The text tag list 164 lists the standard set of tags available for use in the mark-up language. For example, a web page written in HTML will probably start with a heading, bounded by <h1> tags. As illustrated, the <h1> tag in the text tag list 164 maps to a corresponding entry in the speech tag list 166. The speech tag list 166 lists a set of available speech tags, which act as instructions to the speech generation unit 136. In this example, the <h1> tag maps to a corresponding speech tag, to which the speech generation unit 136 is responsive to alter its output settings such that text bounded by the <h1> tags in the original input is output by the speech generation unit 136 in a neutral, announcement-style voice.

Once a text tag has been recognised by the text tag recognition unit 160, and has been delivered to the tag map 162, the tag map 162 returns the appropriate entry in the speech tag list 166 to the speech tag insertion unit 168. In step S10-10, the speech tag insertion unit 168 replaces the text tag recognised by the text tag recognition unit 160 with the corresponding speech tag.

The output from the speech tag insertion unit is the mark-up language text, wherein the various text tags are converted into speech tags appropriate for driving the speech generation unit 136.

Thereafter, in step S10-14, the text tag recognition unit 160 continues to read the input text. If in step S10-6 no tag is detected, a check is made in step S10-12 as to whether the end of the input text has been reached. If it has, then the text tag recognition unit 160 ceases operation. Otherwise, the unit 160 continues to read the text in step S10-14.
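
One possible rendering of the S10 loop, reusing the kind of illustrative tag map sketched earlier (the speech tag names remain assumptions):

    import re

    TAG_MAP = {"<h1>": "<voice style='announcement'>", "</h1>": "</voice>"}

    def map_tags(markup):
        out = []
        for token in re.split(r"(<[^>]+>)", markup):
            if token.startswith("<"):               # S10-6: text tag identified
                out.append(TAG_MAP.get(token, ""))  # S10-8, S10-10: substitute
            else:
                out.append(token)                   # plain text passes through
        return "".join(out)

    # map_tags("<h1>News</h1>") -> "<voice style='announcement'>News</voice>"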

Operation of the abridger will now be described in conjunction with Figure 11. In step S11-2, the abridger monitors for a change of speed request from the user input monitor. When a change of speed request is received, in step S11-4, it is determined whether the speed is to be increased or decreased. If the change of speed request is for the speed to be increased, then the current requested speed held in the current requested speed unit 178 is increased, in step S11-6.

In step S11-8, the current requested speed is compared with a value held in the maximum speed unit 176. The value held in the maximum speed unit 176 is preconfigured, and is chosen to correspond with the maximum speed of spoken output at the speech engine 190 while maintaining comprehensibility. The speed processing unit 180 makes a comparison between the current requested speed 178 and the maximum speed 176.

If the current requested speed exceeds the maximum speed 176 then in step S11-10 the level of the abridgement is increased. This abridgement level is stored in the abridgement level unit 182.

The abridgement level unit 182 controls the hierarchy sorter 172 such that the higher the level of abridgement stored in the abridgement level unit 182, the more levels 174 that are discarded by the hierarchy sorter 172.

Thus, if the value held in the abridgement level unit 182 is zero, all levels are delivered to the speech engine, whereas if the value held in the abridgement level unit 182 is 3, then three levels will be abridged and only level 1 (174a) illustrated in Figure 5 will be delivered to the speech engine 190.

If, in step S11-8, it is found that the current requested speed is not greater than the maximum speed, then in step S11-12 the speed processing unit 180 increases the current speed, and this value is stored in the current speed unit 184. After step S11-12, the procedure returns to monitoring for speed change requests in step S11-2.

If, in step S11-4, the change of speed request is found to be for the speed to be decreased then, in step S11-14, the abridgement unit 170 decreases the currently requested speed held in the current requested speed unit 178. A comparison is then made by the speed processing unit 180 between the current requested speed and the maximum speed value 176, in step S11-16. If the current requested speed is less than the maximum speed then, in step S11-18, the current speed, held in the current speed unit 184, is decreased. This causes the speech engine to decrease its speed of delivery of spoken output.

Otherwise, if the current requested speed is not less than the maximum speed, then the level of abridgement, as represented by the value held in the abridgement level unit 182, is decreased in step S11-20. This causes more information to be delivered by the hierarchy sorter 172 to the speech engine 190.

In that way, the abridgement unit 170 can deliver more or less information to the speech engine 190 as required, if the requested speed of delivery of speech exceeds the maximum possible speed of delivery of speech to the user.
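
A sketch of the S11 speed handling follows. The numeric speed scale, step sizes and starting values are assumptions made purely for illustration.

    MAX_SPEED = 10   # unit 176: fastest comprehensible delivery

    class Abridger:
        def __init__(self):
            self.requested = 5     # current requested speed (unit 178)
            self.current = 5       # speed actually set at the engine (unit 184)
            self.abridgement = 0   # levels discarded by the sorter (unit 182)

        def change_speed(self, increase):
            if increase:                               # S11-4 -> S11-6
                self.requested += 1
                if self.requested > MAX_SPEED:         # S11-8
                    self.abridgement += 1              # S11-10: drop more detail
                else:
                    self.current = self.requested      # S11-12
            else:                                      # S11-14
                self.requested -= 1
                if self.requested < MAX_SPEED:         # S11-16
                    self.current = self.requested      # S11-18: slow the engine
                else:
                    self.abridgement = max(0, self.abridgement - 1)  # S11-20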

The text-to-speech browser 130 can be implemented in several different ways. In practice, the text-to-speech browser 130 can be part of original equipment, supplied to an original equipment manufacturer (OEM) as a computer program on a disk or for transmitting on a carrier signal.

The OEM will then incorporate the text-to-speech browser into the entertainment system 100. Incorporation may comprise building the software into a larger software program. This could be made most convenient if the source code (high-level programming language instructions) is supplied to the OEM.

Alternatively, the text-to-speech browser may be constructed by a home user, using a Java Applet or plug-in, either retrieved from a web page via the Internet, or received as a computer program product on a storage medium such as a magnetic or optical disk.

The text-to-speech browser 130 need not be provided as a single component. In fact, as described above, the text-to-speech browser 130 may incorporate standard sub-components corresponding to the browser unit 132 and the speech generation unit 136. The browser unit can be a browser as will be found on various original equipment, as can the speech generation unit. The tag mapping unit 134 can be downloaded or installed into the original equipment during manufacture or after supply, to consolidate the browser unit and the speech generation unit into the text-to-speech browser 130.

Improved versions of the tag mapping unit 134 can be released and supplied to users after purchase of original equipment, to enable extended application of the principle of mapping between mark-up instructions for text and mark-up instructions for speech.

Figure 13 illustrates a second specific embodiment of the invention, making use of a one way communications channel to transmit information from an information server 250 to a user.

The information server 250 stores a plurality of information files 236, containing information for transmission to the user. The information files 236 are passed to a transmitter 240, with a radio antenna 242, which continuously transmits each of the information files 236 on a radio signal.

An in-car entertainment unit 200 is illustrated in Figure 13, those parts thereof having substantially common function with the corresponding parts in the example illustrated in Figure 1 being assigned identical reference numerals. The entertainment unit 200 comprises a radio receiver unit 220 with a receiving antenna 218 extending therefrom, for receiving a radio signal from the transmitting antenna 242 of the transmitter 240. The radio receiver unit 220 receives the information files 236 as transmitted between the antennas 242, 218, and stores the information files 236 in a memory buffer 222.

It will be appreciated that not every information file will be received by the radio receiver unit 220 simultaneously; however, in this example, all information is continuously transmitted by the transmitter 240.

A text-to-speech browser 230, of substantially the same construction as the text-to-speech browser illustrated in Figure 1, browses information stored in the memory buffer 222, and sends browsing instructions to the radio receiver 220, for possible change of receiving frequency.

This will allow the text-to-speech browser 230 to be responsive to user instructions to receive a further information file 236, for that information file to be received in the radio receiver unit 220 and to be stored in the memory buffer 222 for browsing by the text-to-speech browser 230. Consequently, spoken output of a particular information file can be selected by a user.

From Figure 14, it will be appreciated that several of the functional units of the entertainment unit 200 are identical to the functional units of the entertainment unit 100 illustrated in Figure 1. Even within the text-to-speech browser 230, the tag mapping unit 134 and the speech generation unit 136 are as described in relation to the first embodiment. Further, the text-to-speech browser 230 comprises a radio scanning browser 232. This unit refers to information held in the memory buffer 222, the contents of which are controlled by means of browsing instructions sent from the radio scanning browser 232 to the radio receiver unit 220.

The radio scanning browser 232 is illustrated in further detail in Figure 15. The radio scanning browser 232 comprises an extension to the browser 132 described above in relation to the first embodiment. In addition to the browser unit 132, which receives information from the memory buffer and sends information to the tag mapping unit 134, browsing instructions received from the user input monitor 110 are interpreted by the browser and sent towards the radio receiver unit 220. Interposed between the browser 132 and the radio receiver unit 220 is an information change request unit 233, which interprets the browsing instructions sent by the browser 132 into a request for the radio receiver unit 220 to receive a different unit of information.
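
A sketch of how such an information change request unit might behave, with an assumed receiver interface and file identifiers:

    class InformationChangeRequestUnit:
        """Turns browsing instructions into retuning requests (unit 233)."""

        def __init__(self, receiver, memory_buffer):
            self.receiver = receiver
            self.buffer = memory_buffer

        def handle(self, wanted_file_id):
            if wanted_file_id in self.buffer:
                return self.buffer[wanted_file_id]  # already received and stored
            # Ask the receiver to change frequency to the channel carrying
            # the wanted file; it will appear in the buffer when rebroadcast.
            self.receiver.request(wanted_file_id)
            return None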

A third specific embodiment will now be described with reference to Figure 16. In Figure 16, an entertainment unit 300 is provided, with a telephone control unit 320 and a text-to-speech browser 130. The text-to-speech browser 130 is of identical functionality to that of the entertainment unit 100 illustrated in Figure 1.

In this case, the telephone control unit 320 allows connection of the entertainment unit 300 to a mobile telephone unit 322. Mobile telephone unit 322 can be used to establish a two way telephony connection with a base station 46, connected into a telephone network 44.

Via the telephone network 44, the user of a mobile telephone 322 can access a local server 42 for access to the Internet 40, by means of which the previously described web server 30 can be accessed for retrieval of a copy of the information held in the web page 32.

Browsing instructions are sent by the text-to-speech browser 130 to the telephone control unit 320, by means of which information is retrieved from the web page 32 into the text-to-speech browser 130 for generation of a speech output at the loudspeaker 108.

Whereas the function of abridgement has been described above interlinked with user selection of a comfortable listening speed, it would be possible to separate these two functions, allowing a user to select a level of abridgement without also maximising the speed of delivery of speech output.

The present invention has been illustrated by way of example in relation to an in-car entertainment system. However, it will be appreciated that various different applications of the present invention are also envisaged. For example, the text-to-speech browser can be incorporated on a personal computer, such as a desktop PC or a portable PC, to supplement existing functionality thereof. This can allow a user to browse web pages on the Internet without having view of the VDU of a PC; thus a visually impaired user can browse for information over the Internet.

Increasingly, hand-held computers and other personal digital apparatus (PDA) are becoming available, with the capability to browse the Internet, either through connection to a land-based telephone network or through a wireless protocol such as the Wireless Application Protocol (WAP). WAP can also be incorporated into a mobile telephone, for the browsing of pages from websites accessible via the Internet; the text-to-speech browser illustrated by way of example above can also be applied to this device to allow a user to listen to a spoken output corresponding to text on a web page, and to control browsing by voice commands.

Further, the present invention can be applied to a set-top box for use with a television, for browsing the Internet. This set-top box incorporates a browser, to which the present invention can be applied. Further, digital televisions are available with browsers incorporated therein; the present invention can also be applied to such a device.

The Wireless Application Protocol can be applied to devices limited only by miniaturisation technology.

Wrist watches, electronic personal organisers and mobile telephones are capable of incorporating the present invention.

Further, while the present invention has been illustrated by way of three examples for the retrieval of information for browsing, comprising downloading from the Internet on to a portable memory storage device, instantly downloading from an information source over a one way communications channel (radio communication), and instantly downloading from a website or other information source accessible via the Internet over a two way wireless communication channel, other manners of retrieving information for browsing are envisaged.

For example, portable storage devices, storing files of text based information, could be offered for sale in retail outlets for consumers to purchase and use in their motor vehicles. This could substitute for purchase of a newspaper, allowing a user to listen to news articles while driving. This storage medium could be an optical storage disk or a magnetic storage disk. The storage medium could be read-only or recordable; in the latter case, the user could pay a fee into a machine in a retail outlet, which then loads the current day's news on to the storage medium, replacing the previous day's information.

In the case of instant one way communication, data can be broadcast over TV or radio channels, via Teletext or via other broadcasting means such as satellites.

In the case of instant two way communication, satellite networks are again appropriate, such as Iridium and INMARSAT. Further, cellular telephone networks, which are wireless but also incorporate a network of land stations, are useful, such as that illustrated in Figure 14, and broadband networks such as cable networks can be used to increase data transmission rates. Infra-red transmission, such as defined by IrDA standards, can also be utilised to establish two way communication with a base station.

Further, various wireless home protocols are currently in development to connect appliances together. This can provide for the transfer of information to a user from various appliances and from remote information sources. Thus, a refrigerator with computing capability, to process information relating to food stored therein, can retrieve shopping and/or recipe information from remote locations, perhaps via the Internet, and by means of the present invention can cause generation of a spoken output of information potentially useful to the listener. This would allow a user to configure the refrigerator to cause generation of a speech output of a recipe, and to play back the recipe as quickly or as slowly as the user required. This type of application is capable of being implemented by means of Bluetooth technology.

Claims (47)

CLAIMS:
1. Information retrieval and output apparatus comprising:
information receiving means for receiving character information defining a plurality of units of information and links therebetween; speech generation means for generating an audio speech output signal corresponding with information received by the receiving means; and selecting means operable by a user, to navigate in accordance with links between the units of information, to select units of information for output by the speech generation means.
2. Apparatus according to claim 1, wherein said information receiving means is operable to receive character information in a form suitable for conversion into a display signal for output, and wherein said apparatus comprises information conversion means for converting said character information into a form suitable for conversion by said speech generation means into said audio speech output signal.
3. Apparatus according to claim 1 or claim 2, wherein said selecting means comprises user input monitoring means for generating control signals in response to user input actions; wherein:
the apparatus is arranged so that:
in response to a first control signal, information from a first unit of information is output by the speech generation means; in response to a second control signal, information from a second unit of information defined by a link from the first unit is output by the speech generation means; and in response to a third control signal, information from said first unit is output by the speech generation means.
4. Apparatus according to claim 3 wherein the apparatus is arranged so that in response to a fourth control signal, the speech generation means is operable to cease output of information.
5. Apparatus according to any of claims 1 to 4 wherein the units of information are defined as parents and children.
6. Apparatus according to claim 5 wherein the second unit is a child unit of the first unit.
7. Apparatus according to any preceding claim comprising storage means for storing information for output to a user.
8. Apparatus according to any preceding claim wherein the apparatus is arranged so that in response to said speech generation means completing output of a unit of information, said speech generation means is operable to commence output of a further unit of information.
9. Apparatus according to claim 1 wherein said information receiving means is operable to receive information in a hierarchical structure with a unit of information on a given hierarchical level comprising details of units of information in a hierarchical level below said hierarchical level, said selecting means being operable to select a unit of information for output by the speech generation means, said selecting means comprising user input monitoring means for generating control signals in response to user input actions; wherein said selecting means is arranged so that:
in response to a first control signal, information from a unit of information in a first hierarchical level is selected for output; in response to a second control signal, information from a unit of information in a second hierarchical level, below said first level, is selected for output; in response to a third control signal, information from a further unit of information in said second hierarchical level is selected for output; and in response to a fourth control signal, information from said unit of information in said first hierarchical level is selected for output.
10. Apparatus according to claim 9 wherein said speech generation means is operable in response to a fifth control signal to output a unit of information selected by said selecting means, and is operable in response to a sixth control signal to suspend output of said unit of information.
11. Apparatus according to claim 9 or claim 10 wherein said speech generation means is operable in response to a seventh control signal to increase speed of speech output, and in response to an eighth control signal to reduce speed of speech output.
12. Apparatus in accordance with claim 11 wherein said speech generation means is operable to selectively output information, the level of selection being set in response to instances of said seventh and eighth control signals.
13. Information retrieval and output apparatus comprising:
information receiving means for receiving information in a form intended for conversion into a display signal for output; display configuration command detection means for detecting, in information received by said information receiving means, a display configuration command intended for configuration of a display signal for output; audio output configuration command generation means for generating an audio output configuration command corresponding with a detected display configuration command; and speech generation means for generating an audio speech output signal corresponding with said received information, said speech generation means being responsive to an audio output configuration command generated by said audio output configuration command generation means to modulate said audio speech output in a corresponding manner.
14. Apparatus in accordance with claim 13 wherein said display configuration command detection means is operable to detect a display configuration command being a member of a set of permitted display configuration commands.
15. Apparatus in accordance with claim 14 wherein said audio output configuration command generation means comprises configuration command mapping means for mapping each member of said set of permitted display configuration commands to a corresponding audio output configuration command, for configuration of said speech generation means for modulation of said audio speech output.
16. Apparatus in accordance with any of claims 13 to 15 further comprising speech generation speed control means operable to receive speed control user input signals, for controlling output speed of said speech generation means.
17. Apparatus in accordance with claim 16 wherein said speech generation speed control means comprises information discrimination means defining a plurality of levels of detail of information and for discriminating information received in said apparatus by said levels of detail, for selected output of levels of detail of information by said speech generation means.
18. Apparatus in accordance with claim 17 wherein said information discrimination means is operable to reduce a number of levels of detail of information delivered to the speech generation means in response to receipt of a speed control user input signal corresponding to a request to increase speed of output from said speech generation means.
19. Apparatus in accordance with claim 17 or claim 18 wherein said information discrimination means is operable to increase a number of levels of detail of information delivered to the speech generation means in response to receipt of a speed control user input signal corresponding to a request to reduce speed of output from said speech generation means.
20. Apparatus in accordance with any of claims 13 to 19 wherein said information receiving means is operable to receive information comprising data and configuration commands interspersed in said data.
21. Apparatus in accordance with claim 20 wherein said display configuration command detection means is operable to detect a configuration command interspersed in data received by said information receiving means.
22. Apparatus in accordance with any preceding claim wherein said information receiving means is operable to receive information in a mark-up language format.
23. Apparatus in accordance with claim 22 wherein said information receiving means is operable to receive information in hypertext mark-up language format.
24. Apparatus in accordance with claim 22 wherein said information receiving means is operable to receive information in an extended mark-up language format.
25. A method of retrieving and outputting information comprising:
receiving information for output to a user; generating an audio speech output signal corresponding with received information; and selecting information for output in units in accordance with links between the units of information, said selecting step comprising monitoring for user input actions and generating control signals in response to user input actions; wherein the method further comprises the steps of:
outputting a first unit of information in response to a first control signal; outputting a second unit of information, defined by a link from the first unit, in response to a second control signal; and outputting said first unit in response to a third control signal.
26. A method according to claim 25 wherein the units of information are defined as parents and children.
27. A method according to claim 26 wherein the second unit is a child unit of the first unit.
28. A method according to any of claims 25 to 27 further comprising the step of storing information received in said receiving step.
29. A method according to any of claims 25 to 28 further comprising the step of ceasing output of information in response to a fourth control signal.
30. A method according to any of claims 25 to 29 further comprising the step of commencing output of a further unit of information in response to completing output of a unit of information.
31. A method of retrieving and outputting information comprising:
receiving information for output to a user, said information being in a hierarchical structure with a unit of information in a given hierarchical level comprising details of units of information in a hierarchical level below said hierarchical level; generating an audio speech output signal corresponding with said received information; selecting a unit of information for output, said selecting step comprising monitoring user input actions and generating a control signal in response to a user input action; wherein said selecting step comprises:
selecting for output a unit of information in a first hierarchical level in response to a first control signal; selecting for output a unit of information in a second hierarchical level, below said first hierarchical level, in response to a second control signal; selecting for output a further unit of information in said second hierarchical level in response to a third control signal; and selecting for output said unit of information in said first hierarchical level in response to a fourth control signal.
32. A method according to claim 31 wherein said generating step comprises, in response to a fifth control signal, outputting a unit of information selected by said selecting means and, in response to a sixth control signal, suspending output of said unit of information.
33. A method according to claim 31 or claim 32 wherein said generating step comprises, in response to a seventh control signal, increasing speed of speech output and, in response to an eighth control signal, reducing speed of speech output.
34. A method according to claim 33 wherein said generating step comprises selectively outputting information, at a level of selection set in response to instances of said seventh and eighth control signals.
35. A method of retrieving and outputting information comprising:
receiving information in a form intended for conversion into a display signal for output; detecting, in received information, a display configuration command intended for configuration of a display signal for output; generating an audio output configuration command corresponding with a detected display configuration command; and generating an audio speech output signal corresponding with said received information, modulating said audio speech output in accordance with the generated audio output configuration command.
36. Method in accordance with claim 35 further comprising the step of providing a predetermined set of permitted display configuration commands and wherein said detecting step comprises detecting a display configuration command being a member of said set of permitted display configuration commands.
37. Method in accordance with claim 36 further comprising the step of mapping each permitted display configuration command to a corresponding audio output configuration command and said generating step comprises generating an audio output configuration command being mapped to the display configuration command detected in said detecting step.
38. Method in accordance with any of claims 35 to 37 further comprising, in response to receiving a speech speed control user input signal, controlling output speed in said speech generating step.
39. Method in accordance with any of claims 35 to 38 wherein said receiving step comprises receiving information comprising data and configuration commands interspersed in said data.
40. Method in accordance with claim 39 wherein said detecting step comprises detecting a configuration command interspersed in said data received in said receiving step.
41. Method in accordance with any of claims 25 to 40 wherein said step of receiving comprises receiving information in a mark-up language format.
42. Method in accordance with claim 41 wherein said receiving step comprises receiving information in hypertext mark-up language format.
43. Method in accordance with claim 42 wherein said receiving step comprises receiving information in an extended mark-up language format.
44. A computer program comprising processor executable instructions operable to cause a computer to become configured as apparatus in accordance with at least one of claims 1 to 24.
45. A computer program comprising processor executable instructions for causing a computer to become operable to perform the method of at least one of claims 25 to 43.
46. A storage medium storing a computer program in accordance with claim 44 or claim 45.
47. A signal carrying computer readable instructions forming a computer program in accordance with claim 44 or claim 45.
GB0009767A 2000-04-19 2000-04-19 Text-to-speech browser Withdrawn GB2361556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0009767A GB2361556A (en) 2000-04-19 2000-04-19 Text-to-speech browser

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
GB0009767A GB2361556A (en) 2000-04-19 2000-04-19 Text-to-speech browser
PCT/GB2001/001788 WO2001079986A2 (en) 2000-04-19 2001-04-19 Electronic browser
AU93361/01A AU9336101A (en) 2000-04-19 2001-04-19 Electronic browser

Publications (2)

Publication Number Publication Date
GB0009767D0 GB0009767D0 (en) 2000-06-07
GB2361556A (en) 2001-10-24

Family

Family ID: 9890283

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0009767A Withdrawn GB2361556A (en) 2000-04-19 2000-04-19 Text-to-speech browser

Country Status (3)

Country Link
AU (1) AU9336101A (en)
GB (1) GB2361556A (en)
WO (1) WO2001079986A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2378875A (en) * 2001-05-04 2003-02-19 Andrew James Marsh Annunciator for converting text messages to speech

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
KR100629434B1 (en) * 2004-04-24 2006-09-27 한국전자통신연구원 Apparatus and Method for processing multimodal web-based data broadcasting, and System and Method for receiving multimodal web-based data broadcasting
WO2009083832A1 (en) * 2007-12-21 2009-07-09 Koninklijke Philips Electronics N.V. Device and method for converting multimedia content using a text-to-speech engine
US9405847B2 (en) * 2008-06-06 2016-08-02 Apple Inc. Contextual grouping of a page
KR101617461B1 (en) * 2009-11-17 2016-05-02 엘지전자 주식회사 Method for outputting tts voice data in mobile terminal and mobile terminal thereof

Citations (3)

Publication number Priority date Publication date Assignee Title
EP0847179A2 (en) * 1996-12-04 1998-06-10 AT&T Corp. System and method for voiced interface with hyperlinked information
US5899975A (en) * 1997-04-03 1999-05-04 Sun Microsystems, Inc. Style sheets for speech-based presentation of web pages
US5983184A (en) * 1996-07-29 1999-11-09 International Business Machines Corporation Hyper text control through voice synthesis

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP3340585B2 (en) * 1995-04-20 2002-11-05 富士通株式会社 Voice response unit
US6018710A (en) * 1996-12-13 2000-01-25 Siemens Corporate Research, Inc. Web-based interactive radio environment: WIRE

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US5983184A (en) * 1996-07-29 1999-11-09 International Business Machines Corporation Hyper text control through voice synthesis
EP0847179A2 (en) * 1996-12-04 1998-06-10 AT&T Corp. System and method for voiced interface with hyperlinked information
US5899975A (en) * 1997-04-03 1999-05-04 Sun Microsystems, Inc. Style sheets for speech-based presentation of web pages

Non-Patent Citations (4)

Title
"Bell Labs: PhoneBrowser TTS...", at www.bell-labs.com/news/2000/march/6/1.html, 31 Mar 2000. *
"Building...Model...", at www.brookes.ac.uk/schools/cms/research/speech/publications//40ergsoc.htm. *
"Home Page Reader Overview", at http://www-3.ibm.com/able/hpr.html, 29 Mar 2000. *
"Voices from the Web", at www.siemens.de/FuI/en/zeitschrift/archiv/Heft1_99/artikel8.html, 3 Apr 00. *

Also Published As

Publication number Publication date
WO2001079986A2 (en) 2001-10-25
GB0009767D0 (en) 2000-06-07
WO2001079986A3 (en) 2003-04-24
AU9336101A (en) 2001-10-30

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)