WO2001079986A2 - Electronic browser - Google Patents

Electronic browser

Info

Publication number
WO2001079986A2
WO2001079986A2 PCT/GB2001/001788
Authority
WO
WIPO (PCT)
Prior art keywords
information
operable
unit
accordance
speech
Prior art date
Application number
PCT/GB2001/001788
Other languages
French (fr)
Other versions
WO2001079986A3 (en)
Inventor
Trevor Douglas Shonfeld
David John Bending
Original Assignee
Roundpoint Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Roundpoint Inc. filed Critical Roundpoint Inc.
Priority to AU93361/01A priority Critical patent/AU9336101A/en
Publication of WO2001079986A2 publication Critical patent/WO2001079986A2/en
Publication of WO2001079986A3 publication Critical patent/WO2001079986A3/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML

Definitions

  • This invention relates to a browser for receiving data in the form of mark-up language, and converting the data into a form presentable to a user.
  • The invention is particularly applicable to implementation in a motor vehicle, or for use by a visually impaired person.
  • Mark-up languages, and particularly hypertext mark-up language (HTML), are used extensively for describing information stored and accessible via the Internet. Data defined in a mark-up language can be retrieved by a browser, such as Microsoft Internet Explorer, for conversion into a graphical display on a personal computer. Mark-up languages are so called because they comprise sets of mark-up tags. Tags are configurational characters embedded in data, which a browser intercepts and uses to configure the presentation of the data in the intended manner. HTML is popular because it is substantially platform independent, and defines a set of tags which provide a useful range of data presentation options.
  • An aspect of the invention provides a browser capable of generating an audio output, and of receiving user inputs to cause navigation through a plurality of units of information, up and down and across levels of a hierarchical structure.
  • The user input commands may be spoken or caused by activation of a manual input.
  • Another aspect of the invention provides a browser capable of receiving a mark-up text input, and of analysing the mark-up text to provide a spoken output.
  • The audio output may be modulated by tags in the mark-up text input, and the browser may include means for monitoring tags, and making modulating changes to the corresponding audio output in accordance with selected tags.
  • The browser may include means for mapping between tags intended for modifying textual output, such as tags from the basic HTML set, and tags for modulating a speech generating means of the browser.
  • Common speech engines also operate on the basis of text-based input with tags inserted therein, to control the nature of the speech output. These speech tags have no natural correspondence with tags for configuring displayed information, such as HTML tags.
  • The browser of the invention includes means for mapping between display tags and speech tags.
  • The mapping means may be preconfigured, so that the user does not have to make any modifications to the browser in order to produce a suitable speech output, with the speech output modulated in line with changes to display characteristics inserted through the output data.
  • The mapping means may be a look-up table.
  • Another aspect of the invention provides a plug-in capable of execution alongside a browser and a speech engine, for intercepting mark-up output from the browser and converting mark-up tags in said output into tags for configuration of the speech engine, prior to delivery of the converted output into the speech engine.
  • Another aspect of the invention provides apparatus for browsing information received from a remote location, the apparatus including means for automatically converting text configuration signals in said information into speech engine configuration signals, for configuring a speech engine to generate a corresponding speech output.
  • Preferably, the apparatus comprises a controller switchable to receive an audio input, to recognise spoken commands in said input, and to control operation of said apparatus to deliver a speech output.
  • Figure 1 is a schematic diagram of a system incorporating a vehicle entertainment unit of a first embodiment of the present invention;
  • Figure 2 is a schematic diagram of the entertainment unit illustrated in Figure 1;
  • Figure 3 is a schematic diagram of a tag mapping unit of an audio browser of the entertainment unit illustrated in Figure 2;
  • Figure 4 is a schematic diagram of a speech generation unit of the entertainment unit illustrated in Figure 2;
  • Figure 5 is a schematic diagram of an abridging unit of the speech generation unit illustrated in Figure 4;
  • Figure 6 is a schematic diagram of a fundamental file structure for use with an information product in the first specific embodiment of the invention;
  • Figure 7 is a schematic diagram of an information product having the file structure illustrated in Figure 6;
  • Figure 8 is a flow diagram showing response of the specific embodiment to user actions;
  • Figure 9 is a flow diagram illustrating the mode of operation of a browser unit of the entertainment system illustrated in Figure 2;
  • Figure 10 is a flow diagram illustrating the mode of operation of the tag mapping unit illustrated in Figure 3;
  • Figure 11 is a flow diagram illustrating the mode of operation of a speed processing unit of the abridging unit illustrated in Figure 5;
  • Figure 12 is a flow diagram illustrating the mode of operation of a hierarchy sorter of the abridging unit illustrated in Figure 5;
  • Figure 13 is a schematic diagram of a system incorporating a vehicle entertainment unit of a second specific embodiment of the invention;
  • Figure 14 is a schematic diagram of the vehicle entertainment unit illustrated in Figure 13;
  • Figure 15 is a schematic diagram of a radio scanning browser unit of the vehicle entertainment unit illustrated in Figure 14; and
  • Figure 16 is a schematic diagram of a system incorporating a vehicle entertainment unit of a third specific embodiment of the invention.
  • A system 10 illustrated in Figure 1 comprises a vehicle entertainment unit 100, for installation in a motor vehicle, preferably in the dashboard thereof.
  • The entertainment unit 100 is operable to receive operator inputs by a control panel 102 and/or a microphone 104, and is operable to produce graphical and/or textual outputs on a display 106 and audio output via a loudspeaker 108.
  • The entertainment unit 100 comprises a memory card reader 120, for receiving a memory card 20 storing information in mark-up language form.
  • The memory card reader 120 reads the information from the memory card 20, and delivers this information to a text-to-speech browser 130, which converts the information into signals to drive the display 106 and the loudspeaker 108.
  • The information stored on the memory card 20 is retrieved from a web server 30 storing a web page 32.
  • The web server 30 is accessible via the Internet 40 through a local server 42 and telephone network 44, by means of a personal computer 50 having a browser 52 and a corresponding memory card reader/writer 54 into which the memory card 20 can be installed.
  • The personal computer 50 accesses the web server 30 by means of a modem 56, and retrieves information from the web page 32, to be stored on the memory card 20.
  • A suitable format for the memory card 20 is the Memory Stick (trade mark) by Sony Electronics Limited. Memory Stick storage media are presently available in different capacities up to 64 MB.
  • The storage of the information can be by means of the standard "save" instruction provided on the desktop of the PC 50. This causes the creation of a standard HTML file on the memory card. Several files can be retrieved and saved at the same time.
  • A user can retrieve a copy of the web page 32, which, for example, is a news-based web page, onto the personal computer 50. Thereafter, the copy of the web page 32 can be stored on the memory card 20. Then the memory card 20 can be transferred to the vehicle entertainment unit 100. In response to browsing commands received from the user, the vehicle entertainment unit 100 will then convert information held in one or more files on the memory card 20 in mark-up language form, into a speech output signal to drive the loudspeaker 108.
  • The control panel 102 and the microphone 104 are receptive of user commands, for example, to speed up or slow down delivery of the information as speech output, to increase/decrease the volume of the speech output, and browsing commands to select a different set of information to be presented to the user.
  • The text-to-speech browser 130 comprises a browser unit 132, which refers to the information on the memory card 20 via the memory card reader 120.
  • The browser unit 132 passes information to a tag mapping unit 134, which prepares the information for submission to a speech generation unit 136.
  • The browser unit 132 also generates a display output, corresponding to the mark-up language retrieved from the memory card 20, directly to a display driver 140, for driving the display 106.
  • The display output, and therefore the display 106 and the display driver 140, are optional features, which may be dispensed with, particularly in the case of legal restriction against the use of graphical displays in motor vehicles.
  • The tag mapping unit 134 includes a text tag recognition unit 160, which receives mark-up text in a stream from the browser unit 132.
  • The text tag recognition unit 160 refers to a tag map 162, comprising a text tag list 164 and a speech tag list 166, to find a speech tag corresponding to a recognised text tag. Then, a speech tag insertion unit 168 inserts the appropriate speech tag in the information.
  • The speech generation unit 136 includes an abridging unit 170 and a speech engine 190.
  • The speech engine 190 can be an off-the-shelf product, such as products by Lernout & Hauspie, Lucent, IBM and others.
  • The abridging unit 170 is responsive to user input commands to modify the text, with speech tags, delivered to the speech generation unit 136, to deliver speech-tagged text into the speech engine 190 in accordance with a required level of detail.
  • The operation of the abridging unit 170 will now be described with reference to Figures 5 and 8.
  • The abridging unit 170 has a hierarchy sorter 172, which receives speech-tagged text and sorts the tagged text into levels 174a, b, c, d of detail.
  • The levels of detail are determined by monitoring sub-headings in the web page. While this may be an approximation of the actual level of detail within the web page, it provides a useful way of distinguishing between general statements, which may be provided within the header or title, and extremely high levels of detail, which are likely to be placed several levels of sub-headings below the title level.
  • A title will be placed in level 1 (174a).
  • An introductory statement will be held in level 2 (174b).
  • A passage beneath a sub-heading will be placed in level 3 (174c).
  • Level 4 (174d) will contain further statements contained in a passage beneath a sub-heading in the web page, perhaps such as text in a smaller font or in italics.
  • The level of abridgement is requested indirectly, by a user requesting a particular output speed.
  • The current requested speed is held in a current requested speed storage unit 178.
  • This is compared with a predetermined maximum speed 176, by a speed processing unit 180, which then sets an abridgement level, held in an abridgement level storage unit 182, and a current speed for the speech engine, held in a current speed storage unit 184.
  • The text is released to the speech engine 190 in accordance with the level of the abridgement requested by the user.
  • The entertainment system 100 is to be used in conjunction with information presented in the form of a file structure of units of information, as illustrated in Figures 6 and 7.
  • Figure 6 illustrates the general arrangement of an information file structure, containing a plurality of interlinked files, each containing information held in HTML format.
  • A category file 1 lists a series of categories, each of which is a hypertext link to a corresponding sub-category file 2A to 2D.
  • Each sub-category file 2A to 2D contains a series of sub-categories with item lists, with each item list containing a list of hyperlinks to item files 3.
  • A particular example of an information product, containing a file structure as shown in Figure 6, is illustrated in Figure 7.
  • Reference numerals in Figure 7 correspond with reference numerals in Figure 6, qualified by a prime mark (').
  • Such a product as shown in Figure 7 could be provided on a memory stick 20 as previously described, either downloaded from a remote location by the user or purchased as a complete product.
  • When output of the information is commenced, the contents of file 1' are read out. Each of the items in that file is a category heading, which is also a link to a second level file 2A', 2B' etc. If the user selects the "news" category, either by pressing a predetermined button while the "news" category heading is being read out by the text-to-speech browser, or by speaking a predetermined command at the same time, then the text-to-speech browser will commence reading the contents of file 2A'.
  • This file lists sub-categories and corresponding item lists, the items being news stories. If, in that case, the user requested that the "next item" be read out, the text-to-speech browser would pass to file 2B', to read the headings contained in the "sport" category.
  • Returning to the contents of file 2A', if the user then selects the reading of local news story 1, which is item 1 in list A1' in file 2A', then file 3A(1)' is read out.
  • The reading of the contents of that file is governed by the abridging unit 170, which analyses section headings and sub-section headings to establish which parts of the document should be read out.
  • Once the reading of that news story is complete, the text-to-speech browser passes to file 3A(2)', which is the next file in the third level of the file hierarchy.
  • Operation commences on installation of information into the vehicle entertainment unit 100.
  • In the illustrated example, operation commences automatically, but this could also be initialised by receipt of a user input.
  • Operation commences with step S8-2, where the highest level file is identified and the text-to-speech browser starts reading the contents thereof.
  • The user input monitor 110 then carries out a series of input monitoring steps, on both the control panel 102 and speech commands received from the microphone 104. If a valid user input is recognised, then an appropriate action is taken. Firstly, in step S8-4, a check is made as to whether a request has been received for reading to stop. If so, then in step S8-6, the text-to-speech browser 130 stops reading. Then, in step S8-8, the user input monitor monitors for a user input requesting the text-to-speech browser 130 to start reading again. This step continues to monitor until such a request is received, when the routine returns to step S8-2.
  • If a stop reading request is not received in step S8-4, then the user input monitor checks whether an item selection request has been received, in relation to an item link being read at that time.
  • Thus, in the example shown in Figure 7, if the text-to-speech browser 130 is reading the words "NEWS" from unit 1', and such a request is received, then an item selection request is deemed to have been made while that item is being read. If so, then in step S8-12, the contents of the linked item are read out. In that case, file 2A' in Figure 7 would be read out, comprising lists of news items held in sub-categories.
  • Following step S8-12, the user input monitor continues to monitor the inputs from S8-4 onwards.
  • If no item selection request is received, then in step S8-14 the user input monitor checks whether a request has been received for the attention of the text-to-speech browser to move up one level in the file structure. This means that if file 2A' is currently being read out, the user input directs the browser to return to file 1', which is the level directly above file 2A'. If so, then in step S8-16, the contents of the item one level above the present item are read out, and the operation continues from step S8-4 onwards.
  • In step S8-18, the user input monitor monitors for a "previous item" request.
  • A previous item is one in the same level, such as in the level containing files 2A, 2B, 2C and 2D in Figure 6, or in the level containing files 3A(1), 3A(2), 3A(3), 3B(1), 3B(2), etc.
  • If the decision of S8-18 is made in the affirmative, then that previous item is read out in step S8-20. Operation then continues from step S8-4 onwards. If not, then in step S8-22 the user input monitor monitors the input for a "next item" request. If such a request is received, and a next item in the same level exists, then in step S8-24 the contents of that next item are read out.
  • The concept of a "next" item is interpreted in the same way as the concept of a "previous" item, all of the files in a given level being placed in a single list, such that all items and information are held in one sequence. After step S8-24, the operation continues from step S8-4 onwards.
  • If step S8-22 is answered in the negative, the operation proceeds to step S8-26, whereby a check is made as to whether the speech engine has completed reading the present item. If not, then the operation returns to step S8-4 to continue monitoring user inputs while the present item is being read. If the reading of the present item has been completed then, in step S8-28, a check is made as to whether a next item exists. If not, then the operation has ended, coming to the end of the list of items of information on that particular level. If a next item exists, then that next item is read out in step S8-24. Operation then continues from step S8-4 as before.
  • The above method of operation of the text-to-speech browser is merely one of many different ways in which user input actions can influence the operation of the browser.
  • Additional functionality could be provided, so that the user does not need to progress to the lowest level of information in order to retrieve all news stories.
  • A single spoken command, such as "read all stories", could be provided for a user, receipt of which by the user input monitor could cause the text-to-speech browser to commence at file 3A(1), and to continue reading stories until the last of the files in that level is completed.
  • Spoken input commands of higher complexity could be provided, to allow a user to use more natural language input. For example, it is conceivable that a user might request all stories relating to a particular topic, and then the text-to-speech browser searches for stories in that category and then reads them out in sequence.
  • In step S9-2, the browser unit 132 monitors for a browsing command input by the user. If a browsing command is input, then in step S9-4, the browser unit 132 retrieves information from the memory card 20, on the basis of the browsing command received from the user. Instructions can be received as selections of buttons on the control panel 102, or as spoken instructions received in the microphone 104.
  • The user input monitor 110 includes a facility for translating instructions received by either means into messages for the browser unit 132 or for configuration of the speech generation unit 136. The user input monitor may be given the facility of learning the characteristics of the voices of habitual users of the entertainment unit 100, enabling more accurate recognition of spoken commands and conversion thereof into messages.
  • The browser unit 132 passes a stream of mark-up language data to the tag mapping unit 134 which, in step S9-6, intercepts mark-up tags in the mark-up language data output from the browser unit 132, and converts those mark-up tags into tags for driving the speech generation unit 136.
  • The speech generation unit 136 receives the converted mark-up language from the tag mapping unit 134 and, in step S9-8, generates a speech signal, which is passed to an amplifier 150 in step S9-10 for driving the loudspeaker 108. Characteristics of the speech signal, such as voice quality and amplitude, are controllable by means of speech tags included in the input to the speech generation unit 136.
  • In step S10-2, the tag mapping unit 134 waits to receive a message from the browser unit 132 to commence reading received text in step S10-4.
  • In step S10-6, the text tag recognition unit 160 monitors the stream of mark-up text until it identifies a text tag.
  • Text tags are used in mark-up language as instructions to a browser to generate a display.
  • When a text tag is identified, in step S10-8 the tag is passed to the tag map 162 of the tag mapping unit 134.
  • The text tag list 164 lists the standard set of tags available for use in the mark-up language. For example, a web page written in HTML will probably start with a heading, bounded by <h1> tags.
  • The <h1> tag in the text tag list 164 maps to a corresponding entry in the speech tag list 166.
  • The speech tag list 166 lists a set of available speech tags, which act as instructions to the speech generation unit 136.
  • The <h1> tag maps to a corresponding speech tag, to which the speech generation unit 136 is responsive to alter its output settings such that text bounded by the <h1> tags in the original input is output by the speech generation unit 136 in a neutral, announcement-style voice.
  • Once a text tag has been recognised and delivered to the tag map 162, the tag map 162 returns the appropriate entry in the speech tag list 166 to the speech tag insertion unit 168.
  • In step S10-10, the speech tag insertion unit 168 replaces the text tag recognised by the text tag recognition unit 160 with the corresponding speech tag.
  • The output from the speech tag insertion unit is the mark-up language text, wherein the various text tags are converted into speech tags appropriate for driving the speech generation unit 136.
  • Thereafter, in step S10-14, the text tag recognition unit 160 continues to read the input text. If in step S10-6 no tag is detected, a check is made in step S10-12 as to whether the end of the input text has been reached. If it has, then the text tag recognition unit 160 ceases operation. Otherwise, the unit 160 continues to read the text in step S10-14.
  • In step S11-2, the abridger monitors for a change of speed request from the user input monitor.
  • When a change of speed request is received, in step S11-4 it is determined whether the speed is to be increased or decreased. If the change of speed request is for the speed to be increased, then the current requested speed held in the current requested speed unit 178 is increased, in step S11-6.
  • In step S11-8, the current requested speed is compared with a value held in the maximum speed unit 176.
  • The value held in the maximum speed unit 176 is preconfigured, and is chosen to correspond with the maximum speed of spoken output at the speech engine 190 while maintaining comprehensibility.
  • The speed processing unit 180 makes a comparison between the current requested speed 178 and the maximum speed 176. If the current requested speed exceeds the maximum speed 176, then in step S11-10 the level of abridgement is increased. This abridgement level is stored in the abridgement level unit 182.
  • The abridgement level unit 182 controls the hierarchy sorter 172 such that the higher the level of abridgement stored in the abridgement level unit 182, the more levels 174 are discarded by the hierarchy sorter 172. Thus, if the value held in the abridgement level unit 182 is zero, all levels are delivered to the speech engine.
  • If, in step S11-8, it is found that the current requested speed is not greater than the maximum speed, then in step S11-12 the speed processing unit 180 increases the current speed, and this value is stored in the current speed unit 184. After step S11-12, the procedure returns to monitoring for speed change requests in step S11-2.
  • If, in step S11-4, the change of speed request is found to be for the speed to be decreased, then in step S11-14 the abridgement unit 170 decreases the currently requested speed held in the current requested speed unit 178.
  • A comparison is then made by the speed processing unit 180 between the current requested speed and the maximum speed value 176, in step S11-16. If the current requested speed is less than the maximum speed then, in step S11-18, the current speed, held in the current speed unit 184, is decreased. This causes the speech engine 190 to decrease its speed of delivery of spoken output. Otherwise, if the current requested speed is not less than the maximum speed, then the level of abridgement, as represented by the value held in the abridgement level unit 182, is decreased in step S11-20. This causes more information to be delivered by the hierarchy sorter 172 to the speech engine 190.
  • In this way, the abridgement unit 170 can deliver more or less information to the speech engine 190 as required, if the requested speed of delivery of speech exceeds the maximum possible speed of delivery of speech to the user.
  • The text-to-speech browser 130 can be implemented in several different ways.
  • The text-to-speech browser 130 can be part of original equipment, supplied to an original equipment manufacturer (OEM) as a computer program on a disk or for transmission on a carrier signal.
  • The OEM will then incorporate the text-to-speech browser 130 into the entertainment system 100.
  • Incorporation may comprise building the software into a larger software program. This could be made most convenient if the source code (high-level programming language instructions) is supplied to the OEM.
  • Alternatively, the text-to-speech browser may be constructed by a home user, using a Java Applet or plug-in, either retrieved from a web page via the Internet, or received as a computer program product on a storage medium such as a magnetic or optical disk.
  • The text-to-speech browser 130 need not be provided as a single component.
  • The text-to-speech browser 130 may incorporate standard subcomponents corresponding to the browser unit 132 and the speech generation unit 136.
  • The browser unit can be a browser as will be found on various original equipment, as can the speech generation unit.
  • The tag mapping unit 134 can be downloaded or installed into the original equipment during manufacture or after supply, to consolidate the browser unit and the speech generation unit into the text-to-speech browser 130. Improved versions of the tag mapping unit 134 can be released and supplied to users after purchase of original equipment, to enable extended application of the principle of mapping between mark-up instructions for text and mark-up instructions for speech.
  • Figure 13 illustrates a second specific embodiment of the invention, making use of a one-way communications channel to transmit information from an information server 250 to a user.
  • The information server 250 stores a plurality of information files 236, containing information for transmission to the user.
  • The information files 236 are passed to a transmitter 240, with a radio antenna 242, which continuously transmits each of the information files 236 on a radio signal.
  • The entertainment unit 200 comprises a radio receiver unit 220 with a receiving antenna 218 extending therefrom, for receiving a radio signal from the transmitting antenna 242 of the transmitter 240.
  • The radio receiver unit 220 receives the information files 236 as transmitted between the antennas 242, 218, and stores the information files 236 in a memory buffer 222. It will be appreciated that not every information file will be received by the radio receiver unit 220 simultaneously; however, in this example, all information is continuously transmitted by the transmitter 240.
  • A text-to-speech browser 230 browses information stored in the memory buffer 222, and sends browsing instructions to the radio receiver 220, for possible change of receiving frequency. This will allow the text-to-speech browser 230 to be responsive to user instructions to receive a further information file 236, for that information file to be received in the radio receiver unit 220 and to be stored in the memory buffer 222 for browsing by the text-to-speech browser 230. Consequently, spoken output of a particular information file can be selected by a user.
  • The functional units of the entertainment unit 200 are identical to the functional units of the entertainment unit 100 illustrated in Figure 1. Even within the text-to-speech browser 230, the tag mapping unit 134 and the speech generation unit 136 are as described in relation to the first embodiment. Further, the text-to-speech browser 230 comprises a radio scanning browser 232. This unit refers to information held in the memory buffer 222, the contents of which are controlled by means of browsing instructions sent from the radio scanning browser 232 to the radio receiver unit 220.
  • The radio scanning browser 232 is illustrated in further detail in Figure 15.
  • The radio scanning browser 232 comprises an extension to the browser 132 described above in relation to the first embodiment.
  • Browsing instructions received from the user input monitor 110 are interpreted by the browser and sent towards the radio receiver unit 220.
  • Interposed between the browser 132 and the radio receiver unit 220 is an information change request unit 233, which interprets the browsing instructions sent by the browser 132 into a request for the radio receiver unit 220 to receive a different unit of information (a minimal sketch of this request flow is given after this list).
  • In a third specific embodiment, illustrated in Figure 16, an entertainment unit 300 is provided, with a telephone control unit 320 and a text-to-speech browser 130.
  • The text-to-speech browser 130 is of identical functionality to that of the entertainment unit 100 illustrated in Figure 1.
  • The telephone control unit 320 allows connection of the entertainment unit 300 to a mobile telephone unit 322.
  • The mobile telephone unit 322 can be used to establish a two-way telephony connection with a base station 46, connected into a telephone network 44. Via the telephone network 44, the user of a mobile telephone 322 can access a local server 42 for access to the Internet 40, by means of which the previously described web server 30 can be accessed for retrieval of a copy of the information held in the web page 32. Browsing instructions are sent by the text-to-speech browser 130 to the telephone control unit 320, by means of which information is retrieved from the web page 32 into the text-to-speech browser 130 for generation of a speech output at the loudspeaker 108.
  • The text-to-speech browser can also be incorporated on a personal computer, such as a desktop PC or a portable PC, to supplement existing functionality thereof. This can allow a user to browse web pages on the Internet without having view of the VDU of a PC; thus, a visually impaired user can browse for information over the Internet.
  • A browser using the Wireless Application Protocol (WAP) can also be incorporated into a mobile telephone, for the browsing of pages from websites accessible via the Internet; the text-to-speech browser illustrated by way of example above can also be applied to this device, to allow a user to listen to a spoken output corresponding to text on a web page, and to control browsing by voice commands.
  • The present invention can be applied to a set top box for use with a television, for browsing the Internet.
  • Such a set top box incorporates a browser, to which the present invention can be applied.
  • Digital televisions are available with browsers incorporated therein; the present invention can also be applied to such a device.
  • The wireless application protocol can be applied to devices limited only by miniaturisation technology. Wrist watches, electronic personal organisers and mobile telephones are capable of incorporating the present invention.
  • Portable storage devices storing files of text-based information could be offered for sale in retail outlets for consumers to purchase and use in their motor vehicles. This could substitute for purchase of a newspaper, allowing a user to listen to news articles while driving.
  • This storage medium could be an optical storage disk or a magnetic storage disk.
  • The storage medium could be read-only or recordable; in the latter case, the user could pay a fee into a machine in a retail outlet, which then loads the current day's news on to the storage medium, replacing the previous day's information.
  • Data can be broadcast over TV or radio channels, via Teletext, or via other broadcasting means such as satellites.
  • Satellite networks are again appropriate, such as Iridium and INMARSAT.
  • Cellular telephone networks, which are wireless but also incorporate a network of land stations, are useful, such as that illustrated in Figure 14, and broadband networks such as cable networks can be used to increase data transmission rates.
  • Infra-red transmission, such as defined by the IrDA standards, can also be utilised to establish two-way communication with a base station.
  • A refrigerator with computing capability to process information relating to food stored therein can retrieve shopping and/or recipe information from remote locations, perhaps via the Internet, and by means of the present invention can cause generation of a spoken output of information potentially useful to the listener.
  • This type of application is capable of being implemented by means of Bluetooth technology.
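
As promised above, here is a minimal Python sketch of the second embodiment's information change request flow. It is illustrative only: the continuous broadcast rotation is collapsed into an immediate copy, and all names are invented rather than taken from the patent.

    BROADCAST = {                     # files 236, transmitted in continuous rotation
        "news.html": "<h1>NEWS</h1>...",
        "sport.html": "<h1>SPORT</h1>...",
    }

    class RadioReceiver:              # radio receiver unit 220 with memory buffer 222
        def __init__(self):
            self.buffer = {}

        def receive(self, name):
            # A real receiver would wait for the file to come round again in
            # the continuous transmission; here it is copied immediately.
            self.buffer[name] = BROADCAST[name]

    class RadioScanningBrowser:       # unit 232 plus information change request unit 233
        def __init__(self, receiver):
            self.receiver = receiver

        def browse(self, name):
            if name not in self.receiver.buffer:
                self.receiver.receive(name)        # information change request
            return self.receiver.buffer[name]      # then browse from the buffer

    browser = RadioScanningBrowser(RadioReceiver())
    print(browser.browse("sport.html"))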

Abstract

A browser (130) is operable to receive character information from an information source (32), and to generate an audio speech output on the basis thereof. The browser is capable of responding to user input actions, either by manual actuation or by spoken commands, to navigate through units of information, in accordance with links between the units, to select a particular unit or units for output.

Description

ELECTRONIC BROWSER
This invention relates to a browser for receiving data in the form of mark-up language, and converting the data into a form presentable to a user. The invention is particularly applicable to implementation in a motor vehicle, or for use by a visually impaired person.
Mark-up languages, and particularly hypertext mark-up language (HTML), are used extensively for describing information stored and accessible via the Internet. Data defined in a mark-up language can be retrieved by a browser, such as Microsoft Internet Explorer, for conversion into a graphical display on a personal computer. Mark-up languages are so called because they comprise sets of mark-up tags. Tags are configurational characters embedded in data, which a browser intercepts and uses to configure the presentation of the data in the intended manner. HTML is popular because it is substantially platform independent, and defines a set of tags which provide a useful range of data presentation options.
There is now a demand for access to information in many different situations. For example, browsers are being developed which can be implemented on mobile telephones or other hand held devices. It would be desirable to provide a unit in a motor vehicle which could be used to access information.
However, visual output can be distracting to a driver, and in some jurisdictions it is illegal to place a display output in a vehicle where it can be viewed by the driver. Further, it is likely that the size of any display provided in a motor vehicle would be limited, and would not be suitable for the display of a large quantity of graphical or textual information.
An aspect of the invention provides a browser capable of generating an audio output, and of receiving user inputs to cause navigation through a plurality of units of information, up and down and across levels of a hierarchical structure.
The user input commands may be spoken or caused by activation of a manual input.
Another aspect of the invention provides a browser capable of receiving a mark-up text input, and of analysing the mark-up text to provide a spoken output. The audio output may be modulated by tags in the mark-up text input, and the browser may include means for monitoring tags, and making modulating changes to the corresponding audio output in accordance with selected tags.
The browser may include means for mapping between tags intended for modifying textual output, such as tags from the basic HTML set, and tags for modulating a speech generating means of the browser. Common speech engines also operate on the basis of text-based input with tags inserted therein, to control the nature of the speech output. These speech tags have no natural correspondence with tags for configuring displayed information, such as HTML tags. In one embodiment, the browser of the invention includes means for mapping between display tags and speech tags. The mapping means may be preconfigured, so that the user does not have to make any modifications to the browser in order to produce a suitable speech output, with the speech output modulated in line with changes to display characteristics inserted through the output data. The mapping means may be a look-up table.
Another aspect of the invention provides a plug-in capable of execution alongside a browser and a speech engine, for intercepting mark-up output from the browser and converting mark-up tags in said output into tags for configuration of the speech engine, prior to delivery of the converted output into the speech engine.
Another aspect of the invention provides apparatus for browsing information received from a remote location, the apparatus including means for automatically converting text configuration signals in said information into speech engine configuration signals, for configuring a speech engine to generate a corresponding speech output.
Preferably the apparatus comprises a controller switchable to receive an audio input, to recognise spoken commands in said input, and to control operation of said apparatus to deliver a speech output.
Further aspects and advantages of the invention may become apparent from the following description of specific embodiments of the invention, provided by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram of a system incorporating a vehicle entertainment unit of a first embodiment of the present invention;
Figure 2 is a schematic diagram of the entertainment unit illustrated in Figure 1;
Figure 3 is a schematic diagram of a tag mapping unit of an audio browser of the entertainment unit illustrated in Figure 2;
Figure 4 is a schematic diagram of a speech generation unit of the entertainment unit illustrated in Figure 2;
Figure 5 is a schematic diagram of an abridging unit of the speech generation unit illustrated in Figure 4;
Figure 6 is a schematic diagram of a fundamental file structure for use with an information product in the first specific embodiment of the invention;
Figure 7 is a schematic diagram of an information product having the file structure illustrated in Figure 6;
Figure 8 is a flow diagram showing response of the specific embodiment to user actions;
Figure 9 is a flow diagram illustrating the mode of operation of a browser unit of the entertainment system illustrated in Figure 2;
Figure 10 is a flow diagram illustrating the mode of operation of the tag mapping unit illustrated in Figure 3;
Figure 11 is a flow diagram illustrating the mode of operation of a speed processing unit of the abridging unit illustrated in Figure 5;
Figure 12 is a flow diagram illustrating the mode of operation of a hierarchy sorter of the abridging unit illustrated in Figure 5;
Figure 13 is a system incorporating a vehicle entertainment unit of a second specific embodiment of the invention;
Figure 14 is a schematic diagram of the vehicle entertainment unit illustrated in Figure 13;
Figure 15 is a schematic diagram of a radio scanning browser unit of the vehicle entertainment unit illustrated in Figure 14; and
Figure 16 is a schematic diagram of a system incorporating a vehicle entertainment unit of a third specific embodiment of the invention.
In a first embodiment, a system 10 illustrated in Figure 1 comprises a vehicle entertainment unit 100, for installation in a motor vehicle, preferably in the dashboard thereof. The entertainment unit 100 is operable to receive operator inputs by a control panel 102 and/or a microphone 104, and is operable to produce graphical and/or textual outputs on a display 106 and audio output via a loudspeaker 108. The entertainment unit 100 comprises a memory card reader 120, for receiving a memory card 20 storing information in mark-up language form. The memory card reader 120 reads the information from the memory card 20, and delivers this information to a text-to-speech browser 130, which converts information into signals to drive the display 106 and the loudspeaker 108.
The information stored on the memory card 20 is retrieved from a web server 30 storing a web page 32. The web server 30 is accessible via the Internet 40 through a local server 42 and telephone network 44, by means of a personal computer 50 having a browser 52 and a corresponding memory card reader/writer 54 into which the memory card 20 can be installed.
The personal computer 50 accesses the web server 30 by means of a modem 56, and retrieves information from the web page 32, to be stored on the memory card 20. A suitable format for the memory card 20 is the Memory Stick (trade mark) by Sony Electronics Limited. Memory Stick storage media are presently available in different capacities up to 64 MB.
The storage of the information can be by means of the standard "save" instruction provided on the desktop of the PC 50. This causes the creation of a standard HTML file on the memory card. Several files can be retrieved and saved at the same time.
A user can retrieve a copy of the web page 32, which, for example, is a news-based web page, onto the personal computer 50. Thereafter, the copy of the web page 32 can be stored on the memory card 20. Then the memory card 20 can be transferred to the vehicle entertainment unit 100. In response to browsing commands received from the user, the vehicle entertainment unit 100 will then convert information held in one or more files on the memory card 20 in mark-up language form, into a speech output signal to drive the loudspeaker 108. The control panel 102 and the microphone 104 are receptive of user commands, for example, to speed up or slow down delivery of the information as speech output, to increase/decrease the volume of the speech output, and browsing commands to select a different set of information to be presented to the user.
The text-to-speech browser 130 comprises a browser unit 132, which refers to the information on the memory card 20 via the memory card reader 120. The browser unit 132 passes information to a tag mapping unit 134 which prepares the information for submission to a speech generation unit 136.
Further, the browser unit 132 generates a display output, corresponding to the mark-up language retrieved from the memory card 20, directly to a display driver 140, for driving the display 106. It will be understood that the display output and therefore the display 106 and the display driver 140 are optional features, which may be dispensed with, particularly in the case of legal restriction against the use of graphical displays in motor vehicles.
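This pipeline, from browser unit to tag mapping unit to speech generation unit, can be pictured in a few lines of Python. The sketch below is illustrative only: the class names and the bracketed speech tags are assumptions made for the example, since the patent defines units 132, 134 and 136 functionally rather than as a concrete API.

    # Illustrative sketch of the Figure 2 pipeline; all names are invented.
    class BrowserUnit:
        """Reads mark-up files from the memory card (cf. browser unit 132)."""
        def __init__(self, files):
            self.files = files                    # file name -> HTML text

        def fetch(self, name):
            return self.files[name]

    class TagMappingUnit:
        """Swaps display tags for speech tags (cf. tag mapping unit 134)."""
        def __init__(self, tag_map):
            self.tag_map = tag_map

        def convert(self, markup):
            for text_tag, speech_tag in self.tag_map.items():
                markup = markup.replace(text_tag, speech_tag)
            return markup

    class SpeechGenerationUnit:
        """Stands in for abridging unit 170 plus speech engine 190."""
        def speak(self, tagged_text):
            print("SPEAK:", tagged_text)          # a real engine synthesises audio

    browser = BrowserUnit({"1.html": "<h1>NEWS</h1>Local stories follow."})
    mapper = TagMappingUnit({"<h1>": "[announce]", "</h1>": "[/announce]"})
    engine = SpeechGenerationUnit()
    engine.speak(mapper.convert(browser.fetch("1.html")))
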
The tag mapping unit 134, as illustrated in further detail in Figure 3, includes a text tag recognition unit 160 which receives mark-up text in a stream from the browser unit 132. The text tag recognition unit 160 refers to a tag map 162, comprising a text tag list 164 and a speech tag list 166, to find a speech tag corresponding to a recognised text tag. Then, a speech tag insertion unit 168 inserts the appropriate speech tag in the information.
As illustrated further in Figure 4, the speech generation unit 136 includes an abridging unit 170 and a speech engine 190. The speech engine 190 can be an off-the-shelf product, such as products by Lernout & Hauspie, Lucent, IBM and others.
The abridging unit 170 is responsive to user input commands to modify the text, with speech tags, delivered to the speech generation unit 136, to deliver speech-tagged text into the speech engine 190 in accordance with a required level of detail. The operation of the abridging unit 170 will now be described with reference to Figures 5 and 8.
The abridging unit 170 has a hierarchy sorter 172, which receives speech-tagged text and sorts the tagged text into levels 174a, b, c, d of detail. The levels of detail are determined by monitoring sub-headings in the web page. While this may be an approximation of the actual level of detail within the web page, it provides a useful way of distinguishing between general statements, which may be provided within the header or title, and extremely high levels of detail, which are likely to be placed several levels of sub-headings below the title level.
Accordingly, a title will be placed in level 1 (174a), an introductory statement will be held in level 2 (174b), a passage beneath a sub-heading will be placed in level 3 (174c), and level 4 (174d) will contain further statements contained in a passage beneath a sub-heading in the web page, perhaps such as text in smaller font or in italics.
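As a concrete illustration, the sorting rule just described can be approximated in a few lines of Python. This is a sketch under the stated assumption that heading depth stands in for detail level; the function names are invented for the example.

    import re

    def sort_into_levels(markup):
        """Approximate hierarchy sorter 172: bin text into levels 1-4 by heading depth."""
        levels = {1: [], 2: [], 3: [], 4: []}
        level = 4                                  # un-headed text: deepest detail
        for token in re.split(r"(</?h[1-4]>)", markup):
            m = re.match(r"<h([1-4])>$", token)
            if m:
                level = int(m.group(1))            # heading text takes its own depth
            elif token.startswith("</h"):
                level = min(level + 1, 4)          # a passage beneath a heading is one level deeper
            elif token.strip():
                levels[level].append(token.strip())
        return levels

    def abridge(levels, abridgement):
        """Discard the deepest `abridgement` levels, as directed by unit 182."""
        return [t for lv in range(1, 4 - abridgement + 1) for t in levels[lv]]

    doc = ("<h1>Title</h1>Introductory statement."
           "<h2>Sub-heading</h2>Passage beneath the sub-heading.")
    print(abridge(sort_into_levels(doc), abridgement=2))   # keeps levels 1 and 2 only
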
The level of abridgement is requested indirectly, by a user requesting a particular output speed. Current requested speed is held in a current requested speed storage unit 178. This is compared with a predetermined maximum speed 176, by a speed processing unit 180, which then sets an abridgement level, held in an abridgement level storage unit 182, and a current speed for the speech engine, held in a current speed storage unit 184.
The text is released to the speech engine 190 in accordance with the level of the abridgement requested by the user.
The entertainment system 100 is to be used in conjunction with information presented in the form of a file structure of units of information, as illustrated in Figures 6 and 7. Figure 6 illustrates the general arrangement of an information file structure, containing a plurality of interlinked files, each containing information held in HTML format. A category file 1 lists a series of categories, each of which is a hypertext link to a corresponding sub-category file 2A to 2D.
Each sub-category file 2A to 2D contains a series of sub- categories with item lists, with each item list containing a list of hyperlinks to item files 3.
A particular example of an information product, containing a file structure as shown in Figure 6, is illustrated in Figure 7. Reference numerals in Figure 7 correspond with reference numerals in Figure 6, qualified by a prime mark ('). Such a product as shown in Figure 7 could be provided on a memory stick 20 as previously described, either downloaded from a remote location by the user or purchased as a complete product.
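The linked file structure of Figures 6 and 7 can be modelled with plain dictionaries. The sketch below uses invented file names; the patent requires only that each level's items are hyperlinks to files one level down.

    # Hypothetical miniature of the Figure 7 product; names are invented.
    category_file = {                 # file 1': category headings linking downwards
        "NEWS": "2A",
        "SPORT": "2B",
    }
    sub_category_files = {            # files 2A'-2D': sub-categories with item lists
        "2A": {"Local news": ["3A(1)", "3A(2)"],
               "World news": ["3B(1)", "3B(2)"]},
        "2B": {"Football": ["3C(1)"]},
    }
    item_files = {                    # files 3: the stories themselves, in HTML
        "3A(1)": "<h1>Local story 1</h1>Body of the story...",
        "3A(2)": "<h1>Local story 2</h1>Body of the story...",
    }

    # Following the "news" link, then "local news story 1", reaches file 3A(1):
    first_list = sub_category_files[category_file["NEWS"]]
    print(item_files[first_list["Local news"][0]])
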
When output of the information is commenced, the contents of file 1' are read out. Each of the items in that file is a category heading, which is also a link to a second level file 2A', 2B' etc. If the user selects the "news" category, either by pressing a predetermined button while the "news" category heading is being read out by the text-to-speech browser, or by speaking a predetermined command at the same time, then the text-to-speech browser will commence reading the contents of file 2A'. This file lists sub-categories and corresponding item lists, the items being news stories. If, in that case, the user requested that the "next item" be read out, the text-to-speech browser would pass to file 2B', to read the headings contained in the "sport" category.
Returning to the contents of file 2A', if the user then selects the reading of local news story 1, which is item 1 in list A1' in file 2A', then file 3A(1)' is read out. The reading of the contents of that file is governed by the abridging unit 170, which analyses section headings and sub-section headings to establish which parts of the document should be read out. Once the reading of that news story is complete, the text-to-speech browser passes to file 3A(2)', which is the next file in the third level of the file hierarchy.
Operation in relation to the file structure illustrated in Figures 6 and 7 will be described with reference to Figure 8.
Operation commences on installation of information into the vehicle entertainment unit 100. In the illustrated example, operation commences automatically, but this could also be initialised by receipt of a user input. Operation commences with step S8-2, where the highest level file is identified and the text-to-speech browser starts reading the contents thereof.
The user input monitor 110 then carries out a series of input monitoring steps, on both the control panel 102 and speech commands received from the microphone 104. If a valid user input is recognised, then an appropriate action is taken. Firstly, in step S8-4, a check is made as to whether a request has been received for reading to stop. If so, then in step S8-6, the text-to-speech browser 130 stops reading. Then, in step S8-8, the user input monitor monitors for a user input requesting the text-to-speech browser 130 to start reading again. This step continues to monitor until such a request is received, when the routine returns to step S8-2.
If a stop reading request is not received in step S8-4, then the user input monitor checks whether an item selection request has been received, in relation to an item link being read at that time. Thus, in the example shown in Figure 7, if the text-to-speech browser 130 is reading the words "NEWS" from unit 1', and such a request is received, then an item selection request is deemed to have been made while that item is being read. If so, then in step S8-12, the contents of the linked item are read out. In that case, file 2A' in Figure 7 would be read out, comprising lists of news items held in sub-categories. Following step S8-12, the user input monitor continues to monitor the inputs from S8-4 onwards.
If no item selection request is received while a link to an item is being read, then in step S8-14, the user input monitor checks whether a request has been received for the attention of the text-to-speech browser to move up one level in the file structure. This means that if file 2A' is currently being read out, the user input directs the browser to return to file 1', which is the level directly above file 2A'. If so, then in step S8-16, the contents of the item one level above the present item are read out, and the operation continues from step S8-4 onwards.
If no "up one level" request is received, or if such a request is received but no level above the present level exists, then in step S8-18, the user input monitor monitors for a "previous item" request. A previous item is one in the same level, such as in the level containing files 2A, 2B, 2C and 2D in Figure 6 or in the level containing files 3A(1), 3A(2), 3A(3), 3B(1), 3B(2), etc. Thus, if the text-to-speech browser is currently reading item B2.1, which is held in file 3B(2), and the user input monitor 110 receives a "previous item" request, this is interpreted as a request that the text-to-speech browser 130 commences reading item Bl.l held in file 3B(1).
The lists of items held in the series of files 3A, 3B, 3C and 3D can be incorporated into a single list with four separate entry points. In that way, file 3B(1) has a corresponding previous file, which is the last file in the series of files 3A. Thus, if a "previous item" request is received when item B1.1 is being read out, the decision of step S8-18 would be made in the affirmative.
If the decision of S8-18 is made in the affirmative, then that previous item is read out in step S8-20. Operation then continues from step S8-4 onwards. If not, then in step S8-22 the user input monitor monitors the input for a "next item" request. If such a request is received, and a next item in the same level exists, then in step S8-24 the contents of that next item are read out. The concept of a "next" item is interpreted in the same way as the concept of a "previous" item, all of the files in a given level being placed in a single list, such that all items and information are held in one sequence. After step S8-24, the operation continues from step S8-4 onwards.
If step S8-22 is answered in the negative, the operation proceeds to step S8-26, whereby a check is made as to whether the speech engine has completed reading the present item. If not, then the operation returns to step S8-4 to continue monitoring user inputs while the present item is being read. If the reading of the present item has been completed then, in step S8-28, a check is made as to whether a next item exists. If not, then the operation has ended, coming to the end of the list of items of information on that particular level. If a next item exists, then that next item is read out in step S8-24. Operation then continues from step S8-4 as before.
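The following Python sketch condenses the Figure 8 decision chain into a command loop. It is a simplification under two assumptions drawn from the text: each level's files form one flat sequence, and a selection simply descends one level (a real implementation would follow the specific link being read out at the time).

    def navigate(levels, commands):
        """levels: one flat list of item names per level, as in Figures 6 and 7."""
        level, index = 0, 0
        for cmd in commands:
            if cmd == "select" and level + 1 < len(levels):
                level, index = level + 1, 0         # S8-12: read the linked item
            elif cmd == "up" and level > 0:
                level, index = level - 1, 0         # S8-16: one level above
            elif cmd == "previous" and index > 0:
                index -= 1                          # S8-20: previous item in sequence
            elif cmd == "next" and index + 1 < len(levels[level]):
                index += 1                          # S8-24: next item in sequence
            print("reading:", levels[level][index])

    navigate([["NEWS", "SPORT"],
              ["Local news", "World news", "Football"],
              ["A1.1", "A2.1", "B1.1", "B2.1"]],    # one sequence per level
             ["select", "select", "next", "next", "previous", "up"])
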
The above method of operation of the text-to-speech browser is merely one of many different ways in which user input actions can influence the operation of the browser. For instance, additional functionality could be provided, so that the user does not need to progress to the lowest level of information in order to retrieve all news stories. A single spoken command, such as "read all stories", could be provided for a user, receipt of which by the user input monitor could cause the text-to-speech browser to commence at file 3A(1), and to continue reading stories until the last of the files in that level is completed. Further, spoken input commands of higher complexity could be provided, to allow a user to use more natural language input. For example, it is conceivable that a user might request all stories relating to a particular topic, and then the text-to-speech browser searches for stories in that category and then reads them out in sequence.
The sequence of steps exemplifying operation of the text-to-speech browser 130 is set out in Figure 9. In step S9-2, the browser unit 132 monitors for a browsing command input by the user. If a browsing command is input, then in step S9-4, the browser unit 132 retrieves information from the memory card 20, on the basis of the browsing command received from the user. Instructions can be received as selections of buttons on the control panel 102, or as spoken instructions received in the microphone 104. The user input monitor 110 includes a facility for translating instructions received by either means into messages for the browser unit 132 or for configuration of the speech generation unit 136. The user input monitor may be given the facility of learning the characteristics of the voices of habitual users of the entertainment unit 100, enabling more accurate recognition of spoken commands and conversion thereof into messages.
The browser unit 132 passes a stream of mark-up language data to the tag mapping unit 134 which, in step S9-6, intercepts mark-up tags in the mark-up language data output from the browser unit 132, and converts those mark-up tags into tags for driving a speech generation unit 136. The speech generation unit 136 receives the converted mark-up language from the tag mapping unit 134 and, in step S9-8, generates a speech signal, which is passed to an amplifier 150 in step S9-10 for driving the loudspeaker 108. Characteristics of the speech signal, such as voice quality and amplitude, are controllable by means of speech tags included in the input to the speech generation unit 136.
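The step sequence of Figure 9 amounts to a linear pipeline. Purely by way of illustration, it could be composed as below, modelling each unit as an interface; all of these type and method names are assumptions, not taken from the patent.

```java
/** Hypothetical interfaces modelling the units of Figure 9. */
interface BrowserUnit          { String retrieve(String browsingCommand); }   // step S9-4
interface TagMappingUnit       { String mapTags(String markup); }             // step S9-6
interface SpeechGenerationUnit { short[] generate(String speechTagged); }     // step S9-8
interface Amplifier            { void drive(short[] speechSignal); }          // step S9-10

/** Sketch of the text-to-speech browser as a linear pipeline. */
final class TextToSpeechPipeline {
    private final BrowserUnit browser;
    private final TagMappingUnit tagMapper;
    private final SpeechGenerationUnit speechGenerator;
    private final Amplifier amplifier;

    TextToSpeechPipeline(BrowserUnit b, TagMappingUnit t,
                         SpeechGenerationUnit s, Amplifier a) {
        browser = b; tagMapper = t; speechGenerator = s; amplifier = a;
    }

    /** One pass of steps S9-4 to S9-10 for a single browsing command. */
    void handle(String browsingCommand) {
        String markup = browser.retrieve(browsingCommand);       // fetch mark-up
        String speechTagged = tagMapper.mapTags(markup);         // text tags -> speech tags
        amplifier.drive(speechGenerator.generate(speechTagged)); // speak via loudspeaker
    }
}
```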
Operation of the tag mapping unit 134 is illustrated in Figure 10. In step S10-2, the tag mapping unit 134 waits to receive a message from the browser unit 132 to commence reading received text in step S10-4. In step S10-6, the text tag recognition unit 160 monitors the stream of mark-up text until it identifies a text tag. Text tags are used in mark-up language as instructions to a browser to generate a display. When a text tag is identified, in step S10-8 the tag is passed to the tag map 162 of the tag mapping unit 134. The text tag list 164 lists the standard set of tags available for use in the mark-up language. For example, a web page written in HTML will probably start with a heading, bounded by <h1> tags. As illustrated, the <h1> tag in the text tag list 164 maps to a corresponding entry in the speech tag list 166. The speech tag list 166 lists a set of available speech tags, which act as instructions to the speech generation unit 136. In this example, the <h1> tag maps to a corresponding speech tag, to which the speech generation unit 136 is responsive to alter its output settings such that text bounded by the <h1> tags in the original input is output by the speech generation unit 136 in a neutral, announcement style voice.
Once a text tag has been recognised by the text tag recognition unit 160, and has been delivered to the tag map 162, the tag map 162 returns the appropriate entry in the speech tag list 166 to the speech tag insertion unit 168. In step S10-10, the speech tag insertion unit 168 replaces the text tag recognised by the text tag recognition unit 160 with the corresponding speech tag. The output from the speech tag insertion unit is the mark-up language text, wherein the various text tags are converted into speech tags appropriate for driving the speech generation unit 136.
Thereafter, in step S10-14, the text tag recognition unit 160 continues to read the input text. If in step S10-6 no tag is detected, a check is made in step S10-12 as to whether the end of the input text has been reached. If it has, then the text tag recognition unit 160 ceases operation. Otherwise, the unit 160 continues to read the text in step S10-14.
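A tag map of this kind can be sketched as a simple substitution table. In the sketch below, the speech-tag syntax (a <voice> element with a style attribute) is invented for illustration, since the patent does not prescribe a concrete speech mark-up; unrecognised tags are simply discarded rather than spoken.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of the tag map 162: recognised text tags are replaced by speech
 *  tags for the speech generation unit. The speech-tag vocabulary shown
 *  here is hypothetical; only the mapping principle follows Figure 10. */
public class TagMap {
    private static final Map<String, String> TEXT_TO_SPEECH_TAGS = Map.of(
        "<h1>",  "<voice style=\"announcement\">", // heading -> announcement voice
        "</h1>", "</voice>",
        "<p>",   "<voice style=\"normal\">",
        "</p>",  "</voice>");

    /** Steps S10-6 to S10-10: scan for text tags and substitute speech tags. */
    public static String toSpeechTags(String markup) {
        Matcher m = Pattern.compile("</?\\w+>").matcher(markup);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String speechTag = TEXT_TO_SPEECH_TAGS.getOrDefault(m.group(), "");
            m.appendReplacement(out, Matcher.quoteReplacement(speechTag));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```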
Operation of the abridger will now be described in conjunction with Figure 11. In step S11-2, the abridger monitors for a change of speed request from the user input monitor. When a change of speed request is received, in step S11-4, it is determined whether the speed is to be increased or decreased. If the change of speed request is for the speed to be increased, then the current requested speed held in the current requested speed unit 178 is increased, in step S11-6.
In step S11-8, the current requested speed is compared with a value held in the maximum speed unit 176. The value held in the maximum speed unit 176 is preconfigured, and is chosen to correspond with the maximum speed of spoken output at the speech engine 190 while maintaining comprehensibility. The speed processing unit 180 makes a comparison between the current requested speed 178 and the maximum speed 176. If the current requested speed exceeds the maximum speed 176 then in step S11-10 the level of abridgement is increased. This abridgement level is stored in the abridgement level unit 182.
The abridgement level unit 182 controls the hierarchy sorter 172 such that the higher the level of abridgement stored in the abridgement level unit 182, the more levels 174 are discarded by the hierarchy sorter 172. Thus, if the value held in the abridgement level unit 182 is zero, all levels are delivered to the speech engine, whereas if the value held in the abridgement level unit 182 is 3, then three levels will be abridged and only level 1 (174a), illustrated in Figure 5, will be delivered to the speech engine 190.
If, in step S11-8, it is found that the current requested speed is not greater than the maximum speed, then in step S11-12 the speed processing unit 180 increases the current speed and this value is stored in the current speed unit 184. After step S11-12, the procedure returns to monitoring for speed change requests in step S11-2.
If, in step S11-4, the change of speed request is found to be for the speed to be decreased then, in step S11-14, the abridgement unit 170 decreases the currently requested speed held in the current requested speed unit 178. A comparison is then made by the speed processing unit 180 between the current requested speed and the maximum speed value 176, in step S11-16. If the current requested speed is less than the maximum speed then, in step S11-18, the current speed, held in the current speed unit 184, is decreased. This causes the speech engine 190 to decrease its speed of delivery of spoken output. Otherwise, if the current requested speed is not less than the maximum speed, then the level of abridgement, as represented by the value held in the abridgement level unit 182, is decreased in step S11-20. This causes more information to be delivered by the hierarchy sorter 172 to the speech engine 190.
In that way, the abridgement unit 170 delivers less information to the speech engine 190 when the requested speed of delivery exceeds the maximum comprehensible speed of spoken output, and restores information when the requested speed falls back within that limit.
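The interplay of steps S11-2 to S11-20 can be captured compactly. The sketch below assumes integer speed units and a preconfigured maximum; the field names merely echo the reference numerals, and the concrete values are invented.

```java
/** Sketch of the abridgement logic of Figure 11. Speeds are in arbitrary
 *  integer units; MAX_SPEED stands in for the preconfigured value of the
 *  maximum speed unit 176. All names and values here are illustrative. */
public class AbridgementControl {
    private static final int MAX_SPEED = 10; // maximum comprehensible speed (unit 176)
    private int requestedSpeed = 5;          // current requested speed (unit 178)
    private int currentSpeed = 5;            // speed actually in force (unit 184)
    private int abridgementLevel = 0;        // levels discarded by hierarchy sorter (unit 182)

    /** "Increase speed" request: steps S11-6 to S11-12. */
    public void increaseSpeed() {
        requestedSpeed++;
        if (requestedSpeed > MAX_SPEED) {
            abridgementLevel++;              // speak less rather than faster (S11-10)
        } else {
            currentSpeed = requestedSpeed;   // speech engine can still keep up (S11-12)
        }
    }

    /** "Decrease speed" request: steps S11-14 to S11-20. */
    public void decreaseSpeed() {
        requestedSpeed--;
        if (requestedSpeed < MAX_SPEED) {
            currentSpeed = requestedSpeed;   // slow the spoken delivery (S11-18)
        } else if (abridgementLevel > 0) {
            abridgementLevel--;              // restore a level of detail (S11-20)
        }
    }
}
```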
The text-to-speech browser 130 can be implemented in several different ways. In practice, the text-to-speech browser 130 can be part of original equipment, supplied to an original equipment manufacturer (OEM) as a computer program on a disk or for transmission on a carrier signal. The OEM will then incorporate the text-to-speech browser 130 into the entertainment system 100. Incorporation may comprise building the software into a larger software program. This could be made most convenient if the source code (high level programming language instructions) is supplied to the OEM.
Alternatively, the text-to-speech browser may be constructed by a home user, using a Java Applet or plug-in, either retrieved from a web page via the Internet, or received as a computer program product on a storage medium such as a magnetic or optical disk.
The text-to-speech browser 130 need not be provided as a single component. In fact, as described above, the text-to-speech browser 130 may incorporate standard subcomponents corresponding to the browser unit 132 and the speech generation unit 136. The browser unit can be a browser such as will be found on various original equipment, as can the speech generation unit. The tag mapping unit 134 can be downloaded or installed into the original equipment during manufacture or after supply, to consolidate the browser unit and the speech generation unit into the text-to-speech browser 130. Improved versions of the tag mapping unit 134 can be released and supplied to users after purchase of original equipment, to enable extended application of the principle of mapping between mark-up instructions for text and mark-up instructions for speech.
Figure 13 illustrates a second specific embodiment of the invention, making use of a one way communications channel to transmit information from an information server 250 to a user.
The information server 250 stores a plurality of information files 236, containing information for transmission to the user. The information files 236 are passed to a transmitter 240, with a radio antenna 242, which continuously transmits each of the information files 236 on a radio signal.
An in-car entertainment unit 200 is illustrated in Figure 13, those parts thereof having substantially common function with the corresponding parts in the example illustrated in Figure 1 being assigned identical reference numerals. The entertainment unit 200 comprises a radio receiver unit 220 with a receiving antenna 218 extending therefrom, for receiving a radio signal from the transmitting antenna 242 of the transmitter 240. The radio receiver unit 220 receives the information files 236 as transmitted between the antennas 242, 218, and stores the information files 236 in a memory buffer 222. It will be appreciated that not every information file will be received by the radio receiver unit 220 simultaneously; however, in this example, all information is continuously transmitted by the transmitter 240.
A text-to-speech browser 230, of substantially the same construction as the text-to-speech browser illustrated in Figure 1, browses information stored in the memory buffer 222, and sends browsing instructions to the radio receiver 220, for possible change of receiving frequency. This allows the text-to-speech browser 230 to be responsive to user instructions to receive a further information file 236: that information file is received in the radio receiver unit 220 and stored in the memory buffer 222 for browsing by the text-to-speech browser 230. Consequently, spoken output of a particular information file can be selected by a user.
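Assuming each broadcast information file carries an identifier, the memory buffer 222 can be modelled as a keyed store that the receiver fills as files come off air and the browser reads on demand. This is a sketch only; the names are hypothetical and not taken from the patent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the memory buffer 222 of Figure 13, assuming each broadcast
 *  information file is keyed by an identifier. Names are hypothetical. */
public class MemoryBuffer {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    /** Called by the radio receiver unit 220 each time a file is received. */
    public void store(String fileId, byte[] contents) {
        files.put(fileId, contents);
    }

    /** Called by the text-to-speech browser 230; returns null until the
     *  continuously repeated broadcast next delivers the requested file. */
    public byte[] fetch(String fileId) {
        return files.get(fileId);
    }
}
```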
From Figure 14, it will be appreciated that several of the functional units of the entertainment unit 200 are identical to the functional units of the entertainment unit 100 illustrated in Figure 1. Even within the text-to-speech browser 230, the tag mapping unit 134 and the speech generation unit 136 are as described in relation to the first embodiment. Further, the text-to-speech browser 230 comprises a radio scanning browser 232. This unit refers to information held in the memory buffer 222, the contents of which are controlled by means of browsing instructions sent from the radio scanning browser 232 to the radio receiver unit 220.
The radio scanning browser 232 is illustrated in further detail in Figure 15. The radio scanning browser 232 comprises an extension to the browser 132 described above in relation to the first embodiment. In addition to the browser unit 132, which receives information from the memory buffer and sends information to the tag mapping unit 134, browsing instructions received from the user input monitor 110 are interpreted by the browser and sent towards the radio receiver unit 220. Interposed between the browser 132 and the radio receiver unit 220 is an information change request unit 233, which translates the browsing instructions sent by the browser 132 into a request for the radio receiver unit 220 to receive a different unit of information.

A third specific embodiment will now be described with reference to Figure 16. In Figure 16, an entertainment unit 300 is provided, with a telephone control unit 320 and a text-to-speech browser 130. The text-to-speech browser 130 is of identical functionality to that of the entertainment unit 100 illustrated in Figure 1.
In this case, the telephone control unit 320 allows connection of the entertainment unit 300 to a mobile telephone unit 322. Mobile telephone unit 322 can be used to establish a two way telephony connection with a base station 46, connected into a telephone network 44. Via the telephone network 44, the user of a mobile telephone 322 can access a local server 42 for access to the Internet 40, by means of which the previously described web server 30 can be accessed for retrieval of a copy of the information held in the web page 32. Browsing instructions are sent by the text-to-speech browser 130 to the telephone control unit 320, by means of which information is retrieved from the web page 32 into the text-to-speech browser 130 for generation of a speech output at the loudspeaker 108.
Whereas the function of abridgement has been described above as interlinked with user selection of a comfortable listening speed, it would be possible to separate these two functions, allowing a user to select a level of abridgement without also maximising the speed of delivery of speech output.
The present invention has been illustrated by way of example in relation to an in-car entertainment system. However, it will be appreciated that various different applications of the present invention are also envisaged. For example, the text-to-speech browser can be incorporated on a personal computer, such as a desktop PC or a portable PC, to supplement existing functionality thereof. This can allow a user to browse web pages on the Internet without having view of the VDU of the PC; thus a visually impaired user can browse for information over the Internet.
Increasingly, hand held computers and other personal digital assistants (PDAs) are becoming available, with the capability to browse the Internet, either through connection to a land based telephone network or through a wireless protocol such as the Wireless Application Protocol (WAP). WAP can also be incorporated into a mobile telephone, for the browsing of pages from websites accessible via the Internet; the text-to-speech browser illustrated by way of example above can also be applied to such a device to allow a user to listen to a spoken output corresponding to text on a web page, and to control browsing by voice commands.
Further, the present invention can be applied to a set top box for use with a television, for browsing the Internet. This set top box incorporates a browser, to which the present invention can be applied. Further, digital televisions are available with browsers incorporated therein; the present invention can also be applied to such a device.
The Wireless Application Protocol can be applied to devices limited only by miniaturisation technology. Wrist watches, electronic personal organisers and mobile telephones are capable of incorporating the present invention.
Further, while the present invention has been illustrated by way of three examples for the retrieval of information for browsing, comprising downloading from the Internet on to a portable memory storage device, instantly downloading from an information source over a one way communications channel (radio communication) and instantly downloading from a website or other information source accessible via the Internet over a two way wireless communication channel, other manners of retrieving information for browsing are envisaged.
For example, portable storage devices storing files of text-based information could be offered for sale in retail outlets for consumers to purchase and use in their motor vehicles. This could substitute for purchase of a newspaper, allowing a user to listen to news articles while driving. This storage medium could be an optical storage disk or a magnetic storage disk. The storage medium could be read-only or recordable; in the latter case, the user could pay a fee into a machine in a retail outlet, which then loads the current day's news on to the storage medium, replacing the previous day's information.
In the case of instant one way communication, data can be broadcast over TV or radio channels, via Teletext or via other broadcasting means such as satellites.
In the case of instant two way communication, satellite networks are again appropriate, such as Iridium and INMARSAT. Further, cellular telephone networks, which are wireless but also incorporate a network of land stations, are useful, such as that illustrated in Figure 14, and broad band networks such as cable networks can be used to increase data transmission rates. Infra-red transmission, such as defined by IrDA standards, can also be utilised to establish two way communication with a base station.
Further, various wireless home protocols are currently in development to connect appliances together. This can provide for the transfer of information to a user from various appliances and from remote information sources. Thus, a refrigerator with computing capability to process information relating to food stored therein can retrieve shopping and/or recipe information from remote locations, perhaps via the Internet, and by means of the present invention can cause generation of a spoken output of information potentially useful to the listener. This would allow a user to configure the refrigerator to cause generation of a speech output of a recipe, and to play back the recipe as quickly or as slowly as the user required. This type of application is capable of being implemented by means of Bluetooth technology.

Claims

1. Information processing and output apparatus comprising: an information receiver operable to receive information in a form for conversion into a display signal for output; a display configuration command detector operable to detect in information received by said information receiver display configuration commands; and a speech generator operable to generate an audio speech output signal corresponding with selected portions of information received by said receiver, said speech generator being operable to select portions of information for which an audio speech output signal is to be generated upon the basis of display configuration commands associated with said portions of information detected by said display configuration command detector.
2. Apparatus in accordance with claim 1, wherein said display configuration command detector is operable to detect display configuration commands being members of a set of permitted display configuration commands.
3. Apparatus in accordance with claim 2, wherein said apparatus further comprises: an information processing unit operable to process information received by said receiver to generate hierarchically structured data comprising portions of information associated with display configuration commands detected by said display configuration command detector, wherein the position of a portion of information in said hierarchically structured data is selected on the basis of said display configuration commands associated with said portion of information, and wherein said speech generator is operable to select portions of information for which an audio speech output signal is to be generated utilizing said hierarchically structured data generated by said information processing unit.
4. Apparatus in accordance with claim 3, wherein said information processing unit is operable to generate hierarchically structured data comprising a tree structure of parent nodes associated with portions of information and data identifying children nodes associated with further portions of information.
5. Apparatus in accordance with claim 4, wherein said apparatus further comprises: a selection unit operable by a user, to select nodes in said hierarchically structured data generated by said information processing unit, wherein said speech generator is operable to generate an audio speech output signal corresponding to portions of information associated with nodes identified as children nodes of a node selected by said selection unit.
6. Apparatus in accordance with claim 5, wherein said selection unit is operable by a user to select a node based upon a comparison operation with user input and information associated with said node.
7. Apparatus in accordance with claim 5, wherein said selection unit is operable upon receipt of a control signal to select a parent node of a currently selected node.
8. Apparatus in accordance with claim 2, wherein said apparatus further comprises: an abridgement level store operable to store data identifying an abridgement level; and said speech generator comprises a configuration command mapping unit operable to map each member of said set of permitted display configuration commands to a corresponding audio output level, said speech generator being operable to select for output portions of information associated with audio output levels relative to a threshold defined by data stored by said abridgement level store.
9. Apparatus in accordance with claim 8, further comprising a speech generation speed controller operable to receive speed control user input signals, and control the output speed of said speech generator.
10. Apparatus in accordance with claim 9, further comprising an abridgement level input unit operable to vary data identifying a current abridgement level stored in said abridgement level store.
11. Apparatus in accordance with claim 10, wherein said abridgement level input unit is operable to vary data stored in said abridgement level store to reduce the number of portions of information for which audio speech output signals are generated by said speech generator in response to receipt of a speed control user input signal corresponding to a request to increase speed of output from said speech generation means.
12. Apparatus in accordance with claim 10, wherein said abridgement level input unit is operable to vary data stored in said abridgement store to increase the number of portions of information for which audio speech output signals are generated by said speech generator in response to receipt of a speed control user input signal corresponding to a request to reduce speed of output from said speech generator.
13. Apparatus in accordance with claim 1, wherein said information receiver is operable to receive information comprising data and configuration commands interspersed in said data.
14. Apparatus in accordance with claim 13, wherein said display configuration command detector is operable to detect a configuration command interspersed in data received by said information receiver.
15. Apparatus in accordance with claim 13, wherein said information receiver is operable to receive information in a mark-up language format.
16. Apparatus in accordance with claim 15, wherein said information receiver is operable to receive information in hypertext mark-up language format.
17. Apparatus in accordance with claim 16, wherein said information receiver is operable to receive information in an extended mark-up language format.
18. Apparatus in accordance with claim 13, wherein said information receiver is operable to receive information defining a plurality of units of information and links therebetween; said apparatus further comprising: a selection unit operable by a user, to navigate in accordance with links between the units of information, to select units of information for processing by the speech generator.
19. Apparatus according to claim 18, wherein said selection unit comprises a user input monitor operable to generate control signals in response to user input actions; wherein: the apparatus is arranged so that: in response to a first control signal, information from a first unit of information is processed by the speech generator; in response to a second control signal, information from a second unit of information defined by a link from the first unit is processed by the speech generator; and in response to a third control signal, information from said first unit is processed by the speech generator.
20. Apparatus according to claim 19 wherein the apparatus is arranged so that in response to a fourth control signal, the speech generator is operable to cease output of information.
21. Apparatus according to claim 19, wherein the units of information are defined as parents and children.
22. Apparatus according to claim 21 wherein the second unit is a child unit of the first unit.
23. Apparatus according to claim 1, comprising a data store operable to store information for output to a user.
24. Apparatus according to claim 23, wherein the apparatus is arranged so that in response to said speech generator completing output of a portion of information, said speech generator is operable to commence output of a further portion of information.
25. A method of processing and outputting information comprising the steps of: receiving information in a form for conversion into a display signal for output; detecting in received information display configuration commands; and generating an audio speech output signal corresponding with selected portions of received information wherein said portions of information for which an audio speech output signal is generated are selected upon the basis of detected display configuration commands associated with said portions of information.
26. A method in accordance with claim 25, wherein said detection of display configuration commands comprises detecting display configuration commands being members of a set of permitted display configuration commands.
27. A method in accordance with claim 26, further comprising processing received information to generate hierarchically structured data comprising portions of information associated with detected display configuration commands, wherein the position of a portion of information in said hierarchically structured data is selected on the basis of said display configuration commands associated with said portion of information, and wherein said selection of portions of information for which an audio speech output signal is generated utilizes said generated hierarchically structured data.
28. A method in accordance with claim 26, wherein said generation of hierarchically structured data comprises generating a tree structure of parent nodes associated with portions of information and identified children nodes associated with further portions of information.
29. A method in accordance with claim 28, further comprising the step of: selecting a node in said generated hierarchically structured data, and said generation of an audio signal comprises generating an audio speech output signal corresponding to portions of information identified as children nodes of said selected node.
30. A method in accordance with claim 29, wherein said selection of a node comprises a selection based upon a comparison operation with user input and information associated with said node.
31. A method in accordance with claim 30, wherein said selection of a node comprises receipt of a control signal to select a parent node of a currently selected node.
32. A method in accordance with claim 26, further comprising the steps of: storing data identifying an abridgement level; mapping each member of said set of permitted display configuration commands to a corresponding audio output level thereby associating portions of information with audio output levels; and selecting for output portions of information associated with audio output levels relative to a threshold defined by said data identifying an abridgement level.
33. A method in accordance with claim 25, wherein said received information comprises information defining a plurality of units of information and links therebetween; said method further comprising the steps of: selecting units of information for processing in accordance with links between the units of information.
34. A method according to claim 33, wherein said selection step comprises monitoring user input and generating control signals in response to user input actions; wherein: in response to a first control signal, information from a first unit of information is processed; in response to a second control signal, information from a second unit of information defined by a link from the first unit is processed; and in response to a third control signal, information from said first unit is processed.
35. A method according to claim 34 wherein in response to a fourth control signal, output of information is ceased.
36. A method according to claim 34, wherein the units of information are defined as parents and children.
37. A method according to claim 36 wherein the second unit is a child unit of the first unit.
38. A method according to claim 28 comprising storing information for output to a user.
39. A method according to claim 28, wherein in response to completing output of a portion of information, output of a further portion of information is commenced.
40. A computer program comprising processor executable instructions operable to cause a computer to become configured as apparatus in accordance with at least one of claims 1 to 24.
41. A computer program comprising processor executable instructions for causing a computer to become operable to perform the method of at least one of claims 25 to 39.
42. A storage medium storing a computer program in accordance with claim 40 or claim 41.
43. A signal carrying computer readable instructions forming a computer program in accordance with claim 40 or claim 41.
44. Apparatus for processing and outputting information substantially as hereinbefore described with reference to the accompanying drawings.
45. A method of processing and outputting information substantially as hereinbefore described with reference to the accompanying drawings.
PCT/GB2001/001788 2000-04-19 2001-04-19 Electronic browser WO2001079986A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU93361/01A AU9336101A (en) 2000-04-19 2001-04-19 Electronic browser

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0009767.5 2000-04-19
GB0009767A GB2361556A (en) 2000-04-19 2000-04-19 Text-to-speech browser

Publications (2)

Publication Number Publication Date
WO2001079986A2 true WO2001079986A2 (en) 2001-10-25
WO2001079986A3 WO2001079986A3 (en) 2003-04-24

Family

ID=9890283

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2001/001788 WO2001079986A2 (en) 2000-04-19 2001-04-19 Electronic browser

Country Status (3)

Country Link
AU (1) AU9336101A (en)
GB (1) GB2361556A (en)
WO (1) WO2001079986A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1751980A1 (en) * 2004-04-24 2007-02-14 Electronics and Telecommunications Research Institute Apparatus and method for processing multimodal data broadcasting and system and method for receiving multimodal data broadcasting
WO2009083832A1 (en) * 2007-12-21 2009-07-09 Koninklijke Philips Electronics N.V. Device and method for converting multimedia content using a text-to-speech engine
WO2009148790A1 (en) * 2008-06-06 2009-12-10 Apple Inc. Processing a page
EP2323358A1 (en) * 2009-11-17 2011-05-18 Lg Electronics Inc. Method for outputting tts voice data in a mobile terminal and mobile terminal thereof

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
GB2378875A (en) * 2001-05-04 2003-02-19 Andrew James Marsh Annunciator for converting text messages to speech

Citations (2)

Publication number Priority date Publication date Assignee Title
EP0848373A2 (en) * 1996-12-13 1998-06-17 Siemens Corporate Research, Inc. A sytem for interactive communication
US5781886A (en) * 1995-04-20 1998-07-14 Fujitsu Limited Voice response apparatus

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JPH1078952A (en) * 1996-07-29 1998-03-24 Internatl Business Mach Corp <Ibm> Voice synthesizing method and device therefor and hypertext control method and controller
US6282511B1 (en) * 1996-12-04 2001-08-28 At&T Voiced interface with hyperlinked information
US5899975A (en) * 1997-04-03 1999-05-04 Sun Microsystems, Inc. Style sheets for speech-based presentation of web pages

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
US5781886A (en) * 1995-04-20 1998-07-14 Fujitsu Limited Voice response apparatus
EP0848373A2 (en) * 1996-12-13 1998-06-17 Siemens Corporate Research, Inc. A sytem for interactive communication

Non-Patent Citations (1)

Title
T.V. RAMAN: "Audio System for Technical Readings", Cornell University PhD dissertation, May 1994 (1994-05), XP002217804. Retrieved from the Internet: <URL:http://citeseer.nj.nec.com/rd/0%2C63309%2C1%2C0.25%2CDownload/http://citeseer.nj.nec.com/cache/papers/cs/3525/ftp:zSzzSzftp.cs.cornell.eduzSzpubzSzramanzSzaster-thesis.pdf/raman94audio.pdf> [retrieved on 2002-10-23] *

Cited By (9)

Publication number Priority date Publication date Assignee Title
EP1751980A1 (en) * 2004-04-24 2007-02-14 Electronics and Telecommunications Research Institute Apparatus and method for processing multimodal data broadcasting and system and method for receiving multimodal data broadcasting
EP1751980A4 (en) * 2004-04-24 2007-10-03 Korea Electronics Telecomm Apparatus and method for processing multimodal data broadcasting and system and method for receiving multimodal data broadcasting
WO2009083832A1 (en) * 2007-12-21 2009-07-09 Koninklijke Philips Electronics N.V. Device and method for converting multimedia content using a text-to-speech engine
WO2009148790A1 (en) * 2008-06-06 2009-12-10 Apple Inc. Processing a page
CN102112985A (en) * 2008-06-06 2011-06-29 苹果公司 Processing a page
CN102112985B (en) * 2008-06-06 2015-10-21 苹果公司 To the process of the page
US9405847B2 (en) 2008-06-06 2016-08-02 Apple Inc. Contextual grouping of a page
EP2323358A1 (en) * 2009-11-17 2011-05-18 Lg Electronics Inc. Method for outputting tts voice data in a mobile terminal and mobile terminal thereof
US8473297B2 (en) 2009-11-17 2013-06-25 Lg Electronics Inc. Mobile terminal

Also Published As

Publication number Publication date
GB0009767D0 (en) 2000-06-07
WO2001079986A3 (en) 2003-04-24
GB2361556A (en) 2001-10-24
AU9336101A (en) 2001-10-30

Similar Documents

Publication Publication Date Title
AU722611B2 (en) Serving signals
US6708153B2 (en) Voice site personality setting
US20080133702A1 (en) Data conversion server for voice browsing system
US6108629A (en) Method and apparatus for voice interaction over a network using an information flow controller
US20020161928A1 (en) Smart agent for providing network content to wireless devices
WO2004066125A2 (en) Multi-modal information retrieval system
WO2003046719A2 (en) Implementing sms-based value added service
KR20030043969A (en) Web server
KR20020075377A (en) Control codes for programmable remote supplied in xml format
US20140281920A1 (en) Web Based Communication of Information With Reconfigurable Format
WO2001079986A2 (en) Electronic browser
US7272659B2 (en) Information rewriting method, recording medium storing information rewriting program and information terminal device
US7283623B2 (en) Internet browsing using a uniform interface
WO2001073560A1 (en) Contents providing system
JPH10322478A (en) Hypertext access device in voice
GB2334648A (en) Internet access for a mobile communications device
KR20020072922A (en) Wireless internet service system using categorized service and the method thereof
CA2577236C (en) Method and apparatus for serving data to identified users
KR20060024807A (en) Mobile device for mapping sms characters to e.g. sound, vibration, or graphical effects
KR20080012473A (en) Information searching system for mobile phone and method thereof
CA2580499A1 (en) Serving signals
KR19980074304A (en) V-Cal with Internet function and its recording control method

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP