WO2001057850A2 - Robust voice browsing system offering a unified bundle of telephone and network services

Publication number
WO2001057850A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
information
service
web site
response
Prior art date
Application number
PCT/US2001/003742
Other languages
English (en)
Other versions
WO2001057850A3 (fr
Inventor
Alex Kurganov
Harold E. Poel
Valery Zhukoff
Original Assignee
Webley Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Webley Systems, Inc.
Priority to AU2001234833A
Publication of WO2001057850A2
Publication of WO2001057850A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Definitions

  • the present invention relates to a robust and highly reliable system that allows users to browse web sites and retrieve information by using conversational voice commands. Additionally, the present invention allows users to control and monitor other systems and devices that are connected to the Internet or any other network by using voice commands. Additionally, the invention relates to a personalized system for accessing information from the Internet or other information sources using speech commands.
  • the present invention relates to a method for providing integrated Internet and telecommunications services from a common provider that enables subscribers to inexpensively send, receive, and transfer telephone calls, email messages, voice mail messages, paging messages, and facsimile messages.
  • the first option is to use a desktop or a laptop computer connected to a telephone line via a modem or connected to a network with Internet access.
  • the second option is to use a Personal Digital Assistant (PDA) that has the capability of connecting to the Internet either through a modem or a wireless connection.
  • the third option is to use one of the newly designed web-phones or web-pagers that are now being offered on the market.
  • desktop computers are very large and bulky and are difficult to transport. Laptop computers solve this inconvenience, but many are still quite heavy and are inconvenient to carry. Further, laptop computers cannot be carried and used everywhere a user travels. For instance, if a user wishes to obtain information from a remote location where no electricity or communication lines are installed, it would be nearly impossible to use a laptop computer. Oftentimes, information is needed on an immediate basis where a computer is not accessible. Furthermore, the use of laptop or desktop computers to access the Internet requires either a direct or a dial-up connection to an Internet Service Provider (ISP). Oftentimes, such connections are not available when a user desires to connect to the Internet to acquire information.
  • The second option for remotely accessing web sites is the use of PDAs.
  • PDAs with the ability to connect to the Internet and access web sites are not readily available. As a result, these PDAs tend to be very expensive. Furthermore, users are usually required to pay a special service fee to enable the web browsing feature of the PDA.
  • a further disadvantage of these PDAs is that web sites must be specifically designed to allow these devices to access information on the web site. Therefore, a limited number of web sites are available that are accessible by these web-enabled PDAs.
  • a user attempting to find information using a telephone expects immediate responses to his search requests.
  • a system that introduces too much delay between the time a user makes a request and the time of response will not be tolerated by users and will lose its usefulness. Therefore, it is important that a voice browsing system that uses telephonic communications selects web sites that provide rapid responses since speed is an important factor for maintaining the system's desirability and usability. Therefore, a need exists for a system that accesses web sites based upon their speed of operation.
  • Wireline telephone services, cellular telephone service, facsimile messages, email messages, voice mail messages, pager services, and Internet access services are just some of the important methods and services widely used for business and personal communications.
  • people require the ability to send and receive messages, access information, conduct business transactions, organize daily schedules, and stay in touch with homes and offices from almost anywhere, at any time, in an easy to use and economical manner.
  • portable electronic devices such as cellular telephones, pagers, and Personal Digital Assistants (PDAs).
  • the explosive growth of Internet and related networking services demonstrates the importance of such systems to personal communications and the ability to quickly and easily access information.
  • These networks currently host a variety of services such as contact lists, scheduling and date book information, electronic mail, conferencing, electronic commerce, games, software libraries and electronic newspapers and magazines.
  • the problem of accessing and processing all of the available information from communication systems, networks and services is particularly acute for mobile business professionals.
  • the mobile professional, whether working out of the home or while on the road, may have a cellular telephone, a facsimile machine, a pager, intranet mail, Internet mail, and voice mail services. Success for this professional depends in large part on the ability to easily, quickly and inexpensively access, sort, and respond to the messages delivered to each of these communication devices and on the ability to obtain necessary information to conduct business within proliferating networks and services.
  • An additional object of an embodiment of the present invention is to provide a system and method that allows the searching and retrieving of publicly available information by controlling a web browsing server using naturally spoken voice commands.
  • the ranking order is automatically adjusted if the system detects that a given web site is not functioning, is too slow, or has been modified in such a way that the requested information cannot be retrieved any longer.
  • An additional object of an embodiment of the present invention is to provide a system and method for using voice commands to control and monitor devices connected to a network.
  • One object of a preferred embodiment of the present invention is to allow users to customize a voice browsing system.
  • a further object of a preferred embodiment is to allow users to customize the information retrieved from the Internet or other computer networks and accessed by speech commands over telephones.
  • a further object of an embodiment of the present invention is to provide a system and method for interfacing a plurality of different communication services enabling each service to transfer data and calls to another service.
  • Another object of an embodiment of the present invention is to provide a reduced cost system and method for transferring data and calls to multiple locations.
  • the present invention relates to a system for acquiring information from sources on a network, such as the Internet.
  • a voice browsing system maintains a database containing a list of information sources, such as web sites, connected to a network. Each of the information sources is assigned a rank number which is listed in the database.
  • a network interface system accesses the information source with the highest rank number in order to retrieve information requested by the user.
  • a preferred embodiment of the present invention allows users to access and browse web sites when they do not have access to computers with Internet access. This is accomplished by providing a voice browsing system and method that allows users to browse web sites using conversational voice commands spoken into any type of voice enabled device (i.e., any type of wireline or wireless telephone, IP phone, wireless PDA, or other wireless device). These spoken commands are then converted into data messages by a speech recognition software engine running on a user interface system. These data messages are then sent to and processed by a network interface system. This network interface system then generates the proper requests that are transmitted to the desired web site over the Internet. Responses sent from the web site are received and processed by the network interface system and then converted into an audio message via a speech synthesis engine or a pre-recorded audio concatenation application and finally transmitted to the user's voice enabled device.
  • a preferred embodiment of the voice browser system and method uses a web site polling and ranking methodology that allows the system to detect changes in web sites and adapt to those changes in real-time. This enables the voice browser system of a preferred embodiment to deliver highly reliable information to users over any voice enabled device.
  • This ranking system also enables the present invention to provide rapid responses to user requests. Long delays before receiving responses to requests are not tolerated by users of voice-based systems, such as telephones. When a user speaks into a telephone, an almost immediate response is expected. This expectation does not exist for non-voice communications, such as email transmissions or accessing a web site using a personal computer. In such situations, a reasonable amount of transmission delay is acceptable.
  • the ranking system implemented by a preferred embodiment of the present invention ensures users will always receive the fastest possible response to their request.
  • a second embodiment of the present invention allows users to control and monitor the operation of a variety of household devices connected to a network using speech commands spoken into a voice enabled device.
  • a third embodiment of the present invention enables a user to create a user-defined record in the database that identifies an information source, such as a web site, containing information of interest to the user.
  • This record identifies the location of the information source and also contains a recognition grammar assigned by the user.
  • Upon receiving a speech command from the user that is described with the assigned recognition grammar, a network interface system accesses the information source and retrieves the information requested by the user.
  • a customized, voice-activated information access system is provided.
  • a user creates a descriptor file defining specific information found on a web site the user would like to access in the future.
  • the user assigns a pronounceable name or identifier to the selected content and this pronounceable name is saved in a user-defined database record as a recognition grammar along with the URL of the selected web site.
  • a telephone call is placed to a media server.
  • the user provides speech commands to the media server which include the recognition grammar assigned to the desired search.
  • the media server retrieves the user-defined record from a database and passes the information to a web browsing server which retrieves the information from the associated web site.
  • the retrieved information is then transmitted to the user using a speech synthesis software engine.
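The user-defined record described above can be pictured as a per-user table that maps a pronounceable name (the recognition grammar) to the URL of the selected web site. The following is a minimal sketch under stated assumptions, not the system's actual storage scheme: the names `user_records`, `save_record`, and `lookup`, the substring matching rule, and the example URL are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the user-defined database records:
# user id -> {pronounceable name (recognition grammar) -> URL}
user_records: Dict[str, Dict[str, str]] = {}


def save_record(user_id: str, spoken_name: str, url: str) -> None:
    """Store the pronounceable name chosen by the user together with the URL."""
    user_records.setdefault(user_id, {})[spoken_name.strip().lower()] = url


def lookup(user_id: str, recognized_phrase: str) -> Optional[str]:
    """Return the URL whose assigned name occurs in the recognized phrase."""
    phrase = recognized_phrase.strip().lower()
    for name, url in user_records.get(user_id, {}).items():
        if name in phrase:  # assumption: simple substring match
            return url
    return None
```

A caller would first save a record, for example `save_record("u1", "my portfolio", "http://stocks.example/portfolio")`, and later resolve a recognized phrase such as "read my portfolio" with `lookup`.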
  • a fourth embodiment of the present invention provides a unified communications system that provides a variety of different communication services from a single service provider. These services include local telephone service, long distance, cellular telephone service, Internet access, voice mail, email, facsimile service, and paging services. Each of the different communication services are linked together by a system controller operated by a single service provider.
  • the unified system allows users to easily and economically transfer information received by one of the communication services to a second communication service.
  • the system implements speech recognition technology thereby allowing users to control all of the communication services using speech commands.
  • the communications system provides a single operating menu that allows users to control and access all of the features and services provided by the system. This operating menu may be accessed using speech commands, touch-tone commands, or via a computer.
  • FIG. 1 is a depiction of the voice browsing system of the first embodiment of the present invention
  • FIG. 2 is a block diagram of a database record used by the first preferred embodiment of the present invention
  • FIG. 3 is a block diagram of a media server used by the preferred embodiment
  • FIG. 4 is a block diagram of a web browsing server used by the preferred embodiment
  • FIG. 5 is a depiction of the device browsing system of the second embodiment of the present invention.
  • FIG. 6 depicts a personal information selection system used with a third preferred embodiment of the present invention.
  • FIG. 7 depicts a web page displayed by the clipping client of the third preferred embodiment
  • FIG. 8 is a block diagram of a user-defined database record used by the third preferred embodiment of the present invention.
  • FIG. 9 is a block diagram showing methods available for users to access the communications system of the fourth preferred embodiment.
  • FIG. 10 is a block diagram of a system controller used with the fourth preferred embodiment
  • FIG. 11 is a block diagram of the various services that may be provided by a single service provider according to the fourth preferred embodiment.
  • a first embodiment of the present invention is a system and method for allowing users to browse information sources, such as web sites, by using naturally spoken, conversational voice commands spoken into a voice enabled device. Users are not required to learn a special language or command set in order to communicate with the voice browsing system of the present invention. Common and ordinary commands and phrases are all that is required for a user to operate the voice browsing system.
  • the voice browsing system recognizes naturally spoken voice commands and is speaker-independent; it does not have to be trained to recognize the voice patterns of each individual user. Such speech recognition systems use phonemes to recognize spoken words and not predefined voice patterns.
  • the first embodiment allows users to select from various categories of information and to search those categories for desired data by using conversational voice commands.
  • the voice browsing system of the first preferred embodiment includes a user interface system referred to as a media server.
  • the media server contains a speech recognition software engine. This speech recognition engine is used to recognize natural, conversational voice commands spoken by the user and converts them into data messages based on the available recognition grammar. These data messages are then sent to a network interface system.
  • the network interface system is referred to as a web browsing server.
  • the web browsing server accesses the appropriate information source, such as a web site, to gather information requested by the user. Responses received from the information sources are then transferred to the media server where a speech synthesis engine converts the responses into audio messages that are transmitted to the user.
  • a database 2 designed by Webley Systems Incorporated is connected to one or more web browsing servers 4 as well as to one or more media servers 8.
  • the database 2 contains a separate set of records for each web site accessible by the system.
  • An example of a web site record is shown in FIG. 2.
  • Each web site record 20 contains the rank number of the web site 22, the associated Uniform Resource Locator (URL) 24, and a command that enables the appropriate "extraction agent" 26 that is required in order to generate proper requests sent to and to format data received from the web site.
  • the database record 20 also contains the timestamp 28 indicating the last time the web site was accessed.
  • the extraction agent is described in more detail below.
  • the database 2 categorizes each database record 20 according to the type of information provided by each web site.
  • a first category of database records 20 may correspond to web sites that provide "weather" information.
  • the database 2 may also contain a second category of records 20 for web sites that provide "stock" information.
  • These categories may be further divided into subcategories.
  • the "weather" category may contain subcategories depending upon the type of weather information available to a user, such as "current weather" or "extended forecast".
  • a list of web site records may be stored that provide weather information for multiple days.
  • the use of subcategories may allow the web browsing feature to provide more accurate, relevant, and up-to-date information to the user by accessing the most relevant web site.
  • the number of records contained in each category or subcategory is not limited. In the preferred embodiment, three web site records are provided for each category.
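The web site record 20 and its grouping into categories and subcategories can be sketched as a small data structure. This is an illustration only, assuming a higher rank number means a more preferred site (as the summary states); the class name `WebSiteRecord`, the field names, and the helper `records_in` are hypothetical and do not come from the original database schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class WebSiteRecord:
    """Hypothetical model of web site record 20."""
    category: str              # e.g. "weather" or "stock"
    rank: int                  # rank number 22 (higher is preferred)
    url: str                   # Uniform Resource Locator 24
    extraction_command: str    # command 26 that enables the extraction agent
    last_accessed: datetime    # timestamp 28
    subcategory: Optional[str] = None  # e.g. "current weather"


def records_in(records: List[WebSiteRecord], category: str,
               subcategory: Optional[str] = None) -> List[WebSiteRecord]:
    """Return the records of one category (and subcategory), best rank first."""
    matches = [
        r for r in records
        if r.category == category
        and (subcategory is None or r.subcategory == subcategory)
    ]
    return sorted(matches, key=lambda r: r.rank, reverse=True)
```

With three records per category, as in the preferred embodiment, `records_in(records, "weather")[0]` would be the site to try first.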
  • Table 1 below depicts two database records 20 that are used with the preferred embodiment. These records also contain a field indicating the "category" of the record, which is "weather" in each of these examples.
  • the database also contains a listing of pre-recorded audio files used to create concatenated phrases and sentences. Further, database 2 may contain customer profile information, system activity reports, and any other data or software servers necessary for the testing or administration of the voice browsing system.
  • the media servers 8 function as user interface systems.
  • the media servers 8 contain a speech recognition engine 30, a speech synthesis engine 32, an Interactive Voice Response (IVR) application 34, a call processing system 36, and telephony and voice hardware 38 required to communicate with the Public Switched Telephone Network (PSTN) 18.
  • each media server is based upon Intel's Dual Pentium III 730 MHz microprocessor system.
  • the speech recognition function is performed by a speech recognition engine 30 that converts voice commands received from the user's voice enabled device 14 (i.e., any type of wireline or wireless telephone, Internet Protocol (IP) phones, or other special wireless units) into data messages.
  • voice commands and audio messages are transmitted using the PSTN 18 and data is transmitted using the TCP/IP communications protocol.
  • Other possible transmission protocols would include SIP/VoIP (Session Initiation Protocol/Voice over IP), Asynchronous Transfer Mode (ATM) and Frame Relay.
  • A preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com). The Nuance engine capacity is measured in recognition units based on CPU type as defined in the vendor specification.
  • the natural speech recognition grammars (i.e., what a user can say that will be recognized by the speech recognition engine) were developed by Webley Systems. Table 2 below provides a partial source code listing of the recognition grammars used by the speech recognition engine of the preferred embodiment for obtaining weather information.
  • the media server 8 uses recognition results generated by the speech recognition engine 30 to retrieve a web site record 20 stored in the database 2 that can provide the information requested by the user.
  • the media server 8 processes the recognition result data, identifying keywords that are used to search the web site records 20 contained in the database 2. For instance, if the user's request was "What is the weather in Chicago?", the keywords "weather" and "Chicago" would be recognized.
  • a web site record 20 with the highest rank number from the "weather" category within the database 2 would then be selected and transmitted to the web browsing server 4 along with an identifier indicating that
  • the media servers 8 also contain a speech synthesis engine 32 that converts the data retrieved by the web browsing servers 4 into audio messages that are transmitted to the user's voice enabled device 14.
  • a preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com).
  • the web browsing servers 4 provide access to any computer network such as the Internet 12. These servers are also capable of accessing databases stored on Local Area Networks (LANs) or Wide Area Networks (WANs).
  • the web browsing servers receive responses from web sites and extract the data requested by the user. This task is also known as "content extraction.”
  • the web browsing servers 4 also perform the task of periodically polling or "pinging" various web sites and modifying the ranking numbers of these web sites depending upon their response and speed. This polling feature is further discussed below.
  • the web browsing server 4 is comprised of a content extraction agent 40, a content fetcher 42, a polling and ranking agent 44, and the content descriptor files 46. Each of these are software applications and will be discussed below.
  • Upon receiving a web site record 20 from the database 2 in response to a user request, the web browsing server 4 invokes the "content extraction agent" command 26 contained in the record 20.
  • the content extraction agent 40 allows the web browsing server 4 to properly format requests and read responses provided by the web site 16 identified in the URL field 24 of the web site record 20.
  • Each content extraction agent command 26 invokes the content extraction agent and identifies a content description file associated with the web page identified by the URL 24. This content description file then directs the extraction agent where to extract data from the accessed web page and how to format a response to the user utilizing that data. For example, the content description for a web page providing weather information would indicate where to insert the "city" name or ZIP code in order to retrieve Chicago weather information. Additionally, the content description file for each supported URL indicates the location on the web page where the response information is provided. The extraction agent 40 uses this information to properly extract from the web page the information requested by the user.
  • Table 3 below contains source code for a content extraction agent 40 used by the preferred embodiment.
  • 6--stamp -C>hLN ⁇ PdbuuE*itn/ord, itn/cb/sprint_hd
  • # check parameters
    die "Usage: $0 service [params]\n" if $#ARGV < 1;
    #print STDERR @ARGV;
  • Table 4 below contains source code of the content fetcher 42 used with the content extraction agent 40 to retrieve information from a web site.
  • service_name [service_parameters] , i.e. stock sft or weather
  • $debug = 1; use URI::URL; use LWP::UserAgent; use HTTP::Request::Common; use Vail::NarList; use Sybase::CTlib; use HTTP::Cookies;
  • %Content map ⁇ ( $Param ⁇ Output ⁇ ->[ $_ ], $values [ $_ ]
  • Table 5 below contains the content descriptor file source code for obtaining weather information from the web site www.cnn.com that is used by the extraction agent 40 of the preferred embodiment.
  • Regular_expression Author   (.+) Four Day Forecast (\S+)
  • Post-filter wind” S" South
  • Post-filter _wind”W”West
  • Post-filter _wind/mph/miles per hour/
  • Temperature is _current_temperature_F Fahrenheit, _current_temperature_C Celsium.
  • Humidity is _humidity. Wind from the wind.
  • Table 6 below contains the content descriptor file source code for obtaining weather information from the web site www.lycos.com that is used by the extraction agent 40 of the preferred embodiment.
  • Post-filter _wind/kph ! /kilometers per hour/
  • the current weather in _location is __current_weather .
  • the current temperature is _current_temperature__F Farenheit
  • Humidity is _humidity.
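The surviving fragments of Tables 5 and 6 suggest that a content descriptor combines a regular expression, per-field post-filters, and a response template with `_name` placeholders. The sketch below illustrates that idea under stated assumptions; it is not a reconstruction of the original Perl agent. The dictionary layout, the URL template, the sample page text, and the function names `build_url`, `extract`, and `render` are all hypothetical.

```python
import re
from typing import Dict, Optional
from urllib.parse import quote

# Hypothetical descriptor; the real descriptor file syntax is only partly legible.
DESCRIPTOR = {
    "url_template": "http://weather.example/forecast?city={city}",
    "pattern": (r"Temperature:\s*(?P<current_temperature_F>\d+)\s*F.*?"
                r"Humidity:\s*(?P<humidity>\d+%).*?"
                r"Wind:\s*(?P<wind>[^<\n]+)"),
    # Post-filters rewrite abbreviations so the text reads well when spoken.
    "post_filters": {"wind": [(r"\bmph\b", "miles per hour"),
                              (r"\bS\b", "South"),
                              (r"\bW\b", "West")]},
    "response_template": ("Temperature is _current_temperature_F Fahrenheit. "
                          "Humidity is _humidity. Wind is _wind."),
}


def build_url(descriptor: dict, city: str) -> str:
    """Insert the city name at the place the descriptor indicates."""
    return descriptor["url_template"].format(city=quote(city))


def extract(descriptor: dict, page: str) -> Optional[Dict[str, str]]:
    """Pull the named fields out of the page, or None if the format changed."""
    match = re.search(descriptor["pattern"], page, re.S)
    if match is None:
        return None
    fields = {name: value.strip() for name, value in match.groupdict().items()}
    for name, filters in descriptor["post_filters"].items():
        for pattern, replacement in filters:
            fields[name] = re.sub(pattern, replacement, fields[name])
    return fields


def render(descriptor: dict, fields: Dict[str, str]) -> str:
    """Fill the _name placeholders of the response template."""
    return re.sub(r"_(\w+)",
                  lambda m: fields.get(m.group(1), m.group(0)),
                  descriptor["response_template"])
```

Returning `None` when the pattern no longer matches gives the caller a simple signal that the page layout has changed, which is the condition the polling and ranking mechanism is meant to detect.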
  • Once the web browsing server 4 accesses the web site specified in the URL 24 and retrieves the requested information, the information is forwarded to the media server 8.
  • the media server uses the speech synthesis engine 32 to create an audio message that is then transmitted to the user's voice enabled device 14.
  • each web browsing server 4 is based upon Intel's Dual Pentium III 730 MHz microprocessor system. Referring to FIG. 1, the operation of the robust voice browser system will be described.
  • a user establishes a connection between his voice enabled device 14 and a media server 8. This may be done using the Public Switched Telephone Network (PSTN) 18 by calling a telephone number associated with the voice browsing system 19.
  • the IVR application plays audio messages to the user presenting a list of options, such as "stock quotes", "flight status", "yellow pages", "weather", and "news". These options are based upon the available web site categories and may be modified as desired.
  • the user selects the desired option by speaking the name of the option into the voice enabled device 14. As an example, if a user wishes to obtain restaurant information, he may speak into his telephone the phrase "yellow pages".
  • the IVR application would then ask the user what he would like to find and the user may respond by stating "restaurants".
  • the user may then be provided with further options related to searching for the desired restaurant. For instance, the user may be provided with the following restaurant options: "Mexican Restaurants", "Italian Restaurants", or "American Restaurants".
  • the user then speaks into the telephone 14 the restaurant type of interest.
  • the IVR application running on the media server 8 may also request additional information limiting the geographic scope of the restaurants to be reported to the user. For instance, the IVR application may ask the user to identify the zip code of the area where the restaurant should be located.
  • the media server 8 uses the speech recognition engine 30 to interpret the speech commands received from the user. Based upon these commands, the media server 8 retrieves the appropriate web site record 20 from the database 2. This record and any additional data, which may include other necessary parameters needed to perform the user's request, are transmitted to a web browsing server 4.
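The step from a recognized utterance to the record sent to the web browsing server can be sketched as keyword spotting followed by a highest-rank lookup. This is a simplified illustration, not the Nuance grammar mechanism: the functions `spot_keywords` and `build_request`, the dictionary-shaped records, and the single-word location matching are assumptions made for the example.

```python
import re
from typing import Iterable, List, Optional, Tuple


def spot_keywords(utterance: str, categories: Iterable[str],
                  locations: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Find the first known category and location word in the utterance."""
    categories, locations = set(categories), set(locations)
    words = re.findall(r"[a-z]+", utterance.lower())
    category = next((w for w in words if w in categories), None)
    location = next((w for w in words if w in locations), None)
    return category, location


def build_request(utterance: str, records: List[dict],
                  categories: Iterable[str],
                  locations: Iterable[str]) -> Optional[dict]:
    """Pick the highest-ranked record of the spoken category plus parameters."""
    category, location = spot_keywords(utterance, categories, locations)
    candidates = [r for r in records if r["category"] == category]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: r["rank"])
    return {"record": best, "params": {"location": location}}
```

For the request "What is the weather in Chicago?", the sketch spots "weather" and "chicago" and returns the weather record with the highest rank number; multi-word city names would need a richer matcher than this one.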
  • a firewall 6 may be provided that separates the web browsing server 4 from the database 2 and media server 8.
  • the firewall provides protection to the media server and database by preventing unauthorized access in the event the firewall for web browsing server 10 fails or is compromised. Any type of firewall protection technique commonly known to one skilled in the art could be used, including packet filter, proxy server, application gateway, or circuit-level gateway techniques.
  • the web browsing server 4 then uses the web site record and any additional data and executes the extraction agent 40 and relevant content descriptor file 46 to retrieve the requested information.
  • the information received from the responding web site 16 is then processed by the web browsing server 4 according to the content descriptor file 46 retrieved by the extraction agent.
  • This processed response is then transmitted to the media server 8 for conversion into audio messages using either the speech synthesis software 32 or selecting among a database of prerecorded voice responses contained within the database 2.
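The choice between prerecorded audio and speech synthesis can be illustrated with a small planner. The source does not state how the choice is made, so the rule used here (concatenate prerecorded files only when every part of the message is covered, otherwise synthesize the whole message) is an assumption, as are the function name `plan_audio` and the file names.

```python
from typing import Dict, List, Tuple, Union


def plan_audio(message: str,
               prerecorded: Dict[str, str]) -> Tuple[str, Union[List[str], str]]:
    """Return ("concatenate", files) if prerecorded phrases cover the message,
    otherwise ("synthesize", message) as a fallback to the synthesis engine."""
    words = message.lower().replace(".", "").replace(",", "").split()
    files: List[str] = []
    i = 0
    while i < len(words):
        # Greedy longest match of the remaining words against known phrases.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in prerecorded:
                files.append(prerecorded[phrase])
                i = j
                break
        else:
            return ("synthesize", message)
    return ("concatenate", files)
```

The greedy longest match favors recordings of whole phrases over word-by-word playback, which tends to sound more natural when such recordings exist.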
  • each web site record contains a rank number 22 as shown in FIG. 2.
  • the web site ranking method and system of the present invention provides robustness to the voice browser system and enables it to adapt to changes that may occur as web sites evolve. For instance, the information required by a web site 16 to perform a search or the format of the reported response data may change. Without the ability to adequately monitor and detect these changes, a search requested by a user may provide an incomplete response, no response, or an error. Such useless responses may result from incomplete data being provided to the web site 16 or the web browsing server 4 being unable to recognize the response data messages received from the searched web site 16.
  • This polling mechanism continually polls or "pings" each of the sites listed in the database 2.
  • a web browsing server 4 sends brief requests to each web site listed in database 2.
  • the web browsing server 4 monitors the response received from each web site and determines whether it is a complete response and whether the response is in the expected format specified by the content descriptor file 46 used by the extraction agent 40.
  • The polled web sites that provide complete responses in the format expected by the extraction agent 40 have their ranking established based on their "response time". That is, web sites with faster response times will be assigned higher rankings than those with slower response times. If the web browsing server 4 receives no response from the polled web site or if the response received is not in the expected format, then the rank of that web site is lowered.
  • A warning message or alarm may be generated for the system administrator indicating that the specified web site has been modified or is not responsive and requires further review.
  • Since the web browsing servers 4 access web sites based upon their ranking number, only those web sites that produce useful and error-free responses will be used by the voice browser system to gather information requested by the user. Further, since the ranking numbers are also based upon the speed of a web site in providing responses, only the most time efficient sites are accessed. This system assures that users will get complete, timely, and relevant responses to their requests. Without this feature, users may be provided with information that is not relevant to their request or may not get any information at all. The constant polling and re-ranking of the web sites used within each category allows the voice browser of the present invention to operate efficiently. Finally, it allows the voice browser system of the present invention to dynamically adapt to changes in the rapidly evolving web sites that exist on the Internet.
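  • The polling and re-ranking scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the preferred embodiment: the `PollResult` type, the `rerank` function, and the rule that failed sites are ranked after all healthy sites are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollResult:
    url: str
    response_time: Optional[float]  # seconds; None means the site did not respond
    format_ok: bool                 # True if the response matched the expected format


def rerank(results: list[PollResult]) -> tuple[dict[str, int], list[str]]:
    """Return (rank per URL, URLs needing administrator review). Rank 1 is best."""
    healthy = [r for r in results if r.response_time is not None and r.format_ok]
    failed = [r for r in results if r.response_time is None or not r.format_ok]
    # Faster complete responses receive higher (numerically smaller) ranks.
    healthy.sort(key=lambda r: r.response_time)
    # Assumption: unresponsive or malformed sites are lowered below every healthy site.
    ordered = healthy + failed
    ranks = {r.url: position + 1 for position, r in enumerate(ordered)}
    alarms = [r.url for r in failed]
    return ranks, alarms
```

  • In this sketch the list of alarms stands in for the warning message sent to the system administrator; how often the poll runs and how the response format is checked are left outside the example.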
  • The web sites accessible by the voice browser of the preferred embodiment may use any type of mark-up language, including Extensible Markup Language (XML), Wireless Markup Language (WML), Handheld Device Markup Language (HDML), Hyper Text Markup Language (HTML), or any variation of these languages.
  • A second embodiment of the present invention is depicted in FIG. 5.
  • This embodiment provides a system and method for controlling a variety of devices 50 connected to a network 52 by using conversational speech commands spoken into a voice enabled device 54 (i.e., wireline or wireless telephones, Internet Protocol (IP) phones, or other special wireless units).
  • The networked devices may include various household devices. For instance, voice commands may be used to control household security systems, VCRs, TVs, outdoor or indoor lighting, sprinklers, or heating and air conditioning systems.
  • Each of these devices 50 is connected to a network 52.
  • These devices 50 may contain embedded microprocessors or may be connected to other computer equipment that allow the device 50 to communicate with network 52.
  • the devices 50 appear as "web sites" connected to the network 52.
  • This allows a network interface system, such as a device browsing server 56, a database 57, and a user interface system, such as a media server 58, to operate similar to the web browsing server 4, database 2 and media server 8 described in the first preferred embodiment above.
  • a network 52 interfaces with one or more network interface systems, which are shown as device browsing servers 56 in FIG. 5.
  • The device browsing servers perform many of the same functions and operate in much the same way as the web browsing servers 4 discussed above in the first preferred embodiment.
  • the device browsing servers 56 are also connected to a database 57.
  • Database 57 lists all devices that are connected to the network 52. For each device, the database 57 contains a record similar to that shown in FIG. 2. Each record will contain at least a device identifier, which may be in the form of a URL, and a command to
  • Database 57 may also include any other data or software necessary to test and administer the device browsing system.
  • a device descriptor file contains a listing of the options and functions available for each of the devices 50 connected on the network 52. Furthermore, the device descriptor file contains the information necessary to properly communicate with the networked devices 50. Such information would include, for example, communication protocols, message formatting requirements, and required operating parameters.
  • the device browsing server 56 receives messages from the various networked devices 50, appropriately formats those messages and transmits them to one or more media servers 58 which are part of the device browsing system.
  • the user's voice enabled devices 54 can access the device browsing system by calling into a media server 58 via the Public Switched Telephone Network (PSTN) 59.
  • the device browsing server is based upon Intel's Dual Pentium III 730 MHz microprocessor system.
  • the media servers 58 act as user interface systems and perform the functions of natural speech recognition, speech synthesis, data processing, and call handling.
  • the media server 58 operates similarly to the media server 8 depicted in FIG. 3.
  • When data is received from the device browsing server 56, the media server 58 will convert the data into audio messages via a speech synthesis engine; these messages are then transmitted to the voice enabled device 54 of the user. Speech commands received from the voice enabled device 54 of the user are converted into data messages via a speech recognition engine running on the media server 58.
  • a preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com).
  • a preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com).
  • the media servers 58 of the preferred embodiment are based on Intel's Dual Pentium III 730 MHz microprocessor system. A specific example for using the system and method of this embodiment of the invention will now be given.
  • A user may call into a media server 58 by dialing a telephone number associated with an established device browsing system. Once the user is connected, the IVR application of the media server 58 will provide the user with a list of available systems that may be monitored or controlled based upon information contained in database 57.
  • The user may be provided with the option to select "Home Systems" or "Office Systems". The user may then speak the command "access home systems".
  • The media server 58 would then access the database 57 and provide the user with a listing of the home subsystems or devices 50 available on the network 52 for the user to monitor and control. For instance, the user may be given a listing of subsystems such as "Outdoor Lighting System", "Indoor Lighting System", "Security System", or "Heating and Air Conditioning System". The user may then select the indoor lighting subsystem by speaking the command "Indoor Lighting System".
  • The IVR application would then provide the user with a set of options related to the indoor lighting system.
  • The media server 58 may then provide a listing such as "Dining Room", "Living Room", "Kitchen", or "Bedroom". After selecting the desired room, the IVR application would provide the user with the options to hear the "status" of the lighting in that room or to "turn on", "turn off", or "dim" the lighting in the desired room. These commands are provided by the user by speaking the desired command into the user's voice enabled device 54. The media server 58 receives this command and translates it into a data message. This data message is then forwarded to the device browsing server 56 which routes the message to the appropriate device 50.
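  • The translation of a recognized speech command into a data message, as described above, might look like the following sketch. The JSON layout, the field names, and the `build_device_message` function are assumptions made for illustration; the description does not specify a message format.

```python
import json

# Commands offered to the caller for the lighting subsystem in the example above.
LIGHTING_COMMANDS = {"status", "turn on", "turn off", "dim"}


def build_device_message(subsystem: str, room: str, command: str) -> str:
    """Turn a recognized command into a data message (hypothetical JSON layout)."""
    normalized = " ".join(command.lower().split())
    if normalized not in LIGHTING_COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    message = {"subsystem": subsystem, "target": room, "command": normalized}
    return json.dumps(message, sort_keys=True)
```

  • A call such as `build_device_message("Indoor Lighting System", "Kitchen", "turn on")` yields the message that, in this sketch, the device browsing server would route to the appropriate device.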
  • the device browsing system 51 of this embodiment of the present invention also provides the same robustness and reliability features described in the first embodiment.
  • the device browsing system 51 has the ability to detect whether new devices have been added to the system or whether current devices are out-of-service. This robustness is achieved by periodically polling or "pinging" all devices 50 listed in database 57.
  • the device browsing server 56 periodically polls each device 50 and monitors the response. If the device browsing server 56 receives a recognized and expected response from the polled device, then the device is categorized as being recognized and in-service. However, if the device browsing server 56 does not receive a response from the polled device 50 or receives an unexpected response, then the device 50 is marked as being either new or out-of-service. A warning message or a report may then be generated for the user indicating that a new device has been detected or that an existing device is experiencing trouble.
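  • The device polling logic described above can be sketched as follows. The `DeviceStatus` names, the string comparison used to decide whether a response is "expected", and the `poll` callable are all assumptions for this example; a real system would use whatever protocol the device descriptor file specifies.

```python
from enum import Enum
from typing import Callable, Optional


class DeviceStatus(Enum):
    IN_SERVICE = "recognized and in-service"
    NEEDS_REVIEW = "new or out-of-service"


def classify(response: Optional[str], expected: str) -> DeviceStatus:
    """A recognized, expected response marks the device as in-service."""
    if response is not None and response == expected:
        return DeviceStatus.IN_SERVICE
    # No response, or an unexpected one: the device is new or out-of-service.
    return DeviceStatus.NEEDS_REVIEW


def poll_all(devices: dict[str, str],
             poll: Callable[[str], Optional[str]]) -> list[str]:
    """devices maps a device identifier to its expected response; returns warnings."""
    warnings = []
    for device_id, expected in devices.items():
        status = classify(poll(device_id), expected)
        if status is DeviceStatus.NEEDS_REVIEW:
            warnings.append(f"{device_id}: {status.value}")
    return warnings
```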
  • this embodiment allows users to remotely monitor and control any devices that are connected to a network, such as devices within a home or office. Furthermore, no special telecommunications equipment is required for users to remotely access the device browser system. Users may use any type of voice enabled device (i.e., wireline or wireless telephones, IP phones, or other wireless units) available to them. Furthermore, a user may perform these functions from anywhere without having to subscribe to additional services. Therefore, no additional expenses are incurred by the user.
  • the third preferred embodiment of the present invention enables a user to associate information of interest found on a specific information source, such as a web site, with a pronounceable name or identification word. This pronounceable name/identification word forms a recognition grammar in the preferred embodiment.
  • When the user wishes to retrieve the selected information, he may use a telephone or other voice enabled device to access a voice browser system. The user then speaks the pronounceable name or command described within the recognition grammar associated with the desired information. The voice browsing system then accesses the associated information source and returns to the user, using a voice synthesizer, the requested information.
  • a user 60 uses a computer 62 to access a network, such as a WAN, LAN, or the Internet, containing various information sources.
  • The user 60 accesses the Internet 12 and begins searching for web sites 16, which are information sources that contain information of interest to the user.
  • Once the user 60 identifies a web site 16 containing information the user would like to access using only a voice enabled device, such as a telephone, and the voice browsing system 19, the user initiates a "clipping client" engine 64 on his computer 62.
  • the clipping client 64 allows a user 60 to create a set of instructions for use by the voice browsing system 19 in order to report personalized information back to the user upon request.
  • the instruction set is created by "clipping" information from the identified web site.
  • a user 60 may be interested in weather for a specific city, such as Chicago.
  • the user 60 identifies a web site from which he would like to obtain the latest Chicago weather information.
  • the clipping client 64 is then activated by the user 60.
  • the clipping client 64 displays the selected web site in the same manner as a conventional web browser such as Microsoft's® Internet Explorer.
  • FIG. 7 depicts a sample of a web page 70 displayed by the clipping client 64.
  • The user 60 begins creation of the instruction set for retrieving information from the identified web site by selecting the uniform resource locator (URL) address 72 for the web site. In the preferred embodiment, this selection is done by highlighting and copying the URL address 72.
  • The user selects the information from the displayed web page that he would like to have retrieved when a request is made. Referring to FIG. 7, the user would select the information regarding the weather conditions in Chicago 74.
  • The web page 70 may also contain additional information such as advertisements 76 or links to other web sites 78 which are not of interest to the user.
  • The clipping client 64 allows the user to select only that portion of the web page containing information of interest to the user. Therefore, unless the advertisements 76 and links 78 displayed on the web page are of interest to the user, he would not select this information. Based on the web page information 74 selected
  • Table 7 below is an example of a content descriptor file created by the clipping client of the preferred embodiment. This content descriptor file relates to obtaining weather information from the web site www.cnn.com.
  • [Table 7, content descriptor file: two "Pre-filter" entries; their patterns are not legible in this copy.]
  • the clipping client 64 prompts the user to enter an identification word or phrase that will be associated with the identified web site and information. For example, the user could associate the phrase "Chicago weather" with the selected URL 72 and related weather information 74.
  • the identification word or phrase is stored as a personal recognition grammar that can now be recognized by a speech recognition engine of the voice browsing system 19 which will be discussed below.
  • the personal recognition grammar, URL address 72, and a command for executing a content extraction agent are stored within a database used by the voice browser system 19 which has been discussed above in relation to the first preferred embodiment.
  • the database 2 of the voice browsing system 19 contains a section that stores the personal recognition grammars and related web site information generated by the clipping client 64.
  • An example of a user-defined web site record is shown in FIG. 8.
  • Each user-defined web site record 80 contains the recognition grammar 82 assigned by the user, the associated Uniform Resource Locator (URL) 84, and a command that enables the "content extraction agent" 86 and retrieves the appropriate content descriptor file 86 required to generate proper requests to the web site and to properly format received data.
  • The content extraction agent has been described above in relation to the first preferred embodiment and Tables 3 and 4.
  • the web site record 80 also contains the timestamp 88 indicating the last time the web site was accessed.
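  • The user-defined web site record of FIG. 8 can be pictured as a small data structure. The sketch below is an assumption-laden illustration: the field names, the `find_record` helper, the example URL, and the case-insensitive phrase matching are hypothetical and stand in for the speech recognition engine and database described in the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class UserDefinedRecord:
    recognition_grammar: str                  # item 82: phrase chosen by the user
    url: str                                  # item 84: address of the information source
    extraction_command: str                   # item 86: invokes the content extraction agent
    last_accessed: Optional[datetime] = None  # item 88: timestamp of the last access


def _normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so spoken and stored phrases compare equal."""
    return " ".join(phrase.lower().split())


def find_record(records: list[UserDefinedRecord],
                spoken_phrase: str) -> Optional[UserDefinedRecord]:
    """Return the record whose grammar matches the recognized phrase, if any."""
    wanted = _normalize(spoken_phrase)
    for record in records:
        if _normalize(record.recognition_grammar) == wanted:
            return record
    return None
```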
  • When a user accesses the voice browsing system 19, he will be asked whether he would like to use his "user-defined searches." If the user answers affirmatively, the media servers 8 will retrieve from the database 2 the personal recognition grammars 82 defined by the user while using the clipping client 64.
  • Upon receiving a user-defined web site record 80 from the database 2 in response to a user request, the web browsing server 4 invokes the "content extraction agent" command 86 contained in the record 80.
  • the content extraction agent 40 retrieves the content descriptor file 46 associated with the user-defined record 80.
  • the content descriptor file 46 directs the extraction agent where to extract data from the accessed web page and how to format a response to the user utilizing that data.
  • the content descriptor file 46 for each supported URL indicates the location on the web page where the response information is provided. The extraction agent 40 uses this information to properly extract from the web page the information requested by the user.
  • The content extraction agent 40 can also parse the content of a web page in which the user-desired information has changed location or format. This is accomplished based on the characteristic that most hypertext documents include named objects like tables, buttons, and forms that contain textual content of interest to a user. When changes to a web page occur, a named object may be moved within a document, but it still exists. Therefore, the content extraction agent 40 simply searches for the relevant name of the desired object. In this way, the information requested by the user may still be found and reported regardless of changes that have occurred.
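  • The idea of locating a named object regardless of where it sits in the page can be sketched with Python's standard `html.parser` module. This is not the extraction agent of Table 3; the `NamedObjectExtractor` class, the `extract_named` function, and the choice to match on the `id` or `name` attribute are assumptions made for this example.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class NamedObjectExtractor(HTMLParser):
    """Collect the text inside the first element whose id or name matches."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.depth = 0            # greater than 0 while inside the target element
        self.done = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        else:
            attributes = dict(attrs)
            if self.target in (attributes.get("id"), attributes.get("name")):
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_named(html: str, target: str) -> str:
    parser = NamedObjectExtractor(target)
    parser.feed(html)
    return " ".join(parser.chunks)
```

  • Because the search keys on the object's name rather than its position, the same call returns the same text whether the table appears before or after an advertisement block on the page.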
  • Table 3 above contains source code for a content extraction agent 40 which may be used by the third preferred embodiment.
  • Table 4 above contains source code of the content fetcher 42 which may be used with the content extraction agent 40 to retrieve information from a web site.
  • Referring to FIG. 1, the operation of the personal voice-based information retrieval system will be described.
  • a user establishes a connection between his voice enabled device 14 and a media server 8 of the voice browsing system 19. This may be done using the Public Switched Telephone Network (PSTN) 18 by calling a telephone number associated with the voice browsing system 19.
  • the media server 8 initiates an interactive voice response (IVR) application.
  • The IVR application plays an audio message to the user presenting a list of options, which includes "perform a user-defined search."
  • The user selects the option to perform a user-defined search by speaking the name of the option into the voice enabled device 14.
  • the media server 8 then accesses the database 2 and retrieves the personal recognition grammars 82. Using the speech synthesis software engine 32, the media server 8 then asks the user, "Which of the following user-defined searches would you like to perform" and reads to the user the identification name, provided by the recognition grammar 82, of each user-defined search. The user selects the desired search by speaking the appropriate speech command or pronounceable name described within the recognition grammar 82.
  • These speech recognition grammars 82 define the speech commands or pronounceable names spoken by a user in order to perform a user-defined search. If the user has a multitude of user-defined searches, he may speak the command described within the recognition grammar 82 of the desired search at any time without waiting for the media server 8 to list all available user-defined searches. This feature is commonly referred to as a
  • the media server 8 uses the speech recognition engine 30 to interpret the speech commands received from the user. Based upon these commands, the media server 8 retrieves the appropriate user-defined web site record 80 from the database 2. This record is then transmitted to a web browsing server 4.
  • a firewall 6 may be provided that separates the web browsing server 4 from the database 2 and media server 8. The firewall provides protection to the media server and database by preventing unauthorized access in the event the firewall 10 for the web browsing server fails or is compromised. Any type of firewall protection technique commonly known to one skilled in the art could be used, including packet filter, proxy server, application gateway, or circuit-level gateway techniques.
  • the web browsing server 4 accesses the web site 16 specified by the URL 84 in the user-defined web site record 80 and retrieves the user-defined information from that site using the content extraction agent and specified content descriptor file specified in the content extraction agent command 86. Since the web browsing server 4 uses the URL and retrieves new information from the Internet each time a request is made, the requested information is always updated.
  • the content information received from the responding web site 16 is then processed by the web browsing server 4 according to the associated content descriptor file.
  • This processed response is then transmitted to the media server 8 for conversion into audio messages using either the speech synthesis engine 32 or selecting among a database of prerecorded voice responses contained within the database 2.
  • This message is then transmitted to the user's voice enabled device 14.
  • the web sites accessible by the personal information retrieval system and voice browser of the preferred embodiments may use any type of mark-up language, including Extensible Markup Language (XML), Wireless Markup Language (WML), Handheld Device Markup Language (HDML), Hyper Text Markup Language (HTML), or any variation of these languages.
  • a single communications system offers a plurality of communication services to users from a single provider. These services are required by users in order to effectively communicate with others and manage personal, as well as business, information.
  • the system provided by the fourth preferred embodiment offers to users the following services for home and business uses: local telephone service, cellular telephone service, long distance service, Internet access service, and a variety of messaging services that include voice mail, facsimile, electronic mail ("email"), and paging.
  • The system provides users with the option to obtain multiple email and voice mail accounts. Further, the system allows users to select either dial-up Internet access service or broadband Internet access service, which includes Digital Subscriber Line service (DSL) and cable modem service.
  • a communications system allowing a user to subscribe to each of these services from a single provider enables each service to be integrated together and operate seamlessly to the user.
  • This integration allows each service to easily and efficiently communicate with and transfer data to other services as will be described below.
  • A user who acquires these services from many different companies is subject to incompatibility problems resulting from the varying products used by the different companies.
  • the system provided by the fourth preferred embodiment eliminates these incompatibility problems since all services are provided by the same company. This ability to eliminate incompatibility problems can improve user interest in the "collectively bundled" group of services made available by the provider. Further, significant cost economies can be realized by a service provider with the ability to offer a "collectively bundled" group of services. These cost economies can improve customer satisfaction with the service provider.
  • Providing a bundle of services enables the service provider to lower the unit costs for each service since a user will be subscribing to several services provided by the same company.
  • Several economic advantages will also be realized by the service provider.
  • Much of the infrastructure (i.e., hardware and software) required for a service provider to provide the various services can be used for multiple services. Therefore, as the number of different services offered by a service provider increases, proportionally less needs to be spent on capital improvements.
  • the provider will be able to market a wide variety of services to users, each of which can be offered at a lower per unit cost than competitors. Further, the service provider will be able to actually increase revenues since it has the ability to provide multiple services. Referring to FIG.
  • the communications system 90 of the fourth preferred embodiment may be accessed by users via a voice enabled device 92 (i.e., any type of wireline or wireless telephone, Internet Protocol (IP) phones, or other special wireless units) or a computer 96 using a modem or other communication connection (i.e., Digital Subscriber Line connection, cable modem, LAN, or WAN).
  • a user accesses the communications system 90 via a telephone by calling a toll-free number.
  • FIG. 10 depicts a control system 100 used with the communications system 90 of the fourth preferred embodiment.
  • the control system 100 functions as a central system that monitors and controls the features, services, and functions of the communications system 90.
  • When a user accesses the control system 100, he is presented with an operating menu that enables the user to control all of the services and features of the communications system 90.
  • the control system 100 provides three methods for a user to access the operating menu and handle his communications.
  • the control system contains a speech recognition software engine 102.
  • This speech recognition engine 102 uses phonemes to recognize speech commands. Therefore, the speech recognition engine can recognize naturally spoken speech commands and is speaker-independent; it does not have to be trained to recognize the speech patterns of each individual user.
  • a preferred speech recognition engine is developed by Nuance Communications of 1380 Willow Road, Menlo Park, California 94025 (www.nuance.com).
  • The natural speech recognition grammars (i.e., what a user can say that will be recognized by the speech recognition engine) were developed by Webley Systems.
  • the control system 100 also contains a speech synthesis software engine 104 that converts text messages into audio messages that may be transmitted to a user.
  • A preferred speech synthesis engine is developed by Lernout and Hauspie Speech Products, 52 Third Avenue, Burlington, Massachusetts 01803 (www.lhsl.com).
  • the control system 100 also contains a call processing system 106 and telephony and voice hardware 108 required to interface the communications system 90 with the Public Switched Telephone Network (PSTN) 94.
  • A user may also access the communication system 90 using the touch-tone signals provided by the user's voice enabled device 92, such as a telephone.
  • the third option a user has for accessing the communication system 90 is via a computer 96.
  • A user may access and control all of the available features and services via the operating menu provided by the control system 100.
  • the user's computer 96 may be connected directly with the communication system 90 using a modem and the Public Switched Telephone Network (PSTN) 94, or via the Internet 97.
  • the communications system 90 comprises a control system 100 that is a central system connected to a variety of different communication services in a ring-type manner similar to a token ring.
  • the services connected to the control system 100 in the preferred embodiment include a local telephone access service 110, a cellular telephone service 112, a voice mail service 114, a facsimile service 116, an email service 118, an Internet access service 120, a long distance service 122, and a pager service 124.
  • These services may also be sub-divided into "business" related services or "home" related services. That is, the local telephone service may be subdivided into "home local telephone access" and "business local telephone access".
  • The local telephone access service 110 is typically provided as a landline service (i.e., it is a non-wireless service).
  • The local telephone access service 110 may also be provided using voice-over-IP technology. This allows telephone calls to be established using a data network such as the Internet.
  • the control system 100 can communicate with all of the services 110, 112, 114, 116,
  • The control system 100 provides an interface between all of the services 110, 112, 114, 116, 118, 120, 122, and 124. By communicating through the control system 100, all of the services are able to communicate with each other. This enables a call being handled by the cellular telephone service 112 to be transferred to the voice mail service 114. Further, this arrangement allows a call being handled by the local telephone access service 110 to be transferred to a user's cellular telephone, via the cellular telephone service 112, or to the voice mail service 114. This example demonstrates a unique advantage of the preferred embodiment.
  • a single voice mail service 114 can be used to record messages from callers that have called either a user's cellular telephone or wireline telephone (the telephone associated with the local telephone access service). Therefore, a user who subscribes to these three services will be provided with the advantage of being able to retrieve all voice mail messages from one location.
  • the control system 100 can transfer calls, email messages, facsimile messages, or voice mail messages between any of the services that are part of the communications system 90.
  • a user can therefore instruct the control system 100 to forward any received email messages to a local facsimile machine specified within the facsimile service 116.
  • facsimile messages received for a user may be subsequently forwarded to a user's email address maintained by the email service 118.
  • The ability of the communications system 90 to provide users with a variety of options for receiving and routing messages and calls is a beneficial feature for users. These options provide users with an added level of control over how they manage information and communicate with others.
  • the communications system 90 of this embodiment also enables users to transfer telephone calls to other locations for a reduced fee.
  • the service provider may easily monitor the number of call transfers attempted by the user.
  • the service provider may allow a user to transfer a received call to one location free of charge. Any additional transfers would be subject to a fee. For instance, a user may specify that all incoming calls to the communications system 90 for a user should be transferred to a business telephone. This transfer would be done free of charge. However, if the user decides to transfer the call to an additional device, such as a cellular telephone, the user would be charged for this additional transfer.
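  • The billing rule in this example (one free transfer per received call, a fee for each additional transfer) reduces to simple arithmetic. In the sketch below the `transfer_charges` function and the fee amount passed to it are hypothetical; the description does not state an actual fee.

```python
def transfer_charges(transfers: int, fee_per_extra: float,
                     free_transfers: int = 1) -> float:
    """Charge only for transfers beyond the free allowance (fee value is illustrative)."""
    if transfers < 0:
        raise ValueError("transfers must be non-negative")
    billable = max(0, transfers - free_transfers)
    return billable * fee_per_extra
```

  • Under these assumptions, a call transferred once to a business telephone costs nothing, while a further transfer to a cellular telephone incurs one fee.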
  • This method for transferring telephone calls would present a substantial reduction in costs for cellular telephone users who subscribe to the cellular service 112 of the communications system 90.
  • Most cellular service providers charge per minute fees for any call received on a cellular telephone.
  • all calls that are initially transferred to this cellular telephone from the control system 100 would be free. No per minute usage fees would be charged. This would present a dramatic cost reduction for most users who previously possessed cellular service from an outside provider.
  • This feature allows users to instruct the communications system 90 to forward received calls to a second telephone number if the user does not answer the call at the first designated telephone number. Users can program the communications system 90 to "follow" the user by sequentially transferring a received call to different locations, services, or communications devices until the user is contacted.
  • the user can create a list of predetermined contact numbers used by the system 90 in trying to locate the user. This list may include telephone numbers for office, home, cellular telephone, pager and other designated locations. The user may also indicate the order in which the system 90 should call each of the numbers in trying to locate the user.
  • the "Follow Me” feature also logs the originating telephone number used by the user when accessing the communications system 90 to retrieve stored messages or make a telephone call. A user can instruct the system 90 to subsequently use this number to re- contact the user when an incoming call is received for that subscriber. For example, a user may be traveling and have the communications system 90 forward all telephone calls to the hotel room where the user is staying. Further, since the communications system is accessible via the telephone, the user is able to obtain and send messages from the hotel telephone.
  • the system advises the user of the telephone number of the calling party, and/or the callers identity.
  • the system may recognize the caller's identity by comparing the telephone number of the caller with a user's contact list which is stored within a database 126 connected with the control system 100. If the user is already on the telephone when a new call is received, the system will whisper the pending inbound call information to the user, allowing the user the option to take the call, thereby putting the user's current call on hold, or direct the pending inbound call into the user's voice mail system provided by the voice mail service 114.
  • the integrated nature of the services provided by the fourth preferred embodiment allows a user to either retrieve email messages using an Internet connection to the communication system 90 or retrieve then by having then read to the user over a telephone using the speech synthesis engine 104.
  • the integrated features of the communications system enables users to immediately respond to email messages by another email message, a voice mail message, or placing a call to the originator of the email message.
  • a speech-to- text feature enables users to create email messages using only a telephone. Additionally, the speech recognition feature discussed above allows users to edit, forward, saving or deleting email messages using speech commands and a telephone.
  • the speech synthesis engine 104 may also be used by users to review by telephone facsimile messages received by the facsimile service 116. Further, the speech-to-text feature * may be used to create facsimile messages by telephone. The user may then issue speech commands, which are recognized by the speech recognition engine 102, to send, edit, forward, or delete the existing or newly created facsimile messages.
  • The communications system 90 of the fourth embodiment also contains a notification feature that enables users to be notified of messages received by the system.
  • A user can program the communications system 90 to notify the user via a pager, using the paging service 124, when an incoming message has been received. This notification can further indicate whether the incoming message is a voice mail, email, or facsimile message.
  • The communications system 90 of the preferred embodiment enables users to maintain a contact list stored in a database 126 accessible by the control system 100.
  • This contact list enables users to broadcast email, voice mail, or facsimile messages to groups of contacts.
  • This contact list can be accessed by the user at any time using either an Internet connection or a telephone connection with the communications system 90.
  • The speech recognition engine 102 described above enables users to access and edit the contact list and send messages to contacts by using simple speech commands.
  • The "collectively bundled" communications system also allows users to retrieve, on demand or at predetermined intervals, selected information from the Internet.
  • A user may establish predefined Internet searches using the Internet access service 106 of the communications system 90. The user can then specify that the Internet access service 106 perform the search using an Internet search engine (e.g., www.yahoo.com). The search can be performed upon receiving a speech command from the user, or it may be executed periodically based upon a schedule set by the user.
  • The control system 100 would then notify the user of the search results using the method specified by the user. For example, the user may elect to be notified of the search results by email, voice mail, or facsimile. Additionally, the speech synthesis engine 104 may be used to read the search results to the user over a telephone connection.
  • A user can access the communications system 90 via a computer 96.
  • The system 90 allows a user to access and play voice mail messages (using a downloadable audio player, such as RealPlayer, obtainable from www.real.com), read and send email and facsimile messages, and manage the user's contact list using a computer connection established through the Internet access service 120 or through a direct dial-up connection using the local telephone access service 110.
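
The "Follow Me" behavior described above, in which the system works through a user-ordered list of contact numbers until the user is reached, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the names `ContactNumber`, `follow_me`, and `try_transfer` are hypothetical, and the telephony layer is reduced to a callback that reports whether a transfer attempt was answered. Returning `None` so that the caller can fall back to voice mail is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ContactNumber:
    """One entry in the user's contact-number list (hypothetical type)."""
    label: str   # e.g. "office", "home", "cellular"
    number: str  # telephone number to try


def follow_me(
    contact_numbers: Iterable[ContactNumber],
    try_transfer: Callable[[ContactNumber], bool],
) -> Optional[ContactNumber]:
    """Try each number in the order the user specified.

    `try_transfer` stands in for the telephony layer (an assumption):
    it returns True when the user answers at that number.
    Returns the entry where the user was reached, or None if no
    number answered.
    """
    for contact in contact_numbers:
        if try_transfer(contact):
            return contact
    return None  # assumption: caller may then route to voice mail
```

A caller would pass the list in the order chosen by the user, for example office, then home, then cellular, and route the call to voice mail when the function returns `None`.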

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a system for acquiring information from sources on a network, such as the Internet. A voice browsing system maintains a database containing a list of information sources, such as web sites, connected to a network. Each information source is assigned a rank number. In response to a speech command received from a user, a network interface system accesses the information source with the highest rank number to retrieve the requested information. In addition, a user can create a user-defined record identifying an information source that contains information of interest to the user. This record contains a recognition grammar based on a speech command assigned by the user. A network interface system responds to a speech command described in the recognition grammar by accessing the information source and retrieving the information specified in the user-defined record. Combining a unified communication system with this system allows a single service provider to deliver a range of communication services.
PCT/US2001/003742 2000-02-04 2001-02-05 Robust voice and device browser system including unified bundle of telephone and network services WO2001057850A2 (fr)
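
The ranked lookup described in the abstract, where each information source carries a rank number and the source with the highest rank number is accessed in response to a speech command, can be sketched as follows. This is a hedged illustration only: `InformationSource`, `retrieve_for_command`, and the `fetch` callback are hypothetical names, keying the database by the recognized command is an assumption, and treating a larger rank number as the preferred source is an interpretation of "highest rank number".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InformationSource:
    """A database record for one source, e.g. a web site (hypothetical type)."""
    url: str
    rank: int  # assumption: a larger number means a more preferred source


def retrieve_for_command(
    command: str,
    database: Dict[str, List[InformationSource]],
    fetch: Callable[[InformationSource], str],
) -> str:
    """Access the highest-ranked source registered for a recognized command.

    `database` maps a recognized speech command to its candidate sources
    (an assumed layout). `fetch` stands in for the network interface
    system and returns the requested information from one source.
    """
    sources = database.get(command, [])
    if not sources:
        raise LookupError(f"no information source registered for {command!r}")
    best = max(sources, key=lambda source: source.rank)
    return fetch(best)
```

Passing `fetch` as a parameter keeps the sketch independent of any particular network library; a real system would substitute its own retrieval and extraction logic.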

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001234833A AU2001234833A1 (en) 2000-02-04 2001-02-05 Robust voice and device browser system including unified bundle of telephone and network services

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US18055800P 2000-02-04 2000-02-04
US18034400P 2000-02-04 2000-02-04
US18034500P 2000-02-04 2000-02-04
US18034300P 2000-02-04 2000-02-04
US60/180,343 2000-02-04
US60/180,558 2000-02-04
US60/180,344 2000-02-04
US60/180,345 2000-02-04
US23306800P 2000-09-15 2000-09-15
US60/233,068 2000-09-15

Publications (2)

Publication Number Publication Date
WO2001057850A2 (fr) 2001-08-09
WO2001057850A3 (fr) 2002-05-02

Family

ID=27539032

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/003742 WO2001057850A2 (fr) 2000-02-04 2001-02-05 Robust voice and device browser system including unified bundle of telephone and network services

Country Status (2)

Country Link
AU (1) AU2001234833A1 (fr)
WO (1) WO2001057850A2 (fr)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4776016A (en) * 1985-11-21 1988-10-04 Position Orientation Systems, Inc. Voice control system
US5086385A (en) * 1989-01-31 1992-02-04 Custom Command Systems Expandable home automation system
US5497373A (en) * 1994-03-22 1996-03-05 Ericsson Messaging Systems Inc. Multi-media interface
US5867494A (en) * 1996-11-18 1999-02-02 Mci Communication Corporation System, method and article of manufacture with integrated video conferencing billing in a communication system architecture
US5867495A (en) * 1996-11-18 1999-02-02 Mci Communications Corporations System, method and article of manufacture for communications utilizing calling, plans in a hybrid network
US5873080A (en) * 1996-09-20 1999-02-16 International Business Machines Corporation Using multiple search engines to search multimedia data
US5890123A (en) * 1995-06-05 1999-03-30 Lucent Technologies, Inc. System and method for voice controlled video screen display
US5915001A (en) * 1996-11-14 1999-06-22 Vois Corporation System and method for providing and using universally accessible voice and speech data files
US5974413A (en) * 1997-07-03 1999-10-26 Activeword Systems, Inc. Semantic user interface
US5999525A (en) * 1996-11-18 1999-12-07 Mci Communications Corporation Method for video telephony over a hybrid network
US6081518A (en) * 1999-06-02 2000-06-27 Anderson Consulting System, method and article of manufacture for cross-location registration in a communication system architecture
US6115742A (en) * 1996-12-11 2000-09-05 At&T Corporation Method and apparatus for secure and auditable metering over a communications network

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005109826A1 (fr) * 2004-05-04 2005-11-17 Qualcomm Incorporated Method and apparatus for ranking of media services and program packages
KR100900008B1 (ko) * 2004-05-04 2009-05-29 Qualcomm Incorporated Method and apparatus for ranking of media services and program packages
US7830833B2 (en) 2004-05-04 2010-11-09 Qualcomm Incorporated Method and apparatus for ranking of media services and program packages
WO2016174585A1 (fr) * 2015-04-27 2016-11-03 Toonimo Inc. Content adapted multimedia guidance
US10564991B2 (en) 2015-04-27 2020-02-18 Toonimo Inc. Content adapted multimedia guidance

Also Published As

Publication number Publication date
AU2001234833A1 (en) 2001-08-14
WO2001057850A3 (fr) 2002-05-02

Similar Documents

Publication Publication Date Title
US10096320B1 (en) Acquiring information from sources responsive to naturally-spoken-speech commands provided by a voice-enabled device
US10320981B2 (en) Personal voice-based information retrieval system
US7412260B2 (en) Routing call failures in a location-based services system
US6728731B2 (en) Method and apparatus for accessing targeted, personalized voice/audio web content through wireless devices
US6885734B1 (en) System and method for the creation and automatic deployment of personalized, dynamic and interactive inbound and outbound voice services, with real-time interactive voice database queries
US7522711B1 (en) Delivery of audio driving directions via a telephone interface
US7286990B1 (en) Universal interface for voice activated access to multiple information providers
US20020164000A1 (en) System for and method of creating and browsing a voice web
US6983250B2 (en) Method and system for enabling a user to obtain information from a text-based web site in audio form
WO2001057850A2 (fr) Robust voice and device browser system including unified bundle of telephone and network services
WO2001076212A1 (fr) Universal interface for voice activated access to multiple information providers

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP