EP1606748A2 - Multimodal warehouse applications - Google Patents
- Publication number
- EP1606748A2 (application EP04720473A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- input
- information
- voice
- modality
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/087—Inventory or stock management, e.g. order filling, procurement or balancing against orders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
Definitions
- the updated inventory data may include a revision of the listing, based on the job data and reflecting an action of the worker in performing the task.
- a first input modality of the plurality of input modalities may be associated with an auto-identification signal for identifying a distributed, selected, or counted item associated with the task.
- a worker may carry a container that is equipped with a reader for reading the auto-identification signal.
- the retail data may include product information related to quantity, location, or description information associated with a product for sale.
- the customer request may include a request for the product information, and the sales information may include the product information.
- the sales information also may include directions to a route for accessing products for sale.
- the retail data may include customer information related to a purchase history associated with an identified customer, and the sales information may include suggested purchases generated by the server based on the purchase history.
- the sales information may include pricing information, and the second field is operable to receive financial information for completing a purchase of the product for sale.
- the selected input modality may be associated with an auto-identification signal.
- FIG. 21 is a block diagram of a format for entering a web site address.
- FIG. 22 is a flow chart of a process for searching for one or more matches to a search string.
- FIG. 23 is a block diagram of a system for performing one or more of the described processes.
- FIG. 24 is a block diagram of a structure for implementing a two-level, dynamic grammar.
- FIG. 25 is a web page for entering information about a user.
- FIG. 28 is an example of a limited implementation of the system of FIG. 14.
- FIG. 29 is an example of a process for using the system of FIG. 15.
- FIGS. 38B-F are screenshots illustrating an item-picking process.
- FIG. 39 illustrates a portable digital assistant ("PDA") for use in the system of FIG. 35.
- FIGS. 40A-B are block diagrams illustrating item-moving techniques.
- FIG. 41 is a flow chart illustrating a process for stocking an item.
- FIG. 42 is a flow chart illustrating a process for taking an inventory of an item.
- FIG. 43 is a block diagram of a multimodal sales system.
- FIG. 44 is a flow chart of a process to access product information.
- FIG. 45 is a flow chart of a process to purchase a product.
- FIGS. 46A-J are screenshots of an implementation of the process of FIG. 45.
DETAILED DESCRIPTION
- An “interface” refers to a component that either accepts input from a user or provides output to a user. Examples include a display, a printer, a speaker, a microphone, a touch screen, a mouse, a roller ball, a joystick, a keyboard, a temperature sensor, a light sensor, a light, a heater, an air quality sensor such as a smoke detector, and a pressure sensor.
- a component may be, for example, hardware, software, or a combination of the two.
- a modality gateway or a modality interface refers to a gateway (or interface) that is particularly adapted for a specific mode, or modes, of input and/or output.
- a browser is a modality gateway in which the modality includes predominantly manual modes of input (keyboard, mouse, stylus), visual modes of output (display), and possibly aural modes of output (speaker).
- multiple modes may be represented in a given modality gateway.
- because a system may include several different modality gateways and interfaces, such gateways and interfaces are referred to as, for example, a first-modality gateway, a first-modality interface, a second-modality gateway, and a second-modality interface.
- a system 200 is one example of an implementation of the system 100.
- the control unit 140 is implemented with a web server 240 that includes a built-in synchronization controller.
- the device 160 is implemented by a device 260 that may be, for example, a computer or a mobile device.
- the first gateway 165 and the first interface 170 are implemented by a browser 265 and a browser interface 270, respectively, of the device 260.
- the second gateway 185 and the second interface 175 are implemented by a voice gateway 285 and a voice interface 275, respectively.
- a publish/subscribe system 250 is analogous to the publish/subscribe system 150.
- Connections 230, 280, 290, 294, and 296 are analogous to the connections 130, 180, 190, 194, and 196.
- the voice interface 275 may include, for example, a microphone and a speaker.
- the voice interface 275 may be used to send voice commands to, and receive voice prompts from, the voice gateway 285 over the connection 290.
- the commands and prompts may be transmitted over the connection 290 using, for example, voice telephony services over an Internet protocol ("IP") connection (referred to as voice over IP, or "VoIP").
- IP Internet protocol
- VoIP voice over IP
- the voice gateway 285 may perform the voice recognition function for incoming voice data.
- the voice gateway 285 also may receive from the web server 240 VXML pages that include dialogue entries for interacting with the user using voice.
- the voice gateway 285 may correlate recognized words received from the user with the dialogue entries to determine how to respond to the user's input. Possible responses may include prompting the user for additional input or executing a command based on the user's input.
- the publish/subscribe system 250 may function, for example, as a router for subscribed entities. For example, if the gateways 265 and 285 are subscribed, then the publish/subscribe system 250 may route messages from the web server 240 to the gateways 265 and 285.
- FIGS. 3-6 depict examples of processes that may be performed using the system 200. Four such processes are described, all dealing with synchronizing two gateways after a user has navigated to a new page using one of the two gateways. The four processes are server push, browser pull, voice-interrupt listener, and no-input tag. Referring to FIG.
- a process 300 for use with the system 200 includes having the browser 265 subscribe to the publish/subscribe system 250 (310). Subscription may be facilitated by having the web server 240 insert a function call into a HTML page. When the browser 265 receives and loads the page, the function call is executed and posts a subscription to the publish/subscribe system 250. The subscription includes a call-back pointer or reference that is inserted into the subscription so that, upon receiving a published message, the publish/subscribe system 250 can provide the message to the browser 265. After subscribing, the browser 265 then listens to the publish/subscribe system 250 for any messages. In one implementation, the browser 265 uses multiple frames including a content frame, a receive frame, and a send frame. The send frame is used to subscribe; the receive frame is used to listen; and the content frame is the only frame that displays content. Subscription (310) may be delayed in the process 300, but occurs before the browser 265 receives a message (see 350).
- the process 300 includes having the voice gateway 285 request a VXML page
- a page may be, for example, a content page or a server page.
- a content page includes a web page, which is what a user commonly sees or hears when browsing the web.
- Web pages include, for example, HTML and VXML pages.
- a server page includes a programming page such as, for example, a Java Server Page ("JSP").
- JSP Java Server Page
- a server page also may include content.
- the web server 240 need not designate the particular browser 265 for which the message is intended (by, for example, specifying an IP address and a port number). Rather, the web server 240 sends a message configured for a specific "topic" (usually a string parameter). All subscribers to that topic receive the message when the message is published by the web server 240 using the publish/subscribe system 250.
- an HTML page corresponds to a VXML page if the HTML page and the VXML page allow the user to interface with the same information.
- An item may correspond to itself if two gateways can use the item to allow a user to interface with information in the item using different modalities.
- the process 300 includes having the publish/subscribe system 250 receive the message from the web server 240 and send the message to the browser 265 (350).
- the publish/subscribe system 250 may use another HTTP post message to send the message to all subscribers of the specified topic.
- the publish/subscribe system 250 may use a call-back pointer or reference that may have been inserted into the subscription from the browser 265.
- the process 300 includes having the browser 265 receive the message (360).
- the browser 265 is assumed to be in a streaming HTTP mode, meaning that the HTTP connection is kept open between the browser 265 and the publish/subscribe system 250. Because the browser 265 is subscribed, a HTTP connection is also kept open between the publish/subscribe system 250 and the web server 240.
- the web server 240 repeatedly instructs the browser 265, through the publish/subscribe system 250, to "keep alive" and to continue to display the current HTML page. These "keep alive" communications are received by the receive frame of the browser 265 in an interrupt fashion.
- the browser 265 receives the message in the browser receive frame and executes the embedded JavaScript command.
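The server-push flow of process 300 can be illustrated with a minimal in-memory sketch. It is not the patent's implementation: the class names, the topic string "session-42", the message fields "command" and "url", and the page path are all assumptions made for illustration, and a plain Python callback stands in for the call-back reference and the streaming HTTP connection.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class PublishSubscribeSystem:
    """Toy in-memory stand-in for the publish/subscribe system 250."""

    def __init__(self) -> None:
        # topic string -> call-back references left by subscribers
        self._callbacks: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._callbacks[topic].append(callback)

    def publish(self, topic: str, message: dict) -> int:
        # Route the message to every subscriber of the topic; the publisher
        # never names an individual recipient.
        subscribers = self._callbacks[topic]
        for callback in subscribers:
            callback(message)
        return len(subscribers)


class Browser:
    """Hypothetical browser reduced to a content frame and a receive frame."""

    def __init__(self) -> None:
        self.content_url: Optional[str] = None

    def on_message(self, message: dict) -> None:
        # Receive frame: act on "load" messages and ignore keep-alives.
        if message.get("command") == "load":
            self.content_url = message["url"]


pubsub = PublishSubscribeSystem()
browser = Browser()
pubsub.subscribe("session-42", browser.on_message)  # subscription (310)

# The server publishes to the topic; the system routes the message (350)
# and the browser receives and acts on it (360).
delivered = pubsub.publish("session-42", {"command": "load", "url": "/orders.html"})
pubsub.publish("session-42", {"command": "keep-alive"})
print(delivered, browser.content_url)
```

Because the server addresses a topic rather than a browser, a second subscriber to the same topic would receive the same message without any change on the server side.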
- a process 400 for use with the system 200 includes having the voice gateway 285 request a VXML page
- the web server 240 may delay sending the VXML page until later in the process 400 in order, for example, to better time the arrival of the requested VXML page at the voice gateway 285 with the arrival of the corresponding HTML page at the browser 265.
- the process 400 includes having the web server 240 note that the state of the voice gateway 285 has changed and determine the corresponding page that the browser 265 needs in order to remain synchronized (430).
- the web server 240 thus tracks the state of the gateways 265 and 285.
- the process 400 includes having the browser 265 send a request to the web server 240 for any updates (440).
- the requests are refresh requests or requests for updates, and the browser 265 sends the requests on a recurring basis from a send frame using a HTTP get message.
- the process 400 includes having the web server 240 send a response to update the browser 265 (450).
- the web server 240 responds to the refresh requests by sending a reply message to the browser receive frame to indicate "no change."
- the web server 240 embeds a JavaScript command in the refresh reply to the browser 265 that, upon execution by the browser 265, results in the browser 265 coming to a synchronized state.
- the JavaScript command, for example, instructs the browser 265 to load a new HTML page.
- the process 400 includes having the browser 265 receive the response and execute the embedded command (460). Upon executing the embedded command, the browser 265 content frame is updated with the corresponding HTML page. The command provides the URL of the corresponding page.
- the web server 240 sends a standard response to indicate "no changes" and to instruct the browser 265 to reload the current HTML page from the web server 240. However, the web server 240 also embeds a command in the current HTML page on the web server
- the browser 265 will execute the embedded command and update the HTML page.
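The browser-pull flow of process 400 might be sketched as follows. The sketch assumes a hypothetical correspondence table from VXML URLs to HTML URLs and models the embedded JavaScript command as a plain "load_url" field; the class and method names are illustrative only.

```python
from typing import Dict, Optional


class WebServer:
    """Toy stand-in for the web server 240, which tracks gateway state."""

    def __init__(self, html_for_vxml: Dict[str, str]) -> None:
        self._html_for_vxml = html_for_vxml       # hypothetical correspondence table
        self._pending_html: Optional[str] = None  # page the browser still needs

    def serve_vxml(self, vxml_url: str) -> None:
        # Step 430: note the voice gateway's new state and determine the
        # HTML page the browser needs in order to remain synchronized.
        self._pending_html = self._html_for_vxml.get(vxml_url)

    def handle_refresh(self) -> dict:
        # Step 450: reply "no change", or hand back a command to execute.
        if self._pending_html is None:
            return {"status": "no change"}
        url, self._pending_html = self._pending_html, None
        return {"status": "update", "load_url": url}


class Browser:
    def __init__(self, server: WebServer, start_url: str) -> None:
        self.server = server
        self.content_url = start_url

    def poll(self) -> None:
        # Steps 440 and 460: recurring refresh request, then execution of
        # the embedded command, which updates the content frame.
        reply = self.server.handle_refresh()
        if reply["status"] == "update":
            self.content_url = reply["load_url"]


server = WebServer({"/stock.vxml": "/stock.html"})
browser = Browser(server, "/home.html")

browser.poll()                    # nothing has changed yet
before = browser.content_url
server.serve_vxml("/stock.vxml")  # the user navigated by voice
browser.poll()                    # the next refresh picks up the change
print(before, "->", browser.content_url)
```

The browser lags the voice gateway by at most one polling interval, which is the trade-off against the open connection that server push requires.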
- a process 500 for use with the system 200 includes having the voice gateway 285 subscribe to the publish/subscribe system 250 (510).
- a function call may be embedded in a VXML page received from the web server 240, and the function call may be executed by the voice gateway 285 to subscribe to the publish/subscribe system 250.
- the voice gateway 285 may be referred to as a voice-interrupt listener.
- the voice gateway 285 can subscribe at various points in time, such as, for example, when the voice gateway 285 is launched or upon receipt of a VXML page. In contrast to a browser, the voice gateway does not use frames. Subscription (510) may be delayed in the process 500, but occurs before the voice gateway 285 receives a message (see 550).
- the process 500 includes having the browser 265 request from the web server 240 a HTML page (520) and having the web server 240 send to the browser 265 the requested HTML page (530). This may be initiated, for example, by a user selecting a new URL from a "favorites" pull-down menu on the browser 265.
- the web server 240 may delay sending the requested HTML page (530) until later in the process 500 in order, for example, to better time the arrival of the requested HTML page at the browser 265 with the arrival of the corresponding VXML page at the voice gateway 285.
- the process 500 includes having the web server 240 send a message to the publish/subscribe system 250 to indicate a corresponding VXML page (540).
- the web server 240 sends a HTTP post message to the publish/subscribe system 250, and this message includes a topic to which the voice gateway 285 is subscribed.
- the web server 240 also embeds parameters, as opposed to embedding a JavaScript command, into the message.
- the embedded parameters indicate the corresponding VXML page.
- the process 500 includes having the publish/subscribe system 250 send the message to the voice gateway 285 (550).
- the publish/subscribe system 250 may simply reroute the message to the subscribed voice gateway 285 using another HTTP post message.
- the process 500 also includes having the voice gateway 285 receive the message (560).
- the voice gateway 285 is assumed to be in a streaming HTTP mode, listening for messages and receiving recurring "keep alive" messages from the publish/subscribe system 250.
- the voice gateway 285 analyzes the embedded parameters and executes a command based on the parameters.
- the command may be, for example, a request for the corresponding VXML page from the web server 240.
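In process 500 the message carries parameters rather than an executable command, and the voice gateway decides what to do with them. A minimal sketch of that distinction follows; the parameter names "action" and "page", the URL-encoded body format, and the topic string are assumptions, since the source does not specify how the parameters are encoded.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlencode


def build_sync_message(topic: str, vxml_url: str) -> dict:
    # Server side (540): embed parameters identifying the corresponding
    # VXML page, instead of embedding a script for the receiver to run.
    return {"topic": topic, "body": urlencode({"action": "load", "page": vxml_url})}


def command_from_message(message: dict) -> Optional[Tuple[str, str]]:
    """Gateway side (560): analyze the parameters and derive a command."""
    params = parse_qs(message["body"])
    if params.get("action") == ["load"] and "page" in params:
        # The derived command is a request for the corresponding page.
        return ("GET", params["page"][0])
    return None  # e.g. a keep-alive that carries no parameters


msg = build_sync_message("voice-session-7", "/picking/task3.vxml")
print(command_from_message(msg))
print(command_from_message({"topic": "voice-session-7", "body": ""}))
```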
- a process 600 for use with the system 200 includes having the web server 240 send to the voice gateway 285 a VXML page with a no-input tag embedded (610). Every VXML page may have a no-input markup tag (<noinput>) that specifies code on the voice gateway 285 to run if the voice gateway 285 does not receive any user input for a specified amount of time.
- the URL of a JSP (Java Server Page) is embedded in the code, and the code tells the voice gateway 285 to issue a HTTP get command to retrieve the JSP.
- the same no-input tag is embedded in every VXML page sent to the voice gateway 285 and, accordingly, the no-input tag specifies the same JSP each time.
- the process 600 includes having the browser 265 request a HTML page (620), having the web server 240 send the requested HTML page to the browser 265 (630), and having the web server 240 note the state change and determine a corresponding VXML page (640).
- the web server 240 updates the contents of the JSP, or the contents of a page pointed to by the JSP, with information about the corresponding VXML page. Such information may include, for example, a URL of the corresponding VXML page.
- the web server 240 may delay sending the requested HTML page (630) until later in the process 600 in order, for example, to better time the arrival of the requested HTML page at the browser 265 with the arrival of the corresponding VXML page at the voice gateway 285.
- the process 600 includes having the voice gateway 285 wait the specified amount of time and send a request for an update (650). After the specified amount of time, as determined by the code on the voice gateway 285, has elapsed, the voice gateway 285 issues a HTTP get command for the JSP. When no user input is received for the specified amount of time, the user may have entered input using a non-voice mode and, as a result, the voice gateway 285 may need to be synchronized.
- the process 600 includes having the web server 240 receive the update request and send the corresponding VXML page to the voice gateway 285 (660).
- the JSP contains an identifier of the corresponding VXML page, with the identifier being, for example, a URL or another type of pointer.
- the web server 240 issues a HTTP post message to the voice gateway 285 with the VXML page corresponding to the current HTML page.
- the process 600 includes having the voice gateway 285 receive the corresponding VXML page (670).
- once the voice gateway 285 receives and loads the corresponding VXML page, and the browser 265 receives and loads the HTML page (see 630), the two gateways 265 and 285 are synchronized. It is possible, however, that the two gateways 265 and 285 were never unsynchronized because the user did not enter a browser input, in which case the voice gateway 285 simply reloads the current VXML page after no voice input was received during the specified amount of waiting time.
- the process 600 has an inherent delay because the process waits for the voice gateway 285 to ask for an update. It is possible, therefore, that the voice gateway 285 will be out of synchronization for a period of time on the order of the predetermined delay.
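The timing behavior of process 600 can be made concrete with a small simulation. This is a sketch under stated assumptions: time is passed in explicitly instead of read from a clock, the JSP is modelled as a mutable dictionary that the server keeps current, and the 5-second timeout and page names are invented for the example.

```python
from typing import Callable


class NoInputVoiceGateway:
    """Toy voice gateway that requests a fixed update page after a quiet period."""

    def __init__(self, timeout_s: float, start_page: str) -> None:
        self.timeout_s = timeout_s
        self.page = start_page
        self._last_input_at = 0.0

    def on_voice_input(self, now: float) -> None:
        self._last_input_at = now  # any speech restarts the quiet period

    def tick(self, now: float, fetch_update: Callable[[], str]) -> bool:
        # Step 650: only after the quiet period has elapsed, ask for an update.
        if now - self._last_input_at < self.timeout_s:
            return False
        self.page = fetch_update()  # steps 660 and 670
        self._last_input_at = now
        return True


# Step 640: the server records the corresponding VXML page behind the "JSP".
jsp = {"vxml": "/menu.vxml"}
gateway = NoInputVoiceGateway(timeout_s=5.0, start_page="/menu.vxml")

jsp["vxml"] = "/count.vxml"                     # the browser navigated at t = 1 s
early = gateway.tick(3.0, lambda: jsp["vxml"])  # still inside the quiet period
stale_page = gateway.page
late = gateway.tick(5.0, lambda: jsp["vxml"])   # timeout reached, resynchronize
print(early, stale_page, late, gateway.page)
```

Between t = 1 s and t = 5 s the gateway still holds the old page, which is the window of desynchronization described above; shortening the timeout narrows the window at the cost of more frequent requests.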
- a voice input received while the voice gateway 285 is out of synchronization can be handled in several ways. Initially, if the context of the input indicates that the gateways 265 and 285 are out of synchronization, then the voice input may be ignored by the voice gateway 285. For example, if a user clicks on a link and then speaks a command for a dialogue that would correspond to the new page, the voice gateway 285 will not have the correct dialogue.
- the web server 240 may determine that the gateways 265 and 285 are not in synchronization and may award priority to either gateway. Priority may be awarded, for example, on a first-input basis or priority may be given to one gateway as a default.
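The two priority policies mentioned above (first input wins, or one gateway wins by default) could be expressed as a single selection function. The event tuple layout and the gateway names are hypothetical; the source does not define a data format for conflicting inputs.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str, str]  # (timestamp, gateway name, requested page)


def pick_winner(events: List[Event], default_gateway: Optional[str] = None) -> Optional[Event]:
    """Choose which of several conflicting inputs the server should honor."""
    if not events:
        return None
    if default_gateway is not None:
        preferred = [e for e in events if e[1] == default_gateway]
        if preferred:
            return min(preferred)  # earliest input from the default gateway
    return min(events)             # first-input basis: earliest timestamp wins


conflict = [(10.4, "voice", "/pick.vxml"), (10.1, "browser", "/stock.html")]
print(pick_winner(conflict))
print(pick_winner(conflict, default_gateway="voice"))
```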
- a system 700 includes a web server 710 communicating with a synchronization controller 720 on a device 730.
- the device 730 also includes a browser 735 in communication with the browser interface 270, and a voice mode system 740 in communication with the voice interface 275.
- the web server 710 may be, for example, a standard web server providing HTML and VXML pages over a HTTP connection.
- the device 730 may be, for example, a computer, a portable personal digital assistant ("PDA"), or other electronic device for communicating with the Internet.
- the device 730 is a portable device that allows a user to use either browser or voice input and output to communicate with the Internet.
- the web server 710 does not need to be redesigned because all of the synchronization and communication is handled by the synchronization controller 720.
- the voice mode system 740 stores VXML pages that are of interest to a user and allows a user to interface with these VXML pages using voice input and output.
- VXML pages can be updated or changed as desired and in a variety of ways, such as, for example, by downloading the VXML pages from the WWW during off-peak hours.
- the voice mode system 740 is a voice gateway, but is referred to as a voice mode system to note that it is a modified voice gateway.
- the voice mode system 740 performs voice recognition of user voice input and renders output in a simulated voice using the voice interface 275.
- the synchronization controller 720 also performs synchronization between the browser and voice modes. Referring to FIGS. 8 and 9, two processes are described for synchronizing the browser 735 and the voice mode system 740, or alternatively, the browser interface 270 and the voice interface 275.
- a process 800 includes having the synchronization controller 720 receive a browser request for a new HTML page (810).
- the browser 735 may be designed to send requests to the synchronization controller 720, or the browser 735 may send the requests to the web server 710 and the synchronization controller 720 may intercept the browser requests.
- the process 800 includes having the synchronization controller 720 determine a VXML page that corresponds to the requested HTML page (820).
- the HTML data also includes the URL for the corresponding VXML page.
- the browser 735 sends both the URL for the requested HTML page and the URL for the corresponding VXML page to the synchronization controller 720.
- the synchronization controller 720 determines the corresponding VXML page simply by receiving from the browser 735 the URL for the corresponding VXML page.
- the synchronization controller 720 also may determine the corresponding page by, for example, performing a table look-up, accessing a database, applying a translation between HTML URLs and VXML URLs, or requesting information from the web server 710.
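Two of the listed strategies, a table look-up and a translation between HTML URLs and VXML URLs, can be combined in one helper. The translation rule used here (swap a trailing ".html" for ".vxml") and the example URLs are assumptions for illustration; the source does not say what translation is applied.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def corresponding_vxml(html_url: str, table: Dict[str, str]) -> Optional[str]:
    """Return the VXML URL for an HTML URL, or None if it cannot be derived."""
    if html_url in table:                 # strategy 1: table look-up
        return table[html_url]
    parts = urlsplit(html_url)
    if parts.path.endswith(".html"):      # strategy 2: assumed translation rule
        vxml_path = parts.path[: -len(".html")] + ".vxml"
        return urlunsplit(parts._replace(path=vxml_path))
    return None                           # a caller could fall back to asking the web server


table = {"http://example.com/index.html": "http://example.com/voice/main.vxml"}
print(corresponding_vxml("http://example.com/index.html", table))
print(corresponding_vxml("http://example.com/bin/7.html?aisle=3", table))
print(corresponding_vxml("http://example.com/report.pdf", table))
```

Checking the table first lets explicit entries override the generic rule for pages whose voice counterpart lives at an unrelated address.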
- the process 800 includes having the synchronization controller 720 pass the identifier of the corresponding VXML page to the voice mode system 740 (830).
- the identifier may be, for example, a URL.
- the voice mode system 740 may intercept browser requests for new HTML pages, or the browser 735 may send the requests to the voice mode system 740. In both cases, the voice mode system 740 may determine the corresponding VXML page instead of having the synchronization controller 720 determine the corresponding page (820) and send an identifier (830).
- the process 800 includes having the synchronization controller 720 pass the browser's HTML page request on to the server 710 (840).
- the synchronization controller 720 may, for example, use a HTTP request. In implementations in which the synchronization controller 720 intercepts the browser's request, passing of the request
- the process 900 includes having the synchronization controller 720 request the corresponding HTML page from the web server 710 (950) and having the browser receive the corresponding HTML page (960).
- the synchronization controller 720 may use, for example, a HTTP get command.
- a system 1000 includes having a web server 1010 communicate with both a synchronization controller 1020 and a voice gateway 1025.
- the synchronization controller 1020 further communicates with both the voice gateway 1025 and several components on a device 1030.
- the device 1030 includes the browser interface 270, a browser 1040, and the voice interface 275.
- the browser 1040 communicates with the browser interface 270 and the synchronization controller 1020.
- the voice interface 275 communicates with the synchronization controller 1020.
- the web server 1010 is capable of delivering HTML and VXML pages.
- the device 1030 may be, for example, a computer or a portable PDA that is equipped for two modes of interfacing to the WWW.
- the system 1000 allows the two modes to be synchronized, and the system 1000 does not require the web server 1010 to be enhanced or redesigned because the synchronization controller 1020 is independent and separate from the web server 1010.
- Referring to FIGS. 11 and 12, two processes are described for synchronizing the browser 1040 and the voice gateway 1025, or alternatively, the browser interface 270 and the voice interface 275. Both processes assume that the user input is a request for a new page, although other inputs may be used. Referring to FIG.
- a process 1100 includes having the synchronization controller 1020 receive a browser request for a new HTML page (1110). The process 1100 also includes having the synchronization controller 1020 pass the HTML request on to the web server 1010 (1120) and determine the corresponding VXML page (1130). These three operations 1110-1130 are substantially similar to the operations 810, 840, and 820, respectively, except for the location of the synchronization controller (compare 720 with 1020). The synchronization controller 1020 may delay sending the browser request to the web server 1010 (1120) until later in the process 1100 in order, for example, to better time the arrival of the requested HTML page at the browser 1040 with the arrival of the corresponding VXML page at the synchronization controller 1020 (see 1150).
- the process 1100 includes having the synchronization controller 1020 request the corresponding VXML page through the voice gateway 1025 (1140).
- the synchronization controller 1020 may request the page in various ways. For example, the synchronization controller 1020 may send a simulated voice request to the voice gateway 1025, or may send a command to the voice gateway 1025.
- the process 1100 includes having the synchronization controller 1020 receive the corresponding VXML page (1150).
- the voice gateway 1025 receives the requested VXML page and sends the VXML page to the synchronization controller 1020.
- the synchronization controller 1020 does not receive the VXML page, and the voice gateway 1025 does the voice recognition and interfacing with the user with the synchronization controller 1020 acting as a conduit.
- a process 1200 includes having the synchronization controller 1020 receive a voice input from the voice interface 275 requesting a new VXML page (1210).
- the process 1200 includes having the synchronization controller (i) parse the voice input and pass the request for a new VXML page along to the voice gateway 1025 (1220), and (ii) determine the corresponding HTML page (1230).
- the synchronization controller 1020 has access to and stores the current VXML page, which allows the synchronization controller 1020 to parse the voice input.
- having the current VXML page also may allow the synchronization controller 1020 to determine the corresponding HTML page for "voice click" events. If the user's input is not the voice equivalent of clicking on a link, but is, for example, a spoken URL, then by having the capability to do the voice recognition, the synchronization controller may be able to parse the URL and request that the server provide the URL for the corresponding HTML page.
- the process 1200 includes having the synchronization controller 1020 request the corresponding HTML page from the server (1240), and having the browser receive the requested HTML page (1250). In another implementation, the synchronization controller 1020 does not determine the corresponding page, but requests that the web server 1010 determine the corresponding page and send the corresponding page.
- the synchronization controller 1020 does not parse the voice input, but merely passes the VoIP request along to the voice gateway 1025. If the voice input is a request for a VXML page, the voice gateway 1025 determines the corresponding HTML page and provides the synchronization controller 1020 with a URL for the HTML page.
- a device 1300 includes a synchronization controller interface
- the device 1300 is similar to the device 1030 except that the functionality allowing the browser 1040 and the voice interface 275 to communicate with the synchronization controller 1020 is separated as the synchronization controller interface 1310.
- the device 1300 is a mobile device. Such a mobile device is smaller and lighter than if a synchronization controller was also implemented on the mobile device. Further, because such a mobile device does not contain the functionality of a synchronization controller, but only includes an interface, the mobile device may be able to take advantage of improvements in a synchronization controller without having to be redesigned.
- Each of the above implementations may be used with more than two different modes.
- inventory, shipping, or other data may be accessed in a warehouse using three different modes, and one or more machines accessing the warehouse data may need to be synchronized.
- the first mode may include keyboard input; the second mode may include voice input; and the third mode may include input from scanning a bar code on a pallet, for example, to request a particular record.
- Output for any of the modes may include, for example, display output, voice output, or printer output.
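For the three-mode warehouse example, one way to keep the modes synchronized is to reduce each input to the same record request before any lookup occurs. The sketch below assumes hypothetical input formats: typed text, a recognizer that returns spelled-out characters, and a pallet label of the form "PALLET:<id>". None of these formats comes from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordRequest:
    record_id: str
    mode: str  # which input mode produced the request


def from_keyboard(text: str) -> RecordRequest:
    return RecordRequest(text.strip().upper(), "keyboard")


def from_voice(recognized_words: str) -> RecordRequest:
    # Assumes the recognizer returns spelled-out characters, e.g. "p 1 0 4 2".
    return RecordRequest("".join(recognized_words.split()).upper(), "voice")


def from_barcode(scan: str) -> RecordRequest:
    # Assumes the pallet label encodes "PALLET:<id>" (hypothetical format).
    return RecordRequest(scan.split(":", 1)[-1].upper(), "barcode")


requests = [
    from_keyboard(" p1042 "),
    from_voice("p 1 0 4 2"),
    from_barcode("PALLET:P1042"),
]
print({r.record_id for r in requests})
print([r.mode for r in requests])
```

Once every mode yields the same request type, the synchronization mechanisms described for two modes apply unchanged to three or more.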
- the server system 110 includes one or more devices for storing, at least temporarily, information that can be accessed by one or more gateways.
- a web server has a storage device for storing web pages.
- the server system 110 may include multiple storage devices that are located locally or remotely with respect to each other.
- the server system 110 may include one or more storage devices that are located locally to another component, such as, for example, the device 160 or the second gateway 185.
- the server system 110 or the synchronization controller 120 are not contained in the unit 140.
- connections 130, 180, 190, 194, and 196, and other connections throughout the disclosure may be direct or indirect connections, possibly with one or more intervening devices.
- a connection may use one or more media such as, for example, a wired, a wireless, a cable, or a satellite connection.
- a connection may use a variety of technologies or standards such as, for example, analog or digital technologies, packet switching, code division multiple access (“CDMA”), time division multiple access (“TDMA”), and global system for mobiles (“GSM”) with general packet radio service (“GPRS”).
- GPRS general packet radio service
- a connection may use a variety of established networks such as, for example, the Internet, the WWW, a wide-area network (“WAN”), a local-area network (“LAN”), a telephone network, a radio network, a television network, a cable network, and a satellite network.
- Such information may include, for example, the first data itself, an address of the first data or some other pointer to the first data, an encoding of the first data, and parameters identifying particular information from the first data.
- the first data may include any of the many examples described in this disclosure as well as, for example, an address of some other data, data entered by a user, and a command entered by a user.
- By keeping the synchronization controller functions off of a mobile device, for example, mobile devices may be more lightweight, less expensive, and more robust to technology enhancements in the synchronization controller.
- By using a proxy model, a mobile device is still free of the synchronization controller and enjoys the noted benefits. Further, by using a proxy model, the multitude of existing web servers may not need to be redesigned, and the synchronization controller may allow multiple types of mobile devices to communicate with the same server infrastructure.
- Using a publish/subscribe system, operating as in the implementations described or according to other principles, also may facilitate an architecture with minimal install time for client devices, such that client devices are changed only minimally.
- a synchronization controller may consist of one or more components adapted to perform, for example, the functions described for a synchronization controller in one or more of the implementations in this disclosure.
- the components may be, for example, hardware, software, firmware, or some combination of these.
- Hardware components include, for example, controller chips and chip sets, communications chips, digital logic, and other digital or analog circuitry.
- Such synchronizing mechanisms may include, for example, (i) sending a message to a publish/subscribe system, (ii) sending a message to a browser, possibly with a URL for a new page or a JSP, (iii) updating state information by, for example, updating a JSP, (iv) sending a corresponding page directly to a gateway, (v) requesting a corresponding page from an intermediary or from a storage location having the page, (vi) determining a corresponding page, and (vii) requesting a determination of a corresponding page and, possibly, requesting receipt of that determination.
- Various of the listed mechanisms may be performed by a synchronization controller, a web server, a gateway, or another component adapted to provide such functionality.
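Mechanisms (i) and (vi) above can be illustrated with a minimal sketch, assuming an in-memory publish/subscribe hub and two gateways that each derive their own corresponding page from a shared page identifier. The class names `PublishSubscribe` and `Gateway`, the topic name, and the `.html`/`.vxml` suffix convention are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional


class PublishSubscribe:
    """Tiny in-memory publish/subscribe hub (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, page_id: str) -> None:
        # Mechanism (i): send a message to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(page_id)


class Gateway:
    """Stands in for a browser or voice gateway that presents one page at a time."""

    def __init__(self, name: str, page_format: str) -> None:
        self.name = name
        self.page_format = page_format  # assumed suffix convention, e.g. "html" or "vxml"
        self.current_page: Optional[str] = None

    def on_message(self, page_id: str) -> None:
        # Mechanism (vi): determine the page that corresponds to this gateway's mode.
        self.current_page = f"{page_id}.{self.page_format}"


hub = PublishSubscribe()
browser = Gateway("browser", "html")
voice = Gateway("voice", "vxml")
hub.subscribe("session-1", browser.on_message)
hub.subscribe("session-1", voice.on_message)

# A request made in one mode is published once; both modes move to corresponding pages.
hub.publish("session-1", "pallet-42")
print(browser.current_page, voice.current_page)
```

In this sketch neither gateway needs to know about the other; each only subscribes to the session topic, which is one way the client-side changes could be kept small.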
- the term "page" is not meant to be restrictive and refers to data in a form usable by a particular gateway, interface, or other component.
- a page may be sent to a gateway, provided to a gateway, or received from a gateway, even though the page may first go through a controller or a publish/subscribe system.
- a corresponding page may be determined by requesting another component to provide the corresponding URL.
- a device may be programmed to associate the multiple modes supported on the device. Implementations described above also may query a user for information that identifies the modes and/or devices that the user desires to have associated.
- a user interface may allow a user to gain access to data, such as, for example, products in a catalog database, or to enter data into a system, such as, for example, entering customer information into a customer database.
- User interfaces are used for applications residing on relatively stationary computing devices, such as desktop computers, as well as for applications residing on mobile computing devices, such as laptops, palmtops, and portable electronic organizers.
- a voice-activated user interface can be created to provide data access and entry to a system, and voice input may be particularly appealing for mobile devices.
- a grammar for speech recognition for a given voice-driven application, mobile or otherwise, can be written to enable accurate and efficient recognition.
- Particular implementations described below provide a user interface that allows a user to input data in one or more of a variety of different modes, including, for example, stylus and voice input.
- Output also may be in one or more of a variety of modes, such as, for example, display or voice.
- Particular implementations may be used with mobile devices, such as, for example, palmtops, and the combination of voice and stylus input with voice and display output may allow such mobile devices to be more useful to a user. Implementations also may be used with the multi-modal synchronization system described in the incorporated provisional application.
- Implementations allow enhanced voice recognition accuracy and/or speed due in part to the use of a structured grammar that allows a grammar to be narrowed to a relevant part for a particular voice recognition operation. For example, narrowing of the grammar for a voice recognition operation on a full search string may be achieved by using the results of an earlier, or parallel, voice recognition operation on a component of the full search string. Other implementations may narrow the grammar by accepting parameters of a search string in a particular order from a user, and, optionally, using the initial parameter(s) to narrow the grammar for subsequent parameters.
- Examples include (i) reversing the standard order of receiving street address information so that, for example, the country is received before the state and the grammar used to recognize the state is narrowed to the states in the selected country, (ii) segmenting an electronic mail address or web site address so that a user supplies a domain identifier, such as, for example, "com," separately, or (iii) automatically inserting the "at sign" and the "dot" into an electronic mail address and only prompting the user for the remaining terms, thus obviating the often complex process of recognizing these spoken characters. Implementations also may increase recognition accuracy and speed by augmenting a grammar with possible search strings, or utterances, thus decreasing the likelihood that a voice recognition system will need to identify an entry by its spelling.
- Implementations also allow enhanced database searching. This may be achieved, for example, by using a structured grammar and associating grammar entries with specific database entries. In this manner, when the structured grammar is used to recognize the search string, then particular database entries or relevant portions of the database may be identified at the same time.
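The association between grammar entries and database entries might be sketched as follows. This is a minimal illustration, assuming the grammar is a dictionary from utterances to database keys and that recognition is replaced by a plain text lookup. The sample rows and the names `build_grammar` and `recognize` are hypothetical.

```python
from typing import Dict, List

# Made-up sample database: key -> record.
DATABASE: Dict[int, Dict[str, str]] = {
    1: {"manufacturer": "Sony", "product": "laptop"},
    2: {"manufacturer": "Sony", "product": "palmtop"},
    3: {"manufacturer": "Dell", "product": "laptop"},
}


def build_grammar(database: Dict[int, Dict[str, str]]) -> Dict[str, List[int]]:
    """Map each recognizable utterance to the database keys it refers to."""
    grammar: Dict[str, List[int]] = {}
    for key, row in database.items():
        utterances = (
            row["manufacturer"],                      # component utterance, e.g. "sony"
            f'{row["manufacturer"]} {row["product"]}',  # full search string
        )
        for utterance in utterances:
            grammar.setdefault(utterance.lower(), []).append(key)
    return grammar


def recognize(text: str, grammar: Dict[str, List[int]]) -> List[int]:
    """Stand-in for voice recognition: a match yields database keys directly."""
    return grammar.get(text.lower(), [])


grammar = build_grammar(DATABASE)
print(recognize("Sony", grammar))         # keys of all Sony records
print(recognize("Sony laptop", grammar))  # key of the single matching record
```

Because each grammar entry already carries its database keys, recognizing the utterance and identifying the relevant records happen in the same lookup, with no separate query step.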
- Searching the first search space may include searching a database.
- Searching the limited second search space may include searching at least part of the database.
- Limiting the second search space may include limiting the part of the database that is searched to database entries that include a match for the first part of the search string, thus allowing a quicker search compared to searching the full database.
- the second part of the search string may include a voice input or a manual input. Searching the first search space and searching the limited second search space may be performed at least partially in parallel.
- the search string may include an address.
- Accessing the first part of the search string may include accessing a voice input. Searching the first search space for the match may include performing voice recognition on the first part of the search string. Accessing at least the second part of the search string may include accessing the voice input. Limiting the second search space may include limiting the second search space to grammar entries associated with the first part of the search string. Searching the limited second search space may include performing voice recognition on at least the second part of the search string using the limited second search space, thereby allowing enhanced voice recognition of the second part of the search string compared to performing voice recognition using the unlimited second search space.
- Accepting the first input may include accepting a first voice input and performing voice recognition on the first input, wherein performing voice recognition on the first input in isolation allows enhanced voice recognition compared to performing voice recognition on the search string.
- the first set of options may include manufacturer designations and the second set of options may include product designations from a manufacturer designated by the first input.
- the search string may include an address.
- Accepting the first input may include receiving the first input auditorily from the user.
- Voice recognition may be performed on the first input in isolation, wherein performing voice recognition on the first input in isolation allows enhanced voice recognition compared to performing voice recognition on the search string.
- Providing the second set of options may include searching a set of data items for the first input and including in the second set of options references only to those data items, from the set of data items, that include the first input.
- Accepting the second input may include receiving the second input auditorily from the user.
- Voice recognition may be performed on the second input in isolation, wherein performing voice recognition on the second input in isolation allows enhanced voice recognition compared to performing voice recognition on the search string.
- a third set of options may be provided to the user, the third set of options relating to a third part of the search string and being provided to the user in the page.
- a third input may be accepted from the user, the third input being selected from the third set of options, wherein the second set of options that is provided to the user is also based on the accepted third input.
- the second set of options may be modified based on the third input.
- the first set of options may include manufacturer designations.
- the third set of options may include price range designations.
- the second set of options may include product designations from a manufacturer designated by the first input in a price range designated by the third input.
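As an illustration of the three sets of options just described, the following sketch limits the product options by the manufacturer (first input) and, when given, by the price range (third input). The catalog rows, model names, and price-range labels are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

# Made-up catalog rows; model names and prices are placeholders.
PRODUCTS: List[Dict[str, object]] = [
    {"manufacturer": "Sony", "model": "Model S1", "price": 800},
    {"manufacturer": "Sony", "model": "Model S2", "price": 1500},
    {"manufacturer": "Dell", "model": "Model D1", "price": 700},
]

# Third set of options: price range designations (assumed labels).
PRICE_RANGES: Dict[str, Tuple[float, float]] = {
    "under 1000": (0, 1000),
    "1000 and up": (1000, float("inf")),
}


def first_options(products: List[Dict[str, object]]) -> List[str]:
    """First set of options: manufacturer designations."""
    return sorted({str(p["manufacturer"]) for p in products})


def second_options(
    products: List[Dict[str, object]],
    manufacturer: str,
    price_range: Optional[str] = None,
) -> List[str]:
    """Second set of options: products from the chosen manufacturer,
    further limited by the price range if a third input was accepted."""
    low, high = PRICE_RANGES[price_range] if price_range else (0, float("inf"))
    return [
        str(p["model"])
        for p in products
        if p["manufacturer"] == manufacturer and low <= float(p["price"]) < high
    ]


print(first_options(PRODUCTS))
print(second_options(PRODUCTS, "Sony"))
print(second_options(PRODUCTS, "Sony", "1000 and up"))
```

Calling `second_options` again after the third input arrives corresponds to modifying the second set of options based on that input.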
- performing voice recognition includes accessing a voice input including at least a first part and a second part, performing voice recognition on the first part of the voice input, performing voice recognition on a combination of the first part and the second part using a search space, and limiting the search space based on a result from performing voice recognition on the first part of the voice input. Limiting the search space allows enhanced voice recognition of the combination compared to performing voice recognition on the unlimited search space.
- Performing voice recognition on the first part may produce a recognized string, and the recognized string may be associated with a set of recognizable utterances from the search space.
- Limiting the search space may include limiting the search space to a set of recognizable utterances.
- Voice recognition on the first part may be performed in parallel with voice recognition on the combination, such that the search space is not limited until after voice recognition on the combination has begun.
- Voice recognition on the first part may be performed before voice recognition on the combination, such that the search space is limited before voice recognition on the combination has begun.
- Performing voice recognition on the first part of the voice input may include comparing the first part to a set of high-occurrence patterns in the search space, followed by comparing the first part to a set of low-occurrence patterns in the search space.
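The ordering of comparisons described here, high-occurrence patterns first and low-occurrence patterns second, could look like the sketch below. It assumes that a pattern is the first word of a search-space entry and that "high occurrence" means appearing at least `threshold` times. Both are assumptions made for illustration, as are the function names.

```python
from collections import Counter
from typing import List, Optional, Tuple


def split_by_occurrence(search_space: List[str], threshold: int) -> Tuple[List[str], List[str]]:
    """Split first-word patterns into high- and low-occurrence tiers."""
    counts = Counter(entry.split()[0].lower() for entry in search_space)
    high = [word for word, count in counts.items() if count >= threshold]
    low = [word for word, count in counts.items() if count < threshold]
    return high, low


def match_first_part(
    first_part: str, high: List[str], low: List[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Compare against the high-occurrence tier before the low-occurrence tier."""
    needle = first_part.lower()
    for tier_name, patterns in (("high", high), ("low", low)):
        for pattern in patterns:
            if pattern == needle:
                return pattern, tier_name
    return None, None


space = ["Sony laptop", "Sony palmtop", "Sony desktop", "Dell laptop"]
high, low = split_by_occurrence(space, threshold=2)
print(match_first_part("Sony", high, low))
print(match_first_part("Dell", high, low))
```

Checking the frequent patterns first means that common first parts are matched after fewer comparisons, while rare ones are still found in the second pass.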
- Performing voice recognition on the first part of the voice input may include using a second search space.
- Voice recognition may be performed on the second part of the voice input.
- the second search space may be limited based on a result from performing voice recognition on the second part of the voice input. Limiting the search space also may be based on the result from performing voice recognition on the second part of the voice input.
- Accessing circuitry may be used to access a voice input including at least a first part and a second part.
- Recognition circuitry may be used to perform voice recognition on the first part of the voice input and on the combination of the first part and the second part, wherein voice recognition may be performed on the combination using a search space.
- a recognition engine may be used and may include the recognition circuitry.
- Limiting circuitry may be used to limit the search space based on a result from performing voice recognition on the first part of the voice input. Limiting the search space may allow enhanced voice recognition of the voice input compared to performing voice recognition on the unlimited search space.
- One or more of the accessing circuitry, the recognition circuitry, and the limiting circuitry may include a memory with instructions for performing one or more of the operations of accessing the voice input, performing voice recognition, and limiting the search space based on the result from performing voice recognition on the first part of the voice input.
- One or more of the accessing circuitry, the recognition circuitry, and the limiting circuitry may include a processor to perform one or more of the operations of accessing the voice input, performing voice recognition, and limiting the search space based on the result from performing voice recognition on the first part of the voice input.
- the circuitry may be used to perform one of the other features described for this or another aspect.
- accepting input from a user includes providing a first set of options to a user, the first set of options relating to a first parameter of a search string, and being provided to the user in a page. A first input is accepted from the user, the first input being selected from the first set of options.
- a second set of options is limited based on the accepted first input, the second set of options relating to a second parameter of the search string.
- the second set of options is provided to the user in the page, such that the user is presented with a single page that provides the first set of options and the second set of options.
- Accepting the first input from the user may include receiving an auditory input and performing voice recognition. Performing voice recognition on the first input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string. Accepting the first input from the user may include receiving a digital input.
- a second input may be accepted from the user, the second input being selected from the second set of options.
- Accepting the first input may include receiving the first input auditorily from the user.
- Voice recognition may be performed on the first input in isolation. Performing voice recognition on the first input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string.
- Providing the second set of options may include searching a set of data items for the first input and including in the second set of options references only to those data items that include the first input.
- Accepting the second input may include receiving the second input auditorily from the user.
- Voice recognition may be performed on the second input in isolation. Performing voice recognition on the second input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string.
- a third set of options may be provided to the user, and the third set of options may relate to a third parameter of the search string and be provided to the user in the page.
- a third input may be accepted from the user, and the third input may be selected from the third set of options.
- the second set of options provided to the user also may be based on the accepted third input.
- the second set of options provided to the user may be modified based on the accepted third input.
- Providing the second set of options may include searching a set of data for the first input and providing only data items from the set of data that include the first input.
- the first input may include a manufacturer designation that identifies a manufacturer.
- Providing the second set of options may be limited to providing only data items manufactured by the identified manufacturer.
- Circuitry may be used (i) to provide a first set of options to a user, the first set of options relating to a first parameter of a search string, and being provided to the user in a page, (ii) to accept a first input from the user, the first input being selected from the first set of options, (iii) to limit a second set of options based on the accepted first input, the second set of options relating to a second parameter of the search string, and/or (iv) to provide the second set of options to the user in the page, such that the user is presented with a single page that provides the first set of options and the second set of options.
- the circuitry may include a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed.
- the circuitry may include a processor operable to perform at least one of the enumerated operations.
- the circuitry may be used to perform one of the other features described for this or another aspect.
- receiving items of an address from a user includes providing the user a first set of options for a first item of an address, receiving from the user the first address item taken from the first set of options, limiting a second set of options for a second item of the address based on the received first item, providing the user the limited second set of options for the second address item, and receiving the second address item.
- Recognition may be performed on the received second address item. Performing voice recognition on the second address item in isolation may allow enhanced voice recognition compared to performing voice recognition on a combination of the first address item and the second address item or on the address.
- the first address item may include a state identifier.
- the second address item may include a city identifier identifying a city.
- the user may be provided a third list of options for a zip code identifier. The third list of options may exclude a zip code not in the identified city.
- the zip code identifier may be received auditorily from the user. The user may select the zip code identifier from the third list of options.
- the zip code identifier may identify a zip code. Voice recognition may be performed on the auditorily received zip code identifier.
- Excluding a zip code in the third list of options may allow enhanced voice recognition compared to not excluding a zip code.
- the user may be provided a fourth list of options for a street address identifier.
- the fourth list of options may exclude a street not in the identified zip code.
- the street address identifier may be received auditorily from the user.
- the user may select the street address identifier from the fourth list of options.
- the street address identifier may identify a street address.
- Voice recognition may be performed on the auditorily received street address identifier.
- Exclusion of a street in the fourth list of options may allow enhanced voice recognition compared to not excluding a street.
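The cascade of state, city, zip code, and street options can be sketched as a single filter over address records, where each list excludes anything inconsistent with the items already received. The rows below are made-up placeholders, not real postal data, and `options_for` is a hypothetical helper name.

```python
from typing import Dict, List

# Made-up sample rows; these are not real postal data.
ADDRESSES: List[Dict[str, str]] = [
    {"state": "CA", "city": "Springfield", "zip": "00001", "street": "Main St"},
    {"state": "CA", "city": "Springfield", "zip": "00002", "street": "Oak Ave"},
    {"state": "CA", "city": "Riverton", "zip": "00003", "street": "Elm St"},
    {"state": "NY", "city": "Lakeside", "zip": "00004", "street": "Pine Rd"},
]


def options_for(field: str, received: Dict[str, str]) -> List[str]:
    """List the options for `field` that are consistent with every item
    already received, excluding everything else."""
    options: List[str] = []
    for row in ADDRESSES:
        consistent = all(row[key] == value for key, value in received.items())
        if consistent and row[field] not in options:
            options.append(row[field])
    return options


received: Dict[str, str] = {}
print(options_for("state", received))   # first list of options
received["state"] = "CA"
print(options_for("city", received))    # second list, limited by state
received["city"] = "Springfield"
print(options_for("zip", received))     # third list, limited by city
received["zip"] = "00002"
print(options_for("street", received))  # fourth list, limited by zip code
```

Each shorter list could then serve as the grammar for recognizing the next spoken item, which is where the enhanced recognition described above would come from.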
- Providing the user the first list of options may include providing the first list on a display.
- Providing the user the second list of options may include providing the second list auditorily.
- Circuitry may be used (i) to provide the user a first set of options for a first item of an address, (ii) to receive from the user the first address item taken from the first set of options, (iii) to limit a second set of options for a second item of the address based on the received first item, (iv) to provide the user the limited second set of options for the second address item, and/or (v) to receive the second address item.
- the circuitry may include a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed.
- the circuitry may include a processor operable to perform at least one of the enumerated operations.
- the circuitry may be used to perform one of the other features described for this or another aspect.
- both recognition processes (1430) are performed at least partially in parallel and recognizing the smaller component, such as "Sony," is faster than recognizing the entire search string.
- the recognition process for the full search string is started on the entire search space of grammar entries and is narrowed after the resulting solution space for the smaller component is determined in operation 1440.
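The parallel arrangement described here can be sketched with two threads. This is a simplified illustration, assuming text matching in place of acoustic recognition, a component that is the first word of the utterance, and a small invented grammar. The full-string search starts on the entire grammar and narrows its remaining candidates as soon as the component result becomes available.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional

# Made-up grammar of full search strings and of smaller components.
GRAMMAR: List[str] = ["dell laptop", "dell desktop", "ibm desktop", "sony laptop", "sony palmtop"]
COMPONENTS: List[str] = ["sony", "dell", "ibm"]


def recognize_component(utterance: str) -> Optional[str]:
    """Stand-in for recognizing the smaller component (here, the first word)."""
    words = utterance.split()
    first = words[0] if words else ""
    return first if first in COMPONENTS else None


def recognize_full(utterance: str, component_future: Future) -> Optional[str]:
    """Search the whole grammar, narrowing it once the component is known."""
    remaining = list(GRAMMAR)
    narrowed = False
    while remaining:
        if not narrowed and component_future.done():
            component = component_future.result()
            if component is not None:
                # Keep only the entries associated with the recognized component.
                remaining = [e for e in remaining if e.startswith(component + " ")]
            narrowed = True
            continue
        candidate = remaining.pop(0)
        if candidate == utterance:
            return candidate
    return None


def recognize(utterance: str) -> Optional[str]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        component_future = pool.submit(recognize_component, utterance)
        full_future = pool.submit(recognize_full, utterance, component_future)
        return full_future.result()


print(recognize("sony palmtop"))
```

The point at which narrowing happens depends on thread timing, but the returned match does not, because narrowing only removes entries that cannot match the utterance.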
- the process 1600 includes providing a second set of options for a second parameter based on the first parameter (1630).
- a user interface may provide a list of product types, including, for example, desktops, laptops, and palmtops, that are available from the manufacturer entered in operation 1620.
- the process 1600 includes entering a second parameter selected from the second set of options (1640). Continuing the example from above, a user may select, and enter, a product type from the list provided in operation 1630.
- the process 1600 includes providing a list of matches, based on the first and second parameters (1650).
- the list of matches may include all computers in the database that are manufactured by the entered manufacturer and that are of the entered product type.
- the list of matches may include all Sony laptops.
- the process 1600 may be used, for example, instead of having a user enter a one-time, full search phrase.
- the process 1600 presents a set of structured searches or selections from, for example, drop-down lists.
- the first and second parameters can be considered to be parts of a search string, with the cumulative search string producing the list of matches provided in operation 1650.
- the database may be structured to allow for efficient searches based on the parameters provided in operations 1610 and 1630. Additionally, in voice input applications, by structuring the data entry, the grammar and vocabulary for each parameter may be simplified, thus potentially increasing recognition accuracy and speed. Implementations may present multiple parameters and sets of options, and these may be organized into levels. In the process 1600, one parameter was used at each of two levels.
- multiple parameters may be presented at a first level, with both entries determining the list of options presented for additional multiple parameters at a second level, and with all entries determining a list of matches.
- Such parameters may include, for example, manufacturer, brand, product type, price range, and a variety of features of the products in the product type. Examples of features for computers include processor speed, amount of random access memory, storage capacity of a hard disk, video card speed and memory, and service contract options.
- a picture of a page 1700 for implementing the process 1600 includes a first level 1710 and a second level 1720.
- the first level 1710 provides a first parameter 1730 for the product, with a corresponding pull-down menu 1740 that includes a set of options.
- the set of options in pull-down menu 1740 may include, for example, desktop, laptop, and palmtop.
- the second level 1720 provides a second parameter 1750 for the brand, with a corresponding pull-down menu 1760 that includes a set of options.
- the options in pull-down menu 1760 are all assumed to satisfy the product parameter entered by the user in pull-down menu 1740 and may include, for example, Sony, HP/Compaq, Dell, and IBM. Assuming that "laptop" was selected in the pull-down menu 1740, the pull-down menu 1760 would include only brands (manufacturers) that sell laptops.
- the page 1700 also includes a category 1770 for models that match the parameters entered in the first and second levels 1710 and 1720.
- the matching models are viewable using a pull-down menu 1780.
- all of the search string information as well as the results may be presented in a single page.
- the page 1700 is also presentable in a single screen shot, but other single-page implementations may use, for example, a web page that spans multiple screen lengths and requires scrolling to view all of the information.
- a process 1800 for recognizing an address includes determining a list of options for a first part of an address (1810).
- the address may be, for example, a street address or an Internet address, where Internet addresses include, for example, electronic mail addresses and web site addresses. If the address is a street address, the first part may be, for example, a state identifier.
- the process 1800 includes prompting a user for the first part of the address.
- the prompt may, for example, simply include a request to enter information, or it may include a list of options.
- the process 1800 includes receiving the first part of the address (1830). If the first part is received auditorily, the process 1800 includes performing voice recognition of the first part of the address (1840).
- the process 1800 includes determining a list of options for a second part of the address based on the received first part (1850).
- the second part may be, for example, a city identifier.
- the list of options may include, for example, only those cities that are in the state identified by the received state identifier.
- the process 1800 could continue with subsequent determinations of lists of options for further parts of the address.
- a list of options for a zip code could be determined based on the city identified by the received city identifier. Such a list could be determined from the available zip codes in the identified city. City streets in the city or the zip code could also be determined. Further, country information could be obtained before obtaining state information.
- the process 1800 may prompt the user in a number of ways. For example, the user may be prompted to enter address information in a particular order, allowing a system to process the address information as it is entered and to prepare the lists of options. Entry fields for country, state or province, city, zip or postal code, street, etc., for example, may be presented top-down on a screen or sequentially presented in speech output. Referring to FIG. 19, there is shown another way to prompt the user in the process 1800.
- a system may use a pop-up wizard 1900 on the screen of a device to ask the user to enter specific address information.
- a system may preserve the normative order of address information, but use visual cues, for example, to prompt the user to enter the information in a particular order. Visual cues may include, for example, highlighting or coloring the border or the title of an entry field.
- the process 1800 may be applied to data entered using a voice mode or another mode. After the data is entered at each prompt, and after it is recognized if voice input is used, a database of addresses may be searched to determine the list of options for the next address field. Such systems allow database searching on an ongoing basis instead of waiting until all address information is entered. Such systems also allow for guided entry using pull-down menus and, with or without guided entry, alerting a user at the time of entry if an invalid entry is made for a particular part of an address.
- the process 1800 also may be applied to other addresses, in addition to street addresses or parts thereof.
- the process 1800 may be applied to Internet addresses, including, for example, electronic mail addresses and web site addresses.
- a format 2000 for entering an electronic mail address includes using a user identifier 2010, a server identifier 2020, and a domain identifier 2030.
- the "at sign" separating the user identifier 2010 and the server identifier 2020, and the "dot" separating the server identifier 2020 and the domain identifier 2030, may be implicit and inserted automatically, that is, without human intervention.
- a format 2100 for entering a web site address includes using a network identifier 2110, a server identifier 2120, and a domain identifier 2130.
- the two "dots" separating the three identifiers 2110, 2120, 2130 may be implicit and inserted automatically.
- the network identifier may be selected from, for example, "www," "www1," "www2," etc.
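The formats 2000 and 2100 can be illustrated with a short sketch in which the user supplies only the identifiers and the separators are inserted automatically. The function names and the small sets of allowed domain and network identifiers are assumptions made for the example.

```python
# Assumed small option sets; a real grammar could be larger.
DOMAIN_OPTIONS = ("com", "org", "net")
NETWORK_OPTIONS = ("www", "www1", "www2")


def build_email_address(user_id: str, server_id: str, domain_id: str) -> str:
    """Format 2000: the "at sign" and the "dot" are inserted without user input."""
    if domain_id not in DOMAIN_OPTIONS:
        raise ValueError(f"unknown domain identifier: {domain_id!r}")
    return f"{user_id}@{server_id}.{domain_id}"


def build_web_address(network_id: str, server_id: str, domain_id: str) -> str:
    """Format 2100: the two "dots" are inserted without user input."""
    if network_id not in NETWORK_OPTIONS:
        raise ValueError(f"unknown network identifier: {network_id!r}")
    if domain_id not in DOMAIN_OPTIONS:
        raise ValueError(f"unknown domain identifier: {domain_id!r}")
    return ".".join((network_id, server_id, domain_id))


print(build_email_address("jane", "example", "com"))
print(build_web_address("www", "example", "com"))
```

Because the separators never have to be spoken, a recognizer in such a scheme only needs grammars for the identifiers themselves, and the domain and network identifiers come from short, fixed lists.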
- the process 2200 includes searching a first search space for a match for the first part of the search string (2220).
- the first search space may include, for example, a search space in a grammar of a voice recognition engine, a search space in a database, or a search space in a list of options presented to a user in a pull-down menu. Searching may include, for example, comparing text entries, voice waveforms, or codes representing entries in a codebook of vector-quantized waveforms.
- the process 2200 includes limiting a second search space based on a result of searching the first search space (2230).
- the second search space may, for example, be similar to or the same as the first search space.
- Limiting may include, for example, paring down the possible grammar or vocabulary entries that could be examined, paring down the possible database entries that could be examined, or paring down the number of options that could be displayed or made available for a parameter of the search string. And paring down the possibilities or options may be done, for example, so as to exclude possibilities or options that do not satisfy the first part of the search string.
- the process 2200 includes accessing at least a second part of the search string (2240) and searching the limited second search space for a match for the second part of the search string (2250).
- Accessing the second part of the search string may include, for example, receiving a voice input, a stylus input, or a menu selection, and the second part may include the entire search string.
- Searching the limited second search space may be performed, for example, in the same way or in a similar way as searching the first search space is performed.
- the process 2200 is intended to cover all of the disclosed processes.
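The operations of process 2200 can be illustrated with a small sketch. It assumes a toy address database and text comparison as the search method; the record layout and the names `find_match` and `search_address` are hypothetical and chosen only for illustration.

```python
# Toy database; the states and cities are sample data, not from the source.
ADDRESSES = [
    {"state": "WA", "city": "Seattle"},
    {"state": "WA", "city": "Spokane"},
    {"state": "OR", "city": "Portland"},
]


def find_match(space, value):
    """Return the entry in `space` that matches `value` (case-insensitive), or None."""
    wanted = value.strip().lower()
    for entry in space:
        if entry.lower() == wanted:
            return entry
    return None


def search_address(records, first_part, second_part):
    """Search a first space, limit the second space by the result, then search it."""
    first_space = sorted({r["state"] for r in records})
    state = find_match(first_space, first_part)          # operation 2220
    if state is None:
        return None
    # Operation 2230: exclude options that do not satisfy the first part.
    second_space = sorted({r["city"] for r in records if r["state"] == state})
    city = find_match(second_space, second_part)         # operations 2240-2250
    if city is None:
        return None
    return state, city
```

In this sketch a city that exists in the database but not in the selected state is rejected, which mirrors the invalid-entry alert described for guided address entry.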
- a system 2300 for implementing one or more of the above processes includes a computing device 2310, a first memory 2320 located internal to the computing device 2310, a second memory 2330 located external to the computing device 2310, and a recognition engine 2340 located external to the computing device 2310.
- the computing device may be, for example, a desktop, laptop, palmtop, or other type of electronic device capable of performing one or more of the processes described.
- the first and second memories 2320, 2330 may be, for example, permanent or temporary memory capable of storing data or instructions at least temporarily.
- the recognition engine 2340 may be a voice recognition engine or a recognition engine for another mode of input.
- the second memory 2330 and the recognition engine 2340 are shown as being external to, and optionally connected to, the computing device 2310. However, the second memory 2330 and the recognition engine 2340 also may be integrated into the computing device 2310 or be omitted from the system 2300.
- FIGS. 1, 7, and 10 depict several examples, and other examples have been described.
- One action which a user might perform when utilizing the gateway synchronization capabilities of such systems is the selection of a web page that is linked to a currently-viewed web page, where this selection can be performed, for example, either by voice input using a VXML page, or by clicking on an HTML link embedded in an HTML page, using, for example, a stylus or mouse.
- FIGS. 8 and 9 Another action which a user might perform is to enter text into, for example, multiple fields within a form on a single web page.
- variations of processes 300-600 in FIGS. 3-6 include techniques for implementing commands relating to a particular page.
- variations of operations 810 and 910 allow the synchronization controller 720 of FIG. 7 to receive inputs such as browser inputs and voice inputs, where the inputs may include a data input and/or a focus request for moving to a new field.
- the voice mode system 740 receives a user's city selection for a field in a VXML page, and then subsequently moves a focus to a field for state selection.
- Text can be entered using either manual entry by, for example, keyboard, or via a voice-recognition system associated with a corresponding and synchronized VXML page.
- FIGS. 19-21 describe examples of such text entry; more specifically, these figures and related text and examples describe techniques whereby, for example, a grammar is selectively narrowed when performing voice-recognition on a search string, or where a grammar is progressively narrowed as a plurality of related entries are input.
- Another technique, allowed for in the discussion above, for entering text or other information into multiple fields within a form is to have a two-level, hierarchical dynamic grammar.
- In this technique, there are multiple levels and instances of independent, discrete grammars, rather than multiple subsets of a larger and/or interdependent grammar(s).
- the global grammar 2410 includes vocabulary for voice commands that are recognized by an operating device or software regardless of a current state of a system or a page. For example, even when a user is currently entering pieces of text information into one of a plurality of fields on a page, the global grammar will be continuously operable to recognize voice input references for, for example, names of other fields on the page, commands for activating the browser (such as, for example, "back," "home," or "refresh"), and device commands such as "restart."
- a second level includes a plurality of specific grammars.
- the second level might include a first grammar 2420 for recognizing voice commands inputting a city name, a second grammar 2430 for recognizing voice commands for inputting a state name, and a third grammar 2440 for recognizing voice commands for inputting a street address.
- grammars 2410-2440 are separate, discrete, independent grammars.
- One consequence of having independent grammars is that a given word may be replicated in multiple grammars, thereby increasing total storage requirements.
- grammar 2420 might include the word “Washington” for identifying a name of the city, Washington, DC.
- Grammar 2430 might also include the word “Washington,” here to identify the state of Washington.
- grammar 2440 might include the word “Washington” in case a user lives on “Washington Street.”
- other voice recognition systems may have a single, large grammar (or a plurality of dependent grammars), in which the word “Washington” is stored once and entered into whatever field is currently active upon detection of the word “Washington.”
- Such systems may be relatively poor in recognizing voice input when there are multiple fields for voice recognition that are active at the same time.
- the fields of last name and first name may both exist in a form on a software application and may be concurrently active to display a result of recognizing a voice input such as "Davis" (which may be, for example, a valid selection within both a "first name” and a "second name” field).
- Such fields with similar data in the recognition grammar may compete for the results of voice recognition, and therefore increase the probability of inaccuracy.
- the multi-level grammar of FIG. 24 may thus provide increased speed and/or accuracy of voice recognition.
- This speed and/or accuracy improvement results, for example, from the fact that only one from among the second level of grammars is active at a particular time. Therefore, the size of the vocabulary that must be searched by a voice-recognition system may be severely reduced. With a smaller vocabulary, recognition accuracy generally increases, and processing time generally decreases.
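The two-level arrangement described above, in which a global grammar is always active and only one field-specific grammar is active at a time, could be sketched as follows. This is an illustrative sketch that matches already-recognized words against vocabularies rather than performing real speech recognition; the class `TwoLevelGrammar`, its method names, and the sample vocabularies are assumptions.

```python
class TwoLevelGrammar:
    """A global grammar that is always active, plus one active field grammar."""

    def __init__(self, commands, field_grammars):
        # First level: global commands and the names of the fields themselves.
        self.commands = {c.lower() for c in commands}
        # Second level: independent grammars, so a word may be stored in several.
        self.field_grammars = {
            name.lower(): {w.lower() for w in words}
            for name, words in field_grammars.items()
        }
        self.active_field = None

    def recognize(self, utterance):
        word = utterance.strip().lower()
        if word in self.field_grammars:
            # Saying a field name serves as the activation signal for its grammar.
            self.active_field = word
            return ("activate", word)
        if word in self.commands:
            return ("command", word)
        if self.active_field is not None:
            # Only the active second-level grammar is searched.
            if word in self.field_grammars[self.active_field]:
                return ("entry", self.active_field, word)
        return None
```

With grammars for "city" and "state" that both contain "Washington", the word is entered only into whichever field is active, so the two fields do not compete for the recognition result.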
- the input indication system 2480 may be located on a server system, and/or on a local system such as a mobile computing device.
- the input indication system 2480 may be a field(s) within a form on a graphical user interface such as a web page, as discussed above, so that voice data input by the user and recognized by the voice recognition system 2470 can be displayed to the user.
- the input indication system 2480 also may be a recorded or computer-generated voice repeating a recognized word to the user, such as might be used in a telephone entry system.
- FIG. 25 shows a web page 2500, being viewed on a portable device, for entering information about a user.
- page 2500 may be a VXML page including a first name field 2510, a last name field 2520, a state field 2530, a zip code field 2540, a city field 2550, and a street address field 2560.
- Page 2500 also illustrates a plurality of buttons 2570, which are intended to illustrate a plurality of conventional web commands, such as "refresh,” "home,” “favorites folder,” and so on.
- a user may activate the first name field 2510 using a variety of techniques.
- field 2510 could be selected by a voice command recognized by the first level grammar that includes global grammar 2410.
- the field could be selected using a stylus, mouse, or other mechanical input.
- the field could be automatically highlighted, due to being the first field in the form.
- FIG. 26 shows a web page 2600, again being viewed on a portable device, for entering information about a user.
- Page 2600 has essentially the same fields as page 2500; however, page 2600 illustrates a visual cue highlighting a first name field 2610.
- the visual cue serves as a technique for indicating to the user which grammar is currently active.
- Various examples of such visual cues may include a cursor within the field, a highlighting of the field, a specific coloration of the field, or any other technique for indicating that the particular field and its associated grammar are active.
- field 2610 is automatically highlighted as the first field on the page 2600.
- the visual cue may automatically move to the field 2520, and so on through the page 2600.
- an entry into the various fields may be aided by pull-down menu(s), such as in fields 2530 and 2550, or may be filled without the aid of pull-down menu(s), such as in fields 2510 (2610), 2520, 2540, and 2560.
- first level grammars such as global grammar 2410
- second level grammars which remains active even when a particular one of the second level grammars is activated
- the user may alternatively choose fields individually, simply by providing an activation signal for a selected one of the (currently) deactivated grammars.
- the activation signal may involve simply saying the name of the desired field associated with the grammar to be activated.
- multi-modal synchronization of pages 2500/2600 allows the user to utilize an activation signal involving a physical selection of a field (for example, using a physical tool such as a stylus or a mouse), even when the pages include, or are associated with, VXML pages/data.
- the global grammar 2410 may be included within each of the independent grammars 2420, 2430, and 2440, particularly in the case where the global grammar 2410 is relatively small in size. In this example, total memory requirements will likely be increased; however, the need to have two processes running simultaneously (that is, two grammars) would be eliminated.
- FIGS. 24-26 are particularly advantageous with respect to mobile computing devices, in which computing/processing resources are at a relative premium. Moreover, often in small mobile computing devices, text entry is awkward, difficult, or non-existent, so that speedy, accurate voice entry, particularly into forms such as web pages, would be very useful and advantageous.
- System 2700 includes a second mobile device 2730 including a second VoIP client 2734 and a second browser 2736, with the second browser 2736 including a second browser adaptor 2738.
- Second VoIP client 2734 is coupled to a second voice gateway 2740 that includes a second voice gateway adaptor 2744.
- Process 2900 includes having voice gateway adaptor 2724 subscribe to a unique channel based on the IP address of the mobile device 2710 (2930).
- Voice gateway adaptor 2724 may use, for example, HTTP to communicate with messaging handler 2770.
- the IP address is embedded in the HTTP communication between browser 2716 and web server 2750, and web server 2750 detects the IP address by extracting the IP address from the communication. In one implementation, web server 2750 assumes that a unique messaging channel referenced by the IP address exists and associates the session with the unique messaging channel using a table or data structure.
- Process 2900 includes having web server 2750 send a web page to browser 2716 in response to first web browser 2716 connecting to web server 2750 (2970).
- the web page sent to a browser is typically a HTML page. If the browser-server connection was established (2960) in response to a user entering the URL of a desired web page, then web server 2750 may send the requested web page.
- browser 2716 may connect to web server 2750 before voice gateway 2720 connects to web server 2750.
- voice gateway 2720 connects to web server 2750.
- the roles of the two gateways 2716 and 2720 are generally reversed from that described in process 2900.
- Firewall 3010 thus shields the IP address of mobile device 2710 from transmissions to voice gateway 2720 and web server 2750. Accordingly, if process 2900 is used with system 3000, then the IP address of firewall 3010 will be detected by voice gateway adaptor 2724 in operation 2920 and by web server 2750 in operation 2965. This would cause voice gateway adaptor 2724 to subscribe to a messaging channel identified by the IP address of firewall 3010. Continuing with this example, in operation 2980 browser adaptor 2718 would not be able to subscribe to the same messaging channel unless browser adaptor 2718 knew the IP address of firewall 3010. A more general problem exists, however, for many implementations. Typical implementations will have multiple mobile devices coupled to firewall 3010.
- VoIP client 2714 provides a unique identifier to voice gateway
- voice gateway adaptor 2724 can be configured to detect the unique identifier in operation 2920
- web server 2750 can be configured to detect the unique identifier in operation 2965
- browser adaptor 2718 can be configured to subscribe to the messaging channel identified by the unique identifier and created in operation 2930.
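Keying each messaging channel by a per-device unique identifier, rather than by an IP address that a firewall may share among devices, could be sketched with a small in-memory publish/subscribe handler. This is a hedged illustration only; the class `MessagingHandler`, its methods, and the identifier strings are hypothetical and do not describe messaging handler 2770 itself.

```python
from collections import defaultdict


class MessagingHandler:
    """Minimal in-memory publish/subscribe keyed by a channel identifier."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel_id, callback):
        """Register a callback (for example, an adaptor) on a channel."""
        self._subscribers[channel_id].append(callback)

    def publish(self, channel_id, message):
        """Deliver a message to every subscriber of one channel; return the count."""
        callbacks = list(self._subscribers.get(channel_id, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    handler = MessagingHandler()
    inbox_a, inbox_b = [], []
    # Two devices behind the same firewall use different unique identifiers.
    handler.subscribe("device-A", inbox_a.append)
    handler.subscribe("device-B", inbox_b.append)
    handler.publish("device-A", "http://example.com/page.html")
    print(inbox_a, inbox_b)
```

Since the channel key is the identifier and not the firewall's address, a message published for one device does not reach another device that shares the same firewall.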
- a process 3100 may be used to send a synchronization message.
- Process 3100 may be used by various implementations including, for example, the implementations associated with system 2800 and system 3000.
- Process 3100 includes receiving a request for first-modality data (3110).
- the first modality data includes first content, with the first-modality data being configured to be presented using a first modality, and the request coming from a requestor and being received at a first device.
- First-modality data includes data that may be presented to a user using a first modality, or that may be responded to by the user using the first modality.
- Other modality data, such as second-modality data and third-modality data, may be defined similarly.
- First-modality data may include, for example, a web page or other data structure, and such a data structure typically includes content.
- Content generally refers to information that is presented to a user or that a user may be seeking.
- a data structure also may include, for example, a header having header information, and other formatting information.
- a web page may include content that is displayed to a display device by a browser application, and the HTML of the web page may include header and formatting information that control aspects of the display and routing of the web page.
- Process 3100 includes sending a message allowing request of second modality data (3120).
- the message is sent from the first device for receipt by a second device, with the message being sent in response to receiving the request and including information that allows the second device to request second-modality data that includes second content that overlaps the first content, with the second-modality data being configured to be presented using a second modality.
- the content of the second-modality data may overlap the content of the first-modality data by having common content. For example, a HTML page (first-modality data) and a corresponding VXML page (second-modality data) have common content.
- the information allowing a request of the second-modality data may be of various types.
- first-modality data and the corresponding second-modality data may be synchronized by presenting the first-modality data and the corresponding second-modality data to a user in such a manner that the user may respond to the overlapping content using either the first modality or the second modality.
- Process 3100 includes determining the information that is included in the sent message (3130). For example, if the URL of the first-modality data and the corresponding second-modality data are different, and the information includes the URL of the first-modality data, then the URL of the corresponding second-modality data may be determined by, for example, using a table look-up or an algorithm, or requesting the information from another component or a user.
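Determining the URL of the corresponding second-modality data by a table look-up or an algorithm (3130) could look like the following sketch. The table contents, the URLs, and the suffix-swapping rule are assumptions made for illustration; the source does not specify any particular mapping.

```python
# Hypothetical look-up table from an HTML page URL to its VXML counterpart.
URL_TABLE = {
    "http://example.com/order.html": "http://example.com/voice/order.vxml",
}


def corresponding_vxml_url(html_url):
    """Return the URL of the corresponding VXML page, or None if undetermined."""
    # Table look-up takes priority when an explicit entry exists.
    if html_url in URL_TABLE:
        return URL_TABLE[html_url]
    # Assumed algorithmic fallback: same path with a different extension.
    if html_url.endswith(".html"):
        return html_url[: -len(".html")] + ".vxml"
    # Otherwise the information would have to come from another component or a user.
    return None
```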
- Process 3100 includes sending the first-modality data to the requestor (3140). One or more additional components may be involved in sending the first-modality data to the requestor, either upstream or downstream.
- Process 3100 includes receiving a request for the second-modality data from the second device (3150). The request may be, for example, (i) a request for second-modality data at a URL identified by the information included in the sent message, (ii) a request for second-modality data at a URL determined from the information included in the sent message, or (iii) a request for second-modality data at an address pointed to by a web page at a URL identified by or determined from the information included in the sent message.
- Process 3100 includes sending the second-modality data to the second device (3160).
- One or more additional components may be involved in sending the second-modality data to the second device, and may be involved either upstream or downstream of the sending.
- a server may send data through a firewall to a gateway.
- Process 3100 includes sending a second message (3170).
- the second message is sent from the first device in response to receiving the request and for receipt by a third device.
- the second message includes second information allowing the third device to request third-modality data that includes third content that overlaps both the first content and the second content, with the third-modality data being configured to be presented using a third modality.
- the second information allows a third modality to synchronize with the first two modalities.
- the first-modality data, the corresponding second-modality data, and the corresponding third-modality data may be synchronized by presenting each to the user in such a manner that the user may respond to the overlapping content using either the first modality, the second modality, or the third modality.
- Process 3100 includes sending another message from the first device (3190).
- the other message is sent in response to receiving the other request, and is sent for receipt by another device.
- the other message includes third information that allows the other device to request second second-modality data that includes fifth content that overlaps the fourth content, with the second second-modality data being configured to be presented using the second modality.
- two users may each be using separate mobile communication devices to navigate a network such as the WWW, and each user's modalities may be synchronized. That is, the first user may have his/her two modalities synchronized and the second user may have his/her two modalities synchronized, but there need not be any synchronization between the two users.
- the second first-modality data and the second corresponding second-modality data may be synchronized by presenting the second first-modality data and the second corresponding second-modality data to a second user in such a manner that the second user may respond to the overlapping content using either the first modality or the second modality.
- web server 2750 may determine the URL of the corresponding HTML page and send the URL of the corresponding HTML page in the message rather than sending the URL of the VXML page (3130).
- Web server 2750 may send the requested VXML page to voice gateway 2720 (3140). Web server 2750 may receive a request for the corresponding HTML page from browser 2716, possibly through firewall 3010 (3150). Web server 2750 may send the corresponding HTML page to browser 2716 (3160).
- Web server 2750 may send a second message, with the second message going to a third-modality gateway (not shown) and including the URL of the VXML page, with the URL of the VXML page allowing the third-modality gateway to request corresponding third-modality data (3170).
- Web server 2750 may perform various operations of process 3100 using any of the server-push, browser-pull, voice-interrupt listener, or no-input tag implementations described earlier.
- a voice gateway requests a VXML page from a server (320; 3110), and the server sends a message to a browser indicating the corresponding HTML page (340-350; 3120).
- In browser-pull, for example, a voice gateway requests a VXML page from a server (410; 3110), and the server sends a response to a browser with an embedded command that updates the browser with the corresponding HTML page when the browser executes the embedded command (450; 3120).
- a browser requests a HTML page from a server (520; 3110), and the server sends a message to a voice gateway indicating the corresponding VXML page (540-550; 3120).
- In no-input tag, for example, a browser requests a HTML page from a server (620; 3110).
- the server has previously sent a no-input tag to a voice gateway allowing the voice gateway to request a JSP (610; 3120), and the server now updates the JSP with, for example, the address of the corresponding VXML page, thereby allowing the voice gateway to request the corresponding VXML page (640; 3120).
- a synchronization controller receives a request for a HTML page from a browser (1110; 3110), and the synchronization controller sends a message to a voice gateway so that the voice gateway requests the corresponding VXML page (1140; 3120).
- a synchronization controller receives a request for a HTML page from a browser (810; 3110), and the synchronization controller passes an identifier of the corresponding VXML page to a voice mode system (830; 3120).
- Browser 3216 and voice gateway 3220 are modified in that they can each send information to, and receive information from, browser adaptor 3218 and voice gateway adaptor 3224, respectively.
- Browser 2716 and voice gateway 2720 conversely, only receive information from browser adaptor 2718 and voice gateway adaptor 2724, respectively.
- web server 3230 is modified from web server 2750 in that web server 3230 does not include an adaptor nor include functionality associated with using an adaptor. Accordingly, web server 3230 does not publish messages. Messages are published, as well as received, by voice gateway adaptor 3224 and browser adaptor 3218.
- when browser 3216 receives input from a user requesting a HTML page, browser 3216 publishes (using browser adaptor 3218) a message to the unique messaging channel with the URL of the requested HTML page.
- Voice gateway adaptor 3224 receives the message and instructs voice gateway 3220 to request the corresponding VXML page from web server 3230.
- instead of the server publishing the URL to the voice gateway adaptor in operation 2975, browser adaptor 3218 publishes the URL.
- when voice gateway 3220 receives input from VoIP client 2724 requesting a VXML page, voice gateway 3220 publishes (using voice gateway adaptor 3224) a message to the unique messaging channel with the URL of the requested VXML page.
- Browser adaptor 3218 receives the message and instructs browser 3216 to request the corresponding HTML page from web server 3230.
- Browser adaptor 3218 and voice gateway adaptor 3224 may use the mechanisms described earlier to detect or obtain an IP address of mobile device 3210, or a user ID or device ID. Further, a login procedure may be used including, for example, a user entering login information into browser 3216 and voice gateway 3220 (using, for example, VoIP client 2727). Such login information may be used by web server 3230 (or some other component(s)) to authenticate and uniquely identify the user. A login procedure also may be used with the earlier implementations described for systems 2800 and 3000.
- System 3200 may be used to illustrate selected aspects of process 3100.
- mobile device 3210 may receive a request for a HTML page from a user (3110).
- Mobile device 3210 may send the URL of the requested HTML page to voice gateway 3220 in a message, with the URL allowing voice gateway 3220 to request the corresponding VXML page (3120).
- Mobile device 3210 may send the message using browser adaptor 3218, messaging handler 2770, and voice gateway adaptor 3224.
- mobile device 3210 may determine the URL for the corresponding VXML page (3130) and send the URL for the corresponding VXML page in the message to voice gateway 3220.
- Mobile device 3210 may send a second message including the URL of the requested HTML page, with the second message going to a third-modality device and the sent URL allowing the third-modality device to request the corresponding third-modality data (3170).
- voice gateway 3220 may receive a request for a VXML page (3110).
- voice gateway 3220 may determine the URL for the corresponding HTML page (3130) and send the URL for the corresponding HTML page in the message to browser 3216.
- Voice gateway 3220 may send a second message including the URL of the requested VXML page, with the second message going to a third-modality device and the sent URL allowing the third-modality device to request the corresponding third-modality data (3170).
- Process 3300 includes ascertaining the corresponding second data (3330).
- the corresponding data may be ascertained by, for example, receiving information indicating the corresponding second data, or determining the corresponding second data based on the first data.
- Process 3300 includes presenting the first content to a user using the first modality
- Process 3300 may be illustrated by, for example, system 3200.
- mobile device 3210 may request a VXML page (3310), the request being made to voice gateway 3220 using VoIP client 2727.
- Mobile device 3210 may thereafter automatically request the corresponding HTML page from web server 3230 (3320).
- Mobile device 3210 may receive the URL of the corresponding HTML page from voice gateway adaptor 3224 (3330), with the URL being received in a message at browser adaptor 3218.
- Mobile device 3210 may present the requested VXML page to a user using VoIP client 2727 and a speaker (3340), and may present the corresponding HTML page to the user using browser 3216 (3350).
- Various operations of process 3300 also may be performed by, for example, proxy or fused implementations. In a proxy implementation, for example, a synchronization controller requests a HTML page from a web server (1120; 3310), and the synchronization controller requests the corresponding VXML page (1270; 3320).
- a synchronization controller requests a HTML page from a web server (840; 3310), and the synchronization controller requests the corresponding VXML page by passing an identifier of the corresponding VXML page to a voice mode system (830; 3320).
- a device 730 (i) requests a HTML page (840; 3310), (ii) determines the corresponding VXML page (820; 3330), (iii) requests the corresponding VXML page (830; 3320), (iv) presents the requested HTML page after receiving the HTML page (see
- firewall 3010 may be performed by, for example, a proxy, a gateway, or another intermediary. Implementations may use multiple intermediaries in various configurations.
- An implementation may include any number of modalities, and the number of modalities may be, for example, fixed, variable but determined, or variable and unknown. The number of modalities may be fixed beforehand in a system, for example, that is specifically designed to support mobile devices communicating with a browser and voice and using two modalities. The number of modalities also may be variable but determined during an initial connection or power-up by a mobile device by, for example, having the system query a user for the number of modalities to be used.
- Providing corresponding data for non-navigation commands may include, for example, having a component enter text, change a preference, or provide a focus in another modality.
- Examples of various modalities include voice, stylus, keyboard/keypad, buttons, mouse, and touch for input, and visual, auditory, haptic (including vibration), pressure, temperature, and smell for output.
- a first modality may be defined as including voice input and auditory output
- a second modality may be defined as including manual input and visual and auditory output.
- a modality also may be restricted to either input or output.
- Interfaces for various modalities may include, for example, components that interact with a user directly or indirectly.
- Directly interacting components may include, for example and as previously described, a speaker.
- Indirectly interacting components may include, for example, a VoIP client that communicates with the speaker.
- Various implementations perform one or more operations, functions, or features automatically. Automatic refers to being performed substantially without human intervention, that is, in a substantially non-interactive manner. Examples of automatic processes include a process that is started by a human user and then runs by itself, or perhaps requires periodic input from the user. Automatic implementations may use electronic, optic, mechanical, or other technologies.
- FIG. 35 is a block diagram of a multimodal warehousing system 3500.
- a warehouse 3502 includes a first location 3504, a second location 3506, and a third location 3508, at each of which a worker 3510 or a manager 3512 may perform various tasks.
- the warehouse 3502 represents one or more warehouses for storing a large number of products for sale in an accessible, cost-efficient manner.
- the warehouse 3502 may represent a site for fulfilling direct mail orders for shipping the stored products directly to customers.
- the warehouse 3502 also may represent a site for providing inventory to a retail outlet, such as, for example, a grocery store.
- the warehouse 3502 also may represent an actual shopping location, i.e., a location where customers may have access to products for purchase.
- the locations 3504, 3506, and 3508 represent particular sites within the warehouse 3502 at which one or more products are shelved or otherwise stored, and are used below to illustrate particular functionalities of the multi-modal warehousing system 3500.
- an enterprise system, including a server system 3514, is in communication with a mobile computing device 3515 via a network 3516.
- the server system 3514 includes an inventory management system that stores and processes information related to items in inventory.
- the server system 3514 may be, for example, a standalone system or part of a larger business support system, and may access (via the network 3516) both internal databases 3517 storing inventory information and external databases 3518 which may store financial information (e.g. credit card information).
- access to the internal databases 3517 and the external databases 3518 may be mediated by various components, such as, for example, a database management system and/or a database server.
- Locations 3504, 3506, and 3508 and/or associated storage containers may be associated with different item types.
- the storage location of an item may be associated with a location and/or storage container by the server system 3514.
- the server system 3514 may provide the worker 3510 or the manager 3512 with, for example, suggestions on best routes to take to perform warehousing tasks.
- the server system 3514 may provide the mobile computing device 3515 with information regarding items that need to be selected from a storage area.
- This information may include one or more entries in a list of items that need to be selected.
- the entries may include a type of item to select (for example, 1/4" phillips head screwdriver), a quantity of the item (for example, 25), a location of the item (that is, stocking location), and an item identifier code.
- Other information, such as specific item handling instructions, also may be included.
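
As a minimal sketch of the item entry described above, the fields could be held in a small record type. The class name `PickEntry`, the field names, the identifier format `ITEM-0001`, and the wording of the prompt are assumptions made for illustration and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PickEntry:
    """One line of a pick list; field names are illustrative."""
    item_type: str      # e.g. '1/4" phillips head screwdriver'
    quantity: int       # number of units to pick
    location: str       # stocking location of the item
    item_id: str        # item identifier code (format is hypothetical)
    handling: str = ""  # optional item handling instructions

    def spoken_prompt(self) -> str:
        """Build a voice prompt for this entry."""
        return f"Please go to {self.location}. Pick {self.quantity} each."


entry = PickEntry('1/4" phillips head screwdriver', 25,
                  "Aisle 01, Section 01, Bin 10", "ITEM-0001")
print(entry.spoken_prompt())
```

Keeping the entry immutable (`frozen=True`) is one possible design choice: the record can then be shared between display and voice output without either side changing it.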
- warehouses such as the warehouse 3502 often are very large, so as to store large numbers of products in a cost-efficient manner.
- warehouses often provide difficulties to the worker 3510 attempting to find and access a particular item or type of item in a fast and cost-effective manner, for example, for shipment of the item(s) to a customer.
- the worker 3510 may spend unproductive time navigating long aisles while searching for an item type.
- the size and complexity of the warehouse 3502 may make it difficult for the manager 3512 to accurately maintain proper count of inventory.
- the worker 3510 fails to accurately note the effects of his or her actions; for example, failing to correctly note the number of items selected from (or added to) a shelf. Even if the worker 3510 correctly notes his or her activities, this information may not be properly or promptly reflected in the inventory database 3517.
- the warehouse system 3500 allows the worker 3510 multimodal access to warehouse and/or inventory data, and automates warehouse functionality when possible and practical. Examples of these multimodal techniques and capabilities, as well as associated automated functionalities, are discussed in detail below with reference to the locations 3504, 3506, and 3508 of the warehouse 3502.
- the worker 3510 may use a tote 3520 to collect, or "pick," a first item 3522 from a shelf 3524.
- the mobile computing device 3515 may be a portable device, such as a personal digital assistant ("PDA") 3526, that may be small enough to be carried by a user without occupying either of the hands of the user (e.g., may be attached to the user's belt).
- the PDA 3526 may receive item entries from the enterprise system 3514.
- all of the item entries may be downloaded at one time and stored as a "pick list" (that is, a list of items to select or pick) in the memory of the PDA 3526.
- the pick list may list the item entries in a predetermined order associated with the location of the items in the storage area. For example, the order of the item entries may correspond to an item selection order that optimizes the efficiency of the path taken by the user as he or she picks items in the storage area.
- the pick list may be stored in the server system 3514, and item entries may be downloaded to the PDA 3526 one at a time from the server system 3514.
- the next item entry is not accessed until the current item entry has been processed (that is, the items corresponding to the entry have been picked).
- the item entries also may be provided to the PDA 3526 a single entry at a time in a predetermined order associated with the location of the items in the storage area.
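
The ordered, one-entry-at-a-time delivery described above can be sketched as follows. This is only an illustration under stated assumptions: the class `PickSession`, the dictionary layout of an entry, and the use of a plain sort on an (aisle, section, bin) tuple as a stand-in for a path-optimizing order are all hypothetical.

```python
class PickSession:
    """Serves pick-list entries one at a time in location order."""

    def __init__(self, entries):
        # Assumption: each entry is a dict whose "location" is an
        # (aisle, section, bin) tuple, and sorting by that tuple
        # approximates an efficient walking order.
        self._queue = sorted(entries, key=lambda e: e["location"])
        self._index = 0

    def current(self):
        """Return the entry being worked on, or None when finished."""
        if self._index < len(self._queue):
            return self._queue[self._index]
        return None

    def mark_picked(self):
        """Close the current entry; only then expose the next one."""
        if self._index < len(self._queue):
            self._index += 1
        return self.current()


session = PickSession([
    {"item": "hammer", "location": (2, 1, 5)},
    {"item": "screwdriver", "location": (1, 1, 10)},
    {"item": "wrench", "location": (1, 3, 2)},
])
```

The same class could sit on the server (entries downloaded singly) or on the PDA (whole list downloaded once); the sketch does not model the network transfer.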
- the worker 3510 may efficiently move throughout the warehouse 3502 while collecting, counting, or distributing items. Results of these actions may be promptly and accurately reported to the server system 3514, so that inventory information is accurate and up to date.
- the manager 3512 may take a count, or "inventory,” of items 3542.
- the manager 3512 may use the PDA 3526 or the mounted screen 3540 to update the internal warehouse databases 3517 via the server system 3514.
- the manager 3512 also may receive directions from the server system 3514 about how to conduct the inventory.
- the server system 3514 may instruct the manager 3512 on which items to count, and/or in what order.
- the techniques described above for enabling multimodal capabilities may be implemented in the picking, stocking, or counting techniques just described.
- the server system 3514 may include a server 3544 and a format determination system 3546, which may generally represent, for example, the server system 110 and synchronization controller 120 of FIG. 1.
- the format determination system may be implemented in the PDA 3526, as shown, for example, in FIG. 7.
- the worker 3510 and/or manager 3512 may have simultaneous access to various different modes of input/output, so as to increase the ease and efficiency of their duties.
- the worker 3510 may use a voice-recognition functionality to notify the server system 3514 of the worker's current location or job status. This ability allows the hands of the worker 3510 to remain free for selecting items for placement into the tote 3520.
- voice input becomes non-preferred, for example, if the worker 3510 enters a noisy area of the warehouse 3502
- other modalities will be available to the worker 3510, such as the bar code 3535 or stylus input into the PDA 3526.
- the worker 3510 may print an order from the PDA 3526 before entering a noisy warehouse area.
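
One way to picture the fallback between modalities is a function that ranks the input modes by ambient noise. This is a sketch only: the function name, the 85 dB limit, and the idea of measuring noise in decibels are assumptions introduced for illustration; the source says only that voice becomes non-preferred in a noisy area while bar code and stylus input remain available.

```python
def preferred_input_modes(noise_db, noise_limit_db=85.0):
    """Rank input modes, demoting voice when the area is noisy.

    noise_limit_db is a hypothetical threshold, not a value from the source.
    """
    modes = ["voice", "barcode", "stylus"]
    if noise_db >= noise_limit_db:
        # Voice stays available but moves to the end of the ranking.
        modes.remove("voice")
        modes.append("voice")
    return modes


print(preferred_input_modes(60.0))
print(preferred_input_modes(95.0))
```

Voice is demoted rather than disabled so that all modes stay usable, which matches the idea of simultaneous access to several modes.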
- similar advantages exist in, for example, stocking and counting scenarios.
- the techniques described above for enabling multimodal capabilities may be implemented in various scenarios associated with a warehouse environment. For example, the techniques may be implemented in moving, shipping, and receiving scenarios.
- FIG. 36 is a flow chart of a process 3600 enabling the server system 3514 to interact with mobile and stationary devices in a warehouse environment.
- a user of the mobile or stationary device is authorized to perform warehouse duties (3604). This authorization may include, for example, having the user enter a name and password. This information is verified by the server system 3514.
- the server system 3514 then provides multi-modal interfaces to the mobile or stationary device (3606). For example, the server may enable both voice and stylus input at a mobile device of the user, so that the user may input information described below.
- the user requests a job type in a chosen mode, e.g., using voice input (3608).
- the job type may include, for example, selection of item(s) for stocking, picking, or counting (taking inventory).
- the server system 3514 then correlates the information received from the user (3610) with the information in the internal warehouse databases 3517.
- it may be necessary to correlate a response to a job type request received in HTML by way of a stylus input with response data formatted in VXML, so as to continuously provide the user with the option of using both stylus and voice inputs as the user communicates over time with the server system 3514.
- the information of the pick list is also provided in a VXML (voice extensible markup language) format to a voice gateway that communicates with the mobile device for input/output using a microphone and a speaker on the mobile device.
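
To illustrate how the same pick-list line might be offered in both formats, the sketch below renders one entry as an HTML table row and as a small VoiceXML document. The function names, the dictionary layout, and the prompt wording are assumptions; only the general idea of parallel HTML and VXML output comes from the text above.

```python
import xml.etree.ElementTree as ET
from html import escape

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def render_html(entry):
    """Render one pick-list line as an HTML table row for the display."""
    cells = (entry["location"], entry["item"], str(entry["quantity"]))
    return "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>"


def render_vxml(entry):
    """Render the same line as a VoiceXML prompt for a voice gateway."""
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    block = ET.SubElement(ET.SubElement(root, "form"), "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = (f"Please go to {entry['location']}. "
                   f"Pick {entry['quantity']} each.")
    return ET.tostring(root, encoding="unicode")


line = {"location": "Aisle 01, Section 01, Bin 10",
        "item": '1/4" phillips head screwdriver', "quantity": 5}
print(render_html(line))
print(render_vxml(line))
```

Both renderings are derived from one entry, so a change to the entry shows up in the visual and the voice output alike.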
- the implementation also provides communication between the enterprise system and an RFID gateway.
- the RFID gateway receives input from an
- System 3700 includes a tote 3750 for collecting items that are selected by, for example, a person or a machine (a "picker"), such as the worker 3510.
- Tote 3750 includes a label 3752 of "Tote," a bar code 3754, and a communication device 3756, such as, for example, an RFID reader for communicating with RFID tags.
- the tote 3750 may represent any device for carrying items, such as, for example, a cart (including a shopping cart that may be used in a retail environment).
- System 3700 includes a portable digital assistant (“PDA”) 3760 including a display 3762.
- FIG. 38A shows several elements of system 3700 as first item 3730 is being placed into tote 3750. Arrows 3810 indicate that the first item 3730 is being placed into the tote 3750.
- the RFID tag 3734 communicates with the device 3756 to identify the item 3730. Communication between RFID tag 3734 and device 3756 is indicated by a dashed line 3820.
- the picker may read instructions on the PDA 3760 or an overhanging display associated with a particular bin or group of bins (i.e., it should be understood that the above-described multi-modal architecture(s) may split modalities for the same user across multiple devices).
- voice confirm may be performed with phonetically 'distant' words. That is, instead of reading out the bin numbers, the picker may read out words (associated with and displayed on the bins, for example) that are phonetically distinctive to improve recognition thereof by the associated voice-recognition system.
- An implementation of a particular pick list is depicted in FIGS. 38B-38F.
- the worker 3510 may hear: "Enter your User ID" when they see a first screen 3872. If a resulting spoken ID is correct, a second screen 3874 shown in Fig. 38C may result (there also may be other requirements, such as a password, entered using the stylus/keypad). As the screen 3874 comes up, the worker 3510 may hear: "Please scan or enter the tote number." The worker 3510 may then scan the barcode label 3754 on the tote 3750 by using the barcode scanner in the mobile device 3760.
- the device 3760 may output the verbal command to "Pick 5 each.”
- the worker 3510 picks the right quantity of the item and says “done.”
- the worker 3510 may read out check-digits or a check-word from under the relevant bin. These digits/words may be phonetically as distinct from each other as possible for closely placed bins, and/or may be random. They also may be changed regularly.
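
A rough sketch of assigning distinct check-words to neighboring bins is given below. It uses spelling edit distance as a crude stand-in for phonetic distance, which is an assumption: a real system would more likely compare phoneme sequences. The function names, the greedy strategy, and the sample vocabulary are likewise hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def assign_check_words(bins, vocabulary):
    """Greedily give each bin the unused word farthest from its neighbor's."""
    if len(vocabulary) < len(bins):
        raise ValueError("need at least one word per bin")
    remaining = list(vocabulary)
    assigned = {}
    previous = None
    for bin_id in bins:
        if previous is None:
            word = remaining[0]
        else:
            word = max(remaining, key=lambda w: edit_distance(w, previous))
        remaining.remove(word)
        assigned[bin_id] = word
        previous = word
    return assigned


words = assign_check_words(["bin 10", "bin 11", "bin 12"],
                           ["cat", "bat", "zebra", "hat"])
```

Rerunning the assignment with a reshuffled vocabulary would be one simple way to change the check-words regularly.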
- the worker 3510 may then continue line by line down the transfer order.
- the line item being worked may be highlighted (e.g., by color) with item description attached.
- the worker 3510 also may click on a checkbox 3882 in a left column of each line 3884 item to indicate completion of the corresponding task(s).
- the worker 3510 then sees a fourth screen 3886. If the bin is expected to be empty after the worker 3510 has picked, the worker 3510 may be asked to confirm the same.
- the worker 3510 may hear: "Is the bin empty?” and may then reply "yes” if it is, or "no” if it is not, in which case the worker 3510 may then be asked: "What's the observed quantity?”
- the worker 3510 may subsequently say or enter with stylus the number of items left in the bin into a field 3888, and then say or click a second
- the worker 3510 picks items that are tagged with AutoID chips, such as, for example, RFID tags.
- the tote/box into which the picked items are placed has at least one reader for these chips.
- the tote also may have a barcode or RFID tag to be used to identify the tote.
- the reader confirms the item to the server and the worker 3510 receives the next set of instructions.
- the interaction, described above, relating to the screen of Fig. 38D may be as follows.
- the worker 3510 hears: "Please go to Aisle 01, Section 01, Bin 10. (Pause) Pick 5 each.”
- the worker 3510 picks the right quantity of the item and places them in the tote.
- the worker 3510 may be instructed to the next line item. This process is repeated by continuing line by line down the transfer order.
- the line item being worked on may be highlighted, for example, in yellow on the PDA 3760 with item description attached.
- the display of the line item and, optionally, additional line items in the order (pick list) may help the worker 3510 to remain oriented in the pick list and to remember the current instruction.
- an RFID tag provides a mode of input that can speed the picking process and increase the accuracy of the picking process.
- One implementation includes the modes of voice, stylus/display, bar code scanning (of bins or totes, for example), RFID tag reading (of products, bins, totes, for example).
- Other modes are possible, and each of the modes may interact with the system and update the system.
- the worker 3510 may, for example, be allowed to use voice commands to update the pick list if an RFID tag is missing from a product.
- Communication between the RFID tag 3734 and the device 3756 may follow a variety of protocols, several of which are described in the following implementations.
- RFID tag 3734 is continually transmitting (as is RFID tag 3744), and device 3756 responds to the strongest signal, making an implicit assumption that the strongest signal belongs to the RFID tag that is physically closest to device 3756.
- device 3756 requires a minimum received power before responding, the minimum received power indicating that the RFID tag is within a certain distance.
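
The two selection rules described above (strongest signal, optionally gated by a minimum received power) can be sketched in a few lines. The function name, the use of dBm readings, and the sample values are assumptions for illustration; the source does not specify units or an interface.

```python
def select_tag(readings, min_power_dbm=None):
    """Pick the tag to respond to from {tag_id: received power in dBm}.

    The strongest signal is assumed to come from the physically closest tag.
    If min_power_dbm is given, a weaker strongest signal is ignored.
    """
    if not readings:
        return None
    tag_id, power = max(readings.items(), key=lambda kv: kv[1])
    if min_power_dbm is not None and power < min_power_dbm:
        return None
    return tag_id


# Hypothetical readings for the two tags mentioned in the text.
print(select_tag({"3734": -40.0, "3744": -62.0}))
print(select_tag({"3734": -70.0, "3744": -80.0}, min_power_dbm=-60.0))
```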
- the information that an RFID tag includes may vary with implementation or item, including, for example, an ID number alone, an item description, a manufacturing date or other manufacturing information, and/or storage information.
- an RFID tag on an item of food may include an ID number, an item description, a manufacturing date or a shelf-life, and a storage temperature.
- the PDA 3760 is in communication with a central system (server
- the PDA 3760 may use a wireless network to upload information when a pick-list has been fully selected, and to download a new pick list. PDA 3760 may use a wired network to achieve the same effect and may download/upload information in batch mode for purposes of efficiency.
- the bar codes illustrated in FIGS. 37 and 38 may be used to achieve a variety of design objectives.
- the PDA 3760 may include a bar code reader to scan bar code 3727 before first item 3730 is placed into tote 3750. The PDA 3760 may then verify, for example, that the picker has gone to the correct bin.
- bar codes may be scanned during an inventory-verification process to indicate which bin is being inventoried.
- bar codes may be scanned during a restocking process to indicate the bin into which an item is being restocked.
- FIG. 39 shows a PDA 3960 that is similar to the PDA 3760, but that explicitly includes additional features enabling additional modes of communication with, for example, a picker, a bin, or a tote.
- the PDA 3960 includes a display 3962 allowing information to be displayed and to be input using, for example, a stylus.
- the PDA 3960 includes a keyboard 3964 and a microphone 3965 allowing a picker to enter information by touch or voice, respectively.
- the PDA 3960 includes a speaker 3966 allowing information to be audibly output.
- the PDA 3960 includes a bar code scanner 3967 for scanning a bar code on, for example, a tote, a bin, or an item.
- the PDA 3960 includes a communication device 3968 for communicating with, for example, the device 3756 (indirectly or directly), or, in other implementations, with the RFID tag 3734.
- Communication device 3968 may use, for example, RF technology, infrared technology, or a hard-wired connection (hard-wired to, for example, a tote).
- the PDA 3960 also includes a credit card reader 3969 so that financial transactions may be completed using the PDA 3960.
- the various communication modalities illustrated in the PDA 3960 can be integrated so that as each is used to interact with information, such as, for example, a pick-list, the information is updated in the various output modalities and accessible in the various input modalities. As referred to above, this allows varied presentations of the information, and also allows for increased efficiency and reduced workflow errors.
- Tag readers also may be varied, including, for example, RFID readers, barcode scanners, polymer tag readers, and sensors.
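
The integration of modalities around shared information can be pictured as a single pick-list object that notifies every output modality whenever any input modality changes it. The sketch below is an assumption-laden illustration: the class `SharedPickList`, the callback signature, and the log formats are hypothetical and are not the synchronization mechanism of the source.

```python
class SharedPickList:
    """Pick-list state shared by all modalities (illustrative only)."""

    def __init__(self, lines):
        self.done = {line: False for line in lines}
        self._listeners = []

    def subscribe(self, listener):
        """Register an output modality as a callable (line, via)."""
        self._listeners.append(listener)

    def complete(self, line, via):
        """Mark a line done through any input modality and notify outputs."""
        self.done[line] = True
        for listener in self._listeners:
            listener(line, via)


display_log, speech_log = [], []
pick_list = SharedPickList(["line 1", "line 2"])
pick_list.subscribe(lambda line, via: display_log.append(f"[x] {line}"))
pick_list.subscribe(
    lambda line, via: speech_log.append(f"{line} confirmed by {via}"))
pick_list.complete("line 1", via="voice")
```

Here a spoken "done" updates the display as well as the voice channel, so the worker can switch modes without losing the current position in the list.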
- the data on a tag may be automatically read by a tag reader. Such reading may be, for example, continuous, a periodic scan, or a scan that is triggered by, for example, a proximity sensor.
- Readers may be positioned, for example, on a manufacturing line, in storage locations, in shipping and receiving areas, at loading docks, within trucks or other moving vehicles, and also may be hand-held wireless-connected devices.
- Process 4100 includes inputting item information into PDA 3960 or, for example, some other device or system (4120).
- the item information may include, for example, (i) a name or description of the item, such as, for example, label 3732, (ii) a SKU, product number, or other identifying number, such as, for example, identification number 4030, or (iii) the information conveyed by a bar code, such as, for example, bar code 4020.
- Inputting the item information may include, for example, a user speaking the item information into microphone 3965, scanning the item information with bar code scanner 3967, and entering the item information using keyboard 3964 or a stylus operating with display 3962.
- an RFID reader receives the item information from an RFID tag on the item, with the RFID reader being located on tote 3750, for example.
- the modality used to input the item information into PDA 3960 is one of multiple modalities used in process 4100 (4120). As indicated throughout the discussion of process 4100, various modalities may be used in several of the operations. Process 4100 is characterized by the use of at least two different modalities, although as stated above, process 4100 may be adapted to use only a single modality. Process 4100 optionally includes outputting the item information from PDA 3960 after the item information is input (4130). For example, after a user scans bar code 4020 on first item 3730 (see FIG. 40A) using bar code scanner 3967, PDA 3960 may output the item information on display 3962 or speaker 3966 so that the user can receive the item information. The item information may be displayed throughout process 4100 so that the user can refer back to the item information as needed.
- first bin 3710 includes a bin RFID reader that receives a transmission from each item placed into the bin.
- the bin RFID reader may be designed so that it only receives transmissions from items that are placed into first bin 3710.
- the bin RFID reader may transmit the received information through RFID gateway 3830 to server 3850, and server 3850 may communicate all or part of the information to PDA
- process 4100 can be used to stock an item on a shelf in a store as well as to stock an item in a bin in a warehouse.
- Various differences may exist between the environments, such as, for example, the shelves in a store may not have bar codes, and the exact implementation of process 4100 may need to be altered to accommodate these differences.
- a store worker uses a headset communicating with server 3850 through voice gateway 4070, without the use of PDA 3960.
- the store worker picks up various items to be restocked (4110), speaks the SKU of an item into the headset (4120), receives a voice command over the headset indicating the shelf where the item is to be stocked (4140), walks to the indicated shelf (4150), places the item on the shelf (4170), and speaks "done" into the headset to inform server 3850 that the item has been placed on the indicated shelf (4180).
- the store worker then repeats the process for each item that needs to be stocked.
- a process 4200 is shown for taking an inventory of an item using, for example, the system of FIGS. 40A-40B.
- Process 4200 may be used to put an item in a bin in a warehouse, on a shelf in a store, or in some other environment as well.
- the modality used to output the storage location from PDA 3960 is one of multiple modalities used in process 4200 (4210). As indicated throughout the discussion of process 4200, various modalities may be used in several of the operations.
- Process 4200 includes outputting from PDA 3960 an indication of an item to count in the storage location (4240).
- the indication may include, for example, (i) a name or description of the item, such as, for example, label 3732, and (ii) a SKU, product number, or other identifying number, such as, for example, identification number 4030.
- PDA 3960 may step the user through each item serially, prompting the user to determine the inventory of each item in turn. Implementations also may allow the user to indicate that the storage location contains an additional type of item that was not output by PDA 3960. PDA 3960 may output the indication using, for example, any of the techniques described with respect to operations 4130 or 4270 above.
- Process 4200 includes the user counting the inventory of the item in the storage location (4250) and inputting the inventory of the item into PDA 3960 (4260).
- the user may input the inventory (4260) by, for example, speaking a quantity into microphone
- a separate operation may be used to verify that the user is counting the correct item.
- Such a separate operation may include, for example, any of the techniques described with respect to operation 4120 above.
- such a separate operation also may include, for an item with an RFID tag, selecting one of the items from the storage location and placing the item within receiving range of an RFID reader.
- Certain warehouse environments may include shelf RFID readers, in which case the inventory of items having RFID tags may be continually updated with real-time data.
- Process 4200 may be used in these environments to verify the inventory indicated for one or more items.
- Inventory adjustments may be performed, including, for example, placing an order for items that have a low inventory. Inventory adjustments may be performed independently of an inventory process and may be based on, for example, a computer record of inventory and of the volume and timing of sales. Process 4200 may be used to ensure that the computer record of inventory is accurate and, thus, that the reordering process is based on accurate information.
- process 4200 can be used to inventory an item on a shelf in a store as well as to inventory an item in a bin in a warehouse.
- Various differences may exist between the environments, such as, for example, the shelves in a store may not have bar codes, and the exact implementation of process 4100 may need to be altered to accommodate these differences.
- Process 4200 describes a process for inventorying an item. As indicated earlier, if process 4200 is repeated, the inventory may be taken, for example, for a shelf that contains multiple items, for an area that contains multiple shelves, and for a warehouse or store that contains multiple areas. Accordingly, process 4200 may be used, for example, to perform an annual physical inventory of an entire warehouse, or to perform some form of cycle counting.
- Cycle counting can be defined as any regularly recurring inventory (counting) program that counts less than the entire physical inventory each time. Many variations of cycle counting can be used or created, such as, for example, counting each item once per year or counting certain items more frequently than others.
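
A cycle-counting schedule of the kind described above, in which some items are counted more often than others, might be sketched as follows. The function names, the 365-day year, the day numbering from 0, and the sample frequencies are assumptions for illustration only.

```python
def is_due(counts_per_year, day, days_in_year=365):
    """True if an item counted counts_per_year times is due on this day.

    Days are numbered 0 .. days_in_year - 1. Counts are spread evenly:
    an item is due whenever its running count index advances.
    """
    today = (day * counts_per_year) // days_in_year
    yesterday = ((day - 1) * counts_per_year) // days_in_year
    return today != yesterday


def items_due(frequencies, day):
    """Return the items to count on the given day, sorted by name."""
    return sorted(item for item, n in frequencies.items() if is_due(n, day))


# Hypothetical frequencies: a fast mover monthly, a slow mover yearly.
frequencies = {"screwdriver": 12, "pallet jack": 1}
print(items_due(frequencies, 0))
print(items_due(frequencies, 31))
```

Each day's list covers less than the entire inventory, while every item is still counted its assigned number of times over the year.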
- the system may respond with an indication that, for example, a restocking order needs to be placed, a restocking order has already been placed, or inventory for restocking has been received and is waiting to be put in the bin.
- the "shoot the hole" inventory process is also referred to as "ad hoc" cycle counting because less than the entire physical inventory is inventoried in each count, but the items are not necessarily counted in a regularly recurring manner.
- Product information also may include details about products, as well as information about how the products relate to each other, such as, for example, complementarity between products (e.g., ice cream and ice cream toppings).
- customers who cannot find a desired product, or cannot access information about the product, may leave the store without making a purchase.
- customers who spend inordinate amounts of time searching for products, or waiting to complete a transaction for the products, may not return to the store for future purchases.
- product information may be available to the store operators, but may be inaccessible to the customer while in the store.
- the product information should be accurate and up-to-date, so that the operator may ensure that products are ordered, priced, and stocked in a timely manner.
- Implementations described below facilitate a customer's shopping experience by providing information to and about the customer.
- implementations operate across a plurality of devices, and provide multi-modal access to store information.
- the store 4300 may be operated in a more efficient manner, so that sales are increased and customer satisfaction and loyalty are improved.
- the customer 4302 may be provided with a cart 4304 into which products may be placed and transported through the store 4300.
- the cart 4304 may be
- the RFID gateway 3830 may receive input from a plurality of RFID-enabled carts 4304.
- the server system 3514 may then use this received input from the RFID gateway 3830 to track the movement of customers 4302 through the store 4300. For example, the server system 3514 may discern a bottleneck of RFID-enabled shopping carts 4306 and alert store workers 3510 to open a new checkout line.
- the server system 3514 also may record customer "linger" by shelves or products through tracking the RFID-enabled shopping carts 4306.
- the recorded customer "linger time" may then be later analyzed for a correlation between "linger time" and sales of the corresponding product.
- the server system 3514 may be used, for example, to provide visualization information of RFID-enabled cart 4304 movement patterns, sales flow of goods (i.e., what sells when), and groups of goods purchased.
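
The cart tracking described above could be approximated from a stream of cart sightings. In this sketch the tuple layout `(timestamp, cart, zone)`, the zone names, and the `max_carts` threshold are assumptions, and the sample data is made up purely to show the arithmetic.

```python
from collections import defaultdict


def linger_seconds(sightings):
    """Total time carts spent per zone.

    sightings: list of (timestamp_s, cart_id, zone), sorted by timestamp.
    Time between two sightings of a cart is credited to the earlier zone.
    """
    totals = defaultdict(float)
    last_seen = {}
    for ts, cart, zone in sightings:
        if cart in last_seen:
            prev_ts, prev_zone = last_seen[cart]
            totals[prev_zone] += ts - prev_ts
        last_seen[cart] = (ts, zone)
    return dict(totals)


def checkout_bottleneck(current_zones, zone="checkout", max_carts=5):
    """True if more than max_carts carts are currently in the given zone."""
    return sum(1 for z in current_zones.values() if z == zone) > max_carts


sightings = [(0, "c1", "dairy"), (30, "c1", "dairy"),
             (45, "c1", "bakery"), (50, "c1", "checkout")]
print(linger_seconds(sightings))
```

The per-zone totals are the kind of figure that could later be compared with sales of the products shelved in each zone.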
- the product information and the customer information may be multimodally accessible to one or both of the worker 3510 and the customer 4302 using a mobile device, such as, for example, a customer PDA implemented as the PDA 3960, or a stationary device, such as, for example, a manager's portal 4310 or an information kiosk 4312.
- the PDA 3960 may be mounted onto the cart 4304.
- Product information may be accessed by scanning an identification tag 4314 on an item 4316, such as, for example, an RFID tag or a bar code, using the communication device 3968 on the PDA 3960 or a similar communication device 4318 on the information kiosk 4312.
- the manager's portal 4310 refers generally to information available to store operators that is designed to enable efficient and cost-effective administration of the store
- the portal 4310 may be, for example, an Internet or Intranet website that is available to the manager from an enterprise-wide server system, which could include, or be a part of, the server system 3514.
- the portal 4310 also may represent locally-stored inventory information, which may or may not be shared with other store locations. Even more generally, the portal 4310 may be understood to represent any information that is available to a store manager or other personnel and that might be helpful to shopping customers.
- the kiosk 4312 should be understood to represent any publicly available computing device that may be used to locally present information to the shopping public.
- the kiosk 4312 may have multiple input/output modes, including at least any of the modalities discussed herein.
- the kiosk 4312 may include a single station having multiple substations (e.g., multiple sets of displays and I/O interfaces), or may include a number of computing devices placed throughout the store 4300.
- FIG. 44 is a flowchart of examples of ways the customer 4302 may access the product information stored on the server system 3514. As the customer 4302 enters a sales area (4401), the customer 4302 may access the product information by using the information kiosk 4312 or the PDA 3960 (4402).
- the customer 4302 may be prompted by the server system 3514 to identify herself, such as, for example by entering a user name and password (4404).
- the identification of the customer 4302 enables the server system 3514 to, for example, access a purchase history for the customer (4406).
- the customer 4302 may then be asked for a predetermined shopping list (4408). If the customer 4302 has a shopping list (4410), the list may be input in one of a plurality of modalities (4412).
- the customer 4302 may scan the item 4316 using the PDA 3960, as described above, and query (using one of a plurality of modalities) the server system 3514: "What goes well with this product?" The server system 3514 may then output a multi-modal interface with suggestions based on previous customer preferences, other customer preferences, and excess inventory and/or promotions.
- the customer 4302 and the worker 3510 may access customer information, such as, for example, financial information, in conjunction with product information to purchase a product using the PDA 3960, the manager's portal 4310, and/or the information kiosk 4312.
- the customer may scan the identification tag 4314 using the information kiosk 4312 or the PDA 3960.
- the server system 3514 may then note items and alter the checkout system to consolidate items for ready pickup and/or delivery. If the customer 4302 is authenticated, financial information may be accessed by the server system 3514, as described above, and a financial transaction may be completed either by the worker 3510 or by the customer 4302.
- Inputting the product identifier may include, for example, the customer 4302 speaking the item information into the microphone 3965, scanning the item information with the bar code scanner 3967, and entering the item information using the keyboard
- an RFID reader receives the item information from an RFID tag on the product, with the RFID reader 4306 located on the cart 4304, for example.
- a printer that is either built-in such as in, for example, a calculator having an integrated reel printer, or is detached and connected over, for example, a wireless connection.
- Process 4500 optionally includes inputting a request for additional product information into PDA 3960 (4550) and, optionally, outputting the additional product information from PDA 3960 (4560).
- Inputting the request may include, for example, navigating through one or more screens to request price information for a displayed product. Such navigation may include, for example, using a stylus or voice command.
- Process 4500 includes inputting payment information for the customer 4302 into the PDA 3960 (4570).
- Payment information may be input using, for example, voice input over microphone 3965, keyboard 3964, and a stylus or other mode of input for display
- Personal information also may be entered, perhaps as part of the payment information. Implementations may, for example, use one or more of the techniques and approaches described earlier. In one implementation that allows voice input, the implementation prompts for address information in a "reverse" order — for example, state, then zip code, then city, then street address — to allow for smaller grammars and better search results.
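
The benefit of prompting for an address in "reverse" order can be illustrated by computing, at each step, the set of answers a recognizer would need to accept. The records below are invented placeholders (including the zip codes), and the function name and tuple layout are assumptions; the sketch shows only how each answer narrows the next grammar.

```python
# Illustrative placeholder records: (state, zip code, city, street).
ADDRESSES = [
    ("CA", "00001", "Springfield", "Oak St"),
    ("CA", "00001", "Springfield", "Elm St"),
    ("CA", "00002", "Riverton", "Main St"),
    ("NY", "00003", "Springfield", "Main St"),
]


def grammar_for(chosen):
    """Return the accepted answers for the next field.

    chosen holds the answers given so far, in the order
    state, zip code, city. Only matching records contribute.
    """
    depth = len(chosen)
    prefix = tuple(chosen)
    return sorted({row[depth] for row in ADDRESSES if row[:depth] == prefix})


print(grammar_for([]))                              # states
print(grammar_for(["CA"]))                          # zip codes in CA
print(grammar_for(["CA", "00001", "Springfield"]))  # streets
```

With all three earlier answers known, the street grammar contains two entries instead of the three distinct street names in the whole table; on a realistic address database the reduction would be far larger.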
- Implementations of process 4500 may use multiple modalities in performing the various input and output operations. Implementations of process 4500 need not be performed with hand-held or mobile devices but may be performed with, for example, one or more fixed-location computers on a sales floor, such as, for example the information kiosk 4312.
- the worker 3510 may click on, or say the name of, any of the links 4602, 4604, 4606, 4608, and 4610 of a screen 4612 to perform a desired operation.
- the worker 3510 may want to search for particular products by selecting the "Sales Catalog" link 4602.
- a screen 4614, shown in FIG. 46B, may come up. There are multiple possibilities for a product search.
- the worker may have said or entered an existing customer's name into a field 4658, accessing the server system 3514's customer information.
- the worker 3510 may then select a delivery type 4660 and a payment method 4662.
- the worker 3510 may review the order by selecting "review the order"
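
The "reverse" order address prompting described above for process 4500 (state, then zip code, then city, then street address) can be illustrated with a minimal sketch. It assumes that each answer filters a list of known addresses, so that the grammar for the next prompt only needs the values still possible. The `Address` type, the sample records, and the function names are hypothetical and are not taken from the patent; a real implementation would pass each grammar to a speech recognizer rather than compare strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    state: str
    zip_code: str
    city: str
    street: str


# Hypothetical sample data standing in for a real address database.
ADDRESSES = [
    Address("CA", "94304", "Palo Alto", "12 Oak St"),
    Address("CA", "94304", "Palo Alto", "48 Elm Ave"),
    Address("CA", "94105", "San Francisco", "7 Pine Rd"),
    Address("NY", "10001", "New York", "301 Main St"),
]

# "Reverse" order: broadest field first, most specific field last.
PROMPT_ORDER = ("state", "zip_code", "city", "street")


def grammar_for(field, candidates):
    """Distinct values of `field` among the addresses still possible."""
    return sorted({getattr(a, field) for a in candidates})


def narrow(candidates, field, spoken):
    """Keep only the addresses whose `field` matches the recognized answer."""
    wanted = spoken.strip().lower()
    return [a for a in candidates if getattr(a, field).lower() == wanted]


def collect_address(answers, addresses=ADDRESSES):
    """Walk the prompts in reverse order, shrinking the candidate set each step.

    `answers` maps each field name to the (already recognized) user answer.
    Returns the matched address and the grammar size used at each prompt.
    """
    candidates = list(addresses)
    grammar_sizes = []
    for field in PROMPT_ORDER:
        grammar = grammar_for(field, candidates)
        grammar_sizes.append(len(grammar))
        candidates = narrow(candidates, field, answers[field])
        if not candidates:
            raise ValueError(f"{answers[field]!r} is not in the {field} grammar")
    return candidates[0], grammar_sizes
```

In this sketch the street prompt only has to distinguish between the streets that share the chosen state, zip code, and city, rather than every street in the data set, which is the motivation the text gives for the reverse order.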
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Entrepreneurship & Innovation (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Game Theory and Decision Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Applications Claiming Priority (11)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US45476203P | 2003-03-14 | 2003-03-14 | |
| US454762P | 2003-03-14 | ||
| US47089803P | 2003-05-16 | 2003-05-16 | |
| US470898P | 2003-05-16 | ||
| US47421703P | 2003-05-30 | 2003-05-30 | |
| US474217P | 2003-05-30 | ||
| US743348 | 2003-12-23 | ||
| US10/743,343 US20040181467A1 (en) | 2003-03-14 | 2003-12-23 | Multi-modal warehouse applications |
| US10/743,348 US7603291B2 (en) | 2003-03-14 | 2003-12-23 | Multi-modal sales applications |
| US743343 | 2003-12-23 | ||
| PCT/US2004/007724 WO2004084024A2 (en) | 2003-03-14 | 2004-03-12 | Multi-modal warehouse applications |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP1606748A2 (de) | 2005-12-21 |
Family
ID=33033340
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP04720473A Ceased EP1606748A2 (de) | 2003-03-14 | 2004-03-12 | Multimodus-warehouse-anwendungen |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP1606748A2 (de) |
| WO (1) | WO2004084024A2 (de) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1969540A1 (de) * | 2005-12-05 | 2008-09-17 | Sap Ag | Handling exceptional situations in a warehouse management |
| US8388410B2 (en) | 2007-11-05 | 2013-03-05 | P.R. Hoffman Machine Products, Inc. | RFID-containing carriers used for silicon wafer quality |
| US10552786B2 (en) * | 2014-12-26 | 2020-02-04 | Hand Held Products, Inc. | Product and location management via voice recognition |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2001069422A2 (en) * | 2000-03-10 | 2001-09-20 | Webversa, Inc. | Multimodal information services |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7685252B1 (en) * | 1999-10-12 | 2010-03-23 | International Business Machines Corporation | Methods and systems for multi-modal browsing and implementation of a conversational markup language |
| AU2919401A (en) * | 1999-10-28 | 2001-05-08 | Supersale.Com | Method and apparatus for acquiring and providing inventory data |
| US6732934B2 (en) * | 2001-01-12 | 2004-05-11 | Symbol Technologies, Inc. | Escorted shopper system |
2004
- 2004-03-12 EP EP04720473A patent/EP1606748A2/de, status: not active (Ceased)
- 2004-03-12 WO PCT/US2004/007724 patent/WO2004084024A2/en, status: not active (Ceased)
Also Published As
| Publication number | Publication date |
|---|---|
| WO2004084024A2 (en) | 2004-09-30 |
| WO2004084024A3 (en) | 2005-02-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7603291B2 (en) | | Multi-modal sales applications |
| US20040181467A1 (en) | | Multi-modal warehouse applications |
| US11995698B2 (en) | | System for virtual agents to help customers and businesses |
| US11632341B2 (en) | | Enabling communication with uniquely identifiable objects |
| EP2149114B1 (de) | | System and method for obtaining merchandise information |
| EP1481328B1 (de) | | User interface and dynamic grammar in a multi-modal synchronization architecture |
| AU2005224747B2 (en) | | Real-time sales support and learning tool |
| US8521585B2 (en) | | System and method for using voice over a telephone to access, process, and carry out transactions over the internet |
| US20250086651A1 (en) | | Leveraging data for platform support using large language machine-learned model-based agents |
| CN101014951A (zh) | | Customer kiosk |
| WO2004084024A2 (en) | | Multi-modal warehouse applications |
| CN120071936A (zh) | | Intelligent shopping-guide speech recognition method and system based on transfer learning |
| WO2025178831A1 (en) | | Generating database query using machine-learned large language models |
| CN1788277A (zh) | | Multi-modal warehouse applications |
| TW474090B (en) | | Speech-enabled information processing |
| US20250147954A1 (en) | | Database search based on machine learning based language models |
| US20210383432A1 (en) | | System and method for interactive business promotion based on artificial intelligence |
| KR20010107215A (ko) | | Purchasing method using an Internet phone equipped with a speech recognition server and a call origin location tracking device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | 17P | Request for examination filed | Effective date: 20051006 |
| | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PL PT RO SE SI SK TR |
| | AX | Request for extension of the European patent | Extension state: AL LT LV MK |
| | RIN1 | Information on inventor provided before grant (corrected) | Inventor names: HANLEY, JOHN; KWEK, JU-KAY; OR, WAI; ANDERSON, JORDAN; LESSMOELLMANN, CHRISTOPH; GONG, LI; WENG, JIE; RAIYANI, SAMIR |
| | RIN1 | Information on inventor provided before grant (corrected) | Inventor names: HANLEY, JOHN; KWEK, JU-KAY; OR, WAI; ANDERSON, JORDAN; LESSMOELLMANN, CHRISTOPH; GONG, LI; WENG, JIE; RAIYANI, SAMIR |
| | DAX | Request for extension of the European patent (deleted) | |
| | RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: SAP SE |
| | REG | Reference to a national code | Ref country code: DE; Ref legal event code: R003 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
| | 18R | Application refused | Effective date: 20160722 |