US20090287626A1 - Multi-modal query generation - Google Patents

Multi-modal query generation

Info

Publication number
US20090287626A1
Authority
US
Grant status
Application
Patent type
Prior art keywords
search
query
innovation
input
example
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12200648
Inventor
Timothy Seung Yoon Paek
Bo Thiesson
Yun-Cheng Ju
Bongshin Lee
Christopher A. Meek
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 17/30634 Querying
    • G06F 17/30637 Query formulation
    • G06F 17/3064 Query formulation using system suggestions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Abstract

A multi-modal search system (and corresponding methodology) is provided. The system employs text, speech, touch and gesture input to establish a search query. Additionally, a subset of the modalities can be used to obtain search results based upon exact or approximate matches to a search query. For example, wildcards, which can either be triggered by the user or inferred by the system, can be employed in the search.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • [0001]
    This application claims the benefit of U.S. Provisional Patent application Ser. No. 61/053,214 entitled “MULTI-MODALITY SEARCH INTERFACE” and filed May 14, 2008. This application is related to pending U.S. patent application Ser. No. _______ entitled “MULTI-MODAL QUERY REFINEMENT” filed on ______ and to pending U.S. patent application Ser. No. ______ entitled “MULTI-MODAL SEARCH WILDCARDS” filed on ______. The entireties of the above-noted applications are incorporated by reference herein.
  • BACKGROUND
  • [0002]
    The Internet continues to make available ever-increasing amounts of information which can be stored in databases and accessed therefrom. With the proliferation of mobile and portable terminals (e.g., cellular telephones, personal data assistants (PDAs), smartphones and other devices), users are becoming more mobile, and hence, more reliant upon information accessible via the Internet. Accordingly, users often search network sources such as the Internet from their mobile device.
  • [0003]
    There are essentially two phases in an Internet search. First, a search query is constructed that can be submitted to a search engine. Second the search engine matches this search query to actual search results. Conventionally, these search queries were constructed merely of keywords that were matched to a list of results based upon factors such as relevance, popularity, preference, etc.
  • [0004]
    The Internet and the World Wide Web continue to evolve rapidly with respect to both volume of information and number of users. As a whole, the Web provides a global space for accumulation, exchange and dissemination of information. As mobile devices become more and more commonplace to access the Web, the number of users continues to increase.
  • [0005]
    In some instances, a user knows the name of a site, server or URL (uniform resource locator) to the site or server that is desired for access. In such situations, the user can access the site, by simply typing the URL in an address bar of a browser to connect to the site. Oftentimes, the user does not know the URL and therefore has to ‘search’ the Web for relevant sources and/or URL's. To maximize likelihood of locating relevant information amongst an abundance of data, Internet or web search engines are regularly employed.
  • [0006]
    Traditionally, to locate a site or corresponding URL of interest, the user can employ a search engine to facilitate locating and accessing sites based upon alphanumeric keywords and/or Boolean operators. In aspects, these keywords are text- or speech-based queries, although, speech is not always reliable. Essentially, a search engine is a tool that facilitates web navigation based upon textual (or speech-to-text) entry of a search query usually comprising one or more keywords. Upon receipt of a search query, the search engine retrieves a list of websites, typically ranked based upon relevance to the query. To enable this functionality, the search engine must generate and maintain a supporting infrastructure.
  • [0007]
    Upon textual entry of one or more keywords as a search query, the search engine retrieves indexed information that matches the query from an indexed database, generates a snippet of text associated with each of the matching sites and displays the results to the user. The user can thereafter scroll through a plurality of returned sites to attempt to determine if the sites are related to the interests of the user. However, this can be an extremely time-consuming and frustrating process as search engines can return a substantial number of sites. More often than not, the user is forced to narrow the search iteratively by altering and/or adding keywords and Boolean operators to obtain the identity of websites including relevant information, again by typing (or speaking) the revised query.
  • [0008]
    Conventional computer-based search, in general, is extremely text-centric (pure text or speech-to-text) in that search engines typically analyze content of alphanumeric search queries in order to return results. These traditional search engines merely parse alphanumeric queries into ‘keywords’ and subsequently perform searches based upon a defined number of instances of each of the keywords in a reference.
  • [0009]
    Currently, users of mobile devices, such as smartphones, often attempt to access or ‘surf’ the Internet using keyboards or keypads such as a standard numeric phone keypad, a soft or miniature QWERTY keyboard, etc. Unfortunately, these input mechanisms are not always efficient for entering the text needed to search the Internet. As described above, conventional mobile devices are limited to text input to establish search queries, for example, Internet search queries. Text input can be a very inefficient way to search, particularly over long periods of time and/or for very long queries.
  • SUMMARY
  • [0010]
    The following presents a simplified summary of the innovation in order to provide a basic understanding of some aspects of the innovation. This summary is not an extensive overview of the innovation. It is not intended to identify key/critical elements of the innovation or to delineate the scope of the innovation. Its sole purpose is to present some concepts of the innovation in a simplified form as a prelude to the more detailed description that is presented later.
  • [0011]
    The innovation disclosed and claimed herein, in one aspect thereof, comprises a search system and corresponding methodologies that can couple speech, text and touch for search interfaces and engines. In other words, rather than being completely dependent upon conventional textual input, the innovation can combine speech, text, and touch to enhance usability and efficiency of search mechanisms. Accordingly, it can be possible to locate more meaningful and comprehensive results as a function of a search query.
  • [0012]
    In aspects, a multi-modal search management system employs a query administration component to analyze multi-modal input (e.g., text, speech, touch) and to generate appropriate search criteria. Accordingly, comprehensive and meaningful search results can be gathered. The features of the innovation can be incorporated into a search engine or, alternatively, can work in conjunction with a search engine.
  • [0013]
    In other aspects, the innovation can be incorporated or retrofitted into existing search engines and/or interfaces. Yet other aspects employ the features, functionalities and benefits of the innovation in mobile search applications, which has strategic importance given the increasing usage of mobile devices as a primary computing device. As described above, mobile devices are not always configured or equipped with full-function keyboards, thus, the multi-modal functionality of the innovation can be employed to greatly enhance comprehensiveness of search.
  • [0014]
    In yet another aspect thereof, machine learning and reasoning is provided that employs a probabilistic and/or statistical-based analysis to prognose or infer an action that a user desires to be automatically performed.
  • [0015]
    To the accomplishment of the foregoing and related ends, certain illustrative aspects of the innovation are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the innovation can be employed and the subject innovation is intended to include all such aspects and their equivalents. Other advantages and novel features of the innovation will become apparent from the following detailed description of the innovation when considered in conjunction with the drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • [0016]
    FIG. 1 illustrates an example block diagram of a system that establishes a query from a multi-modal input in accordance with aspects of the innovation.
  • [0017]
    FIG. 2 illustrates an example user interface in accordance with an aspect of the innovation.
  • [0018]
    FIG. 3 illustrates an example of a typical speech recognition system in accordance with an aspect of the innovation.
  • [0019]
    FIG. 4 illustrates an alternative example block diagram of a speech recognition system in accordance with an aspect of the innovation.
  • [0020]
    FIG. 5 illustrates an example flow chart of procedures that facilitate generating a query from a multi-modal input in accordance with an aspect of the innovation.
  • [0021]
    FIG. 6 illustrates an example flow chart of procedures that facilitate analyzing a multi-modal input in accordance with an aspect of the innovation.
  • [0022]
    FIG. 7 illustrates an example block diagram of a query administration component in accordance with an aspect of the innovation.
  • [0023]
    FIG. 8 illustrates an example analysis component in accordance with an aspect of the innovation.
  • [0024]
    FIG. 9 illustrates a block diagram of a computer operable to execute the disclosed architecture.
  • [0025]
    FIG. 10 illustrates a schematic block diagram of an exemplary computing environment in accordance with the subject innovation.
  • DETAILED DESCRIPTION
  • [0026]
    The innovation is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject innovation. It may be evident, however, that the innovation can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the innovation.
  • [0027]
    As used in this application, the terms “component” and “system” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.
  • [0028]
    As used herein, the terms “infer” and “inference” refer generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources.
  • [0029]
    While certain ways of displaying information to users are shown and described with respect to certain figures as screenshots, those skilled in the relevant art will recognize that various other alternatives can be employed. The terms “screen,” “web page,” and “page” are generally used interchangeably herein. The pages or screens are stored and/or transmitted as display descriptions, as graphical user interfaces, or by other methods of depicting information on a screen (whether personal computer, PDA, mobile telephone, or other suitable device, for example) where the layout and information or content to be displayed on the page is stored in memory, database, or another storage facility.
  • [0030]
    Conventional voice-enabled search applications encourage users to “just say what you want” in order to obtain useful content such as automated directory assistance (ADA) via a mobile device. Unfortunately, when users only remember part of what they are looking for, they are forced to guess, even though what they know may be sufficient to retrieve the desired information. Additionally, oftentimes, quality of the voice recognition is impaired by background noise, speaker accents, speaker clarity, quality of recognition applications or the like.
  • [0031]
    The innovation discloses systems (and corresponding methodologies) that expand the conventional capabilities of voice-activated search to allow users to explicitly constrain the recognition results to match the query by supplementing the speech with additional criteria, for example, to provide partial knowledge in the form of text hints. In doing so, a multi-modal approach is presented which incorporates voice with text, touch, etc. This multi-modal functionality enables users to more accurately access desired information.
  • [0032]
    In aspects, the innovation discloses a multi-modal search interface that tightly couples speech, text and touch by utilizing regular expression queries that employ ‘wildcards,’ where parts of the query can be input via different modalities. For instance, modalities such as speech, text, touch and gestures can be used at any point in the query construction process. In other aspects, the innovation can represent uncertainty in a spoken recognized result as wildcards in a regular expression query. In yet other aspects, the innovation allows users to express their own uncertainty about parts of their utterance using expressions such as “something” or “whatchamacallit” which can then be translated into or interpreted as wildcards.
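    To make the idea concrete, a recognized utterance containing an uncertainty word could be translated into a regular-expression query roughly as follows. This is a minimal illustrative sketch, not the patent's implementation; the function name and word list are assumptions.

```python
import re

# Hypothetical set of spoken uncertainty words that become wildcards.
UNCERTAINTY_WORDS = {"something", "whatchamacallit"}

def utterance_to_regex(tokens):
    """Turn recognized tokens into a regular-expression query,
    mapping each uncertainty word to a one-or-more-words wildcard."""
    parts = []
    for tok in tokens:
        if tok.lower() in UNCERTAINTY_WORDS:
            parts.append(r"\S+(?:\s\S+)*")  # matches one or more words
        else:
            parts.append(re.escape(tok))
    return r"\s".join(parts)

pattern = utterance_to_regex(["Le", "something", "Spa"])
listings = ["Le Soleil Tanning and Spa", "Le Sol Spa", "Home Depot"]
matches = [l for l in listings if re.search(pattern, l, re.IGNORECASE)]
```

    In this sketch, the spoken phrase “Le something Spa” retrieves both “Le Soleil Tanning and Spa” and “Le Sol Spa” while excluding unrelated listings.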
  • [0033]
    Referring initially to the drawings, FIG. 1 illustrates an example block diagram of a system 100 that employs a multi-modal search management system 102 to construct meaningful search results based upon a multi-modal input query. It is to be understood that, as used herein, ‘multi-modal’ can refer to most any combination of text, voice, touch, gestures, etc. While examples described herein are directed to a specific multi-modal example that employs text, voice and touch only, it is to be understood that other examples exist that employ a subset of these identified modalities. As well, it is to be understood that other examples exist that employ disparate modalities in combination with or separate from those described herein. For instance, other examples can employ gesture input, pattern recognition, among others to establish a search query. Similarly, while the examples are directed to mobile device implementations, it is to be understood that the features, functions and benefits of the innovation can be applied to most any computing experience, platform and/or device without departing from the spirit and scope of this disclosure and claims appended hereto.
  • [0034]
    As shown, the multi-modal search management component 102 can include a query administration component 104 and a search engine component 106. Essentially, these subcomponents (104, 106) enable a user to establish a query using multiple modalities and to search for data and other resources using the multi-modal query, for example, a query constructed using text, voice, touch, gestures, etc. Features, functions and benefits of the innovation will be described in greater detail below.
  • [0035]
    Internet usage, especially via mobile devices, continues to grow as users seek anytime, anywhere access to information. Because users frequently search for businesses, directory assistance has recently been the focus of conventional voice search applications utilizing speech as the primary input modality. Unfortunately, mobile scenarios often contain noise which degrades performance of speech recognition functionalities. Thus, the innovation presents a multi-modal search management system 102 which can employ user interfaces (UIs) that not only can facilitate touch and text whenever speech fails, but also allows users to assist the speech recognizer via text hints.
  • [0036]
    Continuing with the ADA example from above, in generating a search query, the innovation can also take advantage of most any partial knowledge users may have about a business listing by letting them express their uncertainty in a simple, intuitive manner. In simulation experiments conducted on real voice search data, leveraging multi-modal refinement resulted in a 28% relative reduction in error rate. Providing text hints along with the spoken utterance resulted in even greater relative reduction, with dramatic gains in recovery for each additional character.
  • [0037]
    As can be appreciated, according to market research, mobile devices are believed to be poised to rival desktop and laptop PCs (personal computers) as a dominant Internet platform, providing users with anytime, anywhere access to information. One common request for information is the telephone number or address of local businesses. Because perusing a large index of business listings can be a cumbersome affair using existing mobile text and touch input mechanisms, directory assistance has emerged as a focus of voice search applications, which utilize speech as the primary input modality. Unfortunately, mobile environments pose problems for speech recognition, even for native speakers. First, mobile settings often contain non-stationary noise which cannot be easily cancelled or filtered. Second, speakers tend to adapt to surrounding noise in acoustically unhelpful ways. Under such adverse conditions, task completion for voice search is less than stellar, especially in the absence of an effective correction user interface for dealing with speech recognition errors.
  • [0038]
    In operation, the query administration component 104 can receive multi-modal input(s), generate an appropriate query and instruct the search engine component 106 accordingly. As will be understood upon a review of the figures and discussions that follow, the query administration component 104 enables one modality to be supplemented with another, thereby enhancing interpretation and ease of use in locating meaningful search results. In one example, speech input can be supplemented with textual hints (e.g., a beginning letter of a word) to enhance recognition accuracy. Similarly, textual input can be supplemented with speech to enhance the scope of a search query. Still further, system-generated and user-prompted wildcards can be used to facilitate, improve, increase or boost functionality.
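    One simple way a textual hint could constrain recognition is by filtering the n-best list to hypotheses consistent with the typed characters. The sketch below is an assumption about how such a filter might look; a real system could reweight hypotheses rather than discard them, and could match hints against any word rather than only a prefix.

```python
def filter_nbest_by_hint(nbest, hint):
    """Keep only recognition hypotheses whose text starts with the
    typed hint (case-insensitive prefix match). Hypothetical sketch."""
    return [phrase for phrase in nbest
            if phrase.lower().startswith(hint.lower())]

nbest = ["socks fifth avenue", "saks fifth avenue", "sax fifth avenue"]
filtered = filter_nbest_by_hint(nbest, "sak")
```

    Here, typing the hint “sak” alongside the utterance prunes acoustically similar but inconsistent hypotheses, leaving only “saks fifth avenue”.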
  • [0039]
    In view of the challenges of conventional voice search approaches, especially mobile voice search, the multi-modal search management system 102 can generate (or otherwise employ) a UI as illustrated in FIG. 2. The multi-modal UI tightly couples speech with touch and text in at least two directions; users can not only use touch and text to clarify, supplement or generate their queries whenever recognition of speech is not sufficiently reliable, but they can also use speech whenever text entry becomes burdensome. Additionally, the innovation enables leverage of this tight coupling by transforming a typical n-best list, or a list of phrase alternates from the recognizer, into a palette of words with which users can compose and refine queries, e.g., as described in the Related Application identified above.
  • [0040]
    The innovation can also take advantage of most any partial knowledge users may have about the words, e.g., of the business listing. For example, a user may only remember that the listing starts with an “s” and also contains the word “avenue”. Likewise, the user may only remember “Saks something,” where the word “something” is used to express uncertainty about what words follow. While the word ‘something’ is used in the aforementioned example, it is to be appreciated that most any desired word or indicator can be used without departing from the spirit/scope of the innovation and claims appended hereto. The innovation represents this uncertainty as wildcards in an enhanced regular expression search of the listings, which exploits the popularity of the listings.
  • [0041]
    This disclosure is focused on three phases. First, a description of the system 100 architecture together with a contrast against a typical architecture of conventional voice search applications. The specification also details the backend search infrastructure deployed for fast and efficient retrieval. Second, the disclosure presents an example UI that highlights the innovation's tightly coupled multi-modal generation capabilities and support of partial knowledge with several user scenarios.
  • [0042]
    It is to be understood that the ADA examples described herein are included to provide perspective to the features, functions and benefits of the innovation and are not intended to limit the scope of the disclosure and appended claims in any manner. The following ADA example references an implementation where users can request telephone or address information of residential and business listings using speech recognition via a network (e.g., Internet) equipped mobile device (e.g., smartphone, cell phone, personal digital assistant, personal media player, navigation system, pocket PC . . . ). As will be appreciated, with increased use of Internet-capable mobile communication devices, ADA is a growing industry with over 30 million U.S. callers per month. Many voice search applications focus exclusively on telephony-based ADA. However, more recent applications have migrated onto other mobile devices, providing users with a rich client experience which includes, among other services, maps and driving directions in addition to ADA. Whether users call ADA or use a data channel to send utterances, the speech recognition task is most always dispatched to speech servers, due to the fact that decoding utterances for large domains with many choices (e.g., high perplexity domains) requires sufficient computational power, which to date does not exist on mobile devices. However, it is to be appreciated that the features, functions and benefits of the innovation can be employed in connection with any data or electronic search including, but not limited to, Internet and intranet searching embodiments.
  • [0043]
    Returning to the ADA example, because there are currently over 18 million listings in the U.S. Yellow Pages alone, and users frequently may not use the exact name of the listing as found in the directory (e.g., “Maggiano's Italian Restaurant” instead of “Maggiano's Little Italy”), grammar-based recognition approaches that rely on lists fail to scale properly. As such, approaches to ADA have focused on combining speech recognition with information retrieval techniques.
  • [0044]
    As described supra, voice search applications encourage users to “just say what you want” in order to obtain useful mobile content such as ADA. Unfortunately, when users only remember part of what they are looking for, they are forced to guess, even though what they know may be sufficient to retrieve the desired information. In this disclosure, it is proposed to expand the capabilities of voice search to allow users to explicitly express their uncertainties as part of their queries, and as such, to provide partial knowledge. Applied to ADA, the disclosure highlights the enhanced user experience uncertain expressions afford and delineates how to perform language modeling and information retrieval.
  • [0045]
    Voice search applications encourage users to “just say what you want” in order to obtain useful mobile content such as business listings, driving directions, movie times, etc. Because certain types of information require recognition of a large database of choices, voice search is often formulated as both a recognition and information retrieval (IR) task, where a spoken utterance is first converted into text and then used as a search query for IR. ADA exemplifies the challenges of voice search. Not only are there millions of possible listings (e.g., 18 million in the US alone), but users also frequently do not know, remember, or say the exact business names as listed in the directory. As illustrated in FIG. 2, in some cases, users think they know but are mistaken (e.g., “Le Sol Spa” for the listing “Le Soleil Tanning and Spa”). In other cases, they remember only part of the name with certainty (e.g., listing starts with “Le” and contains the word “Spa”). In these cases, what they remember may actually be sufficient to find the listing. Unfortunately, in current voice search applications, users are forced to guess and whatever partial knowledge they could have provided is lost.
  • [0046]
    In this specification, the innovation enables expansion of the capabilities of voice search to enable users to explicitly express their uncertainties as part of their queries, and as such, to allow systems to leverage most any partial knowledge contained in those queries.
  • [0047]
    Voice search applications with a UI as shown in FIG. 2 can offer even richer user experiences. In accordance with the example multi-modal interface, the innovation displays not only the top matches for uncertain expressions, but also the query itself for users to edit, for example, in case they wanted to refine their queries using text as set forth in the Related Application identified above. FIG. 2 illustrates a screenshot of results for the spoken utterance “Le S Something Spa”, from the previous example, as well as the more general expression “Le Something Spa”. Note that the system not only retrieved exact matches for the utterances as a regular expression query, but also approximate matches.
  • [0048]
    As discussed earlier, the innovation's approach to voice search involves recognition plus IR. For ADA recognition, n-gram statistical language models are typically used to compress and generalize across listings as well as their observed user variations. In order to support n-gram recognition of uncertain expressions, the training data can be modified. Given that not enough occurrences of the word “something” appeared in the training sentences (e.g., 88) for it to be accurately recognized, that number was boosted artificially by creating pseudo-listings from the original data. For every listing which was not a single word (unlike, e.g., “Starbucks”), the innovation adds new listings with “*” and “i-*” replacing individual words, where i denotes the initial letter of the word being replaced. For listings with more than two words, because people tend to remember either the first or last word of a listing, the innovation can focus on replacing interior words. Furthermore, to preserve counts for priors, 4 new listings (and 4 duplicates for single word listings) were added. For example, for the listing “Le Soleil Tanning and Spa”, “Le *”, “Le S*”, “* Spa”, and “T* Spa” were generated. Although this approach of adding new listings with words replaced by “*” and “i-*” is a heuristic, it was found to facilitate adequate bigram coverage. Finally, the pronunciation dictionary was modified so that “*” could be recognized as “something”.
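    A rough sketch of this pseudo-listing heuristic follows. The patent does not specify exactly which interior word supplies the initial letter for the initial-letter wildcards (its own example produces “T* Spa”), so the choice below, the word adjacent to the kept word, is an assumption that differs from that example.

```python
def pseudo_listings(listing):
    """Generate four pseudo-listings for n-gram training by replacing
    spans of words with "*" or an initial-letter wildcard. Which word
    supplies the initial letter is an assumption in this sketch."""
    words = listing.split()
    if len(words) < 2:
        # Single-word listings are duplicated to preserve prior counts.
        return [listing] * 4
    first, last = words[0], words[-1]
    return [
        f"{first} *",               # tail replaced by a bare wildcard
        f"{first} {words[1][0]}*",  # tail replaced by initial-letter wildcard
        f"* {last}",                # head replaced by a bare wildcard
        f"{words[-2][0]}* {last}",  # head replaced by initial-letter wildcard
    ]

generated = pseudo_listings("Le Soleil Tanning and Spa")
```

    With this assumption, “Le Soleil Tanning and Spa” yields “Le *”, “Le S*”, and “* Spa” as in the text, plus one further initial-letter variant.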
  • [0049]
    The advantage of this approach is at least two-fold. First, because the innovation replaced words with “*” and “i-*” instead of the word “something”, it avoids conflicts with businesses that have “something” as part of their name (only 9 in the Seattle area). Second, by having the recognition produce wildcards, it is possible to treat the recognized result directly as a regular expression for search.
  • [0050]
    Turning to a discussion of information retrieval, after obtaining a regular expression from the recognizer (e.g., “Le * Spa”), an index and retrieval algorithm can be used that could quickly find likely matches for the regular expression. This is accomplished by encoding the directory listing as a k-best suffix array. Because a k-best suffix array is sorted by both lexicographic order and most any figure of merit, such as the popularity of listings in the call logs, it is a convenient data structure for finding the most likely, or in this case, the most popular matches for a substring, especially when there could be many matches. For example, for the query “H* D*”, the k-best suffix array would quickly bring up “Home Depot” as the top match. Furthermore, because lookup time for finding the k most popular matches is close to O(log N) for most practical situations with a worst case guarantee of O(sqrt N), where N is the number of characters in the listings, user experience did not suffer from any additional retrieval latencies. Note that before any regular expression was submitted as a search query, a few simple heuristics were applied to clean it up (e.g., consecutive wildcards were collapsed into 1 wildcard).
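    The lookup behavior (though not the data structure) can be illustrated with a naive popularity-ordered scan. The sketch below is only a stand-in for the k-best suffix array: it reproduces the interface, including the wildcard-collapsing clean-up mentioned above, but not the near-O(log N) lookup time.

```python
import re

def topk_matches(query, listings_by_popularity, k=3):
    """Naive stand-in for the k-best suffix array: scan listings in
    descending popularity order and return the first k matches for a
    wildcard query. The real structure achieves ~O(log N) lookups;
    this linear scan only illustrates the interface."""
    # Clean-up heuristic from the text: collapse consecutive wildcards.
    while "* *" in query:
        query = query.replace("* *", "*")
    rx = re.compile(query.replace("*", r"\S*"), re.IGNORECASE)
    out = []
    for name, _popularity in listings_by_popularity:
        if rx.search(name):
            out.append(name)
            if len(out) == k:
                break
    return out

listings = [("Home Depot", 9500), ("Happy Dragon", 1200), ("Harbor Diner", 300)]
top = topk_matches("H* D*", listings, k=2)
```

    For the query “H* D*”, scanning in popularity order surfaces “Home Depot” first, mirroring the behavior described for the k-best suffix array.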
  • [0051]
    Besides regular expression queries using a k-best suffix array, which provide popular exact matches to the listings, it is also useful to obtain approximate matches. For this purpose, an improved term frequency-inverse document frequency (TFIDF) algorithm can be implemented. Because statistical language models can produce garbled output, voice search typically utilizes approximate search techniques, such as TFIDF, because they treat the output as just a bag of words. This is advantageous when users either incorrectly remember the order of words in a listing, or add spurious words. In some ways, the two IR methods are flip sides of each other. The strength of finding exact matches is that the innovation can leverage most any partial knowledge users may have about their queries (e.g., word order) as well as the popularity of any matches. Its weakness is that it assumes users are correct about their partial knowledge. On the other hand, this is the strength of finding approximate matches; it is indifferent to word order and other mistakes users often make.
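    A minimal bag-of-words TFIDF scorer shows why word order and spurious query words stop mattering in the approximate-match path. This is a plain textbook formulation, not the improved algorithm the text refers to.

```python
import math
from collections import Counter

def tfidf_rank(query, listings):
    """Rank listings against a query with a plain bag-of-words TF-IDF
    score; word order and extra query words are ignored."""
    docs = [l.lower().split() for l in listings]
    n = len(docs)
    # Document frequency: number of listings containing each word.
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) for w in df}
    q_words = query.lower().split()
    scored = []
    for listing, d in zip(listings, docs):
        tf = Counter(d)
        score = sum(tf[w] * idf.get(w, 0.0) for w in q_words)
        scored.append((score, listing))
    return sorted(scored, reverse=True)

ranked = tfidf_rank("spa le extra words",
                    ["Le Soleil Tanning and Spa", "Home Depot", "Harbor Diner"])
```

    Even with the query words out of order and two spurious words added, the correct listing ranks first, which is exactly the robustness the exact-match path lacks.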
  • [0052]
    FIG. 3 displays an example architecture for typical voice search applications. As illustrated, first, an utterance can be recognized using an n-gram statistical language model (SLM) that compresses and generalizes across training sentences. In the case of ADA, the training sentences comprise not only the exact listings and business categories but also alternative expressions for those listings. Because an n-gram is based on word collocation probabilities, the output of the speech recognizer is an n-best list containing phrases that may or may not match any of the training sentences. This is often acceptable if the phrases are submitted to an information retrieval (IR) engine that utilizes techniques which treat the phrases as just bags of words.
  • [0053]
    The IR engine (or search engine) retrieves matches from an index, which is typically a subset of the language model training sentences, such as the exact listings along with their categories. In the example architecture, if an utterance is recognized with high confidence, it is immediately sent to the IR engine to retrieve the best matching listing. However, if an utterance is ambiguous in any way, as indicated for example by medium to low confidence scores, voice search applications with a graphical UI very often display an n-best list to users for selection, at which point users can either select a result (e.g., phrase) or retry their utterance.
  • [0054]
    In contrast to the voice search architecture of FIG. 3, FIG. 4 illustrates an alternative example system architecture in accordance with the innovation. It is to be understood that the ‘Search Vox’ component illustrated in FIG. 4 is analogous to the multi-modal management system 102 of FIG. 1. As shown in FIG. 4, first, high confidence results immediately go to the IR engine. Second, users are shown the n-best list, though the interaction dynamics are fundamentally different from those of conventional systems. In accordance with the innovation, if subsequent refinement is desired, e.g., as set forth in the Related Application referenced above, users can not only select a phrase from the n-best list, but also the individual words which make up those phrases, thereby refining search results by effectively drilling into a set of search results.
  • [0055]
    The n-best list is essentially treated as a sort of word palette or ‘bag of words’ from which users can select out those words that the speech recognizer heard or interpreted correctly, though they may appear in a different phrase. For example, suppose a user says “home depot,” but because of background noise, the phrase does not occur in the n-best list. Suppose, however, that the phrase “home office design” appears in the list. With typical (or conventional) voice search applications, the user would have to start over.
  • [0056]
    In accordance with the innovation, the user can simply select the word “home” and invoke the backend which finds the most popular listings that contain the word. For instance, the system can measure popularity by the frequency with which a business listing appears in the ADA call logs, for example, for Live Local Search. In order to retrieve the most popular listings that contain a particular word or substring, regular expressions can be used.
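By way of illustration, the popularity-based lookup described above can be sketched as follows. The listing names, call-log entries, and the plain substring scan are hypothetical simplifications; the actual implementation uses the k-best suffix array discussed infra.

```python
import re
from collections import Counter

# Hypothetical call-log data: each entry is the listing a caller asked for.
call_log = ["home depot", "home depot", "holiday inn", "home goods",
            "home depot", "home goods", "hilton garden inn"]

popularity = Counter(call_log)  # listing -> frequency in the call logs

def most_popular_matches(word, k=3):
    """Return the k most popular listings containing `word` as a substring."""
    pattern = re.compile(re.escape(word))
    matches = [listing for listing in popularity if pattern.search(listing)]
    return sorted(matches, key=lambda listing: -popularity[listing])[:k]

print(most_popular_matches("home"))
```

Selecting “home” would thus surface “home depot” first, since it dominates the hypothetical call logs.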
  • [0057]
    Because, in aspects, much of the effectiveness of the innovation's interface rests on its ability to retrieve listings using a wildcard query—or a regular expression query containing wildcards—a discussion follows that describes implementation of a RegEx engine followed by further details about wildcard queries constructed in the RegEx generator. Essentially, in operation, the RegEx generator and RegEx engine facilitate an ability to employ wildcards in establishing search queries.
  • [0058]
    It will be understood that the components of FIG. 4 can be deployed within the higher level components of FIG. 1, e.g., multi-modal search management system 102, query administration component 104 and search engine component 106. Three other sub-components of the system architecture are discussed below: the IR engine, the supplement generator, and the list filter (FIG. 4).
  • [0059]
    Turning first to a discussion of the RegEx engine, the index data structure chosen for regular expression matching can be based upon k-best suffix arrays. Similar to traditional suffix arrays, k-best suffix arrays arrange all suffixes of the listings into an array. While traditional suffix arrays arrange the suffixes in lexicographical order only, the k-best suffix arrays of the innovation arrange the suffixes according to two alternating orders—a lexicographical ordering and an ordering based on a figure of merit, such as popularity, preference, etc. The arrangement of the array borrows from ideas seen in the construction of KD-trees.
  • [0060]
    Because the k-best suffix array is sorted by both lexicographic order and popularity, it is a convenient structure for finding the most popular matches for a substring, especially when there are many matches. In an aspect, the k most popular matches can be found in time close to O(log N) for most practical situations, and with a worst case guarantee of O(sqrt N), where N is the number of characters in the listings. In contrast, a standard suffix array enables locating most all matches to a substring in O(log N) time, but does not impose any popularity ordering on the matches. To find the most popular matches, the user would have to traverse them all.
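As a simplified, non-limiting sketch, the lookup can be approximated with a plain suffix array plus an explicit popularity sort. This stand-in scans the full match range, so it does not achieve the O(log N) behavior of the true k-best structure, which interleaves lexicographic and popularity order; the listings and popularity figures are hypothetical.

```python
import bisect

listings = [("home depot", 95), ("home goods", 40), ("holiday inn", 60),
            ("hope diamond tours", 5)]

# Build a suffix array over all listings: (suffix, listing_index),
# sorted lexicographically so that substring matches form a contiguous range.
suffixes = sorted(
    (listing[i:], idx)
    for idx, (listing, _) in enumerate(listings)
    for i in range(len(listing))
)
keys = [s for s, _ in suffixes]

def k_best(substring, k=2):
    """Find the k most popular listings containing `substring`.

    A true k-best suffix array avoids scanning the whole match range; here
    we scan and sort, which is simpler but O(M log M) in the match count M."""
    lo = bisect.bisect_left(keys, substring)
    hi = bisect.bisect_right(keys, substring + "\xff")
    idxs = {idx for _, idx in suffixes[lo:hi]}
    ranked = sorted(idxs, key=lambda i: -listings[i][1])
    return [listings[i][0] for i in ranked[:k]]

print(k_best("ho"))
```

For the short substring “ho”, all four hypothetical listings match, and the popularity ordering immediately surfaces the two most popular ones.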
  • [0061]
    Consider a simple example which explains why this subtle difference is important to the application. The standard suffix array may be sufficiently fast when searching for the k-best matches to a large substring, since there will not be many matches to traverse in this case. The situation is, however, completely different for a short substring such as, for example, ‘a’. In this case, a user would have to traverse all dictionary entries containing an ‘a’, which is not much better than traversing all suffixes in the listings—in O(N) time. With a clever implementation, it is possible to continue a search in a k-best suffix array from the position at which it previously stopped. A simple variation of k-best suffix matching will therefore allow lookup of the k-best (most popular) matches for an arbitrary wildcard query, such as, for instance, ‘f* m* ban*’. The approach proceeds as in the k-best suffix matching above, using the largest substring without a wildcard (‘ban’). At each match, the innovation then evaluates the full wildcard query against the full listing entry for the suffix and continues the search until k valid expansions to the wildcard query are found.
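A minimal sketch of this wildcard expansion follows, under the assumption that candidates containing the anchor substring can be enumerated in popularity order (here, by a simple sort rather than by the k-best suffix array); the listings and popularity figures are hypothetical.

```python
import re

listings = [("first mutual bank", 80), ("farmers market bank", 55),
            ("first merit bank", 30), ("national bank", 90)]

def wildcard_to_regex(query):
    # '*' matches any run of non-space characters; spaces are literal.
    return re.compile(re.escape(query).replace(r"\*", r"\S*"))

def expand(query, k=2):
    """Expand a wildcard query such as 'f* m* ban*' into its k most
    popular matching listings: anchor on the largest literal substring
    ('ban'), walk candidates in popularity order, and test the full
    pattern against each full listing until k valid expansions are found."""
    anchor = max(query.replace("*", " ").split(), key=len)
    pattern = wildcard_to_regex(query)
    results = []
    for listing, _ in sorted(listings, key=lambda x: -x[1]):
        if anchor in listing and pattern.fullmatch(listing):
            results.append(listing)
            if len(results) == k:
                break
    return results

print(expand("f* m* ban*"))
```

Note that the most popular anchor match (“national bank”) is visited first but rejected, because the full wildcard query does not validate against it.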
  • [0062]
    The k-best suffix array can also be used to exclude words in the same way by continuing the search until expansions without the excluded words are found. The querying process is an iterative process, which gradually eliminates the wildcards in the text string. Whenever the largest substring in the wildcard query does not change between iterations, there is an opportunity to further improve the computational efficiency of the expansion algorithm. In this case, the k-best suffix matching can just be continued from the point where the previous iteration ended.
  • [0063]
    With an efficient k-best suffix array matching algorithm available for the RegEx engine, the engine can be deployed, for example, directly onto a mobile device, thereby avoiding the latencies associated with sending information back and forth along a wireless data channel. Speech recognition for ADA already takes several seconds to return an n-best list. It is desirable to provide short latencies for wildcard queries; the innovation is capable of enhancing (or shortening) these latencies.
  • [0064]
    Turning now to a discussion of the IR engine, besides wildcard queries, which provide exact matches to the listings, it is useful to also retrieve approximate matches to the listings. For at least this purpose, the innovation implements an IR engine based on an improved term frequency—inverse document frequency (TFIDF) algorithm. What is important to note about the IR engine is that it can treat queries and listings as bags of words. This is advantageous when users either incorrectly remember the order of words in a listing, or add additional words that do not actually appear in a listing. This is not the case for the RegEx engine where order and the presence of suffixes in the query matter.
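A minimal bag-of-words TFIDF sketch illustrates this indifference to word order; the listings are hypothetical, and the weighting is a textbook variant rather than the improved algorithm referenced above.

```python
import math
from collections import Counter

listings = ["home depot", "home office design", "office depot", "the home store"]

n = len(listings)
df = Counter(w for d in listings for w in set(d.split()))  # document frequency

def tfidf(text):
    """Bag-of-words TFIDF vector: word order is ignored entirely."""
    tf = Counter(text.split())
    return {w: c * math.log(n / df[w]) for w, c in tf.items() if w in df}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, k=2):
    q = tfidf(query)
    return sorted(listings, key=lambda d: -cosine(q, tfidf(d)))[:k]

# Word order does not matter: 'depot home' still retrieves 'home depot'.
print(search("depot home"))
```

A query with scrambled order or spurious extra words still ranks the intended listing first, which is exactly the robustness the RegEx engine lacks.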
  • [0065]
    Referring now to the RegEx generator, returning to the example in which a user selects the word “home” for “home depot” from a word palette, once the user invokes the backend, the word is sent as a query to a RegEx generator which transforms it into a wildcard query. For single phrases, the generator can simply insert wildcards before spaces, as well as at the end of the entire query. For example, for the query “home”, the generator could produce the regular expression “home*”.
  • [0066]
    For a list of phrases, such as an n-best list from the recognizer, the RegEx or wildcard generator uses minimal edit distance (with equal edit operation costs) to align the phrases at the word level. Once words are aligned, minimal edit distance is again applied to align the characters. Whenever there is disagreement between any aligned words or characters, a wildcard is substituted in its place. For example, for an n-best list containing the phrases “home depot” and “home office design,” the RegEx generator would produce “home * de*”. After an initial query is formulated, the RegEx generator applies heuristics to clean up the regular expression (e.g., in an aspect, no word can have more than one wildcard) before it is used to retrieve k-best matches from the RegEx engine. The RegEx generator is invoked in this form whenever speech is utilized, such as for leveraging partial knowledge.
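The alignment-based generation can be sketched as follows. The exact edit-distance tie-breaking is not specified above, so this illustration substitutes difflib opcodes and longest-common-prefix pairing as a stand-in; it nevertheless reproduces the “home * de*” example.

```python
import re
from difflib import SequenceMatcher
from os.path import commonprefix

def merge_words(a, b):
    """Character-level merge: keep the common prefix, wildcard the rest."""
    if a == b:
        return a
    p = commonprefix([a, b])
    return (p + "*") if p else "*"

def merge_spans(left, right):
    """Pair differing words by longest shared prefix; leftovers become '*'."""
    if len(left) > len(right):
        left, right = right, left
    pairs, used = {}, set()
    for a in left:
        free = [j for j in range(len(right)) if j not in used]
        best = max(free, key=lambda j: len(commonprefix([a, right[j]])))
        pairs[best] = a
        used.add(best)
    return [merge_words(pairs[j], right[j]) if j in pairs else "*"
            for j in range(len(right))]

def generate_regex(p1, p2):
    w1, w2 = p1.split(), p2.split()
    out = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, w1, w2).get_opcodes():
        if op == "equal":
            out.extend(w1[i1:i2])
        else:
            out.extend(merge_spans(w1[i1:i2], w2[j1:j2]))
    # Cleanup heuristic: collapse consecutive wildcards into one.
    return re.sub(r"\*(\s*\*)+", "*", " ".join(out))

print(generate_regex("home depot", "home office design"))
```

For the n-best pair “home depot” / “home office design”, the agreement on “home” survives, “office” collapses to a bare wildcard, and “depot” / “design” keep their shared prefix as “de*”.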
  • [0067]
    Turning now to the supplement generator of FIG. 4, as discussed earlier, the innovation's interface treats a list of phrases as a word palette. Because the word palette is most useful when it is filled with words to choose from, whenever the recognizer produces a short n-best list with fewer phrases than can appear in the user interface (which for a pocket PC interface is most often 8 items as shown in FIG. 2), or whenever a no-speech query has been submitted (e.g., “home*” in the previous example), it is the job of the supplement generator (FIG. 4) to retrieve matches from the backend for the UI.
  • [0068]
    Currently, the supplement generator attempts to find exact matches from the RegEx engine first since it will be obvious to users why they were retrieved. Space permitting, approximate matches are also retrieved from the IR engine. This can be accomplished in the following manner: If any exact matches have already been found, the supplement generator will use those exact matches as queries to the IR engine until enough matches have been retrieved. If there are no exact matches, the supplement generator will use whatever query was submitted to the RegEx generator as the query.
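The described backfill order (exact matches first, then approximate matches seeded by the exact matches or, failing that, by the original query) can be sketched as follows; the engine callables and stub results are hypothetical.

```python
def supplement(query, ui_slots, regex_engine, ir_engine):
    """Fill the word palette: exact matches first, then approximate ones.

    `regex_engine` and `ir_engine` are hypothetical callables that return
    ranked lists of listings for a query."""
    exact = regex_engine(query)[:ui_slots]
    results = list(exact)
    seeds = exact if exact else [query]   # fall back to the original query
    for seed in seeds:
        if len(results) >= ui_slots:
            break
        for match in ir_engine(seed):
            if match not in results:
                results.append(match)
                if len(results) >= ui_slots:
                    break
    return results[:ui_slots]

# Hypothetical engine stubs for illustration:
regex_stub = lambda q: ["home depot"] if q.startswith("home") else []
ir_stub = lambda q: ["home goods", "office depot", "home depot"]
print(supplement("home*", 3, regex_stub, ir_stub))
```

With space for three items, the one exact match is shown first and then reused as the seed query for the approximate engine, which tops up the remaining slots.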
  • [0069]
    Finally, the List filter simply uses a wildcard query to filter an n-best list obtained from the speech recognizer. In operation, the List filter is used primarily for text hints, which are discussed infra.
  • [0070]
    As discussed in the previous section, the innovation can display an n-best list to the user, making an interface (e.g., UI of FIG. 2) appear, at least at first blush, like any other voice search application. This aspect facilitates a default correction mechanism users may expect of speech applications; namely, that when their utterances fail to be correctly recognized, they may still select from a list of choices, provided that their utterance exists among these choices. However, because re-speaking does not generally increase the likelihood that the utterance will be recognized correctly, and furthermore, because mobile usage poses distinct challenges not encountered in desktop settings, the interface endows users with a larger arsenal of recovery strategies—for example, text hints, word selection from a word palette or bag of words, etc.
  • [0071]
    FIG. 5 illustrates a methodology of generating a multi-modal query in accordance with an aspect of the innovation. While, for purposes of simplicity of explanation, the one or more methodologies shown herein, e.g., in the form of a flow chart, are shown and described as a series of acts, it is to be understood and appreciated that the subject innovation is not limited by the order of acts, as some acts may, in accordance with the innovation, occur in a different order and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all illustrated acts may be required to implement a methodology in accordance with the innovation.
  • [0072]
    At 502, a multi-modal input is received, for example, by way of the UI of FIG. 2. In operation, multi-modal input can include text, speech, touch, gesture, or other input. While examples are described herein, it is to be understood that the innovation can render and employ UIs that are capable of receiving most any protocol combination. Additionally, the inputs can be received at different timings as appropriate.
  • [0073]
    At 504, the multi-modal input is analyzed to interpret the input. For instance, text can be parsed, speech can be converted, etc. An appropriate search query can be generated at 506. In other words, as a result of the analysis, a search query can be established to increase accuracy and meaningfulness of results. As shown, results in accordance with the search query can be obtained at 508.
  • [0074]
    Referring now to FIG. 6, there is illustrated a methodology of generating a search query in accordance with the innovation. At 602, a multi-modal input is received, for example, text, speech, touch, gesture, etc. At 604, a determination is made to conclude if the input includes text data. If so, at 606, the data can be parsed and analyzed to determine keywords, text hints and/or context of the text. Additionally, a determination can be made if wildcards should be used to effect a query.
  • [0075]
    Similarly, at 608, a determination can be made to conclude if the input includes audible data. If the input includes audible data, at 610, speech recognition (or other suitable sound analysis) mechanisms can be used to establish keywords associated with the audible data and subsequently the context of the keywords in view of the other input(s) as appropriate.
  • [0076]
    Still further, at 612, a determination is made if the input contains gesture-related data. As with text and sound described above, if gestures were used as input, an evaluation can be effected at 614. For instance, if the gesture was intended to identify a specific number of words, this criterion can be established at 614.
  • [0077]
    Once the data is analyzed (e.g., 604-614), at 616, a search query can be generated. Here, wildcards can be used as appropriate to establish a comprehensive search query. Additionally, as described above, TFIDF algorithms can be applied where appropriate. Still further, other logic and inferences can be made to establish user intent based upon the multi-modal input thereby establishing a comprehensive query that can be used to fetch meaningful search results.
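The acts 604-616 can be sketched, in a highly simplified form, as a dispatch over the modalities present in the input. The keyword handling and wildcard placement here are illustrative assumptions, not the innovation's actual logic.

```python
def build_query(inputs):
    """Inspect each modality present in the multi-modal input, accumulate
    keywords, and assemble a wildcard search query (sketch of FIG. 6)."""
    keywords = []
    if "text" in inputs:                       # 604-606: parse text hints
        keywords += inputs["text"].split()
    if "speech" in inputs:                     # 608-610: recognized phrases,
        keywords += inputs["speech"]           # e.g. the top n-best phrase
    if "gesture" in inputs:                    # 612-614: e.g. a word-count
        limit = inputs["gesture"].get("word_limit", len(keywords))
        keywords = keywords[:limit]            # criterion from the gesture
    # 616: wildcard the final keyword to broaden the match, as appropriate.
    if keywords:
        keywords[-1] += "*"
    return " ".join(keywords)

print(build_query({"text": "home", "speech": ["depot"]}))
```

A typed hint and a recognized word thus combine into one comprehensive wildcard query, while a gesture can trim the query to an intended word count.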
  • [0078]
    Turning now to FIG. 7, an example block diagram of query administration component 104 is shown. Generally, the query administration component 104 can include a query generation component 702 and an analysis component 704. Together these sub-components (702, 704) facilitate transformation of a multi-modal input into a comprehensive search query.
  • [0079]
    The query generation component 702 employs input from the analysis component 704 to establish an ample and comprehensive search query that will produce results in line with intentions of the user input. As described in connection with the aforementioned methodologies, the innovation can evaluate the multi-modal input. In operation, the analysis component 704 can be employed to effect this evaluation. Logic can be employed in connection with the analysis component 704 to effect the evaluation.
  • [0080]
    FIG. 8 illustrates an example block diagram of an analysis component 704. As shown, the analysis component 704 can include a text evaluation component 802, a speech evaluation component 804 and a gesture evaluation component 806, all of which are capable of evaluating multi-modal input in efforts to establish comprehensive search queries. While specific modality evaluation components are shown in FIG. 8 (802, 804, 806), it is to be understood that alternative aspects can include other evaluation components without departing from the spirit and/or scope of the innovation.
  • [0081]
    As illustrated, a logic component 808 can be employed to effect the evaluation and/or interpretation of the input. In aspects, logic component 808 can include rules-based and/or inference-based (e.g., machine learning and reasoning (MLR)) logic. This logic essentially enables the multi-modal input to be interpreted or construed to align with the intent of the raw input (or portions thereof).
  • [0082]
    As stated above, the innovation can employ MLR which facilitates automating one or more features in accordance with the subject specification. The subject innovation (e.g., in connection with input interpretation or query generation) can employ various MLR-based schemes for carrying out various aspects thereof. For example, a process for determining an intention or interpretation based upon a speech input can be facilitated via an automatic classifier system and process.
  • [0083]
    A classifier is a function that maps an input attribute vector, x=(x1, x2, x3, x4, . . . , xn), to a confidence that the input belongs to a class, that is, f(x)=confidence(class). Such classification can employ a probabilistic and/or statistical-based analysis (e.g., factoring into the analysis utilities and costs) to prognose or infer an action that a user desires to be automatically performed.
  • [0084]
    A support vector machine (SVM) is an example of a classifier that can be employed. The SVM operates by finding a hypersurface in the space of possible inputs that attempts to split the triggering criteria from the non-triggering events. Intuitively, this makes the classification correct for testing data that is near, but not identical to, training data. Other directed and undirected model classification approaches that can be employed include, e.g., naïve Bayes, Bayesian networks, decision trees, neural networks, fuzzy logic models, and probabilistic classification models providing different patterns of independence. Classification as used herein is also inclusive of statistical regression that is utilized to develop models of priority.
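To make f(x)=confidence(class) concrete, the following sketch trains a minimal linear classifier. A perceptron stands in for the SVM here: both separate triggering from non-triggering examples with a hyperplane, but the SVM additionally maximizes the margin. The toy data is hypothetical.

```python
import math

def train(samples, labels, epochs=20, lr=0.1):
    """Perceptron training: nudge the hyperplane toward misclassified points."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):        # y is +1 or -1
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def confidence(w, b, x):
    """Squash the signed distance to the hyperplane into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

# Toy triggering (+1) vs. non-triggering (-1) data.
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -0.5]]
y = [1, 1, -1, -1]
w, b = train(X, y)
print(confidence(w, b, [2.0, 1.5]))   # a point on the triggering side
```

Points on the triggering side of the learned hyperplane map to confidences above 0.5, matching the f(x)=confidence(class) formulation above.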
  • [0085]
    As will be readily appreciated from the subject specification, the subject innovation can employ classifiers that are explicitly trained (e.g., via generic training data) as well as implicitly trained (e.g., via observing user behavior, receiving extrinsic information). For example, SVMs are configured via a learning or training phase within a classifier constructor and feature selection module. Thus, the classifier(s) can be used to automatically learn and perform a number of functions, including but not limited to determining, according to predetermined criteria, how to interpret an input, how to establish a query, etc.
  • [0086]
    Below, user scenarios are highlighted that demonstrate two concepts: first, tight coupling of speech with touch and text, so that whenever one of the three modalities fails or becomes burdensome, users may switch to another modality in a complementary way; second, leveraging of most any partial knowledge a user may have about the constituent words of their intended query.
  • [0087]
    Turning to a discussion of query generation using a word palette, in accordance with the innovation, a user can select words by way of a touch screen, thereby establishing a search query. Additionally, the selected words can be chosen (or otherwise identified) for inclusion or, alternatively, exclusion from a set of search results. In other words, a selection can be used as a filter to screen out results that contain a particular word or set of words. Moreover, a selection can be supplemented with speech (or other modality), thereby enhancing the searching capability of the innovation. While many of the examples described herein are directed to selection of words from an n-best list, it is to be understood that the innovation can treat most any display rendering as a bag of words, thereby enabling selection to enhance comprehensive searching and query construction.
  • [0088]
    As stated supra, the innovation can support query generation via multi-modal input by combining speech with text hints. Just in the way that users can resort to touch and text when speech fails, they can also resort to speech whenever typing becomes burdensome, or when they feel they have provided enough text hints for the recognizer to identify their query.
  • [0089]
    In an example, the user starts typing “m” for the intended query “mill creek family practice,” but because the query is too long to type, the user utters the intended query after pressing a trigger or specific functional soft key button. After the query returns from the backend, all choices in the list now start with an “m,” and a list that indeed includes the user utterance may be displayed.
  • [0090]
    The innovation can achieve this functionality by first converting the text hint in the textbox into a wildcard query and then using that to filter the n-best list as well as to retrieve additional matches from the RegEx engine. In principle, the innovation acknowledges that the query should be used to bias the recognition of the utterance in the speech engine itself.
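A minimal sketch of converting a text hint into a wildcard pattern and filtering the n-best list follows; the n-best phrases are hypothetical.

```python
import re

def hint_to_pattern(hint):
    """Turn a typed text hint (e.g. 'm') into a wildcard regex that
    matches phrases beginning with the hint."""
    return re.compile(re.escape(hint) + r".*")

def filter_nbest(hint, nbest):
    """Keep only n-best phrases consistent with the text hint."""
    pattern = hint_to_pattern(hint)
    return [phrase for phrase in nbest if pattern.match(phrase)]

nbest = ["bill creek family practice", "mill creek family practice",
         "hill creek pharmacy", "mall creek clinic"]
print(filter_nbest("m", nbest))
```

A single typed character already discards the acoustically confusable “bill” and “hill” hypotheses; each additional character narrows the list further.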
  • [0091]
    Referring now to FIG. 9, there is illustrated a block diagram of a computer operable to execute the disclosed architecture. In order to provide additional context for various aspects of the subject innovation, FIG. 9 and the following discussion are intended to provide a brief, general description of a suitable computing environment 900 in which the various aspects of the innovation can be implemented. While the innovation has been described above in the general context of computer-executable instructions that may run on one or more computers, those skilled in the art will recognize that the innovation also can be implemented in combination with other program modules and/or as a combination of hardware and software.
  • [0092]
    Generally, program modules include routines, programs, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the inventive methods can be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which can be operatively coupled to one or more associated devices.
  • [0093]
    The illustrated aspects of the innovation may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules can be located in both local and remote memory storage devices.
  • [0094]
    A computer typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media can comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
  • [0095]
    Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
  • [0096]
    With reference again to FIG. 9, the exemplary environment 900 for implementing various aspects of the innovation includes a computer 902, the computer 902 including a processing unit 904, a system memory 906 and a system bus 908. The system bus 908 couples system components including, but not limited to, the system memory 906 to the processing unit 904. The processing unit 904 can be any of various commercially available processors. Dual microprocessors and other multi-processor architectures may also be employed as the processing unit 904.
  • [0097]
    The system bus 908 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. The system memory 906 includes read-only memory (ROM) 910 and random access memory (RAM) 912. A basic input/output system (BIOS) is stored in a non-volatile memory 910 such as ROM, EPROM, EEPROM, which BIOS contains the basic routines that help to transfer information between elements within the computer 902, such as during start-up. The RAM 912 can also include a high-speed RAM such as static RAM for caching data.
  • [0098]
    The computer 902 further includes an internal hard disk drive (HDD) 914 (e.g., EIDE, SATA), which internal hard disk drive 914 may also be configured for external use in a suitable chassis (not shown), a magnetic floppy disk drive (FDD) 916, (e.g., to read from or write to a removable diskette 918) and an optical disk drive 920, (e.g., reading a CD-ROM disk 922 or, to read from or write to other high capacity optical media such as the DVD). The hard disk drive 914, magnetic disk drive 916 and optical disk drive 920 can be connected to the system bus 908 by a hard disk drive interface 924, a magnetic disk drive interface 926 and an optical drive interface 928, respectively. The interface 924 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies. Other external drive connection technologies are within contemplation of the subject innovation.
  • [0099]
    The drives and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For the computer 902, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to a HDD, a removable magnetic diskette, and a removable optical media such as a CD or DVD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and further, that any such media may contain computer-executable instructions for performing the methods of the innovation.
  • [0100]
    A number of program modules can be stored in the drives and RAM 912, including an operating system 930, one or more application programs 932, other program modules 934 and program data 936. All or portions of the operating system, applications, modules, and/or data can also be cached in the RAM 912. It is appreciated that the innovation can be implemented with various commercially available operating systems or combinations of operating systems.
  • [0101]
    A user can enter commands and information into the computer 902 through one or more wired/wireless input devices, e.g., a keyboard 938 and a pointing device, such as a mouse 940. Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, touch screen, or the like. These and other input devices are often connected to the processing unit 904 through an input device interface 942 that is coupled to the system bus 908, but can be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, etc.
  • [0102]
    A monitor 944 or other type of display device is also connected to the system bus 908 via an interface, such as a video adapter 946. In addition to the monitor 944, a computer typically includes other peripheral output devices (not shown), such as speakers, printers, etc.
  • [0103]
    The computer 902 may operate in a networked environment using logical connections via wired and/or wireless communications to one or more remote computers, such as a remote computer(s) 948. The remote computer(s) 948 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 902, although, for purposes of brevity, only a memory/storage device 950 is illustrated. The logical connections depicted include wired/wireless connectivity to a local area network (LAN) 952 and/or larger networks, e.g., a wide area network (WAN) 954. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, e.g., the Internet.
  • [0104]
    When used in a LAN networking environment, the computer 902 is connected to the local network 952 through a wired and/or wireless communication network interface or adapter 956. The adapter 956 may facilitate wired or wireless communication to the LAN 952, which may also include a wireless access point disposed thereon for communicating with the wireless adapter 956.
  • [0105]
    When used in a WAN networking environment, the computer 902 can include a modem 958, or is connected to a communications server on the WAN 954, or has other means for establishing communications over the WAN 954, such as by way of the Internet. The modem 958, which can be internal or external and a wired or wireless device, is connected to the system bus 908 via the serial port interface 942. In a networked environment, program modules depicted relative to the computer 902, or portions thereof, can be stored in the remote memory/storage device 950. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
  • [0106]
    The computer 902 is operable to communicate with any wireless devices or entities operatively disposed in wireless communication, e.g., a printer, scanner, desktop and/or portable computer, portable data assistant, communications satellite, any piece of equipment or location associated with a wirelessly detectable tag (e.g., a kiosk, news stand, restroom), and telephone. This includes at least Wi-Fi and Bluetooth™ wireless technologies. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices.
  • [0107]
    Wi-Fi, or Wireless Fidelity, allows connection to the Internet from a couch at home, a bed in a hotel room, or a conference room at work, without wires. Wi-Fi is a wireless technology similar to that used in a cell phone that enables such devices, e.g., computers, to send and receive data indoors and out, anywhere within the range of a base station. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks operate in the unlicensed 2.4 and 5 GHz radio bands, at an 11 Mbps (802.11b) or 54 Mbps (802.11a) data rate, for example, or with products that contain both bands (dual band), so the networks can provide real-world performance similar to the basic 10BaseT wired Ethernet networks used in many offices.
  • [0108]
    Referring now to FIG. 10, there is illustrated a schematic block diagram of an exemplary computing environment 1000 in accordance with the subject innovation. The system 1000 includes one or more client(s) 1002. The client(s) 1002 can be hardware and/or software (e.g., threads, processes, computing devices). The client(s) 1002 can house cookie(s) and/or associated contextual information by employing the innovation, for example.
  • [0109]
    The system 1000 also includes one or more server(s) 1004. The server(s) 1004 can also be hardware and/or software (e.g., threads, processes, computing devices). The servers 1004 can house threads to perform transformations by employing the innovation, for example. One possible communication between a client 1002 and a server 1004 can be in the form of a data packet adapted to be transmitted between two or more computer processes. The data packet may include a cookie and/or associated contextual information, for example. The system 1000 includes a communication framework 1006 (e.g., a global communication network such as the Internet) that can be employed to facilitate communications between the client(s) 1002 and the server(s) 1004.
  • [0110]
    Communications can be facilitated via a wired (including optical fiber) and/or wireless technology. The client(s) 1002 are operatively connected to one or more client data store(s) 1008 that can be employed to store information local to the client(s) 1002 (e.g., cookie(s) and/or associated contextual information). Similarly, the server(s) 1004 are operatively connected to one or more server data store(s) 1010 that can be employed to store information local to the servers 1004.
  • [0111]
    What has been described above includes examples of the innovation. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the subject innovation, but one of ordinary skill in the art may recognize that many further combinations and permutations of the innovation are possible. Accordingly, the innovation is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.

Claims (20)

  1. A system that facilitates multi-modal search, comprising:
    a query administration component that converts a multi-modal input into a wildcard search query; and
    a search engine component that employs the wildcard search query to retrieve a list of query suggestion results.
  2. The system of claim 1, further comprising:
    a query generation component that employs a plurality of modalities to generate the wildcard search query; and
    an analysis component that evaluates the wildcard search query and renders the list of query suggestion results as a function of the search query.
  3. The system of claim 2, wherein the plurality of modalities includes at least two of text, touch or speech.
  4. The system of claim 2, wherein the query generation component facilitates generation of the wildcard search query based upon at least a portion of the list of query suggestion results.
  5. The system of claim 1, wherein the list of query suggestion results includes one of an n-best list or alternates list from a speech recognizer and a list of supplementary results that includes at least one of an ‘exact’ match via a wildcard expression or an ‘approximate’ match via an information retrieval algorithm.
  6. The system of claim 5, wherein a wildcard expression is generated from at least part of the n-best list obtained from a speech recognizer and used to retrieve items in an index or database which match the substrings of the wildcard search query.
  7. The system of claim 5, wherein at least part of the n-best list obtained from the speech recognizer is submitted as a query to an information retrieval algorithm that is indifferent to the order of words in the wildcard search query.
  8. The system of claim 7, wherein the information retrieval algorithm is a Term Frequency Inverse Document Frequency (TFIDF) algorithm.
  9. The system of claim 2, wherein the query generation component employs user generated text to constrain speech recognition upon generating the wildcard search query.
  10. The system of claim 2, wherein the query generation component dynamically converts a user input into a wildcard, and wherein the analysis component employs the wildcard to retrieve a subset of the suggested query results.
  11. The system of claim 10, wherein a user conveys uncertainty, and wherein the wildcard search query is a regular expression query.
  12. The system of claim 1, further comprising an artificial intelligence (AI) component that employs at least one of a probabilistic and a statistical-based analysis that infers an action that a user desires to be automatically performed.
  13. A computer-implemented method of multi-modal search, comprising:
    receiving a multi-modal input from a user;
    establishing a wildcard query based upon portions of the multi-modal input; and
    rendering a plurality of suggested query results based upon the wildcard query.
  14. The computer-implemented method of claim 13, wherein the multi-modal input includes at least two of text, speech, touch or gesture input.
  15. The computer-implemented method of claim 13, further comprising:
    converting a portion of the multi-modal input into a wildcard; and
    retrieving a subset of the query suggestion results based upon the wildcard.
  16. The computer-implemented method of claim 13, further comprising analyzing the input as a function of an algorithm irrespective of word order.
  17. The computer-implemented method of claim 16, wherein the algorithm is a TFIDF algorithm.
  18. The computer-implemented method of claim 13, wherein the multi-modal input includes at least a text hint coupled with a spoken input.
  19. A computer-executable system that facilitates generation of a wildcard search query based upon a multi-modal input, comprising:
    means for receiving the multi-modal input from a user, wherein the multi-modal input includes at least two of text, speech, touch or gesture input;
    means for analyzing the multi-modal input irrespective of order or portions of the order; and
    means for generating the wildcard search query based upon the analysis.
  20. The computer-executable system of claim 19, further comprising means for generating a wildcard based at least in part upon a portion of the multi-modal input, wherein the wildcard search query employs the wildcard to match zero or more characters.
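Taken together, claims 5-8 and 13-20 describe a concrete pipeline: build a wildcard (regular) expression from a typed text hint plus the speech recognizer's n-best list, use it for ‘exact’ matches against a suggestion index, and rank ‘approximate’ matches with a TFIDF score that ignores word order. The following is a minimal sketch of that pipeline; the toy index, function names, and scoring details are illustrative assumptions, not the application's actual implementation.

```python
import re
from collections import Counter
from math import log

# Toy suggestion index (assumed data, for illustration only).
INDEX = [
    "harry potter movie",
    "harry potter books",
    "harry connick jr",
    "hairy spider",
]

def wildcard_query(text_hint, n_best):
    """Combine a typed prefix with a recognizer's n-best list into one
    regular expression (cf. claims 6, 18, 20): the '.*' wildcard matches
    zero or more characters between the hint and any spoken alternate."""
    alternates = "|".join(re.escape(h) for h in n_best)
    return re.compile(rf"^{re.escape(text_hint)}.*(?:{alternates})")

def exact_matches(pattern, index=INDEX):
    """'Exact' matches via the wildcard expression (claim 5)."""
    return [item for item in index if pattern.search(item)]

def tfidf_matches(query_words, index=INDEX):
    """'Approximate' matches via a TFIDF-style score that is indifferent
    to the order of words in the query (claims 7-8, 16-17)."""
    n = len(index)
    df = Counter(w for item in index for w in set(item.split()))
    def score(item):
        words = item.split()
        return sum(
            (words.count(w) / len(words)) * log(n / df[w])
            for w in query_words
            if w in df
        )
    return sorted(
        (item for item in index if score(item) > 0),
        key=score, reverse=True,
    )
```

For example, the text hint "ha" combined with the n-best list ["potter", "connick"] retrieves all three "harry …" suggestions, while `tfidf_matches(["potter", "harry"])` ranks the two "harry potter …" entries first whichever order the query words arrive in.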
US12200648 2008-05-14 2008-08-28 Multi-modal query generation Abandoned US20090287626A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US5321408 true 2008-05-14 2008-05-14
US12200648 US20090287626A1 (en) 2008-05-14 2008-08-28 Multi-modal query generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12200648 US20090287626A1 (en) 2008-05-14 2008-08-28 Multi-modal query generation

Publications (1)

Publication Number Publication Date
US20090287626A1 (en) 2009-11-19

Family

ID=41317081

Family Applications (3)

Application Number Title Priority Date Filing Date
US12200584 Abandoned US20090287680A1 (en) 2008-05-14 2008-08-28 Multi-modal query refinement
US12200625 Active 2030-01-29 US8090738B2 (en) 2008-05-14 2008-08-28 Multi-modal search wildcards
US12200648 Abandoned US20090287626A1 (en) 2008-05-14 2008-08-28 Multi-modal query generation

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US12200584 Abandoned US20090287680A1 (en) 2008-05-14 2008-08-28 Multi-modal query refinement
US12200625 Active 2030-01-29 US8090738B2 (en) 2008-05-14 2008-08-28 Multi-modal search wildcards

Country Status (1)

Country Link
US (3) US20090287680A1 (en)


Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8019742B1 (en) 2007-05-31 2011-09-13 Google Inc. Identifying related queries
US8140335B2 (en) 2007-12-11 2012-03-20 Voicebox Technologies, Inc. System and method for providing a natural language voice user interface in an integrated voice navigation services environment
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9183323B1 (en) 2008-06-27 2015-11-10 Google Inc. Suggesting alternative query phrases in query results
EP2315134A4 (en) * 2008-10-14 2012-12-26 Mitsubishi Electric Corp Search device, search index creating device, and search system
EP2211336B1 (en) * 2009-01-23 2014-10-08 Harman Becker Automotive Systems GmbH Improved speech input using navigation information
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US20100223547A1 (en) * 2009-02-27 2010-09-02 Research In Motion Limited System and method for improved address entry
WO2010116785A1 (en) * 2009-04-06 2010-10-14 三菱電機株式会社 Retrieval device
EP2473931A2 (en) 2009-08-31 2012-07-11 Google, Inc. Framework for selecting and presenting answer boxes relevant to user input as query suggestions
US8504437B1 (en) 2009-11-04 2013-08-06 Google Inc. Dynamically selecting and presenting content relevant to user input
US8676828B1 (en) 2009-11-04 2014-03-18 Google Inc. Selecting and presenting content relevant to user input
EP2339576A3 (en) 2009-12-23 2011-11-23 Google Inc. Multi-modal input on an electronic device
US8914401B2 (en) * 2009-12-30 2014-12-16 At&T Intellectual Property I, L.P. System and method for an N-best list interface
US8849785B1 (en) 2010-01-15 2014-09-30 Google Inc. Search query reformulation using result term occurrence count
US8650210B1 (en) * 2010-02-09 2014-02-11 Google Inc. Identifying non-search actions based on a search query
EP2606441A1 (en) * 2010-08-19 2013-06-26 Google, Inc. Predictive query completion and predictive search results
US9449026B2 (en) * 2010-08-31 2016-09-20 Microsoft Technology Licensing, Llc Sketch-based image search
US8352245B1 (en) 2010-12-30 2013-01-08 Google Inc. Adjusting language models
US8473507B2 (en) 2011-01-14 2013-06-25 Apple Inc. Tokenized search suggestions
US8296142B2 (en) 2011-01-21 2012-10-23 Google Inc. Speech recognition using dock context
US8527483B2 (en) * 2011-02-04 2013-09-03 Mikko VÄÄNÄNEN Method and means for browsing by walking
US8688667B1 (en) * 2011-02-08 2014-04-01 Google Inc. Providing intent sensitive search results
US9575994B2 (en) * 2011-02-11 2017-02-21 Siemens Aktiengesellschaft Methods and devices for data retrieval
US8479110B2 (en) * 2011-03-20 2013-07-02 William J. Johnson System and method for summoning user interface objects
DE102011101146A1 (en) * 2011-05-11 2012-11-15 Abb Technology Ag Multi-stage process and apparatus for interactively locating device data of an automation system
US8577913B1 (en) 2011-05-27 2013-11-05 Google Inc. Generating midstring query refinements
US8849791B1 (en) 2011-06-29 2014-09-30 Amazon Technologies, Inc. Assisted shopping
US8630851B1 (en) 2011-06-29 2014-01-14 Amazon Technologies, Inc. Assisted shopping
US8798995B1 (en) * 2011-09-23 2014-08-05 Amazon Technologies, Inc. Key word determinations from voice data
US9081834B2 (en) * 2011-10-05 2015-07-14 Cumulus Systems Incorporated Process for gathering and special data structure for storing performance metric data
US9477784B1 (en) 2011-10-05 2016-10-25 Cumulus Systems, Inc System for organizing and fast searching of massive amounts of data
US20130091266A1 (en) 2011-10-05 2013-04-11 Ajit Bhave System for organizing and fast searching of massive amounts of data
US9081829B2 (en) * 2011-10-05 2015-07-14 Cumulus Systems Incorporated System for organizing and fast searching of massive amounts of data
US8930393B1 (en) 2011-10-05 2015-01-06 Google Inc. Referent based search suggestions
WO2013052866A9 (en) 2011-10-05 2013-06-20 Google Inc. Semantic selection and purpose facilitation
US9411830B2 (en) 2011-11-24 2016-08-09 Microsoft Technology Licensing, Llc Interactive multi-modal image search
US20130226892A1 (en) * 2012-02-29 2013-08-29 Fluential, Llc Multimodal natural language interface for faceted search
JP2013232026A (en) * 2012-04-27 2013-11-14 Sharp Corp Portable information terminal
US20130297304A1 (en) * 2012-05-02 2013-11-07 Electronics And Telecommunications Research Institute Apparatus and method for speech recognition
US9684395B2 (en) * 2012-06-02 2017-06-20 Tara Chand Singhal System and method for context driven voice interface in handheld wireless mobile devices
WO2014014374A1 (en) 2012-07-19 2014-01-23 Yandex Europe Ag Search query suggestions based in part on a prior search
US20140046922A1 (en) * 2012-08-08 2014-02-13 Microsoft Corporation Search user interface using outward physical expressions
US8751499B1 (en) 2013-01-22 2014-06-10 Splunk Inc. Variable representative sampling under resource constraints
US9152929B2 (en) * 2013-01-23 2015-10-06 Splunk Inc. Real time display of statistics and values for selected regular expressions
US8751963B1 (en) 2013-01-23 2014-06-10 Splunk Inc. Real time indication of previously extracted data fields for regular expressions
US8909642B2 (en) 2013-01-23 2014-12-09 Splunk Inc. Automatic generation of a field-extraction rule based on selections in a sample event
US8682906B1 (en) * 2013-01-23 2014-03-25 Splunk Inc. Real time display of data field values based on manual editing of regular expressions
US20140207758A1 (en) * 2013-01-24 2014-07-24 Huawei Technologies Co., Ltd. Thread Object-Based Search Method and Apparatus
US9147125B2 (en) 2013-05-03 2015-09-29 Microsoft Technology Licensing, Llc Hand-drawn sketch recognition
US9672287B2 (en) 2013-12-26 2017-06-06 Thomson Licensing Method and apparatus for gesture-based searching
US9842592B2 (en) 2014-02-12 2017-12-12 Google Inc. Language models using non-linguistic context
US9412365B2 (en) 2014-03-24 2016-08-09 Google Inc. Enhanced maximum entropy models
US9953646B2 (en) 2014-09-02 2018-04-24 Belleau Technologies Method and system for dynamic speech recognition and tracking of prewritten script
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
US9626703B2 (en) * 2014-09-16 2017-04-18 Voicebox Technologies Corporation Voice commerce
US20160092428A1 (en) * 2014-09-30 2016-03-31 Microsoft Technology Licensing, Llc Dynamic Presentation of Suggested Content
US9626768B2 (en) 2014-09-30 2017-04-18 Microsoft Technology Licensing, Llc Optimizing a visual perspective of media
EP3207467A1 (en) 2014-10-15 2017-08-23 VoiceBox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US20160171122A1 (en) * 2014-12-10 2016-06-16 Ford Global Technologies, Llc Multimodal search response
US9443519B1 (en) 2015-09-09 2016-09-13 Google Inc. Reducing latency caused by switching input modalities
WO2017180153A1 (en) * 2016-04-15 2017-10-19 Entit Software Llc Removing wildcard tokens from a set of wildcard tokens for a search query


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070061335A1 (en) 2005-09-14 2007-03-15 Jorey Ramer Multimodal search query processing
US20090287680A1 (en) * 2008-05-14 2009-11-19 Microsoft Corporation Multi-modal query refinement

Patent Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060190256A1 (en) * 1998-12-04 2006-08-24 James Stephanick Method and apparatus utilizing voice input to resolve ambiguous manually entered text input
US20050283364A1 (en) * 1998-12-04 2005-12-22 Michael Longe Multimodal disambiguation of speech recognition
US6564213B1 (en) * 2000-04-18 2003-05-13 Amazon.Com, Inc. Search query autocompletion
US20020123876A1 (en) * 2000-12-30 2002-09-05 Shuvranshu Pokhariyal Specifying arbitrary words in rule-based grammars
US7027987B1 (en) * 2001-02-07 2006-04-11 Google Inc. Voice interface for a search engine
US7096218B2 (en) * 2002-01-14 2006-08-22 International Business Machines Corporation Search refinement graphical user interface
US20040054541A1 (en) * 2002-09-16 2004-03-18 David Kryze System and method of media file access and retrieval using speech recognition
US20070239670A1 (en) * 2004-10-20 2007-10-11 International Business Machines Corporation Optimization-based data content determination
US20080162471A1 (en) * 2005-01-24 2008-07-03 Bernard David E Multimodal natural language query system for processing and analyzing voice and proximity-based queries
US20060190436A1 (en) * 2005-02-23 2006-08-24 Microsoft Corporation Dynamic client interaction for search
US7461059B2 (en) * 2005-02-23 2008-12-02 Microsoft Corporation Dynamically updated search results based upon continuously-evolving search query that is based at least in part upon phrase suggestion, search engine uses previous result sets performing additional search tasks
US7277029B2 (en) * 2005-06-23 2007-10-02 Microsoft Corporation Using language models to expand wildcards
US20060293890A1 (en) * 2005-06-28 2006-12-28 Avaya Technology Corp. Speech recognition assisted autocompletion of composite characters
US20070022005A1 (en) * 2005-07-21 2007-01-25 Hanna Nader G Method for requesting, displaying, and facilitating placement of an advertisement in a computer network
US20070050191A1 (en) * 2005-08-29 2007-03-01 Voicebox Technologies, Inc. Mobile systems and methods of supporting natural language human-machine interactions
US20070061336A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Presentation of sponsored content based on mobile transaction event
US20070067345A1 (en) * 2005-09-21 2007-03-22 Microsoft Corporation Generating search requests from multimodal queries
US20070162422A1 (en) * 2005-12-30 2007-07-12 George Djabarov Dynamic search box for web browser
US20070164782A1 (en) * 2006-01-17 2007-07-19 Microsoft Corporation Multi-word word wheeling
US7797303B2 (en) * 2006-02-15 2010-09-14 Xerox Corporation Natural language processing for developing queries
US20080086311A1 (en) * 2006-04-11 2008-04-10 Conwell William Y Speech Recognition, and Related Systems
US7778837B2 (en) * 2006-05-01 2010-08-17 Microsoft Corporation Demographic based classification for local word wheeling/web search
US20070299824A1 (en) * 2006-06-27 2007-12-27 International Business Machines Corporation Hybrid approach for query recommendation in conversation systems
US20080215555A1 (en) * 2006-06-27 2008-09-04 International Business Machines Corporation Hybrid Approach for Query Recommendation in Conversation Systems
US20090006343A1 (en) * 2007-06-28 2009-01-01 Microsoft Corporation Machine assisted query formulation
US20090019002A1 (en) * 2007-07-13 2009-01-15 Medio Systems, Inc. Personalized query completion suggestion
US20100125457A1 (en) * 2008-11-19 2010-05-20 At&T Intellectual Property I, L.P. System and method for discriminative pronunciation modeling for voice search

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9843626B2 (en) * 2008-03-27 2017-12-12 Mitel Networks Corporation Method, system and apparatus for controlling an application
US20160050260A1 (en) * 2008-03-27 2016-02-18 Trung (Tim) Trinh Method, system and apparatus for controlling an application
US20090287680A1 (en) * 2008-05-14 2009-11-19 Microsoft Corporation Multi-modal query refinement
US20110145224A1 (en) * 2009-12-15 2011-06-16 At&T Intellectual Property I.L.P. System and method for speech-based incremental search
US9396252B2 (en) 2009-12-15 2016-07-19 At&T Intellectual Property I, L.P. System and method for speech-based incremental search
US8903793B2 (en) * 2009-12-15 2014-12-02 At&T Intellectual Property I, L.P. System and method for speech-based incremental search
US20110145214A1 (en) * 2009-12-16 2011-06-16 Motorola, Inc. Voice web search
US9081868B2 (en) * 2009-12-16 2015-07-14 Google Technology Holdings LLC Voice web search
US20120278308A1 (en) * 2009-12-30 2012-11-01 Google Inc. Custom search query suggestion tools
US8782556B2 (en) 2010-02-12 2014-07-15 Microsoft Corporation User-centric soft keyboard predictive technologies
US20110201387A1 (en) * 2010-02-12 2011-08-18 Microsoft Corporation Real-time typing assistance
US20110202876A1 (en) * 2010-02-12 2011-08-18 Microsoft Corporation User-centric soft keyboard predictive technologies
US20110202836A1 (en) * 2010-02-12 2011-08-18 Microsoft Corporation Typing assistance for editing
US9165257B2 (en) 2010-02-12 2015-10-20 Microsoft Technology Licensing, Llc Typing assistance for editing
US9613015B2 (en) 2010-02-12 2017-04-04 Microsoft Technology Licensing, Llc User-centric soft keyboard predictive technologies
US9348417B2 (en) 2010-11-01 2016-05-24 Microsoft Technology Licensing, Llc Multimodal input system
US20120173244A1 (en) * 2011-01-04 2012-07-05 Kwak Byung-Kwan Apparatus and method for voice command recognition based on a combination of dialog models
US8954326B2 (en) * 2011-01-04 2015-02-10 Samsung Electronics Co., Ltd. Apparatus and method for voice command recognition based on a combination of dialog models
US20120209590A1 (en) * 2011-02-16 2012-08-16 International Business Machines Corporation Translated sentence quality estimation
EP2568371A3 (en) * 2011-09-08 2016-01-06 Lg Electronics Inc. Mobile terminal and controlling method thereof
US9329747B2 (en) 2011-09-08 2016-05-03 Lg Electronics Inc. Mobile terminal and controlling method thereof
KR101852821B1 (en) 2011-09-08 2018-04-27 엘지전자 주식회사 Mobile terminal and method for controlling the same
US9299342B2 (en) 2011-09-23 2016-03-29 Microsoft Technology Licensing, Llc User query history expansion for improving language model adaptation
US9129606B2 (en) 2011-09-23 2015-09-08 Microsoft Technology Licensing, Llc User query history expansion for improving language model adaptation
US8249876B1 (en) 2012-01-03 2012-08-21 Google Inc. Method for providing alternative interpretations of a voice input to a user
US8788273B2 (en) 2012-02-15 2014-07-22 Robbie Donald EDGAR Method for quick scroll search using speech recognition
US9818398B2 (en) * 2012-07-09 2017-11-14 Nuance Communications, Inc. Detecting potential significant errors in speech recognition results
US20150248882A1 (en) * 2012-07-09 2015-09-03 Nuance Communications, Inc. Detecting potential significant errors in speech recognition results
US20140019462A1 (en) * 2012-07-15 2014-01-16 Microsoft Corporation Contextual query adjustments using natural action input
US20140172892A1 (en) * 2012-12-18 2014-06-19 Microsoft Corporation Queryless search based on context
US9483518B2 (en) * 2012-12-18 2016-11-01 Microsoft Technology Licensing, Llc Queryless search based on context
US20140258323A1 (en) * 2013-03-06 2014-09-11 Nuance Communications, Inc. Task assistant
US20140258324A1 (en) * 2013-03-06 2014-09-11 Nuance Communications, Inc. Task assistant utilizing context for improved interaction
EP3001333A4 (en) * 2014-05-15 2016-08-24 Huawei Tech Co Ltd Object search method and apparatus
EP2947584A1 (en) * 2014-05-23 2015-11-25 Samsung Electronics Co., Ltd Multimodal search method and device

Also Published As

Publication number Publication date Type
US8090738B2 (en) 2012-01-03 grant
US20090287680A1 (en) 2009-11-19 application
US20090287681A1 (en) 2009-11-19 application

Similar Documents

Publication Publication Date Title
US8073700B2 (en) Retrieval and presentation of network service results for mobile device using a multimodal browser
US7720674B2 (en) Systems and methods for processing natural language queries
US7487094B1 (en) System and method of call classification with context modeling based on composite words
US20020062216A1 (en) Method and system for gathering information by voice input
US20080052073A1 (en) Voice Recognition Device and Method, and Program
US20030144846A1 (en) Method and system for modifying the behavior of an application based upon the application's grammar
US6397181B1 (en) Method and apparatus for voice annotation and retrieval of multimedia data
US20070033037A1 (en) Redictation of misrecognized words using a list of alternatives
US20070208738A1 (en) Techniques for providing suggestions for creating a search query
Schalkwyk et al. “Your word is my command”: Google search by voice: a case study
US6877001B2 (en) Method and system for retrieving documents with spoken queries
US20080312934A1 (en) Using results of unstructured language model based speech recognition to perform an action on a mobile communications facility
US20060195435A1 (en) System and method for providing query assistance
US20110060587A1 (en) Command and control utilizing ancillary information in a mobile voice-to-speech application
US20020133481A1 (en) Methods and apparatus for providing search results in response to an ambiguous search query
US20110054899A1 (en) Command and control utilizing content information in a mobile voice-to-speech application
US20110054900A1 (en) Hybrid command and control between resident and remote speech recognition facilities in a mobile voice-to-speech application
US20020152202A1 (en) Method and system for retrieving information using natural language queries
US20110055256A1 (en) Multiple web-based content category searching in mobile search application
US20110054894A1 (en) Speech recognition through the collection of contact information in mobile dictation application
US20110066634A1 (en) Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search in mobile search application
US20080104043A1 (en) Server-side match
US20110054898A1 (en) Multiple web-based content search user interface in mobile search application
US20040167875A1 (en) Information processing method and system
US20030204399A1 (en) Key word and key phrase based speech recognizer for information retrieval systems

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PAEK, TIMOTHY SEUNG YOON;THIESSON, BO;JU, YUN-CHENG;AND OTHERS;REEL/FRAME:021458/0692;SIGNING DATES FROM 20080825 TO 20080826

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034766/0509

Effective date: 20141014