US20130226916A1 - Facet Suggestion for Search Query Augmentation - Google Patents

Facet Suggestion for Search Query Augmentation

Info

Publication number
US20130226916A1
Authority
US
United States
Prior art keywords
facets
facet
candidate
search results
presentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/857,102
Inventor
Mark H. Dredze
William N. Schilit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Priority to US13/857,102
Publication of US20130226916A1
Assigned to GOOGLE INC. Assignment of assignors interest (see document for details). Assignors: SCHILIT, WILLIAM N.; DREDZE, MARK H.

Classifications

    • G06F17/3053
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions

Definitions

  • the disclosed embodiments relate generally to suggesting query refinements, and more specifically to ranking of potential query refinements that are based on facets associated with search results from an initial search query.
  • a user with 100 new electronic messages may scan the messages one by one, spending too much time addressing less relevant messages, and taking longer to discover the more important messages.
  • when a search query returns many messages, the user may need to scan the messages individually to find the desired one(s). In either case, the user spends too much time looking at less relevant messages.
  • the set of information is an email folder or account in an electronic messaging system.
  • the set of information is a database of retail products.
  • GUI: graphical user interface
  • a computer-implemented method searches a set of information.
  • the method utilizes a computer system having one or more processors and memory storing one or more programs.
  • the programs are executed by the one or more processors to perform the operations.
  • the method generates an initial set of search results based on an initial search query.
  • the method generates a set of candidate facets, where each of the candidate facets can be used to select a subset of the initial set of search results.
  • the method ranks the candidate facets in accordance with the selectivity of the candidate facets with respect to at least some of the search results and selects a plurality of facets from among the candidate facets for presentation to the user. The selection is in accordance with the rankings of the candidate facets.
  • the method formats the presentation facets for display to the user.
  • In response to user selection of any one of the presentation facets, the method generates a revised search query comprising the initial search query and the selected presentation facet, and generates a revised set of search results based on the revised search query.
  • the method determines, without further user input, for each candidate facet which facet characteristics from a predefined set of facet characteristics are characteristics of the candidate facet.
  • the method ranks the candidate facets in accordance with both the selectivity of the candidate facets with respect to at least some of the search results as well as the facet characteristics of the candidate facets.
  • each facet characteristic of the predefined set of facet characteristics has an associated weight
  • the ranking of the candidate facets is based in part on the weights associated with the facet characteristics of the candidate facets.
  • a system for searching a set of information includes: one or more processors, memory, and one or more programs stored in the memory.
  • the one or more programs comprise instructions that are executed by the one or more processors, and include instructions to generate an initial set of search results based on an initial search query.
  • the one or more programs further have instructions to perform the following operations without further user input: generate a set of candidate facets, each of which can be used to select a subset of the initial set of search results; rank the candidate facets in accordance with selectivity of the candidate facets with respect to at least some of the search results; select a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets; and format for display the presentation facets.
  • the one or more programs further have instructions that execute in response to user selection of any one of the presentation facets. These instructions generate a revised search query comprising the initial search query and the selected presentation facet, and generate a revised set of search results based on the revised search query.
  • a non-transitory computer readable storage medium stores one or more programs configured for execution by one or more processors of a computer.
  • the one or more programs comprise instructions to be executed by the one or more processors, including instructions to generate an initial set of search results based on an initial search query.
  • the one or more programs further include instructions to perform the following operations without further user input: generate a set of candidate facets, each of which can be used to select a subset of the initial set of search results; rank the candidate facets in accordance with selectivity of the candidate facets with respect to at least some of the search results; select a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets; and format for display the presentation facets.
  • the one or more programs further have instructions that execute in response to user selection of any one of the presentation facets. These instructions generate a revised search query comprising the initial search query and the selected presentation facet, and generate a revised set of search results based on the revised search query.
  • FIG. 1 illustrates an exemplary context in which some embodiments operate.
  • FIG. 2 is a functional description of a computer system according to some embodiments.
  • FIG. 3 provides an exemplary list of operators used to form candidate facets according to some embodiments.
  • FIG. 4 illustrates how weights are assigned to facet characteristics according to some embodiments.
  • FIG. 5 illustrates a functional process flow according to some embodiments.
  • FIGS. 6A and 6B provide a detailed descriptive process flow according to some embodiments.
  • FIG. 7 provides an exemplary graphical user interface for an email system that generates and ranks facets for user selection according to some embodiments.
  • although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
  • a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the present invention.
  • the first contact and the second contact are both contacts, but they are not the same contact.
  • the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context.
  • the phrase “if it is determined” or “if (a stated condition or event) is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting (the stated condition or event)” or “in response to detecting (the stated condition or event),” depending on the context.
  • FIG. 1 is a block diagram of an embodiment of a search query augmentation system 100 .
  • the system 100 communicates over a network 106 (such as the Internet) with one or more clients 102 .
  • the clients 102 typically use web browsers 104 or other browsing applications to communicate with the system 100 , using HTTP requests and responses or other appropriate communication protocols.
  • the clients 102 may communicate with the system 100 using a software program other than a browser.
  • the large majority of clients 102 are typically remotely located from the system 100 , but one or more of the clients 102 can be located nearby the system 100 .
  • one or more programs execute on an Application Server 108 .
  • all or some of the modules that comprise the system 100 execute within the Application Server 108 .
  • Communications Module 110 provides communication between the system 100 and the Network 106 .
  • the Communications Module 110 receives search queries from clients 102, conveys the search queries to Query Module 116 (via Control Module 112), and also conveys search results produced by the Query Module 116 in response to a respective query back to the requesting client 102.
  • the Communications Module 110 also conveys to the requesting client, along with the search results, facets, which are query augmentation suggestions. Facet generation and ranking are described in detail below.
  • a user at a client 102 selects a facet (e.g., a facet presented along with search results)
  • the Communications Module 110 receives the selection from the client and conveys that information to the Query Module 116 (via Control Module 112).
  • the Query Module 116 executes a revised query, comprising the terms of the prior query for which search results were returned, plus the user selected facet.
  • the new search results are returned by the Query Module 116 to the requesting client 102 via the Control Module 112 and Communications Module 110 .
  • the search query augmentation system 100 also returns one or more suggested facets for further narrowing the revised query.
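By way of illustration only, the construction of the revised query (the terms of the prior query plus the user-selected facet) can be sketched as follows. The function name `augment_query` and the whitespace-separated query representation are assumptions made for this sketch; the source does not specify how the Query Module 116 represents queries.

```python
def augment_query(prior_query: str, selected_facet: str) -> str:
    """Return the prior query with the selected facet appended.

    Assumption: a query is a whitespace-separated string of terms and
    facets, and a facet already present in the query is not added twice.
    """
    terms = prior_query.split()
    if selected_facet not in terms:
        terms.append(selected_facet)
    return " ".join(terms)


revised = augment_query("budget report", "from:john")
print(revised)
```

Because the revised query is itself an ordinary query, the same function can be applied again when the user selects a further facet from the suggestions returned with the revised results.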
  • a Control Module 112 runs one or more programs that control all of the other modules that comprise the system 100 .
  • a User Interface Module 114 manages the graphical user interface that the system 100 provides to clients 102 . For example, User Interface Module 114 determines what is displayed for a user on a client computer 102 , and determines what actions a user may take to provide input to the system (such as a search query).
  • Query Module 116 issues queries against a Database 122 to retrieve search results that are responsive to a user's query. In some embodiments, Query Module 116 maintains an Index 124 to facilitate the query process.
  • the Index 124 is a mapping of terms in a Database 122 of documents (or other objects or information) to specific documents (or objects or information) in the Database 122, and is sometimes called an inverse index (e.g., because it may be produced by “inverting” the documents in the Database 122).
  • the Index 124 may optionally include additional information as well, such as the locations of terms within documents in the Database 122, and/or information about the portion(s) of the documents in which the term is located.
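As a minimal sketch of such an index, the following maps each term to the documents that contain it, together with the positions of the term within each document. The tokenization (lowercasing and splitting on whitespace) and the name `build_inverted_index` are assumptions for illustration, not details taken from the source.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(documents: Dict[int, str]) -> Dict[str, Dict[int, List[int]]]:
    """Map term -> {document id -> positions of the term in that document}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        # Assumed tokenization: lowercase, split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {term: dict(postings) for term, postings in index.items()}


docs = {
    1: "football tickets for Sunday",
    2: "Sunday football practice moved",
}
index = build_inverted_index(docs)
print(index["football"])
```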
  • Facet Generation Module 118 generates lists of candidate facets that can be used to augment a user's search query. Facet Generation Module 118 uses a set of search results from a query, together with a list of operators (see, e.g., FIG. 3 below) to generate the candidate facets. If the number of candidate facets is very small, the User Interface Module 114 may present all of the candidate facets to the user. In general, however, the candidate facets are ranked by the Facet Ranking Module 120 , and a subset of the candidate facets is selected for presentation to the user (e.g., facets selected for conveyance to the requesting client 102 along with search results produced in response to the user's search query).
  • the selection of the subset is based on the ranking of the candidate facets, as described in more detail below with respect to FIG. 5 and FIGS. 6A and 6B .
  • the selection of the subset of facets may be performed by the Control Module 112 , or other designated module not shown in FIG. 1 .
  • the Query Module 116 and Database 122 are implemented on different servers from the other portions of the query augmentation system 100 , while in other embodiments the other portions of the query augmentation system 100 are fully integrated with the Query Module 116 and Database 122 . It is noted that Query Module 116 and Database 122 may together comprise an internet search engine, an enterprise search engine, a search engine specific to a particular online service (e.g., having a database with information concerning products or services offered for sale, or rental or online access), or the like.
  • FIG. 2 is a block diagram illustrating a Computer System 200 used for search query augmentation in accordance with some embodiments of the present invention.
  • the Computer System 200 typically includes one or more processing units (CPU's) 202 for executing modules, programs and/or instructions stored in memory 214 and thereby performing processing operations; one or more network or other communications interfaces 204; memory 214; and one or more communication buses 212 for interconnecting these components.
  • the Computer System 200 includes a user interface 206 comprising a display device 208 and one or more input devices 210 ; however, since Computer System 200 is typically implemented using a set of servers, in many embodiments Computer System 200 does not include a user interface.
  • memory 214 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices.
  • memory 214 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • memory 214 includes one or more storage devices remotely located from the CPU(s) 202 .
  • Memory 214 or alternately the non-volatile memory device(s) within memory 214 , comprises a non-transitory computer readable storage medium.
  • memory 214 or the non-transitory computer readable storage medium of memory 214 stores the following programs, modules and data structures, or a subset thereof:
  • memory 214 may store a subset of the modules and data structures identified above. Furthermore, memory 214 may store a database 122 or additional modules and data structures not described above.
  • the database 122 stores the set of information to be searched. In some embodiments, the database 122 stores the operator list 222, the facet characteristic weights 228, and/or other data used by any of the modules comprising Computer System 200.
  • although FIG. 2 shows a Computer System used for search query augmentation, FIG. 2 is intended more as a functional description of the various features which may be present in a set of one or more computers (e.g., one or more server systems or server computers) rather than as a structural schematic of the embodiments described herein.
  • items shown separately could be combined and some items could be separated.
  • some items shown separately in FIG. 2 could be implemented on individual computer systems and single items could be implemented by one or more computer systems.
  • the actual number of computers used to implement a search query augmentation system and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
  • FIGS. 3-7 provide further details about the operation of a search query augmentation system in accordance with embodiments of the present invention.
  • a query may contain one or more facets that specify properties sought in the search results.
  • facets use metadata to specify the properties, but the properties can also be based on the content of the records searched.
  • a search query could include a “PDF type” facet that specifies that the documents sought are Adobe® Acrobat® files (PDF), e.g., by including the search facet “type:PDF”.
  • This example facet comprises the operator “type:” and the operand “PDF”.
  • each “facet” comprises an operator plus zero or more operands.
  • a plurality of the candidate facets (and typically most of the candidate facets) comprise an operator plus one or more operands.
  • PDF type can be implemented using different operators with different numbers of parameters.
  • PDF type can be implemented with a zero-operand operator, “is_pdf”, that specifies a search for PDF files, with the same effect as “type:PDF”, which has an operator and one operand.
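The operator-plus-operands structure of a facet can be sketched as a small data type. The class `Facet`, the function `parse_facet`, and the particular set of zero-operand operators used here are assumptions for illustration; only the example facets “type:PDF”, “is_pdf” and “has:attachment” come from the text.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed, non-exhaustive set of zero-operand operators for this sketch.
ZERO_OPERAND_OPERATORS = {"is_pdf", "has:attachment", "is:unread", "in:inbox"}


@dataclass(frozen=True)
class Facet:
    operator: str                   # e.g. "type:" or "is_pdf"
    operands: Tuple[str, ...] = ()  # empty for zero-operand operators

    def __str__(self) -> str:
        return self.operator + ",".join(self.operands)


def parse_facet(text: str) -> Facet:
    """Split facet text into an operator and zero or one operand."""
    if text in ZERO_OPERAND_OPERATORS:
        return Facet(text)
    operator, separator, operand = text.partition(":")
    if not separator or not operand:
        raise ValueError(f"not a recognized facet: {text!r}")
    return Facet(operator + ":", (operand,))


print(parse_facet("type:PDF"))
print(parse_facet("is_pdf"))
```

In this sketch “is_pdf” and “type:PDF” are distinct `Facet` values even though they would select the same documents; treating them as equivalent is left to whatever evaluates the facet.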
  • FIG. 3 provides a list 222 of exemplary operators that are used to build facets in some embodiments that search for electronic messages.
  • the operators “to:” 302 , “from:” 304 , and “cc:” 312 are used to specify data in fields of electronic message headers, and each of these operators requires a single operand.
  • the operand value can be a name (or a portion thereof), an email address, or a domain name (such as google.com or yahoo.com).
  • the “deliveredto:” operator 318 is similar to the first three, but in some embodiments the “deliveredto:” operator requires an operand that must be an email address or a domain name. These four operators are sometimes referred to as “person operators.”
  • the “subject:” operator 306 is used for facets that specify words that occur in the subject line of electronic messages.
  • “subject:football” is a facet that specifies a search for messages with the word “football” in the subject line.
  • the “label:” operator 308 is used for facets that specify words appearing in a label associated with an electronic message. Unlike a subject line, labels may be associated with an electronic message after it is sent, the labels can be created by recipients, and multiple labels may be assigned to the same message. If a user has assigned labels to messages, the labels can create an effective way for the user to find the labeled messages.
  • the “list:” operator 310 is used for facets that specify a mailing list that appears in the “to” or “from” header.
  • a mailing list may be used to specify a group of people or email addresses, so that a user may send a message to the group without specifying the names or email addresses individually.
  • the facet “list:dept200@company.com” would specify all messages sent to or from the mailing list dept200@company.com.
  • Facets can also be used to specify date ranges.
  • the operators “after:” 314 and “before:” 316 are used to specify a range of dates.
  • the operators “after:” 314 and “before:” 316 each requires a single operand, which must be a date.
  • a date operand is formatted as YYYY/MM/DD to prevent ambiguity.
  • the evaluation of dates uses the regional settings on the user's computer.
  • the date operand includes a time value, for example 10:30:00 AM in the operand 2009/10/15 10:30:00 AM. Some embodiments that allow the time to be specified require that the time be specified on a 24 hour clock.
  • the time may be specified on a 12 hour clock with an AM or PM designation.
  • the formatting of the time operands is determined by regional settings on the user's computer.
  • the end point date is included in the scope of the facet; other embodiments exclude the endpoint.
  • the facet “after:2009/12/10” would specify all messages sent on or after Dec. 10, 2009 (including the endpoint).
  • a single operator “between:” is used instead of the operators “after:” 314 and “before:” 316 . When used, the “between:” operator requires two operands, specifying the beginning and ending dates of a date range. The same issues or options for formatting of date and time operands apply to the “between:” operator.
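The date operators can be sketched as simple comparisons. This is a minimal sketch that assumes date-only operands in the YYYY/MM/DD format and exposes the endpoint behavior as a parameter, since the text notes that embodiments differ on whether the endpoint is included. The function names are hypothetical, and time-of-day operands and regional formats are not handled.

```python
from datetime import date, datetime


def parse_date_operand(operand: str) -> date:
    """Parse an operand in the unambiguous YYYY/MM/DD format."""
    return datetime.strptime(operand, "%Y/%m/%d").date()


def matches_after(sent: date, operand: str, inclusive: bool = True) -> bool:
    bound = parse_date_operand(operand)
    return sent >= bound if inclusive else sent > bound


def matches_before(sent: date, operand: str, inclusive: bool = True) -> bool:
    bound = parse_date_operand(operand)
    return sent <= bound if inclusive else sent < bound


def matches_between(sent: date, start: str, end: str, inclusive: bool = True) -> bool:
    """Two-operand form: equivalent to combining after: and before:."""
    return matches_after(sent, start, inclusive) and matches_before(sent, end, inclusive)


# A message sent on Dec. 10, 2009 matches "after:2009/12/10" when the endpoint is included.
print(matches_after(date(2009, 12, 10), "2009/12/10"))
```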
  • the date each message is sent is the only relevant date for a query, and thus the operators “after:” and “before:” are unambiguous.
  • relevant dates such as the date created, date last changed, date last accessed, date posted on the website, etc.
  • some embodiments require two operands for the operators “after:” and “before:”.
  • One of the operands specifies which date field in the documents to look at, and the other operand specifies the comparison date.
  • Other embodiments address this issue by creating different operators for each of the relevant date fields in the documents.
  • some embodiments use the operators “edited.after:” and “edited.before:” to specify date ranges for when the documents were last edited.
  • the operators require a single date operand, as described above for the electronic mail operators “after:” and “before:”.
  • the remaining operators listed in FIG. 3 are zero-operand operators, and the meanings are fairly intuitive based on the names of the operators.
  • the “has:attachment” operator 320 specifies a search for electronic messages that have attachments.
  • the operators “is:starred” 322 , “is:unread” 324 , “is:read” 326 , and “is:chat” 328 all specify simple properties of electronic messages.
  • the “is:starred” operator is generally replaced by an operator whose name is more meaningful or relevant, such as “is:important”.
  • the “in:inbox” operator 330, the “in:trash” operator 332, and the “in:spam” operator 334 specify certain folders to search for messages. Because a zero-operand operator has no operands, the operator itself is a facet. For example, “has:attachment” is a facet.
  • the specified folder is an operand.
  • the operand can be any folder in a user's email account, and not just the predefined folders inbox, trash, and spam. For example, if a user has created a folder called “medical,” then the facet “in:medical” could be used to search for messages within the medical folder. In this case, the possible operands are based on the structure of the user's electronic mail folders rather than the initial set of search results.
  • the number of candidate facets can be quite large. For example, in embodiments searching for electronic messages, there is a candidate facet of the form “to:XXXX” for each name XXXX that appears in the “To:” header of a message in the search results. Rather than display all of the candidate facets to the user, embodiments of the present invention rank the candidate facets and display only the highest ranked facets to the user for selection. To evaluate the utility of the candidate facets, embodiments of the present invention use one or more facet characteristics, which are described in more detail below.
  • some embodiments assign a weight to each facet characteristic, as shown in Facet Characteristic Weight table 228 in FIG. 4.
  • in Facet Characteristic Weight table 228 there is a predefined set of n facet characteristics (characteristic 1 402, characteristic 2 404, . . . , characteristic n 406), and these facet characteristics have weights weight 1, weight 2, . . . , weight n.
  • all of the weights are positive numbers, but in other embodiments some of the weights may be negative.
  • FIG. 5 illustrates a functional process flow 500 according to some embodiments.
  • Process flow 500 begins with a set of search results 502 , which is generated by Query Module 116 in response to a user's query.
  • using the initial set of search results 502 and the operator list 222 (e.g., as shown in FIG. 3), the Facet Generation Module 118 generates (504) candidate facets 224.
  • Each zero-operand operator is itself a candidate facet, and for each operator that requires one or more operands, facets are generated based on the data in the initial set of search results. For example, the “from:” operator may be combined with each name or email address that appears in the “From:” header of the messages in the initial set of search results.
  • the facet list 224 does not have a predefined number of candidate facets.
  • the number of candidate facets depends on both the operator list 222 and the initial set of search results 502 .
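Candidate generation as described above can be sketched as follows: every zero-operand operator contributes one candidate, and every operator that takes an operand contributes one candidate per distinct value found in the corresponding header of the search results. The message representation (a dict of header lists), the abbreviated operator lists, and the function name are assumptions for this sketch.

```python
from typing import Dict, List

# Abbreviated, assumed operator lists for illustration.
ZERO_OPERAND_OPERATORS = ["has:attachment", "is:unread", "in:inbox"]
HEADER_OPERATORS = {"from:": "from", "to:": "to", "cc:": "cc"}


def generate_candidate_facets(results: List[Dict[str, List[str]]]) -> List[str]:
    """Build candidate facets from an operator list and a set of results."""
    facets = list(ZERO_OPERAND_OPERATORS)  # each is itself a candidate
    seen = set(facets)
    for operator, header in HEADER_OPERATORS.items():
        for message in results:
            for value in message.get(header, []):
                facet = operator + value
                if facet not in seen:  # one candidate per distinct value
                    seen.add(facet)
                    facets.append(facet)
    return facets


results = [
    {"from": ["john@google.com"], "to": ["mary@example.com"]},
    {"from": ["john@google.com"], "to": ["bob@example.com", "mary@example.com"]},
]
print(generate_candidate_facets(results))
```

The length of the returned list is not fixed in advance; it grows with both the operator list and the number of distinct header values in the results.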
  • Each candidate facet such as Facet 2 ( 506 ) has an associated facet definition 508 and a characteristic vector 510 .
  • the facet definition 508 identifies both an operator and any operands, such as “to:google.com” 512 for Facet 2.
  • the Facet Characteristic Determination procedure 226 determines which facet characteristics apply to each candidate facet, creating a characteristic vector 510 associated with each candidate facet.
  • Each characteristic vector 510 has n components, where n is the number of predefined facet characteristics.
  • the corresponding component in the characteristic vector is 1 if the facet characteristic applies to the candidate facet, and is 0 otherwise.
  • Each characteristic vector is thus a list of n zeros and ones, as shown for the characteristic vector 510 for Facet 2.
  • Facet 2 does not have characteristic 1, does have characteristic 2, does have characteristic 3, etc.
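A characteristic vector of this kind can be sketched as a list of 0/1 values, one per predefined characteristic. The three characteristics below are hypothetical stand-ins chosen only to make the sketch runnable; they are not the characteristics used by the Facet Characteristic Determination procedure 226.

```python
from typing import Callable, List

PERSON_OPERATORS = ("to:", "from:", "cc:", "deliveredto:")
ZERO_OPERAND_FACETS = {
    "has:attachment", "is:starred", "is:unread", "is:read",
    "is:chat", "in:inbox", "in:trash", "in:spam",
}


# Hypothetical characteristics (assumptions for illustration only).
def is_person_facet(facet: str) -> bool:
    return facet.startswith(PERSON_OPERATORS)


def is_zero_operand_facet(facet: str) -> bool:
    return facet in ZERO_OPERAND_FACETS


def has_domain_operand(facet: str) -> bool:
    operand = facet.partition(":")[2]
    return "." in operand and "@" not in operand


CHARACTERISTICS: List[Callable[[str], bool]] = [
    is_person_facet, is_zero_operand_facet, has_domain_operand,
]


def characteristic_vector(facet: str) -> List[int]:
    """Component i is 1 if characteristic i applies to the facet, else 0."""
    return [1 if applies(facet) else 0 for applies in CHARACTERISTICS]


print(characteristic_vector("to:google.com"))
```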
  • Facet Ranking Module 120 ranks ( 516 ) the candidate facets to create a ranked list of candidate facets 518 .
  • the weights of the facet characteristics are stored as an n-component vector w, and the rank of each candidate facet is computed as the vector dot product of the weights w and the characteristic vector v, namely w · v.
  • more complex algorithms are used to calculate the ranking of each candidate facet. The simple use of a dot product with a set of weights w makes the approximation that each of the facet characteristics is independent of the other facet characteristics.
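The dot-product ranking can be sketched directly. The weights and characteristic vectors below are made-up values for illustration (including one negative weight, which the text permits in some embodiments), and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def rank_value(weights: Sequence[float], vector: Sequence[int]) -> float:
    """Dot product of the weight vector w and a characteristic vector v."""
    if len(weights) != len(vector):
        raise ValueError("weights and characteristic vector must have length n")
    return sum(w * v for w, v in zip(weights, vector))


def rank_facets(vectors: Dict[str, List[int]],
                weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (facet, rank value) pairs, highest rank value first."""
    scored = [(facet, rank_value(weights, v)) for facet, v in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


weights = [2.0, 0.5, -1.0]  # illustrative values only
vectors = {
    "to:google.com": [0, 1, 1],
    "from:john": [1, 1, 0],
    "is:unread": [0, 0, 1],
}
ranked = rank_facets(vectors, weights)
print(ranked)
```

Each characteristic contributes its weight independently of the others, which is the independence approximation the text attributes to the simple dot product.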
  • a facet ranking function is a function that assigns a ranking value to the characteristic vector v of a facet.
  • the facet characteristics are grouped into clusters, and the Facet Ranking Module includes a cluster ranking function for each of the clusters.
  • the overall ranking for each candidate facet is the sum of the weights computed by the cluster ranking functions.
  • the clusters of facet characteristics i.e., the determination of which facet characteristics are assigned to each cluster
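A cluster-based ranking can be sketched as a sum of per-cluster functions, each of which sees only the components of the characteristic vector assigned to its cluster. The cluster assignment, the two cluster ranking functions, and all numeric values here are assumptions for illustration; the second function is deliberately non-linear to show how a cluster can capture characteristics that are not independent.

```python
from typing import Callable, List, Sequence, Tuple

# A cluster is (indices of its characteristics, its cluster ranking function).
Cluster = Tuple[Sequence[int], Callable[[List[int]], float]]


def linear_cluster(weights: Sequence[float]) -> Callable[[List[int]], float]:
    """Cluster ranking function that is a plain weighted sum."""
    def rank(components: List[int]) -> float:
        return sum(w * c for w, c in zip(weights, components))
    return rank


def joint_cluster(components: List[int]) -> float:
    """Hypothetical non-linear function: a bonus when all characteristics co-occur."""
    if all(components):
        return 3.0
    if any(components):
        return 1.0
    return 0.0


def overall_rank(vector: Sequence[int], clusters: Sequence[Cluster]) -> float:
    """Sum of the values computed by the cluster ranking functions."""
    return sum(rank([vector[i] for i in indices]) for indices, rank in clusters)


CLUSTERS: List[Cluster] = [
    ((0, 1), linear_cluster([2.0, 0.5])),
    ((2, 3), joint_cluster),
]
print(overall_rank([1, 0, 1, 1], CLUSTERS))
```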
  • the top ranked candidate facets (sometimes called presentation facets 526 ) are automatically presented to the user. In other embodiments, the ranking is used in conjunction with other criteria to determine which candidate facets are presented to the user. In various embodiments, the number of top ranked candidates 526 that are presented to the user (e.g., in a web page or other results document or set 520 that also includes the user-submitted query 522 , and a subset 524 of the search results 502 ) is a fixed number, is based on the amount of room available for displaying facets to the user, or is based on other criteria such as a threshold value. See the description below of element 622 in FIG. 6B for more details.
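Selection of the presentation facets from a ranked list can be sketched with two of the criteria mentioned above, a fixed maximum count and a threshold value. The function name and parameters are hypothetical, and the input is assumed to be already sorted from highest to lowest rank value.

```python
from typing import List, Optional, Tuple


def select_presentation_facets(ranked: List[Tuple[str, float]],
                               max_count: Optional[int] = None,
                               min_score: Optional[float] = None) -> List[str]:
    """Pick top-ranked facets, optionally capped by count and/or threshold.

    Assumption: `ranked` is sorted by rank value, highest first.
    """
    selected = [
        facet for facet, score in ranked
        if min_score is None or score >= min_score
    ]
    if max_count is not None:
        selected = selected[:max_count]
    return selected


ranked = [("from:john", 2.5), ("to:google.com", 1.0), ("is:unread", -1.0)]
print(select_presentation_facets(ranked, max_count=2))
```

A count derived from the available display space could be passed as `max_count` in the same way.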
  • FIGS. 6A and 6B provide a flowchart representing a method 600 for presenting to a user suggestions for augmenting a query, where the suggestions are based, at least in part, on information in an initial set of search results.
  • Method 600 is governed by instructions that are stored in a computer readable storage medium and that are executed by one or more processors of one or more servers or personal computing devices. Each of the operations shown in FIG. 6A or 6B corresponds to instructions stored in a computer memory or computer readable storage medium.
  • the computer readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices.
  • the computer readable instructions stored on the computer readable storage medium are in source code, assembly language code, object code, or other instruction format.
  • the process begins when a user enters an initial search query and the initial search query is received ( 602 ).
  • the initial search query is received by a Communications Module 110 and/or Control Module 112 .
  • the set of information searched in response to the query is an email folder or email account in an electronic mail system.
  • the information searched in response to the query is a retail database of products. More generally, the information searched in response to the query is a database or other corpus of information.
  • Based on the initial search query, the Query Module 116 generates (604) an initial set of search results.
  • the Query Module 116 or Control Module 112 limits ( 606 ) the initial set to a predefined positive integer number of search results.
  • the set may be limited to 100 records.
  • the set of search results returned in response to the query is not limited, but there is a subsequent selection from among the set of search results so as to produce the initial set of search results.
  • the subsequent selection may impose a predefined limit on the number of search results selected, or may impose a quality or other restriction so as to produce the initial set of search results.
  • implementation of the limit may use random selection, selection of the highest ranked search results (e.g., PageRank), selection of the records most recently added to the database (e.g., email conversations with the most recently sent messages), selection of the records that are the most popular (e.g., selected the most frequently by users), or other criteria so as to produce a set of search results that complies with the limit. It is noted that one reason for limiting or reducing the size of the initial set of search results is to improve efficiency of subsequent operations of the method 600 while still providing sufficient data to produce good facets for presentation to the user.
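Three of the limiting strategies named above (random selection, highest ranked, most recently added) can be sketched as follows. The record fields `score` and `sent`, the strategy names, and the function name are assumptions for this sketch; the default limit of 100 follows the example in the text.

```python
import random
from typing import Dict, List, Optional


def limit_results(results: List[Dict], limit: int = 100,
                  strategy: str = "rank",
                  seed: Optional[int] = None) -> List[Dict]:
    """Reduce a result set to at most `limit` records."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    if len(results) <= limit:
        return list(results)
    if strategy == "random":
        return random.Random(seed).sample(results, limit)
    if strategy == "rank":
        key = lambda record: record["score"]   # assumed relevance score field
    elif strategy == "recent":
        key = lambda record: record["sent"]    # assumed sortable date field
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return sorted(results, key=key, reverse=True)[:limit]


# Ten illustrative records with made-up scores and dates.
records = [
    {"id": i, "score": (i * 7) % 10, "sent": f"2009-12-{i + 1:02d}"}
    for i in range(10)
]
print([r["id"] for r in limit_results(records, limit=3, strategy="rank")])
```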
  • when the initial search query is blank, the initial set of search results is a currently viewed list of records. For example, if a user is viewing an email inbox, the “initial set of search results” comprises the messages in the user's inbox.
  • when there is a default initial set of search results (such as the messages in an inbox), the search query is automatically filled in to correspond to the initial set of search results. For example, while viewing inbox messages, a search query field or display region may be filled in with “in:inbox” so that the displayed search query corresponds to what the user is viewing.
  • the Facet Generation Module 118 generates ( 608 ) a set of candidate facets.
  • Each of the candidate facets can be used to select a subset of the search results.
  • facets comprise an operator and zero or more operands.
  • the operators listed in FIG. 3 are exemplary operators when the set of information to be searched is an email folder or email account.
  • the candidate facet “has:attachment” specifies that, when this facet is included in a search query, each email message (or each email conversation in conversation based email systems) in the search results must have at least one attachment.
  • the operator “has:attachment” has zero operands.
  • the operator “from:” requires an operand, which identifies the sender of a message, such as “from:john”.
  • candidate facets are generated for each operator based on the metadata and/or content in the initial search results. For example, if the initial search results include messages from fifty distinct people, the facet generation operation 608 would create a facet of the form “from:YYYY”, where YYYY can be any of the fifty names.
  • Although the operators are predefined, the operators that have one or more operands can generate a large number of facets based on metadata and/or content in the search results.
  • the “from:” operator can also be used with other operands, such as domain names (e.g., “google.com”) and special purpose operands (e.g., “mycontactlist,” where “from:mycontactlist” is true for any message received from any email address listed in the user contact list).
  • In an embodiment that searches a retail database of products, the generation of facets is similar. For example, if an initial search query is looking for television sets, facets could be generated that specify screen size, brand, price, and so on. Note that the specification of screen size could use a two-operand operator (the size is between x inches and y inches) or two distinct facets (the size is greater than x inches and the size is less than y inches). In some embodiments, some of the facets can be predefined, such as “brand:sony” to specify a facet that would restrict the result set to only Sony® brand televisions. If there were no Sony® televisions in the result set, then “brand:sony” would not be one of the generated facets.
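  • As a rough sketch of facet generation for the email case, the following builds candidate facets from the metadata of the initial results. The message dictionary layout (`sender`, `to`, `attachments`) and the function name are assumptions made for illustration; only the facet syntax (“from:”, “to:”, “has:attachment”) comes from the description.

```python
def generate_candidate_facets(messages):
    """Build candidate facets from the metadata of the initial results."""
    facets = set()
    for message in messages:
        # One-operand operators: one facet per distinct operand value seen.
        facets.add(f"from:{message['sender']}")
        for recipient in message.get("to", []):
            facets.add(f"to:{recipient}")
        # Zero-operand operator: generated only if some result satisfies it.
        if message.get("attachments"):
            facets.add("has:attachment")
    return sorted(facets)
```

  • Because facets are derived only from values present in the results, a facet that matches nothing (the “brand:sony” case above) is never generated.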
  • the Facet Ranking Module 120 ranks ( 610 ) the candidate facets.
  • the ranking is in accordance with selectivity of the candidate facets with respect to the initial set of search results. For example, the facet “has:attachment” would not be selective if none of the messages in the initial set of search results had any attachments. The same candidate facet would also not be selective if all of the search results had attachments.
  • the selectivity of the candidate facets is based, at least in part, on how evenly each candidate facet splits the initial set of search results ( 612 ). For example, if exactly half of the initial set of search results are messages with attachments, then “has:attachment” is highly selective. More specifically, one exemplary mathematical definition of selectivity of a candidate facet is $\text{selectivity} = -\operatorname{abs}(N_T/2 - N_F)$, where:
  • $N_T$ is the total number of search results;
  • $N_F$ is the number of search results that have the candidate facet; and
  • abs( ) computes the absolute value of the number in the parentheses.
  • a “perfect” score would be zero, indicating that a candidate facet exactly splits the search results. All other candidate facets would have a negative selectivity.
  • This definition of selectivity can be converted to positive values by, for example, adding an offset such as $N_T/2$ to the selectivity score shown above.
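  • A small numeric sketch of this selectivity measure follows. It assumes the definition is the negative absolute difference between half the total result count and the facet's match count, which is consistent with the stated properties (zero for an exact split, negative otherwise, and non-negative after adding half the total). The function names are illustrative.

```python
def selectivity(n_total, n_facet):
    """Zero when the facet splits the results exactly in half, negative otherwise."""
    return -abs(n_total / 2 - n_facet)


def shifted_selectivity(n_total, n_facet):
    """Same measure with an offset of n_total / 2, so scores are non-negative."""
    return selectivity(n_total, n_facet) + n_total / 2
```

  • With 100 results, a facet matching 50 of them scores best, while a facet matching none or all of them scores worst, matching the “has:attachment” discussion above.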
  • each candidate facet has zero or more facet characteristics from a predefined set of facet characteristics (as described above with respect to FIGS. 3 and 4 ).
  • method 600 determines ( 614 ) which facet characteristics from a predefined set of facet characteristics each candidate facet has, and ranks ( 614 ) the candidate facets further in accordance with the facet characteristics (i.e., the candidate facets are ranked in accordance with both their selectivity and their facet characteristics).
  • Each of the facet characteristics has some predictive value or utility in terms of predicting which candidate facets are more likely to be selected by users to refine the search query for which the initial set of search results was produced.
  • some facet characteristics are positively correlated with predicted utility of candidate facets, while other facet characteristics are negatively correlated with the predicted utility of candidate facets.
  • the selection of the set of facet characteristics is not part of method 600 ; that selection process is described more fully below.
  • each of the facet characteristics has an associated weight
  • the ranking of the candidate facets is based in part on the weights associated with the facet characteristics ( 616 ).
  • some embodiments assign weights to each of the predefined facet characteristics. In some embodiments the weights are all positive numbers, but in other embodiments some of the weights may be negative.
  • some embodiments assign a characteristic vector 510 to each candidate facet 506 in facet list 224 .
  • the ranking of each candidate facet is calculated as the vector dot product of the characteristic vector and the weights of the facet characteristics.
  • the ranking of each candidate facet is the sum of the weights of all of the facet characteristics that apply to the candidate facet.
  • the ranking of each candidate facet is based on a more complex ranking function that computes, for each candidate facet, scores for multiple clusters of facet characteristics and combines those scores to produce a ranking value, instead of assigning a single fixed weight to each facet characteristic.
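  • The two simple scoring variants above (dot product and sum of applicable weights) can be sketched as below. The characteristic names and weight values are invented for illustration; with a 0/1 characteristic vector the two computations give the same score.

```python
# Hypothetical facet characteristics and weights (weights may be negative).
CHARACTERISTICS = [
    "top_selectivity",
    "top5_selectivity",
    "person_operator",
    "not_top5_selectivity",
]
WEIGHTS = [3.0, 1.0, 0.5, -2.0]


def characteristic_vector(applicable):
    """Binary vector: 1 where the candidate facet has the characteristic."""
    return [1 if name in applicable else 0 for name in CHARACTERISTICS]


def dot_score(vector, weights=WEIGHTS):
    """Ranking value as the dot product of characteristic vector and weights."""
    return sum(v * w for v, w in zip(vector, weights))


def sum_score(applicable):
    """Ranking value as the sum of the weights of the applicable characteristics."""
    return sum(w for name, w in zip(CHARACTERISTICS, WEIGHTS) if name in applicable)
```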
  • the weights of facet characteristics are manually assigned based on analysis or intrinsic knowledge of the facet characteristics. For example, one may assume that the facet characteristic of being the most selective should have a higher weight than being in the top five for selectivity.
  • the weight associated with each facet characteristic is based on historical popularity of presentation facets having the facet characteristics ( 618 ).
  • data is collected on which presentation facets users actually select compared to the predicted calculated ranking, and machine learning is used to adjust the weights to bring them more in line with actual usage.
  • the machine learning can be performed in a testing environment, or in a production environment on an occasional, periodic or continual basis to improve selection of the presentation facets.
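  • The description does not name a specific learning algorithm. As one possible illustration only, the sketch below uses a perceptron-style update: when the facet a user actually clicked is not the one the current weights rank highest, the weights move toward the clicked facet's characteristics and away from the predicted facet's. The function name, data layout, and learning rate are all assumptions.

```python
def perceptron_update(weights, shown_vectors, clicked_index, learning_rate=0.1):
    """Return adjusted weights after one observed facet selection.

    shown_vectors: characteristic vectors of the presentation facets shown.
    clicked_index: position of the facet the user actually selected.
    """
    scores = [sum(w * x for w, x in zip(weights, vec)) for vec in shown_vectors]
    predicted_index = max(range(len(shown_vectors)), key=scores.__getitem__)
    if predicted_index == clicked_index:
        return list(weights)  # prediction agreed with usage; no change
    clicked = shown_vectors[clicked_index]
    predicted = shown_vectors[predicted_index]
    return [
        w + learning_rate * (c - p)
        for w, c, p in zip(weights, clicked, predicted)
    ]
```

  • Applied repeatedly over logged selections, an update of this kind nudges the calculated ranking toward actual usage, whether run offline or periodically in production.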
  • the Facet Ranking Module 120 , Control Module 112 or User Interface Module 114 selects ( 620 ) a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets.
  • the selection ( 620 ) takes the top R candidate facets based on the ranking, where R is the number of facets that can be displayed to a user.
  • the number of presentation facets is not fixed, but may be based on rankings or other criteria. For example, if there are 15 highly ranked candidate facets, then some embodiments would select all of them as presentation facets, even if the screen could only display ten of them at a time.
  • the candidate facets may be partitioned into distinct subsets (such as by operator), and the highest ranked candidate facets within each partition are selected as presentation facets.
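  • Two of the selection policies above, taking the top R facets overall and taking the best facets within each operator partition, might look like the following. The score dictionary and function names are hypothetical.

```python
from collections import defaultdict


def select_top(scores, r):
    """Take the R highest-ranked candidate facets."""
    return sorted(scores, key=scores.get, reverse=True)[:r]


def select_top_per_operator(scores, per_operator):
    """Partition candidates by operator and keep the best few in each partition."""
    groups = defaultdict(list)
    for facet in scores:
        operator = facet.partition(":")[0]
        groups[operator].append(facet)
    selected = []
    for operator in sorted(groups):
        best = sorted(groups[operator], key=scores.get, reverse=True)
        selected.extend(best[:per_operator])
    return selected
```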
  • the presentation facets are organized into clusters based on other properties. For example, in some embodiments once a list of presentation facets is selected, they are organized for presentation (e.g., ordered) by type (person facets, content facets, etc.). In some other embodiments, the display order of the presentation facets is based on other use metric(s). For example, although the selection of presentation facets may be based on a learned ranking, some embodiments display the presentation facets in order of how often they were previously clicked. In other embodiments, the display of the presentation facets is in alphabetical order. Some embodiments use a mixture of the above presentation methods, while other embodiments organize presentation facets based on history or preferences of a user (e.g., preferences recorded in a user profile).
  • additional facets are displayed when a user selects a “show me more” button or a “show me more like this” button (which reveals more facets that are of the same type as, or are similar to, an identified presentation facet).
  • the additional facets are selected from the candidate facets based on the ranking of the candidate facets or other properties of the candidate facets (such as the operator).
  • the Facet Display Interface 218 within the User Interface Module 114 formats ( 622 ) the presentation facets for display.
  • the display of the presentation facets is described in more detail below with respect to FIG. 7 .
  • a user may select any of the presentation facets once they are displayed.
  • the Control Module 112 performs two operations: First, the Control Module 112 creates ( 626 ) a revised search query comprising the initial search query and the selected presentation facet. In some embodiments the revised search query is the concatenation of the text string of the initial query and a text string corresponding to the selected presentation facet. Second, the requery procedure 220 within the Query Module 116 generates ( 628 ) a revised set of search results based on the revised search query. In some embodiments, the revised search query generates the revised set of search results from scratch.
  • the revised set of search results is retrieved from the database using the revised search query, without making use of the prior search results.
  • the revised search query is applied to the initial set of search results to generate the revised set of search results.
  • the revised set of search results is selected from the initial set of search results.
  • Facet Generation Module 118 generates ( 608 ) a new set of candidate facets, and proceeds in the same way as processing the initial search query.
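  • The requery step can be sketched as below, covering query concatenation and the variant in which the revised results are selected from the initial results. The message layout and the two supported facets are assumptions chosen to keep the example short.

```python
def revise_query(initial_query, facet):
    """Concatenate the initial query text and the selected facet text."""
    return f"{initial_query} {facet}".strip()


def matches_facet(message, facet):
    """Evaluate a facet against one message (only two facets are sketched)."""
    operator, _, operand = facet.partition(":")
    if operator == "has" and operand == "attachment":
        return bool(message.get("attachments"))
    if operator == "from":
        return operand.lower() in message["sender"].lower()
    raise ValueError(f"unsupported facet: {facet}")


def refine_from_initial(initial_results, facet):
    """Select the revised result set from the initial set of search results."""
    return [m for m in initial_results if matches_facet(m, facet)]
```

  • Filtering the initial set avoids a second database query, but if the initial set was truncated (for example to 100 records) it can miss matches that a from-scratch query would find.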
  • the presentation facets are links back to the search augmentation system 100 .
  • Each such link contains a URL and one or more URL parameters that specify the previous search query (or other information that enables the search augmentation system 100 to obtain the previous search query) and the user selected presentation facet.
  • user selection of a presentation facet causes an HTTP request to be sent to the search augmentation system 100 with the aforementioned parameters.
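  • A presentation-facet link of this kind could be built as follows. The base URL and the parameter names `q` and `facet` are hypothetical; the description only requires that the link carry the previous query and the selected facet.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def facet_link(base_url, previous_query, facet):
    """Build a link whose parameters carry the previous query and the facet."""
    params = urlencode({"q": previous_query, "facet": facet})
    return f"{base_url}?{params}"


def read_facet_link(url):
    """Server side: recover the previous query and selected facet from the link."""
    params = parse_qs(urlsplit(url).query)
    return params["q"][0], params["facet"][0]
```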
  • user selection of a presentation facet causes the client application (e.g., a browser application) to augment the search query with the presentation facet, but does not automatically send the resulting revised search query to the search augmentation system.
  • the client application 104 at the client includes instructions for responding to user selection of a facet by augmenting the search query with the presentation facet.
  • a user need not select one of them.
  • the user can take any other action that is appropriate after querying the database. For example, the user could view any of the search results or could refine the search query manually.
  • FIG. 7 shows an exemplary graphical user interface (GUI) 700 for an email system that generates facets for selection by a user.
  • Search query entry box 702 allows a user to enter a search query. After the user enters a search query, the search query is displayed in the box 702 , and the user may execute the query by pressing search button 710 , pressing the ENTER key, or taking any other action designated by GUI 700 to execute the query. After execution of the search query, search results 708 are displayed in the GUI. In some embodiments, the “search results” 708 are the content of an email folder when no search query has been issued.
  • the “search results” 708 show a list of conversations in the user's inbox. Some conversations have more than one message, as indicated by the integer value 705 in parentheses adjacent to the sender list 707 for each such conversation.
  • GUI 700 shows presentation facets 704 - 1 , 704 - 2 , . . . , 704 - 5 , which are located in a horizontal array just below the search query entry box 702 .
  • a user selects a presentation facet by clicking on it. For example, if a user clicks on presentation facet “to:yahoo.com” 704 - 2 , the facet “to:yahoo.com” would be added to the search query.
  • the presentation facets are placed in other locations in the GUI 700 , the presentation facets are aligned vertically, or there are more or fewer presentation facets displayed.
  • a clickable icon 706 is placed next to each presentation facet that designates the logically opposite facet.
  • the icon (-) 706 next to presentation facet “is:unread” 704 - 5 would designate the facet “not is:unread”; i.e., messages that have been read.
  • the clickable icon (-) 706 may be placed next to each of the presentation facets.
  • One of skill in the art would recognize that many alternative graphic symbols or text could be used to designate presentation facets that are the exact opposites of the ones displayed.
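  • Deriving the logically opposite facet can be as simple as toggling a negation prefix, as in this sketch. The “not ” prefix follows the “not is:unread” example above; the function name is an assumption.

```python
NEGATION_PREFIX = "not "


def opposite_facet(facet):
    """Return the logically opposite facet by toggling the negation prefix."""
    if facet.startswith(NEGATION_PREFIX):
        return facet[len(NEGATION_PREFIX):]
    return NEGATION_PREFIX + facet
```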
  • Prior to execution of method 600 in FIGS. 6A and 6B , a set of facet characteristics is established.
  • the universe of potential facet characteristics has at least three general categories.
  • the first general category consists of facet characteristics based on the historical activities of one or more users. Here the idea is that past behavior is a predictor of future behavior.
  • the second general category consists of measuring the effects of candidate facets on the initial set of search results, typically by counting based on some rule.
  • the third category consists of intrinsic characteristics of the operators and their values.
  • the first category of facet characteristics is based on the search history of the user, or alternately the search history of a community of users. Previous user behavior is a good indication of a facet's usefulness. A candidate facet that has been frequently selected in the past by a user, or community of users, is likely to be selected in the future. Additionally, the context in which a facet was selected may influence its selection. For example, a user may search for the term “john” and then select the facet “from:john smith.” In this case, the facet is useful in the context of the query “john” but may not be relevant to other queries.
  • Some embodiments of the present invention have facet characteristics corresponding to each popularity evaluation approach and how many times each candidate facet matches previous queries based on the popularity evaluation approach. For example, facet characteristics may correspond to zero matches, one or more matches, exactly one match, exactly two matches, three or more matches, or other similar counts. Because of the three distinct approaches and the different counts that may be used, there can be many facet characteristics that measure popularity. Embodiments of the present invention may use any subset of these possible facet characteristics. One of skill in the art would recognize that alternative popularity approaches are possible and alternative matching methodologies are possible, creating a much broader list of possible facet characteristics.
  • some embodiments of the present invention include facet characteristics based on the relative popularity of candidate facets. Some embodiments include facet characteristics that identify the most popular of the candidate facets, the second most popular of the candidate facets, the third most popular of the candidate facets, or the top five most popular of the candidate facets. For example, one exemplary facet characteristic is “the most popular candidate facet based on terms in the search query.” This example facet characteristic would apply to only one candidate facet, unless there were two or more candidate facets that tied for usage.
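  • The count-based and relative popularity characteristics can be illustrated as follows. The characteristic labels and the input format (a match count, or a dictionary of past selection counts per facet) are assumptions; how matches are counted depends on the popularity evaluation approach in use.

```python
def match_count_characteristics(match_count):
    """Map a count of matching previous queries to count-bucket characteristics."""
    if match_count < 0:
        raise ValueError("match_count must be non-negative")
    traits = {"zero_matches"} if match_count == 0 else {"one_or_more_matches"}
    if match_count == 1:
        traits.add("exactly_one_match")
    elif match_count == 2:
        traits.add("exactly_two_matches")
    elif match_count >= 3:
        traits.add("three_or_more_matches")
    return traits


def most_popular(selection_counts):
    """Return every candidate facet tied for the highest past-selection count."""
    if not selection_counts:
        return set()
    best = max(selection_counts.values())
    return {facet for facet, count in selection_counts.items() if count == best}
```

  • Returning a set from `most_popular` reflects the tie case noted above, where the “most popular” characteristic applies to more than one candidate facet.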
  • the second category of facet characteristics is based on counting search results that match certain criteria.
  • One criterion (sometimes called selectivity) is how evenly a facet splits the search results. This criterion can be converted into facet characteristics by comparing the relative selectivity of the candidate facets. For example, some embodiments include facet characteristics corresponding to: the candidate facet that is number 1 in selectivity, the candidate facet that is number 2 in selectivity, the candidate facet that is number 3 in selectivity, the candidate facets that are in the top five for selectivity, and the candidate facets that are not in the top 5 for selectivity. Note that the last two exemplary facet characteristics are opposites. Generally, not being in the top 5 for selectivity would be negatively correlated with the ultimate ranking of candidate facets.
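  • Turning relative selectivity into facet characteristics might look like this sketch, which labels the first, second, and third most selective candidates and marks each candidate as inside or outside the top five. The label strings are invented, and ties are broken by input order, which is an arbitrary choice.

```python
def selectivity_rank_characteristics(selectivity_by_facet):
    """Assign rank-based characteristics from per-facet selectivity scores."""
    ranked = sorted(selectivity_by_facet, key=selectivity_by_facet.get, reverse=True)
    result = {}
    for position, facet in enumerate(ranked, start=1):
        traits = set()
        if position <= 3:
            traits.add(f"selectivity_rank_{position}")
        # The top-five and not-top-five characteristics are opposites.
        traits.add("selectivity_top_5" if position <= 5 else "selectivity_not_top_5")
        result[facet] = traits
    return result
```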
  • Additional exemplary facet characteristics in the second category are based on simple counts of the candidate facets.
  • the third category of facet characteristics is based on the basic types of the candidate facets.
  • Both the operators (e.g., “to:”, “from:”) and the values of the operands (“bill”, “domain.com,” etc.) may inherently affect the utility of a candidate facet.
  • facets that use the operator “to:” may be generally more relevant than facets that use the operator “cc:”.
  • the facet characteristics in this third category capture properties of candidate facets that are generally consistent across a wide range of users.
  • the examples provided here pertain to the context of searching email, but similar analysis would create facet characteristics applicable to other contexts.
  • a “person operator” such as “to:”, “from:”, or “cc:” would generally have an operand that is the name of a person or an email address
  • these operators can also have operands that are domain names (e.g., “to:google.com”) or symbolically represent something else (e.g., “to:me”).
  • Some embodiments of the present invention include facet characteristics to identify: a person operator whose operand is the name of a person; a person operator whose operand is a domain name; a person operator whose operand is the user (“me”); a person operator whose operand is an email address; or a person operator whose operand contains a hyphen.
  • hyphens and other non-alphanumeric characters in an operand correspond to properties of an email address. For example, in some organizations, hyphens are used only within mailing lists, such as “all-domestic-employees@company.com”. By examining the email addresses of other organizations, other facet characteristics could be created to evaluate candidate facets.
  • Some embodiments include facet characteristics that are conjunctions of the type of the operator together with the type of the value as just described. For example, some embodiments include the facet characteristic of being sent “to a domain name” (this facet characteristic would apply to the candidate facet “to:google.com” but would not apply to the candidate facet “from:google.com” or the candidate facet “to:bob”).
  • Some embodiments include the facet characteristic of having an operator value that is a personal name in a user's address book or having an operator value that is an email address in the user's address book. Address book membership may indicate familiarity and therefore may influence the relevance of any candidate facet that includes these people. Some embodiments also include a facet characteristic of having a personal name that is similar to a name that appears in the initial query. For example, if a user's initial query was “Bill,” then the candidate “from:bill smith” may be particularly relevant.
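  • The intrinsic (third-category) characteristics for person operators can be approximated with simple string heuristics, as sketched below. The classification rules, label names, and the `address_book` parameter are assumptions for illustration; a production system would likely use more robust name and address parsing.

```python
PERSON_OPERATORS = {"to", "from", "cc"}


def operand_type(operand):
    """Crude heuristic classification of a person-operator operand."""
    if operand == "me":
        return "user"
    if "@" in operand:
        return "email_address"
    if "." in operand and " " not in operand:
        return "domain_name"
    return "person_name"


def intrinsic_characteristics(facet, address_book=frozenset()):
    """Return intrinsic characteristics of a facet such as 'to:google.com'."""
    operator, _, operand = facet.partition(":")
    if operator not in PERSON_OPERATORS:
        return set()
    kind = operand_type(operand)
    # Operand-type characteristic plus the operator/type conjunction.
    traits = {f"person_operator_{kind}", f"{operator}_{kind}"}
    if "-" in operand:
        traits.add("person_operator_hyphen")
    if operand in address_book:
        traits.add("operand_in_address_book")
    return traits
```

  • For example, “to:google.com” receives the conjunction characteristic `to_domain_name`, while “from:google.com” and “to:bob” do not, mirroring the “to a domain name” example above.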

Abstract

A method searches a set of information using a computer. The method generates a set of search results based on a search query. Then, without further user input, the method generates a set of candidate facets, where each of the candidate facets can be used to select a subset of the search results. The method ranks the candidate facets in accordance with selectivity of the candidate facets and selects a plurality of facets from among the candidate facets for presentation to the user. The selection is in accordance with the rankings of the candidate facets. The method formats the presentation facets for display to the user. In response to user selection of a presentation facet, the method generates a revised search query comprising the original search query and the selected presentation facet, and generates a revised set of search results based on the revised search query.

Description

    RELATED APPLICATIONS
  • This application is a continuation of U.S. application Ser. No. 12/894,079, “Facet Suggestion for Search Query Augmentation,” filed Sep. 29, 2010, which claims priority to U.S. Provisional Application Ser. No. 61/247,512, filed Sep. 30, 2009, which are hereby incorporated by reference in their entirety.
    TECHNICAL FIELD
  • The disclosed embodiments relate generally to suggesting query refinements, and more specifically to ranking of potential query refinements that are based on facets associated with search results from an initial search query.
    BACKGROUND
  • As electronic mail becomes more popular, users face large volumes of messages in their electronic mail folders. This makes at least two important tasks difficult: triage of new messages and searching for existing messages. Whether for searching or triage, a user would like to find or dispose of the messages quickly, but may not know an optimal strategy.
  • For example, a user with 100 new electronic messages may scan the messages one by one, spending too much time addressing less relevant messages, and taking longer to discover the more important messages. Similarly, if a search query returns many messages, the user may need to scan the messages individually to find the desired one(s). In either case, the user spends too much time looking at less relevant messages.
    SUMMARY
  • The above deficiencies and other problems associated with searching a set of information are reduced by the disclosed embodiments. In some embodiments, the set of information is an email folder or account in an electronic messaging system. In some other embodiments, the set of information is a database of retail products. In some embodiments, there is a graphical user interface (GUI) that displays potential facets to a user, which may be appended to a user's query to refine a search query.
  • In accordance with some embodiments, a computer-implemented method searches a set of information. The method utilizes a computer system having one or more processors and memory storing one or more programs. The programs are executed by the one or more processors to perform the operations. The method generates an initial set of search results based on an initial search query. Then, without further user input, the method generates a set of candidate facets, where each of the candidate facets can be used to select a subset of the initial set of search results. The method ranks the candidate facets in accordance with the selectivity of the candidate facets with respect to at least some of the search results and selects a plurality of facets from among the candidate facets for presentation to the user. The selection is in accordance with the rankings of the candidate facets. The method formats the presentation facets for display to the user. In response to user selection of any one of the presentation facets, the method generates a revised search query comprising the initial search query and the selected presentation facet, and generates a revised set of search results based on the revised search query.
  • In accordance with some embodiments of the aforementioned method, the method determines, without further user input, for each candidate facet which facet characteristics from a predefined set of facet characteristics are characteristics of the candidate facet. The method ranks the candidate facets in accordance with both the selectivity of the candidate facets with respect to at least some of the search results as well as the facet characteristics of the candidate facets.
  • In some embodiments of the aforementioned method, each facet characteristic of the predefined set of facet characteristics has an associated weight, and the ranking of the candidate facets is based in part on the weights associated with the facet characteristics of the candidate facets.
  • In accordance with some embodiments, a system for searching a set of information includes: one or more processors, memory, and one or more programs stored in the memory. The one or more programs comprise instructions that are executed by the one or more processors, and include instructions to generate an initial set of search results based on an initial search query. The one or more programs further have instructions to perform the following operations without further user input: generate a set of candidate facets, each of which can be used to select a subset of the initial set of search results; rank the candidate facets in accordance with selectivity of the candidate facets with respect to at least some of the search results; select a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets; and format for display the presentation facets. The one or more programs further have instructions that execute in response to user selection of any one of the presentation facets. These instructions generate a revised search query comprising the initial search query and the selected presentation facet, and generate a revised set of search results based on the revised search query.
  • In accordance with some embodiments, a non-transitory computer readable storage medium stores one or more programs configured for execution by one or more processors of a computer. The one or more programs comprise instructions to be executed by the one or more processors, including instructions to generate an initial set of search results based on an initial search query. The one or more programs further include instructions to perform the following operations without further user input: generate a set of candidate facets, each of which can be used to select a subset of the initial set of search results; rank the candidate facets in accordance with selectivity of the candidate facets with respect to at least some of the search results; select a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets; and format for display the presentation facets. The one or more programs further have instructions that execute in response to user selection of any one of the presentation facets. These instructions generate a revised search query comprising the initial search query and the selected presentation facet, and generate a revised set of search results based on the revised search query.
  • Thus methods and systems are provided that present useful facet suggestions to a user, making the processes of searching or triage of data faster and more efficient.
    BRIEF DESCRIPTION OF THE DRAWINGS
  • For a better understanding of the aforementioned embodiments of the invention as well as additional embodiments thereof, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
  • FIG. 1 illustrates an exemplary context in which some embodiments operate.
  • FIG. 2 is a functional description of a computer system according to some embodiments.
  • FIG. 3 provides an exemplary list of operators used to form candidate facets according to some embodiments.
  • FIG. 4 illustrates how weights are assigned to facet characteristics according to some embodiments.
  • FIG. 5 illustrates a functional process flow according to some embodiments.
  • FIGS. 6A and 6B provide a detailed descriptive process flow according to some embodiments.
  • FIG. 7 provides an exemplary graphical user interface for an email system that generates and ranks facets for user selection according to some embodiments.
  • Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details.
    DESCRIPTION OF EMBODIMENTS
  • It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the present invention. The first contact and the second contact are both contacts, but they are not the same contact.
  • The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
  • As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if (a stated condition or event) is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting (the stated condition or event)” or “in response to detecting (the stated condition or event),” depending on the context.
  • FIG. 1 is a block diagram of an embodiment of a search query augmentation system 100. The system 100 communicates over a network 106 (such as the Internet) with one or more clients 102. The clients 102 typically use web browsers 104 or other browsing applications to communicate with the system 100, using HTTP requests and responses or other appropriate communication protocols. In alternative embodiments, the clients 102 may communicate with the system 100 using a software program other than a browser. The large majority of clients 102 are typically remotely located from the system 100, but one or more of the clients 102 can be located near the system 100. In some embodiments of the system 100, one or more programs execute on an Application Server 108. In certain embodiments, all or some of the modules that comprise the system 100 execute within the Application Server 108.
  • Communications Module 110 provides communication between the system 100 and the Network 106. For example, the Communications Module 110 receives search queries from clients 102, conveys the search queries to Query Module 116 (via Control Module 112), and also conveys search results produced by the Query Module 116 in response to a respective query back to the requesting client 102. The Communications Module 110 also conveys to the requesting client, along with the search results, facets, which are query augmentation suggestions. Facet generation and ranking are described in detail below. When a user at a client 102 selects a facet (e.g., a facet presented along with search results), the Communications Module 110 receives the selection from the client and conveys that information to the Query Module 116 (via Control Module 112). In response, the Query Module 116 executes a revised query, comprising the terms of the prior query for which search results were returned, plus the user-selected facet. In response to the revised query, the new search results are returned by the Query Module 116 to the requesting client 102 via the Control Module 112 and Communications Module 110. Optionally, along with the new search results, the search query augmentation system 100 also returns one or more suggested facets for further narrowing the revised query.
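  • By way of a minimal, non-limiting sketch, the revised query described above can be formed by appending the user-selected facet to the terms of the prior query. The function name `augment_query` and the whitespace-separated query representation are assumptions made for illustration only and do not appear in the figures.

```python
def augment_query(prior_query: str, facet: str) -> str:
    """Return the prior query terms plus the selected facet.

    Assumes (for illustration) that a query is a whitespace-separated
    string and that a facet is a single token such as "from:john".
    """
    terms = prior_query.split()
    if facet not in terms:  # avoid adding the same facet twice
        terms.append(facet)
    return " ".join(terms)


revised = augment_query("budget report", "from:john")
print(revised)
```

  • Under these assumptions, a further facet selected from the new search results can be appended in the same manner, narrowing the revised query again.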
  • A Control Module 112 runs one or more programs that control all of the other modules that comprise the system 100. A User Interface Module 114 manages the graphical user interface that the system 100 provides to clients 102. For example, User Interface Module 114 determines what is displayed for a user on a client computer 102, and determines what actions a user may take to provide input to the system (such as a search query). Query Module 116 issues queries against a Database 122 to retrieve search results that are responsive to a user's query. In some embodiments, Query Module 116 maintains an Index 124 to facilitate the query process. The Index 124 is a mapping of terms in a Database 122 of documents (or other objects or information) to specific documents (or objects or information) in the Database 122, and is sometimes called an inverse index (e.g., because it may be produced by “inverting” the documents in the Database 122). The Index 124 may optionally include additional information as well, such as the locations of terms within documents in the Database 122, and/or information about the portion(s) of the documents in which the term is located.
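  • As one illustrative sketch of such an index, the following maps each term to the documents containing it, together with the positions of the term within each document. The in-memory dictionary layout, the whitespace tokenization, and the name `build_inverted_index` are assumptions for illustration; an actual Index 124 may be organized quite differently.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to {doc_id: [positions]} by "inverting" the documents.

    `documents` is assumed to be a dict of doc_id -> text.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


documents = {
    1: "football practice on friday",
    2: "friday budget review",
}
index = build_inverted_index(documents)
print(sorted(index["friday"]))  # ids of documents containing "friday"
```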
  • Facet Generation Module 118 generates lists of candidate facets that can be used to augment a user's search query. Facet Generation Module 118 uses a set of search results from a query, together with a list of operators (see, e.g., FIG. 3 below) to generate the candidate facets. If the number of candidate facets is very small, the User Interface Module 114 may present all of the candidate facets to the user. In general, however, the candidate facets are ranked by the Facet Ranking Module 120, and a subset of the candidate facets is selected for presentation to the user (e.g., facets selected for conveyance to the requesting client 102 along with search results produced in response to the user's search query). The selection of the subset is based on the ranking of the candidate facets, as described in more detail below with respect to FIG. 5 and FIGS. 6A and 6B. The selection of the subset of facets may be performed by the Control Module 112, or other designated module not shown in FIG. 1.
  • In some embodiments, the Query Module 116 and Database 122 are implemented on different servers from the other portions of the query augmentation system 100, while in other embodiments the other portions of the query augmentation system 100 are fully integrated with the Query Module 116 and Database 122. It is noted that Query Module 116 and Database 122 may together comprise an internet search engine, an enterprise search engine, a search engine specific to a particular online service (e.g., having a database with information concerning products or services offered for sale, or rental or online access), or the like.
  • FIG. 2 is a block diagram illustrating a Computer System 200 used for search query augmentation in accordance with some embodiments of the present invention. The Computer System 200 typically includes one or more processing units (CPUs) 202 for executing modules, programs and/or instructions stored in memory 214 and thereby performing processing operations; one or more network or other communications interfaces 204; memory 214; and one or more communication buses 212 for interconnecting these components. In some embodiments, the Computer System 200 includes a user interface 206 comprising a display device 208 and one or more input devices 210; however, since Computer System 200 is typically implemented using a set of servers, in many embodiments Computer System 200 does not include a user interface. In some embodiments, memory 214 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some embodiments, memory 214 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some embodiments, memory 214 includes one or more storage devices remotely located from the CPU(s) 202. Memory 214, or alternately the non-volatile memory device(s) within memory 214, comprises a non-transitory computer readable storage medium. In some embodiments, memory 214 or the non-transitory computer readable storage medium of memory 214 stores the following programs, modules and data structures, or a subset thereof:
      • an operating system 216 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
      • a network communication module 110 that is used for connecting the Computer System 200 to other computers via the one or more communication network interfaces 204 (wired or wireless) and one or more communication networks 106, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
      • a control module 112 that includes procedures to control the interaction of the modules in the Computer System 200;
      • a user interface module 114 that determines what is presented to a user (at a client 102), how the information and other components are presented to a user, and what actions a user may take to interact with Computer System 200. In some embodiments, the user interface module 114 includes a facet display interface 218 that determines how facets are displayed to a user, how many facets are displayed, and how a user can select a facet;
      • a query module 116 that queries the database 122, in response to a search query, for search results that match the search query (or, alternatively, are responsive to the search query). In some embodiments, the query module 116 includes an index 124, described above. In some embodiments, the query module 116 includes a requery procedure 220 that builds a new set of search results after a user has selected a facet. As described below in FIG. 6B, the requery procedure 220 may run against the database 122, or may build a new set of search results from the prior set of search results;
      • a facet generation module 118 that creates a set of candidate facets 224 based on a set of search results and an operator list 222 (which may be stored in the database 122). After generating a list of candidate facets 224, a facet characteristic determination procedure 226 determines which facet characteristics apply to each of the candidate facets. In some embodiments, the facet characteristic determination procedure 226 is part of the facet ranking module 120;
      • a facet ranking module 120 that ranks the candidate facets according to selectivity of the candidate facets. In some embodiments the facet ranking module 120 uses weights assigned to each of the facet characteristics (228) as part of the facet ranking process.
  • Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 214 may store a subset of the modules and data structures identified above. Furthermore, memory 214 may store a database 122 or additional modules and data structures not described above.
  • In some embodiments, the database 122 stores the set of information to be searched. In some embodiments, the database 122 stores the operator list 222, the facet characteristic weights 228, and/or other data used by any of the modules comprising Computer System 200.
  • Although FIG. 2 shows a Computer System used for search query augmentation, FIG. 2 is intended more as a functional description of the various features which may be present in a set of one or more computers (e.g., one or more server systems or server computers) rather than as a structural schematic of the embodiments described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some items shown separately in FIG. 2 could be implemented on individual computer systems and single items could be implemented by one or more computer systems. The actual number of computers used to implement a search query augmentation system and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
  • FIGS. 3-7 provide further details about the operation of a search query augmentation system in accordance with embodiments of the present invention.
  • In addition to the search terms in a query, a query may contain one or more facets that specify properties sought in the search results. In many cases facets use metadata to specify the properties, but the properties can also be based on the content of the records searched. For example, when searching for documents on the Internet, a search query could include a “PDF type” facet that specifies that the documents sought are Adobe® Acrobat® files (PDF), e.g., by including the search facet “type:PDF”. This example facet comprises the operator “type:” and the operand “PDF”. As used herein, each “facet” comprises an operator plus zero or more operands. In some embodiments, a plurality of the candidate facets (and typically most of the candidate facets) comprise an operator plus one or more operands. One of skill in the art would recognize that a respective facet can be implemented using different operators with different numbers of parameters. For example, the above mentioned “PDF type” facet can be implemented with a zero-operand operator, “is_pdf”, that specifies a search for PDF files, with the same effect as “type:PDF”, which has an operator and one operand.
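  • The operator-plus-operands structure can be sketched as follows. The `Facet` class, the `matches` function, and the dictionary representation of a document are hypothetical names introduced for illustration; the sketch merely shows that the one-operand facet “type:PDF” and the zero-operand facet “is_pdf” can select the same documents.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Facet:
    operator: str                    # e.g. "type:" or "is_pdf"
    operands: Tuple[str, ...] = ()   # zero or more operands

    def render(self) -> str:
        """Text form of the facet as it would appear in a query."""
        return self.operator + " ".join(self.operands)


def matches(facet: Facet, document: dict) -> bool:
    """Evaluate the two example facets against a document (illustrative)."""
    doc_type = document["type"].lower()
    if facet.operator == "type:":
        return doc_type == facet.operands[0].lower()
    if facet.operator == "is_pdf":
        return doc_type == "pdf"
    raise ValueError("unsupported operator: " + facet.operator)


docs = [{"name": "a", "type": "PDF"}, {"name": "b", "type": "HTML"}]
one_operand = Facet("type:", ("PDF",))
zero_operand = Facet("is_pdf")
print(one_operand.render(), zero_operand.render())
```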
  • FIG. 3 provides a list 222 of exemplary operators that are used to build facets in some embodiments that search for electronic messages. The operators “to:” 302, “from:” 304, and “cc:” 312 are used to specify data in fields of electronic message headers, and each of these operators requires a single operand. In some embodiments, the operand value can be a name (or a portion thereof), an email address, or a domain name (such as google.com or yahoo.com). The “deliveredto:” operator 318 is similar to the first three, but in some embodiments the “deliveredto:” operator requires an operand that must be an email address or a domain name. These four operators are sometimes referred to as “person operators.”
  • The “subject:” operator 306 is used for facets that specify words that occur in the subject line of electronic messages. For example, “subject:football” is a facet that specifies a search for messages with the word “football” in the subject line. The “label:” operator 308 is used for facets that specify words appearing in a label associated with an electronic message. Unlike a subject line, labels may be associated with an electronic message after it is sent, the labels can be created by recipients, and multiple labels may be assigned to the same message. If a user has assigned labels to messages, the labels can create an effective way for the user to find the labeled messages. The “list:” operator 310 is used for facets that specify a mailing list that appears in the “to” or “from” header. A mailing list (or distribution list) may be used to specify a group of people or email addresses, so that a user may send a message to the group without specifying the names or email addresses individually. The facet “list:dept200@company.com” would specify all messages sent to or from the mailing list dept200@company.com.
  • Facets can also be used to specify date ranges. In some embodiments, the operators “after:” 314 and “before:” 316 are used to specify a range of dates. The operators “after:” 314 and “before:” 316 each requires a single operand, which must be a date. In some embodiments, a date operand is formatted as YYYY/MM/DD to prevent ambiguity. In other embodiments the evaluation of dates uses the regional settings on the user's computer. In some embodiments the date operand includes a time value, for example 10:30:00 AM in the operand 2009/10/15 10:30:00 AM. Some embodiments that allow the time to be specified require that the time be specified on a 24 hour clock. In other embodiments the time may be specified on a 12 hour clock with an AM or PM designation. Optionally, the formatting of the time operands is determined by regional settings on the user's computer. In some embodiments, the end point date is included in the scope of the facet; other embodiments exclude the endpoint. In a preferred embodiment, the facet “after:2009/12/10” would specify all messages sent on or after Dec. 10, 2009 (including the endpoint). In an alternative embodiment, a single operator “between:” is used instead of the operators “after:” 314 and “before:” 316. When used, the “between:” operator requires two operands, specifying the beginning and ending dates of a date range. The same issues or options for formatting of date and time operands apply to the “between:” operator.
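  • A minimal sketch of evaluating the “after:” and “before:” facets with a YYYY/MM/DD operand is given below. The function names and the `inclusive` parameter are assumptions for illustration; the parameter models the choice, noted above, between embodiments that include the end point date and those that exclude it. Time-of-day operands and regional date formats are not handled in this sketch.

```python
from datetime import date, datetime


def parse_date_operand(operand: str) -> date:
    """Parse a date operand in the unambiguous YYYY/MM/DD form."""
    return datetime.strptime(operand, "%Y/%m/%d").date()


def matches_date_facet(facet: str, sent: date, inclusive: bool = True) -> bool:
    """Test a message's sent date against an "after:" or "before:" facet."""
    operator, _, operand = facet.partition(":")
    boundary = parse_date_operand(operand)
    if operator == "after":
        return sent >= boundary if inclusive else sent > boundary
    if operator == "before":
        return sent <= boundary if inclusive else sent < boundary
    raise ValueError("unsupported date operator: " + operator)


# With the end point included, a message sent on Dec. 10, 2009 matches.
print(matches_date_facet("after:2009/12/10", date(2009, 12, 10)))
```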
  • For electronic messages, the date each message is sent is the only relevant date for a query, and thus the operators “after:” and “before:” are unambiguous. However, in other contexts, such as documents on the Internet, there may be multiple relevant dates, such as the date created, date last changed, date last accessed, date posted on the website, etc. In this context, some embodiments require two operands for the operators “after:” and “before:”. One of the operands specifies which date field in the documents to look at, and the other operand specifies the comparison date. Other embodiments address this issue by creating different operators for each of the relevant date fields in the documents. For example, some embodiments use the facets “edited.after:” and “edited.before:” to specify date ranges for when the documents were last edited. In these embodiments, the operators require a single date operand, as described above for the electronic mail operators “after:” and “before:”.
  • The remaining operators listed in FIG. 3 are zero-operand operators, and the meanings are fairly intuitive based on the names of the operators. For example, the “has:attachment” operator 320 specifies a search for electronic messages that have attachments. The operators “is:starred” 322, “is:unread” 324, “is:read” 326, and “is:chat” 328 all specify simple properties of electronic messages. In electronic message systems that indicate importance with a symbol other than a star, the “is:starred” operator is generally replaced by an operator whose name is more meaningful or relevant, such as “is:important”. In some embodiments of electronic message systems, there are operators that specify in which folder to look for messages. For example, the “in:inbox” operator 330, the “in:trash” operator 332, and the “in:spam” operator 334 specify certain folders to search for messages. Because a zero-operand operator has no operands, the operator itself is a facet. For example, “has:attachment” is a facet.
  • Some embodiments use “in:” as an operator, and the specified folder is an operand. In some embodiments that use “in:” as an operator, the operand can be any folder in a user's email account, and not just the predefined folders inbox, trash, and spam. For example, if a user has created a folder called “medical,” then the facet “in:medical” could be used to search for messages within the medical folder. In this case, the possible operands are based on the structure of the user's electronic mail folders rather than the initial set of search results.
  • Because of the large number of operators, and potentially very large number of operands used by those operators, the number of candidate facets can be quite large. For example, in embodiments searching for electronic messages, there is a candidate facet of the form “to:XXXX” for each name XXXX that appears in the “To:” header of a message in the search results. Rather than display all of the candidate facets to the user, embodiments of the present invention rank the candidate facets and display only the highest ranked facets to the user for selection. To evaluate the utility of the candidate facets, embodiments of the present invention use one or more facet characteristics, which are described in more detail below. When a plurality of facet characteristics are used, some embodiments assign weights to each facet characteristic, as shown in Facet Characteristic Weight table 228 in FIG. 4. As shown in FIG. 4, there is a predefined set of n facet characteristics characteristic1 402, characteristic2 404, . . . , characteristicn 406, and these facet characteristics have weights weight1, weight2, . . . , weightn. In some embodiments all of the weights are positive numbers, but in other embodiments some of the weights may be negative.
  • FIG. 5 illustrates a functional process flow 500 according to some embodiments. Process flow 500 begins with a set of search results 502, which is generated by Query Module 116 in response to a user's query. Using the initial set of search results 502, and the operator list 222 (e.g., as shown in FIG. 3), the Facet Generation Module 118 generates (504) candidate facets 224. Each zero-operand operator is itself a candidate facet, and for each operator that requires one or more operands, facets are generated based on the data in the initial set of search results. For example, the “from:” operator may be combined with each name or email address that appears in the “From:” header of the messages in the initial set of search results. The facet list 224 does not have a predefined number of candidate facets. The number of candidate facets depends on both the operator list 222 and the initial set of search results 502. Each candidate facet, such as Facet 2 (506) has an associated facet definition 508 and a characteristic vector 510. The facet definition 508 identifies both an operator and any operands, such as “to:google.com” 512 for Facet 2. The Facet Characteristic Determination procedure 226 (FIG. 2) determines which facet characteristics apply to each candidate facet, creating a characteristic vector 510 associated with each candidate facet. Each characteristic vector 510 has n components, where n is the number of predefined facet characteristics. In some embodiments, for each of the n facet characteristics, the corresponding component in the characteristic vector is 1 if the facet characteristic applies to the candidate facet, and is 0 otherwise. Each characteristic vector is thus a list of n zeros and ones, as shown for the characteristic vector 510 for Facet 2. In particular, Facet 2 (506) does not have characteristic1, does have characteristic2, does have characteristic3, etc.
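  • The generation of candidate facets 224 and their characteristic vectors 510 may be sketched as follows. The abbreviated operator lists, the dictionary representation of a message, and in particular the three facet characteristics shown are hypothetical examples chosen for illustration; the actual predefined set of facet characteristics is not reproduced here.

```python
# Abbreviated, illustrative operator list (compare FIG. 3).
ZERO_OPERAND_OPERATORS = ["has:attachment", "is:starred", "is:unread"]
HEADER_OPERATORS = {"from:": "from", "to:": "to"}  # operator -> message field


def generate_candidate_facets(results):
    """Build candidate facets from the operator list and the search results."""
    # Each zero-operand operator is itself a candidate facet.
    candidates = list(ZERO_OPERAND_OPERATORS)
    # Operators with operands yield one facet per distinct value in the results.
    for operator, field in HEADER_OPERATORS.items():
        seen = set()
        for message in results:
            for value in message.get(field, []):
                if value not in seen:
                    seen.add(value)
                    candidates.append(operator + value)
    return candidates


# Hypothetical facet characteristics; each is a yes/no test on a facet.
CHARACTERISTICS = [
    lambda f: f.startswith(("to:", "from:", "cc:", "deliveredto:")),  # person operator
    lambda f: f in ZERO_OPERAND_OPERATORS,                            # zero-operand
    lambda f: f.endswith("google.com"),                               # names this domain
]


def characteristic_vector(facet):
    """Return n components: 1 if the characteristic applies, 0 otherwise."""
    return [1 if applies(facet) else 0 for applies in CHARACTERISTICS]


results = [
    {"from": ["alice@example.com"], "to": ["bob@google.com"]},
    {"from": ["alice@example.com"], "to": ["carol@example.com", "bob@google.com"]},
]
candidates = generate_candidate_facets(results)
vectors = {facet: characteristic_vector(facet) for facet in candidates}
```

  • In this sketch the number of candidate facets is not fixed in advance; it grows with the number of distinct header values found in the initial set of search results.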
  • Based on the facet list 224 (which includes the characteristic vectors), and the weights of the facet characteristics 228, Facet Ranking Module 120 ranks (516) the candidate facets to create a ranked list of candidate facets 518. In some embodiments the weights of the facet characteristics are stored as an n-component vector w, and the rank of each candidate facet is computed as the vector dot product of the weights w and the characteristic vector v, namely w·v. In some embodiments, more complex algorithms are used to calculate the ranking of each candidate facet. The simple use of a dot product with a set of weights w makes the approximation that each of the facet characteristics is independent of the other facet characteristics. In general, a facet ranking function is a function that assigns a ranking value to the characteristic vector v of a facet. In some embodiments, the facet characteristics are grouped into clusters, and the Facet Ranking Module includes a cluster ranking function for each of the clusters. In these embodiments, the overall ranking for each candidate facet is the sum of the weights computed by the cluster ranking functions. In some embodiments, the clusters of facet characteristics (i.e., the determination of which facet characteristics are assigned to each cluster) are based on a determination of which facet characteristics are dependent on each other.
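  • Both ranking approaches can be sketched briefly. The weight values, the cluster assignments, and the cluster scoring functions below are invented for illustration only; the sketch shows the dot product w·v and, alternatively, a sum of per-cluster scores.

```python
def dot_rank(weights, vector):
    """Ranking value as the dot product of weights w and characteristic vector v."""
    return sum(w * v for w, v in zip(weights, vector))


def cluster_rank(vector, clusters):
    """Ranking value as the sum of per-cluster scores.

    `clusters` is a list of (indices, score_fn) pairs, where `indices` selects
    the components of the vector belonging to the cluster and `score_fn`
    scores those components jointly.
    """
    return sum(score_fn([vector[i] for i in indices])
               for indices, score_fn in clusters)


weights = [0.5, 2.0, -1.0]   # illustrative; note one negative weight
vector = [0, 1, 1]           # characteristic vector of one candidate facet
print(dot_rank(weights, vector))

# Characteristics 0 and 1 are treated as dependent and scored together.
clusters = [
    ([0, 1], lambda vs: 1.5 if all(vs) else 0.5 * sum(vs)),
    ([2], lambda vs: -1.0 * vs[0]),
]
print(cluster_rank(vector, clusters))
```

  • The cluster form allows a score for a group of dependent characteristics that differs from the sum of their individual weights, which the plain dot product cannot express.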
  • In some embodiments the top ranked candidate facets (sometimes called presentation facets 526) are automatically presented to the user. In other embodiments, the ranking is used in conjunction with other criteria to determine which candidate facets are presented to the user. In various embodiments, the number of top ranked candidates 526 that are presented to the user (e.g., in a web page or other results document or set 520 that also includes the user-submitted query 522, and a subset 524 of the search results 502) is a fixed number, is based on the amount of room available for displaying facets to the user, or is based on other criteria such as a threshold value. See the description below of element 622 in FIG. 6B for more details.
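  • The selection of presentation facets from the ranked list may be sketched as follows, assuming for illustration that each ranked candidate is a (facet, ranking value) pair. The function name and the parameters `max_count` and `threshold` are hypothetical, modeling respectively a fixed number (or available display room) and a threshold value.

```python
def select_presentation_facets(ranked, max_count=None, threshold=None):
    """Pick presentation facets from (facet, ranking value) pairs."""
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ordered = [pair for pair in ordered if pair[1] >= threshold]
    if max_count is not None:
        ordered = ordered[:max_count]
    return [facet for facet, _ in ordered]


ranked = [("has:attachment", 0.4), ("from:john", 2.1),
          ("to:google.com", 1.3), ("is:unread", -0.2)]
print(select_presentation_facets(ranked, max_count=2))
```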
  • FIGS. 6A and 6B provide a flowchart representing a method 600 for presenting to a user suggestions for augmenting a query, where the suggestions are based, at least in part, on information in an initial set of search results. Method 600 is governed by instructions that are stored in a computer readable storage medium and that are executed by one or more processors of one or more servers or personal computing devices. Each of the operations shown in FIG. 6A or 6B corresponds to instructions stored in a computer memory or computer readable storage medium. The computer readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices. The computer readable instructions stored on the computer readable storage medium are in source code, assembly language code, object code, or other instruction format.
  • In an embodiment of method 600, the process begins when a user enters an initial search query and the initial search query is received (602). In some embodiments, the initial search query is received by a Communications Module 110 and/or Control Module 112. In some embodiments, the set of information searched in response to the query is an email folder or email account in an electronic mail system. In other embodiments, the information searched in response to the query is a retail database of products. More generally, the information searched in response to the query is a database or other corpus of information.
  • Based on the initial search query, the Query Module 116 generates (604) an initial set of search results. In some embodiments, the Query Module 116 or Control Module 112 limits (606) the initial set to a predefined positive integer number of search results. For example, the set may be limited to 100 records. In alternative embodiments, the set of search results returned in response to the query is not limited, but there is a subsequent selection from among the set of search results so as to produce the initial set of search results. For example, the subsequent selection may impose a predefined limit on the number of search results selected, or may impose a quality or other restriction so as to produce the initial set of search results. Regardless of whether a limit is imposed on the initial set of search results or as part of subsequent selection, implementation of the limit may use random selection, selection of the highest ranked search results (e.g., PageRank), selection of the records most recently added to the database (e.g., email conversations with the most recently sent messages), selection of the records that are the most popular (e.g., selected the most frequently by users), or other criteria so as to produce a set of search results that complies with the limit. It is noted that one reason for limiting or reducing the size of the initial set of search results is to improve efficiency of subsequent operations of the method 600 while still providing sufficient data to produce good facets for presentation to the user.
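  • One possible way of imposing the limit is sketched below. The field names `sent` and `score`, the strategy names, and the function name `limit_results` are assumptions for illustration; the sketch covers random selection, selection of the most recent records, and selection of the highest ranked records.

```python
import random


def limit_results(results, limit=100, strategy="recent", rng=None):
    """Reduce a result set to at most `limit` records."""
    results = list(results)
    if len(results) <= limit:
        return results
    if strategy == "random":
        return (rng or random).sample(results, limit)
    if strategy == "recent":
        return sorted(results, key=lambda r: r["sent"], reverse=True)[:limit]
    if strategy == "ranked":
        return sorted(results, key=lambda r: r["score"], reverse=True)[:limit]
    raise ValueError("unknown strategy: " + strategy)


records = [{"id": i, "sent": i, "score": 10 - i} for i in range(10)]
initial_set = limit_results(records, limit=3, strategy="recent")
print([r["id"] for r in initial_set])
```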
  • In some embodiments, when the initial search query is blank, the initial set of search results is a currently viewed list of records. For example, if a user is viewing an email inbox, the “initial set of search results” comprises the messages in the user's inbox. In alternative embodiments, when there is a default initial set of search results (such as the messages in an inbox), the search query is automatically filled in to correspond to the initial set of search results. For example, while viewing inbox messages, a search query field or display region may be filled in with “in:inbox” so that the displayed search query corresponds to what the user is viewing.
  • As seen in FIG. 6A, the Facet Generation Module 118 generates (608) a set of candidate facets. Each of the candidate facets can be used to select a subset of the search results. As described above in FIG. 3, facets comprise an operator and zero or more operands. The operators listed in FIG. 3 are exemplary operators when the set of information to be searched is an email folder or email account. For example, the candidate facet “has:attachment” specifies that, when this facet is included in a search query, each email message (or each email conversation in conversation based email systems) in the search results must have at least one attachment. The operator “has:attachment” has zero operands. As another example, the operator “from:” requires an operand, which identifies the sender of a message, such as “from:john”. In some embodiments, candidate facets are generated for each operator based on the metadata and/or content in the initial search results. For example, if the initial search results include messages from fifty distinct people, the facet generation operation 608 would create a facet of the form “from:YYYY”, where YYYY can be any of the fifty names. Although the operators are predefined, the operators that have one or more operands can generate a large number of facets based on metadata and/or content in the search results.
  • It is noted that in some embodiments the “from:” operator can also be used with other operands, such as domain names (e.g., “google.com”) and special purpose operands (e.g., “mycontactlist,” where “from:mycontactlist” is true for any message received from any email address listed in the user contact list). When such additional operands are used, even more candidate facets are generated at 608.
  • In an embodiment that searches a retail database of products, the generation of facets is similar. For example, if an initial search query is looking for television sets, facets could be generated that specify screen size, brand, price, and so on. Note that the specification of screen size could use a two-operand operator (the size is between x inches and y inches) or two distinct facets (the size is greater than x inches and the size is less than y inches). In some embodiments, some of the facets can be predefined, such as “brand:sony” to specify a facet that would restrict the result set to only Sony® brand televisions. If there were no Sony® televisions in the result set, then “brand:sony” would not be one of the generated facets.
  • Because of the potentially large number of generated candidate facets, the Facet Ranking Module 120 ranks (610) the candidate facets. In some embodiments, the ranking is in accordance with selectivity of the candidate facets with respect to the initial set of search results. For example, the facet “has:attachment” would not be selective if none of the messages in the initial set of search results had any attachments. The same candidate facet would also not be selective if all of the search results had attachments. In some embodiments, the selectivity of the candidate facets is based, at least in part, on how evenly each candidate facet splits the initial set of search results (612). For example, if exactly half of the initial set of search results are messages with attachments, then “has:attachment” is highly selective. More specifically, one exemplary mathematical definition of selectivity of a candidate facet is

  • selectivity = −abs(N_T/2 − N_F)
  • where N_T is the total number of search results, N_F is the number of search results that have the candidate facet, and abs( ) computes the absolute value of the number in the parentheses. A “perfect” score would be zero, indicating that a candidate facet exactly splits the search results. All other candidate facets would have a negative selectivity. This definition of selectivity can be converted to positive values by, for example, adding an offset such as N_T/2 to the selectivity score shown above.
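  • The exemplary selectivity definition can be computed directly, as in the following sketch. The message representation and the helper `count_matching` are assumptions for illustration; the optional `offset` argument adds half the total number of search results so that scores are non-negative.

```python
def count_matching(results, predicate):
    """Number of search results that have the candidate facet."""
    return sum(1 for result in results if predicate(result))


def selectivity(n_total, n_facet, offset=False):
    """-abs(n_total/2 - n_facet); a perfect split compares equal to zero."""
    score = -abs(n_total / 2 - n_facet)
    return score + n_total / 2 if offset else score


messages = [{"attachments": 1}, {"attachments": 0},
            {"attachments": 2}, {"attachments": 0}]
n_with_attachment = count_matching(messages, lambda m: m["attachments"] > 0)
has_attachment_score = selectivity(len(messages), n_with_attachment)
```

  • Here half of the messages have attachments, so “has:attachment” receives the best possible score, whereas a facet matching none or all of the messages receives the lowest.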
  • In some embodiments, each candidate facet has zero or more facet characteristics from a predefined set of facet characteristics (as described above with respect to FIGS. 3 and 4). In some embodiments, method 600 determines (614) which facet characteristics from a predefined set of facet characteristics each candidate facet has, and ranks (614) the candidate facets further in accordance with the facet characteristics (i.e., the candidate facets are ranked in accordance with both their selectivity and their facet characteristics). Each of the facet characteristics has some predictive value or utility in terms of predicting which candidate facets are more likely to be selected by users to refine the search query for which the initial set of search results was produced. In some embodiments, some facet characteristics are positively correlated with predicted utility of candidate facets, while other facet characteristics are negatively correlated with the predicted utility of candidate facets. The selection of the set of facet characteristics is not part of method 600; that selection process is described more fully below.
  • In some embodiments, each of the facet characteristics has an associated weight, and the ranking of the candidate facets is based in part on the weights associated with the facet characteristics (616). As shown in FIG. 4, some embodiments assign weights to each of the predefined facet characteristics. In some embodiments the weights are all positive numbers, but in other embodiments some of the weights may be negative. As shown in FIG. 5, some embodiments assign a characteristic vector 510 to each candidate facet 506 in facet list 224. In some embodiments, the ranking of each candidate facet is calculated as the vector dot product of the characteristic vector and the weights of the facet characteristics. In embodiments where the characteristic vectors contain only zeros and ones, the ranking of each candidate facet is the sum of the weights of all of the facet characteristics that apply to the candidate facet. As noted above, in some embodiments the ranking of each candidate facet is based on a more complex ranking function that computes, for each candidate facet, scores for multiple clusters of facet characteristics and combines those scores to produce a ranking value, instead of assigning a single fixed weight to each facet characteristic.
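  • The dot-product ranking described above can be sketched as follows. The three characteristics, their weights, and the facet vectors are hypothetical values chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical weights for three characteristics, in order:
# "most selective", "top five for selectivity", "not top five for selectivity".
WEIGHTS: List[float] = [2.0, 1.0, -0.5]


def rank_value(characteristic_vector: List[float], weights: List[float]) -> float:
    """Dot product of a facet's characteristic vector and the weights."""
    if len(characteristic_vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    return sum(c * w for c, w in zip(characteristic_vector, weights))


def rank_facets(vectors: Dict[str, List[float]], weights: List[float]) -> List[str]:
    """Return facet names ordered from highest to lowest ranking value."""
    return sorted(vectors, key=lambda f: rank_value(vectors[f], weights), reverse=True)


# Hypothetical 0/1 characteristic vectors for three candidate facets.
VECTORS = {
    "has:attachment": [1, 1, 0],
    "from:bill": [0, 1, 0],
    "cc:ann": [0, 0, 1],
}
print(rank_facets(VECTORS, WEIGHTS))
```

  • Because the vectors here contain only zeros and ones, each ranking value is simply the sum of the weights of the characteristics that apply to the facet.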
  • In some embodiments, the weights of facet characteristics are manually assigned based on analysis or intrinsic knowledge of the facet characteristics. For example, one may assume that the facet characteristic of being the most selective should have a higher weight than being in the top five for selectivity.
  • In some embodiments, the weight associated with each facet characteristic is based on historical popularity of presentation facets having the facet characteristics (618). In these embodiments, data is collected on which presentation facets users actually select compared to the predicted calculated ranking, and machine learning is used to adjust the weights to bring them more in line with actual usage. The machine learning can be performed in a testing environment, or in a production environment on an occasional, periodic or continual basis to improve selection of the presentation facets.
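  • The description does not specify a learning algorithm. Purely as an assumed illustration, a simple perceptron-style update could move the weights toward the characteristics of the facet a user actually selected whenever a different facet was ranked higher. The function names and the learning rate are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def update_weights(
    weights: Sequence[float],
    selected_vector: Sequence[float],
    top_ranked_vector: Sequence[float],
    learning_rate: float = 0.1,
) -> List[float]:
    """Perceptron-style update (an assumption, not the source's method).

    If the selected facet already scores at least as high as the facet the
    system ranked first, the weights are returned unchanged. Otherwise each
    weight moves toward the selected facet's characteristics and away from
    the top-ranked facet's characteristics.
    """
    if dot(weights, selected_vector) >= dot(weights, top_ranked_vector):
        return list(weights)
    return [
        w + learning_rate * (s - t)
        for w, s, t in zip(weights, selected_vector, top_ranked_vector)
    ]
```

  • Repeating such an update over logged selections is one way the weights could be brought more in line with actual usage; other learning methods would serve equally well.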
  • The Facet Ranking Module 120, Control Module 112 or User Interface Module 114 selects (620) a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets. In some embodiments, the selection (620) takes the top R candidate facets based on the ranking, where R is the number of facets that can be displayed to a user. In other embodiments, the number of presentation facets is not fixed, but may be based on rankings or other criteria. For example, if there are 15 highly ranked candidate facets, then some embodiments would select all of them as presentation facets, even if the screen could only display ten of them at a time. In other embodiments, the candidate facets may be partitioned into distinct subsets (such as by operator), and the highest ranked candidate facets within each partition are selected as presentation facets.
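  • Two of the selection strategies above, taking the top R candidate facets and taking the highest ranked facets within each operator partition, can be sketched as follows. The data layout (a list of facet text and ranking value pairs) and the function names are assumptions.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]  # (facet text, ranking value)


def top_r(ranked: Ranked, r: int) -> List[str]:
    """Select the R highest ranked candidate facets."""
    ordered = sorted(ranked, key=lambda item: item[1], reverse=True)
    return [facet for facet, _ in ordered[:r]]


def top_per_operator(ranked: Ranked, per_operator: int) -> List[str]:
    """Partition facets by operator (the text before the first ':') and
    keep the highest ranked facets within each partition."""
    groups: Dict[str, Ranked] = {}
    for facet, score in ranked:
        operator = facet.split(":", 1)[0]
        groups.setdefault(operator, []).append((facet, score))
    selected: List[str] = []
    for operator in groups:  # partitions are visited in first-seen order
        selected.extend(top_r(groups[operator], per_operator))
    return selected


# Hypothetical ranked candidates.
CANDIDATES: Ranked = [
    ("from:bill", 3.0),
    ("from:ann", 2.0),
    ("to:me", 1.0),
    ("has:attachment", 2.5),
]
```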
  • In some embodiments, the presentation facets are organized into clusters based on other properties. For example, in some embodiments once a list of presentation facets is selected, they are organized for presentation (e.g., ordered) by type (person facets, content facets, etc.). In some other embodiments, the display order of the presentation facets is based on other use metric(s). For example, although the selection of presentation facets may be based on a learned ranking, some embodiments display the presentation facets in order of how often they were previously clicked. In other embodiments, the display of the presentation facets is in alphabetical order. Some embodiments use a mixture of the above presentation methods, while other embodiments organize presentation facets based on history or preferences of a user (e.g., preferences recorded in a user profile). In some embodiments, additional facets are displayed when a user selects a “show me more” button or a “show me more like this” button (which reveals more facets that are of the same type as, or are similar to, an identified presentation facet). In embodiments that provide the ability to reveal additional facets, the additional facets are selected from the candidate facets based on the ranking of the candidate facets or other properties of the candidate facets (such as the operator).
  • After the presentation facets are selected, the Facet Display Interface 218 within the User Interface Module 114 formats (622) the presentation facets for display. The display of the presentation facets is described in more detail below with respect to FIG. 7.
  • A user may select any of the presentation facets once they are displayed. In response to user selection of any one of the presentation facets (624), the Control Module 112 performs two operations: First, the Control Module 112 creates (626) a revised search query comprising the initial search query and the selected presentation facet. In some embodiments the revised search query is the concatenation of the text string of the initial query and a text string corresponding to the selected presentation facet. Second, the requery procedure 220 within the Query Module 116 generates (628) a revised set of search results based on the revised search query. In some embodiments, the revised search query generates the revised set of search results from scratch. In other words, the revised set of search results is retrieved from the database using the revised search query, without making use of the prior search results. In other embodiments, the revised search query is applied to the initial set of search results to generate the revised set of search results. Thus, in these other embodiments the revised set of search results is selected from the initial set of search results. After the revised set of search results is generated, Facet Generation Module 118 generates (608) a new set of candidate facets, and proceeds in the same way as processing the initial search query.
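  • The query revision by concatenation, and the variant in which the revised results are selected from the initial set of search results, can be sketched as follows. The message representation and the predicate used for the facet are hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]  # hypothetical representation of a search result


def revise_query(initial_query: str, facet: str) -> str:
    """Concatenate the initial query text and the selected facet text."""
    return f"{initial_query} {facet}".strip()


def requery_from_initial(
    initial_results: List[Message],
    facet_matches: Callable[[Message], bool],
) -> List[Message]:
    """Select the revised results from the initial results, without
    returning to the database."""
    return [message for message in initial_results if facet_matches(message)]


# Hypothetical initial results for the query "john".
INITIAL = [
    {"id": 1, "has_attachment": True},
    {"id": 2, "has_attachment": False},
]
revised = revise_query("john", "has:attachment")
subset = requery_from_initial(INITIAL, lambda m: bool(m["has_attachment"]))
```

  • The alternative embodiment, in which the revised results are retrieved from scratch, would instead pass the revised query string to the same search routine used for the initial query.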
  • It is noted that in some embodiments the presentation facets are links back to the search augmentation system 100. Each such link contains a URL and one or more URL parameters that specify the previous search query (or other information that enables the search augmentation system 100 to obtain the previous search query) and the user selected presentation facet. Thus, user selection of a presentation facet causes an HTTP request to be sent to the search augmentation system 100 with the aforementioned parameters.
  • In other embodiments, user selection of a presentation facet causes the client application (e.g., a browser application) to augment the search query with the presentation facet, but does not automatically send the resulting revised search query to the search augmentation system. This enables the user to further edit or further augment the search query before sending the search query to the search augmentation system 100 to obtain a new set of search results. In these embodiments, the client application 104 at the client includes instructions for responding to user selection of a facet by augmenting the search query with the presentation facet.
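  • A link of the kind described above could be built as in the following sketch. The base URL and the parameter names “q” and “facet” are hypothetical; the description does not name the actual parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def facet_link(base_url: str, previous_query: str, facet: str) -> str:
    """Build a link carrying the previous query and the selected facet.

    The parameter names "q" and "facet" are illustrative assumptions.
    """
    return f"{base_url}?{urlencode({'q': previous_query, 'facet': facet})}"


link = facet_link("https://search.example.com/results", "john", "to:yahoo.com")
# The receiving system can recover both values from the request URL.
params = parse_qs(urlsplit(link).query)
```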
  • After the presentation facets are displayed, a user need not select one of them. The user can take any other action that is appropriate after querying the database. For example, the user could view any of the search results or could refine the search query manually.
  • FIG. 7 shows an exemplary graphical user interface (GUI) 700 for an email system that generates facets for selection by a user. Search query entry box 702 allows a user to enter a search query. After the user enters a search query, the search query is displayed in the box 702, and the user may execute the query by pressing search button 710, pressing the ENTER key, or taking any other action designated by GUI 700 to execute the query. After execution of the search query, search results 708 are displayed in the GUI. In some embodiments, the “search results” 708 are the content of an email folder when no search query has been issued. For example, in GUI 700, the “search results” 708 show a list of conversations (some of which have more than one message, as indicated by the integer value 705 in parentheses adjacent the sender list 707 for each listed conversation that contains more than one message) in the user's inbox.
  • GUI 700 shows presentation facets 704-1, 704-2, . . . , 704-5, which are located in a horizontal array just below the search query entry box 702. In some embodiments, a user selects a presentation facet by clicking on it. For example, if a user clicks on presentation facet “to:yahoo.com” 704-2, the facet “to:yahoo.com” would be added to the search query. In alternative embodiments, the presentation facets are placed in other locations in the GUI 700, the presentation facets are aligned vertically, or there are more or fewer presentation facets displayed. In some embodiments, a clickable icon 706 is placed next to each presentation facet that designates the logically opposite facet. For example, the icon (-) 706 next to presentation facet “is:unread” 704-5 would designate the facet “not is:unread”; i.e., messages that have been read. In general, the clickable icon (-) 706 may be placed next to each of the presentation facets. One of skill in the art would recognize that many alternative graphic symbols or text could be used to designate presentation facets that are the exact opposites of the ones displayed.
  • Prior to execution of method 600 in FIGS. 6A and 6B, a set of facet characteristics is established. The universe of potential facet characteristics has at least three general categories. The first general category consists of facet characteristics based on the historical activities of one or more users. Here the idea is that past behavior is a predictor of future behavior. The second general category consists of measuring the effects of candidate facets on the initial set of search results, typically by counting based on some rule. The third category consists of intrinsic characteristics of the operators and their values.
  • The first category of facet characteristics is based on the search history of the user, or alternately the search history of a community of users. Previous user behavior is a good indication of a facet's usefulness. A candidate facet that has been frequently selected in the past by a user, or community of users, is likely to be selected in the future. Additionally, the context in which a facet was selected may influence its selection. For example, a user may search for the term “john” and then select the facet “from:john smith.” In this case, the facet is useful in the context of the query “john” but may not be relevant to other queries.
  • There are several ways to evaluate candidate facets using the search history of users:
      • By facet: Score a candidate facet by how many times it has been used in previous queries. This includes every occurrence of the candidate facet in the user's query history, or the query history of a community of users.
      • By terms in search query: Given the terms in the current search query (if any), how many times has the candidate facet occurred together with all of these terms in previous queries? This includes any previous queries that contained all these terms plus the candidate facet as a subset, regardless of order in the previous queries. For example, the candidate facet “from:bill” and the query “to:me” would match the previous query “from:bill to:me has:attachment.”
      • By exact query: Given the terms in the current search query and the candidate facet, how many times have all of these terms been used together in a previous query without any additional terms? For example, the candidate facet “from:bill” and the query “to:me” would not match the previous query “from:bill to:me has:attachment” because the previous query has the additional term “has:attachment.”
  • These three popularity evaluation approaches provide a range of flexibility for measuring facet popularity. The first is context insensitive, the second mildly context sensitive and the third requires context. Different embodiments may use any subset of these approaches, or may use all three approaches.
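  • The three popularity evaluation approaches above can be sketched as counting functions over a query history. Representing each query as a set of already tokenized terms is an assumption made for brevity; the function names are likewise illustrative.

```python
from typing import FrozenSet, Iterable

Query = FrozenSet[str]  # a query as an unordered set of terms and facets


def by_facet(history: Iterable[Query], facet: str) -> int:
    """Count previous queries that contain the candidate facet."""
    return sum(1 for past in history if facet in past)


def by_terms(history: Iterable[Query], current: Query, facet: str) -> int:
    """Count previous queries containing every current term plus the facet,
    regardless of order and of any additional terms."""
    wanted = current | {facet}
    return sum(1 for past in history if wanted <= past)


def by_exact_query(history: Iterable[Query], current: Query, facet: str) -> int:
    """Count previous queries consisting of exactly the current terms plus
    the facet, with no additional terms."""
    wanted = current | {facet}
    return sum(1 for past in history if wanted == past)


# The example from the text: one previous query with three terms.
HISTORY = [frozenset({"from:bill", "to:me", "has:attachment"})]
CURRENT = frozenset({"to:me"})
```

  • With this history, the candidate facet “from:bill” matches once by facet and once by terms in the search query, but not by exact query, because the previous query contains the additional term “has:attachment”.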
  • Some embodiments of the present invention have facet characteristics corresponding to each popularity evaluation approach and how many times each candidate facet matches previous queries based on the popularity evaluation approach. For example, facet characteristics may correspond to zero matches, one or more matches, exactly one match, exactly two matches, three or more matches, or other similar counts. Because of the three distinct approaches and the different counts that may be used, there can be many facet characteristics that measure popularity. Embodiments of the present invention may use any subset of these possible facet characteristics. One of skill in the art would recognize that alternative popularity approaches are possible and alternative matching methodologies are possible, creating a much broader list of possible facet characteristics.
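  • One way to turn a match count into the count-based facet characteristics listed above is shown in this sketch. The characteristic names are hypothetical labels, and the set of buckets simply mirrors the examples in the text.

```python
from typing import Dict


def count_characteristics(match_count: int) -> Dict[str, int]:
    """Map a match count to 0/1 facet characteristics."""
    return {
        "zero_matches": int(match_count == 0),
        "one_or_more_matches": int(match_count >= 1),
        "exactly_one_match": int(match_count == 1),
        "exactly_two_matches": int(match_count == 2),
        "three_or_more_matches": int(match_count >= 3),
    }
```

  • Applying such a function once per popularity evaluation approach yields a separate group of characteristics for each approach.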
  • Even when a candidate facet is not truly “popular,” it may be popular relative to other candidate facets, or vice versa. Using the same three popularity approaches above, some embodiments of the present invention include facet characteristics based on the relative popularity of candidate facets. Some embodiments include facet characteristics that identify the most popular of the candidate facets, the second most popular of the candidate facets, the third most popular of the candidate facets, or the top five most popular of the candidate facets. For example, one exemplary facet characteristic is “the most popular candidate facet based on terms in the search query.” This example facet characteristic would apply to only one candidate facet, unless there were two or more candidate facets that tied for usage.
  • The second category of facet characteristics is based on counting search results that match certain criteria. One criterion (sometimes called selectivity) is how evenly a facet splits the search results. This criterion can be converted into facet characteristics by comparing the relative selectivity of the candidate facets. For example, some embodiments include facet characteristics corresponding to: the candidate facet that is number 1 in selectivity, the candidate facet that is number 2 in selectivity, the candidate facet that is number 3 in selectivity, the candidate facets that are in the top five for selectivity, and the candidate facets that are not in the top 5 for selectivity. Note that the last two exemplary facet characteristics are opposites. Generally, not being in the top 5 for selectivity would be negatively correlated with the ultimate ranking of candidate facets.
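  • The relative-selectivity characteristics above can be sketched as follows. The characteristic names are hypothetical, and giving tied facets the same rank is an assumption; the description does not say how ties in selectivity are handled.

```python
from typing import Dict


def selectivity_rank_characteristics(
    scores: Dict[str, float],
) -> Dict[str, Dict[str, int]]:
    """Derive 0/1 characteristics from each facet's selectivity rank.

    Rank 1 is the most selective facet. Facets with equal scores share a
    rank (an assumption made for this sketch).
    """
    result: Dict[str, Dict[str, int]] = {}
    for facet, score in scores.items():
        rank = 1 + sum(1 for other in scores.values() if other > score)
        result[facet] = {
            "number_1": int(rank == 1),
            "number_2": int(rank == 2),
            "number_3": int(rank == 3),
            "top_five": int(rank <= 5),
            "not_top_five": int(rank > 5),
        }
    return result
```

  • The same ranking approach could be applied to popularity counts to obtain relative-popularity characteristics.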
  • Additional exemplary facet characteristics in the second category are based on simple counts of the candidate facets. In some embodiments, there is a facet characteristic based on the number of search results that match each candidate facet. For example, in an email context, the candidate facet “to:bob” would count the number of messages in the search results where Bob was listed in the “To:” field of the message header. More recent results may be more relevant to the user's search, so some embodiments include characteristics that count the number of matches within the most recent N records (e.g., email messages). For example, some embodiments include characteristics that count the number of matches from the first N search results, for N=5, 10, 20, or 50.
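  • The count-based characteristics above can be sketched as follows, assuming that the search results are ordered with the most recent first. The message layout and the predicate are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Message = Dict[str, List[str]]  # hypothetical: header field -> list of values


def match_counts(
    results: Sequence[Message],
    matches: Callable[[Message], bool],
    windows: Sequence[int] = (5, 10, 20, 50),
) -> Dict[str, int]:
    """Count matching results overall and within the first N results."""
    counts = {"all": sum(1 for m in results if matches(m))}
    for n in windows:
        counts[f"first_{n}"] = sum(1 for m in results[:n] if matches(m))
    return counts


# Hypothetical results: every other message is addressed to Bob.
RESULTS = [{"to": ["bob"] if i % 2 == 0 else ["ann"]} for i in range(12)]
to_bob = lambda message: "bob" in message["to"]
counts = match_counts(RESULTS, to_bob)
```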
  • The third category of facet characteristics is based on the basic types of the candidate facets. The operators (e.g., “to:”, “from:”) and the values of the operands (“bill”, “domain.com,” etc.) may inherently affect the utility of a candidate facet. For example, facets that use the operator “to:” may be generally more relevant than facets that use the operator “cc:”. The facet characteristics in this third category capture properties of candidate facets that are generally consistent across a wide range of users. The examples provided here pertain to the context of searching email, but similar analysis would create facet characteristics applicable to other contexts. Also, although a “person operator” such as “to:”, “from:”, or “cc:” would generally have an operand that is the name of a person or an email address, these operators can also have operands that are domain names (e.g., “to:google.com”) or symbolically represent something else (e.g., “to:me”). These are described more fully above with respect to FIG. 3.
  • Some embodiments of the present invention include facet characteristics to identify: a person operator whose operand is the name of a person; a person operator whose operand is a domain name; a person operator whose operand is the user (“me”); a person operator whose operand is an email address; or a person operator whose operand contains a hyphen. In some embodiments, hyphens and other non-alphanumeric characters in an operand correspond to properties of an email address. For example, in some organizations, hyphens are used only within mailing lists, such as “all-domestic-employees@company.com”. By examining the email addresses of other organizations, other facet characteristics could be created to evaluate candidate facets. Some embodiments include facet characteristics that are conjunctions of the type of the operator together with the type of the value as just described. For example, some embodiments include the facet characteristic of being sent “to a domain name” (this facet characteristic would apply to the candidate facet “to:google.com” but would not apply to the candidate facet “from:google.com” or the candidate facet “to:bob”).
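  • The operand-type characteristics and their conjunctions with the operator can be sketched as follows. The classification rules (for example, treating any dotted token without “@” as a domain name) and the characteristic names are simplifying assumptions.

```python
import re
from typing import Set

PERSON_OPERATORS = {"to", "from", "cc"}


def operand_type(operand: str) -> str:
    """Classify the operand of a person operator (heuristic assumptions)."""
    if operand == "me":
        return "user"
    if "@" in operand:
        return "email_address"
    if re.fullmatch(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+", operand):
        return "domain_name"
    return "person_name"


def facet_characteristics(facet: str) -> Set[str]:
    """Return the names of the characteristics that apply to a facet."""
    operator, _, operand = facet.partition(":")
    found: Set[str] = set()
    if operator in PERSON_OPERATORS:
        kind = operand_type(operand)
        found.add(f"person_operator_with_{kind}")
        found.add(f"{operator}_{kind}")  # conjunction of operator and type
        if "-" in operand:
            found.add("person_operator_with_hyphen")
    return found
```

  • Under these rules the conjunction “to_domain_name” applies to “to:google.com” but not to “from:google.com” or “to:bob”, matching the example above.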
  • Some embodiments include the facet characteristic of having an operator value that is a personal name in a user's address book or having an operator value that is an email address in the user's address book. Address book membership may indicate familiarity and therefore may influence the relevance of any candidate facet that includes these people. Some embodiments also include a facet characteristic of having a personal name that is similar to a name that appears in the initial query. For example, if a user's initial query was “Bill,” then the candidate “from:bill smith” may be particularly relevant.
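  • The address book and name-similarity characteristics can be sketched as follows. Case-insensitive exact matching and shared-word similarity are assumptions; the description does not define how similarity is measured.

```python
from typing import Iterable


def in_address_book(operand: str, address_book: Iterable[str]) -> bool:
    """True if the operand equals an address book name or email address,
    ignoring case and surrounding whitespace."""
    wanted = operand.strip().lower()
    return any(wanted == entry.strip().lower() for entry in address_book)


def similar_to_query(operand: str, initial_query: str) -> bool:
    """True if the operand shares at least one word with the initial query."""
    operand_words = set(operand.lower().split())
    query_words = set(initial_query.lower().split())
    return bool(operand_words & query_words)
```

  • For the example in the text, the operand “bill smith” shares the word “bill” with the initial query “Bill”, so the candidate facet “from:bill smith” would have this characteristic.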
  • Those of skill in the art would recognize that many combinations of facet characteristics are possible that are consistent with the teaching of the present invention. Furthermore, if additional metadata fields are available, additional candidate facets may be generated, and thus additional facet characteristics may be appropriate to evaluate the utility of the additional candidate facets.
  • The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (23)

What is claimed is:
1. A computer-implemented method of searching a set of information, comprising:
at a computer system having one or more processors and memory storing one or more programs executed by the one or more processors, wherein performance of the method is controlled by execution of the one or more programs:
generating an initial set of search results based on an initial search query;
without further user input:
generating a set of candidate facets, each of which can be used to select a subset of the initial set of search results;
ranking the candidate facets in accordance with selectivity of the candidate facets with respect to at least some of the search results, wherein the selectivity of the candidate facets is based, at least in part, on how evenly each candidate facet, when combined with the initial search query, splits the initial set of search results;
selecting a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets;
formatting for display the presentation facets; and
in response to user selection of any one of the presentation facets, generating a revised search query comprising the initial search query and the selected presentation facet, and generating a revised set of search results based on the revised search query.
2. The method of claim 1, including responding to the initial search query by formatting for concurrent display the initial search query, a subset of the initial set of search results, and the presentation facets.
3. The method of claim 1, further comprising, without further user input, determining for each candidate facet which facet characteristics from a predefined set of facet characteristics are characteristics of the candidate facet, wherein the ranking further includes ranking the candidate facets in accordance with the facet characteristics of the candidate facets.
4. The method of claim 3, wherein each facet characteristic of the predefined set of facet characteristics has an associated weight, and the ranking of the candidate facets is based in part on the weights associated with the facet characteristics of the candidate facets.
5. The method of claim 4, wherein the weight associated with each facet characteristic is based on historical popularity of presentation facets having the facet characteristic.
6. The method of claim 1, wherein the set of information comprises one or more of the set consisting of:
electronic messages associated with an individual user;
product specifications for a set of products; and
product information about product features for a set of products.
7. The method of claim 1, further including generating, in response to the user selection of one of the presentation facets, a revised set of candidate facets corresponding to the revised set of search results; and formatting for display at least a subset of the revised set of candidate facets.
8. The method of claim 1, wherein generating a set of candidate facets includes generating at least one candidate facet for each of a plurality of predefined operators based on the initial search results.
9. The method of claim 8, wherein generating a candidate facet for a respective predefined operator includes generating a candidate facet based on metadata in the initial search results.
10. A system for searching a set of information, comprising:
one or more processors;
memory; and
one or more programs stored in the memory, the one or more programs comprising instructions executed by the one or more processors so as to:
generate an initial set of search results based on an initial search query;
without further user input:
generate a set of candidate facets, each of which can be used to select a subset of the initial set of search results;
rank the candidate facets in accordance with selectivity of the candidate facets with respect to at least some of the search results, wherein the selectivity of the candidate facets is based, at least in part, on how evenly each candidate facet, when combined with the initial search query, splits the initial set of search results;
select a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets; and
format for display the presentation facets; and
in response to user selection of any one of the presentation facets, generate a revised search query comprising the initial search query and the selected presentation facet, and generate a revised set of search results based on the revised search query.
11. The system of claim 10, further comprising instructions to determine for each candidate facet, without further user input, which facet characteristics from a predefined set of facet characteristics are characteristics of the candidate facet, wherein the instructions to rank the candidate facets further include instructions to rank the candidate facets in accordance with the facet characteristics of the candidate facets.
12. The system of claim 11, wherein each facet characteristic of the predefined set of facet characteristics has an associated weight, and the instructions to rank the candidate facets performs the ranking in part based on the weights associated with the facet characteristics of the candidate facets.
13. The system of claim 12, wherein the weight associated with each facet characteristic is based on historical popularity of presentation facets having the facet characteristic.
14. The system of claim 10, wherein the set of information comprises one or more of the set consisting of:
electronic messages associated with an individual user;
product specifications for a set of products; and
product information about product features for a set of products.
15. The system of claim 10, wherein the instructions to generate a set of candidate facets include instructions to generate, in response to the user selection of one of the presentation facets, a revised set of candidate facets corresponding to the revised set of search results; and the one or more programs include instructions for formatting for display at least a subset of the revised set of candidate facets.
16. The system of claim 10, wherein the instructions to generate the initial set of search results includes instructions to generate at least one candidate facet for each of a plurality of predefined operators based on the initial search results.
17. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer, the one or more programs comprising instructions to be executed by the one or more processors so as to:
generate an initial set of search results based on an initial search query;
without further user input:
generate a set of candidate facets, each of which can be used to select a subset of the initial set of search results;
rank the candidate facets in accordance with selectivity of the candidate facets with respect to at least some of the search results, wherein the selectivity of the candidate facets is based, at least in part, on how evenly each candidate facet, when combined with the initial search query, splits the initial set of search results;
select a plurality of presentation facets from among the candidate facets in accordance with the rankings of the candidate facets; and
format for display the presentation facets; and
in response to user selection of any one of the presentation facets, generate a revised search query comprising the initial search query and the selected presentation facet, and generate a revised set of search results based on the revised search query.
18. The computer readable storage medium of claim 17, further comprising instructions to determine for each candidate facet, without further user input, which facet characteristics from a predefined set of facet characteristics are characteristics of the candidate facet, wherein the instructions to rank the candidate facets further include instructions to rank the candidate facets in accordance with the facet characteristics of the candidate facets.
19. The computer readable storage medium of claim 17, wherein each facet characteristic of the predefined set of facet characteristics has an associated weight, and the instructions to rank the candidate facets performs the ranking in part based on the weights associated with the facet characteristics of the candidate facets.
20. The computer readable storage medium of claim 19, wherein the weight associated with each facet characteristic is based on historical popularity of presentation facets having the facet characteristic.
21. The computer readable storage medium of claim 17, wherein the set of information comprises one or more of the set consisting of:
electronic messages associated with an individual user;
product specifications for a set of products; and
product information about product features for a set of products.
22. The computer readable storage medium of claim 17, wherein the instructions to generate a set of candidate facets include instructions to generate, in response to the user selection of one of the presentation facets, a revised set of candidate facets corresponding to the revised set of search results; and the one or more programs include instructions for formatting for display at least a subset of the revised set of candidate facets.
23. The computer readable storage medium of claim 17, the instructions to generate a set of candidate facets include instructions to generate at least one candidate facet for each of a plurality of predefined operators based on the initial search results.
US13/857,102 2009-09-30 2013-04-04 Facet Suggestion for Search Query Augmentation Abandoned US20130226916A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/857,102 US20130226916A1 (en) 2009-09-30 2013-04-04 Facet Suggestion for Search Query Augmentation

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US24751209P 2009-09-30 2009-09-30
US12/894,079 US8433705B1 (en) 2009-09-30 2010-09-29 Facet suggestion for search query augmentation
US13/857,102 US20130226916A1 (en) 2009-09-30 2013-04-04 Facet Suggestion for Search Query Augmentation

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/894,079 Continuation US8433705B1 (en) 2009-09-30 2010-09-29 Facet suggestion for search query augmentation

Publications (1)

Publication Number Publication Date
US20130226916A1 true US20130226916A1 (en) 2013-08-29

Family

ID=48146157

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/894,079 Expired - Fee Related US8433705B1 (en) 2009-09-30 2010-09-29 Facet suggestion for search query augmentation
US13/857,102 Abandoned US20130226916A1 (en) 2009-09-30 2013-04-04 Facet Suggestion for Search Query Augmentation

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/894,079 Expired - Fee Related US8433705B1 (en) 2009-09-30 2010-09-29 Facet suggestion for search query augmentation

Country Status (1)

Country Link
US (2) US8433705B1 (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10956475B2 (en) 2010-04-06 2021-03-23 Imagescan, Inc. Visual presentation of search results
US7933859B1 (en) 2010-05-25 2011-04-26 Recommind, Inc. Systems and methods for predictive coding
US9262532B2 (en) * 2010-07-30 2016-02-16 Yahoo! Inc. Ranking entity facets using user-click feedback
US9298816B2 (en) 2011-07-22 2016-03-29 Open Text S.A. Methods, systems, and computer-readable media for semantically enriching content and for semantic navigation
US11010432B2 (en) * 2011-10-24 2021-05-18 Imagescan, Inc. Apparatus and method for displaying multiple display panels with a progressive relationship using cognitive pattern recognition
US20130339372A1 (en) * 2012-06-14 2013-12-19 Santhosh Adayikkoth System and method for contexual ranking of information facets
US9280587B2 (en) * 2013-03-15 2016-03-08 Xerox Corporation Mailbox search engine using query multi-modal expansion and community-based smoothing
US9930002B2 (en) 2013-12-27 2018-03-27 Entefy Inc. Apparatus and method for intelligent delivery time determination for a multi-format and/or multi-protocol communication
US9819621B2 (en) 2013-12-27 2017-11-14 Entefy Inc. Apparatus and method for optimized multi-format communication delivery protocol prediction
US9843543B2 (en) 2013-12-27 2017-12-12 Entefy Inc. Apparatus and method for multi-format and multi-protocol group messaging
US20170193009A1 (en) 2015-12-31 2017-07-06 Entefy Inc. Systems and methods for filtering of computer vision generated tags using natural language processing
US10394966B2 (en) 2014-02-24 2019-08-27 Entefy Inc. Systems and methods for multi-protocol, multi-format universal searching
US10169447B2 (en) 2014-02-24 2019-01-01 Entefy Inc. System and method of message threading for a multi-format, multi-protocol communication system
US11755629B1 (en) 2014-02-24 2023-09-12 Entefy Inc. System and method of context-based predictive content tagging for encrypted data
US10095748B2 (en) 2014-03-03 2018-10-09 Microsoft Technology Licensing, Llc Personalized information query suggestions
US9547729B2 (en) 2014-05-30 2017-01-17 International Business Machines Corporation Adaptive query processor for query systems with limited capabilities
FR3035245B1 (en) * 2015-04-20 2018-08-24 Qwant METHOD OF SEARCHING ACCESSIBLE PAGES ON A NETWORK
FR3035244B1 (en) * 2015-04-20 2018-08-24 Qwant METHOD OF SEARCHING ACCESSIBLE PAGES ON A NETWORK
US10853426B2 (en) * 2015-04-20 2020-12-01 Qwant Method for searching for pages accessible over a network
US10445386B2 (en) 2015-10-14 2019-10-15 Microsoft Technology Licensing, Llc Search result refinement
US10409830B2 (en) * 2015-10-14 2019-09-10 Microsoft Technology Licensing, Llc System for facet expansion
US10135764B2 (en) 2015-12-31 2018-11-20 Entefy Inc. Universal interaction platform for people, services, and devices
US10353754B2 (en) 2015-12-31 2019-07-16 Entefy Inc. Application program interface analyzer for a universal interaction platform
US10824630B2 (en) * 2016-10-26 2020-11-03 Google Llc Search and retrieval of structured information cards
US20180189352A1 (en) 2016-12-31 2018-07-05 Entefy Inc. Mixed-grained detection and analysis of user life events for context understanding
US10491690B2 (en) 2016-12-31 2019-11-26 Entefy Inc. Distributed natural language message interpretation engine
US10572924B2 (en) * 2017-01-31 2020-02-25 Walmart Apollo, Llc Automatic generation of featured filters
US11544400B2 (en) * 2017-02-24 2023-01-03 Hyland Uk Operations Limited Permissions-constrained dynamic faceting of search results in a content management system
US10410261B2 (en) * 2017-05-25 2019-09-10 Walmart Apollo, Llc Systems and methods for determining facet rankings for a website
US11573990B2 (en) 2017-12-29 2023-02-07 Entefy Inc. Search-based natural language intent determination
US11948023B2 (en) 2017-12-29 2024-04-02 Entefy Inc. Automatic application program interface (API) selector for unsupervised natural language processing (NLP) intent classification
US10587553B1 (en) 2017-12-29 2020-03-10 Entefy Inc. Methods and systems to support adaptive multi-participant thread monitoring
US10628432B2 (en) * 2018-02-19 2020-04-21 Microsoft Technology Licensing, Llc Personalized deep models for smart suggestions ranking
US10726025B2 (en) 2018-02-19 2020-07-28 Microsoft Technology Licensing, Llc Standardized entity representation learning for smart suggestions
US11436522B2 (en) 2018-02-19 2022-09-06 Microsoft Technology Licensing, Llc Joint representation learning of standardized entities and queries
US10956515B2 (en) 2018-02-19 2021-03-23 Microsoft Technology Licensing, Llc Smart suggestions personalization with GLMix
US10902066B2 (en) * 2018-07-23 2021-01-26 Open Text Holdings, Inc. Electronic discovery using predictive filtering
US11645295B2 (en) 2019-03-26 2023-05-09 Imagescan, Inc. Pattern search box
US11361030B2 (en) * 2019-11-27 2022-06-14 International Business Machines Corporation Positive/negative facet identification in similar documents to search context
US11875393B2 (en) * 2020-01-28 2024-01-16 Salesforce, Inc. Generation of recommendations from dynamically-mapped data
US11663279B2 (en) * 2021-05-05 2023-05-30 Capital One Services, Llc Filter list generation system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061209A1 (en) * 2001-09-27 2003-03-27 Simon D. Raboczi Computer user interface tool for navigation of data stored in directed graphs
US20060059134A1 (en) * 2004-09-10 2006-03-16 Eran Palmon Creating attachments and ranking users and attachments for conducting a search directed by a hierarchy-free set of topics
US20060129531A1 (en) * 2004-12-09 2006-06-15 International Business Machines Corporation Method and system for suggesting search engine keywords
US20060179035A1 (en) * 2005-02-07 2006-08-10 Sap Aktiengesellschaft Methods and systems for providing guided navigation
US7657522B1 (en) * 2006-01-12 2010-02-02 Recommind, Inc. System and method for providing information navigation and filtration
US20100146012A1 (en) * 2008-12-04 2010-06-10 Microsoft Corporation Previewing search results for suggested refinement terms and vertical searches
US7865498B2 (en) * 2002-09-23 2011-01-04 Worldwide Broadcast Network, Inc. Broadcast network platform system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6578022B1 (en) * 2000-04-18 2003-06-10 Icplanet Corporation Interactive intelligent searching with executable suggestions
US20080250008A1 (en) * 2007-04-04 2008-10-09 Microsoft Corporation Query Specialization
US20090222412A1 (en) * 2008-02-29 2009-09-03 Microsoft Corporation Facet visualization
US8832135B2 (en) * 2008-05-02 2014-09-09 Verint Systems, Ltd. Method and system for database query term suggestion
US9152702B2 (en) * 2010-04-09 2015-10-06 Yahoo! Inc. System and method for selecting search results facets

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061209A1 (en) * 2001-09-27 2003-03-27 Simon D. Raboczi Computer user interface tool for navigation of data stored in directed graphs
US7865498B2 (en) * 2002-09-23 2011-01-04 Worldwide Broadcast Network, Inc. Broadcast network platform system
US20060059134A1 (en) * 2004-09-10 2006-03-16 Eran Palmon Creating attachments and ranking users and attachments for conducting a search directed by a hierarchy-free set of topics
US20060129531A1 (en) * 2004-12-09 2006-06-15 International Business Machines Corporation Method and system for suggesting search engine keywords
US20060179035A1 (en) * 2005-02-07 2006-08-10 Sap Aktiengesellschaft Methods and systems for providing guided navigation
US7657522B1 (en) * 2006-01-12 2010-02-02 Recommind, Inc. System and method for providing information navigation and filtration
US20100146012A1 (en) * 2008-12-04 2010-06-10 Microsoft Corporation Previewing search results for suggested refinement terms and vertical searches

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9348894B2 (en) * 2011-03-31 2016-05-24 Infosys Limited Facet support, clustering for code query results
US20120254162A1 (en) * 2011-03-31 2012-10-04 Infosys Technologies Ltd. Facet support, clustering for code query results
US9594540B1 (en) * 2012-01-06 2017-03-14 A9.Com, Inc. Techniques for providing item information by expanding item facets
US9760369B2 (en) 2013-12-13 2017-09-12 Infosys Limited Assessing modularity of a program written in object oriented language
US10691760B2 (en) * 2014-11-06 2020-06-23 Microsoft Technology Licensing, Llc Guided search
US20160132602A1 (en) * 2014-11-06 2016-05-12 Kumaresh Pattabiraman Guided search
WO2017019239A1 (en) * 2015-07-29 2017-02-02 Linkedin Corporation Hybrid facet counting
US10795894B2 (en) 2015-07-29 2020-10-06 Microsoft Technology Licensing, Llc Hybrid facet counting using different sampling rates
US10282453B2 (en) 2015-12-07 2019-05-07 Microsoft Technology Licensing, Llc Contextual and interactive sessions within search
US20170221120A1 (en) * 2016-01-30 2017-08-03 Wal-Mart Stores, Inc. Systems and methods for browse facet ranking
US10853863B2 (en) * 2016-01-30 2020-12-01 Walmart Apollo, Llc Systems and methods for browse facet ranking
US10346442B2 (en) 2016-11-17 2019-07-09 International Business Machines Corporation Corpus management by automatic categorization into functional domains to support faceted querying
US11163804B2 (en) 2016-11-17 2021-11-02 International Business Machines Corporation Corpus management by automatic categorization into functional domains to support faceted querying
US10242103B2 (en) * 2017-02-15 2019-03-26 International Business Machines Corporation Dynamic faceted search
US10838994B2 (en) * 2017-08-31 2020-11-17 International Business Machines Corporation Document ranking by progressively increasing faceted query
US20190065584A1 (en) * 2017-08-31 2019-02-28 International Business Machines Corporation Document ranking by progressively increasing faceted query
US11797545B2 (en) 2020-04-21 2023-10-24 International Business Machines Corporation Dynamically generating facets using graph partitioning
US20220207087A1 (en) * 2020-12-26 2022-06-30 International Business Machines Corporation Optimistic facet set selection for dynamic faceted search
US11940996B2 (en) 2020-12-26 2024-03-26 International Business Machines Corporation Unsupervised discriminative facet generation for dynamic faceted search
US11544324B2 (en) * 2021-02-22 2023-01-03 Oracle International Corporation Filter recommendation based on historical search result selection

Also Published As

Publication number Publication date
US8433705B1 (en) 2013-04-30

Similar Documents

Publication Title
US8433705B1 (en) Facet suggestion for search query augmentation
JP4591217B2 (en) Recommendation information provision system
KR101016683B1 (en) Systems and methods for providing search results
US20170357725A1 (en) Ranking content items provided as search results by a search application
KR102046096B1 (en) Resource efficient document search
US9836544B2 (en) Methods and systems for prioritizing a crawl
US8620745B2 (en) Selecting advertisements for placement on related web pages
JP5632574B2 (en) System and method for improving ranking of news articles
US8498984B1 (en) Categorization of search results
US7580568B1 (en) Methods and systems for identifying an image as a representative image for an article
US8799302B2 (en) Recommended alerts
US9384289B2 (en) Method and system to identify geographical locations associated with queries received at a search engine
US8554768B2 (en) Automatically showing additional relevant search results based on user feedback
US20110179025A1 (en) Social and contextual searching for enterprise business applications
US20140108445A1 (en) System and Method for Personalizing Query Suggestions Based on User Interest Profile
US20050278314A1 (en) Variable length snippet generation
US8326836B1 (en) Providing time series information with search results
US7707142B1 (en) Methods and systems for performing an offline search
US8909720B2 (en) Identifying message threads of a message storage system having relevance to a first file
US8812602B2 (en) Identifying conversations in a social network system having relevance to a first file
JP2009508267A (en) Ranking blog documents
JP2007517308A (en) Method and system for improving search ranking using article information
US11886444B2 (en) Ranking search results using hierarchically organized coefficients for determining relevance
JP2014067314A (en) Electronic commerce server device
US20150358270A1 (en) System and method for targeting information based on a list of message content

Legal Events

Date Code Title Description
AS Assignment
Owner name: GOOGLE INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DREDZE, MARK H.;SCHILIT, WILLIAM N.;SIGNING DATES FROM 20100927 TO 20101025;REEL/FRAME:035941/0261

STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION