WO2014209925A1 - Person search utilizing entity expansion - Google Patents

Person search utilizing entity expansion

Publication number
WO2014209925A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
query
search query
related entity
expanded
Application number
PCT/US2014/043750
Other languages
French (fr)
Inventor
Justin Ormont
Marc Eliot Davis
Original Assignee
Microsoft Corporation
Priority claimed from US13/931,922 (US20150006520A1)
Application filed by Microsoft Corporation
Priority to CN201480037264.2A (CN105493082A)
Priority to KR1020157036770A (KR20160026907A)
Priority to EP14740077.4A (EP3014486A1)
Publication of WO2014209925A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion

Definitions

  • a search query is received from a computer user, the search query identifying a person for which content (or references to content) is sought.
  • related entity data is obtained from at least one related entity source for the identified person.
  • Related entity data comprises at least one of a related entity (or entities) or a category associated with the identified person.
  • An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.
  • a computer-readable medium bearing computer-executable instructions is provided. When executed on a computing system comprising at least a processor executing the instructions retrieved from the medium, the computing system is configured to carry out a method for responding to a search query from a user. More particularly, in response to receiving a search query from a computer user, where the search query identifies a person for which content (or references to content) is sought, related entity data is obtained from at least one related entity source for the identified person. An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.
  • a computer system for responding to a search query for content related to a person comprises a processor and a memory, wherein the processor executes instructions stored in the memory as part of or in conjunction with additional components to respond to a search query for content related to a person.
  • additional components include (by way of illustration and not limitation) a query topic identification component, a related entity retrieval component, an expanded query generator, a search results retrieval component, and a search results presentation generator.
  • the query topic identification component is configured to determine the identity of a person from the search query for which related content is sought.
  • the related entity retrieval component obtains related entity data corresponding to the identified person from a related entity source.
  • After obtaining related entity data, the expanded query generator generates an expanded query from the search query for content related to the identified person and from the related entity data.
  • the related entity data comprises at least one of a related entity or a category associated with the identified person of the search query.
  • the search results retrieval component obtains search results from a content store according to the expanded search query.
  • the search results presentation generator generates a search results presentation according to the search results referencing content corresponding to the identified person and returns the search results presentation to the computer user.
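The five components listed above can be read as stages of a pipeline. The following is a minimal sketch of that reading, not the patent's implementation: the function name `respond_to_query` and the toy lambdas standing in for each component are hypothetical and exist only to make the data flow concrete.

```python
def respond_to_query(query, identify, get_related, expand, retrieve, present):
    """Run the five stages in order; each argument after `query` is a callable."""
    person = identify(query)                # query topic identification
    related = get_related(person)           # related entity retrieval
    expanded = expand(query, related)       # expanded query generation
    results = retrieve(expanded)            # search results retrieval
    return present(results)                 # search results presentation


# Toy stand-ins (assumptions) so the sketch runs end to end.
page = respond_to_query(
    "bruce wayne",
    identify=lambda q: q.title(),
    get_related=lambda person: ["Company A"] if person == "Bruce Wayne" else [],
    expand=lambda q, related: [q] + related,
    retrieve=lambda terms: [f"doc about {t}" for t in terms],
    present=lambda results: "\n".join(results),
)
```

Passing the stages in as callables keeps each component replaceable, which matches the description's remark that the components are listed by way of illustration and not limitation.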
  • Figure 1 is a block diagram of a networked environment suitable for
  • Figure 2 is a flow diagram illustrating an exemplary routine for providing improved results in response to a search query regarding content for a particular person through query expansion;
  • Figure 3 is a flow diagram illustrating an exemplary routine for generating an expanded search query according to aspects of the disclosed subject matter;
  • Figures 4 and 5 illustrate elements of expanded search queries;
  • Figure 6 is a block diagram illustrating exemplary components of a search engine configured to provide improved results in response to a search query from a computer user.
  • FIG. 1 is a block diagram illustrating an exemplary networked environment 100 suitable for implementing aspects of the disclosed subject matter, particularly in regard to providing improved search results to a computer user in response to a search query regarding a person.
  • the exemplary networked environment 100 includes one or more user computers, such as user computers 102-106, connected to a network 108, such as the Internet, a wide area network or WAN, and the like.
  • User computers include, by way of illustration and not limitation: desktop computers (such as desktop computer 104); laptop computers (such as laptop computer 102); tablet computers (such as tablet computer 106); mobile devices (not shown); game consoles (not shown); personal digital assistants (not shown); and the like.
  • User computers may be configured to connect to the network 108 by way of wired and/or wireless connections.
  • the exemplary networked environment 100 illustrates the network 108 as being located between the user computers 102-106 and the search engine 110, and again between the search engine 110 and the network sites 112-116. This illustration, however, should not be construed as suggesting that these are separate networks.
  • Also connected to the network 108 are various networked sites, including network sites 110-116.
  • the networked sites connected to the network 108 include a search engine 110 configured to respond to search queries from computer users, news sources 112 and 114 which host various news articles and content, a social networking site 116, and the like.
  • a computer user such as computer user 101, may navigate via a user computer, such as user computer 102, to these and other networked sites to access content, including news content.
  • the search engine 110 is configured to provide search results (typically in the form of references to content available on the network 108) in response to a search query from a computer user.
  • search engine 110 identifies content related to the identified person according to information in its content store, generates a search results presentation based on at least some of the identified content, and provides the search results presentation to the computer user.
  • Figure 1 also illustratively includes a social network site 116 and various news sources, including news sites 112-114.
  • a social network site 116 is an online site/service that provides a platform in which a computer user can establish a profile describing various aspects of the user, build relationships and social networks with other computer users, groups, and the like.
  • a computer user can establish or indicate various interests, activities, and backgrounds with those in his/her social network.
  • a computer user is often able to indicate a preference or an interest in a particular entity on a social networking service, as might be hosted by social networking site 116, whether that entity is a person, a place, a group, a concept, an activity, and the like.
  • while social networking site 116 is included in the illustrative network environment 100, this is merely illustrative and should not be viewed as limiting upon the disclosed subject matter. In an actual embodiment, there may be any number of social network sites connected to the network 108.
  • the search engine 110 is configured to communicate (directly or indirectly through services calls and/or web crawlers) with multiple content sources, including news sites 112 and 114, social networking site 116, and other sites such as blogs and registries (not shown) to obtain information regarding the content that is available at each network site. Information regarding available content may also be pushed to the search engine from various services and/or networking sites. This information is stored (typically as references to the content) in a content store such that the search engine can obtain content from this content store in order to respond to a search query from a computer user, such as computer user 101. The search engine 110 may also obtain information regarding any given individual from search query logs, network browsing histories, purchase histories, and the like.
  • a search engine 110 may also be configured to obtain information from other network sites when responding to a search query.
  • the search engine 110 may obtain data from one or more social networking sites, such as social network site 116, as relevant information to return to the requesting computer user and/or as information to assist the search engine in identifying relevant information to return to the requesting computer user.
  • Figure 2 is a flow diagram of an exemplary routine for providing improved results in response to a search query regarding content corresponding to a particular person through query expansion.
  • the search engine 110 receives a search query from a computer user, such as computer user 101, the search query requesting content corresponding to a particular person.
  • a search query is typically (though not exclusively) a text string.
  • a search query for content relating to a person may be "Bruce Wayne".
  • the search engine attempts to uniquely identify the person who is the subject matter of the search query.
  • the search engine attempts to uniquely identify the person for which content is requested according to at least general information and specific information relating to the requesting computer user.
  • the general information includes, by way of illustration and not limitation: popularity of search queries corresponding to a person with the name identified in the search query; trending popularity of a person with the name identified in the search query; other terms and/or phrases in the search query (e.g., "Bruce Wayne Seattle" or "Bruce Wayne Microsoft"); an image representative of the person; and the like.
  • Specific information relating to the requesting computer user may include, by way of illustration and not limitation: current location; prior search query history; current and former workplaces; current and former educational institutions that were attended; social networks; preferences (both explicitly and implicitly identified); general graph
  • the search engine 110 may, at least internally, associate a globally unique identifier to the person who is the subject matter of the search query. Moreover, once the person who is the subject matter of the search query is identified, the search engine 110 may use the associated globally unique identifier in obtaining, or reranking, search results in response to the search query.
  • auto-suggest search recommendations may indicate a particular person as one of the auto-suggestions and, typically, that suggested person's unique identity is known.
  • another service may submit a search request for a person that uniquely identifies the person to the search service such that the identity of the person need not be determined.
  • while the search request typically identifies, by name, a person for whom content is sought, there may also be times in which the name of that person is not known but some information is provided that may lead to uniquely identifying that person.
  • the computer user may not know the name of the general manager of the Seattle Seahawks, but in submitting the text "general manager of the Seattle Seahawks" the computer user often sufficiently identifies the person for whom content is sought that, in block 204, the identity of the person can be determined.
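The identification step described above combines general signals (popularity, other query terms) with signals specific to the requesting user. One way to picture this is as a scoring function over candidate people. The sketch below is an assumption-laden illustration: the candidate record layout, the function name `pick_person`, and the weights 2.0 and 1.0 are all hypothetical and do not come from the source.

```python
def pick_person(query_terms, candidates, user_context):
    """Return the id of the best-scoring candidate, or None if there are none."""
    def score(candidate):
        attrs = candidate["attributes"]
        # General signal: how often this person is searched for.
        total = candidate["popularity"]
        # Other terms in the query, e.g. "seattle" or "microsoft" (assumed weight).
        total += 2.0 * len(attrs & query_terms)
        # Specific signal: overlap with what is known about the requester.
        total += 1.0 * len(attrs & user_context)
        return total

    if not candidates:
        return None
    return max(candidates, key=score)["id"]


candidates = [
    {"id": "person-1", "popularity": 0.9, "attributes": {"gotham", "wayne enterprises"}},
    {"id": "person-2", "popularity": 0.2, "attributes": {"seattle", "microsoft"}},
]
```

With no extra query terms and no user context, the more popular `person-1` wins; adding the query term `seattle` for a requester associated with `microsoft` tips the choice to `person-2`. The returned id plays the role of the globally unique identifier mentioned in the text.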
  • related entity data includes entities related to the identified person.
  • a related entity is an entity with which the identified person is related for some reason. While some of the reasons may be known, others may be unknown and implied according to statistical similarities. For example, assume that the identified person is an employee of Company A and is a member of Workgroup Z. Related entities to the identified person, based on this employment relationship, would typically include "Company A" and "Workgroup Z". Other related entities arising from this same employment relationship may include fellow co-workers.
  • Still other entities may also include other (previous) workgroups, past and present co-workers, and the like.
  • the identified person may also be an alumnus of a particular university.
  • the university may be a related entity to the identified person, as well as the particular college in the university where the identified person studied, the degree that was awarded, academic achievements of the identified person, fellow students, and the like.
  • the identified person may be a member of a local master gardeners society and, as a result, the local master gardeners society may be a related entity to the identified person as well as fellow members of the society.
  • the search engine 110 obtains related entity data from one or more related entity sources.
  • the search engine 110 may host or store various information regarding the identified person in a user profile store (e.g., the user profile store 628 of Figure 6) and, therefore, be one of the related entity sources.
  • the search engine 110 may store user profile information corresponding to the computer user. This user profile information may be based on explicitly identified information (from the identified person) as well as implicitly identified information (such as information derived from search queries, browsing history, and the like).
  • Social networking sites, such as social networking site 116, represent additional related entity sources.
  • a social networking site enables a person, such as the identified person of the search query, to establish relationships and social networks with other entities (including people, organizations, activities, causes, and the like).
  • the search engine 110 can be configured to obtain related entity data from any number of these related entity sources.
  • the related entity information that is hosted by each of the related entity sources may comprise information that the identified person wishes to keep private.
  • the search engine identifies the requesting computer user and, if identified, can attempt to use the permissions afforded to the requesting computer user in obtaining the related entity information.
  • a computer user is required to authenticate himself or herself in order to access information regarding the identified person.
  • requirements may include, by way of illustration and not limitation, that the requesting computer user be logged into one or more services in order to access and/or view content that would otherwise be restricted.
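The privacy handling described above amounts to filtering related entity data by what the requesting user is allowed to see. A minimal sketch follows, assuming a hypothetical record layout in which each related entity carries an `audience` label; neither the layout nor the function name `visible_related_entities` appears in the source.

```python
def visible_related_entities(records, requester_groups, authenticated):
    """Keep only the related-entity records the requester may see."""
    visible = []
    for record in records:
        audience = record["audience"]
        if audience == "public":
            visible.append(record["entity"])
        elif authenticated and audience in requester_groups:
            # Restricted data is released only to a logged-in, permitted requester.
            visible.append(record["entity"])
    return visible


records = [
    {"entity": "Company A", "audience": "public"},
    {"entity": "Workgroup Z", "audience": "coworkers"},
    {"entity": "Master Gardeners Society", "audience": "friends"},
]
```

An unauthenticated requester would see only "Company A" here, while an authenticated co-worker would also see "Workgroup Z".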
  • a related entity source may associate one or more categories to an individual (such as the identified person of a search query). Accordingly, the related entity data obtained from the related entity sources may also include category data. Category data (both in regard to the set of potential
  • a related entity source may have associated various categories with the identified person including "Employee", "Alumnus", and "Gardener". Moreover, each of the related entity sources may maintain category information that defines what is meant to be associated with the category. This category information often includes a list of potential, though not necessarily required, relationships that may exist between a first entity belonging to a specific category (such as the identified person) and other entities.
  • the "Employee" category may define a set of potential relationships as including "employer", "work group", "current manager", "direct reports", "co-worker" and the like.
  • each entity that is categorized as an "Employee" could then have relationships with other entities as defined by the set of potential relationships.
  • although a category defines a set of potential relationships, an entity of that category is not required to be related to other entities based on each and every potential relationship.
  • a given entity such as an entity corresponding to the identified person of a search query, may be associated with a plurality of categories.
  • categories may also be inferred. For example, an employee may be interested in former work performed previously at a company such that an inferred category is "co-worker".
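The category model described above (a category defines potential relationships, and an entity fills in only some of them) can be sketched as a lookup table. The "Employee" relationship names below are taken from the text; the table name `CATEGORY_RELATIONSHIPS`, the function `populated_relationships`, and the data layout are assumptions for illustration.

```python
# Potential relationships per category; the "Employee" entries come from the text.
CATEGORY_RELATIONSHIPS = {
    "Employee": ["employer", "work group", "current manager", "direct reports", "co-worker"],
}


def populated_relationships(categories, relationships):
    """Map each category to the potential relationships the entity actually has."""
    result = {}
    for category in categories:
        potential = CATEGORY_RELATIONSHIPS.get(category, [])
        # An entity need not have every potential relationship of its category.
        result[category] = {
            name: relationships[name] for name in potential if name in relationships
        }
    return result


person_relationships = {"employer": ["Company A"], "work group": ["Workgroup Z"]}
summary = populated_relationships(["Employee", "Gardener"], person_relationships)
```

Here `summary["Employee"]` holds only the two relationships that exist for this person, and `summary["Gardener"]` is empty because this sketch defines no potential relationships for that category.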
  • a search model is identified/determined to apply to the expanded search query.
  • This search model includes information for weighting various elements (terms and phrases) of the expanded search query to improve search results.
  • Applying a search model to the expanded search query recognizes, at least in part, that not all query terms of the expanded search query are equal, i.e., some query terms are more important in identifying relevant search content for the identified person than others.
  • favoring/weighting employment-related query terms or education-related query terms provides improved search results when the relevancy of the various search results (or, more accurately stated, the content referenced by the search results) are presented to a particular user.
  • selection of a search model may be based on information regarding the requesting computer user. For example, if it is known that the requesting computer user is in college then an education model may be selected. Alternatively, selection of a search model may be made according to information regarding the identified person, from information available to the search engine 110 or external sources including from the related entity data. In yet additional embodiments, selection of a search model may be made according to information regarding both the requesting computer user as well as the identified person of the search query.
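The search model selection described above can be pictured as choosing a named table of term weights. This is a sketch under stated assumptions: the model names, the weight values, the `in_college` flag, and both function names are hypothetical; the source says only that an education model may be selected for a requester known to be in college and that models weight query terms unequally.

```python
# Hypothetical weight tables keyed by the kind of query term.
SEARCH_MODELS = {
    "employment": {"employer": 2.0, "work group": 1.5, "school": 0.5},
    "education": {"employer": 0.5, "work group": 0.5, "school": 2.0},
    "default": {},
}


def select_search_model(requester, person):
    """Pick a model name from what is known about the requester and the person."""
    if requester.get("in_college"):
        return "education"
    if "Employee" in person.get("categories", []):
        return "employment"
    return "default"


def weight_terms(terms, model_name):
    """Attach a weight to each (term, kind) pair; unknown kinds get weight 1.0."""
    weights = SEARCH_MODELS[model_name]
    return {term: weights.get(kind, 1.0) for term, kind in terms}


terms = [("Company A", "employer"), ("State University", "school")]
```

For a requester in college, `weight_terms(terms, "education")` favors "State University" over "Company A"; under the employment model the preference is reversed.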
  • an expanded search query is generated according to the determined search model for the identified person. Generating an expanded search query is discussed in greater detail in regard to Figure 3. More particularly, Figure 3 is a flow diagram illustrating an exemplary routine 300 for generating an expanded search query according to related entity data obtained from related entity sources.
  • the identified person and filter elements of the received search query are included as an initial section of the expanded search query. While this may entail simply copying the received search query into the initial section, the initial search query may not necessarily simply be copied. Often a requesting computer user may misspell the name of the person that is sought or any one of the identifying filter elements associated with the person.
  • a received search query may be "Bruse Wayn Microsoft", in an effort to find content corresponding to "Bruce Wayne" who works at "Microsoft". If it can be determined that the name (or one or more filter elements) is misspelled, it would be less productive to include the original search query in the expanded search query. Hence, in block 204 of routine 200, the person is identified. Correction to the filter elements may also be made (though not explicitly called out in routines 200 and 300).
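The source does not say how a misspelled name is corrected. As one possible illustration, the sketch below swaps in fuzzy string matching from Python's standard `difflib` module; the function name `correct_name`, the list of known names, and the 0.6 cutoff are assumptions.

```python
import difflib


def correct_name(typed_name, known_names, cutoff=0.6):
    """Return the closest known name, or the input unchanged if none is close."""
    by_lower = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        typed_name.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else typed_name


known = ["Bruce Wayne", "Clark Kent"]
corrected = correct_name("Bruse Wayn", known)
```

The corrected name, rather than the text as typed, would then go into the initial section of the expanded query.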
  • query terms are derived from the obtained related entity data and
  • the related entities (related to the identified person) from the obtained related entity data are included in a related entities section of the expanded search query in accordance with the determined search model.
  • query terms derived from the category data, including both the category (as an entity) and category entities (as described below), are included in a category entities section of the expanded search query according to the search model.
  • the expanded search query is returned and the routine 300 terminates.
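Routine 300 as described above assembles three sections: an initial section, a related entities section, and a category entities section. The sketch below represents the result as plain data rather than in any particular query syntax; the function name `build_expanded_query`, the dictionary keys, and the weight keys are hypothetical.

```python
def build_expanded_query(person_terms, related_entities, category_data, weights):
    """Assemble the three sections of an expanded query as plain data."""
    category_terms = []
    for category, entities in category_data.items():
        category_terms.append(category)    # the category itself, as an entity
        category_terms.extend(entities)    # the category entities
    return {
        "initial": list(person_terms),
        "related_entities": [(e, weights.get("related", 1.0)) for e in related_entities],
        "category_entities": [(t, weights.get("category", 1.0)) for t in category_terms],
    }


expanded = build_expanded_query(
    ["Bruce.Wayne"], ["Research"], {"Employee": ["Workgroup"]}, {"related": 2.0}
)
```

The weights stand in for whatever the selected search model supplies; a section with no entry in `weights` defaults to 1.0 in this sketch.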
  • Figure 4 illustrates an exemplary expanded search query 400 corresponding to the example above, i.e., for the person "Bruce Wayne". For this example, it is assumed that this identified person, "Bruce Wayne", was associated with only one category, Employee.
  • the initial section 402 includes the original search query text 404, "Bruce.Wayne", as well as alternative names related to the identified person, in this case "Batman Dark.Knight Matches.Malone Caped.Crusader".
  • not all computer users will have access rights to all information. In the example above, not all people might know of the alternative names that might uniquely reference "Bruce Wayne".
  • syntactical conventions include (by way of illustration and not limitation): the operator 408 "inbody:" indicating to the search engine 110 that it should match a document when any one of the words/terms between the parentheses is found in the body of the content; a "noalter:" operator that indicates that the spelling of the terms should not be modified; and a "norelax:" operator that indicates that the terms are important and may not be dropped in matching content.
  • the expanded search query 400 also includes a related entity section 412 that includes the related entities to the identified person of the search query, such as text 416 "Research". Still further included in the expanded search query is a category entities section 414 that includes the category entities of category "Employee". As mentioned above, the category entities section 414 includes the category ("Employee") as well as the category entities such as text 418 "Workgroup". These entries optionally help produce results based on how the computer user likely knows the identified person, in this case "Bruce Wayne". As can be seen, the expanded search query for a particular person takes a search query, such as "Bruce Wayne" and expands the query with related entities as well as category entities to better identify content corresponding to the identified person.
  • this operator operates to let the ranking of a document go up as a matching token/value is found in the document, such as "Research". It operates such that the specified terms are not required to be found in a resulting document but, if found, will result in the document being ranked as more relevant.
  • the operator, "word:", operates to match on a document if one or more of the tokens in the parentheses, such as "Workgroup", is found in the document. In a sense, the operator "word:" operates as a type of max (or maximum value) operator, comparing each token between the parentheses to the document and returning the single maximum value of the rank of the tokens.
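The two ranking behaviors described above, an optional boost and the max-like "word:" operator, can be contrasted in a few lines. This sketch assumes a hypothetical `token_ranks` mapping from each token found in a document to its rank contribution. The function names `word_op` and `optional_boost` are illustrative, and treating the boost as a sum is an assumption, since the text says only that the rank goes up when a token is found.

```python
def word_op(tokens, token_ranks):
    """'word:'-style match: the single best rank among the listed tokens (0.0 if none)."""
    return max((token_ranks.get(t, 0.0) for t in tokens), default=0.0)


def optional_boost(tokens, token_ranks):
    """Optional terms: each one found adds to the rank; none is required."""
    return sum(token_ranks.get(t, 0.0) for t in tokens)


# Rank contribution of each token found in one document (hypothetical values).
ranks = {"Workgroup": 0.4, "Manager": 0.7, "Research": 0.5}
```

With these values, `word_op(["Workgroup", "Manager", "Employer"], ranks)` yields the single maximum, 0.7, while `optional_boost(["Research", "Finance"], ranks)` adds 0.5 for the token that was found and nothing for the one that was not.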
  • expanded queries 400 and 500 generally include textual tokens (such as "Bruce.Wayne"), it should be appreciated that this is illustrative and should not be viewed as limiting upon the disclosed subject matter.
  • one or more of the tokens in an expanded search query could be specific identifiers that identify the sought-for person and/or related entities.
  • expanded search query 500 includes an operator 510 that includes a Facebook numerical identifier ("740049358") as well as an operator 512 that includes a Facebook user identifier ("t-drake").
  • any particular sources of identifiers may be used and Facebook identifiers are illustrative only.
  • an identified person may be associated with more than one category.
  • while the expanded search query 400 of Figure 4 describes information from a single category, this is for illustration.
  • Figure 5 illustrates an exemplary expanded search query 500 corresponding to the example above, i.e., for the identified person "Bruce Wayne", but in this example includes information from two categories, Employer and Education.
  • the expanded search query 500 includes the initial section 502 as well as related entities section 504 and category entities section 506.
  • the expanded search queries become more detailed and encompassing to assist the search engine to identify content corresponding to the identified person of the search query.
  • search results are obtained according to the expanded search query.
  • Obtaining search results according to a search query (in this case, a search query with expanded terms according to related entities and categories) is known in the art.
  • search results are obtained according to the query terms from the received search query and optionally according to the query terms derived from the related entity data.
  • the query terms of the expanded search query that are derived from the related entity data are intended to expand the scope of content/search results that correspond to the identified person, but these query terms that are derived from the related entity data are not mandatory terms.
  • the expanded search query expands the scope of content that potentially relates to the identified person rather than narrowing the scope of content if those query terms were not optional.
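The point made above, that optional expansion terms widen rather than narrow the result set, can be checked with a toy retrieval function. This is a sketch only: substring matching stands in for a real index, and the function name `search` and the flag `optional_is_mandatory` are hypothetical.

```python
def search(docs, required, optional, optional_is_mandatory=False):
    """Match on required terms; optional terms only affect ordering unless forced."""
    results = []
    for doc in docs:
        text = doc.lower()
        if not all(t.lower() in text for t in required):
            continue
        found = [t for t in optional if t.lower() in text]
        if optional_is_mandatory and len(found) < len(optional):
            continue  # treating expansion terms as mandatory narrows the results
        results.append((len(found), doc))
    # Documents matching more optional terms rank higher.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in results]


docs = [
    "Bruce Wayne seen at charity gala",
    "Bruce Wayne opens new research lab",
]
```

With "research" as an optional term, both documents are returned and the research-lab document ranks first; forcing the same term to be mandatory drops the gala document entirely.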
  • a search results presentation is generated, at least in part, according to the obtained search results.
  • one or more search results pages are generated according to the obtained search results, with those results scoring the highest being presented in the first pages of the presentation.
  • at block 216 after generating the search results presentation, at least a portion of the presentation is returned to the requesting computer user in response to the search query. According to various embodiments, the results that are returned to the requesting computer user are organized according to the various categories of information regarding the subject person. Thereafter, the routine 200 terminates.
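Organizing the returned results by category, as described above, can be sketched as a simple grouping step. The tuple layout `(title, category, score)` and the function name `group_by_category` are assumptions; the source does not specify how results are labeled with categories.

```python
from collections import defaultdict


def group_by_category(results):
    """Group (title, category, score) results by category, best score first."""
    grouped = defaultdict(list)
    for title, category, score in results:
        grouped[category].append((score, title))
    return {
        category: [title for _, title in sorted(items, key=lambda p: p[0], reverse=True)]
        for category, items in grouped.items()
    }


results = [
    ("Quarterly report", "Employment", 0.4),
    ("Alumni newsletter", "Education", 0.9),
    ("Team announcement", "Employment", 0.8),
]
page_sections = group_by_category(results)
```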
  • While not displayed in routine 200, additional steps may be taken after the results are returned to the computer user.
  • one or more processes on the computer user's device may monitor the computer user's activity with regard to the results provided, e.g., which references (hyperlinks) the computer user followed, which were avoided, how long the computer user spent with some content vs. other content, and the like.
  • inferences may be made regarding specific people and/or entities such that subsequent queries may take these inferences into account. Indeed, some or all of the inferences, both for and against specific results, may be used to form the search models discussed above.
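The feedback loop described above, in which observed user activity feeds back into the search models, can be sketched as a small weight update. Everything specific here is an assumption: the function name `update_weights`, the `(kind, followed)` interaction format, the fixed step size, and the floor at zero are illustrative and not taken from the source.

```python
def update_weights(weights, interactions, step=0.1):
    """Nudge per-kind weights up for followed results and down for skipped ones."""
    updated = dict(weights)  # leave the caller's model untouched
    for kind, followed in interactions:
        delta = step if followed else -step
        updated[kind] = max(0.0, updated.get(kind, 1.0) + delta)
    return updated


model = {"employment": 1.0}
new_model = update_weights(
    model, [("employment", True), ("education", False)], step=0.5
)
```

In this example the user followed an employment-related result and skipped an education-related one, so the employment weight rises and the education weight (starting from an assumed default of 1.0) falls.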
  • Regarding routines 200 and 300, while these routines are expressed in regard to discrete steps, these steps should be viewed as being logical in nature and may or may not correspond to any actual and/or discrete steps of a particular implementation. Nor should the order in which these steps are presented in the various routines be construed as the only order in which the steps may be carried out. Moreover, while these routines include various novel features of the disclosed subject matter, other steps (not listed) may also be carried out in the execution of the routines. Further, those skilled in the art will appreciate that logical steps of these routines may be combined together or be comprised of multiple steps. Steps of routines 200 and 300 may be carried out in parallel or in series, or pre-computed.
  • Often, but not exclusively, the functionality of the various routines is embodied in software (e.g., applications, system services, libraries, and the like) that is executed on computer hardware and/or systems as described below in regard to Figure 6. In various embodiments, all or some of the various routines may also be embodied in hardware modules, including systems on chips, on a computer system.
  • routines embodied in applications (also referred to as computer programs), apps (small, generally single or narrow purposed, applications), and/or methods
  • these aspects may also be embodied as computer-executable instructions stored by computer-readable media, also referred to as computer-readable storage media.
  • computer-readable media can host computer-executable instructions for later retrieval and execution.
  • when the computer-executable instructions stored on the computer-readable storage devices are executed, they carry out various steps, methods and/or functionality, including those steps, methods, and routines described above in regard to routines 200 and 300.
  • Examples of computer-readable media include, but are not limited to: optical storage media such as Blu-ray discs, digital video discs (DVDs), compact discs (CDs), optical disc cartridges, and the like; magnetic storage media including hard disk drives, floppy disks, magnetic tape, and the like; memory storage devices such as random access memory (RAM), read-only memory (ROM), memory cards, thumb drives, and the like; cloud storage (i.e., an online storage service); and the like.
  • FIG. 6 is a block diagram illustrating exemplary components of a search engine configured to provide improved results in response to a search query from a computer user.
  • the search engine 110 includes a processor 602 (or processing unit) and a memory 604 interconnected by way of a system bus 610.
  • memory 604 typically (but not always) comprises both volatile memory 606 and non-volatile memory 608.
  • Volatile memory 606 retains or stores information so long as the memory is supplied with power.
  • non-volatile memory 608 is capable of storing (or persisting) information even when a power supply is not available.
  • RAM and CPU cache memory are examples of volatile memory whereas ROM and memory cards are examples of nonvolatile memory.
  • the processor 602 executes instructions retrieved from the memory 604 in carrying out various functions, particularly in responding to search queries with improved results through query expansion.
  • the processor 602 may be comprised of any of various commercially available processors such as single-processor, multi-processor, single-core units, and multi-core units.
  • mainframe computers; personal computers (e.g., desktop computers, laptop computers, tablet computers, etc.); handheld computing devices such as smartphones, personal digital assistants, and the like; microprocessor-based or programmable consumer electronics; game consoles; and the like.
  • the system bus 610 provides an interface for the various components to inter-communicate.
  • the system bus 610 can be of any of several types of bus structures that can interconnect the various components (including both internal and external components).
  • the search engine 110 further includes a network communication component 612 for interconnecting the network site with other computers (including, but not limited to, user computers such as user computers 102-106, other network sites including network sites 112-116) as well as other devices on a computer network 108.
  • the network communication component 612 may be configured to communicate with other devices and services on an external network, such as network 108, via a wired connection, a wireless connection, or both.
  • the search engine 110 also includes a query topic identification component 614 that is configured to identify the subject matter of the search query, such as a person identified in the search query, as described above. Also included in the search engine 110 is a related entity retrieval component 616.
  • the related entity retrieval component 616 obtains related entity data corresponding to related entities of the identified person (or, more generally, related entities of the subject matter of the search query). As previously mentioned, the related entity data includes related entities, categories associated with the identified person, as well as category data corresponding to the associated categories.
  • the related entity retrieval component 616 obtains the related entity data from related entity sources as described above in regard to Figure 2.
  • An expanded query generator 618 generates an expanded search query from the search query received from a computer user according to the related entity data obtained by the related entity retrieval component 616.
  • a search results retrieval component is configured to obtain search results from a content store 626 according to the expanded search query generated by the expanded query generator 618.
  • a search model component 624 is configured to select a search model (as described above) and apply the search model to the obtained search results.
  • the search results presentation generator 620 generates a search results presentation, typically including one or more search results pages, for presentation to the requesting computer user in response to the search query.
  • the various components of the search engine 110 of Figure 6 described above may be implemented as executable software modules within the computer systems, as hardware modules (including SoCs - system on a chip), or a combination of the two. Moreover, each of the various components may be implemented as an independent, cooperative process or device, operating in conjunction with one or more computer systems. It should be further appreciated, of course, that the various components described above in regard to the search engine 110 should be viewed as logical components for carrying out the various described functions. As those skilled in the art appreciate, logical components (or subsystems) may or may not correspond directly, in a one-to-one manner, to actual, discrete components. In an actual embodiment, the various components of each computer system may be combined together or broken up across multiple actual components and/or implemented as cooperative processes on a computer network 108.
  • aspects of the disclosed subject matter may be implemented on other computing devices and/or distributed on multiple computing devices, including a computer user's device.
  • at least some highly relevant content to a search request may be hosted on a site that is access-protected, i.e., the content is available to the computer user when he/she is authenticated and/or maintains an open log-in status with the site, but the content is otherwise restricted to others.
  • a search engine may indirectly obtain related entity data from this access-restricted site by way of the computer user's device; the computer user's device (e.g., upon which the computer user maintains a current logged in status with the site) accesses related entity data on behalf of the search service.
  • one or more components on the computer user's device obtain data corresponding to others from the access restricted sites in anticipation of a search request.
  • aspects of the disclosed subject matter may be suitably and advantageously applied to auto-generation of content relating to people.
  • various search queries regarding one or more persons may be made such that the "latest" content on the Internet regarding that person (or persons) may already be available when requested.
  • Yet another example would be to set up an environment such that a user may be notified when a new image/video/news story of that user occurs on the Internet.
  • aspects of the disclosed subject matter may be applied to topics or entities other than people.
  • an auto-generation page may be set up to display the latest regarding rock climbing, the Supreme Court, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Presented are systems and methods, as well as computer readable media, for responding to a search query for content (or references to content) relating to a person identified in the search query. According to various embodiments, upon receiving a search query from a computer user, related entity data is obtained from at least one related entity source for the identified person. Related entity data comprises at least one of a related entity (or entities) or a category associated with the identified person. An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.

Description

PERSON SEARCH UTILIZING ENTITY EXPANSION
BACKGROUND
[0001] Locating content regarding a specific person on the Internet can be challenging. There are many factors that make "people search" difficult: most names are not unique. In any given area there may be several individuals with the same name. Additionally, the web presence of any given person may be low such that search results for that person will be dominated by results referring to a better known individual with the same name.
SUMMARY
[0002] The following Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. The Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
[0003] According to aspects of the disclosed subject matter, a search query is received from a computer user, the search query identifying a person for which content (or references to content) is sought. Upon receiving the search query from a computer user, related entity data is obtained from at least one related entity source for the identified person. Related entity data comprises at least one of a related entity (or entities) or a category associated with the identified person. An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.
[0004] According to further aspects of the disclosed subject matter, a computer-readable medium bearing computer-executable instructions is presented. When executed on a computing system comprising at least a processor executing the instructions retrieved from the medium, the computing system is configured to carry out a method for responding to a search query from a user. More particularly, in response to receiving a search query from a computer user, where the search query identifies a person for which content (or references to content) is sought, related entity data is obtained from at least one related entity source for the identified person. An expanded search query is generated according to the search query from the computer user and the related entity data. Search results are obtained according to the expanded search query and a search results presentation is generated and returned to the computer user in response to the search query.
[0005] According to still further aspects of the disclosed subject matter, a computer system for responding to a search query for content related to a person is presented. The computer system comprises a processor and a memory, wherein the processor executes instructions stored in the memory as part of or in conjunction with additional components to respond to a search query for content related to a person. These additional components include (by way of illustration and not limitation) a query topic identification component, a related entity retrieval component, an expanded query generator, a search results retrieval component, and a search results presentation generator. In operation, the query topic identification component is configured to determine the identity of a person from the search query for which related content is sought. The related entity retrieval component obtains related entity data corresponding to the identified person from a related entity source. After obtaining related entity data, the expanded query generator generates an expanded query from the search query for content related to the identified person and from the related entity data. According to various embodiments, the related entity data comprises at least one of a related entity or a category associated with the identified person of the search query. The search results retrieval component obtains search results from a content store according to the expanded search query. Thereafter, the search results presentation generator generates a search results presentation according to the search results referencing content corresponding to the identified person and returns the search results presentation to the computer user.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The foregoing aspects and many of the attendant advantages of the disclosed subject matter will become more readily appreciated as they are better understood by reference to the following description when taken in conjunction with the following drawings, wherein:
[0007] Figure 1 is a block diagram of a networked environment suitable for implementing aspects of the disclosed subject matter;
[0008] Figure 2 is a flow diagram illustrating an exemplary routine for providing improved results in response to a search query regarding content for a particular person through query expansion;
[0009] Figure 3 is a flow diagram illustrating an exemplary routine for generating an expanded search query according to aspects of the disclosed subject matter;
[0010] Figures 4 and 5 illustrate elements of expanded search queries; and
[0011] Figure 6 is a block diagram illustrating exemplary components of a search engine configured to provide improved results in response to a search query from a computer user.
DETAILED DESCRIPTION
[0012] For purposes of clarity, the use of the term "exemplary" in this document should be interpreted as serving as an illustration or example of something, and it should not be interpreted as an ideal and/or a leading illustration of that thing. An entity corresponds to an abstract or tangible thing that includes, by way of illustration and not limitation: a person, a place, a group, a concept, an activity, and the like.
[0013] Turning to Figure 1, Figure 1 is a block diagram illustrating an exemplary networked environment 100 suitable for implementing aspects of the disclosed subject matter, particularly in regard to providing improved search results to a computer user in response to a search query regarding a person. The exemplary networked environment 100 includes one or more user computers, such as user computers 102-106, connected to a network 108, such as the Internet, a wide area network or WAN, and the like. User computers include, by way of illustration and not limitation: desktop computers (such as desktop computer 104); laptop computers (such as laptop computer 102); tablet computers (such as tablet computer 106); mobile devices (not shown); game consoles (not shown); personal digital assistants (not shown); and the like. User computers may be configured to connect to the network 108 by way of wired and/or wireless connections. For purposes of illustration only, the exemplary networked environment 100 illustrates the network 108 as being located between the user computers 102-106 and the search engine 110, and again between the search engine 110 and the network sites 112-116. This illustration, however, should not be construed as suggesting that these are separate networks.
[0014] Also connected to the network 108 are various networked sites, including network sites 110-116. By way of example and not limitation, the networked sites connected to the network 108 include a search engine 110 configured to respond to search queries from computer users, news sources 112 and 114 which host various news articles and content, a social networking site 116, and the like. A computer user, such as computer user 101, may navigate via a user computer, such as user computer 102, to these and other networked sites to access content, including news content.
[0015] According to aspects of the disclosed subject matter, the search engine 110 is configured to provide search results (typically in the form of references to content available on the network 108) in response to a search query from a computer user. In particular, in response to receiving a search query from a computer user for information regarding a particular person, the search engine 110 identifies content related to the identified person according to information in its content store, generates a search results presentation based on at least some of the identified content, and provides the search results presentation to the computer user.
[0016] Figure 1 also illustratively includes a social network site 116 and various news sources, including news sites 112-114. As will be readily appreciated, a social network site 116 is an online site/service that provides a platform in which a computer user can establish a profile describing various aspects of the user, build relationships and social networks with other computer users, groups, and the like. In a social network site 116, a computer user can establish or indicate various interests, activities, and backgrounds with those in his/her social network. Indeed, those skilled in the art will appreciate that a computer user is often able to indicate a preference or an interest in a particular entity on a social networking service as might be hosted by social networking site 116, whether that entity is a person, a place, a group, a concept, an activity, and the like. Though only one social network site 116 is included in the illustrative network environment 100, this is merely illustrative and should not be viewed as limiting upon the disclosed subject matter. In an actual embodiment, there may be any number of social network sites connected to the network 108.
[0017] As is known in the art, the search engine 110 is configured to communicate (directly or indirectly through service calls and/or web crawlers) with multiple content sources, including news sites 112 and 114, social networking site 116, and other sites such as blogs and registries (not shown) to obtain information regarding the content that is available at each network site. Information regarding available content may also be pushed to the search engine from various services and/or networking sites. This information is stored (typically as references to the content) in a content store such that the search engine can obtain content from this content store in order to respond to a search query from a computer user, such as computer user 101. The search engine 110 may also obtain information regarding any given individual from search query logs, network browsing histories, purchase histories, and the like. This information and the content obtained from the various network sites is typically indexed according to key words and phrases such that the information may be quickly identified and accessed. Further, in addition to information that is stored in the search engine's content store, a search engine 110 may also be configured to obtain information from other network sites when responding to a search query. For example, according to aspects of the disclosed subject matter, when responding to a search query, the search engine 110 may obtain data from one or more social networking sites, such as social network site 116, as relevant information to return to the requesting computer user and/or as information to assist the search engine in identifying relevant information to return to the requesting computer user.
[0018] To further illustrate aspects of the disclosed subject matter, reference is now made to Figure 2. Figure 2 is a flow diagram of an exemplary routine for providing improved results in response to a search query regarding content corresponding to a particular person through query expansion. Beginning at block 202, the search engine 110 receives a search query from a computer user, such as computer user 101, the search query requesting content corresponding to a particular person.
[0019] As will be readily appreciated, a search query is typically (though not exclusively) a text string. For example, a search query for content relating to a person may be "Bruce Wayne". Accordingly, as there may be several individuals who have the same name, at block 204, the search engine attempts to uniquely identify the person who is the subject matter of the search query. According to aspects of the disclosed subject matter, the search engine attempts to uniquely identify the person for which content is requested according to at least general information and specific information relating to the requesting computer user. The general information includes, by way of illustration and not limitation: popularity of search queries corresponding to a person with the name identified in the search query; trending popularity of a person with the name identified in the search query; other terms and/or phrases in the search query (e.g., "Bruce Wayne Seattle" or "Bruce Wayne Microsoft"); an image representative of the person; and the like. Specific information relating to the requesting computer user may include, by way of illustration and not limitation: current location; prior search query history; current and former workplaces; current and former educational institutions that were attended; social networks; preferences (both explicitly and implicitly identified); general graph connectivity between the requesting computer user and potential subjects of a search query as well as the number of mutual friends; physical distance between the requesting user and the potential subjects; location of friends; former locations; and the like. Typically, though not exclusively, the search engine 110 may, at least internally, associate a globally unique identifier to the person who is the subject matter of the search query. Moreover, once the person who is the subject matter of the search query is identified, the search engine 110 may use the associated globally unique identifier in obtaining, or reranking, search results in response to the search query.
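By way of illustration only, the identification at block 204 might be sketched as a scoring step over candidate persons who share the queried name. The disclosure does not specify any scoring formula; the `Candidate` fields, the weights, and the function names below are assumptions chosen for the example, combining one general signal (query popularity) with two user-specific signals (mutual friends and physical distance).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A potential subject of the search query (all fields are illustrative)."""
    person_id: str           # stands in for the globally unique identifier
    name: str
    query_popularity: float  # general signal, assumed normalized to 0..1
    mutual_friends: int      # signal specific to the requesting user
    distance_km: float       # physical distance from the requesting user


def score(candidate: Candidate) -> float:
    """Combine the signals using hypothetical weights (not from the source)."""
    proximity = 1.0 / (1.0 + candidate.distance_km / 100.0)
    social = min(candidate.mutual_friends, 20) / 20.0
    return 0.4 * candidate.query_popularity + 0.4 * social + 0.2 * proximity


def identify_person(name: str, candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the best-scoring candidate with the queried name, if any."""
    matches = [c for c in candidates if c.name.lower() == name.lower()]
    return max(matches, key=score, default=None)


candidates = [
    Candidate("p-famous", "Bruce Wayne", 0.9, 0, 4000.0),
    Candidate("p-local", "Bruce Wayne", 0.1, 12, 10.0),
]
best = identify_person("bruce wayne", candidates)
```

Under these assumed weights, the lesser-known individual who is close to the requesting user, both socially and physically, outranks the better-known individual with the same name, which is the situation the Background describes.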
[0020] Of course, the order presented in blocks 202 and 204 should be viewed as illustrative and not limiting upon the disclosed subject matter. Under various conditions, the identity of a person for whom content is sought may be known prior to submitting/receiving a search request. For example, auto-suggest search recommendations may indicate a particular person as one of the auto-suggestions and, typically, that suggested person's unique identity is known. Alternatively, another service may submit a search request for a person that uniquely identifies the person to the search service such that the identity of the person need not be determined. Accordingly, while one embodiment is disclosed in regard to blocks 202 and 204 of Figure 2, this is illustrative of one embodiment, and is not limiting upon the disclosed subject matter.
[0021] In regard to the search request identifying a person for whom content is sought, there may also be times in which the name of that person is not known but some information is provided that may lead to uniquely identifying that person. For example, the computer user may not know the name of the general manager of the Seattle Seahawks, but in submitting the text "general manager of the Seattle Seahawks" the computer user often sufficiently identifies the person for whom content is sought that, in block 204, the identity of the person can be determined.
[0022] At block 206, after having identified the person who is the subject matter of the search query, the search engine 110 obtains related entity data corresponding to the identified person. According to aspects of the disclosed subject matter, related entity data includes entities related to the identified person. A related entity is an entity with which the identified person is related for some reason. While some of the reasons may be known, others may be unknown and implied according to statistical similarities. For example, assume that the identified person is an employee of Company A and is a member of Workgroup Z. Related entities to the identified person, based on this employment relationship, would typically include "Company A" and "Workgroup Z". Other related entities arising from this same employment relationship may include fellow co-workers. Still other entities, based on this same employment relationship, may also include other (previous) workgroups, past and present co-workers, and the like. In furtherance of the example above, the identified person may also be an alumnus of a particular university. Hence, the university may be a related entity to the identified person, as well as the particular college in the university where the identified person studied, the degree that was awarded, academic achievements of the identified person, fellow students, and the like. Still further, assuming that the identified person also has a passion for gardening, the identified person may be a member of a local master gardeners society and, as a result, the local master gardeners society may be a related entity to the identified person as well as fellow members of the society.
[0023] According to aspects of the disclosed subject matter, the search engine 110 obtains related entity data from one or more related entity sources. The search engine 110 may host or store various information regarding the identified person from a user profile store (e.g., the user profile store 628 of Figure 6) and, therefore, be one of the related entity sources. For example, the search engine 110 may store user profile information corresponding to the computer user. This user profile information may be based on explicitly identified information (from the identified person) as well as implicitly identified information (such as information derived from search queries, browsing history, and the like.) Social networking sites, such as social networking site 116, represent additional related entity sources. As indicated above, a social networking site enables a person, such as the identified person of the search query, to establish relationships and social networks with other entities (including people, organizations, activities, causes, and the like.) Of course, there may be a variety of related entity sources, each of which hosts information that may indicate a relationship between the identified person and other entities, and the search engine 110 can be configured to obtain related entity data from any number of these related entity sources.
[0024] It should be appreciated that the related entity information that is hosted by each of the related entity sources may comprise information that the identified person wishes to keep private. To resolve this, according to aspects of the disclosed subject matter, the search engine identifies the requesting computer user and, if identified, can attempt to use the permissions afforded to the requesting computer user in obtaining the related entity information. In various embodiments, a computer user is required to authenticate himself or herself in order to access information regarding the identified person. Other requirements may include, by way of illustration and not limitation, that the requesting computer user be logged into one or more services in order to access and/or view content that would otherwise be restricted.
[0025] As suggested in regard to the examples above, a related entity source may associate one or more categories to an individual (such as the identified person of a search query). Accordingly, the related entity data obtained from the related entity sources may also include category data. Category data (both in regard to the set of potential relationships defined by the category as well as the actual relationships of a person per a category) may be advantageously used in expanding a received search query (as discussed in greater detail below.) In the example above, a related entity source may have associated various categories with the identified person including "Employee", "Alumnus", and "Gardener". Moreover, each of the related entity sources may maintain category information that defines what is meant to be associated with the category. This category information often includes a list of potential, though not necessarily required, relationships that may exist between a first entity belonging to a specific category (such as the identified person) and other entities. The "Employee" category may define a set of potential relationships as including "employer", "work group", "current manager", "direct reports", "co-worker" and the like. Correspondingly, each entity that is categorized as an "Employee" could then have relationships with other entities as defined by the set of potential relationships. Of course, while a category defines a set of potential relationships, an entity of that category is not required to be related to other entities based on each and every potential relationship. Further still, a given entity, such as an entity corresponding to the identified person of a search query, may be associated with a plurality of categories. In addition to defined categories, categories may also be inferred. For example, an employee may be interested in former work performed previously at a company such that an inferred category is "co-worker".
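As a non-limiting sketch, category data of the kind described above might be represented as a mapping from each category to its set of potential relationships, together with the actual relationships that a given person has under that category. The `RelatedEntityData` class and its method names are hypothetical and are introduced only for this example; the relationship names and entities are those of the "Employee" example in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Potential relationships per category, following the "Employee" example.
CATEGORY_RELATIONSHIPS: Dict[str, List[str]] = {
    "Employee": [
        "employer", "work group", "current manager", "direct reports", "co-worker",
    ],
}


@dataclass
class RelatedEntityData:
    """Related entity data for one identified person (illustrative structure)."""
    person: str
    # category -> {relationship -> related entities actually known}
    categories: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def related_entities(self) -> List[str]:
        """Flatten all known related entities, preserving order, without repeats."""
        seen: List[str] = []
        for relationships in self.categories.values():
            for entities in relationships.values():
                for entity in entities:
                    if entity not in seen:
                        seen.append(entity)
        return seen

    def unfilled_relationships(self, category: str) -> List[str]:
        """Potential relationships of a category for which no entity is known."""
        known = self.categories.get(category, {})
        return [r for r in CATEGORY_RELATIONSHIPS.get(category, []) if r not in known]


data = RelatedEntityData(
    person="Bruce Wayne",
    categories={"Employee": {"employer": ["Company A"], "work group": ["Workgroup Z"]}},
)
```

The `unfilled_relationships` helper reflects the point made above that an entity of a category is not required to have every potential relationship that the category defines.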
[0026] At block 208, a search model is identified/determined to apply to the expanded search query. This search model includes information for weighting various elements (terms and phrases) of the expanded search query to improve search results. Applying a search model to the expanded search query recognizes, at least in part, that not all query terms of the expanded search query are equal, i.e., some query terms are more important in identifying relevant search content for the identified person than others. Typically, though not exclusively, favoring/weighting employment-related query terms or education-related query terms provides improved search results when the relevancy of the various search results (or, more accurately stated, the content referenced by the search results) are presented to a particular user. According to various embodiments, selection of a search model may be based on information regarding the requesting computer user. For example, if it is known that the requesting computer user is in college then an education model may be selected. Alternatively, selection of a search model may be made according to information regarding the identified person, from information available to the search engine 110 or external sources including from the related entity data. In yet additional embodiments, selection of a search model may be made according to information regarding both the requesting computer user as well as the identified person of the search query.
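For purposes of illustration only, the selection and application of a search model at block 208 might be sketched as follows. The model names, the numeric weights, and the profile keys (`in_college`, `primary_category`) are assumptions made for this example and are not specified by the disclosure.

```python
from typing import Dict, Optional

# Hypothetical search models: each maps a kind of query term to a weight.
SEARCH_MODELS: Dict[str, Dict[str, float]] = {
    "education": {"education": 2.0, "employment": 1.0},
    "employment": {"education": 1.0, "employment": 2.0},
}
DEFAULT_MODEL = "employment"


def select_search_model(user_profile: dict, person_profile: Optional[dict] = None) -> str:
    """Pick a model from what is known about the requester and/or the identified person."""
    if user_profile.get("in_college"):
        return "education"
    if person_profile and person_profile.get("primary_category") == "Alumnus":
        return "education"
    return DEFAULT_MODEL


def weight_terms(terms: Dict[str, str], model_name: str) -> Dict[str, float]:
    """Map each query term to a weight; `terms` maps a term to its kind."""
    weights = SEARCH_MODELS[model_name]
    # Kinds that the model does not mention receive a neutral weight of 1.0.
    return {term: weights.get(kind, 1.0) for term, kind in terms.items()}


model = select_search_model({"in_college": True})
weighted = weight_terms(
    {"Company A": "employment", "State University": "education", "gardening": "hobby"},
    model,
)
```

Here a requesting user known to be in college causes the education model to be selected, so the education-related term is weighted above the employment-related term.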
[0027] At block 210, an expanded search query is generated according to the determined search model for the identified person. Generating an expanded search query is discussed in greater detail in regard to Figure 3. More particularly, Figure 3 is a flow diagram illustrating an exemplary routine 300 for generating an expanded search query according to related entity data obtained from related entity sources. At block 302, the identified person and filter elements of the received search query are included as an initial section of the expanded search query. While this may entail simply copying the received search query into the initial section, the initial search query may not necessarily simply be copied. Often a requesting computer user may misspell the name of the person that is sought or any one of the identifying filter elements associated with the person. For example, a received search query may be "Bruse Wayn Microsoft", in an effort to find content corresponding to "Bruce Wayne" who works at "Microsoft". If it can be determined that the name (or one or more filter elements) is misspelled, it would be less productive to include the original search query in the expanded search query. Hence, in block 204 of routine 200, the person is identified. Correction to the filter elements may also be made (though not explicitly called out in routines 200 and 300.)
[0028] In addition to including the query terms of the search query into the expanded search query, query terms are derived from the obtained related entity data and included/incorporated in the expanded search query. In particular, at block 304, the related entities (related to the identified person) from the obtained related entity data are included in a related entities section of the expanded search query in accordance with the determined search model. At block 306, query terms derived from the category data, including both the category (as an entity) and category entities (as described below), are included in a category entities section of the expanded search query according to the search model. Thereafter, at block 308, the expanded search query is returned and the routine 300 terminates.
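The blocks of routine 300 may be sketched, by way of example and not limitation, as a function that assembles the three sections of the expanded search query. The `ExpandedQuery` structure and the function signature are hypothetical, and the sketch omits the search model weighting and any spelling correction. Placing the category name ahead of its category entities is likewise an assumption of the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpandedQuery:
    """The three sections of an expanded search query (illustrative structure)."""
    initial: List[str] = field(default_factory=list)
    related_entities: List[str] = field(default_factory=list)
    category_entities: List[str] = field(default_factory=list)


def generate_expanded_query(
    person_name: str,
    filter_elements: List[str],
    related_entities: List[str],
    category_data: Dict[str, List[str]],
) -> ExpandedQuery:
    query = ExpandedQuery()
    # Block 302: the identified person and filter elements form the initial section.
    query.initial = [person_name] + list(filter_elements)
    # Block 304: related entities form the related entities section.
    query.related_entities = list(related_entities)
    # Block 306: each category (as an entity) and its category entities
    # form the category entities section.
    for category, entities in category_data.items():
        query.category_entities.append(category)
        query.category_entities.extend(entities)
    # Block 308: return the expanded search query.
    return query


expanded = generate_expanded_query(
    person_name="Bruce Wayne",
    filter_elements=["Microsoft"],
    related_entities=["Company A", "Workgroup Z"],
    category_data={"Employee": ["co-worker"]},
)
```

Note that the initial section takes the identified person's name rather than the raw query text, consistent with the observation above that a misspelled query such as "Bruse Wayn Microsoft" should not simply be copied.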
[0029] To better illustrate the above-described sections of the expanded search query, reference is made to Figure 4. Figure 4 illustrates an exemplary expanded search query 400 corresponding to the example above, i.e., for the person "Bruce Wayne". For this example, it is assumed that this identified person, "Bruce Wayne", was associated with only one category, Employee. As shown in the expanded search query 400, the initial section 402 includes the original search query text 404, "Bruce.Wayne", as well as alternative names related to the identified person, in this case "Batman Dark.Knight Matches.Malone Caped.Crusader". Of course, not all computer users will have access rights to all information. In the example above, not all people might know of the alternative names that might uniquely reference "Bruce Wayne". However, when the requesting computer user has full rights, such information may be useful to obtain improved results. Regarding the operator 406 "." between the two names of the search query, this is representative of an exemplary convention indicating a preference that "Bruce" occur next to "Wayne" in that order, though it is not mandatory that they occur together or that both occur, only that it is highly preferred. Of course, this convention (as well as the other operators in this Figure) is illustrative only and should not be viewed as limiting upon the disclosed subject matter. Other syntactical conventions include (by way of illustration and not limitation): the operator 408 "inbody:" indicating to the search engine 110 that it should match a document when any one of the words/terms between the parentheses is found in the body of the content; a "noalter:" operator that indicates that the spelling of the terms should not be modified; and a "norelax:" operator that indicates that the terms are important and may not be dropped in matching content. The operator 410 indicates to a search engine a concatenation of other search operators and/or tokens.
[0030] The expanded search query 400 also includes a related entity section 412 that includes the related entities to the identified person of the search query, such as text 416 "Research". Still further included in the expanded search query is a category entities section 414 that includes the category entities of category "Employee". As mentioned above, the category entities section 414 includes the category ("Employee") as well as the category entities such as text 418 "Workgroup". These entries optionally help produce results based on how the computer user likely knows the identified person, in this case "Bruce Wayne". As can be seen, the expanded search query for a particular person takes a search query, such as "Bruce Wayne" and expands the query with related entities as well as category entities to better identify content corresponding to the identified person.
Regarding the operator "rankonly:", this operator lets the ranking of a document go up when a matching token/value, such as "Research", is found in the document. The specified terms are not required to be found in a resulting document but, if found, result in the document being ranked as more relevant. The operator "word:" matches a document if one or more of the tokens in the parentheses, such as "Workgroup", is found in the document. In a sense, the operator "word:" operates as a type of max (or maximum value) operator, comparing each token between the parentheses to the document and returning the single maximum value of the rank of the tokens. Specifically, if more than one token matches, only the value of the greatest matching token is returned. A "norank:" operator (not shown) would require that the specified tokens (identified between the enclosing parentheses) be present in a results document but would not affect the ordering or relevance of the document in the overall results. In combination with the operator "rankonly:", the rank of a document is increased if any one or more of the tokens is found.
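The operator semantics described above can be illustrated with a toy scoring function. This is a sketch under stated assumptions and not the search engine's actual ranking: the per-token score (a simple occurrence count), the function names, and the reading of "norank:" as requiring every listed token are all hypothetical.

```python
def token_score(token, document):
    """Toy per-token score: occurrences of the token in the document text."""
    return document.lower().split().count(token.lower())


def word(tokens, document):
    """'word:' as a max operator: the greatest single token score, 0 if none match."""
    return max((token_score(t, document) for t in tokens), default=0)


def evaluate(document, required, rankonly=(), norank=()):
    """Return a rank for the document, or None if it does not match.

    required: tokens of which at least one must match; they contribute to rank.
    rankonly: tokens that are never required but raise the rank when found.
    norank:   tokens that must all be present (an assumption) but never
              change the rank.
    """
    if any(token_score(t, document) == 0 for t in norank):
        return None
    base = word(required, document)
    if base == 0:
        return None
    # Only the maximum rank-only token score is added, mirroring 'word:'.
    return base + word(rankonly, document)
```

In this sketch a document mentioning "research" twice ranks above one that does not mention it at all, yet both remain in the results, while a missing "norank:" token removes a document without otherwise altering the order.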
[0031] While the expanded queries 400 and 500 generally include textual tokens (such as "Bruce.Wayne"), it should be appreciated that this is illustrative and should not be viewed as limiting upon the disclosed subject matter. In alternative embodiments, one or more of the tokens in an expanded search query could be specific identifiers that identify the sought-for person and/or related entities. For example, expanded search query 500 includes an operator 510 that includes a Facebook numerical identifier ("740049358") as well as an operator 512 that includes a Facebook user identifier ("t-drake"). Of course, any particular sources of identifiers may be used and Facebook identifiers are illustrative only.
[0032] As suggested above, an identified person may be associated with more than one category. Hence, while the expanded search query 400 of Figure 4 describes information from a single category, it is for illustration. Similarly, Figure 5 illustrates an exemplary expanded search query 500 corresponding to the example above, i.e., for the identified person "Bruce Wayne", but in this example includes information from two categories, Employer and Education. As can be seen, the expanded search query 500 includes the initial section 502 as well as related entities section 504 and category entities section 506. As can be seen in the related entities section 504 and category entities section 506, as more related entities are found for the identified person and as more information corresponding to various categories for the identified person are obtained, the expanded search queries become more detailed and encompassing to assist the search engine to identify content corresponding to the identified person of the search query.
[0033] At block 212, search results are obtained according to the expanded search query. Obtaining search results according to a search query, in this case a search query with expanded terms according to related entities and categories, is known in the art. According to aspects of the disclosed subject matter, search results are obtained according to the query terms from the received search query and optionally according to the query terms derived from the related entity data. Stated differently, the query terms of the expanded search query that are derived from the related entity data are intended to expand the scope of content/search results that correspond to the identified person, but these derived query terms are not mandatory. Because the derived query terms are "optional", the expanded search query expands the scope of content that potentially relates to the identified person, whereas mandatory derived terms would narrow that scope.
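The difference between optional and mandatory derived terms can be shown with a small, purely illustrative retrieval sketch. The in-memory `DOCS` collection, the function `search`, and its `derived_optional` flag are assumptions made for this example and do not describe any particular search engine.

```python
# Hypothetical three-document content store used only for this sketch.
DOCS = {
    "d1": "bruce wayne joins microsoft research",
    "d2": "bruce wayne attends charity gala",
    "d3": "microsoft research publishes paper",
}


def search(original_terms, derived_terms, derived_optional=True):
    """Return matching document ids, best first.

    Original terms are always required. Derived terms only affect ordering
    when derived_optional is True; otherwise they are required as well.
    """
    results = []
    for doc_id, text in DOCS.items():
        words = set(text.split())
        if not all(t in words for t in original_terms):
            continue
        hits = sum(1 for t in derived_terms if t in words)
        if not derived_optional and hits < len(derived_terms):
            continue
        results.append((hits, doc_id))
    # Order by derived-term hits (descending), then by id for stability.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]
```

With optional derived terms, `search(["bruce", "wayne"], ["research"])` returns both `d1` and `d2`, with `d1` first; with `derived_optional=False`, `d2` is lost from the results.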
[0034] At block 214, a search results presentation is generated, at least in part, according to the obtained search results. Typically, one or more search results pages are generated according to the obtained search results, with those results scoring the highest being presented in the first pages of the presentation. At block 216, after generating the search results presentation, at least a portion of the presentation is returned to the requesting computer user in response to the search query. According to various embodiments, the results that are returned to the requesting computer user are organized according to the various categories of information regarding the subject person. Thereafter, the routine 200 terminates.
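As a minimal sketch of blocks 214 and 216, assuming each search result is a dictionary with hypothetical `url`, `score`, and `category` keys, the presentation might be built by ordering results by score, splitting them into pages, and optionally grouping them by category:

```python
from collections import defaultdict


def build_pages(results, page_size=10):
    """Sort results by score (highest first) and split them into pages."""
    ordered = sorted(results, key=lambda r: r["score"], reverse=True)
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


def group_by_category(results):
    """Organize result URLs under the category of information they relate to."""
    grouped = defaultdict(list)
    for result in sorted(results, key=lambda r: r["score"], reverse=True):
        grouped[result["category"]].append(result["url"])
    return dict(grouped)
```

Only the first page (or first few pages) would typically be returned to the requesting computer user, consistent with returning "at least a portion" of the presentation.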
[0035] While not displayed in routine 200, additional steps may be taken after the results are returned to the computer user. By way of illustration and not limitation, one or more processes on the computer user's device may monitor the computer user's activity with regard to the results provided, e.g., which references (hyperlinks) the computer user followed, which were avoided, how long the computer user spent with some content vs. other content, and the like. By monitoring the computer user's activity and submitting it to the search engine, inferences may be made regarding specific people and/or entities such that subsequent queries may take these inferences into account. Indeed, some or all of the inferences, both for and against specific results, may be used to form the search models discussed above.
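One hedged way to picture how monitored activity could be turned into inferences "for and against" specific results is a simple aggregation. The event keys (`url`, `clicked`, `dwell_seconds`), the 30-second threshold, and the function `infer_preferences` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def infer_preferences(events, min_dwell_seconds=30.0):
    """Aggregate activity events into a per-result signal.

    A followed link with a long dwell time counts for the result; a result
    that was shown but not followed counts against it. Followed links with
    a short dwell time are treated as neutral.
    """
    signal = defaultdict(int)
    for event in events:
        if event["clicked"] and event["dwell_seconds"] >= min_dwell_seconds:
            signal[event["url"]] += 1
        elif not event["clicked"]:
            signal[event["url"]] -= 1
    return dict(signal)
```

Signals of this kind, once submitted to the search engine, could be one input to the search models discussed above.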
[0036] Regarding routines 200 and 300, while these routines are expressed in regard to discrete steps, these steps should be viewed as being logical in nature and may or may not correspond to any actual and/or discrete steps of a particular implementation. Nor should the order in which these steps are presented in the various routines be construed as the only order in which the steps may be carried out. Moreover, while these routines include various novel features of the disclosed subject matter, other steps (not listed) may also be carried out in the execution of the routines. Further, those skilled in the art will appreciate that logical steps of these routines may be combined together or be comprised of multiple steps. Steps of routines 200 and 300 may be carried out in parallel or in series, or pre-computed. Often, but not exclusively, the functionality of the various routines is embodied in software (e.g., applications, system services, libraries, and the like) that is executed on computer hardware and/or systems as described below in regard to Figure 6. In various embodiments, all or some of the various routines may also be embodied in hardware modules, including system on chips, on a computer system.
[0037] While many novel aspects of the disclosed subject matter are expressed in routines embodied in applications (also referred to as computer programs), apps (small, generally single or narrow purposed, applications), and/or methods, these aspects may also be embodied as computer-executable instructions stored by computer-readable media, also referred to as computer-readable storage media. As those skilled in the art will recognize, computer-readable media can host computer-executable instructions for later retrieval and execution. When the computer-executable instructions stored on the computer-readable storage devices are executed, they carry out various steps, methods and/or functionality, including those steps, methods, and routines described above in regard to routines 200 and 300. Examples of computer-readable media include, but are not limited to: optical storage media such as Blu-ray discs, digital video discs (DVDs), compact discs (CDs), optical disc cartridges, and the like; magnetic storage media including hard disk drives, floppy disks, magnetic tape, and the like; memory storage devices such as random access memory (RAM), read-only memory (ROM), memory cards, thumb drives, and the like; cloud storage (i.e., an online storage service); and the like. For purposes of this disclosure, however, computer-readable media expressly excludes carrier waves and propagated signals.
[0038] Turning now to Figure 6, Figure 6 is a block diagram illustrating exemplary components of a search engine configured to provide improved results in response to a search query from a computer user. As shown in Figure 6, the search engine 110 includes a processor 602 (or processing unit) and a memory 604 interconnected by way of a system bus 610. As those skilled in the art will appreciate, memory 604 typically (but not always) comprises both volatile memory 606 and non-volatile memory 608. Volatile memory 606 retains or stores information so long as the memory is supplied with power. In contrast, non-volatile memory 608 is capable of storing (or persisting) information even when a power supply is not available. Generally speaking, RAM and CPU cache memory are examples of volatile memory whereas ROM and memory cards are examples of non-volatile memory.
[0039] The processor 602 executes instructions retrieved from the memory 604 in carrying out various functions, particularly in responding to search queries with improved results through query expansion. The processor 602 may be comprised of any of various commercially available processors such as single-processor, multi-processor, single-core units, and multi-core units. Moreover, those skilled in the art will appreciate that the novel aspects of the disclosed subject matter may be practiced with other computer system configurations, including but not limited to: mini-computers; mainframe computers; personal computers (e.g., desktop computers, laptop computers, tablet computers, etc.); handheld computing devices such as smartphones, personal digital assistants, and the like; microprocessor-based or programmable consumer electronics; game consoles; and the like.
[0040] The system bus 610 provides an interface for the various components to inter-communicate. The system bus 610 can be of any of several types of bus structures that can interconnect the various components (including both internal and external components). The search engine 110 further includes a network communication component 612 for interconnecting the network site with other computers (including, but not limited to, user computers such as user computers 102-106, other network sites including network sites 112-116) as well as other devices on a computer network 108. The network communication component 612 may be configured to communicate with other devices and services on an external network, such as network 108, via a wired connection, a wireless connection, or both.
[0041] The search engine 110 also includes a query topic identification component 614 that is configured to identify the subject matter of the search query, such as a person identified in the search query, as described above. Also included in the search engine 110 is a related entity retrieval component 616. The related entity retrieval component 616 obtains related entity data corresponding to related entities of the identified person (or, more generally, related entities of the subject matter of the search query). As previously mentioned, the related entity data includes related entities, categories associated with the identified person, as well as category data corresponding to the associated categories. The related entity retrieval component 616 obtains the related entity data from related entity sources as described above in regard to Figure 2. An expanded query generator 618 generates an expanded search query from the search query received from a computer user according to the related entity data obtained by the related entity retrieval component 616.
[0042] A search results retrieval component is configured to obtain search results from a content store 626 according to the expanded search query generated by the expanded query generator 618. A search model component 624 is configured to select a search model (as described above) and apply the search model to the obtained search results. The search results presentation generator 620 generates a search results presentation, typically including one or more search results pages, for presentation to the requesting computer user in response to the search query.
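The cooperation of the logical components of Figure 6 can be sketched as a pipeline of interchangeable callables. The function `respond_to_query` and its parameter names are assumptions for illustration; each callable merely stands in for one component, and the search model component 624 is omitted for brevity.

```python
def respond_to_query(raw_query, identify, retrieve_entities, expand, fetch, present):
    """Wire the logical components together in the order described.

    Every argument after raw_query is a placeholder callable standing in
    for one component of the search engine.
    """
    person = identify(raw_query)               # query topic identification component 614
    entity_data = retrieve_entities(person)    # related entity retrieval component 616
    expanded = expand(raw_query, entity_data)  # expanded query generator 618
    results = fetch(expanded)                  # search results retrieval component
    return present(results)                    # search results presentation generator 620
```

Passing the components in as callables reflects the point that they are logical in nature: each could be a software module, a hardware module, or a separate cooperating process.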
[0043] Those skilled in the art will appreciate that the various components of the search engine 110 of Figure 6 described above may be implemented as executable software modules within the computer systems, as hardware modules (including SoCs - system on a chip), or a combination of the two. Moreover, each of the various components may be implemented as an independent, cooperative process or device, operating in conjunction with one or more computer systems. It should be further appreciated, of course, that the various components described above in regard to the search engine 110 should be viewed as logical components for carrying out the various described functions. As those skilled in the art appreciate, logical components (or subsystems) may or may not correspond directly, in a one-to-one manner, to actual, discrete components. In an actual embodiment, the various components of each computer system may be combined together or broken up across multiple actual components and/or implemented as cooperative processes on a computer network 108.
[0044] In addition to operating on a search engine 110, aspects of the disclosed subject matter may be implemented on other computing devices and/or distributed on multiple computing devices, including a computer user's device. For example, according to various embodiments at least some highly relevant content to a search request may be hosted on a site that is access-protected, i.e., the content is available to the computer user when he/she is authenticated and/or maintains an open log-in status with the site, but the content is otherwise restricted to others. In response to a search request from the computer user, a search engine (or other service) may indirectly obtain related entity data from this access-restricted site by way of the computer user's device; the computer user's device (e.g., upon which the computer user maintains a current logged-in status with the site) accesses related entity data on behalf of the search service. Indeed, in various embodiments, one or more components on the computer user's device obtain data corresponding to others from the access-restricted sites in anticipation of a search request.
[0045] While much of the disclosed subject matter has been described in regard to a computer user taking an active role in obtaining content relating to a particular person, aspects of the disclosed subject matter may be suitably and advantageously applied to auto-generation of content relating to people. For example, various search queries regarding one or more persons (expanded search queries) may be made such that the "latest" content on the Internet regarding that person (or persons) may already be available when requested. Yet another example would be to set up an environment such that a user may be notified when a new image/video/news story of that user occurs on the Internet. Of course, aspects of the disclosed subject matter may be applied to topics or entities other than people. For example, an auto-generation page may be set up to display the latest regarding rock climbing, the Supreme Court, and the like.
[0046] While various novel aspects of the disclosed subject matter have been described, it should be appreciated that these aspects are exemplary and should not be construed as limiting. Variations and alterations to the various aspects may be made without departing from the scope of the disclosed subject matter.

Claims

1. A computer-implemented method for responding to a search query, the method comprising:
receiving a search query, the search query identifying a person for which content relating to the person is sought;
obtaining related entity data from a related entity source, the related entity data comprising at least one of a related entity to or a category associated with the identified person of the search query;
generating an expanded search query based on the received search query and the related entity data;
obtaining search results according to the expanded search query;
generating a search results presentation according to the obtained search results; and
returning the search results presentation in response to the search query.
2. The computer-implemented method of Claim 1:
wherein generating an expanded search query comprises incorporating query terms derived from the related entity data with query terms of the received search query; and
wherein obtaining search results according to the expanded search query comprises obtaining search results according to query terms of the received search query and optionally according to the query terms derived from the related entity data.
3. The computer-implemented method of Claim 2, wherein the category associated with the identified person is included in the expanded search query as a query term derived from the related entity data.
4. The computer-implemented method of Claim 3, wherein the related entity data comprises a plurality of categories associated with the identified person.
5. The computer-implemented method of Claim 4, wherein the related entity data includes category data corresponding to each category in the related entity data, and wherein the category data includes one or more category entities that are incorporated as query terms into the expanded search query.
6. The computer-implemented method of Claim 2 further comprising selecting a search model from a plurality of search models and applying the selected search model to the generation of the expanded search query,
wherein selecting a search model from a plurality of search models comprises one or more of:
selecting a search model according to information corresponding to a requesting computer user; and
selecting a search model according to information corresponding to the identified person.
7. A computer-readable medium bearing computer-executable instructions which, when executed on a computing system comprising at least a processor executing the instructions retrieved from the medium, carry out any of the methods set forth in regard to Claims 1-6.
8. A computer system for responding to a search query for content related to a person, the system comprising a processor and a memory, wherein the processor executes instructions stored in the memory as part of or in conjunction with additional components to respond to a search query for content related to a person, the additional components comprising:
a query topic identification component configured to determine the identity of a person from the search query for which related content is sought;
a related entity retrieval component for obtaining related entity data corresponding to the identified person from a related entity source;
an expanded query generator to generate an expanded query from the search query for content related to the identified person and from the related entity data, wherein the related entity data comprises at least one of a related entity or a category associated with the identified person of the search query;
a search results retrieval component configured to obtain search results from a content store, the search results referencing content corresponding to the identified person, according to the expanded search query; and
a search results presentation generator configured to generate a search results presentation according to the search results referencing content corresponding to the identified person and return the search results presentation to the computer user.
9. The computer system of Claim 8, wherein the search results retrieval component obtains search results corresponding to the identified person according to the expanded search query by obtaining search results according to query terms of the received search query and optionally according to the query terms incorporated into the expanded search query from the related entity data.
10. The computer system of Claim 8 further comprising:
a search model component configured to select a search model from a plurality of search models; and supply the search model to the expanded query generator for generating an expanded query from the search query from the related entity data according to the search model;
wherein the expanded query generator generates an expanded query according to the search model; and
wherein the search results presentation generator generates the search results presentation according to the search model.
PCT/US2014/043750 2013-06-29 2014-06-24 Person search utilizing entity expansion WO2014209925A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201480037264.2A CN105493082A (en) 2013-06-29 2014-06-24 Person search utilizing entity expansion
KR1020157036770A KR20160026907A (en) 2013-06-29 2014-06-24 Person search utilizing entity expansion
EP14740077.4A EP3014486A1 (en) 2013-06-29 2014-06-24 Person search utilizing entity expansion

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/931,922 2013-06-29
US13/931,922 US20150006520A1 (en) 2013-06-10 2013-06-29 Person Search Utilizing Entity Expansion

Publications (1)

Publication Number Publication Date
WO2014209925A1 true WO2014209925A1 (en) 2014-12-31

Family

ID=51210813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/043750 WO2014209925A1 (en) 2013-06-29 2014-06-24 Person search utilizing entity expansion

Country Status (4)

Country Link
EP (1) EP3014486A1 (en)
KR (1) KR20160026907A (en)
CN (1) CN105493082A (en)
WO (1) WO2014209925A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423652B2 (en) * 2016-08-08 2019-09-24 Baidu Usa Llc Knowledge graph entity reconciler
KR102017853B1 (en) * 2016-09-06 2019-09-03 주식회사 카카오 Method and apparatus for searching
US10467229B2 (en) * 2016-09-30 2019-11-05 Microsoft Technology Licensing, Llc. Query-time analytics on graph queries spanning subgraphs
US10242223B2 (en) 2017-02-27 2019-03-26 Microsoft Technology Licensing, Llc Access controlled graph query spanning
US11132408B2 (en) * 2018-01-08 2021-09-28 International Business Machines Corporation Knowledge-graph based question correction
US11288320B2 (en) * 2019-06-05 2022-03-29 International Business Machines Corporation Methods and systems for providing suggestions to complete query sessions
CN113111647B (en) * 2021-04-06 2022-09-06 北京字跳网络技术有限公司 Information processing method and device, terminal and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2779208C (en) * 2009-10-30 2016-03-22 Evri, Inc. Improving keyword-based search engine results using enhanced query strategies
CN102902806B (en) * 2012-10-17 2016-02-10 深圳市宜搜科技发展有限公司 A kind of method and system utilizing search engine to carry out query expansion
CN102955697B (en) * 2012-11-08 2016-01-20 沈阳建筑大学 Based on the component base construction method of AOP

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KUI-LAM KWOK ET AL: "Improving Weak Ad-Hoc Retrieval by Web Assistance and Data Fusion", 2005, INFORMATION RETRIEVAL TECHNOLOGY LECTURE NOTES IN COMPUTER SCIENCE;;LNCS, SPRINGER, BERLIN, DE, PAGE(S) 17 - 30, ISBN: 978-3-540-29186-2, XP019020783 *
THOMAS MENSINK ET AL: "Improving People Search Using Query Expansions", 12 October 2008, COMPUTER VISION ECCV 2008; [LECTURE NOTES IN COMPUTER SCIENCE], SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 86 - 99, ISBN: 978-3-540-88685-3, XP019109247 *
Y LI ET AL: "Improving Weak Ad-Hoc Queries using Wikipedia as External Corpus", 27 July 2007 (2007-07-27), pages 1 - 2, XP055138202, Retrieved from the Internet <URL:http://delivery.acm.org/10.1145/1280000/1277914/p797-yinghao.pdf?ip=145.64.254.243&id=1277914&acc=ACTIVE SERVICE&key=E80E9EB78FFDF9DF.4D4702B0C3E38B35.4D4702B0C3E38B35.4D4702B0C3E38B35&CFID=418071545&CFTOKEN=51118622&__acm__=1409819316_764599f377cdfebe447e069f1e8f3654> [retrieved on 20140904] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110291515A (en) * 2017-02-13 2019-09-27 微软技术许可有限责任公司 Distributed index search in computing system
CN110291515B (en) * 2017-02-13 2023-08-15 微软技术许可有限责任公司 Distributed index searching in computing systems
CN113297452A (en) * 2020-05-26 2021-08-24 阿里巴巴集团控股有限公司 Multi-level search method, multi-level search device and electronic equipment

Also Published As

Publication number Publication date
CN105493082A (en) 2016-04-13
KR20160026907A (en) 2016-03-09
EP3014486A1 (en) 2016-05-04

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201480037264.2

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14740077

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2014740077

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20157036770

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE