US8024332B2 - Clustering question search results based on topic and focus - Google Patents

Clustering question search results based on topic and focus

Info

Publication number
US8024332B2
Authority
US
United States
Prior art keywords
question
questions
topic
focus
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US12/185,702
Other versions
US20100030769A1 (en)
Inventor
Yunbo Cao
Chin-Yew Lin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US12/185,702
Assigned to MICROSOFT CORPORATION (assignment of assignors interest; see document for details). Assignors: CAO, YUNBO; LIN, CHIN-YEW
Publication of US20100030769A1
Application granted
Publication of US8024332B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC (assignment of assignors interest; see document for details). Assignor: MICROSOFT CORPORATION
Expired - Fee Related
Adjusted expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/338: Presentation of query results

Definitions

  • Many search engine services, such as Google and Live Search, provide for searching for information that is accessible via the Internet.
  • These search engine services allow users to search for display pages, such as web pages, that may be of interest to users.
  • After a user submits a search request (i.e., a query) that includes search terms, the search engine service identifies web pages that may be related to those search terms.
  • the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling” the web (i.e., the World Wide Web) to identify the keywords of each web page.
  • a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages.
  • the keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on.
  • the search engine service may generate a relevance score to indicate how relevant the information of the web page may be to the search request based on the closeness of each match, web page importance or popularity (e.g., Google's PageRank), and so on.
  • the search engine service displays to the user links to those web pages in an order that is based on a ranking that may be determined by their relevance, popularity, or some other measure.
  • Q&A services may provide traditional frequently asked question (“FAQ”) services or may provide community-based services in which members of the community contribute both questions and answers to those questions.
  • FAQ: frequently asked question
  • These Q&A services provide a mechanism that allows users to search for previously generated answers to previously posed questions.
  • These Q&A services typically input a queried question from a user, identify questions of the collection that relate to the queried question (i.e., a question search), and return the answers to the identified questions as the answer to the queried question.
  • Such Q&A services typically treat the questions as plain text.
  • the Q&A services may use various techniques including a vector space model and a language model when performing a question search.
  • Table 1 illustrates example results of a question search for a queried question.
  • Q&A services may identify questions Q2, Q3, Q4, and Q5 as being related to queried question Q1.
  • the Q&A services typically cannot determine, however, which identified question is most related to the queried question.
  • question Q2 is most closely related to queried question Q1.
  • the Q&A services nevertheless provide a ranking of the relatedness of the identified questions to the queried question.
  • Such a ranking may represent the queried question and each identified question as a feature vector of keywords.
  • the relatedness of an identified question to the queried question is based on the closeness of their feature vectors. The closeness of the feature vectors may be determined using, for example, a cosine similarity metric.
  • the Q&A services typically display the identified questions to a user in rank order.
  • a difficulty with such displaying of the identified questions is that many of the highest ranking questions may be very similar in both syntax and semantics.
  • the identified questions for the example of Table 1 may also include the additional questions of Table 2.
  • a method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided.
  • a question search system provides a collection of questions. Each question of the collection has an associated topic and focus. The topic of a question represents the major context/constraint of a question that characterizes the interest of the user who submits the question. The focus of a question represents certain aspects or descriptive features of the topic of the question in which the user is interested.
  • the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions.
  • the question search system clusters the identified questions into topic clusters of questions with similar topics.
  • the question search system may rank the topic clusters based on a ranking of the original ranking of the questions within the topic clusters and may display information relating to the topic clusters in ranked order.
  • the question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.
  • the question search system may rank the focus clusters within each topic cluster based on a ranking of the original ranking of the questions within the focus clusters and may display information relating to the focus clusters in ranked order.
  • the question search system may display a list of the topic clusters and allow a user to select a topic cluster to display the focus clusters within the selected topic cluster.
  • FIG. 1 is a diagram that illustrates an example question tree.
  • FIG. 2 is a diagram that illustrates a display page with a conventional display of search results of a question search.
  • FIG. 3 is a diagram that illustrates a display page with a clustered display of the search result of a question search in some embodiments.
  • FIG. 4 is a block diagram that illustrates components of the question search system in some embodiments.
  • FIG. 5 is a flow diagram that illustrates the processing of the rank questions by topics and focuses component of the question search system in some embodiments.
  • FIG. 6 is a flow diagram that illustrates the processing of the identify topics and focuses component of the question search system in some embodiments.
  • FIG. 7 is a flow diagram that illustrates the processing of the generate graph of questions component of the question search system in some embodiments.
  • FIG. 8 is a flow diagram that illustrates the processing of the generate question clusters component of the question search system in some embodiments.
  • FIG. 9 is a block diagram of a computing device on which the question search system may be implemented.
  • a question search system provides a collection of questions. Each question of the collection has an associated topic and focus.
  • the topic of a question represents the major context/constraint of a question that characterizes the interest of the user who submits the question. For example, the question “Any cool clubs in Berlin or Hamburg?” has the topic of “Berlin Hamburg” (removing stop words).
  • the focus of a question represents certain aspects or descriptive features of the topic of the question in which the user is interested. For example, the sample question has the focus of “cool clubs,” which describes, refines, or narrows the user's interest in the topic of the question.
  • Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions.
  • the question search system may use any conventional technique for identifying and ranking the questions.
  • the question search system may use the techniques described in U.S. patent application Ser. No. 12/185,713, entitled “Searching Questions Based on Topic and Focus” and filed on Aug. 4, 2008, which is hereby incorporated by reference.
  • the question search system clusters the identified questions into topic clusters of questions with similar topics.
  • the question search system may rank the topic clusters based on a ranking of the original ranking of the questions within the topic clusters and may display information relating to the topic clusters (e.g., the topic, the questions within the cluster, or the answers to the questions within the cluster) in ranked order. For example, the question search system may generate a topic cluster for questions with the topic of “Hamburg Berlin” and separate clusters for questions with the topics of “Hamburg” and “Berlin.” The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.
  • the question search system may rank the focus clusters within each topic cluster based on a ranking of the original ranking of the questions within the focus clusters and may display information relating to the focus clusters (e.g., the focus, the questions within the cluster, or the answers to the questions within the cluster) in ranked order.
  • the question search system may generate a focus cluster for the focus of “fun clubs” within the topic cluster for the topic of “Hamburg Berlin” and separate focus clusters for questions with the focuses of “restaurant” and “hotel.”
  • the question search system may display a list of the topic clusters and allow a user to select a topic cluster to display the focus clusters within the selected topic cluster.
  • the question search system may also allow the user to select a focus cluster to display the questions within the selected focus cluster. In this way, the question search system can provide a user with an overview of the different topics and their different focuses of semantically related questions without having to view all the questions in their original ranked order.
  • the question search system identifies the topics and focuses of a collection of questions using a minimum description length (“MDL”) tree cut model.
  • MDL: minimum description length
  • a “cut” of a tree is any set of nodes in the tree that defines the partition of all nodes viewing each node as representing a set of its child nodes as well as itself.
  • the question search system generates a “question tree” for questions of the collection by identifying base noun phrases and WH-ngrams of the question.
  • a base noun phrase is a simple and nonrecursive noun phrase
  • a WH-ngram is an n-gram beginning with the WH-words: when, what, where, which, and how.
  • the question search system calculates the specificity of a term (e.g., base noun phrase or WH-word) to indicate how well the term characterizes the information needs of a user who posts a question.
  • the question search system then generates a topic chain for each question, which is a list of the terms of a question ordered from highest to lowest specificity.
  • the topic chain of the question “Any cool clubs in Berlin or Hamburg?” may be “Hamburg → Berlin → cool club” because the specificity for Hamburg, Berlin, and cool club may be 0.99, 0.62, and 0.36, respectively.
  • the topic chains for the questions of Table 1 are illustrated in Table 3.
  • FIG. 1 is a diagram that illustrates an example question tree.
  • the question tree 100 represents the topic chains of Table 3.
  • the connected nodes of Hamburg, Berlin, and cool club represent the topic chain of “Hamburg → Berlin → cool club.”
  • the cut of the question tree is represented by the dashed line 101 .
  • the terms before (to the left of) the cut represent the topics, and the terms after (to the right of) the cut represent the focuses.
  • the topic of the question “Any cool clubs in Berlin or Hamburg?” is thus “Hamburg Berlin,” and the focus of that question is “cool club.”
  • the question search system uses a language modeling framework to define the similarity between questions.
  • a language modeling framework models the probability of generating one question from a language model estimated by another question.
  • the question search system may represent that probability by the following equation:
  • $$p(Q_1 \mid Q_2) = \prod_{w \in Q_1} \hat{p}(w \mid Q_2)^{\mathrm{count}(w, Q_1)} \qquad (1)$$
  • $p(Q_1 \mid Q_2)$ represents the probability of generating question $Q_1$ from the language model of question $Q_2$
  • $\hat{p}(w \mid Q_2)$ represents the Maximum Likelihood Estimation of the language model of the question $Q_2$ for term $w$
  • $\mathrm{count}(w, Q_1)$ represents the number of occurrences of term $w$ in question $Q_1$.
  • $\ldots \mid T(Q_1))$ (3); $\mathrm{sim}(F(Q_1), F(Q_2)) = p(F(Q_1) \mid \ldots$
  • the question search system uses a star clustering algorithm to generate topic clusters and focus clusters.
  • a star clustering algorithm is based on graph partitioning.
  • Each clustering unit (e.g., question) is considered to be a node in an undirected graph.
  • the algorithm calculates the similarity sim(u,v) between each two clustering units u and v.
  • the algorithm adds a link between each pair of nodes whose similarity is above a threshold similarity.
  • a link between two nodes indicates that the questions represented by the nodes are similar in some way (e.g., similar overall, similar topics, or similar focuses).
  • the star clustering algorithm is illustrated in Table 4.
  • $G_\sigma$ represents a graph of vertices (nodes) and edges (links) with edges between similar vertices
  • $V$ represents the vertices
  • $E_\sigma$ represents edges between vertices whose similarity is above the similarity threshold of $\sigma$
  • the degree of a vertex represents the number of edges connecting that vertex to other vertices
  • neighbor vertices are vertices that are connected by an edge.
  • the question search system clusters and re-ranks question search results using the algorithm illustrated in Table 5.
  • the output $FC(TC(C_Q))$ is a ranked list of topic clusters that each contains a ranked list of focus clusters. Each focus cluster contains a ranked list of questions.
  • This clustering results in a re-ranked list of the TOP-N search results because the questions might be pushed up to the top of the rank list or down to the bottom of the rank list according to the clusters containing them.
  • FIG. 2 is a diagram that illustrates a display page with a conventional display of search results of a question search.
  • Display page 200 includes the queried question 201 and the questions of the search result 202 .
  • the questions of the search result are ranked based on their relevance to the queried question.
  • the first two questions “Fun clubs in Hamburg or Berlin” and “What are the fun clubs in Berlin or Hamburg” are semantically the same.
  • FIG. 3 is a diagram that illustrates a display page with a clustered display of the search result of a question search in some embodiments.
  • Display page 300 includes the queried question 301 and the search result 302 organized into topic clusters 310 , 320 , and 330 representing the topics “Berlin or Hamburg,” “Berlin,” and “Hamburg,” respectively.
  • Each topic cluster has focus clusters.
  • Topic cluster 310 has focus clusters 311 , 312 , and 313 representing focuses “clubs,” “restaurants,” and “how long does it take.”
  • Topic cluster 320 has focus clusters 321 and 322 representing focuses “night clubs” and “cheap hotels.”
  • Topic cluster 330 has focus clusters 331 and 332 representing focuses “clubs” and “hotels.”
  • Focus cluster 311 is currently listing the questions within that cluster. The “+” and the “−” to the left of each topic cluster and focus cluster can be used to expand or collapse the information of the cluster.
  • FIG. 4 is a block diagram illustrating components of the question search system in some embodiments.
  • a question search system 410 may be connected to user computing devices 450 , a search service 460 , and a Q&A service 470 via a communication link 440 .
  • the question search system includes various data stores including a question/answer store 411 , a question tree store 412 , and a cut question tree store 413 .
  • the question/answer store contains questions and their corresponding answers.
  • the question tree store contains a question tree for the questions of the question/answer store.
  • the cut question tree store indicates the cut of the question tree.
  • the question search system also includes a search for questions component 421 , a search for answers component 422 , and a find and rank questions component 423 .
  • the search for questions component may invoke the find and rank questions component to identify questions relevant to a queried question and then cluster and display the identified questions.
  • the search for answers component may invoke the find and rank questions component to identify questions relevant to a queried question, cluster the identified questions, and display the answers to the questions organized based on the clusters.
  • the question search system also includes a rank questions by topics and focuses component 431 , an identify topics and focuses component 432 , a generate graph of questions component 433 , and a generate question clusters component 434 .
  • the rank questions by topics and focuses component invokes the identify topics and focuses component to determine the topics and focuses of questions.
  • the rank questions by topics and focuses component also invokes the generate graph of questions component to generate a similarity graph and the generate question clusters component to generate the topic and focus clusters from the graphs.
  • FIG. 9 is a block diagram of a computing device on which the question search system may be implemented.
  • the computing device 900 on which the question search system 410 may be implemented may include a central processing unit 901 , memory 902 , input devices 904 (e.g., keyboard and pointing devices), output devices 905 (e.g., display devices), and storage devices 903 (e.g., disk drives).
  • the memory and storage devices are computer-readable media that may contain instructions that implement the question search system.
  • the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link.
  • Various communications links may be used, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection.
  • the question search system may be implemented in and/or used by various operating environments.
  • the operating environment described herein is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the question search system.
  • Other well-known computing systems, environments, and configurations that may be suitable for use include personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
  • the question search system may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices.
  • program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types.
  • functionality of the program modules may be combined or distributed as desired in various embodiments.
  • FIG. 5 is a flow diagram that illustrates the processing of the rank questions by topics and focuses component of the question search system in some embodiments.
  • the component is invoked passing originally ranked questions of a search result that are relevant to a queried question and generates topic and focus clusters for those questions.
  • the component invokes the identify topics and focuses component to identify the topics and focuses of the questions.
  • the component invokes the generate graph of questions component passing an indication to generate the graph based on the similarity of topics.
  • the component invokes the generate question clusters component to generate the clusters for the graph.
  • the component ranks the generated clusters based on the highest original ranking of a question within each cluster.
  • the component loops selecting each topic cluster and generating focus clusters within that topic cluster.
  • the component selects the next topic cluster.
  • In decision block 506 , if all the topic clusters have already been selected, then the component completes, else the component continues at block 507 .
  • the component invokes the generate graph of questions component passing an indication to generate the graph based on the similarity of focuses.
  • the component invokes the generate question clusters component to generate the clusters for the graph.
  • the component ranks the focus clusters for the selected topic cluster based on the highest ranking questions of each focus cluster. The component then loops to block 505 to select the next topic cluster.
  • FIG. 6 is a flow diagram that illustrates the processing of the identify topics and focuses component of the question search system in some embodiments.
  • the component is passed questions and returns the topic and focus of each question.
  • the component generates a question tree.
  • the component determines the cut of the question tree. The component then returns the terms of each topic chain before its cut as the topic of a question and the terms of each topic chain after its cut as the focus of the question.
  • FIG. 7 is a flow diagram that illustrates the processing of the generate graph of questions component of the question search system in some embodiments.
  • the component is passed questions along with an indication to generate a graph for the topic or focus of the questions.
  • the component selects the next question.
  • If all the questions have already been selected, then the component returns, else the component continues at block 703 .
  • the component loops adding links between the selected node and each other node of the graph when the similarity between the nodes is above a similarity threshold.
  • the component chooses the next question that has not already been selected.
  • In decision block 704 , if all such questions have already been chosen for the selected question, then the component loops to block 701 to select the next question, else the component continues at block 705 .
  • the component calculates the similarity between the selected and chosen questions.
  • In decision block 706 , if the similarity is greater than a threshold similarity, then the component continues at block 707 , else the component loops to block 703 to choose the next question.
  • In block 707 , the component adds a similarity link between the nodes of the selected and chosen questions and then loops to block 703 to choose the next question.
  • FIG. 8 is a flow diagram that illustrates the processing of the generate question clusters component of the question search system in some embodiments.
  • the component is passed a graph and generates clusters for the graph.
  • the component sets each node within the graph to be unmarked.
  • the component calculates the degree of each node of the graph.
  • the component loops generating star clusters of the nodes.
  • the component selects the next unmarked node with the highest degree.
  • In decision block 804 , if all such nodes have already been selected, then the component returns an indication of the clusters, else the component continues at block 805 .
  • the component marks the selected node as a center of a cluster.
  • the component marks each neighbor node of the selected node that is unmarked as a satellite of that cluster. The component then loops to block 803 to select the next unmarked node.
  • Each node that is the center of a cluster and all its satellite nodes comprise a cluster.
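The graph-generation and star-clustering steps described above for FIGS. 7 and 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `jaccard` function is a stand-in similarity chosen here for brevity (the description uses a language-model similarity), and the names `build_graph`, `star_clusters`, and the threshold value are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity between two term strings (an assumption,
    not the language-model similarity of the description)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def build_graph(units, sim, threshold):
    """Add a link between each pair of units whose similarity is above the threshold."""
    neighbors = {i: set() for i in range(len(units))}
    for i, j in combinations(range(len(units)), 2):
        if sim(units[i], units[j]) > threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def star_clusters(neighbors):
    """Repeatedly pick the unmarked node of highest degree as a cluster center
    and mark its still-unmarked neighbors as satellites of that cluster."""
    unmarked = set(neighbors)
    clusters = []
    while unmarked:
        # Highest degree first; ties broken by lowest node index (an assumption).
        center = min(unmarked, key=lambda n: (-len(neighbors[n]), n))
        satellites = sorted(neighbors[center] & unmarked)
        clusters.append([center] + satellites)
        unmarked -= {center, *satellites}
    return clusters


# Hypothetical topics for five questions (illustrative values only).
topics = ["hamburg berlin", "berlin hamburg", "berlin", "berlin", "hamburg"]
graph = build_graph(topics, jaccard, threshold=0.6)
clusters = star_clusters(graph)
```

The same two functions could be applied a second time to the focuses of the questions inside each topic cluster to obtain focus clusters. Each returned cluster lists its center first, followed by its satellites.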

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)
This application relates to U.S. patent application Ser. No. 12/185,713, filed on Aug. 4, 2008 entitled “SEARCHING QUESTIONS BASED ON TOPIC AND FOCUS,” which is hereby incorporated by reference in its entirety.
BACKGROUND
Many search engine services, such as Google and Live Search, provide for searching for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to users. After a user submits a search request (i.e., a query) that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling” the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service may generate a relevance score to indicate how relevant the information of the web page may be to the search request based on the closeness of each match, web page importance or popularity (e.g., Google's PageRank), and so on. The search engine service then displays to the user links to those web pages in an order that is based on a ranking that may be determined by their relevance, popularity, or some other measure.
Some online services, such as Yahoo! Answers and Live QnA, have created large collections of questions and their corresponding answers. These Q&A services may provide traditional frequently asked question (“FAQ”) services or may provide community-based services in which members of the community contribute both questions and answers to those questions. These Q&A services provide a mechanism that allows users to search for previously generated answers to previously posed questions. These Q&A services typically input a queried question from a user, identify questions of the collection that relate to the queried question (i.e., a question search), and return the answers to the identified questions as the answer to the queried question.
Such Q&A services typically treat the questions as plain text. The Q&A services may use various techniques including a vector space model and a language model when performing a question search. Table 1 illustrates example results of a question search for a queried question.
TABLE 1
Queried Question:
Q1: Any cool clubs in Berlin or Hamburg?
Expected Question:
Q2: What are the best/most fun clubs in Berlin?
Not Expected Question:
Q3: Any nice hotels in Berlin or Hamburg?
Q4: How long does it take to get to Hamburg from Berlin?
Q5: Cheap hotels in Berlin?

Such Q&A services may identify questions Q2, Q3, Q4, and Q5 as being related to queried question Q1. The Q&A services typically cannot determine, however, which identified question is most related to the queried question. In this example, question Q2 is most closely related to queried question Q1. The Q&A services nevertheless provide a ranking of the relatedness of the identified questions to the queried question. Such a ranking may represent the queried question and each identified question as a feature vector of keywords. The relatedness of an identified question to the queried question is based on the closeness of their feature vectors. The closeness of the feature vectors may be determined using, for example, a cosine similarity metric.
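As a rough illustration of the keyword-vector ranking described above, the following Python sketch builds bag-of-words vectors for questions of Table 1 and compares them with cosine similarity. The whitespace tokenizer and the function names `keyword_vector` and `cosine_similarity` are assumptions introduced here; an actual Q&A service would likely use stop-word removal and term weighting.

```python
import math
from collections import Counter


def keyword_vector(text):
    """Bag-of-words vector; whitespace tokenization is a simplifying assumption."""
    words = (w.strip("?!.,'\"").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse keyword-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


q1 = keyword_vector("Any cool clubs in Berlin or Hamburg?")
q2 = keyword_vector("What are the best/most fun clubs in Berlin?")
q3 = keyword_vector("Any nice hotels in Berlin or Hamburg?")

score_q2 = cosine_similarity(q1, q2)
score_q3 = cosine_similarity(q1, q3)
```

In this toy setup Q3 scores higher than Q2 against Q1, because Q3 shares five keywords with Q1 while Q2 shares only three, even though Q2 is the expected match. This is the kind of plain-text ranking weakness the passage describes.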
The Q&A services typically display the identified questions to a user in rank order. A difficulty with such displaying of the identified questions is that many of the highest ranking questions may be very similar in both syntax and semantics. For example, the identified questions for the example of Table 1 may also include the additional questions of Table 2.
TABLE 2
Q6: Fun clubs in Berlin or Hamburg?
Q7: What's a good restaurant in Hamburg or Berlin?

Because questions Q2 and Q6 have several words in common with queried question Q1, a Q&A service may rank those questions high. Depending on the size of the collection of questions, there may be many questions similar to questions Q2 and Q6. If all these similar questions are ranked high, then the first page of the search results may list only such similar questions. If the user is actually interested in hotels that have health clubs, then the user may need to scan several pages before finding a listing for a hotel or a hotel with a health club that is of interest.
SUMMARY
A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. The topic of a question represents the major context/constraint of a question that characterizes the interest of the user who submits the question. The focus of a question represents certain aspects or descriptive features of the topic of the question in which the user is interested. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may rank the topic clusters based on a ranking of the original ranking of the questions within the topic clusters and may display information relating to the topic clusters in ranked order. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses. The question search system may rank the focus clusters within each topic cluster based on a ranking of the original ranking of the questions within the focus clusters and may display information relating to the focus clusters in ranked order. The question search system may display a list of the topic clusters and allow a user to select a topic cluster to display the focus clusters within the selected topic cluster.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram that illustrates an example question tree.
FIG. 2 is a diagram that illustrates a display page with a conventional display of search results of a question search.
FIG. 3 is a diagram that illustrates a display page with a clustered display of the search result of a question search in some embodiments.
FIG. 4 is a block diagram that illustrates components of the question search system in some embodiments.
FIG. 5 is a flow diagram that illustrates the processing of the rank questions by topics and focuses component of the question search system in some embodiments.
FIG. 6 is a flow diagram that illustrates the processing of the identify topics and focuses component of the question search system in some embodiments.
FIG. 7 is a flow diagram that illustrates the processing of the generate graph of questions component of the question search system in some embodiments.
FIG. 8 is a flow diagram that illustrates the processing of the generate question clusters component of the question search system in some embodiments.
FIG. 9 is a block diagram of a computing device on which the question search system may be implemented.
DETAILED DESCRIPTION
A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. In some embodiments, a question search system provides a collection of questions. Each question of the collection has an associated topic and focus. The topic of a question represents the major context/constraint of a question that characterizes the interest of the user who submits the question. For example, the question “Any cool clubs in Berlin or Hamburg?” has the topic of “Berlin Hamburg” (removing stop words). The focus of a question represents certain aspects or descriptive features of the topic of the question in which the user is interested. For example, the sample question has the focus of “cool clubs,” which describes, refines, or narrows the user's interest in the topic of the question. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system may use any conventional technique for identifying and ranking the questions. Alternatively, the question search system may use the techniques described in U.S. patent application Ser. No. 12/185,713, entitled “Searching Questions Based on Topic and Focus” and filed on Aug. 4, 2008, which is hereby incorporated by reference. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may rank the topic clusters based on a ranking of the original ranking of the questions within the topic clusters and may display information relating to the topic clusters (e.g., the topic, the questions within the cluster, or the answers to the questions within the cluster) in ranked order. 
For example, the question search system may generate a topic cluster for questions with the topic of “Hamburg Berlin” and separate clusters for questions with the topics of “Hamburg” and “Berlin.” The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses. The question search system may rank the focus clusters within each topic cluster based on a ranking of the original ranking of the questions within the focus clusters and may display information relating to the focus clusters (e.g., the focus, the questions within the cluster, or the answers to the questions within the cluster) in ranked order. For example, the question search system may generate a focus cluster for the focus of “fun clubs” within the topic cluster for the topic of “Hamburg Berlin” and separate focus clusters for questions with the focuses of “restaurant” and “hotel.” The question search system may display a list of the topic clusters and allow a user to select a topic cluster to display the focus clusters within the selected topic cluster. The question search system may also allow the user to select a focus cluster to display the questions within the selected focus cluster. In this way, the question search system can provide a user with an overview of the different topics and their different focuses of semantically related questions without having to view all the questions in their original ranked order.
In some embodiments, the question search system identifies the topics and focuses of a collection of questions using a minimum description length (“MDL”) tree cut model. Such identification of topics and focuses is described in U.S. patent application Ser. No. 12/098,457, entitled “Recommending Questions to User of Community Question Answering” and filed on Apr. 7, 2008, which is hereby incorporated by reference. A “cut” of a tree is any set of nodes in the tree that defines the partition of all nodes viewing each node as representing a set of its child nodes as well as itself. The question search system generates a “question tree” for questions of the collection by identifying base noun phrases and WH-ngrams of the question. A base noun phrase is a simple and nonrecursive noun phrase, and a WH-ngram is an n-gram beginning with the WH-words: when, what, where, which, and how. The question search system calculates the specificity of a term (e.g., base noun phrase or WH-word) to indicate how well the term characterizes the information needs of a user who posts a question. The question search system then generates a topic chain for each question, which is a list of the terms of a question ordered from highest to lowest specificity. For example, the topic chain of the question “Any cool clubs in Berlin or Hamburg?” may be “Hamburg→Berlin→cool club” because the specificity for Hamburg, Berlin, and cool club may be 0.99, 0.62, and 0.36, respectively. The topic chains for the questions of Table 1 are illustrated in Table 3.
TABLE 3
Queried Question:
Q1: Hamburg→Berlin→cool club
Expected Question:
Q2: Berlin→fun club
Not Expected Question:
Q3: Hamburg→Berlin→nice hotel
Q4: Hamburg→Berlin→how long does it take
Q5: Berlin→cheap hotels
FIG. 1 is a diagram that illustrates an example question tree. The question tree 100 represents the topic chains of Table 3. The connected nodes of Hamburg, Berlin, and cool club represent the topic chain of “Hamburg→Berlin→cool club.” The cut of the question tree is represented by the dashed line 101. The terms before (to the left of) the cut represent the topics, and the terms after (to the right of) the cut represent the focuses. The topic of the question “Any cool clubs in Berlin or Hamburg?” is thus “Hamburg Berlin,” and the focus of that question is “cool club.”
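The relationship between a topic chain and the cut can be illustrated with a short sketch. It orders terms by specificity and splits the resulting chain at a given cut position. The specificity values are those of the example above; the cut position is supplied by hand here as an assumption, whereas the described system derives it from the MDL-based tree cut model. The function names are hypothetical.

```python
def topic_chain(specificity):
    """Order the terms of a question from highest to lowest specificity."""
    ordered = sorted(specificity.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ordered]

def split_at_cut(chain, cut_index):
    """Terms before the cut form the topic; terms after the cut form the focus."""
    return chain[:cut_index], chain[cut_index:]

# Specificity values from the example question "Any cool clubs in Berlin or Hamburg?"
specificity = {"cool club": 0.36, "Hamburg": 0.99, "Berlin": 0.62}

chain = topic_chain(specificity)        # ['Hamburg', 'Berlin', 'cool club']
# Assumed cut position (after the second term), matching dashed line 101 of FIG. 1.
topic, focus = split_at_cut(chain, 2)   # (['Hamburg', 'Berlin'], ['cool club'])
```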
In some embodiments, the question search system uses a language modeling framework to define the similarity between questions. (See Ponte, J. M. and Croft, W. B., “A Language Modeling Approach to Information Retrieval,” Proc. of SIGIR'98, 1998.) A language modeling framework models the probability of generating one question from a language model estimated by another question. The question search system may represent that probability by the following equation:
$$p(Q_1 \mid Q_2) = \prod_{w \in Q_1} \tilde{p}(w \mid Q_2)^{\mathrm{count}(w,\,Q_1)} \qquad (1)$$
where p(Q1|Q2) represents the probability of generating question Q1 from the language model of question Q2, p̃(w|Q2) represents the Maximum Likelihood Estimation of the language model of the question Q2 for term w, and count(w,Q1) represents the number of occurrences of term w in question Q1.
The question search system represents the similarity between questions by a symmetric function represented by the following equation:
$$\mathrm{sim}(Q_1, Q_2) = p(Q_1 \mid Q_2) + p(Q_2 \mid Q_1) \qquad (2)$$
where sim(Q1,Q2) represents the similarity between questions Q1 and Q2. The question search system may also represent the similarity between the topics and the focuses of questions in an analogous manner using the following equations:
$$\mathrm{sim}(T(Q_1), T(Q_2)) = p(T(Q_1) \mid T(Q_2)) + p(T(Q_2) \mid T(Q_1)) \qquad (3)$$
$$\mathrm{sim}(F(Q_1), F(Q_2)) = p(F(Q_1) \mid F(Q_2)) + p(F(Q_2) \mid F(Q_1)) \qquad (4)$$
wherein T(Q1) represents the topic of question Q1, F(Q1) represents the focus of question Q1, sim(T(Q1),T(Q2)) represents the similarity between the topics of questions Q1 and Q2, and sim(F(Q1), F(Q2)) represents the similarity between the focuses of questions Q1 and Q2.
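Equations (1) through (4) can be sketched in a few lines. The sketch below assumes that a question, topic, or focus is already available as a list of terms, and it uses an unsmoothed maximum likelihood estimate; the function names are hypothetical. Under these assumptions, the generation probability is zero whenever the generated term list contains a term that is absent from the other list. The description does not state whether smoothing is applied, so that behavior is a property of this sketch only.

```python
from collections import Counter

def mle_language_model(terms):
    """Maximum likelihood estimate: relative frequency of each term."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generation_probability(q1, q2):
    """p(Q1|Q2) of equation (1): product over the terms w of Q1 of p(w|Q2)^count(w,Q1)."""
    model = mle_language_model(q2)
    prob = 1.0
    for w, count in Counter(q1).items():
        prob *= model.get(w, 0.0) ** count
    return prob

def similarity(q1, q2):
    """Symmetric similarity of equation (2); equations (3) and (4) apply the
    same function to topic term lists and focus term lists, respectively."""
    return generation_probability(q1, q2) + generation_probability(q2, q1)

# Topic similarity between the topics "Hamburg Berlin" and "Berlin".
topic_sim = similarity(["hamburg", "berlin"], ["berlin"])
```

In this example only one direction contributes: "berlin" has probability 0.5 under the model of "hamburg berlin", while "hamburg" has probability zero under the model of "berlin".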
In some embodiments, the question search system uses a star clustering algorithm to generate topic clusters and focus clusters. One skilled in the art will appreciate, however, that a variety of well-known clustering techniques may be used, such as a nearest neighbor clustering and K-means clustering. The star clustering algorithm is based on graph partitioning. (See Wang, X. and Zhai, C., “Learn from Web Search Logs to Organize Search Results,” Proc. of SIGIR'07, 2007, and Aslam, J. A., Pelekov, E., and Rus, D., “The Star Clustering Algorithm for Static and Dynamic Information Organization,” Journal of Graph Algorithms and Applications, 8(1):95-129, 2004.) Each clustering unit (e.g., question) is considered to be a node in an undirected graph. The algorithm calculates the similarity sim(u,v) between each two clustering units u and v. The algorithm adds a link between each pair of nodes whose similarity is above a threshold similarity. Thus, a link between two nodes indicates that the questions represented by the nodes are similar in some way (e.g., similar overall, similar topics, or similar focuses). The star clustering algorithm is illustrated in Table 4.
TABLE 4
For any threshold σ:
1. Let graph Gσ = (V, Eσ) where Eσ = {(u, v): sim(u, v) ≥ σ, u ∈ V, v ∈ V}.
2. Let each vertex in Gσ initially be unmarked.
3. Calculate the degree of each vertex v ∈ V.
4. From the unmarked vertices, find the unmarked vertex μ that has the highest degree and mark its flag as a center.
5. Form a cluster C containing μ and all its neighbors that are not marked.
6. Mark all the selected neighbors as satellites.
7. Repeat steps 4-6 until all vertices are marked.
8. Represent each cluster by the vertex corresponding to its associated star center.

In this table, Gσ represents a graph of vertices (nodes) and edges (links) with edges between similar vertices, V represents the vertices, Eσ represents edges between vertices whose similarity is above the similarity threshold of σ, the degree of a vertex represents the number of edges connecting that vertex to other vertices, and neighbor vertices are vertices that are connected by an edge. The star clustering algorithm thus establishes that pairs of questions are similar when the similarity between the questions satisfies a threshold similarity and then repeatedly selects an unmarked question that is similar to the greatest number of questions, marks the selected question as a center of a cluster, and marks each previously unmarked similar question as a satellite of the center of the cluster.
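The steps of Table 4 can be sketched as follows. The sketch takes an arbitrary similarity function and a threshold, so it applies equally to overall, topic, or focus similarity. The function name star_clusters and the tie-breaking rule (the earliest item wins when several unmarked vertices share the highest degree) are assumptions; Table 4 does not specify how ties are resolved.

```python
def star_clusters(items, sim, threshold):
    """Return a list of (center, satellites) pairs following the steps of Table 4."""
    n = len(items)
    # Step 1: link each pair of vertices whose similarity meets the threshold.
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim(items[i], items[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    # Steps 2-3: all vertices start unmarked; degrees are computed once.
    unmarked = set(range(n))
    degree = {i: len(neighbors[i]) for i in range(n)}
    clusters = []
    # Steps 4-7: repeatedly pick the unmarked vertex with the highest degree.
    while unmarked:
        # Assumed tie-break: max() keeps the first maximum, i.e. the lowest index.
        center = max(sorted(unmarked), key=lambda i: degree[i])
        satellites = sorted(neighbors[center] & unmarked)
        unmarked.discard(center)
        unmarked.difference_update(satellites)
        # Step 8: the cluster is represented by its star center.
        clusters.append((items[center], [items[s] for s in satellites]))
    return clusters

# Toy similarity: two labels are similar when they share a first letter.
same_letter = lambda a, b: 1.0 if a[0] == b[0] else 0.0
example = star_clusters(["a1", "a2", "a3", "b1"], same_letter, 0.5)
```

A vertex with no neighbors above the threshold, such as "b1" here, becomes the center of a cluster with no satellites.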
The question search system clusters and re-ranks question search results using the algorithm illustrated in Table 5. The output {{FC(TC(CQ))}} is a ranked list of topic clusters that each contains a ranked list of focus clusters. Each focus cluster contains a ranked list of questions.
TABLE 5
Given a query Q, a size N, two thresholds σ1 and σ2:
1. Retrieve a collection of questions ranked as TOP-N for the query Q, denoted as CQ. Let C′Q = CQ ∪ Q.
2. For each question in C′Q, build the topic-focus structure using an MDL-based tree cut model.
3. Use the star clustering algorithm, the threshold σ1, and the topic similarity to cluster the questions in CQ into the topic clusters {TC(CQ)}.
4. Rank each cluster TC(CQ) in {TC(CQ)} according to the rank (in CQ) of the question in TC(CQ) that is ranked highest.
5. For each cluster TC(CQ) in {TC(CQ)},
   5.1 Use the star clustering algorithm, the threshold σ2, and the focus similarity to cluster the questions in TC(CQ) into the focus clusters {FC(TC(CQ))}.
   5.2 Rank each cluster FC(TC(CQ)) in {FC(TC(CQ))} according to the rank (in CQ) of the question in FC(TC(CQ)) that is ranked highest.
   5.3 Rank each Question Q′ in FC(TC(CQ)) according to their original rank in CQ.
6. Output {{FC(TC(CQ))}}.

This clustering results in a re-ranked list of the TOP-N search results because the questions might be pushed up to the top of the rank list or down to the bottom of the rank list according to the clusters containing them. In some embodiments, the question search system may set the similarity thresholds such that σ1=σ2=σ.
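The ranking portion of Table 5 (steps 4, 5.2, and 5.3) can be sketched as follows. To keep the example short, the clustering itself is replaced by a stand-in: topic clusters are given directly, and focus clusters are formed by grouping on a hand-assigned focus label rather than by star clustering. The question identifiers, ranks, labels, and function names are all hypothetical.

```python
from collections import defaultdict

def rank_clusters(clusters, original_rank):
    """Sort questions within each cluster by original rank, then sort the
    clusters by the original rank of their highest-ranked question."""
    members_sorted = [sorted(c, key=lambda q: original_rank[q]) for c in clusters]
    return sorted(members_sorted, key=lambda c: original_rank[c[0]])

def rerank(topic_clusters, cluster_by_focus, original_rank):
    """Ranked topic clusters, each holding ranked focus clusters of ranked questions."""
    output = []
    for topic_cluster in rank_clusters(topic_clusters, original_rank):
        focus_clusters = cluster_by_focus(topic_cluster)
        output.append(rank_clusters(focus_clusters, original_rank))
    return output

# Hypothetical search result: question id -> original rank (1 is best).
original_rank = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
# Hypothetical focus labels, used here in place of star clustering on focuses.
focus_label = {"A": "clubs", "B": "hotels", "C": "hotels", "D": "hotels", "E": "clubs"}

def group_by_focus_label(questions):
    groups = defaultdict(list)
    for q in questions:
        groups[focus_label[q]].append(q)
    return list(groups.values())

result = rerank([["B", "D"], ["E", "A", "C"]], group_by_focus_label, original_rank)
flat = [q for topic in result for focus in topic for q in focus]
```

Flattening the nested output gives the order A, E, C, B, D: question E moves up from fifth place because it shares a topic cluster and a focus cluster with the top-ranked question A, while B moves down from second place to fourth.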
FIG. 2 is a diagram that illustrates a display page with a conventional display of search results of a question search. Display page 200 includes the queried question 201 and the questions of the search result 202. The questions of the search result are ranked based on their relevance to the queried question. The first two questions “Fun clubs in Hamburg or Berlin” and “What are the fun clubs in Berlin or Hamburg” are semantically the same.
FIG. 3 is a diagram that illustrates a display page with a clustered display of the search result of a question search in some embodiments. Display page 300 includes the queried question 301 and the search result 302 organized into topic clusters 310, 320, and 330 representing the topics “Berlin or Hamburg,” “Berlin,” and “Hamburg,” respectively. Each topic cluster has focus clusters. Topic cluster 310 has focus clusters 311, 312, and 313 representing focuses “clubs,” “restaurants,” and “how long does it take.” Topic cluster 320 has focus clusters 321 and 322 representing focuses “night clubs” and “cheap hotels.” Topic cluster 330 has focus clusters 331 and 332 representing focuses “clubs” and “hotels.” Focus cluster 311 is currently listing the questions within that cluster. The “+” and the “−” to the left of each topic cluster and focus cluster can be used to expand or collapse the information of the cluster.
FIG. 4 is a block diagram illustrating components of the question search system in some embodiments. A question search system 410 may be connected to user computing devices 450, a search service 460, and a Q&A service 470 via a communication link 440. The question search system includes various data stores including a question/answer store 411, a question tree store 412, and a cut question tree store 413. The question/answer store contains questions and their corresponding answers. The question tree store contains a question tree for the questions of the question/answer store. The cut question tree store indicates the cut of the question tree. The question search system also includes a search for questions component 421, a search for answers component 422, and a find and rank questions component 423. The search for questions component may invoke the find and rank questions component to identify questions relevant to a queried question and then cluster and display the identified questions. The search for answers component may invoke the find and rank questions component to identify questions relevant to a queried question, cluster the identified questions, and display the answers to the questions organized based on the clusters. The question search system also includes a rank questions by topics and focuses component 431, an identify topics and focuses component 432, a generate graph of questions component 433, and a generate question clusters component 434. The rank questions by topics and focuses component invokes the identify topics and focuses component to determine the topics and focuses of questions. The rank questions by topics and focuses component also invokes the generate graph of questions component to generate a similarity graph and the generate question clusters component to generate the topic and focus clusters from the graphs.
FIG. 9 is a block diagram of a computing device on which the question search system may be implemented. The computing device 900 on which the question search system 410 may be implemented may include a central processing unit 901, memory 902, input devices 904 (e.g., keyboard and pointing devices), output devices 905 (e.g., display devices), and storage devices 903 (e.g., disk drives). The memory and storage devices are computer-readable media that may contain instructions that implement the question search system. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link. Various communications links may be used, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection.
The question search system may be implemented in and/or used by various operating environments. The operating environment described herein is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the question search system. Other well-known computing systems, environments, and configurations that may be suitable for use include personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The question search system may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
FIG. 5 is a flow diagram that illustrates the processing of the rank questions by topics and focuses component of the question search system in some embodiments. The component is invoked passing originally ranked questions of a search result that are relevant to a queried question and generates topic and focus clusters for those questions. In block 501, the component invokes the identify topics and focuses component to identify the topics and focuses of the questions. In block 502, the component invokes the generate graph of questions component passing an indication to generate the graph based on the similarity of topics. In block 503, the component invokes the generate question clusters component to generate the clusters for the graph. In block 504, the component ranks the generated clusters based on the highest original ranking of a question within each cluster. In blocks 505-509, the component loops selecting each topic cluster and generating focus clusters within that topic cluster. In block 505, the component selects the next topic cluster. In decision block 506, if all the topic clusters have already been selected, then the component completes, else the component continues at block 507. In block 507, the component invokes the generate graph of questions component passing an indication to generate the graph based on the similarity of focuses. In block 508, the component invokes the generate question clusters component to generate the clusters for the graph. In block 509, the component ranks the focus clusters for the selected topic cluster based on the highest ranking questions of each focus cluster. The component then loops to block 505 to select the next topic cluster.
FIG. 6 is a flow diagram that illustrates the processing of the identify topics and focuses component of the question search system in some embodiments. The component is passed questions and returns the topic and focus of each question. In block 601, the component generates a question tree. In block 602, the component determines the cut of the question tree. The component then returns the terms of each topic chain before its cut as the topic of a question and the terms of each topic chain after its cut as the focus of the question.
FIG. 7 is a flow diagram that illustrates the processing of the generate graph of questions component of the question search system in some embodiments. The component is passed questions along with an indication to generate a graph for the topic or focus of the questions. In block 701, the component selects the next question. In decision block 702, if all the questions have already been selected, then the component returns, else the component continues at block 703. In blocks 703-707, the component loops adding links between the selected node and each other node of the graph when the similarity between the nodes is above a similarity threshold. In block 703, the component chooses the next question that has not already been selected. In decision block 704, if all such questions have already been chosen for the selected question, then the component loops to block 701 to select the next question, else the component continues at block 705. In block 705, the component calculates the similarity between the selected and chosen questions. In decision block 706, if the similarity is greater than a threshold similarity, then the component continues at block 707, else the component loops to block 703 to choose the next question. In block 707, the component adds a similarity link between the nodes of the selected and chosen questions and then loops to block 703 to choose the next question.
FIG. 8 is a flow diagram that illustrates the processing of the generate question clusters component of the question search system in some embodiments. The component is passed a graph and generates clusters for the graph. In block 801, the component sets each node within the graph to be unmarked. In block 802, the component calculates the degree of each node of the graph. In blocks 803-806, the component loops generating star clusters of the nodes. In block 803, the component selects the next unmarked node with the highest degree. In decision block 804, if all such nodes have already been selected, then the component returns an indication of the clusters, else the component continues at block 805. In block 805, the component marks the selected node as a center of a cluster. In block 806, the component marks each neighbor node of the selected node that is unmarked as a satellite of that cluster. The component then loops to block 803 to select the next unmarked node. Each node that is the center of a cluster and all its satellite nodes comprise a cluster.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. Accordingly, the invention is not limited except as by the appended claims.

Claims (14)

1. A method in a computing device for presenting questions of a question search, the method comprising:
providing a collection of questions having terms, each question having a topic of one or more terms of the question and a focus of one or more terms of the question, the topics and focuses of the questions of the collection identified by generating a question tree of the questions of the collection and generating a cut of the question tree, a topic of a question comprising terms of the question before the cut and the focus of a question comprising terms of the question after the cut;
receiving a queried question;
identifying by the computing device questions of the collection that are relevant to the queried question;
providing a search result with identified questions of the collection that are relevant to the queried question, the identified questions of the search result being originally ranked based on relevance to the queried question;
re-ranking by the computing device the search result by:
generating topic clusters of the identified questions of the search result based on similarity between the topics of the identified questions and using a star clustering algorithm;
for each topic cluster of questions,
generating focus clusters of the identified questions within the topic cluster; and
ranking the focus clusters within the topic cluster based on the identified question within each focus cluster with the highest original rank; and
displaying as search result information organized based on the topic clusters and the ranked focus clusters within a topic cluster.
2. The method of claim 1 wherein the displayed information includes an indication of the topic of a topic cluster and the focus of a focus cluster.
3. The method of claim 2 wherein the displayed information includes the identified questions within each focus cluster.
4. The method of claim 1 wherein the displayed information includes answers to each identified question.
5. The method of claim 1 wherein the generating of a cluster includes:
establishing that pairs of identified questions are similar when the similarity between the identified questions satisfies a threshold similarity; and
repeatedly selecting an unmarked question that is similar to the greatest number of questions, marking the selected question as a center of a cluster, and marking each previously unmarked similar question as a satellite of the center of the cluster.
6. The method of claim 1 wherein the cut is based on minimum description length.
7. A computer-readable storage medium containing instructions for controlling a computing device to present questions of a question search, by a method comprising:
providing a collection of questions, each question having terms, each question having a topic and a focus, the topics and focuses of the questions of the collection identified by generating a question tree of the questions of the collection and generating a cut of the question tree, a topic of a question comprising terms of the question before the cut and the focus of a question comprising terms of the question after the cut;
receiving a queried question;
identifying questions of the collection that are relevant to the queried question;
generating an original ranking of the identified questions based on relevance to the queried question;
re-ranking the identified questions by:
generating topic clusters of the identified questions based on similarity between the topics of the questions and using a star clustering algorithm;
ranking the topic clusters based on the question within each topic cluster with the highest original ranking;
for each topic cluster of questions,
generating focus clusters of the questions within the topic cluster based on similarity between the focuses of the questions and using a star clustering algorithm; and
ranking the focus clusters within the topic cluster based on the question within each focus cluster with the highest original ranking; and
displaying the identified questions organized based on the topic clusters and focus clusters within a topic cluster.
8. The computer-readable storage medium of claim 7 including displaying an indication of the topic of a topic cluster and the focus of a focus cluster.
9. The computer-readable storage medium of claim 8 including displaying the questions within a focus cluster.
10. The computer-readable storage medium of claim 7 including displaying answers to a question.
11. The computer-readable storage medium of claim 7 wherein the generating of a cluster includes:
establishing that pairs of questions are similar when the similarity between the questions satisfies a threshold similarity; and
repeatedly selecting an unmarked question that is similar to the greatest number of questions, marking the selected question as a center of a cluster, and marking each previously unmarked similar question as a satellite of the center of the cluster.
12. The computer-readable storage medium of claim 7 wherein the topics and focuses of the questions of the collection are identified by generating a question tree of the questions of the collection and generating a cut of the question tree.
13. The computer-readable storage medium of claim 12 wherein each question is represented by a term chain within the question tree, the topic of a question comprises the terms before the cut of the term chain, and the focus of a question comprises the terms after the cut of the term chain.
14. A computing device for clustering questions of a question search, comprising:
a collection of questions having terms, each question having a topic of one or more terms of the question and a focus of one or more terms of the question, the topics and focuses of the questions of the collection identified by generating a question tree of the questions of the collection and generating a cut of the question tree, a topic of a question comprising terms of the question before the cut and the focus of a question comprising terms of the question after the cut;
a memory storing computer-executable instructions of:
a component that receives an identification of questions of the collection that are relevant to a queried question, the identified questions being originally ranked based on relevance to the queried question;
a component that re-ranks the identified questions by:
generating topic clusters of the identified questions based on similarity between the topics of the identified questions and using a star clustering algorithm; and
for each topic cluster of identified questions,
generating focus clusters of the identified questions within the topic cluster; and
ranking the focus clusters within the topic cluster based on the identified questions within each focus cluster with the highest original rank; and
a component that displays the identified questions based on the topic clusters and the ranking of the focus clusters within a topic cluster; and
a processor that executes the computer-executable instructions stored in the memory.
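The re-ranking component of claim 14 can be sketched as below. This is a sketch under stated assumptions rather than the claimed implementation: topic similarity is taken to be Jaccard overlap of topic terms, the star clustering step is a simplified greedy variant in which each question stays in the first star that covers it, focus clusters are formed by grouping identical focus term sets, and topic clusters are ordered by their best original rank. None of these choices is specified by the claim. All names (`Question`, `jaccard`, `star_clusters`, `rerank`) and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

@dataclass(frozen=True)
class Question:
    text: str
    topic: FrozenSet[str]
    focus: FrozenSet[str]
    rank: int  # original relevance rank, 1 = most relevant

def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed similarity measure: overlap of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def star_clusters(questions: List[Question], threshold: float) -> List[List[Question]]:
    """Greedy star clustering on a thresholded topic-similarity graph."""
    n = len(questions)
    neighbors = [
        {j for j in range(n)
         if j != i and jaccard(questions[i].topic, questions[j].topic) >= threshold}
        for i in range(n)
    ]
    unmarked = set(range(n))
    clusters = []
    while unmarked:
        # Star center: unmarked vertex of highest degree; ties go to the better rank.
        center = max(unmarked, key=lambda i: (len(neighbors[i]), -questions[i].rank))
        members = ({center} | neighbors[center]) & unmarked
        clusters.append([questions[i] for i in sorted(members)])
        unmarked -= members
    return clusters

def rerank(questions: List[Question], threshold: float = 0.5) -> List[List[List[Question]]]:
    """Return topic clusters, each a ranked list of focus clusters."""
    result = []
    for cluster in star_clusters(questions, threshold):
        by_focus: Dict[FrozenSet[str], List[Question]] = {}
        for q in cluster:
            by_focus.setdefault(q.focus, []).append(q)
        focus_clusters = [sorted(group, key=lambda q: q.rank) for group in by_focus.values()]
        # Rank focus clusters by their member with the highest original rank.
        focus_clusters.sort(key=lambda group: group[0].rank)
        result.append(focus_clusters)
    # Assumption: topic clusters are ordered by their best original rank.
    result.sort(key=lambda fcs: fcs[0][0].rank)
    return result

# Hypothetical search results, already ranked by relevance.
questions = [
    Question("cheap hotel in berlin", frozenset({"berlin", "hotel"}), frozenset({"cheap"}), 1),
    Question("hamburg museum opening hours", frozenset({"hamburg", "museum"}), frozenset({"opening"}), 2),
    Question("luxury hotel in berlin", frozenset({"berlin", "hotel"}), frozenset({"luxury"}), 3),
    Question("berlin hotel that is cheap", frozenset({"berlin", "hotel"}), frozenset({"cheap"}), 4),
    Question("hamburg museum tickets", frozenset({"hamburg", "museum"}), frozenset({"tickets"}), 5),
]
ranks = [[[q.rank for q in fc] for fc in tc] for tc in rerank(questions)]
```

For this sample, the two "berlin hotel" questions with the focus "cheap" (original ranks 1 and 4) are grouped ahead of the "luxury" question (rank 3), so the display order within a topic cluster follows the best-ranked question of each focus cluster rather than the flat original ranking.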
US12/185,702 2008-08-04 2008-08-04 Clustering question search results based on topic and focus Expired - Fee Related US8024332B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/185,702 US8024332B2 (en) 2008-08-04 2008-08-04 Clustering question search results based on topic and focus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/185,702 US8024332B2 (en) 2008-08-04 2008-08-04 Clustering question search results based on topic and focus

Publications (2)

Publication Number Publication Date
US20100030769A1 US20100030769A1 (en) 2010-02-04
US8024332B2 US8024332B2 (en) 2011-09-20

Family

ID=41609371

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/185,702 Expired - Fee Related US8024332B2 (en) 2008-08-04 2008-08-04 Clustering question search results based on topic and focus

Country Status (1)

Country Link
US (1) US8024332B2 (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150199417A1 (en) * 2014-01-10 2015-07-16 International Business Machines Corporation Seed selection in corpora compaction for natural language processing
US9571660B2 (en) 2014-10-10 2017-02-14 Avaya Inc. Conference call question manager
US9898554B2 (en) 2013-11-18 2018-02-20 Google Inc. Implicit question query identification
US9940384B2 (en) 2015-12-15 2018-04-10 International Business Machines Corporation Statistical clustering inferred from natural language to drive relevant analysis and conversation with users
US20180225365A1 (en) * 2017-02-08 2018-08-09 International Business Machines Corporation Dialog mechanism responsive to query context
US10216802B2 (en) 2015-09-28 2019-02-26 International Business Machines Corporation Presenting answers from concept-based representation of a topic oriented pipeline
US10380257B2 (en) 2015-09-28 2019-08-13 International Business Machines Corporation Generating answers from concept-based representation of a topic oriented pipeline
US10423613B2 (en) * 2013-12-20 2019-09-24 Hitachi, Ltd. Data search method and data search system
US10490094B2 (en) 2015-09-25 2019-11-26 International Business Machines Corporation Techniques for transforming questions of a question set to facilitate answer aggregation and display
US10503786B2 (en) 2015-06-16 2019-12-10 International Business Machines Corporation Defining dynamic topic structures for topic oriented question answer systems
US10839950B2 (en) 2017-02-09 2020-11-17 Cognoa, Inc. Platform and system for digital personalized medicine
US10874355B2 (en) 2014-04-24 2020-12-29 Cognoa, Inc. Methods and apparatus to determine developmental progress with artificial intelligence and user input
US20210271990A1 (en) * 2018-06-29 2021-09-02 Nippon Telegraph And Telephone Corporation Answer sentence selection device, method, and program
US11176444B2 (en) 2019-03-22 2021-11-16 Cognoa, Inc. Model optimization and data analysis using machine learning techniques
US11822588B2 (en) * 2018-10-24 2023-11-21 International Business Machines Corporation Supporting passage ranking in question answering (QA) system
US11972336B2 (en) 2015-12-18 2024-04-30 Cognoa, Inc. Machine learning platform and system for data analysis

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8027973B2 (en) 2008-08-04 2011-09-27 Microsoft Corporation Searching questions based on topic and focus
CN102063432A (en) 2009-11-12 2011-05-18 阿里巴巴集团控股有限公司 Retrieval method and retrieval system
JP2011138197A (en) * 2009-12-25 2011-07-14 Sony Corp Information processing apparatus, method of evaluating degree of association, and program
JP5556711B2 (en) * 2011-03-18 2014-07-23 富士通株式会社 Category classification processing apparatus, category classification processing method, category classification processing program recording medium, category classification processing system
US9659022B2 (en) * 2011-08-02 2017-05-23 International Business Machines Corporation File object browsing and searching across different domains
US9117194B2 (en) 2011-12-06 2015-08-25 Nuance Communications, Inc. Method and apparatus for operating a frequently asked questions (FAQ)-based system
CN103377240B (en) 2012-04-26 2017-03-01 阿里巴巴集团控股有限公司 Information providing method, processing server and merging server
US20140006406A1 (en) * 2012-06-28 2014-01-02 Aol Inc. Systems and methods for analyzing and managing electronic content
US20140030688A1 (en) * 2012-07-25 2014-01-30 Armitage Sheffield, Llc Systems, methods and program products for collecting and displaying query responses over a data network
US9015097B2 (en) 2012-12-19 2015-04-21 Nuance Communications, Inc. System and method for learning answers to frequently asked questions from a semi-structured data source
US9015162B2 (en) * 2013-01-25 2015-04-21 International Business Machines Corporation Integrating smart social question and answers enabled for use with social networking tools
US9213748B1 (en) * 2013-03-14 2015-12-15 Google Inc. Generating related questions for search queries
US9064001B2 (en) * 2013-03-15 2015-06-23 Nuance Communications, Inc. Method and apparatus for a frequently-asked questions portal workflow
US9690874B1 (en) * 2013-04-26 2017-06-27 Skopic, Inc. Social platform for developing information-networked local communities
US9336277B2 (en) * 2013-05-31 2016-05-10 Google Inc. Query suggestions based on search data
US9116952B1 (en) 2013-05-31 2015-08-25 Google Inc. Query refinements using search data
US20140358631A1 (en) * 2013-06-03 2014-12-04 24/7 Customer, Inc. Method and apparatus for generating frequently asked questions
US9146987B2 (en) * 2013-06-04 2015-09-29 International Business Machines Corporation Clustering based question set generation for training and testing of a question and answer system
US9230009B2 (en) 2013-06-04 2016-01-05 International Business Machines Corporation Routing of questions to appropriately trained question and answer system pipelines using clustering
US9965548B2 (en) * 2013-12-05 2018-05-08 International Business Machines Corporation Analyzing natural language questions to determine missing information in order to improve accuracy of answers
US9348900B2 (en) 2013-12-11 2016-05-24 International Business Machines Corporation Generating an answer from multiple pipelines using clustering
US9563688B2 (en) * 2014-05-01 2017-02-07 International Business Machines Corporation Categorizing users based on similarity of posed questions, answers and supporting evidence
US11182442B1 (en) * 2014-10-30 2021-11-23 Intuit, Inc. Application usage by selecting targeted responses to social media posts about the application
US10366107B2 (en) * 2015-02-06 2019-07-30 International Business Machines Corporation Categorizing questions in a question answering system
US9996604B2 (en) 2015-02-09 2018-06-12 International Business Machines Corporation Generating usage report in a question answering system based on question categorization
US10795921B2 (en) 2015-03-27 2020-10-06 International Business Machines Corporation Determining answers to questions using a hierarchy of question and answer pairs
EP3411244A4 (en) * 2016-02-04 2019-09-04 Global Safety Management, Inc. System for creating safety data sheets
US9558265B1 (en) * 2016-05-12 2017-01-31 Quid, Inc. Facilitating targeted analysis via graph generation based on an influencing parameter
US9836183B1 (en) * 2016-09-14 2017-12-05 Quid, Inc. Summarized network graph for semantic similarity graphs of large corpora
RU2747425C2 (en) * 2016-10-24 2021-05-04 Конинклейке Филипс Н.В. Real-time answer system to questions from different fields of knowledge
CN106777236B (en) * 2016-12-27 2020-11-03 北京百度网讯科技有限公司 Method and device for displaying query result based on deep question answering
US11048878B2 (en) * 2018-05-02 2021-06-29 International Business Machines Corporation Determining answers to a question that includes multiple foci
US10909180B2 (en) 2019-01-11 2021-02-02 International Business Machines Corporation Dynamic query processing and document retrieval
US10949613B2 (en) 2019-01-11 2021-03-16 International Business Machines Corporation Dynamic natural language processing
WO2021227059A1 (en) * 2020-05-15 2021-11-18 深圳市世强元件网络有限公司 Multi-way tree-based search word recommendation method and system
US11308287B1 (en) * 2020-10-01 2022-04-19 International Business Machines Corporation Background conversation analysis for providing a real-time feedback
CN114372215B (en) * 2022-01-12 2023-07-14 抖音视界有限公司 Search result display and search request processing method and device

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6028601A (en) 1997-04-01 2000-02-22 Apple Computer, Inc. FAQ link creation between user's questions and answers
US6665666B1 (en) 1999-10-26 2003-12-16 International Business Machines Corporation System, method and program product for answering questions using a search engine
US20020111934A1 (en) * 2000-10-17 2002-08-15 Shankar Narayan Question associated information storage and retrieval architecture using internet gidgets
US20020087520A1 (en) * 2000-12-15 2002-07-04 Meyers Paul Anthony Appartus and method for connecting experts to topic areas
US7349899B2 (en) * 2001-07-17 2008-03-25 Fujitsu Limited Document clustering device, document searching system, and FAQ preparing system
US6804670B2 (en) 2001-08-22 2004-10-12 International Business Machines Corporation Method for automatically finding frequently asked questions in a helpdesk data set
US20060136455A1 (en) 2001-10-12 2006-06-22 Microsoft Corporation Clustering Web Queries
US7231384B2 (en) * 2002-10-25 2007-06-12 Sap Aktiengesellschaft Navigation tool for exploring a knowledge base
US20040249808A1 (en) 2003-06-06 2004-12-09 Microsoft Corporation Query expansion using query logs
US20060078862A1 (en) 2004-09-27 2006-04-13 Kabushiki Kaisha Toshiba Answer support system, answer support apparatus, and answer support program
US20070005566A1 (en) 2005-06-27 2007-01-04 Make Sence, Inc. Knowledge Correlation Search Engine
US20080228738A1 (en) * 2005-12-13 2008-09-18 Wisteme, Llc Web based open knowledge system with user-editable attributes
WO2007108788A2 (en) 2006-03-13 2007-09-27 Answers Corporation Method and system for answer extraction
US20080005075A1 (en) 2006-06-28 2008-01-03 Microsoft Corporation Intelligently guiding search based on user dialog
WO2008022150A2 (en) 2006-08-14 2008-02-21 Inquira, Inc. Method and apparatus for identifying and classifying query intent
US20080288454A1 (en) * 2007-05-16 2008-11-20 Yahoo! Inc. Context-directed search

Non-Patent Citations (17)

* Cited by examiner, † Cited by third party
Title
"WordNet: An Electric Lexical Data Base," Princeton University 2006, http://wordnet.princeton.edu, [Internet accessed May 15, 2008].
Aslam et al., "The Star Clustering Algorithm for Static and Dynamic Information Organization," Journal of Graph Algorithms and Applications, vol. 8, No. 1, 2004, pp. 95-129.
Burke et al., "Question Answering from Frequently-Asked Question Files: Experiences with the FAQ Finder System," The University of Chicago, Technical Report TR-97-05, Jun. 1997, pp. 1-37.
Cao et al., "Base Noun Phrase Translation Using Web Data and the EM Algorithm," International Conference on Computational Linguistics, 2002, 7 pages.
Fredkin, "Trie Memory," Communication of the ACM, vol. 3, Issue 9, Sep. 1960, pp. 490-499.
Jeon et al., "Finding Semantically Similar Questions Based on Their Answers," SIGIR'05, Aug. 15-19, 2005, Salvador, Brazil 84-90 pages. *
Jeon et al., "Finding Similar Questions in Large Question and Answer Archives," CIKM'05, Oct. 31-Nov. 5, 2005, Bremen, Germany, pp. 84-90.
Lai et al., "FAQ Mining via List Detection," International Conference on Computational Linguistics, 2002, pp. 1-7.
Li et al., "Generalizing Case Frames Using a Thesaurus and the MDL Principle," Computational Linguistics vol. 24, No. 2, 1998, pp. 217-244.
Lita et al., "Instance-Based Question Answering: A Data-Driven Approach," Association for Computational Linguistics, ACL Jul. 21-26, 2004, 8 pages.
Ponte et al., "A Language Modeling Approach to Information Retrieval," ACM SIGIR, 1998, pp. 275-281.
Rissanen, "Modeling by shortest data description," Automatica, vol. 14, 1978, pp. 465-471.
Sneiders, "Automated question answering using question templates that cover the conceptual model of the database," In Proc. of the 6th International Conference on Applications of Natural Language to Information Systems, 2002, pp. 235-239.
Wang et al., "Learn from Web Search Logs to Organize Search Results," SIGIR'07, Jul. 23-27, 2007, Amsterdam, The Netherlands, 8 pages.
Wen et al., "Clustering User Queries of a Search Engine," WWW10, May 1-5, 2001, Hong Kong, pp. 162-168.
Zamir et al., "Grouper: A Dynamic Clustering Interface to Web Search Result," The International Journal of Computer and Telecommunications Meeting, vol. 31, Issue 11-16, May 1999, 15 pages.
Zeng et al., "Learning to Cluster Web Search Results," SIGIR'04, Jul. 25-29, Sheffield, South Yorkshire, 8 pages.

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9898554B2 (en) 2013-11-18 2018-02-20 Google Inc. Implicit question query identification
US10423613B2 (en) * 2013-12-20 2019-09-24 Hitachi, Ltd. Data search method and data search system
US20150199417A1 (en) * 2014-01-10 2015-07-16 International Business Machines Corporation Seed selection in corpora compaction for natural language processing
US10210156B2 (en) * 2014-01-10 2019-02-19 International Business Machines Corporation Seed selection in corpora compaction for natural language processing
US10874355B2 (en) 2014-04-24 2020-12-29 Cognoa, Inc. Methods and apparatus to determine developmental progress with artificial intelligence and user input
US9571660B2 (en) 2014-10-10 2017-02-14 Avaya Inc. Conference call question manager
US10503786B2 (en) 2015-06-16 2019-12-10 International Business Machines Corporation Defining dynamic topic structures for topic oriented question answer systems
US10558711B2 (en) 2015-06-16 2020-02-11 International Business Machines Corporation Defining dynamic topic structures for topic oriented question answer systems
US10490094B2 (en) 2015-09-25 2019-11-26 International Business Machines Corporation Techniques for transforming questions of a question set to facilitate answer aggregation and display
US10216802B2 (en) 2015-09-28 2019-02-26 International Business Machines Corporation Presenting answers from concept-based representation of a topic oriented pipeline
US10380257B2 (en) 2015-09-28 2019-08-13 International Business Machines Corporation Generating answers from concept-based representation of a topic oriented pipeline
US9940384B2 (en) 2015-12-15 2018-04-10 International Business Machines Corporation Statistical clustering inferred from natural language to drive relevant analysis and conversation with users
US11972336B2 (en) 2015-12-18 2024-04-30 Cognoa, Inc. Machine learning platform and system for data analysis
US20180225365A1 (en) * 2017-02-08 2018-08-09 International Business Machines Corporation Dialog mechanism responsive to query context
US10740373B2 (en) * 2017-02-08 2020-08-11 International Business Machines Corporation Dialog mechanism responsive to query context
US10839950B2 (en) 2017-02-09 2020-11-17 Cognoa, Inc. Platform and system for digital personalized medicine
US10984899B2 (en) 2017-02-09 2021-04-20 Cognoa, Inc. Platform and system for digital personalized medicine
US20210271990A1 (en) * 2018-06-29 2021-09-02 Nippon Telegraph And Telephone Corporation Answer sentence selection device, method, and program
US12026632B2 (en) * 2018-06-29 2024-07-02 Nippon Telegraph And Telephone Corporation Response phrase selection device and method
US11822588B2 (en) * 2018-10-24 2023-11-21 International Business Machines Corporation Supporting passage ranking in question answering (QA) system
US11176444B2 (en) 2019-03-22 2021-11-16 Cognoa, Inc. Model optimization and data analysis using machine learning techniques
US11862339B2 (en) 2019-03-22 2024-01-02 Cognoa, Inc. Model optimization and data analysis using machine learning techniques

Also Published As

Publication number Publication date
US20100030769A1 (en) 2010-02-04

Similar Documents

Publication Publication Date Title
US8024332B2 (en) Clustering question search results based on topic and focus
US8027973B2 (en) Searching questions based on topic and focus
US7664735B2 (en) Method and system for ranking documents of a search result to improve diversity and information richness
US8250067B2 (en) Adding dominant media elements to search results
US7293007B2 (en) Method and system for identifying image relatedness using link and page layout analysis
US7529735B2 (en) Method and system for mining information based on relationships
US7249135B2 (en) Method and system for schema matching of web databases
EP1225517B1 (en) System and methods for computer based searching for relevant texts
US7877384B2 (en) Scoring relevance of a document based on image text
US8112269B2 (en) Determining utility of a question
US8571850B2 (en) Dual cross-media relevance model for image annotation
US7698332B2 (en) Projecting queries and images into a similarity space
US20080313142A1 (en) Categorization of queries
US20080027910A1 (en) Web object retrieval based on a language model
US20090043764A1 (en) Augmenting a training set for document categorization
US20050256832A1 (en) Method and system for ranking objects based on intra-type and inter-type relationships
Knaus et al. Highlighting relevant passages for users of the interactive SPIDER retrieval system
US7774340B2 (en) Method and system for calculating document importance using document classifications
Varadarajan et al. Beyond single-page web search results
Divya et al. Onto-search: An ontology based personalized mobile search engine
Liu et al. Discovering business intelligence information by comparing company Web sites
Varnaseri et al. The assessment of the effect of query expansion on improving the performance of scientific texts retrieval in Persian
Yamamoto et al. Extracting adjective facets from community Q&A corpus
Priyambiga et al. Diverse Relevance Ranking in Web Scrapping for Multimedia Answering
Daum III et al. Web search intent induction via search results partitioning

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CAO, YUNBO;LIN, CHIN-YEW;REEL/FRAME:022015/0495

Effective date: 20081006

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

ZAAA Notice of allowance and fees due

Free format text: ORIGINAL CODE: NOA

ZAAB Notice of allowance mailed

Free format text: ORIGINAL CODE: MN/=.

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034564/0001

Effective date: 20141014

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20230920