WO2009078887A1 - System and method for categorizing answers such as urls - Google Patents

System and method for categorizing answers such as urls

Info

Publication number
WO2009078887A1
Authority
WO
Grant status
Application
Patent type
Prior art keywords
computer system
category
answers
request
plurality
Application number
PCT/US2008/004495
Other languages
French (fr)
Inventor
Alessio Signorini
Alessandro Arzilli
Apostolos Gerasoulis
Antonino Gulli
Maurizio Sambati
Original Assignee
Iac Search & Media, Inc.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G06F17/30864 Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems
    • G06F17/30867 Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems with filtering and personalisation
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705 Clustering or classification
    • G06F17/30707 Clustering or classification into predefined classes

Abstract

The invention provides a computer system, including a memory, a plurality of answers stored in the memory, and a classifier for matching each one of the answers with a category among a plurality of categories.

Description

SYSTEM AND METHOD FOR CATEGORIZING ANSWERS SUCH AS URLS

BACKGROUND OF THE INVENTION

[0001] The internet is often used to obtain information regarding news, businesses, events, movies, etc. in a specific geographic area. A user interface is typically stored on a server computer system and transmitted over the internet to a client computer system. The user interface typically has a search box for entering a text search query. A user can then select a search button to transmit a search request from the client computer system to the server computer system. The server computer system then compares the text with data in a database or data source and extracts information based on the text from the database or data source. The information includes uniform resource locators (URLs) or other answers pertaining to the text search query. The information is then transmitted from the server computer system to the client computer system for display at the client computer system.

SUMMARY OF THE INVENTION

[0002] The invention provides a computer system, including a memory, a plurality of answers stored in the memory, and a classifier for matching each one of the answers with a category among a plurality of categories.

[0003] The computer system may further include a correlator utilizing the answer to extract at least one data query corresponding to the answer, the classifier matching the data query with the category.

[0004] The correlator may match the answer to the category according to the method including extracting a related query corresponding to the data query, and matching the related query to the category.

[0005] The correlator may match the answer to the category according to the method including extracting a plurality of related queries corresponding to the data query, and the categorizer matching each related query to a category, further including a statistical tool determining the relevance of each category.

[0006] The answers may be received over a network before storing the answers.

[0007] The answers for more frequently used categories may be updated more often than answers for categories used less often.

[0008] One of the categories may be a spam category, and answers in the spam category may not be downloaded.

[0009] The computer system may further include an indexer indexing the answers received over the network into indexed answers, the indexed answers being stored in the memory.

[0010] The indexer may index the answers into the categories.

[0011] The computer system may further include a search engine receiving a request from a client computer system at a server computer system and, in response to the request, transmitting a view from the server computer system to the client computer system for display at the client computer system, contents of the view being at least partially based on one selected category of the categories.

[0012] The request may be a search request, the classifier matching the request with a category among the plurality of categories, and associating at least one of a plurality of answers with the request due to association of the request and the answer with the selected category.

[0013] The view may include different category areas, answers belonging to different categories being located in the respective category areas.

[0014] The category may be used to select the answer based on a media type of the answer.

[0015] The category may be used to select the answer based on a freshness of the answer.

[0016] The computer system may further include a correlator extracting a related query corresponding to the request, wherein the classifier matches the request with a category by matching the related query to the category.

[0017] The correlator may match the request to the category according to the method including extracting a plurality of related queries corresponding to the request, the categorizer matching each related query to a category, further including a statistical tool determining the relevance of each category.

[0018] The search engine may transmit a first view from a server computer system to the client computer system, the first view including a search identifier, the search engine receiving a search request from a client computer system at the server computer system and utilizing the search request at the server computer system to extract at least one search result from the answers, and transmitting at least part of a second view from the server computer system to the client computer system for display at the client computer system, wherein the second view includes the search result.

[0019] An advertisement may be selected among a plurality of advertisements based on the selected category.

[0020] The request may be a browsing request based on the selected category selected at the client computer system among at least a subset of the categories.

[0021] The invention also provides a computer method, including storing a plurality of answers in memory of a computer, and matching each one of a plurality of the answers with a category among a plurality of categories.

[0022] In the method, each answer may be matched with a category according to a method including utilizing the answer to extract at least one data query corresponding to the answer, and matching the data query with the category.

[0023] The method of matching the answer to the category may further include extracting a related query corresponding to the data query, and matching the related query to the category.

[0024] The invention also provides a computer-readable medium having stored thereon a set of instructions which, when executed by a processor of a computer, executes the method including storing a plurality of answers in memory of a computer, and matching each one of a plurality of the answers with a category among a plurality of categories.

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] The invention is further described by way of example with reference to the accompanying drawings, wherein:

[0026] Figure 1 is a block diagram of a network environment in which a user interface according to an embodiment of the invention may find application;

[0027] Figure 2 is a flowchart illustrating how the network environment is used to search and find information;

[0028] Figure 3 is a block diagram of a client computer system forming part of the network environment, but may also be a block diagram of a computer in a server computer system forming part of the network environment;

[0029] Figure 4 is a block diagram illustrating a Query-to-Pick and Pick-to-Query correlation;

[0030] Figure 5 is a block diagram illustrating a classifier learning phase;

[0031] Figure 6 is a block diagram illustrating a Query-to-Query correlation;

[0032] Figure 7 is a block diagram illustrating a URL (Answer) classification;

[0033] Figure 8 is a block diagram illustrating a query classification;

[0034] Figure 9 is a screenshot showing a view wherein a URL and query classification have been used to return results;

[0035] Figure 9A illustrates that answers for different categories are located at different physical areas on a results page;

[0036] Figure 10 is a block diagram illustrating integration of the subsystems of Figures 7 and 8;

[0037] Figure 11 is a block diagram illustrating how a profile of a client computer system can be used to further improve search results;

[0038] Figure 12 is a diagram illustrating how an advertising engine utilizes categories to select an advertisement or advertisements;

[0039] Figure 13 is a block diagram illustrating how a crawler can make use of categories to download certain answers at a higher rate, and can make use of the categories to store the answers in different categories or partitions;

[0040] Figure 14 is a block diagram illustrating how a profile is created for each category;

[0041] Figure 15A illustrates how a page is ranked in different categories using traditional ranking as herein described; and

[0042] Figure 15B illustrates how a page is ranked higher in some categories than in others, utilizing a modified ranking system as herein described.

DETAILED DESCRIPTION OF THE INVENTION

[0043] Figure 1 of the accompanying drawings illustrates a network environment 10 that includes a user interface 12, according to an embodiment of the invention, including the internet 14A, 14B, and 14C, a server computer system 16, a plurality of client computer systems 18, and a plurality of remote sites 20.

[0044] The server computer system 16 has stored thereon a crawler 19, a collected data store 21, an indexer 22, a plurality of search databases 24, a plurality of structured databases and data sources 26, a search engine 28, and the user interface 12. The novelty of the present invention revolves around the user interface 12, the search engine 28, and one or more of the structured databases and data sources 26. The crawler 19 is connected over the internet 14A to the remote sites 20. The collected data store 21 is connected to the crawler 19, and the indexer 22 is connected to the collected data store 21. The search databases 24 are connected to the indexer 22. The search engine 28 is connected to the search databases 24 and the structured databases and data sources 26. The client computer systems 18 are located at respective client sites and are connected over the internet 14B and the user interface 12 to the search engine 28.

[0045] Reference is now made to Figures 1 and 2 in combination to describe the functioning of the network environment 10. The crawler 19 periodically accesses the remote sites 20 over the internet 14A (step 30). The crawler 19 collects data from the remote sites 20 and stores the data in the collected data store 21 (step 32). The indexer 22 indexes the data in the collected data store 21 and stores the indexed data in the search databases 24 (step 34). The search databases 24 may, for example, be a "Web" database, a "News" database, a "Blogs & Feeds" database, an "Images" database, etc. The structured databases and data sources 26 are licensed from third-party providers and may, for example, include an encyclopedia, a dictionary, maps, a movies database, etc.

[0046] A user at one of the client computer systems 18 accesses the user interface 12 over the internet 14B (step 36). The user can enter a search query in a search box in the user interface 12, and either hit "Enter" on a keyboard or select a "Search" button or a "Go" button of the user interface 12 (step 38). The search engine 28 then uses the search query to parse the search databases 24 or the structured databases and data sources 26. In the example where a "Web" search is conducted, the search engine 28 parses the search database 24 having general Internet Web data (step 40). Various technologies exist for comparing or using a search query to extract data from databases, as will be understood by a person skilled in the art.

[0047] The search engine 28 then transmits the extracted data over the internet 14B to the client computer system 18 (step 42). The extracted data typically includes URL links to one or more of the remote sites 20. The user at the client computer system 18 can select one of the links to the remote sites 20 and access the respective remote site 20 over the internet 14C (step 44). The server computer system 16 has thus assisted the user at the respective client computer system 18 to find or select one of the remote sites 20 that have data pertaining to the query entered by the user.

[0048] Figure 3 shows a diagrammatic representation of a machine in the exemplary form of one of the client computer systems 18 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a network deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. The server computer system 16 of Figure 1 may also include one or more machines as shown in Figure 3.

[0049] The exemplary client computer system 18 includes a processor 130 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 132 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), and a static memory 134 (e.g., flash memory, static random access memory (SRAM), etc.), which communicate with each other via a bus 136.

[0050] The client computer system 18 may further include a video display 138 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The client computer system 18 also includes an alpha-numeric input device 140 (e.g., a keyboard), a cursor control device 142 (e.g., a mouse), a disk drive unit 144, a signal generation device 146 (e.g., a speaker), and a network interface device 148.

[0051] The disk drive unit 144 includes a machine-readable medium 150 on which is stored one or more sets of instructions 152 (e.g., software) embodying any one or more of the methodologies or functions described herein. The software may also reside, completely or at least partially, within the main memory 132 and/or within the processor 130 during execution thereof by the client computer system 18, the memory 132 and the processor 130 also constituting machine readable media. The software may further be transmitted or received over a network 154 via the network interface device 148.

[0052] While the instructions 152 are shown in an exemplary embodiment to be on a single medium, the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database or data source and/or associated caches and servers) that store the one or more sets of instructions. The term "machine-readable medium" shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention. The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals.

Query-to-Pick and Pick-to-Query Correlation

[0053] Figure 4 illustrates how queries and picks are associated with one another with the use of a correlator 160 connected to the search engine 28. A query-to-pick (Q2P) correlation associates a query with a pick. When multiple independent users make the same association, that is a correlation candidate. When the search engine 28 returns a result in response to a query and a user picks that result, this is a special case of such a correlation (Q2RP). In effect, the search engine algorithm replaces a second independent user. In accordance with one embodiment of the invention, the Q2P correlation associates a query with all picks in a user session. This is in contrast to prior art schemes that terminated association of a given query with picks upon issuance of a subsequent query.

[0054] With Q2P, all picks recorded during a user session are associated with a given query issued during that user session. For one embodiment, a score is assigned to each association, based upon various factors, including the time between query and pick, the number of intervening queries and/or picks, and the order of queries with respect to picks.

[0055] In addition, each association's score can be adjusted based upon well-known factors, including rank of the pick in the result list at the time of association, duration of the pick (interval until next known user action), age or order of the association (relative to older or newer associations), and age of the first known instance of association.
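A minimal Python sketch of Q2P association, assuming a simplified search log in which each row is either a query or a pick. The `LogEntry` type, the 30-minute `SESSION_GAP`, and the weights inside `association_score` are hypothetical stand-ins for the factors listed above (time between query and pick, intervening actions, order of query relative to pick); they are not the scoring used by the described system. The point illustrated is that every query in a session is paired with every pick in that session, not only with the picks that immediately follow it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class LogEntry:
    """Hypothetical search-log row: either a query or a pick (clicked URL)."""
    time: datetime
    user: str
    query: Optional[str] = None
    pick: Optional[str] = None

# Assumed idle gap that ends a session; the text only asks for a "reasonable" limit.
SESSION_GAP = timedelta(minutes=30)

def split_sessions(entries):
    """Group log entries into per-user sessions, breaking on long idle gaps."""
    sessions, current = [], []
    for entry in sorted(entries, key=lambda e: (e.user, e.time)):
        if current and (entry.user != current[-1].user
                        or entry.time - current[-1].time > SESSION_GAP):
            sessions.append(current)
            current = []
        current.append(entry)
    if current:
        sessions.append(current)
    return sessions

def association_score(q_idx, p_idx, session):
    """Hypothetical score that decays with elapsed time and intervening actions."""
    elapsed = abs((session[p_idx].time - session[q_idx].time).total_seconds())
    intervening = abs(p_idx - q_idx) - 1
    order_weight = 1.0 if p_idx > q_idx else 0.5  # picks after the query count more
    return order_weight / (1.0 + elapsed / 600.0) / (1.0 + intervening)

def q2p_associations(entries):
    """Associate every query in a session with every pick in the same session."""
    scores = {}
    for session in split_sessions(entries):
        for qi, q_entry in enumerate(session):
            if q_entry.query is None:
                continue
            for pi, p_entry in enumerate(session):
                if p_entry.pick is None or pi == qi:
                    continue
                key = (q_entry.query, p_entry.pick)
                scores[key] = scores.get(key, 0.0) + association_score(qi, pi, session)
    return scores
```

For a session "Q1, pick P1, Q2, pick P2", this yields an association for (Q1, P2) even though Q2 intervened, and also for (Q2, P1) even though P1 was picked before Q2 was issued.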

[0056] In principle, a user session can be of unlimited duration. In a practical application, a reasonable time limit, or limit on intervening actions, should be imposed beyond which no relationship between picks and queries will be assigned. Alternatively or additionally, an interruption of sufficient duration can indicate a break in sessions. A search log excerpt, in accordance with one embodiment of the invention, is shown below as Table 1. In various alternative embodiments, other items could be captured in the search log, but they are omitted here for clarity:

Table 1 (* = query with no associated pick)

[0057] Table 1A illustrates a tabulation of the click information contained in Table 1 in accordance with an embodiment of the invention. For comparison, Table 1B illustrates a tabulation of the click information contained in Table 1 in accordance with a typical prior art scheme employing a Q2RP correlation:

Table 1A (Q2P Results)

Table 1B (Q2RP Results of Prior Art)

[0058] Because numerous factors can vary or penalize the scores, we will assume that 1 pick = a score increment of +1, except in the following penalization situations, where we will assume that the pick represents a score increment of 0. Assuming a time threshold, the click in row 103 is penalized in both tabulations because the user spent a very short time at the URL. Assuming daily database batch updates, the click in row 203 would typically be penalized by the prior art tabulation of Table 1B as a duplicate of click 201. The clicks in rows 203 and 402 are penalized by the tabulation, in accordance with an embodiment of the invention, as duplicates of click 201.
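The +1/0 convention above can be sketched as a single tabulation pass. This is an illustration only: the 5-second `MIN_DWELL_SECONDS` threshold and the rule that a repeated (user, query, pick) triple counts as a duplicate are assumptions, since the text specifies neither the time threshold nor exactly how duplicates are detected from session data.

```python
from collections import defaultdict

# Hypothetical dwell threshold; the text only says "a very short time at the URL".
MIN_DWELL_SECONDS = 5

def tabulate(associations):
    """Count query-pick associations: +1 per pick, 0 for penalized picks.

    `associations` is an iterable of (user, query, pick, dwell_seconds) tuples.
    A pick is penalized (scores 0) when the dwell time is below the threshold,
    or when the same user has already been credited with the same pair.
    """
    table = defaultdict(int)
    credited = set()
    for user, query, pick, dwell in associations:
        if dwell < MIN_DWELL_SECONDS:
            continue  # too short a visit to count
        if (user, query, pick) in credited:
            continue  # duplicate association from the same user
        credited.add((user, query, pick))
        table[(query, pick)] += 1
    return dict(table)
```

With this rule, two clicks by one user on the same result for the same query count once, while the same association made by a second user adds another point.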

[0059] For Query Q1, URL P1, which was never clicked immediately after Q1, has garnered a high score in the tabulation, in accordance with an embodiment of the invention, because multiple users chose it before or after (though not immediately after) issuing Query Q1. The whole matrix of scores for the tabulation, in accordance with an embodiment of the invention, is richer, as many more associations are noted. Some scores, such as that for Q2P4, are lower, due to the retention of session data indicating that all the clicks came from a single user, permitting the identification of more duplicates.

[0060] In practical applications of Q2P, we can retain the distinction as to whether a particular association was Q2RP or non-Q2RP. A single, uncorrelated non-Q2RP click (such as Q3P1 in the table) may not inspire enough confidence to release the result to users, whereas for a single, uncorrelated Q2RP click, the association is reinforced by the fact that the search engine presented the result for the original search.

[0061] A pick-to-query (P2Q) correlation associates all queries recorded during a user session that are correlated with a given pick issued during that user session. The search log excerpt of Table 1 illustrates the output of P2Q correlation. That is, the same data generated for Q2P can be re-indexed for P2Q.
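Re-indexing the same Q2P data for P2Q amounts to inverting the association table so that it is keyed by pick instead of by query. A minimal sketch, assuming the Q2P scores are held in a `{(query, pick): score}` dictionary (a hypothetical layout chosen for illustration):

```python
from collections import defaultdict

def invert_to_p2q(q2p_scores):
    """Re-index {(query, pick): score} as {pick: {query: score}}."""
    p2q = defaultdict(dict)
    for (query, pick), score in q2p_scores.items():
        p2q[pick][query] = score
    return dict(p2q)

def top_queries(p2q, pick, n=3):
    """Return up to n (query, score) pairs for a pick, highest score first."""
    return sorted(p2q.get(pick, {}).items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Because no new data is collected, the P2Q view stays consistent with the Q2P view by construction.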

[0062] Further details of Q2P and P2Q are described in United States Patent No. 7,181,447, which is incorporated herein by reference in its entirety.

Classifier Learning Phase

[0063] Figure 5 shows the learning phase of a classifier 162. For each category, editors collect a large number of related documents 164 and store the documents in a repository 166. The classifier 162 reads the related documents 164 and learns to recognize their important features. Important features may, for example, be identified when a particular word appears in a large percentage of the documents 164 for a particular category.
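The feature rule described above (a word is important for a category if it appears in a large percentage of that category's documents) can be sketched with per-category document frequencies. The 50% `MIN_DOC_FRACTION` threshold, the lowercase word tokenizer, and the overlap score in `classify` are assumptions made for illustration; this is a simple stand-in, not the classifier 162 itself.

```python
import re
from collections import Counter

# Hypothetical threshold: a word is a feature of a category if it appears in
# at least this fraction of that category's training documents.
MIN_DOC_FRACTION = 0.5

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def learn_features(repository):
    """Map each category to the words found in a large share of its documents.

    `repository` maps category name -> list of document strings.
    """
    features = {}
    for category, docs in repository.items():
        doc_freq = Counter()
        for doc in docs:
            doc_freq.update(tokenize(doc))  # each word counted once per document
        features[category] = {
            word for word, n in doc_freq.items() if n / len(docs) >= MIN_DOC_FRACTION
        }
    return features

def classify(text, features):
    """Return {category: fraction of that category's features present in text}."""
    words = tokenize(text)
    return {
        category: (len(words & feats) / len(feats) if feats else 0.0)
        for category, feats in features.items()
    }
```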

Query-to-Query Correlation

[0064] Figure 6 illustrates how queries are associated with one another with the use of the same correlator 160 connected to the search engine 28. A query-to-query (Q2Q) correlation associates all queries issued during a user session with all other queries issued during that session. For one embodiment, a score may be assigned to each association based upon various factors, including the time between queries, the number of intervening queries and/or picks, age or order of the association (relative to older or newer associations), whether or not the query results generated picks, and the pair-wise order of the associated queries, among others.

[0065] Determining whether the query results generated picks, as well as the pair-wise order of the associated queries, can be particularly informative, as they can indicate whether one query is a "correction" of another. For any practical application, it is useful to know which of two associated queries is an error, and which is a correction.

[0066] A search log excerpt, in accordance with one embodiment of the invention, is shown below as Table 2. Only the query portion of the search log is required to create a Q2Q table:

Row  Timestamp        User ID  Query
101  1/1/03 00:00:00  U1       Q1
102  1/1/03 00:01:00           Q2
103  1/1/03 00:02:00
104  1/1/03 00:02:05
201  1/2/03 00:00:00  U2       Q2
202  1/2/03 00:01:00
203  1/2/03 00:02:00
204  1/2/03 00:04:00           Q1
205  1/2/03 00:04:05
301  1/3/03 00:00:00  U3       Q3
302  1/3/03 00:04:00           Q2
303  1/3/03 02:00:00           Q3
401  1/4/03 00:00:00  U2       Q1
402  1/4/03 00:06:00           Q2

Table 2

[0067] Table 2A illustrates a tabulation of the click information contained in Table 2 in accordance with an embodiment of the invention (assuming the order of queries issued is ignored):

Table 2A (Q2Q Results)

[0068] The lower triangular area of Table 2A can be used to retain the pair-wise query order information, avoiding double-booking cases like rows 301-303.

[0069] As noted above, a scoring scheme may be employed in which numerous factors can vary or penalize the score. For example, duplicates (e.g., the associations in rows 101 and 102 and those made in rows 401 and 402) could be penalized. Or, for example, an uncorrelated Q2Q association, like Q2Q3, would not inspire enough confidence to release the result to users.
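A minimal sketch of Q2Q counting over the query column of Table 2, assuming each user-day block of rows is one session. It keeps two tables: an unordered count (order ignored, as in Table 2A) and an ordered count that plays the role of the lower triangle retaining pair-wise order. Each pair is counted at most once per session, so a session such as Q3, Q2, Q3 does not double-book the {Q2, Q3} association. The `min_sessions=2` release threshold is a hypothetical rule for withholding uncorrelated associations such as Q2Q3; no duplicate-user penalty is applied, and the counts are not claimed to match Table 2A.

```python
from collections import Counter

def q2q_tables(sessions):
    """Build Q2Q co-occurrence tables from per-session query lists.

    Returns (unordered, ordered):
      unordered[frozenset({a, b})] -> number of sessions in which a and b co-occur
      ordered[(a, b)]              -> number of sessions in which a preceded b
    """
    unordered, ordered = Counter(), Counter()
    for queries in sessions:
        pairs = set()
        for i, earlier in enumerate(queries):
            for later in queries[i + 1:]:
                if earlier != later:
                    pairs.add((earlier, later))
        ordered.update(pairs)
        unordered.update({frozenset(pair) for pair in pairs})
    return unordered, ordered

def released(unordered, min_sessions=2):
    """Withhold associations seen in fewer than `min_sessions` sessions."""
    return {pair for pair, n in unordered.items() if n >= min_sessions}

# Query column of Table 2, one list per user-day block (assumed to be one session each).
TABLE_2_SESSIONS = [
    ["Q1", "Q2"],        # rows 101-104, user U1
    ["Q2", "Q1"],        # rows 201-205, user U2
    ["Q3", "Q2", "Q3"],  # rows 301-303, user U3
    ["Q1", "Q2"],        # rows 401-402, user U2
]
unordered, ordered = q2q_tables(TABLE_2_SESSIONS)
```

Under these assumptions {Q1, Q2} is seen in three sessions and is released, while {Q2, Q3} comes from a single session and is withheld.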

URL (Answer) Categorization

[0070] Figure 7 illustrates categorization of a plurality of answers in the form of URLs of the search database 24 in Figure 1 stored in the memory 132 in Figure 3.

[0071] The same correlator 160 used in Figure 4 utilizes each URL to extract a plurality of data queries corresponding to the URL, using P2Q as described with reference to Figure 4. The same correlator 160 then utilizes each data query to extract a plurality of related queries corresponding to the respective data query, using Q2Q as described with reference to Figure 6.

[0072] The classifier 162 then matches the data query and each one of the related queries with a respective category utilizing the features identified for each category. A statistical tool 164 is then used to extract the most likely category among all the categories utilizing interpolation of the categories.

The classifier thus matches each one of the URLs with a category among a plurality of categories.
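The Figure 7 pipeline (URL to data queries via P2Q, data queries to related queries via Q2Q, classification of every query, then aggregation) can be sketched as a composition of lookups. The dictionary layouts, the `classify` callable, and the toy data in the usage example are assumptions; the confidences are borrowed from Table 3 but the Q2Q links between them are invented. The text says the statistical tool uses "interpolation of the categories"; plain summation of confidences is used here only as a stand-in.

```python
from collections import defaultdict

def categorize_url(url, p2q, q2q, classify):
    """Sketch of Figure 7: URL -> data queries -> related queries -> category.

    p2q: {url: [data queries]}; q2q: {query: [related queries]};
    classify: function(query) -> {category: confidence in [0, 1]}.
    Returns (best category or None, {category: summed confidence}).
    """
    queries = []
    for data_query in p2q.get(url, []):
        queries.append(data_query)
        queries.extend(q2q.get(data_query, []))
    totals = defaultdict(float)
    for query in queries:
        for category, confidence in classify(query).items():
            totals[category] += confidence  # summation as a stand-in aggregator
    if not totals:
        return None, {}
    best = max(totals, key=totals.get)
    return best, dict(totals)

# Toy usage (hypothetical correlations):
p2q = {"http://www.apple.com/itunes/": ["apple itunes", "ipod nano"]}
q2q = {"apple itunes": ["itunes store"], "ipod nano": ["free downloads itunes"]}
conf = {
    "apple itunes": {"Consumer_Electronics": 0.68},
    "itunes store": {"Consumer_Electronics": 0.46},
    "ipod nano": {"Consumer_Electronics": 0.86},
    "free downloads itunes": {"Computers": 0.22},
}
best, totals = categorize_url("http://www.apple.com/itunes/", p2q, q2q,
                              lambda q: conf.get(q, {}))
```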

[0073] Table 3 illustrates a P2Q correlation and classification for the URL http://www.apple.com/itunes/:

Query                            Classification
ipod nano                        Consumer_Electronics (86%) » MP3_Players (89%)
itunes                           Consumer_Electronics (0%) » MP3_Players (0%)
itunes music store               Consumer_Electronics (53%) » MP3_Players (62%)
apple itunes                     Consumer_Electronics (68%) » MP3_Players (93%)
download itunes                  Consumer_Electronics (20%) » MP3_Players (18%)
itunes help                      Consumer_Electronics (19%) » MP3_Players (66%)
apple itunes                     Consumer_Electronics (62%) » MP3_Players (90%)
burn cds off internet for free   Computers (14%) » Software (9%)
free downloads itunes            Computers (22%) » Software (8%)
what is itunes                   Consumer_Electronics (0%) » MP3_Players (0%)
itunes store                     Consumer_Electronics (46%) » MP3_Players (49%)
what are itunes                  Consumer_Electronics (0%) » MP3_Players (0%)

Table 3

[0074] The statistical tool 164 has thus classified each one of the correlated queries according to a degree of confidence.

[0075] The statistical tool 164 then proceeds to determine the most relevant category or categories among the categories in Table 3. In the present example, the most relevant categories are as follows:

[0076] Level 1: Consumer_Electronics (3.54), Computers (0.36);

[0077] Level 2: Consumer_Electronics/MP3_Players (2.67), Computers/Software (0.03).
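The scores in [0076] and [0077] can be reproduced exactly from Table 3 under one reading: the Level 1 score of a category is the sum of its level-1 confidences over all correlated queries, and the Level 2 score sums, for each query, the product of its level-1 and level-2 confidences. The text does not state this formula; it is inferred from the numbers, and the sketch below only checks that the inference is consistent. Both "apple itunes" rows of Table 3 are counted.

```python
# Rows of Table 3: (query, level-1 category, level-1 confidence,
#                   level-2 category, level-2 confidence)
TABLE_3 = [
    ("ipod nano", "Consumer_Electronics", 0.86, "MP3_Players", 0.89),
    ("itunes", "Consumer_Electronics", 0.00, "MP3_Players", 0.00),
    ("itunes music store", "Consumer_Electronics", 0.53, "MP3_Players", 0.62),
    ("apple itunes", "Consumer_Electronics", 0.68, "MP3_Players", 0.93),
    ("download itunes", "Consumer_Electronics", 0.20, "MP3_Players", 0.18),
    ("itunes help", "Consumer_Electronics", 0.19, "MP3_Players", 0.66),
    ("apple itunes", "Consumer_Electronics", 0.62, "MP3_Players", 0.90),
    ("burn cds off internet for free", "Computers", 0.14, "Software", 0.09),
    ("free downloads itunes", "Computers", 0.22, "Software", 0.08),
    ("what is itunes", "Consumer_Electronics", 0.00, "MP3_Players", 0.00),
    ("itunes store", "Consumer_Electronics", 0.46, "MP3_Players", 0.49),
    ("what are itunes", "Consumer_Electronics", 0.00, "MP3_Players", 0.00),
]

def level_scores(rows):
    """Aggregate per-query confidences into Level 1 and Level 2 category scores."""
    level1, level2 = {}, {}
    for _query, cat1, conf1, cat2, conf2 in rows:
        level1[cat1] = level1.get(cat1, 0.0) + conf1
        path = f"{cat1}/{cat2}"
        # Inferred rule: the level-2 confidence is weighted by the level-1 confidence.
        level2[path] = level2.get(path, 0.0) + conf1 * conf2
    return level1, level2

l1, l2 = level_scores(TABLE_3)
print({k: round(v, 2) for k, v in l1.items()})
# {'Consumer_Electronics': 3.54, 'Computers': 0.36}
print({k: round(v, 2) for k, v in l2.items()})
# {'Consumer_Electronics/MP3_Players': 2.67, 'Computers/Software': 0.03}
```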

Query Classification

[0078] Figure 8 illustrates categorization of a search request received from the client computer system 18. The processes of Figures 4, 5, 6, and 7 are carried out ahead of time, and the process of Figure 8 is carried out almost in real time when a search request is received.

[0079] The same correlator 160 utilizes the search request to extract a plurality of related queries corresponding to the search request, using Q2Q as described with reference to Figure 6. The classifier 162 then matches the search request and each one of the related queries with a respective category utilizing the features identified for each category. A statistical tool 170 is then used to extract the most likely category among all the categories utilizing a stochastic method. The classifier thus matches the search request with a category among a plurality of categories.
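A sketch of real-time request classification under stated assumptions: the request and its Q2Q related queries are classified, the summed confidences are normalized into shares that add up to 1, and categories are split into primary and secondary groups (compare the "primarily ... secondarily" wording in the search-results discussion). The normalization and the `primary_share`/`secondary_share` cut-offs are hypothetical stand-ins for the unspecified stochastic method, and the toy confidences in the test are invented, not the results of Figure 9.

```python
def classify_request(request, q2q, classify, primary_share=0.25, secondary_share=0.10):
    """Rank categories for a search request using its related queries.

    q2q: {query: [related queries]}; classify: query -> {category: confidence}.
    Returns (primary, secondary) lists of categories, best first.
    """
    totals = {}
    for query in [request] + q2q.get(request, []):
        for category, confidence in classify(query).items():
            totals[category] = totals.get(category, 0.0) + confidence
    mass = sum(totals.values())
    if mass == 0:
        return [], []
    shares = {c: v / mass for c, v in totals.items()}  # normalized to sum to 1
    ranked = sorted(shares, key=lambda c: (-shares[c], c))
    primary = [c for c in ranked if shares[c] >= primary_share]
    secondary = [c for c in ranked if secondary_share <= shares[c] < primary_share]
    return primary, secondary
```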

Search Results Based on Classification

[0080] Search results are generated as hereinbefore described with reference to Figure 2. The search results include a plurality of URLs. The most relevant category for the query is used to provide URLs that are primarily in the same category as the query.

[0081] Figure 9 illustrates search results for the search request ("oscar"). In this example, a Q2Q categorization has determined that the search request is primarily for computers, entertainment, or education, and secondarily, perhaps, for health and sports. As such, URLs in the categories "computer," "education," and "entertainment" are primarily provided, followed by "health" and "sports." A browser is used to display a user interface that includes the search results. The browser 160 may, for example, be an Internet Explorer™, Firefox™, Netscape™, or any other browser. The browser has an address box, a viewing pane 166, and various buttons such as back and forward buttons. The browser is loaded on a computer at the client computer system 18 of Figure 1. A user at the client computer system 18 can load the browser into memory, so that the browser is displayed on a screen such as the video display 138 in Figure 3.

[0082] As better illustrated in Figure 9A, different categories are placed at different physical locations. Search results are also separately ranked within the physical area of each category.
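The layout of Figure 9A (one physical area per category, with results ranked separately inside each area) can be sketched as a group-then-sort step. The tuple layout of `results`, the use of a precomputed category order for the query, and the alphabetical placement of categories not in that order are all assumptions for illustration.

```python
def layout_by_category(results, category_order):
    """Group results into category areas and rank within each area.

    results: list of (url, category, relevance score) tuples.
    category_order: categories ranked for the query, most relevant first;
    any other categories found in the results follow alphabetically.
    Returns a list of (category, [urls ranked by score]) areas.
    """
    areas = {}
    for url, category, score in results:
        areas.setdefault(category, []).append((score, url))
    extra = sorted(c for c in areas if c not in category_order)
    layout = []
    for category in list(category_order) + extra:
        if category in areas:
            ranked = sorted(areas[category], key=lambda item: (-item[0], item[1]))
            layout.append((category, [url for _score, url in ranked]))
    return layout
```

A result with a high score in a less relevant category still appears in that category's own area rather than being interleaved with the primary category's results.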

[0083] In Figure 10, the same correlator is indicated with reference numerals 160A and 160B, and the same classifier is indicated with reference numerals 162A and 162B. The correlator 160A, classifier 162A, and the statistical tool 164 form part of a categorizer 180 in a learning phase 182 of the system. The same categorizer 180 is used a plurality of times. The statistical tool 164 is a low-level statistical tool 164. A high-level statistical tool 184 is used to combine the data from the multiple uses of the categorizer 180. An output from the high-level statistical tool 184 is used to develop a categorized database 186.

[0084] The correlator 160B, classifier 162B, and the statistical tool 170 form another categorizer 190 in a real-time phase 192 of the system. The real-time phase 192 also includes a look-up module 194 that retrieves categories from the categorized database 186 based on an output of the statistical tool 170.

[0085] The categorizer 180 and the learning phase 182 are thus the same as in Figure 7. The categorizer 190 in the real-time phase 192 is the same as in Figure 8.

[0086] As shown in Figure 11, a profile 200 for a client computer system 18 is established. The profile 200 is based on queries that are received from the client computer system 18 (Query 1, Query 2 ... Query N) and selections that are made (e.g., click on link 4, category B). When the same client computer system 18 is used to submit a query on a search page 202, and the query is categorized in the categorizer 190, multiple factors are used to determine relevant pages for a results page 204. The factors that are taken into account for the results page 204 include the query and category from the categorizer 190, relevant pages and categories from the categorizer 180 received over the internet 14A, and the profile 200 of the client computer system 18. Certain pages may, for example, be ranked higher relative to other pages than they would be if the profile 200 were not used for ranking the pages.

[0087] As shown in Figure 12, following entry of a query in a search page and categorization by the categorizer 190, an advertising engine 206 utilizes the query and the category to extract an advertisement or advertisements from a plurality of advertisements. The advertisement or advertisements that are selected utilizing the category are different from advertisements that are selected without using the category. The same query and category are used to extract relevant pages from a search database 24. The relevant pages and selected advertisements are provided together on the results page 204. The results page 204 is then transmitted back to the client computer system 18.
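One way a client profile can change the ranking is to boost pages whose category the client has selected often in the past. In this minimal sketch the `ClientProfile` class, the affinity measure (share of past selections per category), and the multiplicative boost controlled by `weight` are hypothetical; the text does not specify how the profile 200 is combined with the other factors.

```python
from collections import Counter

class ClientProfile:
    """Hypothetical profile of one client: how often each category was selected."""

    def __init__(self):
        self.category_selections = Counter()

    def record_selection(self, category):
        self.category_selections[category] += 1

    def affinity(self, category):
        """Fraction of this client's past selections that fell in `category`."""
        total = sum(self.category_selections.values())
        return self.category_selections[category] / total if total else 0.0

def rerank(pages, profile, weight=0.5):
    """Order pages by base score boosted by the client's category affinity.

    pages: list of (url, category, base_score); `weight` is an assumed tuning knob.
    """
    adjusted = [
        (base * (1.0 + weight * profile.affinity(category)), url)
        for url, category, base in pages
    ]
    return [url for _score, url in sorted(adjusted, key=lambda t: (-t[0], t[1]))]
```

With an empty profile the base ranking is unchanged; once a client has mostly selected sports results, a slightly lower-scored sports page can move above a page from another category.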

[0088] As further illustrated in Figure 13, the crawler 19 can use a statistical tool 208 to determine which categories users search most often. The crawler 19 uses the statistics provided by the statistical tool 208 to download web pages belonging to certain categories more frequently than web pages belonging to other categories. The categories that are downloaded more often are typically the ones that users search more often. A spam category can also be created, and downloads of web pages belonging to the spam category can be reduced or eliminated entirely.
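Paragraph [0088] does not say how search frequency translates into a download schedule. The following is a minimal sketch that assumes recrawl intervals inversely proportional to each category's query count, clamped between a floor and a ceiling, and a spam category that is never downloaded. The `recrawl_intervals` function, its parameters, and the hour values are hypothetical.

```python
from collections import Counter


def recrawl_intervals(query_categories, min_hours=6.0, max_hours=168.0, spam="spam"):
    """Map each searched category to a hypothetical recrawl interval in hours.

    The most-searched category is revisited every `min_hours`; a category searched
    k times less often is revisited k times less often, capped at `max_hours`.
    The spam category maps to None, meaning its pages are never downloaded.
    """
    counts = Counter(c for c in query_categories if c != spam)
    top = max(counts.values(), default=0)
    intervals = {c: min(max_hours, min_hours * top / n) for c, n in counts.items()}
    intervals[spam] = None  # skip spam pages entirely
    return intervals


log = ["sports"] * 100 + ["recipes"] * 10 + ["history"] * 1 + ["spam"] * 50
print(recrawl_intervals(log))
```

For this query log, sports pages would be revisited every 6 hours, recipes every 60 hours, and history at the 168-hour ceiling. Spam pages would not be revisited at all.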

[0089] The crawler 19 can also store the downloaded pages by category, even in separate partitions (Part1, Part2 ... PartN). Storing the web pages in separate categories can increase retrieval speed.

[0090] Figure 14 illustrates that a profile is created for each category. In the present example, the profile 216 is the profile for sports queries; separate profiles (not shown) are created for the other categories. Queries are received from different client computer systems 18 and are categorized as "sports queries," as hereinbefore described. Results are returned to the client computer systems 18, and users at the client computer systems 18 then select answers from the results. The answers may be links to web pages, images, video, or other media types, and they also differ in their freshness. The profile 216 reflects the media types that are most frequently selected for the category "sports queries," as well as the freshness of the answers that are most frequently selected. In the present example, the profile 216 for "sports queries" may reflect that users typically select fresher content, i.e., content from the last week as opposed to content that is more than ten years old. The profile 216 may also reflect that users select web pages approximately 40% of the time, images approximately 30% of the time, and videos approximately 30% of the time. When results are provided to future users of client computer systems such as the client computer systems 18, web pages, images, and videos are provided in the same ratio as reflected in the profile 216 for "sports queries."
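The 40% / 30% / 30% example in paragraph [0090] can be made concrete with a small sketch. The sketch builds a per-category media-type profile from clicks and then divides a page of results in the same ratio. The largest-remainder rounding and the names `build_media_profile` and `allocate_slots` are illustrative assumptions. Freshness, which the profile 216 also tracks, is omitted for brevity.

```python
from collections import Counter


def build_media_profile(clicked_media_types):
    """Fraction of clicks per media type for one category (e.g. "sports queries")."""
    counts = Counter(clicked_media_types)
    total = sum(counts.values())
    return {media: n / total for media, n in counts.items()}


def allocate_slots(profile, num_results):
    """Split num_results among media types in the profile's ratio (largest remainder)."""
    raw = {media: share * num_results for media, share in profile.items()}
    slots = {media: int(value) for media, value in raw.items()}  # floor of each share
    leftover = num_results - sum(slots.values())
    # Hand any remaining slots to the media types with the largest fractional parts.
    for media in sorted(raw, key=lambda m: raw[m] - slots[m], reverse=True)[:leftover]:
        slots[media] += 1
    return slots


sports_profile = build_media_profile(["web"] * 40 + ["image"] * 30 + ["video"] * 30)
print(allocate_slots(sports_profile, 10))
```

With 40 web-page, 30 image, and 30 video clicks, ten result slots split into 4 web pages, 3 images, and 3 videos. Seven slots split into 3, 2, and 2, with the leftover slot going to the media type with the largest fractional share.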

[0091] As shown in Figure 15A, traditional ranking of answers does not take into account the category of a web page or other answer. Figure 15B illustrates that the score of a page may differ depending on the category. The same page A can, for example, belong to categories A, B, and C. When page A is provided within category B, it would have a lower ranking than when it is provided within category C.

[0092] While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative and not restrictive of the current invention, and that this invention is not restricted to the specific constructions and arrangements shown and described since modifications may occur to those ordinarily skilled in the art.
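The category-dependent scoring of Figure 15B can be illustrated by keying scores on a (page, category) pair rather than on the page alone. The score table below is invented for illustration, as are the names `CATEGORY_SCORES` and `rank_in_category`. How the system actually derives these scores is not specified here.

```python
# Hypothetical per-(page, category) scores; in practice these would be learned,
# not hard-coded.
CATEGORY_SCORES = {
    ("page_a", "B"): 0.2,
    ("page_a", "C"): 0.9,
    ("page_x", "B"): 0.6,
    ("page_x", "C"): 0.5,
}


def rank_in_category(pages, category, default=0.0):
    """Order pages by their score within the given category, highest first."""
    return sorted(
        pages,
        key=lambda page: CATEGORY_SCORES.get((page, category), default),
        reverse=True,
    )


print(rank_in_category(["page_a", "page_x"], "B"))
print(rank_in_category(["page_a", "page_x"], "C"))
```

Here `page_a` ranks below `page_x` within category B but above it within category C, mirroring the page A example in paragraph [0091].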

Claims

What is claimed:
1. A computer system comprising: a memory; a plurality of answers stored in the memory; and a classifier for matching each one of the answers with a category among a plurality of categories.
2. The computer system of claim 1, further comprising: a correlator utilizing the answer to extract at least one data query corresponding to the answer, the classifier matching the data query with the category.
3. The computer system of claim 2, wherein the correlator matches the answer to the category according to the method comprising: extracting a related query corresponding to the data query; and matching the related query to the category.
4. The computer system of claim 3, wherein the correlator matches the answer to the category according to the method comprising: extracting a plurality of related queries corresponding to the data query; and the categorizer matching each related query to a category, further comprising: a statistical tool determining the relevance of each category.
5. The computer system of claim 1, wherein the answers are received over a network before storing the answers.
6. The computer system of claim 5, wherein answers for more frequently used categories are updated more often than answers for categories used less often.
7. The computer system of claim 6, wherein one of the categories is a spam category and answers in the spam category are not downloaded.
8. The computer system of claim 5, further comprising: an indexer indexing the answers received over the network into indexed answers, the indexed answers being stored in the memory.
9. The computer system of claim 8, wherein the indexer indexes the answers into the categories.
10. The computer system of claim 1, further comprising: a search engine receiving a request from a client computer system at a server computer system and, in response to the request, transmitting a view from the server computer system to the client computer system for display at the client computer system, contents of the view being at least partially based on one selected category of the categories.
11. The computer system of claim 10, wherein the request is a search request, the classifier matching the request with a category among the plurality of categories and associating at least one of a plurality of answers with the request due to association of the request and the answer with the selected category.
12. The computer system of claim 11, wherein the view includes different category areas, answers belonging to different categories being located in the respective category areas.
13. The computer system of claim 11, wherein the category is used to select the answer based on a media type of the answer.
14. The computer system of claim 11, wherein the category is used to select the answer based on a freshness of the answer.
15. The computer system of claim 11, further comprising a correlator extracting a related query corresponding to the request, wherein the classifier matches the request with a category by matching the related query to the category.
16. The computer system of claim 15, wherein the correlator matches the request to the category according to the method comprising: extracting a plurality of related queries corresponding to the request; and the categorizer matching each related query to a category, further comprising: a statistical tool determining the relevance of each category.
17. The computer system of claim 11, wherein the search engine transmits a first view from a server computer system to the client computer system, the first view including a search identifier, the search engine receiving a search request from a client computer system at the server computer system and utilizing the search request at the server computer system to extract at least one search result from the answers, and transmitting at least part of a second view from the server computer system to the client computer system for display at the client computer system, wherein the second view includes the search result.
18. The computer system of claim 10, wherein an advertisement is selected among a plurality of advertisements based on the selected category.
19. The computer system of claim 10, wherein the request is a browsing request based on the selected category selected at the client computer system among at least a subset of the categories.
20. A computer method, comprising: storing a plurality of answers in memory of a computer; and matching each one of a plurality of the answers with a category among a plurality of categories.
21. The method of claim 20, wherein each answer is matched with a category according to the method comprising: utilizing the answer to extract at least one data query corresponding to the answer; and matching the data query with the category.
22. The method of claim 21, wherein the method of matching the answer to the category further comprises: extracting a related query corresponding to the data query; and matching the related query to the category.
23. A computer-readable medium having stored thereon a set of instructions which, when executed by a processor of a computer, executes the method comprising: storing a plurality of answers in memory of a computer; and matching each one of a plurality of the answers with a category among a plurality of categories.

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/958,322 2007-12-17
US11958322 US9239882B2 (en) 2007-12-17 2007-12-17 System and method for categorizing answers such as URLs

Publications (1)

Publication Number Publication Date
WO2009078887A1 (en) 2009-06-25

Family

ID=40754575

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/004495 WO2009078887A1 (en) 2007-12-17 2008-04-07 System and method for categorizing answers such as urls

Country Status (2)

Country Link
US (1) US9239882B2 (en)
WO (1) WO2009078887A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090327235A1 (en) * 2008-06-27 2009-12-31 Google Inc. Presenting references with answers in forums
CN102033877A (en) 2009-09-27 2011-04-27 阿里巴巴集团控股有限公司 Search method and apparatus
US9160680B1 (en) 2014-11-18 2015-10-13 Kaspersky Lab Zao System and method for dynamic network resource categorization re-assignment

Citations (2)

Publication number Priority date Publication date Assignee Title
US20040260677A1 (en) * 2003-06-17 2004-12-23 Radhika Malpani Search query categorization for business listings search
US20060161579A1 (en) * 2005-01-20 2006-07-20 Pi Corporation Data storage and retrieval system with parameterized category definitions for families of categories and dynamically generated search indices

Family Cites Families (21)

Publication number Priority date Publication date Assignee Title
US7107226B1 (en) * 1999-01-20 2006-09-12 Net32.Com, Inc. Internet-based on-line comparison shopping system and method of interactive purchase and sale of products
US6823332B2 (en) * 1999-12-23 2004-11-23 Larry L Russell Information storage and retrieval device
US6574616B1 (en) * 2000-02-16 2003-06-03 Index Stock Imagery, Inc. Stochastic visually based image query and retrieval system
US7194454B2 (en) * 2001-03-12 2007-03-20 Lucent Technologies Method for organizing records of database search activity by topical relevance
US7440964B2 (en) * 2003-08-29 2008-10-21 Vortaloptics, Inc. Method, device and software for querying and presenting search results
US7346839B2 (en) * 2003-09-30 2008-03-18 Google Inc. Information retrieval based on historical data
US7181447B2 (en) 2003-12-08 2007-02-20 Iac Search And Media, Inc. Methods and systems for conceptually organizing and presenting information
US7689543B2 (en) * 2004-03-11 2010-03-30 International Business Machines Corporation Search engine providing match and alternative answers using cumulative probability values
US20050222903A1 (en) * 2004-03-31 2005-10-06 Paul Buchheit Rendering content-targeted ads with e-mail
US7519581B2 (en) * 2004-04-30 2009-04-14 Yahoo! Inc. Method and apparatus for performing a search
US7680901B2 (en) * 2004-09-17 2010-03-16 Go Daddy Group, Inc. Customize a user interface of a web page using an expertise level rules engine
US20060195442A1 (en) * 2005-02-03 2006-08-31 Cone Julian M Network promotional system and method
KR100721406B1 (en) * 2005-07-27 2007-05-23 엔에이치엔(주) Product searching system and method using search logic according to each category
US20100121705A1 (en) * 2005-11-14 2010-05-13 Jumptap, Inc. Presentation of Sponsored Content Based on Device Characteristics
US20070192305A1 (en) * 2006-01-27 2007-08-16 William Derek Finley Search term suggestion method based on analysis of correlated data in three dimensions
US20070239682A1 (en) * 2006-04-06 2007-10-11 Arellanes Paul T System and method for browser context based search disambiguation using a viewed content history
US7693865B2 (en) * 2006-08-30 2010-04-06 Yahoo! Inc. Techniques for navigational query identification
US8078625B1 (en) * 2006-09-11 2011-12-13 Aol Inc. URL-based content categorization
US20080140641A1 (en) * 2006-12-07 2008-06-12 Yahoo! Inc. Knowledge and interests based search term ranking for search results validation
US7720826B2 (en) * 2006-12-29 2010-05-18 Sap Ag Performing a query for a rule in a database
US20090089373A1 (en) * 2007-09-28 2009-04-02 Yahoo! Inc. System and method for identifying spam hosts using stacked graphical learning

Also Published As

Publication number Publication date Type
US9239882B2 (en) 2016-01-19 grant
US20090157640A1 (en) 2009-06-18 application

Similar Documents

Publication Publication Date Title
Davison Recognizing nepotistic links on the web
Lacerda et al. Learning to advertise
US7305389B2 (en) Content propagation for enhanced document retrieval
Chirita et al. Using ODP metadata to personalize search
US7647306B2 (en) Using community annotations as anchortext
Jansen et al. Determining the informational, navigational, and transactional intent of Web queries
Malouf et al. Taking sides: User classification for informal online political discourse
US8095582B2 (en) Dynamic search engine results employing user behavior
US8346701B2 (en) Answer ranking in community question-answering sites
Suryanto et al. Quality-aware collaborative question answering: methods and evaluation
US7664734B2 (en) Systems and methods for generating multiple implicit search queries
US7617176B2 (en) Query-based snippet clustering for search result grouping
US20070239701A1 (en) System and method for prioritizing websites during a webcrawling process
US20090182727A1 (en) System and method for generating tag cloud in user collaboration websites
US20100023506A1 (en) Augmenting online content with additional content relevant to user interests
US20080154883A1 (en) System and method for evaluating sentiment
US20100306249A1 (en) Social network systems and methods
US20110167054A1 (en) Automated discovery aggregation and organization of subject area discussions
US7693827B2 (en) Personalization of placed content ordering in search results
Tsagkias et al. Linking online news and social media
US7984056B1 (en) System for facilitating discovery and management of feeds
US7676507B2 (en) Methods and systems for searching and associating information resources such as web pages
US7685200B2 (en) Ranking and suggesting candidate objects
US20090313237A1 (en) Generating query suggestions from semantic relationships in content
US7636714B1 (en) Determining query term synonyms within query context

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08742610

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 08742610

Country of ref document: EP

Kind code of ref document: A1