WO2007011140A1 - Method of extracting topics and issues and method and apparatus for providing search results based on topics and issues

Publication number
WO2007011140A1
Authority
WO
WIPO (PCT)
Prior art keywords
candidate phrases
phrases
extracting
documents
secondary candidate
Application number
PCT/KR2006/002787
Other languages
French (fr)
Inventor
Eun-Young Lee
Mi-Na Han
Eui-Vin Park
Sung-Jin Lee
Hoon-Seok Son
Joong-Ho Shin
Original Assignee
Chutnoon Inc.
Priority to KR10-2005-0064515
Priority to KR20050064515
Application filed by Chutnoon Inc.
Publication of WO2007011140A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution

Abstract

Disclosed is a method of displaying search results with respect to a search word, including: (a) referring to words contained in titles or content of search results matching with the search word to calculate similarities between the search results according to a predetermined similarity calculation method, and extracting representative phrases among combinations of words repeatedly contained in similar search results; and (b) displaying the representative phrases and the search results that belong to each of the representative phrases.

Description

METHOD OF EXTRACTING TOPICS AND ISSUES AND

METHOD AND APPARATUS FOR PROVIDING SEARCH

RESULTS BASED ON TOPICS AND ISSUES

Technical Field

[1] The present invention relates to an information search technology and, more particularly, to a method and apparatus for extracting topics from search results and providing the search results based on the topics, and a method and apparatus for selecting and providing frequently appearing search results as issues.

Background Art

[2] A conventional search system groups search results into groups based on their types, sequentially provides the search results based on similarities with search words, or places search results that are most similar to the search words at the top of search pages.

[3] However, the conventional search system has a problem: too many redundant search results appear, and most of the search results are effectively useless because users tend to view only the few results at the top of the search pages.

Disclosure of Invention

Technical Solution

[4] The present invention provides a method and apparatus for searching for information based on topics by extracting phrases constituting search results to select topics and outputting the search results topic-by-topic so that users can obtain desired information more easily.

[5] The present invention further provides a method of searching for information based on issues by outputting Internet search results as issues in order of appearance frequencies of the search results.

Advantageous Effects

[6] According to the present invention, users can use search results more efficiently because they can easily grasp the search results and are not shown repeated search results.

Brief Description of the Drawings

[7] The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:

[8] Fig. 1 is a view for explaining a method of providing search results based on topics according to an embodiment of the present invention;

[9] Fig. 2 is a flow chart of a method of extracting topics according to an embodiment of the present invention;

[10] Figs. 3 to 9 are views for explaining a method of extracting topics according to an embodiment of the present invention;

[11] Fig. 10 is a flow chart of a method of extracting issues according to an embodiment of the present invention;

[12] Figs. 11 to 13 are views for explaining a method of extracting issues according to an embodiment of the present invention;

[13] Fig. 14 is an issue output result;

[14] Fig. 15 is another issue output result; and

[15] Fig. 16 is a block diagram of an information search apparatus according to an embodiment of the present invention.

Best Mode for Carrying Out the Invention

[16] According to an aspect of the present invention, there is provided a method of displaying search results with respect to a search word, including: (a) referring to words contained in titles or content of search results matching with the search word to calculate similarities between the search results according to a predetermined similarity calculation method, and extracting representative phrases among combinations of words repeatedly contained in similar search results; and (b) displaying the representative phrases and the search results that belong to each of the representative phrases.

[17] According to another aspect of the present invention, there is provided a method of extracting topics, including: (a) assigning document IDs to documents with respect to a search word based on appearance orders of the documents, and extracting documents with document IDs less than a predetermined value; (b) extracting words contained in titles or content of the extracted documents and appearance frequencies of the words; (c) extracting primary candidate phrases composed of words of appearance frequencies greater than a predetermined value appearing consecutively in the titles or content of the documents; (d) generating secondary candidate phrases from combinations of phrases composed of the words constituting the primary candidate phrases, and calculating weight values of the secondary candidate phrases; (e) calculating similarities between secondary candidate phrases with weight values greater than a predetermined value by use of vectors consisting of document IDs of documents belonging to the secondary candidate phrases; and (f) eliminating secondary candidate phrases with low weight values among the secondary candidate phrases with similarities greater than a predetermined value, and setting the remaining secondary candidate phrases as topics.

[18] According to another aspect of the present invention, there is provided a method of extracting issues, including: (a) extracting the same or similar data the number of which is greater than a predetermined threshold value among stored data; and (b) extracting as issue data a plurality of high-ranking data among the extracted data and displaying the issue data in order of writing time of the issue data or in order of a number of similar documents.

[19] According to another aspect of the present invention, there is provided an apparatus for providing search services based on extracted topics, including: a searching unit searching for stored documents; a primary candidate phrase extracting unit sequentially assigning document IDs to searched documents based on appearance orders of the searched documents, and extracting documents with document IDs less than a predetermined value; a secondary candidate phrase extracting unit extracting words contained in titles or content of the extracted documents and appearance frequencies of the words, extracting primary candidate phrases composed of words of appearance frequencies greater than a predetermined value appearing consecutively in the titles or content of the documents, generating secondary candidate phrases from combinations of phrases composed of the words constituting the primary candidate phrases, and calculating weight values of the secondary candidate phrases; and a similar candidate phrase eliminating unit calculating similarities between secondary candidate phrases with weight values greater than a predetermined value by use of vectors consisting of document IDs of documents belonging to the secondary candidate phrases, eliminating secondary candidate phrases with low weight values among the secondary candidate phrases with similarities greater than a predetermined value, and setting the remaining secondary candidate phrases as topics.

Mode for the Invention

[20] Exemplary embodiments in accordance with the present invention will now be described in detail with reference to the accompanying drawings.

[21] Fig. 1 is a view for explaining a method of providing search results based on topics according to an embodiment of the present invention.

[22] Referring to Fig. 1, search results are grouped into groups having similar phrases, and topics are extracted from the groups. For instance, assume that 'A=Neowiz', 'B=Separated_search_service', 'C=Pmang', 'D=Special_force', 'E=Founding_anniversary', and 'F=Party'. When various search results including 'A', 'B', 'C', 'D', 'E', and 'F' are output, 'ABE', 'ABF', and 'ABD' may be grouped into one group, 'CDE', 'CDF', and 'CDG' into another group, and 'AEFG', 'AEFH', and 'AEFI' into a third group. In this case, 'AB' becomes a topic 100, 'CD' becomes a topic, and 'AEF' becomes a topic. The term 'topic' denotes an expression indicating the subject of a group of search results.

[23] Fig. 2 is a flow chart of a method of extracting topics according to an embodiment of the present invention.

[24] Referring to Fig. 2, when a search word is input, similarities between search results are calculated according to a similarity calculation method by referring to words that are included in titles or content in the search results and match with the search word. Further, representative phrases are extracted among a combination of duplicate words in similar search results, and search results are displayed according to the extracted representative phrases.

[25] In more detail, document IDs are sequentially assigned to documents matching a search word based on the appearance order of the documents, and documents with document IDs less than a predetermined value are extracted (operation S210). The predetermined value may vary based on the number of search results, i.e., documents, or the like. Data composed of the words included in the titles or content of the documents and the appearance frequencies of those words are stored (operation S220). Next, primary candidate phrases composed of words with appearance frequencies greater than a predetermined value in the titles or content of the documents are extracted (operation S230). This predetermined value may vary according to the number of primary candidate phrases to be extracted.
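
Operations S210 and S220 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `assign_ids_and_count`, the `max_id` cutoff value, whitespace tokenization, and the sample titles are all hypothetical.

```python
from collections import Counter


def assign_ids_and_count(titles, max_id=100):
    """Assign 1-based document IDs in appearance order (S210), keep only
    documents whose ID is less than max_id, and count how often each word
    appears across the kept titles (S220)."""
    docs = {
        doc_id: title
        for doc_id, title in enumerate(titles, start=1)
        if doc_id < max_id  # "document IDs less than a predetermined value"
    }
    # Hypothetical tokenization: split titles on whitespace.
    freq = Counter(word for title in docs.values() for word in title.split())
    return docs, freq


# Made-up sample titles, listed in the order the search returned them.
sample_titles = [
    "Neowiz Yogurting RPG",
    "Neowiz Search_corporation",
    "Neowiz Yogurting Showdown",
]
docs, freq = assign_ids_and_count(sample_titles, max_id=3)
print(docs)
print(freq.most_common())
```

With `max_id=3` only the first two titles are kept, so words that occur only in the third title never enter the frequency table.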

[26] Next, secondary candidate phrases are generated from combinations of phrases composed of the words constituting the primary candidate phrases, and weight values of the secondary candidate phrases are calculated (operation S240). The weight values are calculated by referring to the document IDs included in the secondary candidate phrases, the appearance frequencies of the words constituting the secondary candidate phrases, and the number of primary candidate phrases used in the secondary candidate phrases. For instance, a document with a low document ID is regarded as important, so a secondary candidate phrase that includes low document IDs receives a high weight value. Likewise, a secondary candidate phrase is regarded as important if the appearance frequencies of its constituent words are high.
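
The description names the three inputs to the weight value but does not give a formula. The sketch below is therefore purely hypothetical: `phrase_weight`, the `max_id` parameter, and the way the three inputs are combined are assumptions chosen only to show how lower document IDs, higher word frequencies, and more contributing primary candidate phrases could each raise the weight.

```python
def phrase_weight(doc_ids, word_freqs, n_primary, max_id=100):
    """Hypothetical weight for one secondary candidate phrase.

    doc_ids    -- IDs of the documents that contain the phrase
    word_freqs -- appearance frequencies of the words in the phrase
    n_primary  -- number of primary candidate phrases the phrase was built from
    """
    # Lower document IDs (higher-ranked results) contribute more.
    rank_score = sum(max_id - doc_id for doc_id in doc_ids)
    # Frequent words contribute more, scaled by the primary-phrase count.
    freq_score = sum(word_freqs) * n_primary
    return rank_score + freq_score


# Same words and primary-phrase count, different document IDs.
early = phrase_weight(doc_ids=[1, 5], word_freqs=[13, 4], n_primary=2, max_id=15)
late = phrase_weight(doc_ids=[7, 10], word_freqs=[13, 4], n_primary=2, max_id=15)
print(early, late)
```

Under this made-up formula the phrase found in earlier-ranked documents scores higher, which matches the qualitative behavior described above; the actual weighting used in the figures is not disclosed.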

[27] Next, similarities between secondary candidate phrases with weight values greater than a predetermined value are calculated by use of vectors consisting of document IDs of documents that belong to the secondary candidate phrases (operation S250). That is, when there are several document IDs, the similarities are calculated by referring to the number of the same document IDs. Among secondary candidate phrases having similarities greater than a predetermined value, secondary candidate phrases with low weight values are eliminated and the remaining secondary candidate phrases are determined as topics (operation S260).
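
Operations S250 and S260 can be sketched as a greedy filter. This is only an illustration under assumptions: the source does not name the similarity measure, so the Dice coefficient over sets of document IDs is swapped in here, and `select_topics`, the thresholds, the weight of 'Neowiz search', and the document IDs of 'Jukeon mobile_carrier music' are hypothetical values chosen for the example.

```python
def dice(a, b):
    """Dice coefficient between two collections of document IDs (an assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def select_topics(candidates, min_weight, max_similarity):
    """candidates: list of (phrase, weight, doc_ids) tuples.

    Keep phrases whose weight exceeds min_weight, then visit them from the
    highest weight down and drop any phrase that is too similar to one
    already kept, so the lower-weighted phrase of a similar pair is removed.
    """
    strong = [c for c in candidates if c[1] > min_weight]
    kept = []
    for phrase, weight, doc_ids in sorted(strong, key=lambda c: c[1], reverse=True):
        if all(dice(doc_ids, k[2]) <= max_similarity for k in kept):
            kept.append((phrase, weight, doc_ids))
    return [phrase for phrase, _, _ in kept]


candidates = [
    ("Announces RPG yogurting popularized", 1732, (7, 10)),
    ("Neowiz search_corporation", 1710, (2, 4, 12)),
    ("Neowiz search", 1500, (2, 4, 8, 12)),           # weight is made up
    ("Jukeon mobile_carrier music", 1200, (3, 9)),    # document IDs are made up
]
topics = select_topics(candidates, min_weight=1200, max_similarity=0.8)
print(topics)
```

Sorting by weight first means that whenever two phrases overlap heavily, the one visited later is always the lower-weighted one, which is the phrase the description says to eliminate.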

[28] The topics and the documents belonging to individual topics are displayed.

[29] Figs. 3 to 9 are views for explaining a method of extracting topics according to an embodiment of the present invention.

[30] As shown in Fig. 3, when a search word 'Neowiz' is entered, titles 320 appear as search results and document IDs 310 are assigned to the titles based on appearance orders of the titles.

[31] As shown in Fig. 4, a database 330 is obtained from words constituting the titles 320 and appearance frequencies of the words. It can be seen from Fig. 4 that a word 'Neowiz' appears thirteen times and a word 'Yogurting' appears four times in the titles 320. The appearance frequencies of the other words are obtained in this manner. Words of appearance frequencies less than a predetermined value are eliminated. In Fig. 4, a word 'Showdown' appears once and is eliminated.

[32] Next, phrases composed of words of appearance frequencies greater than a predetermined value are extracted from the titles 320 to make primary candidate phrases 340. It can be seen from Fig. 5 that there are six titles each composed of a string of consecutive words 'Neowiz', 'Yogurting', 'RPG', 'Search_corporation', 'Jukeon', 'Popularized', 'Announces', 'Music', 'Service', and 'Mobile_carrier' among the fourteen titles 320 in Fig. 3.
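
One way to read this step is that each title is cut into runs of consecutive frequent words, and each run becomes a primary candidate phrase. The sketch below follows that reading; `primary_candidates`, the `threshold` value, whitespace tokenization, and the sample titles are assumptions for illustration, and the figures may apply a different rule.

```python
from collections import Counter


def primary_candidates(titles, threshold=1):
    """Return runs of consecutive words whose appearance frequency across
    all titles is greater than threshold."""
    freq = Counter(word for title in titles for word in title.split())
    phrases = []
    for title in titles:
        run = []
        for word in title.split():
            if freq[word] > threshold:
                run.append(word)
            else:
                # An infrequent word ends the current run.
                if run:
                    phrases.append(tuple(run))
                run = []
        if run:
            phrases.append(tuple(run))
    return phrases


# Made-up titles; 'Showdown' appears once and therefore breaks a run.
sample = [
    "Neowiz Yogurting RPG",
    "Neowiz Showdown Yogurting",
    "RPG Neowiz",
]
phrases = primary_candidates(sample, threshold=1)
print(phrases)
```

Single-word runs are kept here because the claims speak of phrases in which "at least one of the words" appears consecutively; a stricter variant could require two or more words per run.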

[33] Next, as shown in Fig. 6, secondary candidate phrases 350 are created with a combination of phrases composed of the words. Appearance frequencies 351 of phrases including the secondary candidate phrases 350 in the primary candidate phrases 340 are extracted. As described with reference to Fig. 2, weight values 352 of the secondary candidate phrases 350 are calculated by referring to document IDs included in the secondary candidate phrases 350, appearance frequencies of words constituting the secondary candidate phrases 350, and the number of primary candidate phrases 340 used in the secondary candidate phrases 350. It can be seen from Fig. 7 that the phrase 'Announces RPG yogurting popularized' has a weight value of 1732, the phrase 'Neowiz Jukeon' has a weight value of 1720, the phrase 'Neowiz search_corporation' has a weight value of 1710, and the phrase 'Neowiz Jukeon mobile_carrier' has a weight value of 1320. The phrase 'Jukeon mobile_carrier music', which has a weight value of 1200, is discarded. Thus, the reference weight value for eliminating phrases is 1200.

[34] Referring to Fig. 8, strings 353 of document IDs of documents including the secondary candidate phrases 350 are extracted to calculate similarities between the secondary candidate phrases 350. For instance, assume that documents containing the phrase 'Announces RPG yogurting popularized' are (7, 10), documents containing the phrase 'Neowiz yogurting' are (1, 5, 7, 10), documents containing the phrase 'Neowiz search_corporation' are (2, 4, 12), and documents containing the phrase 'Neowiz search' are (2, 4, 8, 12). In this case, since the similarity between the phrases 'Announces RPG yogurting popularized' and 'Neowiz yogurting' is 66%, the similarity is regarded as low. Since the similarity between the phrases 'Neowiz search_corporation' and 'Neowiz search' is 82%, the phrase 'Neowiz search', which has the lower weight value, is eliminated from the secondary candidate phrases. In this manner, topics 361 and search results topic-by-topic are obtained as shown in Fig. 9.
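
The similarity measure behind the 66% and 82% figures is not specified. As a rough check, the sketch below computes two common set-overlap measures on the document ID lists given above; `dice` and `cosine` are candidate measures chosen for illustration, not the method of the source. The Dice coefficient gives about 67% for the first pair, close to the stated 66%, but about 86% for the second pair rather than 82%, so neither measure should be taken as the one actually used.

```python
import math


def dice(a, b):
    """Dice coefficient: twice the shared IDs over the total number of IDs."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))


def cosine(a, b):
    """Cosine similarity of the binary document-ID vectors of a and b."""
    a, b = set(a), set(b)
    return len(a & b) / math.sqrt(len(a) * len(b))


# Document ID lists taken from the example in the description.
pairs = {
    "yogurting pair": ((7, 10), (1, 5, 7, 10)),
    "search pair": ((2, 4, 12), (2, 4, 8, 12)),
}
scores = {name: (dice(x, y), cosine(x, y)) for name, (x, y) in pairs.items()}
for name, (d, c) in scores.items():
    print(f"{name}: dice={d:.2f} cosine={c:.2f}")
```

Both measures rank the second pair as more similar than the first, which is consistent with the outcome described: only 'Neowiz search' is eliminated.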

[35] Fig. 10 is a flow chart of a method of extracting issues according to an embodiment of the present invention.

[36] First, data for which the number of same or similar data is greater than a predetermined threshold value is extracted from stored data. A plurality of high-ranking data is extracted as issue data from the extracted data. The issue data is displayed in order of writing time of the issue data or in order of the number of similar documents. The stored data may be all of the Internet documents, specific blogs, data on news sites, or data obtained from predetermined search methods.

[37] In more detail, target documents on the Internet or target documents matching with a search word are extracted (operation S410). The extracted documents may be the same or similar to one another. After the number of same or similar documents is calculated, documents having appearance frequencies greater than a predetermined value are extracted (operation S420).

[38] High-ranking documents, ranked by their number of same or similar documents, are extracted as issues (operation S430). The extracted issues are output in order of writing time of the documents or in order of the number of same or similar documents (operation S440).
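
Operations S410 to S440 can be sketched as follows. This is a simplified illustration under assumptions: documents are treated as "same or similar" only when their normalized titles match exactly, and `extract_issues`, `min_count`, `top_n`, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def extract_issues(docs, min_count=2, top_n=10, order="time"):
    """docs: list of (title, writing_date) pairs.

    Group documents by normalized title, keep groups with more than
    min_count members, take the top_n largest groups as issues, and order
    them by most recent writing date or by group size.
    """
    groups = defaultdict(list)
    for title, written in docs:
        groups[title.strip().lower()].append(written)  # crude similarity test
    issues = [
        (title, len(dates), max(dates))
        for title, dates in groups.items()
        if len(dates) > min_count
    ]
    # High-ranking issues are those with the most same-or-similar documents.
    issues.sort(key=lambda issue: issue[1], reverse=True)
    issues = issues[:top_n]
    sort_key = (lambda i: i[2]) if order == "time" else (lambda i: i[1])
    return sorted(issues, key=sort_key, reverse=True)


# Made-up sample data.
sample_docs = (
    [("neowiz launches pmang", date(2006, 7, d)) for d in (1, 2, 3)]
    + [("neowiz founding party", date(2006, 6, d)) for d in (1, 2, 3, 4)]
    + [("unrelated post", date(2006, 7, 5))]
)
by_time = extract_issues(sample_docs, min_count=2, order="time")
by_count = extract_issues(sample_docs, min_count=2, order="count")
print(by_time)
print(by_count)
```

A production system would likely replace the exact-title grouping with a real similarity test over the words in titles or content, as claim 8 suggests.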

[39] Figs. 11 to 13 are views for explaining a method of extracting issues according to an embodiment of the present invention.

[40] When there are Internet data 510 as shown in Fig. 11, the data 510 are arranged by document title 520 and appearance frequency 521 as shown in Fig. 12. Documents with appearance frequencies less than a predetermined value are eliminated. In this case, documents with appearance frequencies less than two hundred are eliminated. The remaining documents are selected as issues and output in order of most recent writing date as shown in Fig. 13.

[41] Fig. 14 is an issue output result.

[42] Issues may be extracted from the whole target documents on the Internet and displayed as described above. As described in Figs. 2 to 9, topics may be extracted from the target documents and issues may be extracted from the topics and displayed.

[43] Fig. 15 is another issue output result.

[44] Issues and topics may be displayed as shown in Fig. 15. For instance, issues 720 and topics 730 corresponding to a search word 'Neowiz' 710 may be displayed at different positions.

[45] Fig. 16 is a block diagram of an information search apparatus according to an embodiment of the present invention.

[46] The information search apparatus includes a web data storage unit 810, a searching unit 820, a primary candidate phrase extracting unit 830, a secondary candidate phrase extracting unit 840, a similar candidate phrase eliminating unit 850, and a topic output unit 860.

[47] The web data storage unit 810 collects and stores documents on the Internet. The searching unit 820 uses typical search methods to search for the documents. The primary candidate phrase extracting unit 830 sequentially assigns document IDs to the documents in appearance order of the documents, and extracts documents having document IDs less than a predetermined value. A method of extracting the primary candidate phrases is described above in detail with reference to Fig. 2. The secondary candidate phrase extracting unit 840 extracts words contained in titles or content of the documents and appearance frequencies of the words, extracts documents containing words of appearance frequencies greater than a predetermined value in the titles or content as primary candidate phrases, generates secondary candidate phrases composed of combinations of phrases obtained from the words constituting the primary candidate phrases, and calculates weight values of the secondary candidate phrases.

[48] The similar candidate phrase eliminating unit 850 uses vectors consisting of document IDs of documents belonging to secondary candidate phrases with weight values greater than a predetermined value to calculate similarities between the secondary candidate phrases. The similar candidate phrase eliminating unit 850 eliminates secondary candidate phrases with lower weight values among secondary candidate phrases with similarities greater than a predetermined value, and sets the remaining secondary candidate phrases as topics. The topic output unit 860 sets the topics as titles and outputs the topics and documents corresponding to the topics.

[49] The above-mentioned methods of extracting topics and issues may be written with computer programs. Codes and code segments constituting the programs can be easily deduced by computer programmers skilled in the art. In addition, the programs are stored in computer readable media, read and executed by computers, thereby implementing the methods of extracting topics and issues. Examples of the computer readable media include magnetic recording media, optical recording media, and carrier wave media.

[50] While the present invention has been described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the present invention as defined by the following claims.

Industrial Applicability

[51] The present invention can be efficiently applied to industrial fields related to a method and apparatus for extracting topics from search results and providing the search results based on the topics, and a method and apparatus for selecting and providing frequently appearing search results as issues.

Claims

[1] A method of displaying search results with respect to a search word, comprising:
(a) referring to words contained in titles or content of search results matching with the search word to calculate similarities between the search results according to a predetermined similarity calculation method, and extracting representative phrases among combinations of words repeatedly contained in similar search results; and
(b) displaying the representative phrases and the search results that belong to each of the representative phrases.
[2] The method of claim 1, wherein the operation (a) comprises:
(a1) extracting words contained in titles or content of the search results matching with the search word, and extracting primary candidate phrases in which at least one of the words consecutively appears; and
(a2) generating secondary candidate phrases from words constituting the primary candidate phrases, calculating significance of the secondary candidate phrases based on appearance orders of the search results, appearance frequencies of the words, and the number of primary candidate phrases used in the secondary candidate phrases, and extracting representative phrases by eliminating similar candidate phrases from the secondary candidate phrases of higher significance.
[3] A method of extracting topics, comprising:
(a) assigning document IDs to documents with respect to a search word based on appearance orders of the documents, and extracting documents with document IDs less than a predetermined value;
(b) extracting words contained in titles or content of the extracted documents and appearance frequencies of the words;
(c) extracting primary candidate phrases composed of words of appearance frequencies greater than a predetermined value appearing consecutively in the titles or content of the documents;
(d) generating secondary candidate phrases from combinations of phrases composed of the words constituting the primary candidate phrases, and calculating weight values of the secondary candidate phrases;
(e) calculating similarities between secondary candidate phrases with weight values greater than a predetermined value by use of vectors consisting of document IDs of documents belonging to the secondary candidate phrases; and
(f) eliminating secondary candidate phrases with low weight values among the secondary candidate phrases with similarities greater than a predetermined value, and setting the remaining secondary candidate phrases as topics.
[4] The method of claim 3, further including (g) displaying the topics as titles and documents that belong to each of the topics.
[5] The method of claim 3, wherein the operation (d) comprises: generating secondary candidate phrases from combinations of phrases composed of the words constituting the primary candidate phrases; and calculating weight values of the secondary candidate phrases based on document IDs contained in the secondary candidate phrases, appearance frequencies of the words constituting the secondary candidate phrases, and the number of the primary candidate phrases used in the secondary candidate phrases.
[6] A method of extracting issues, comprising:
(a) extracting the same or similar data the number of which is greater than a predetermined threshold value among stored data; and
(b) extracting as issue data a plurality of high-ranking data among the extracted data and displaying the issue data in order of writing time of the issue data or in order of a number of similar documents.
[7] The method of claim 6, wherein the stored data is data obtained by a predetermined search method.
[8] The method of claim 6, wherein the operation (a) includes determining the same or similar data based on words contained in titles or content of stored data, and extracting the same or similar data the number of which is greater than a predetermined threshold value.
[9] An apparatus for providing search services based on extracted topics, comprising: a searching unit searching for stored documents; a primary candidate phrase extracting unit sequentially assigning document IDs to searched documents based on appearance orders of the searched documents, and extracting documents with document IDs less than a predetermined value; a secondary candidate phrase extracting unit extracting words contained in titles or content of the extracted documents and appearance frequencies of the words, extracting primary candidate phrases composed of words of appearance frequencies greater than a predetermined value appearing consecutively in the titles or content of the documents, generating secondary candidate phrases from combinations of phrases composed of the words constituting the primary candidate phrases, and calculating weight values of the secondary candidate phrases; and a similar candidate phrase eliminating unit calculating similarities between secondary candidate phrases with weight values greater than a predetermined value by use of vectors consisting of document IDs of documents belonging to the secondary candidate phrases, eliminating secondary candidate phrases with low weight values among the secondary candidate phrases with similarities greater than a predetermined value, and setting the remaining secondary candidate phrases as topics.
[10] The apparatus of claim 9, further including a topic output unit displaying the topics as titles and documents that belong to each of the topics.
[11] The apparatus of claim 9, wherein the secondary candidate phrase extracting unit generates secondary candidate phrases from combinations of phrases composed of the words constituting the primary candidate phrases, and calculates weight values of the secondary candidate phrases based on document IDs contained in the secondary candidate phrases, appearance frequencies of the words constituting the secondary candidate phrases, and the number of the primary candidate phrases used in the secondary candidate phrases.
[12] Computer readable media storing programs for executing on a computer the method of claim 1 or 2.
PCT/KR2006/002787 2005-07-15 2006-07-14 Method of extracting topics and issues and method and apparatus for providing search results based on topics and issues WO2007011140A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR10-2005-0064515 2005-07-15
KR20050064515 2005-07-15

Publications (1)

Publication Number Publication Date
WO2007011140A1 (en) 2007-01-25

Family

ID=37668993

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2006/002787 WO2007011140A1 (en) 2005-07-15 2006-07-14 Method of extracting topics and issues and method and apparatus for providing search results based on topics and issues

Country Status (1)

Country Link
WO (1) WO2007011140A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008098282A1 (en) * 2007-02-16 2008-08-21 Funnelback Pty Ltd Search result sub-topic identification system and method
JP2014059865A (en) * 2012-09-14 2014-04-03 Hon Hai Precision Industry Co Ltd Retrieval system and method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5924090A (en) * 1997-05-01 1999-07-13 Northern Light Technology Llc Method and apparatus for searching a database of records
US6012053A (en) * 1997-06-23 2000-01-04 Lycos, Inc. Computer system with user-controlled relevance ranking of search results
KR20000050225A (en) * 2000-05-29 2000-08-05 전상훈 Internet information searching system and method by document auto summation
US6212517B1 (en) * 1997-07-02 2001-04-03 Matsushita Electric Industrial Co., Ltd. Keyword extracting system and text retrieval system using the same
KR20040029895A (en) * 2002-10-02 2004-04-08 씨씨알 주식회사 Search system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008098282A1 (en) * 2007-02-16 2008-08-21 Funnelback Pty Ltd Search result sub-topic identification system and method
AU2008215153B2 (en) * 2007-02-16 2012-02-16 Funnelback Pty Ltd Search result sub-topic identification system and method
AU2008215153B9 (en) * 2007-02-16 2012-03-01 Funnelback Pty Ltd Search result sub-topic identification system and method
US8214347B2 (en) 2007-02-16 2012-07-03 Funnelback Pty Ltd. Search result sub-topic identification system and method
JP2014059865A (en) * 2012-09-14 2014-04-03 Hon Hai Precision Industry Co Ltd Retrieval system and method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1)EPC

122 Ep: pct application non-entry in european phase

Ref document number: 06783312

Country of ref document: EP

Kind code of ref document: A1