WO1997049048A1 - Hypertext document retrieval system and method - Google Patents

Hypertext document retrieval system and method

Info

Publication number
WO1997049048A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
documents
term
pointing
hyperlink
Prior art date
Application number
PCT/US1997/010191
Other languages
English (en)
Inventor
Yanhong Li
Original Assignee
Idd Enterprises, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/794,425 (US5920859A)
Application filed by Idd Enterprises, L.P.
Priority to AU34848/97A (AU3484897A)
Publication of WO1997049048A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation

Definitions

  • the present invention relates to hypertext document retrieval, and more particularly to systems and methods of searching databases distributed over wide-area networks such as the World Wide Web.
  • a hypertext is a database system which provides a unique and non-sequential method of accessing information using nodes and links.
  • Nodes, i.e. documents or files, contain text, graphics, audio, video, animation, images, etc., while links connect the nodes or documents to other nodes or documents.
  • the most popular hypertext or hypermedia system is the World Wide Web, which links various nodes or documents together using hyperlinks, thereby allowing the non-linear organization of text on the web.
  • a hyperlink is a relationship between two anchors, called the head and the tail of the hyperlink.
  • the head anchor is the destination node
  • hyperlinks are generally identified by underscoring or highlighting certain text or graphics in a tail anchor document.
  • the hyperlink automatically connects the user's computer with or "points to" the head anchor document for that
  • a hypertext system generally works well when a user has already found a tail document pertaining to the subject matter of interest to that user.
  • the hyperlinks in the tail document are created by the author of the document who generally will have reviewed the material in the head documents of the hyperlinks.
  • a user clicking on a hyperlink has a high degree of certainty that the material in the head document has some pertinence to the anchor text in the tail document of the hyperlink.
  • Search engines usually take a user query as input and attempt
  • Queries are usually in the form of several words which describe the subject matter of interest to the user.
  • search engines operate by comparing the query to an index of a document collection in order to determine if the content of one or more of
  • Since most casual users of search engines do not want to type in long, specific queries and tend to search on popular topics, there may be thousands of documents that are at least tangentially related to the query. When a search engine has indexed a large document collection, such as the Web, it is particularly likely that a very large number of documents will be found that have some relevance to the query. Most search engines, therefore, output a list of documents to the user where the documents are ranked by their degree of pertinence to the query and/or where documents having a relatively low pertinence are not identified to the user. Thus, the way in which a search engine determines the relevance ranking is extremely important in order to limit the number of documents a user must review to satisfy that user's information needs.
  • probabilistic model, fuzzy logic models, etc. are used to develop a numerical relevance ranking. See Harman, D., "Ranking Algorithms,"
  • a user query Q is represented as a vector where each query term (qt) is represented as a dimension of a query vector.
  • $Q = \langle qt_1, qt_2, \ldots, qt_n \rangle$
  • Documents in the database are also represented by vectors with each term or key word (dt) in the document represented as a dimension in the vector.
  • $\text{Weight}(t) = TF \times IDF$, where TF is the term frequency of a given term in a document or query, and IDF is the inverse document frequency of the term.
  • the inverse document frequency is the inversion of how many documents in the whole
  • Documents such as Document J might not be included in a
  • a Web site containing pictures of Mozart or examples of Mozart's music may not be deemed relevant by a search engine when that search engine can only search for the word "Mozart" within the text of documents.
  • a method of indexing documents includes obtaining a list of hyperlinks pointing to each document, where each hyperlink includes one
  • Each document is indexed with the terms in the hyperlink pointing to that document.
  • a number of hyperlinks, each containing a particular term, may point to a document. The number of hyperlinks containing that particular term pointing to the document is indexed with that document.
  • a particular term may appear in hyperlinks pointing to a number of documents, and the number of documents having the particular term in hyperlinks pointing to those documents is indexed with that term. Indexing may include creating a file listing each term, the number of documents having that term in hyperlinks pointing to those documents, a document identifier for each document having that term in hyperlinks pointing to that document, and the number of hyperlinks containing that term pointing to each identified document.
  • the number of documents having a particular term in hyperlinks pointing to those documents may be indexed with a document identifier for each document having the particular term in a hyperlink pointing to that document.
  • pointing to a document may be with the inverse of the number of documents having the particular term in hyperlinks pointing to those documents.
  • a term may appear a number of times in a hyperlink pointing to a document, and the number of times each term appears in a hyperlink is indexed with the document pointed to by the hyperlink.
  • the terms may be stemmed words.
  • the method of the present invention may be performed on an apparatus and may be stored as a computer-readable set of instructions.
  • a method of ranking documents is based on the document's relevance to a query where the query has at least one term, and where hyperlinks contain terms and point to corresponding documents.
  • the method includes comparing the words in the query to the words in a hyperlink to obtain a relevance ranking for each hyperlink, and summing the relevance rankings for each hyperlink pointing to a particular document to obtain a summed relevance score for that document.
  • the query may be represented by a query vector where the query vector contains a dimension for each term in the query.
  • Each document may be represented by document link vectors for each hyperlink
  • each document link vector contains a dimension for each term in the corresponding hyperlink pointing to that document. Comparing the words in the query to the words in the hyperlinks includes calculating the dot product of the query vector with the
  • Summing the relevance ranking for each hyperlink pointing to a document includes summing the dot products obtained using the document link vectors for a particular document to obtain the summed relevance score for that document.
  • summed relevance scores may then be compared to obtain a ranking of documents.
  • the dimension for a term in a query vector may be related to the inverse of the number of documents having a respective hyperlink containing that term pointing to those documents.
  • the dimension for a term in a document link vector may be related to the inverse of a
  • Fig. 1 is a block diagram of a distributed computer network including a hypertext retrieval system of the present invention.
  • Fig. 2 is a block diagram of an indexing and retrieval system of the present invention.
  • Fig. 3 is a diagram of two hypertext documents.
  • Fig. 4 is an example of a hypertext document system including representation of hyperlinks between those documents.
  • Fig. 5 is a flow chart of an indexing process of the present invention.
  • Fig. 6 is a flow chart of a retrieval process of the present invention.
  • Fig. 1 is a block diagram of a typical distributed hypertext system including a client computer 10 connected to server computers 12, 13, 14, 15, and 16. Although the client computer 10 is shown connected directly to server 12, it may be connected indirectly to server 12 through a service provider or through any one or more of the other servers. Servers
  • Server 12 includes an index file 21 as discussed in more detail below.
  • the server computer 12 traverses the network looking for all hypertext documents residing in the files 17-20 of the other server computers 13-16 in order to build the index file 21.
  • Fig. 2 describes the general structure of an indexing and retrieval system 30 of the present invention.
  • a user from outside the system 30 inputs a query 32 through a user interface 34, which will typically reside on the user's computer, such as a client computer 10 (Fig. 1).
  • the user's query is then transmitted through the network to the indexing and retrieval system 30, which generally resides on a server, such as server 12 (Fig. 1).
  • the system 30 includes a retrieval engine 36, index files 38, and an index engine 40.
  • the operation of the retrieval engine 36 and index engine 40 and the creation of the index files 38 are described below.
  • the index engine 40 creates the index files 38 by traversing a document database 42, such as that found on the World Wide Web.
  • the document database 42 might include files 17-20 (Fig. 1).
  • the index files 38 created by the index engine 40 may take various forms in accordance with the present invention, but may include a link file 44, an inverted file 46, and a document vector file 48, all of which are described in detail below.
  • the retrieval engine 36 uses the index files 38 in order to determine a relevance ranking for the documents, and outputs search results at 49 through the user interface 34.
  • Fig. 3 is a diagram of Document A and Document B, which
  • the addresses may be in the form of a uniform resource locator (URL), which is a type of uniform
  • URLs are typically in a format such as: http://www.w3.org/hypertext/book.html
  • the URL may be followed by the pound symbol and a sequence of characters called a fragment identifier in order to identify a fragment within a document, i.e.: http://www.w3.org/hypertext/book.html#Chapter1
  • Document A has a title 52, an abstract 54, and text or media 56.
  • Document B has a title 58, an abstract 60, and text or media 62.
  • the text or media may contain anchor text such as anchor text 64 in Document A.
  • Document A also contains a command 66, which serves as the instructions for the hyperlink 50.
  • the representation of command 66 of the hyperlink 50 is shown in hypertext markup language (HTML) and includes the command "href" and then identifies the address of the head anchor, in this case, the address of Document B, "URL2."
  • the command 66 then includes the statement "good tutorial on Java," which identifies the anchor text of the hyperlink 50.
  • the command 66 points to Document B, thereby instructing the user's computer to send a message to the address URL2, requesting a copy of
  • Fig. 4 is a representation of a simple hypertext system having only four documents, Documents A, B, C, and D.
  • the system shown in Fig. 4 has only three hyperlinks: hyperlink 50, also shown in Fig. 3, and hyperlinks 68 and 70.
  • the anchor text "good tutorial on Java" in Document A is the tail for the hyperlink from Document A to Document B, as shown in Fig. 3.
  • Document C contains two sets of anchor text, "Java tutorial" and "Sun's Java site."
  • the anchor text 72 in Document C points to Document B through the hyperlink 68.
  • the anchor text 74 points to
  • Fig. 4 will be used below in describing the hypertext system, including the index engine, the retrieval engine, and the index files created by the index engine.
  • Fig. 5 describes the operation of the index engine 40 of Fig.
  • the index engine traverses each document in the database.
  • Spider programs begin by obtaining various URL addresses and send messages to those addresses requesting the documents located at the addresses. Those addresses may identify a server, a document stored in files on that server, or groups of documents. Upon obtaining the document or documents identified by the URL, a spider program then reviews those documents looking for hyperlink commands identifying additional addresses. The spider program records those addresses and then seeks the documents residing at those addresses.
  • hyperlink information might include the URL of the document, the words in the anchor text of the hyperlink in the document, and the URL of any document pointed to by a hyperlink having that anchor text.
  • the system may also collect a variety of information about the document including its title and possibly the text of the document. The system may also create an abstract, if desired.
  • the system creates one or more link files where entries in the files have a format:
  • doc. ID is an identifier for each head document of a hyperlink having the corresponding anchor text.
  • the doc. ID may be in the form of a URL or may be another identifier which is indexed in some manner with the document's URL.
  • Box 104A is an example of a link file, as referred to in Fig. 2, created for the database of the documents shown in Fig. 4. Since the database in Fig. 4 has three hyperlinks, there are three entries in file 104A.
  • the system may also store the number of times a term appears in anchor text for a particular link. In the examples shown, each term only appears once in a particular link.
  • FIG. 5 shows that traversing of documents in block 100 occurs before link files are created at block 104, it is possible for some
  • the anchor text for the various hyperlinks may
  • Stemming is a process of reducing the words from various morphological forms to a simplified stem. During stemming, words are usually made case-insensitive, e.g. "Tutorial" and "tutorial" are considered the same. "Sun's" will stem to "Sun," "documents" will stem to
  • Control then passes to block 108, which creates an inverted file with entries in the format of:
  • DF is the document frequency for that term
  • $doc_i$ is the document identifier for Document i
  • $LF_i$ is the link term frequency for $doc_i$.
  • Link term frequency is defined as the number of hyperlinks pointing to $doc_i$ whose anchor text consists of the
  • the index engine at box 114 may also generate a document link vector file where entries in the document link vector file are in the format of: doc. id, $v_1, v_2, \ldots, v_n$, where doc. id is the identifier for a particular document, and $v_i$ is a vector
  • Each vector $v_i$ will be in the format of:
  • TF is the term frequency of term i, i.e. how many times a term appears in the given anchor text
  • IDF is the inverse document frequency (1/DF) for the term to which the particular dimension in the link vector pertains. It may be desirable to divide the document frequency by the total number of documents to obtain a normalized document frequency when calculating the dimensions. It may also be desirable to use the logarithm of the inverse document frequency when calculating dimensions.
  • File 114A is an example of a document link vector file which has been generated at block 114. Since Document B has two hyperlinks pointing to it, there are two vectors for Document B entered in file 114, along with the identifier of Document B. Since the anchor text of the first hyperlink pointing to Document B has four distinct words, "good tutorial on Java," the first vector for Document B has four dimensions. Since the second hyperlink pointing to Document B has only two words in the anchor (Java, tutorial), the second vector indexed with Document B has only two dimensions. As described below, the document link vector file 114A is used in calculating the relevance score with respect to a particular query. Instead of creating document link vector files automatically, it may be desirable to create document link vector files only upon receipt of a query. Thus, the only entries in the link vector files which need to be created are those pertaining to documents having query terms in the anchor text of hyperlinks pointing to those documents.
  • index files 112A, and the document link vector file 114 are all considered index files as shown in Fig. 2.
  • files as shown in Fig. 5 are preferred, there are many indexing techniques which can be used with a system of the present invention, which rely on anchor text and link frequency in order to index documents. For instance, the files may be compressed or have a variety of relational structures for the data within files or between files.
  • the retrieval process achieves relevance ranking by using the vector space model and link vector voting.
  • the process begins at box 120 with the input of a user query as shown in
  • a document may be related to the query if that document has a hyperlink pointing to it, where the hyperlink includes a query term in its
  • the system has located two documents, Document B and Document D, each of which has one or more of the terms in the query in anchor text of hyperlinks pointing to those documents.
  • Control next passes to box 126 where the system finds document link vectors for each document identified in box 124A.
  • the document link vectors are contrasted with conventional document vectors which are based on the content of each document. The system may find
  • Box 126A shows the document link vectors, along with the anchor text, for each hyperlink pointing to a document related to the query.
  • While obtaining the document link vectors, the system, at box 128, also creates a query vector as shown in box 128A.
  • the dimensions in the query vector are equal to $TF_q \times IDF$ for each term in the query, where $TF_q$ is the term frequency or number of times the term appears in the query.
  • IDF is the inverse document frequency for a term as calculated in box 110 of Fig. 5.
  • the $TF_q$ is one for both "Java" and "tutorial" in the query.
  • the IDF as previously calculated in box 110 of
  • the relevance score is calculated by finding the dot product of each document link vector with the query vector.
  • a dot product for vectors $\langle a, b, c \rangle$ and $\langle d, e, f \rangle$ is defined as: $a \times d + b \times e + c \times f$
  • the first vector for Document B is represented as:
  • the query vector would be represented as: $\langle 0, 1, 0, 0.5 \rangle$ so that the dimensions representing "tutorial" in each vector and "Java" in each vector match up.
  • the dot product of the query vector with the first document link vector for Document B would then be calculated as follows: $0 \times 1 + 1 \times 1 + 0 \times 1 + 0.5 \times 0.5$
  • pertaining to a particular document are summed to obtain a "vote" or summed score for a particular document.
  • Document B is the sum of the dot products for each document link vector relating to Document B, which equals 1.620. A similar calculation can be made by finding the dot product of the query vector with the only document link vector for Document D, which equals 0.149.
  • sorted results are output as shown in box 132A.
  • results are sorted so that the documents having higher summed relevance rankings are listed above those with lower rankings. Instead of listing all documents having a non-zero relevance score, it may be desirable to only list a pre-set number, e.g. the top 100 documents, or to only list those documents having a relevance score above a certain threshold.
  • the process described herein can be performed on a number of apparatus, including a Sun Sparc Station with a Solaris operating system.
  • the process may be stored in memory on the computer system as a set of instructions.
  • the set of instructions may also be stored on a computer-readable memory such as a disk, and the instructions can be transmitted from one computer to another over a network.
  • Document A and Document C contain the words in the query, "Java" and "tutorial."
  • a conventional index and retrieval engine could be used in combination with the hyperlink-based index and retrieval system of the present invention. This combination might be used in the case of a link-based relevance score tie or merely to supplement the link-based information. For instance, suppose the relevance scores for Documents A and C are 0.6 and 0.8, respectively, based on conventional and relevance ranking. The final relevance ranking for the query utilizing the conventional ranking to break the tie of the link-based ranking would be
  • Another reason to use combination ranking may be when there are too few hyperlinks (such as only one link) pointing to a document.
  • the relevance score based upon the one link may not be accurate, so a threshold can be set for the link-based relevance score. If the link-based relevance score is lower than the threshold, other means of relevance ranking may be used or combined with the link-based relevance score.
  • index files of the present invention use only hyperlink information, relevance ranking does not depend on the words appearing in documents themselves, or, if used in combination with conventional relevance ranking, does not depend solely on words appearing in the documents. Instead, the relevance ranking depends on descriptions of those documents in the anchor text of hyperlinks pointing to the documents. Documents such as Document J described above will not have a high summed relevance score because authors creating hypertext documents will not include hyperlinks in their documents pointing to Document J.
  • the ranking method based on hyperlinks pointing to a given document can be used to select the most popular documents in a specific field using the feature words or description of that field as the query to the system.
  • a system can automatically construct a thesaurus or synonym tool.
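
The indexing and retrieval steps described above (link file, inverted file, document link vectors, query vector, summed dot products) can be sketched for the three hyperlinks of Fig. 4 as follows. This is a minimal illustration, not the patented implementation: the names `LINKS`, `terms`, `build_inverted_file`, `link_vectors`, and `rank` are hypothetical, stemming is replaced by a crude lowercase-and-strip-possessive rule, and IDF is taken as plain 1/DF with no normalization or logarithm, so the scores it produces are not expected to match the 1.620 and 0.149 figures quoted in the text.

```python
from collections import defaultdict

# Link file for the Fig. 4 example: (anchor text, head document ID).
LINKS = [
    ("good tutorial on Java", "B"),  # hyperlink from Document A
    ("Java tutorial", "B"),          # hyperlink from Document C
    ("Sun's Java site", "D"),        # hyperlink from Document C
]

def terms(text):
    """Crude stand-in for stemming (assumption): lowercase, drop trailing 's."""
    return [w[:-2] if w.endswith("'s") else w for w in text.lower().split()]

def build_inverted_file(links):
    """term -> {doc_id: link term frequency}; DF is the number of doc_ids."""
    inverted = defaultdict(lambda: defaultdict(int))
    for anchor, doc_id in links:
        for t in set(terms(anchor)):   # count each hyperlink once per term
            inverted[t][doc_id] += 1
    return inverted

def idf(inverted, term):
    """Inverse document frequency, taken here as 1/DF (0 if term unseen)."""
    df = len(inverted.get(term, {}))
    return 1.0 / df if df else 0.0

def weigh(term_list, inverted):
    """Sparse TF*IDF vector for a list of terms, as {term: weight}."""
    tf = defaultdict(int)
    for t in term_list:
        tf[t] += 1
    return {t: n * idf(inverted, t) for t, n in tf.items()}

def link_vectors(links, inverted):
    """doc_id -> one TF*IDF vector per hyperlink pointing to that document."""
    vectors = defaultdict(list)
    for anchor, doc_id in links:
        vectors[doc_id].append(weigh(terms(anchor), inverted))
    return vectors

def rank(query, links):
    """Sum the query/link-vector dot products per document, highest first."""
    inverted = build_inverted_file(links)
    q_vec = weigh(terms(query), inverted)
    scores = {}
    for doc_id, vecs in link_vectors(links, inverted).items():
        total = sum(sum(q_vec.get(t, 0.0) * w for t, w in v.items()) for v in vecs)
        if total > 0:
            scores[doc_id] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

RESULTS = rank("Java tutorial", LINKS)
print(RESULTS)  # [('B', 2.5), ('D', 0.25)] under the plain 1/DF assumption
```

With these assumptions, "Java" has a document frequency of 2 (it appears in hyperlinks pointing to Documents B and D) and "tutorial" has a document frequency of 1, so each of the two hyperlinks pointing to Document B contributes 1.25 and Document B outranks Document D.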

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A search engine for retrieving documents pertinent to a query indexes documents in accordance with hyperlinks pointing to those documents. The indexer traverses the hypertext database and finds hypertext information, including the address of the document the hyperlinks point to and the anchor text of each hyperlink. This information is stored in an inverted index file, which may also be used to calculate document link vectors for each hyperlink pointing to a particular document. When a query is entered, the search engine finds all document vectors for documents having the query terms in their anchor text. A query vector is also calculated, and the dot product of the query vector and each document link vector is calculated. The dot products relating to a particular document are summed to determine the relevance ranking for each document.
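
As a compact restatement of the scoring summarized in the abstract, the summed relevance score can be written as below. The symbols $S(d)$, $L(d)$, $q$, and $v_\ell$ are notation assumed here for illustration and do not appear in the original; the weights follow the TF*IDF definitions given in the description, with IDF taken as 1/DF and without the optional normalization or logarithm.

```latex
S(d) = \sum_{\ell \in L(d)} q \cdot v_\ell,
\qquad
q_t = TF_q(t)\,\mathit{IDF}(t),
\qquad
v_{\ell,t} = TF_\ell(t)\,\mathit{IDF}(t),
\qquad
\mathit{IDF}(t) = \frac{1}{DF(t)}
```

Here $L(d)$ is the set of hyperlinks pointing to document $d$, $TF_q(t)$ and $TF_\ell(t)$ are the number of times term $t$ appears in the query and in the anchor text of hyperlink $\ell$, and $DF(t)$ is the number of documents having $t$ in hyperlinks pointing to them.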
PCT/US1997/010191 1996-06-17 1997-06-17 Hypertext document retrieval system and method WO1997049048A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU34848/97A AU3484897A (en) 1996-06-17 1997-06-17 Hypertext document retrieval system and method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US66456596A 1996-06-17 1996-06-17
US08/664,565 1996-06-17
US08/794,425 1997-02-05
US08/794,425 US5920859A (en) 1997-02-05 1997-02-05 Hypertext document retrieval system and method

Publications (1)

Publication Number Publication Date
WO1997049048A1 (fr) 1997-12-24

Family

ID=27098998

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/010191 WO1997049048A1 (fr) 1996-06-17 1997-06-17 Hypertext document retrieval system and method

Country Status (2)

Country Link
AU (1) AU3484897A (fr)
WO (1) WO1997049048A1 (fr)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BICHTELER J ET AL: "The combined use of bibliographic coupling and cocitation for document retrieval", JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, JULY 1980, USA, vol. 31, no. 4, ISSN 0002-8231, pages 278 - 282, XP002043307 *
DUNLOP M D ET AL: "Hypermedia and free text retrieval", INFORMATION PROCESSING & MANAGEMENT, 1993, UK, vol. 29, no. 3, ISSN 0306-4573, pages 287 - 298, XP002043306 *
FREI H P ET AL: "The use of semantic links in hypertext information retrieval", INFORMATION PROCESSING & MANAGEMENT (INCORPORATING INFORMATION TECHNOLOGY), vol. 31, no. 1, January 1995 (1995-01-01), pages 1-13, XP004040956 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6125361A (en) * 1998-04-10 2000-09-26 International Business Machines Corporation Feature diffusion across hyperlinks
WO1999053418A1 (fr) * 1998-04-10 1999-10-21 International Business Machines Corporation Feature diffusion across hyperlinks
US7092901B2 (en) * 1999-05-28 2006-08-15 Overture Services, Inc. System and method for influencing a position on a search result list generated by a computer network search engine
WO2002054289A1 (fr) 2000-12-29 2002-07-11 International Business Machines Corporation Lossy index compression
EP1346296A1 (fr) * 2000-12-29 2003-09-24 International Business Machines Corporation Lossy index compression
EP1346296A4 (fr) * 2000-12-29 2008-07-02 Ibm Lossy index compression
US7194466B2 (en) 2003-05-01 2007-03-20 Microsoft Corporation Object clustering using inter-layer links
JP2007519111A (ja) * 2004-01-26 2007-07-12 International Business Machines Corporation Method, system, and program for processing anchor text
WO2005071566A1 (fr) * 2004-01-26 2005-08-04 International Business Machines Corporation Method, system, and program for handling anchor text
EP1587009A2 (fr) * 2004-04-15 2005-10-19 Microsoft Corporation Content propagation for enhanced document retrieval
US7260568B2 (en) 2004-04-15 2007-08-21 Microsoft Corporation Verifying relevance between keywords and web site contents
US7289985B2 (en) 2004-04-15 2007-10-30 Microsoft Corporation Enhanced document retrieval
US7305389B2 (en) 2004-04-15 2007-12-04 Microsoft Corporation Content propagation for enhanced document retrieval
US7366705B2 (en) 2004-04-15 2008-04-29 Microsoft Corporation Clustering based text classification
EP1587009A3 (fr) * 2004-04-15 2006-08-16 Microsoft Corporation Content propagation for enhanced document retrieval
US7689585B2 (en) 2004-04-15 2010-03-30 Microsoft Corporation Reinforced clustering of multi-type data objects for search term suggestion
KR101114023B1 (ko) * 2004-04-15 2014-02-26 Microsoft Corporation Content propagation for enhanced document retrieval
EP1880318A2 (fr) * 2004-12-30 2008-01-23 Word Data Corp. System and method for retrieving information from citation-rich documents
EP1880318A4 (fr) * 2004-12-30 2009-04-08 Word Data Corp System and method for retrieving information from citation-rich documents
WO2019127254A1 (fr) * 2017-12-28 2019-07-04 深圳配天智能技术研究院有限公司 Vehicle positioning method, device and storage medium

Also Published As

Publication number Publication date
AU3484897A (en) 1998-01-07

Similar Documents

Publication Publication Date Title
US5920859A (en) Hypertext document retrieval system and method
US9940398B1 (en) Customization of search results for search queries received from third party sites
US6516312B1 (en) System and method for dynamically associating keywords with domain-specific search engine queries
US7392238B1 (en) Method and apparatus for concept-based searching across a network
US7047246B2 (en) Search and index hosting system
Yuwono et al. Search and ranking algorithms for locating resources on the World Wide Web
US6338059B1 (en) Hyperlinked search interface for distributed database
US8255541B2 (en) Method and apparatus for utilizing user feedback to improve signifier mapping
US6321220B1 (en) Method and apparatus for preventing topic drift in queries in hyperlinked environments
USRE42262E1 (en) Method and apparatus for representing and navigating search results
US8775396B2 (en) Method and system for searching a wide area network
US20040024752A1 (en) Method and apparatus for search ranking using human input and automated ranking
US20100049772A1 (en) Extraction of anchor explanatory text by mining repeated patterns
US6938034B1 (en) System and method for comparing and representing similarity between documents using a drag and drop GUI within a dynamically generated list of document identifiers
US9275145B2 (en) Electronic document retrieval system with links to external documents
WO2003017023A2 (fr) System and method for extracting content for submission to a search engine
US7024405B2 (en) Method and apparatus for improved internet searching
WO1997049048A1 (fr) Hypertext document retrieval system and method
Henzinger Web information retrieval-an algorithmic perspective
Yoke Chun World wide web robots: an overview
Lavania et al. Google: a case study (web searching and crawling)
US20030046276A1 (en) System and method for modular data search with database text extenders
US7490082B2 (en) System and method for searching internet domains
Lam The Overview of Web Search Engines
EP1158424A1 (fr) System and method for publishing and classifying documents on a network

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IL IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 98503130

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase