EP1257930A1 - Categorisation of data entities - Google Patents

Categorisation of data entities

Info

Publication number
EP1257930A1
EP1257930A1 EP00984929A EP00984929A EP1257930A1 EP 1257930 A1 EP1257930 A1 EP 1257930A1 EP 00984929 A EP00984929 A EP 00984929A EP 00984929 A EP00984929 A EP 00984929A EP 1257930 A1 EP1257930 A1 EP 1257930A1
Authority
EP
European Patent Office
Prior art keywords
categorisation
item
item data
quantification
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00984929A
Other languages
German (de)
English (en)
Inventor
Anders Hyldahl
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mondosoft AS
Original Assignee
Mondosoft AS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mondosoft AS
Publication of EP1257930A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/954: Navigation, e.g. using categorised browsing

Definitions

  • the present invention relates to a method for categorisation of items being data entities and in particular relates to categorisation of data entities being web pages of a web site
  • Today web sites are indexed by gathering, for instance by crawling, information related to each web page to be indexed
  • the information relating to each web page typically comprises a path to the page
  • Prior art methods have attempted to do a post-categorisation of the indexed web site based on a search string provided by a searcher searching the web site. Based on the search string provided, a search engine will go through a database comprising information relating to the indexed web site and will evaluate, by use of Boolean algebra, whether the search string or fragments of the search string is/are represented in the information. If the search string is represented in the information, then a link to the web page will be presented
  • a score may be assigned to each hit and the displaying of the hits may be sorted in a way where hits having the highest score are displayed first
  • the present invention provides, in a broad aspect, a method for categorising items being data entities stored in a computer system, the method comprising performing categorisation in such a manner that an item and a category are linked if a determined quantification of a relation between said item and said category fulfils a predefined criterion, said method utilising a list of categories on which the categorisation is to be based, for each category comprised in the list of categories at least one categorisation function(s) for determining quantification for at least one relation between the category and an item, such as a number, a colour, and/or a text, the quantification of relation(s) being determined by executing the categorisation function(s), for each item to be categorised item data to be used for executing the categorisation function(s), the said method comprising selecting a first set of categorisation functions and a first set of item data, (A
  • categorisation of items may be construed as linking item and categories, which covers the situations of items being linked to categories, categories being linked to items and/or item and categories being linked
  • Data entities may in this context be computer data of the same kind, for instance a text document, a disk file or a web page
  • when a data entity is represented in a computer, some information from or about the single data entity is typically stored; that may be the title of the data entity, date and time of the data entity, size, text content of the data entity, locator or path to the data entity etc
  • linking is based on a quantification of relation, this being a measure of the relation between an item and a category
  • the quantification of relation may preferably be a number and/or a statement such as false/true
  • the method is categorising items being data entities stored in a computer system
  • items are in the broadest aspect of the present invention preferably considered to be any kind of data, such as entities being grouped, data entities stored in a computer, such as in a memory, on a hard disk or the like
  • items considered are files comprising text, pictures and the like
  • the items considered are web pages stored on one or several web site(s)
  • a list of categories is being supplied, which list may comprise one or more categories
  • the manner in which the list of categories is provided may depend on the actual application/utilisation of the method according to the present invention. Different ways of providing that list will be described in connection with the description of preferred embodiments of the invention
  • the user of the method may advantageously provide the list of categories, and therefore the providing of that list may be viewed as being supplied by a step external to the method of the invention. But the contents of the list are, of course, utilised by the method according to the present invention, and therefore providing that list may also be viewed as an integral step of the present invention
  • the integral/external principle outlined above applies also to the providing of categorisation function(s) and item data
  • the categorising method is applied successively in the sense that a first categorisation is based on a first list of categories
  • the result of this first categorisation is then categorised based on a second list of categories, which may be determined/provided on the basis of the first categorisation result
  • the second list comprises sub-categories to a category
  • the list of categories is built, i.e. constructed, during application of the method
  • a quantification(s) of relation is determined by executing a categorisation function
  • categorisation function may be construed in the present context as a function which takes as input information relating to data entities to be categorised and which provides an output quantifying the relation between a category and an item
  • As input to, or argument for, the categorisation functions is information relating to or corresponding to the items to be categorised; this information is provided as item data. Typically, item data are extracted from the items and the content of the item data corresponds to the input to the categorisation function, but the item data may also comprise information to be processed before being used as argument for the categorisation functions
  • the content of the item data may preferably be static information relating to the items and/or information provided by processing the items
  • By using the concept of categorisation functions another very advantageous technical effect is provided. As more than one categorisation function may be provided for one category, items being of different nature, such as a picture or text, may easily be categorised by the method according to the present invention. In prior art categorising methods, categorisation of items having different nature normally requires a huge number of logical operations
  • the first set of categorisation functions may comprise one categorisation function or more than one categorisation function, and, also depending on the actual implementation/application of the method, the first set of item data may comprise item data corresponding to one or more items
  • in step (A) of the broad aspect of the present invention the categorisation function(s) is/are executed on the item data provided. This execution will, as stated, provide a first set of quantification(s) of relation, the number of which corresponds to the number of categorisation functions and item data
  • in step (B) of the broad aspect of the present invention the linking is performed for the item(s) and category(ies) considered in step (A)
  • the linking is based on determination of whether a predefined or in general a defined linking criterion is fulfilled
  • the criterion is typically predefined by assigning a criterion to each of the categorisation functions and/or by prescribing a criterion common for all categorisation functions or for a selection of categorisation functions
  • the criterion may also very advantageously be defined during application of the method. One such case could be a situation wherein a restriction on the number of items within a category has been prescribed, which number may be applied to set a lower limit on the quantification of relation to be observed for linking
  • the manner of selecting the first sets is as indicated above preferably depending on the actual implementation/application of the method
  • a new first set of categorisation function(s) and/or a new first set of item data is to be selected
  • steps (A) and (B) are repeated for the new first sets selected
  • this procedure may be repeated until no further functions and/or no further item data are to be considered
  • the items to be categorised are grouped and each group is then considered as an item to be categorised.
  • the item data corresponding to such a group may preferably be a head item for the group and once the head item is categorised the remaining items in the group are categorised according to the head item.
  • the step "selecting a first set of categorisation functions and a first set of item data" may be included or be inherent in step (A) as will be described in connection with descriptions of preferred embodiments of the method.
  • the selecting of a first set of item data may be inherent in providing item data, for instance in the case where this selection comprises selection of all the item data provided, in which case the first set of data may comprise all the item data provided.
  • step (A) and step (B) should not be construed in the sense that these steps have to be executed independently of each other.
  • step (A) may very advantageously be executed for one categorisation function, whereafter step (B) is executed based on the result of step (A), which sequence may be repeated until all the categorisation function(s) comprised in the first set of categorisation functions have been executed.
  • the grouping of items considered is the partitioning of items into directories in a computer system.
  • the head items are then considered being main directories and once these main directories are categorised the content of these main directories are categorised similar to the main categories.
  • the item data is/are path(s) to a main directory(ies) for each group and once these directories have been categorised, the items in the main directories and sub-directories thereto are categorised according to the categorisation of the main directory.
  • step (A) of the broad aspect comprises the steps of
  • step (c) if the first set of item data comprises non-selected item data or more item data are to be selected then selecting a new item data and repeating step (b) until no further item data is to be selected
  • step (B) of the method according to the broad aspect is performed based on the selected item and the quantification(s) of relation corresponding thereto
  • Selection of an item data from the first set of data may be considered being performed inherently in the selection of a first set of item data in case the method is applied/implemented in a manner in which the selection of the first set of item data comprises selection of only one item
  • This is particularly useful in embodiments of the method in which categorisation of items is performed on the fly, i.e. in the situation wherein an item is categorised when its item data is provided
  • This preferred embodiment of the present invention might be viewed upon as comprising an outer and an inner loop
  • the outer loop may be seen as the operation(s) involved in providing item data and the categorisation function(s) to be considered for the item
  • the inner loop may be seen as a loop running through all the categorisation functions thereby providing the quantification(s) of relations and performing the linking
  • This embodiment of the method according to the invention has the advantage of speeding up the categorisation, especially in a situation in which a linking criterion is applied in such a manner that once the criterion has been observed for a quantification of relation, there is no need to look for another quantification observing the criterion, whereby the determination of quantifications may be interrupted and a new item may be selected
  • step (A) of the method comprises the steps of (a) selecting a categorisation function from the first set of categorisation functions, (b) executing said selected categorisation function on the item data comprised in the first set of item data thereby determining quantification of relation(s), and
  • step (c) if the first set of categorisation functions comprises a non-selected categorisation function or if more categorisation functions are to be selected then selecting a new categorisation function and repeating step (b) until no further categorisation function is to be selected
  • This embodiment of the invention may serve the purpose of finishing the linking between one category and more than one item at a time. This may be very advantageous and may be applied when performing a re-categorisation in which one category out of a list of categories has been altered. In this case, links between the new category and items may be established independently of the former categorisation. Also, this embodiment may be applied in case one or more categories are added to a former categorisation
  • step (B) of the method according to the broad aspect is performed based on the items and the quantifications of relation corresponding thereto
  • Selection of a new item data or a new categorisation function may be interrupted when no more item data are to be selected or when no more categorisation functions are to be selected. Thereby these embodiments may be viewed as a hybrid version comprising categorisation of a number of items according to this preferred embodiment and comprising categorisation by using other embodiments of the method for the remaining number of items to be categorised
  • step (B) may preferably be performed when either no further item data is to be selected or no further categorisation function is to be selected
  • step (B) according to the broad aspect of the method is performed when a quantification of relation(s) has been determined
  • a method in case the linking criterion is fulfilled, further comprises the step of determining whether further quantification of relation(s) corresponding to the item for which the linking criterion has been fulfilled has to be determined
  • This embodiment is particularly useful in situations wherein the categorisation of an item may include linking an item and more than one category
  • the determination of whether further quantification of relation(s) has to be determined may be inherent in the method/implementation of the method according to the invention. This may for instance be the case if the method is so implemented or applied that all categorisation functions are executed on the item data corresponding to said item, or said determination may be based on an evaluation of for instance the quantification of relation. The latter may be applied as a step to provide a measure for the linking of one item and one category relative to said item and another category
  • the item data to be used in executing the categorisation function(s) in the method according to the present invention comprises predefined information relating to the categorisation
  • the information is preferably predefined in such a way that when an item is located the information is extracted from the item
  • the predefined information relating to the categorisation is selected from the group consisting of file name, file extension, the content of a meta-tag, language of the data entity (optionally the language of the item data), position in a directory, individual item or item data assignment and URL
  • step (B) of the method further comprises consulting one or more additional categorisation rules and/or one or more additional functions, the additional categorisation rule(s) and the additional function(s) being adapted to determine whether the quantification of relation(s) for the item is valid, and if the result of the consultation indicates that the quantification of relation(s) is non-valid then
  • step (i) changing the item data corresponding to the item in question in combination with executing the categorisation function(s) on the item data thereby altering the quantification of relation(s) of the item data, or (ii) altering the quantification of relation(s) based on the additional rule and/or the additional function, or performing a combination of steps (i) and (ii)
  • a quantification of relation may preferably be considered to be valid in case consultation of the additional categorisation rule(s) and/or additional function results in that neither the item data nor the quantification corresponding thereto is subjected to change. If the consultation reveals that the quantification of relation(s) for the item in question is not valid, then either the item data are changed or the quantification(s) of relation is(are) changed, or a combination of those measures is applied
  • This aspect of the method is especially applicable for error correction purposes and/or for applying a superior categorisation disabling categorisation for a subset of items, said subset being preferably defined by the additional rules and/or additional functions
  • the predefined linking criterion may preferably be that linking is provided between an item and a category if the quantification of relation(s) corresponding to said item and said category is the largest compared to quantification of relation(s) corresponding to said item and all other categories
  • the predefined linking criterion may preferably be that linking is provided between an item and a category if the quantification of relation(s) is within a particular interval
  • the interval may be defined by an upper and/or lower limit, which limits may preferably be expressed by number and/or characters
  • the interval may preferably be determined during the categorisation
  • One preferred way of determining the interval to be observed is based on statistics relating to the determined quantifications of relations. If for instance the quantifications of relations are mostly represented around a specific quantification, then the limits may preferably be set so that only the items represented around that specific quantification observe the criterion
  • the categorisation is applied to a web site
  • the items to be categorised are preferably web pages
  • Web pages not being a part of a web site may of course also be categorised by the method according to the present invention
  • the item data on which the categorisation is based are collected by a method comprising crawling the web site, locating items to be categorised and, for each of those located items, collecting item data to be used in executing the categorisation function(s)
  • the crawling is typically performed by use of a crawler - also called a robot, a worm, a spider or the like being set-up to locate items to be categorised
  • the crawler may perform the collecting of item data or the crawler may gather information relating to the items which information may be used by another means adapted to extract item data from the items
  • the collecting of item data comprises interpreting the contents of items so that item data collected corresponding to an item may comprise data related to the content of the item and/or the content such as fragments of the item
  • the crawling of the web site comprises crawling by descriptors, such as paths to web pages and/or paths to web pages in combination with content of specific read data from the web pages
  • a new category or new categories to be added to the list of categories are provided by executing the categorisation function(s) and/or consulting the additional rule(s) and/or the additional function(s)
  • the method will be described in at least two sections, one describing the actual categorisation and one describing the use of the categorisation result
  • In order for the categorisation to be carried out, the data-items to be categorised, or information relating thereto, must somehow be provided
  • data-items being documents such as web pages located on a web site, but the method according to the invention is, of course, not limited to categorisation of such documents
  • Such web pages are uniquely defined by a URL, a uniform resource locator, being such as file name and path, and documents are "collected" by a well-known crawling process utilising a worm which crawls the web site and locates web pages corresponding to a set-up of the worm or the crawling process in general
  • the documents are not collected in the sense that documents are actually copied to another location, but the term collected is used to denote the process of identifying documents corresponding to the set-up of the crawling process and extracting information to be used during categorisation, such as data from the so-called META-tag and URLs corresponding to such documents
  • This list will according to the above discussion comprise a list of URLs and/or other information characterising the documents and being useful for the process of categorisation
  • the categorisation method is based on a categorisation list
  • Each item in the categorisation list comprises a categorisation function that provides by execution a value being termed quantification of relation
  • the quantification of relation may be viewed upon as a measure for how close a fit there is between a category and a document
  • each category is typically assigned a name and the result obtained by executing the categorisation function is assigned a categorisation identity number, a cat_id, corresponding to the category the function relates to. This may be exemplified by the following
  • a list of categorisation functions may have the following general appearance
  • n categorisation functions are present corresponding to n categories into which documents may be categorised. Furthermore, the notation url_i indicates that it is the URL corresponding to the i'th document that is used as an argument to the categorisation function
  • the notation "-> Value_x,Cat_id_x" indicates that the result of executing the categorisation function is at least a value quantifying the relation between the document in question and the category in question. Cat_id is preferably inherent in the process as the functions are related to categories, but executing the functions may in some situations derive the cat_id
  • the above example is an example often referred to as categorisation by directory structure.
  • the method is not limited to such cases, as the method may apply any kind of categorisation function as long as its execution provides a value, so that a quantification of relation is provided by execution.
  • the wild card " * " has been used to indicate that any character and number thereof may take the place of the " * ", but other wild-card systems such as [#@ a
  • the operator is also defined in such a manner that if one or more characters are inconsistent between the two arguments then the number of letters in the intersection is by definition zero. For instance, evaluation of (/dir14/test.*) ∩ (/dir1/drp5/test.html) results in 0 as will be shown below.
  • the linking of a document and a category is based on the quantification of relation and in the preferred embodiment of the present invention a document in question is only to be linked to one category.
  • the criterion to be fulfilled for linking a document and a category is in this preferred embodiment the following: the document is linked to the category for which evaluation of the corresponding function provides the highest quantification of relation.
  • a category may have more than one function assigned which may be exemplified by the functions
  • the actual implementation of the linking process may be done in many different ways, but in the preferred embodiment the executing process has been implemented in the following way. Each time the crawling process has located a document to be categorised, all the functions are executed. The linking process is initiated by executing the first function in the list, and the value resulting from this execution is recorded. For clarity of the discussion only, this value is denoted the old value. Then the next function is executed and the value resulting thereby (denoted the new value for clarity only) is compared to the recorded value. If the old value is smaller than the new value, then the new value is recorded and the old value is deleted. This procedure is repeated for the remaining functions, with the result that when all the functions have been executed only the largest quantification of relation is recorded, which then provides the information relating to the category and document to be linked.
  • the linking may be performed after the crawling process has located all the documents to be located, and the execution of the functions may be done in such a manner that one function is executed on all documents.
  • a specific important feature of the categorisation method according to the present invention is the method's ability to provide a complete categorisation. This has been provided by including a completion function which when executed will provide a quantification of relation being different from zero independent of the document.
  • break indicates that a discrepancy is found and no more comparisons are to be done.
  • the ∩-operator provides a zero as result.
  • the completion function could in the present example be expressed as cat_id,/ * and the category identity, cat_id, could most suitably refer to a category termed "Other". Execution of this function will always result in a number different from zero, as all URLs start with "/" and the wildcard " * " will accept all characters. By applying such a function, pages, or in general documents, which do not fit in any of the other categories go into the category Other. Furthermore, as this function is similar to the other functions applied, the completion function is simply included in the list of functions.
  • the list of functions is hierarchically arranged having the highest prioritised category arranged as the first, i.e. the first function in the list of functions is the one corresponding to the category having the highest rank
  • the method according to the present invention may very advantageously be used in a kind of recursive manner
  • documents are first categorised according to a master list thereby arranging the documents in master categories
  • Documents arranged in such a master category are then categorised according to a sub-list used for categorising documents in sub-categories
  • a site-map which comprises information regarding all found directories and their content
  • this site-map is visualised on a computer screen
  • the user provides a number of categories, which also may be visualised
  • generation of the categorisation function can be performed by linking data entities present in the site-map and categories
  • the crawling process may have located the following items on the web site www.science.tst, which documents are linked with the categories following below and depicted in Fig. 1
  • each line between a document and a category represents a categorisation function to be constructed. After this first assignment, which typically is provided by a user of the method, the documents, which in this case are directories, are examined and this examination provides the functions
  • the categorisation method may also be used such as to provide a possibility of arranging data according to more than one categorisation
  • a web site or in general the content of a storage medium may be categorised based on the internal organisation of the company owning the web site or it may be categorised based on content analysis
  • the method according to the present invention is applied to two sets of categories each having a list of categorisation functions
  • the execution of the categorisation function is performed whenever possible, which typically is when a document has been located
  • no memory is used for storing the data-items until processing
  • the architecture of the computer used for categorisation may be such that it is advantageous to locate a number of data-items before execution of functions is performed, which number of data-items may be adapted to cache size or the like
  • the method according to the present invention does not require a full categorisation of all the data entities when the number and/or types of data entities are changed
  • the documents or their representation comprise a cat_id being the result of the categorisation method, and as this cat_id is determinable, in general, independently of determination of cat_ids for other data-items, a new data-item may be categorised when appearing
  • Such a search will in general provide a number of documents being selected by a search criterion/criteria from the categorised web site
  • the documents selected are typically arranged in a list being subjected to presentation
  • the documents within these lists are represented by a locator such as a URL pointing to/locating the document and the cat_id corresponding to the document, which cat_id also represents the category to which the documents are linked and vice versa.
  • Displaying of the search result comprises the step of finding data-items having the same cat_id and arranging these data-items in a list of items to be displayed together with the name of the category.
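
The categorisation by directory structure described above (wildcard patterns, the intersection operator, the "largest value wins" linking criterion and the completion function) can be illustrated with a minimal Python sketch. It rests on assumptions: the operator is read as "number of literal pattern characters if the whole URL matches the pattern, otherwise zero", and the category names, patterns and function names (`intersection_score`, `categorise`, `FUNCTIONS`) are hypothetical, not taken from the patent.

```python
import re


def intersection_score(pattern: str, url: str) -> int:
    """Assumed reading of the intersection operator: 0 on any mismatch,
    otherwise the number of non-wildcard characters in the pattern."""
    # Translate the "*" wildcard into a regular expression, escaping the rest.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if re.fullmatch(regex, url, flags=re.DOTALL) is None:
        return 0
    return len(pattern.replace("*", ""))


def categorise(url, functions):
    """Run every (cat_id, pattern) function and keep the largest value.

    A strictly greater comparison keeps the earlier function on ties, so the
    order of the list acts as a priority ranking.
    """
    best_cat, best_value = None, 0
    for cat_id, pattern in functions:
        value = intersection_score(pattern, url)
        if value > best_value:
            best_cat, best_value = cat_id, value
    return best_cat, best_value


# Hypothetical function list; one category may own several functions.
FUNCTIONS = [
    ("products", "/products/*"),
    ("support", "/support/*"),
    ("support", "/help/*"),
    ("other", "/*"),  # completion function: non-zero for any path starting with "/"
]

if __name__ == "__main__":
    for path in ("/products/x.html", "/help/faq.html", "/about.html"):
        print(path, "->", categorise(path, FUNCTIONS))
```

Because the completion pattern scores only one character, it links a document to "other" only when no more specific function matches, which is one way to obtain the complete categorisation the text describes.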
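
The display step, in which search hits carrying the same cat_id are arranged together under the name of their category, might be sketched as follows. The hit format (a URL paired with a cat_id), the category names and the function name `group_hits_by_category` are assumptions made for illustration only.

```python
from collections import defaultdict


def group_hits_by_category(hits, category_names):
    """Group (url, cat_id) search hits by cat_id.

    Returns a list of (category name, [urls]) pairs in the order in which
    each cat_id first appears among the hits.
    """
    grouped = defaultdict(list)
    for url, cat_id in hits:
        grouped[cat_id].append(url)
    # Fall back to the raw cat_id when no name is registered for it.
    return [
        (category_names.get(cat_id, str(cat_id)), urls)
        for cat_id, urls in grouped.items()
    ]


if __name__ == "__main__":
    names = {1: "Products", 2: "Support"}  # hypothetical categories
    hits = [("/products/a.html", 1), ("/help/faq.html", 2), ("/products/b.html", 1)]
    for name, urls in group_hits_by_category(hits, names):
        print(name)
        for url in urls:
            print("   ", url)
```

Since each hit already carries its cat_id from the categorisation step, no categorisation function needs to be re-executed at display time.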

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for categorising items being data entities, and to the categorisation of data entities being web pages of a web site. A method for categorising data entities stored in a computer system is described. It comprises performing categorisation in such a manner that an item and a category are linked if a determined quantification of a relation between said item and said category fulfils predefined criteria. The method uses a list of categories on which the categorisation is to be based, at least one categorisation function for determining the quantification of at least one relation between the category and an item, and item data to be used for executing the categorisation function(s).
EP00984929A 1999-12-30 2000-12-22 Categorisation of data entities Withdrawn EP1257930A1 (fr)

Applications Claiming Priority (5)

Application Number | Priority Date | Filing Date | Title
DK189099 | 1999-12-30 | |
DKPA199901890 | 1999-12-30 | |
US17690600P | 2000-01-20 | 2000-01-20 |
US176906P | 2000-01-20 | |
PCT/DK2000/000726 (WO2001050338A1 (fr)) | 1999-12-30 | 2000-12-22 | Categorisation of data entities

Publications (1)

Publication Number Publication Date
EP1257930A1 (fr) 2002-11-20

Family

ID=26066185

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
EP00984929A (Withdrawn, EP1257930A1 (fr)) | Categorisation of data entities | 1999-12-30 | 2000-12-22

Country Status (4)

Country Link
US (1) US20010025277A1 (fr)
EP (1) EP1257930A1 (fr)
AU (1) AU2152501A (fr)
WO (1) WO2001050338A1 (fr)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20030128236A1 (en) * | 2002-01-10 | 2003-07-10 | Chen Meng Chang | Method and system for a self-adaptive personal view agent
US8271495B1 (en) * | 2003-12-17 | 2012-09-18 | Topix Llc | System and method for automating categorization and aggregation of content from network sites
US7814089B1 (en) | 2003-12-17 | 2010-10-12 | Topix Llc | System and method for presenting categorized content on a site using programmatic and manual selection of content items
US7975240B2 (en) * | 2004-01-16 | 2011-07-05 | Microsoft Corporation | Systems and methods for controlling a visible results set
US7930647B2 (en) * | 2005-12-11 | 2011-04-19 | Topix Llc | System and method for selecting pictures for presentation with text content
US20100023890A1 (en) * | 2006-06-30 | 2010-01-28 | Joonas Paalasmaa | Listing for received messages
US9405732B1 (en) | 2006-12-06 | 2016-08-02 | Topix Llc | System and method for displaying quotations
US20080270351A1 (en) * | 2007-04-24 | 2008-10-30 | Interse A/S | System and Method of Generating and External Catalog for Use in Searching for Information Objects in Heterogeneous Data Stores
CN102737057B (zh) | 2011-04-14 | 2015-04-01 | 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited) | Method and device for determining commodity category information
US8914400B2 (en) * | 2011-05-17 | 2014-12-16 | International Business Machines Corporation | Adjusting results based on a drop point
US20130086485A1 (en) * | 2011-09-30 | 2013-04-04 | Michael James Ahiakpor | Bulk Categorization
US10140621B2 (en) * | 2012-09-20 | 2018-11-27 | Ebay Inc. | Determining and using brand information in electronic commerce

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5371807A (en) * | 1992-03-20 | 1994-12-06 | Digital Equipment Corporation | Method and apparatus for text classification
US5687364A (en) * | 1994-09-16 | 1997-11-11 | Xerox Corporation | Method for learning to infer the topical content of documents based upon their lexical content
US5717914A (en) * | 1995-09-15 | 1998-02-10 | Infonautics Corporation | Method for categorizing documents into subjects using relevance normalization for documents retrieved from an information retrieval system in response to a query
GB2336698A (en) * | 1998-04-24 | 1999-10-27 | Dialog Corp Plc The | Automatic content categorisation of text data files using subdivision to reduce false classification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0150338A1 *

Also Published As

Publication number Publication date
US20010025277A1 (en) 2001-09-27
AU2152501A (en) 2001-07-16
WO2001050338A1 (fr) 2001-07-12

Legal Events

Code | Title | Description
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012
17P | Request for examination filed | Effective date: 20020730
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR
AX | Request for extension of the European patent | Free format text: AL;LT;LV;MK;RO;SI
17Q | First examination report despatched | Effective date: 20040116
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
18D | Application deemed to be withdrawn | Effective date: 20040727