WO2001077935A2 - Contextual e-commerce engine - Google Patents

Contextual e-commerce engine

Info

Publication number
WO2001077935A2
Authority
WO
WIPO (PCT)
Prior art keywords
product
documents
command
services
information
Prior art date
Application number
PCT/US2000/021289
Other languages
English (en)
French (fr)
Other versions
WO2001077935A3 (en
Inventor
Samir Elias
Original Assignee
Myprimetime, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Myprimetime, Inc.
Priority to AU2000265180A
Publication of WO2001077935A2
Publication of WO2001077935A3

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Definitions

  • the invention provides a method for integrating textual content of documents created by authors with stores of information on relevant products and services, and presenting the integrated information to users for their goal-oriented interactions.
  • the integration is useful for accessing information across data networks such as the Internet and broadband telecommunications networks, and is especially useful in e-commerce.
  • Web sites offer a collection of linked hypertext documents controlled by a single person or entity. Because the Web site is controlled by a single person or entity, the hypertext documents, often called "Web pages" in this context, generally have a consistent look and subject matter. Especially in the case of Web sites put up by commercial interests selling goods and services, the hyperlinked documents which form a Web site will have few, if any, links to pages not controlled by the commercial interests.
  • Web site and "Web page” are often used interchangeably, but herein a "Web page” refers to a single hypertext document which forms part of a Web site and "Web site” refers to a collection of one or more Web pages which are controlled (i.e., modifiable) by a single entity (e.g. commercial interest) or group of entities working in concert to present a site on a particular topic.
  • Internet portals and online magazines provide interactive forums in which documents covering various topics are accessed by individuals through a computer.
  • An application of the interactive nature of online portals and magazines is to provide links (Internet connections) to other Web sites that have products and services that are relevant to the articles displayed.
  • Authors of documents may determine key phrases and words that characterize their documents and then they or users of the Internet may search databases with the key words to find products and services that relate to the subject of the documents.
  • Databases or search engines typically list query returns in order of relevance, weighted by the number of times the key words appear on the first page of a Web site.
  • Spamming occurs when a Web page author includes a word multiple times on a Web site merely to increase the weight it is assigned when searches are conducted using that particular key word. This practice demands more effort from persons who need to examine the results of database searches to determine which Web sites should be linked to the document that was written. Regular search engines do not "learn" or "remember" which Web pages are cheating using spamming techniques. Through human feedback and storage of prior search results, our invention corrects for spamming. Authors or computer programmers then create links to Web sites that have products or services most relevant to a document.
  • the invention provides a contextual e-commerce engine that integrates the content of documents created by authors with data on products/services and presents the integrated information to users for goal-oriented interaction.
  • the invention is useful in e-commerce, in particular the Internet.
  • the authors are generally experts in a field related to the content of the documents.
  • the product/services data includes that provided by merchants who partner in Web sites that provide the documents, or by third parties who are not a formal part of the Web site.
  • the invention relates to a method for automating interactions between textual content of documents and stores of information on relevant products and services.
  • the method includes searching an article for key words and phrases, searching the stored information for a match to the key words of the articles, mapping (linking) the results of the search between the articles and database, and merging the textual content of the article with an information file from the products and services database.
  • the search is conducted so that the relevant weight of each information file is updated after each use.
  • the invention also provides a method for integrating product data from sources outside a Web site (server) into a catalog database used by the Web site.
  • the method includes providing access to the database for the sources outside the Web site, receiving product data from the sources outside the Web site, storing the data in a persistent store, indexing the stored product data by key stored words, and mapping the indexed product data onto the categories of the Web site catalog database.
  • Definitions
  • API - Application Program Interface. When a library of code is created for use with more than one program, the developers usually expose a list of routines known as the API for accessing the library's functions.
  • Categories - The product database is organized in part by placing the products into predefined categories. These categories are hierarchical and form a general ontology on the product database. Examples include: Computer products, printers, digital cameras.
  • Contextual Documents - See articles.
  • Inverted index - Extends the idea of an inverted list.
  • a normal list of documents would simply have the contents of the documents, one after another.
  • an inverted list is a list of words in which each word has a list of documents it appears in.
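The inverted list described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `build_inverted_list` and the sample documents are assumptions, and plain whitespace splitting stands in for the lexeme processing defined elsewhere in this section.

```python
from collections import defaultdict


def build_inverted_list(documents):
    """Map each word to the sorted list of document ids it appears in.

    `documents` is a hypothetical {doc_id: text} mapping.
    """
    inverted = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            inverted[word].add(doc_id)
    return {word: sorted(ids) for word, ids in inverted.items()}


# Hypothetical sample data, not taken from the patent.
docs = {1: "digital camera", 2: "camera bag", 3: "laser printer"}
index = build_inverted_list(docs)
```

Looking up `index["camera"]` then yields the documents that mention the word, without scanning every document.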
  • Learned relevance factor (LRF) - As the system accumulates feedback responses, a scalar factor for each lexeme is adjusted to reflect how relevant it was in matching products to documents (articles). This factor is a real number, between -1.0 and +1.0 at all times.
  • Lexeme - When text is submitted to the catalog content engine, it is usually a character stream of fully punctuated and formatted English prose, and must be processed into a form that is usable by the engine. This process includes parsing the text into normalized-case tokens, removing the stopwords, and then stemming the tokens. After this processing, the words in stemmed form are called lexemes.
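A minimal sketch of the text-to-lexeme processing described above (case normalization, tokenizing, stopword removal, stemming) might look like the following. The stopword set and the suffix-stripping `toy_stem` function are assumptions made for illustration; the source does not name a specific stemming algorithm, and this toy stemmer does not handle irregular forms.

```python
import re

# Illustrative stopword set; the source lists "in", "the", "is", "that" as typical.
STOPWORDS = {"in", "the", "is", "that", "a", "an", "and", "of", "to"}


def toy_stem(token):
    """Strip a few common suffixes (assumed stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def to_lexemes(text):
    """Normalize case, drop punctuation, remove stopwords, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


lexemes = to_lexemes("The printer is printing in the office.")
```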
  • Metadata - Metadata is information about data; e.g. if the data is an article's text, metadata would include the byline, date and time of composition, and the like.
  • Persistent storage - Data that needs to be kept for long periods of time is stored in a way that survives major system maintenance procedures and most transient system failures.
  • Prefix order - A tree data structure can be parsed in several ways that uniquely order the nodes. Prefix order means that each node is parsed before its children, sibling nodes are parsed left to right, and children are parsed before siblings. Sentinel indicators show that a node has no children. For the given example tree, the prefix order (using a period as the sentinel character) would be: ABD..E..CF..G.. The original tree structure can be completely recovered from this parsing.
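The prefix-order encoding can be illustrated with a short Python sketch. The example tree is not reproduced in the text, so the shape used here is an assumption: a binary tree with root A, children B and C, and leaves D, E, F, G, where a period is written for each missing child. Under that assumption the sketch reproduces the string ABD..E..CF..G.. and recovers the tree from it. The names `Node`, `to_prefix`, and `from_prefix` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SENTINEL = "."


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_prefix(node):
    """Write the node, then its left subtree, then its right subtree."""
    if node is None:
        return SENTINEL
    return node.label + to_prefix(node.left) + to_prefix(node.right)


def from_prefix(text):
    """Rebuild the tree by consuming the string in the same order."""
    chars = iter(text)

    def build():
        ch = next(chars)
        if ch == SENTINEL:
            return None
        left = build()
        right = build()
        return Node(ch, left, right)

    return build()


# Assumed shape of the example tree.
tree = Node("A", Node("B", Node("D"), Node("E")), Node("C", Node("F"), Node("G")))
encoded = to_prefix(tree)
restored = from_prefix(encoded)
```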
  • Skip-list - Skip-lists are an extension of linked lists, and they probabilistically improve search time to log(n) where n is the number of nodes in the list. Skip-lists are described extensively in Pugh (1990).
  • Skip-list index - Indices, by their nature, provide fast access to a collection of information, and so skip lists are a logical choice for implementation.
  • a skip-list index is an index where the records are the elements in the list.
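As a rough illustration of a skip-list index, the sketch below stores key/value records in a skip list with randomly chosen node levels, in the spirit of the structure Pugh describes. It is a simplified sketch, not the patent's implementation: the class name `SkipList`, the maximum level of 16, and the promotion probability of 0.5 are all assumptions.

```python
import random


class SkipList:
    MAX_LEVEL = 16  # assumed cap on node height

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = {"key": None, "value": None, "next": [None] * self.MAX_LEVEL}
        self._level = 1

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """Return, per level, the last node whose key is smaller than `key`."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in reversed(range(self._level)):
            while node["next"][i] is not None and node["next"][i]["key"] < key:
                node = node["next"][i]
            update[i] = node
        return update

    def insert(self, key, value):
        update = self._predecessors(key)
        candidate = update[0]["next"][0]
        if candidate is not None and candidate["key"] == key:
            candidate["value"] = value  # overwrite an existing record
            return
        level = self._random_level()
        self._level = max(self._level, level)
        new = {"key": key, "value": value, "next": [None] * level}
        for i in range(level):
            new["next"][i] = update[i]["next"][i]
            update[i]["next"][i] = new

    def get(self, key, default=None):
        candidate = self._predecessors(key)[0]["next"][0]
        if candidate is not None and candidate["key"] == key:
            return candidate["value"]
        return default

    def items(self):
        node = self._head["next"][0]
        while node is not None:
            yield node["key"], node["value"]
            node = node["next"][0]


# Hypothetical records keyed by category name.
categories = SkipList(seed=0)
for name in ["printer", "camera", "scanner"]:
    categories.insert(name, name.upper())
```

Because the bottom level is an ordinary sorted linked list, iterating `items()` returns records in key order, while the higher levels let lookups skip over most nodes.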
  • SKU - Stock Keeping Unit.
  • Sparse matrix - A sparse matrix is a typically very large matrix with a high percentage of zero entries. Compression techniques known to those of skill in the art are applied to dramatically reduce the storage size and processing time on these matrices.
  • Stemming - Stemming is a process that converts words with multiple forms into one unique form.
  • For example, the verb "drink" in the English language is lexically modified according to tense, and may appear in text as "drink", "drinking", or "drank." After stemming, all occurrences of the verb will appear as "drink."
  • Stopword removal - Certain words appear in English text so often that they are useless when attempting to distinguish one document from another. Stopword removal is the process of identifying and removing these words from processed text, so that only useful and relevant words remain. Typical stopwords include "in", "the", "is", "that", and so forth.
  • Third Party - refers to a source outside of the control of an entity operating the primary Web site.
  • Token - A string of characters not containing white space, or a double-quote delimited string of any characters.
  • FIG. 1 is a flow chart illustrating the overall scheme of the Contextual E-commerce EngineTM Technology (CE2).
  • There are different sources of data used in this system, including merchant product XML files 105 that describe products available from the merchant, documents/stories written by authors and submitted through a particular entity in control of a Web site (MPT) 103 or a 3rd party publishing system 101, and streaming video 100, 104 created by specialists and submitted through a 3rd party publishing system or a particular entity.
  • FIG. 2 is a flow chart illustrating the components of the Catalog Content Engine (CCE) and is a more detailed depiction of the Catalog Content Engine area outlined in FIG. 1.
  • the CCE submission server 200 implements the catalog content product matcher and is divided into three components: the submission server, which manages remote submission of content and feedback, the system manager 202, which provides an interface for basic system management and maintenance functions, and the CCE services API 201, which provides services for the processing and management of product information and article documents.
  • FIGS. 3A and 3B illustrate a typical main page for a Web site using the Catalog Content Engine to display subject text and related products.
  • FIGS. 4A and 4B illustrate a typical subject text and related products page for a Web site using the Catalog Content Engine.
  • FIGS. 5A and 5B illustrate a typical merchant Web site from which a consumer can order products associated with subject text from the Web site using the CCE.
  • the invention is directed to a method that integrates textual content written by authors for a primary Web site, with data from products and services supplied by external sources, and provides the integrated information to users for goal-oriented interactions.
  • the invention provides a method for automating interactions between textual/streamed content of articles and database of relevant products and services, the method including searching the article for key words and phrases, searching the product database for a match to the key words of the articles, mapping (linking) the results of the search between the articles and database, and merging the textual/streamed content of the article with selected products/services.
  • the search is conducted such that the relevant weight of each selected product/service entry is updated after each use.
  • the invention also provides a method for integrating product data from sources outside a Web site (server) into a catalog database used by the Web site. The method includes providing access to the database for the sources outside the Web site, receiving product data from the source outside the Web site, storing the data in a product database, indexing the stored product data by key words, and mapping the indexed product data onto the categories of the Web site catalog database.
  • System Architecture
  • There are different sources of data used in this system (refer to FIG. 1):
  • the merchant product XML files are submitted through a web interface to a waiting CatalogEngineTM 108, which will extract the product information from the files and insert this information into the product database.
  • the extracted product information will also be indexed into a fast access, incrementally updateable text retrieval database (Product Index DB 111) for later access by the
  • the product information is further refined by mapping the product categories to internally defined product categories using the CatalogMapperTM 109.
  • the article/story/video data is submitted by the content submission/publishing tools to the CatalogContentEngineTM 123, which will map (tie) the data to the products from the Product Index DB.
  • the mapping is saved in the Article/Products database 117, for later access by the ContentServerTM 114, which serves the article/category/product relationship in a predefined XML/HTML format.
  • the invention is directed to a method that integrates textual/streamed content written by authors for a primary Web site, with products supplied by external sources.
  • the external product and service sources use a web interface to download their product catalogs written in XML.
  • Each product catalog has merchant name/category/brand/model/price/description/ image url/home url/status fields for each product/service.
  • the CatalogEngineTM 108 extracts the merchant name/category/brand/model/description for each product/service from the XML file and passes it to the CatalogIndexerTM 112, and passes this information and all other product information to the Product Database 115.
  • the CatalogEngineTM 108 also creates a unique identifier for each product. This identifier is used to relate each product entry in the Product Database 115 to the product entry in the Product Index DB 111.
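The extraction step can be sketched with Python's standard `xml.etree.ElementTree` module. The XML layout below (a `catalog` element with a `merchant` attribute and `product` children) is a hypothetical schema invented for illustration; the source lists the fields but does not give the actual element names. The sequential integer identifier is likewise an assumption.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical catalog layout; only the field names come from the text.
SAMPLE = """<catalog merchant="ExampleMerchant">
  <product>
    <category>printers</category>
    <brand>ExampleBrand</brand>
    <model>X100</model>
    <price>199.00</price>
    <description>Fast color laser printer.</description>
  </product>
</catalog>"""

# Fields the text says are passed on for indexing (plus the merchant name).
INDEXED_FIELDS = ("category", "brand", "model", "description")


def extract_products(xml_text, ids=None):
    """Yield (full_record, index_record) pairs sharing one unique id."""
    if ids is None:
        ids = itertools.count(1)  # assumed id scheme
    root = ET.fromstring(xml_text)
    merchant = root.get("merchant")
    for element in root.iter("product"):
        record = {child.tag: (child.text or "").strip() for child in element}
        record["merchant"] = merchant
        record["id"] = next(ids)
        index_record = {"id": record["id"], "merchant": merchant}
        index_record.update({f: record.get(f, "") for f in INDEXED_FIELDS})
        yield record, index_record


extracted = list(extract_products(SAMPLE))
```

The shared `id` is what lets an entry in the product store be related back to its entry in the index.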
  • a notification is sent by the CatalogEngineTM 108, to a person responsible for mapping the new product categories to the internally defined categories.
  • the CatalogMapperTM 109 is used for mapping the identified product categories to product categories as specified (to a large extent) by the UN/Standard Product and Services Code (UNSPSC), an international, evolving standard. The mapping is kept together with product information in the Product Database 115.
  • the external product and service sources use a Web interface to download their product catalogs written to a specific format, generally in XML (Extensible Markup Language), to the user Web site.
  • XML is a markup language that allows specific information to be electronically labeled as specific items such as a product name, cost, or size.
  • the catalogs are retrieved by an internal database, which gathers all product and service information and stores it in a persistent file.
  • Each product catalog has merchant name/category/brand/model/price/description/image url/home url fields for each product/service. The CatalogEngineTM extracts the merchant name/category/brand/model/description for each product/service from the XML file and passes it to the CatalogIndexerTM, and passes this information and all other product information to the Product Database.
  • the CatalogEngineTM also creates a unique identifier for each product which is used to relate each product entry in the Product Database to the product entry in the Product Index Database. For each new group of product categories identified by the CatalogEngineTM as categories new/different from what is in the Product Database, a notification is sent by the CatalogEngineTM to a person responsible for mapping the new product categories to the internally defined categories.
  • Catalog Mapper Contextual Intelligent matching technology includes a tool used to map product and service categories from outside sources, to the private database categories.
  • This function serves to place information files about a merchant product or service into appropriate categories of the primary Web site database. These categories are to a large extent based on the UN/Standard Product and Services Code (UNSPSC), an international, evolving standard. The mapped categories are stored in the persistent file together with the product information.
  • the indexed database categorizes products using Natural Language Processing to choose appropriate terms and phrases from product descriptions. This allows the accessing of products using the indexed terms and phrases.
  • the product index database generated is a dynamically updateable database. New products/services can be indexed incrementally, so that there is no need to fully index the product database for each new merchant catalog.
  • the authors of the articles use third party publishing software known to those of skill in the art to write their stories/articles.
  • the authors can hook up directly to a search engine which goes through an article/story and also scans the product index database looking for matching products/services. Upon finding a match, the search engine creates an article to product mapping in the persistent store. The search engine learns from writer feedback to make future choices more accurate.
  • the local server creates a link by merging the article/story HTML text together with the categories/product HTML. Consumers (users) are then able to view the articles/stories alongside the relevant categories and products. In addition to the users being able to view a listing of the private database catalog, they can also click through to the merchant partner product pages. A merchant partner is joined contractually to the proprietor of the Web site.
  • Catalog Content Engine
  • the Catalog Content Engine (CCE) is separated into two logical subsystems, each of which can be located on the same physical machine.
  • the Catalog Content Engine (CCE) 123 implements both the CatalogIndexerTM 112, and the CatalogContentMatcherTM 110, and has four components: (1) the submission server 200, which manages remote submission of content and feedback; (2) the system manager 202, which provides an interface for basic system management and maintenance functions (startup, shutdown, parameter modification); (3) the CCE services API 201, which provides services for the processing and management of product information and article documents; and (4) the Product Index DB 111, which provides storage services for managing the stored and processed information and metadata about articles and products.
  • the system manager can be implemented in a variety of ways.
  • the Product Index DB 111 provides persistent storage services for all of the components described above. These services are described below, in detail, in the Serialization section.
  • the API is broken down into several kinds of entities:
  • a product object has its relevant information stored in several forms:
  • the inverted index contains the processed text of the product description, and is managed separately as a collection of postings.
  • the inverted index is described herein.
  • the product object contains all other fields, such as price.
  • a particular product is uniquely defined by the vendor name, brand name, and model name/number, and is referable in the database by either a key constructed from those three fields, or the SKU/part number if available.
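A minimal sketch of the two lookup routes described above (composite key or SKU) is shown below. The class name `ProductStore`, the lowercase normalization of the key fields, and the sample values are assumptions for illustration.

```python
class ProductStore:
    """Keeps one record per product, reachable by composite key or by SKU."""

    def __init__(self):
        self._by_key = {}
        self._by_sku = {}

    @staticmethod
    def make_key(vendor, brand, model):
        # Case folding is an assumed normalization, not stated in the source.
        return (vendor.lower(), brand.lower(), model.lower())

    def add(self, vendor, brand, model, sku=None, **fields):
        record = {"vendor": vendor, "brand": brand, "model": model, "sku": sku}
        record.update(fields)
        self._by_key[self.make_key(vendor, brand, model)] = record
        if sku is not None:
            self._by_sku[sku] = record
        return record

    def find(self, vendor=None, brand=None, model=None, sku=None):
        if sku is not None:
            return self._by_sku.get(sku)
        return self._by_key.get(self.make_key(vendor, brand, model))


# Hypothetical product data.
store = ProductStore()
store.add("ExampleMerchant", "ExampleBrand", "X100", sku="SKU-0001", price=199.0)
```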
  • the text of the product description is stored separately, in flat files.
  • the categories are stored in a tree structure, wherein the tree structure matches the ontological structure of the categories.
  • the root node of the tree does not correspond to any category, but is rather a container for all the top-level category nodes.
  • Each node has a link to its parent node and its child nodes.
  • the category tree also has a skip-list index to all the nodes, so that particular category nodes can be found quickly.
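The category tree can be sketched as nodes with parent and child links under a container root, plus a name index for direct lookup. In this sketch a plain Python dictionary stands in for the skip-list index, which is a deliberate simplification; the class and method names are hypothetical, and the sample categories echo the examples given in the definitions.

```python
class CategoryNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class CategoryTree:
    def __init__(self):
        self.root = CategoryNode(None)  # container only, not a category
        self._index = {}  # dict used here in place of a skip-list index

    def add(self, name, parent_name=None):
        parent = self.root if parent_name is None else self._index[parent_name]
        node = CategoryNode(name, parent)
        parent.children.append(node)
        self._index[name] = node
        return node

    def find(self, name):
        return self._index.get(name)

    def path(self, name):
        """Return category names from the top level down to `name`."""
        node = self._index[name]
        parts = []
        while node is not self.root:
            parts.append(node.name)
            node = node.parent
        return parts[::-1]


tree = CategoryTree()
tree.add("Computer products")
tree.add("printers", "Computer products")
tree.add("digital cameras", "Computer products")
```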
  • the inverted index is the core of the Product Index DB. It is separated into two components, the dictionary and the postings.
  • the dictionary is a list of all the lexemes known to the system, with a few essential statistics on each lexeme available. For each lexeme in the dictionary, there is a posting that stores which documents contain this lexeme, and with what frequency.
  • the dictionary itself is a skip-list of lexeme objects, where the key is the lexeme string itself.
  • Each lexeme is the unique stemmed word form of a particular word. (Stemming is described herein, along with product document management.)
  • Each lexeme includes: a count of the number of documents it appears in, the total number of appearances in the entire set of product documents, a frequency relevance factor (FRF), a learned relevance factor (LRF) that will be used in query processing, and a reference to the posting cache where the posting can be found.
  • a posting is an indexed sparse vector (i.e., a linked list) of floating point values, where each value is the number of appearances of the particular lexeme in each document, each indexed entry corresponds to a particular document, and the correspondence holds across all of the postings.
  • the conjunction of all the posting vectors forms a matrix, and queries are executed by applying operations to this matrix, so postings should support basic sparse matrix operations.
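The dictionary and postings can be sketched together as follows. Each posting is held as a sparse mapping from product id to a floating point count, and the collection of postings plays the role of the sparse lexeme-by-document matrix. The function name, the field names `doc_count` and `total_count`, and the sample data are assumptions; the relevance factors are omitted because their formulas are not reproduced in the text.

```python
from collections import Counter


def build_index(product_lexemes):
    """Build (dictionary, postings) from {product_id: [lexeme, ...]}."""
    dictionary, postings = {}, {}
    for product_id, lexemes in product_lexemes.items():
        for lexeme, count in Counter(lexemes).items():
            # Sparse row: only products containing the lexeme are stored.
            postings.setdefault(lexeme, {})[product_id] = float(count)
            entry = dictionary.setdefault(lexeme, {"doc_count": 0, "total_count": 0})
            entry["doc_count"] += 1
            entry["total_count"] += count
    return dictionary, postings


# Hypothetical processed product descriptions.
dictionary, postings = build_index({
    "p1": ["print", "color", "print"],
    "p2": ["print", "camera"],
})
```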
  • Postings are stored in several posting caches, according to size, as described herein.
  • a posting cache is a limited-size collection of posting objects employing the standard least-recently-used algorithm for determining which postings to keep in memory. There is a list of caches, one for each scale class of posting.
  • a scale class caching system is a scheme in which several different caches are made, according to posting size, where size is measured in the number of non-zero entries. For example, one cache may hold all postings under 100 entries, another may hold postings under 200 entries, a third may hold postings under 400 entries, and so forth.
  • the best size and number of caches to use are determined empirically, by examining the distribution of posting sizes itself. This scheme can be tuned to maintain continuous retention of any particular class of postings.
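The scale class caching scheme could be sketched as a list of least-recently-used caches, one per size class. The size limits below reuse the 100/200/400 figures from the example above; the class names, the per-cache capacity, and the extra overflow class for larger postings are assumptions.

```python
import bisect
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


class ScaleClassCaches:
    def __init__(self, limits=(100, 200, 400), capacity=1000):
        self.limits = list(limits)
        # One cache per limit, plus one for postings at or above the last limit.
        self.caches = [LRUCache(capacity) for _ in range(len(self.limits) + 1)]

    def class_of(self, posting):
        """Size is the number of non-zero entries stored in the posting."""
        return bisect.bisect_right(self.limits, len(posting))

    def put(self, lexeme, posting):
        self.caches[self.class_of(posting)].put(lexeme, posting)

    def get(self, lexeme):
        for cache in self.caches:
            posting = cache.get(lexeme)
            if posting is not None:
                return posting
        return None


caches = ScaleClassCaches(capacity=2)
small = {i: 1.0 for i in range(5)}
medium = {i: 1.0 for i in range(150)}
```

Giving each size class its own cache means a burst of very large postings cannot evict the many small ones, which is one way to read the remark that the scheme can be tuned to retain a particular class.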
  • An article (document) object stores the processed information corresponding to an article, including a list of lexemes, a list of keywords, lists of vendor names, product names, and categories found in the text, a list of noun-phrases, and a unique key to index the article in the database.
  • the Product Index Database provides persistent storage services for all of the components described above. These services are described herein in detail in the Serialization section.
  • the Catalog Content Engine also supports Catalog Indexing. When a product description is submitted for catalog indexing, it includes a vendor name, a brand name, a model name/number, a list of keywords, a list of categories, and a product description.
  • the Catalog Content Engine also supports Catalog Content Matching.
  • Catalog Content Matcher When an article is submitted to the Catalog Content Matcher for matching to products, it is a character stream of standard English prose text, fully punctuated and formatted, accompanied by a list of highly relevant keywords.
  • the text must be processed for use in the system.
  • the punctuation is removed, the letter case is normalized, and the text is parsed into a stream of tokens. This stream of tokens is used for several separate product-matching tasks.
  • the token stream is converted to a list of unique lexemes and lexeme counts as described for product descriptions. These lexemes select certain rows out of the whole matrix. A sub-matrix that contains only those rows is constructed for performing operations on it.
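Selecting the sub-matrix for an article can be sketched as picking out the posting rows whose lexemes occur in the article. The function name and sample postings are hypothetical, and postings are assumed to be stored as sparse {product id: count} mappings.

```python
from collections import Counter


def article_submatrix(postings, article_lexemes):
    """Return the article's lexeme counts and the matching posting rows."""
    counts = Counter(article_lexemes)
    rows = {lex: postings[lex] for lex in counts if lex in postings}
    return counts, rows


# Hypothetical index rows and article lexemes.
postings = {
    "print": {"p1": 2.0, "p2": 1.0},
    "camera": {"p3": 4.0},
    "ink": {"p1": 1.0},
}
counts, rows = article_submatrix(postings, ["print", "ink", "print", "office"])
```

Lexemes that the index has never seen (here "office") simply select no row.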
  • the frequency relevance factor for a particular lexeme is calculated in the following way (Byoung-Gyu):
  • the frequency relevance factor is calculated for one lexeme at a time, and then the log sum of the current row plus the frequency plus the learned relevance factor (adjusted for product appearance count) is accumulated with the previously processed rows, dividing the results by the total number of rows to normalize the sums when complete, as follows (Byoung-Gyu):
  • prod: the product in question
  • productAppearanceCount(prod, lex): how many times lex appears in the description of the product prod
  • Accum(prod): the accumulated relevance for the product prod so far
  • matchClass: the type of keyword under consideration (categories, vendor names, and the like)
  • matchClass, product: the relevance factor for a particular product under a particular keyword type
  • Match class refers to which particular item is counted: keywords, vendor and product names, or category names. Noun-phrases are parsed out of the token stream, and then the list of noun-phrases is compared to the list of noun-phrases for each product, and matches are counted and processed as discussed above.
  • each individual matching technique here called a match class: keyword, category, vendor names
  • the score from each technique is multiplied by the weight, so that good techniques affect the score more than inferior techniques.
  • Using a combination of techniques is better than relying on a single technique, because it greatly reduces the difference between the match score and what might be an ideal match score. In statistical parlance, it smooths out noise. This process yields a single relevance score for each product in the top-scoring set of products.
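One way to picture the combination step is a weighted sum of per-technique scores. The text states only that each technique's score is multiplied by its weight; summing the weighted scores into one value per product is an assumption of this sketch, as are the function name, the weights, and the sample scores.

```python
def combine_scores(scores_by_class, weights):
    """Combine per-match-class scores into one relevance score per product."""
    combined = {}
    for match_class, scores in scores_by_class.items():
        weight = weights[match_class]
        for product, score in scores.items():
            # Assumed combination rule: add up the weighted scores.
            combined[product] = combined.get(product, 0.0) + weight * score
    return combined


# Hypothetical scores and weights.
scores_by_class = {
    "keyword": {"p1": 2.0, "p2": 1.0},
    "category": {"p1": 1.0},
}
weights = {"keyword": 0.5, "category": 0.25}
combined = combine_scores(scores_by_class, weights)
ranked = sorted(combined, key=combined.get, reverse=True)
```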
  • the Catalog Content Engine also features a persistent storage serialization mechanism, implemented through the posting cache system described earlier.
  • Each cache has a corresponding directory on disk in which the posting vectors from the inverted list index are stored, sorted in subdirectories according to the first two characters of the lexeme.
  • the Catalog Content Engine updates the appropriate posting vectors in each cache, and then the caches in turn manage serializing the updated information to disk.
  • When products are removed, the place they occupy is marked with a "tombstone" (an industry standard file processing technique) so that the data they contain will not be considered in query processing, and so that the system manager can reclaim the space later.
  • the system manager can reclaim the space by physically removing the corresponding data from each vector in the inverted list index. This procedure is very resource intensive, as it requires modifying every part of the entire index, and should therefore be performed offline.
  • Each posting vector from the inverted list stores the lexeme itself, the frequency count of lexeme occurrences in the entire set of product descriptions, the frequency relevance factor, the learned relevance factor, and the frequency count of the lexeme for each product in whose information the lexeme appears.
  • the posting files for each cache are stored in a subdirectory system consisting of one directory for each letter of the alphabet, each of which contains the posting files for all lexemes beginning with that letter.
  • the category tree is stored in a single file. The tree is written to the file in prefix order, with a special indicator marking the end of a list of sub-branches of a particular branch in the tree. Prefix order dictates that each branch in the tree is written before its sub-branches.
  • the persistent information that needs to be serialized includes the dictionary, the inverted index postings, the products metadata, and the category tree, as well as the index structures on each. Each of these groups of information is stored in a separate folder on the disk, to make the organizational structure clear. Each file has a header indicating what purpose the file serves.
  • the dictionary is stored as a series of lexeme entries. Each entry includes the lexeme itself, the count of occurrences in the entire set of product descriptions, the frequency relevance factor, the learned relevance factor, and the filename of the posting, which is derivable from the lexeme itself.
  • a posting from the inverted index is stored in a file named after the lexeme, and the posting files are stored in a directory system consisting of one directory for each letter of the alphabet, each of which contains the posting files for all lexemes beginning with that letter.
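Deriving a posting's location from its lexeme might look like the sketch below, using one directory per initial letter as described. The root directory name and the `.posting` file extension are assumptions; the text says only that the file is named after the lexeme.

```python
from pathlib import PurePosixPath


def posting_path(root, lexeme):
    """Place the posting file in a directory named after the first letter."""
    return PurePosixPath(root) / lexeme[0] / f"{lexeme}.posting"


path = posting_path("postings", "printer")
```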
  • An individual posting file contains the lexeme and the floating point posting values.
  • the metadata on the products is stored in a binary file as a series of product entries.
  • the first part of the file indicates how many products are stored in the file.
  • Each product entry begins with the product's ID, followed by the vendor name, the brand name, and the model name/number. Following that is a list of keywords, a list of category names, a count of the number of noun-phrases found in the product description, and then the noun-phrases themselves.
  • the recommended implementation is for each file indexed with multiple records stored within it to have an index file built.
  • Each entry in the index file corresponds to one record, and has two things in it: a key to the record, and an integer indicating the record's offset position within the file, in bytes.
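The key-plus-offset index can be sketched with an in-memory byte stream standing in for the data file. The record layout used here (a 4-byte big-endian length followed by UTF-8 text) is an assumption chosen to keep the example small; the source does not specify the record format.

```python
import io


def write_records(records):
    """Write records sequentially; return the bytes and a (key, offset) index."""
    data = io.BytesIO()
    index = []
    for key, payload in records:
        index.append((key, data.tell()))  # byte offset of this record
        encoded = payload.encode("utf-8")
        data.write(len(encoded).to_bytes(4, "big"))
        data.write(encoded)
    return data.getvalue(), index


def read_record(data, offset):
    """Jump straight to a record using its byte offset."""
    stream = io.BytesIO(data)
    stream.seek(offset)
    length = int.from_bytes(stream.read(4), "big")
    return stream.read(length).decode("utf-8")


# Hypothetical records.
data, index = write_records([("p1", "Printer"), ("p2", "Camera")])
offsets = dict(index)
```

With the offset in hand, a single record can be read without scanning the records stored before it.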
  • a number of utility services are needed by the system in multiple places. These include efficient fast-access containers (skip-lists), fast string matching routines, and text parsing routines. Skip lists are described in detail by Pugh. Text parsing is straightforward, and standard tokenizing routines can be used.
  • Feedback learning
  • the system supports its own modification through feedback, so that it can be tuned for more effective matching.
  • the facilitating mechanism is a set of tunable learned relevance factors (LRFs) for lexemes in the index, which combine with the frequency relevance factors to give more accurate weights for each lexeme's relevance. Initially, these relevance factors are all uniform to render their effects invisible. As feedback indicates which lexemes are most helpful in matching products to articles, the LRFs are modified accordingly.
  • LRFs learned relevance factors
  • The learned relevance factor is computed as
    $$\mathrm{LRF}(lex) = \frac{\arctan\bigl(\mathrm{totalPositives}(lex) - \mathrm{totalNegatives}(lex)\bigr) + 1}{2}$$
    where totalPositives(lex) is the total number of positive reinforcements for the lexeme lex, totalNegatives(lex) is the total number of negative reinforcements for the lexeme lex, and LRF(lex) is the learned relevance factor for the lexeme lex.
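A direct transcription of the formula above into code may help when checking values. The function name is hypothetical, and how the LRF is then combined with the frequency relevance factor is not specified in this passage, so it is not shown.

```python
import math


def learned_relevance_factor(total_positives: int, total_negatives: int) -> float:
    """LRF(lex) = (arctan(positives - negatives) + 1) / 2, as given above."""
    return (math.atan(total_positives - total_negatives) + 1) / 2


# With no feedback yet, every lexeme has the same factor.
initial = learned_relevance_factor(0, 0)
```

Because arctan is monotonic and bounded, each additional reinforcement moves the factor less than the previous one, so a handful of responses cannot dominate the weight. As written, the value lies roughly between -0.29 and 1.29 rather than strictly between 0 and 1.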
  • EXAMPLE 1 MYPRIMETIME™ WEB SITE ARTICLE SCREENS The primary interface experienced by users of the Contextual E-Commerce
  • EXAMPLE 2 SUBMITTING A PRODUCT
  • TEXT Fast and accurate speech recognition system that instantly converts the spoken word into written text on the PC screen.
  • EXAMPLE 3 SUBMITTING AN ARTICLE The typical situations for using the CCE Services Protocol Description (see
  • Product-related situations are fairly straightforward, and are illustrated by Example 2.
  • Article related situations are somewhat more complex because they involve communication with a human: the author. In general, the author follows the same steps for each session: write the article; submit the article for a test query; process feedback; adjust some parameters for matching; submit the article for storage. The other two scenarios illustrate these interactions.
  • TEXT Myprimetime speaks with sound editor Dane Davis, whose work on The Matrix snagged an Academy Award nomination.
  • Article related situations are somewhat more complex because they involve communication with a human: the author.
  • the author follows the same steps for each session: write the article; submit the article for a test query; process feedback; adjust some parameters for matching; submit the article for storage. This example illustrates these interactions.
  • Catalog Content Engine Protocol Description
  • The CatalogContentEngine™ 123 provides access to the catalog indexing and content matching services via the Catalog Content Engine Protocol (CCEP) described below.
  • CCEP Catalog Content Engine Protocol
  • This client-server protocol is text-based, explicitly session-based, connectionless, client-side command driven, and can be implemented for remote access on TCP.
  • the server accepts new articles and products for indexing, processes requests for deleting articles and products from the index, executes matching queries, gathers feedback data, and performs maintenance functions on itself.
  • the client chooses which commands to execute, sends command requests, and the server responds as necessary.
  • The commands that can be executed follow a general syntax: command name on the first line, command operand on the next line, followed by zero or more parameters on subsequent lines. This structure is called a command block.
  • the character set for commands is US-ASCII.
  • The standard white-space characters (space, horizontal tab, vertical tab) are the token delimiters.
  • Certain command parameters are parsed verbatim, as required. Command names and parameters are not case sensitive, except where case must be preserved (e.g. IDs, article text, and the like).
  • Command blocks are terminated by two sequential new-line characters. Every command block issued will elicit at least one response message from the server indicating the result of the command's attempted execution, and sometimes multiple response messages will be sent.
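The command-block framing described above can be sketched as a small builder. The function name is an assumption, and the operand `article-42` in the usage line is a made-up placeholder, since the operand syntax of individual commands is not given in this passage. Session IDs, which the protocol places before the command block, are not handled here.

```python
def build_command_block(name: str, operand: str, params=()) -> bytes:
    """Frame a command block: name, operand, parameters, blank-line terminator."""
    lines = [name, operand, *params]
    for line in lines:
        if "\n" in line:
            raise ValueError("a command-block line may not contain a new-line")
    # Commands use US-ASCII; encode() raises UnicodeEncodeError otherwise.
    # The final line's new-line plus one more gives two sequential new-lines.
    return ("\n".join(lines) + "\n\n").encode("ascii")


# Hypothetical usage: MATCH is named in the source, the operand is invented.
block = build_command_block("MATCH", "article-42")
```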
  • Sessions are maintained by the explicit request of the client.
  • When a client issues a command block without a session ID, the server will generate one if a session is needed. Certain commands do not establish a session, and therefore will not cause a session ID to be generated. If a client wishes to continue a session, the given session ID should be specified before the command block, as illustrated in the examples below. Clients should indicate when they no longer require a session, which causes the session to be closed. The server will invalidate all sessions which are idle beyond a specified time limit. Command parameters are specified in standard regular expression syntax.
  • a token is a string of characters not containing white space, or a double-quote delimited string of any characters.
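The token definition above maps naturally onto a regular expression. The sketch below treats space, horizontal tab, and vertical tab as delimiters and returns quoted strings without their quotes. It assumes there is no escape mechanism for a double quote inside a quoted token, which the source does not address; the function name is hypothetical.

```python
import re

# Either a double-quote delimited string of any characters,
# or a run of characters containing no delimiter and no quote.
_TOKEN = re.compile(r'"([^"]*)"|([^ \t\v"]+)')


def tokenize(line: str) -> list:
    """Split one command-block line into tokens."""
    tokens = []
    for match in _TOKEN.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        tokens.append(quoted if quoted is not None else bare)
    return tokens


sample = tokenize('TEXT "Fast and accurate"\tspeech')
```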
  • Special cases include any, which indicates that any (possibly empty) string of characters is acceptable and taken verbatim in most cases, and none, which indicates that no parameters should be specified. The following is a list of the commands that can be processed by the server:
  • With the MODE TEST parameter, the server processes the article without permanently storing it, and results are available only during this session. If the parameter MODE STORE is specified, the server processes and stores the article. Note that if an article ID is not specified, one will be generated and returned by the server if the MODE STORE parameter is specified.
  • the META parameter allows the client to indicate various fields that are not of direct interest to the indexer, such as a URL for purchase, etc. Note that if a product ID is not specified, one will be generated and returned by the server. The generated ID will be formed by the concatenation of the vendor name, the brand name and the model name.
  • ACCEPT productID
  • DECLINE productID
  • Description: Return a set of feedback responses to the server regarding the most recently submitted article. If the match is acceptable, the ACCEPT parameter is used, and if the match is not acceptable, the DECLINE parameter is used.
  • Server response messages fall into several classes: server status information, error messages, and data transmission to the client. Each class has a range of numbers assigned to it, by hundreds: 100 for status information, 300 for error messages, and 400 for data transmission. Server response messages are always given in the same format: Message number, followed by a space, the message text, and a new-line character. The following is a list of possible server response messages:
  • Invalid article ID An invalid article ID was given.
  • Invalid session ID An invalid session ID (one either not issued by the server or one that has been closed) was given as part of a command block.
  • Session The server issued a new session ID, or used the indicated ID for the transaction.
  • ProductID A product ID returned as the result of a command (e.g. MATCH).
  • Lexeme lexeme relevance A lexeme and its numeric relevance to a particular match, returned as the result of an EXPLAIN command.
  • 403 Keyword keyword A keyword returned as the result of an EXPLAIN command.
  • Category category A category returned as the result of an
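The response format described above (message number, a space, the message text, and a new-line, with classes assigned by hundreds) can be parsed in a few lines. This is a sketch under the stated format only; the function name and the class labels are assumptions, and the sample line uses the 403 Keyword message from the list with an invented keyword.

```python
def parse_response(line: str):
    """Split a server response into (number, text, class label)."""
    number, _, text = line.rstrip("\n").partition(" ")
    code = int(number)
    # Classes by hundreds: 100 status, 300 error, 400 data transmission.
    classes = {1: "status", 3: "error", 4: "data"}
    return code, text, classes.get(code // 100, "unknown")


parsed = parse_response("403 Keyword speech\n")
```

A client can dispatch on the class label first and only inspect the exact message number when it needs to distinguish messages within a class.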

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Circuits Of Receivers In General (AREA)
PCT/US2000/021289 2000-04-07 2000-08-03 Contextual e-commerce engine WO2001077935A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2000265180A AU2000265180A1 (en) 2000-04-07 2000-08-03 Contextual e-commerce engine

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US19538100P 2000-04-07 2000-04-07
US60/195,381 2000-04-07

Publications (2)

Publication Number Publication Date
WO2001077935A2 true WO2001077935A2 (en) 2001-10-18
WO2001077935A3 WO2001077935A3 (en) 2004-02-19

Family

ID=22721199

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/021289 WO2001077935A2 (en) 2000-04-07 2000-08-03 Contextual e-commerce engine

Country Status (2)

Country Link
AU (1) AU2000265180A1
WO (1) WO2001077935A2

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7082427B1 (en) * 2000-05-24 2006-07-25 Reachforce, Inc. Text indexing system to index, query the archive database document by keyword data representing the content of the documents and by contact data associated with the participant who generated the document

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7330850B1 (en) 2000-10-04 2008-02-12 Reachforce, Inc. Text mining system for web-based business intelligence applied to web site server logs

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998036366A1 (en) * 1997-02-13 1998-08-20 Northern Telecom Limited An associative search engine
US5995943A (en) * 1996-04-01 1999-11-30 Sabre Inc. Information aggregation and synthesization system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5995943A (en) * 1996-04-01 1999-11-30 Sabre Inc. Information aggregation and synthesization system
WO1998036366A1 (en) * 1997-02-13 1998-08-20 Northern Telecom Limited An associative search engine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ASHISH N ET AL: "Semi-automatic wrapper generation for Internet information sources" PROCEEDINGS OF THE IFCIS INTERNATIONAL CONFERENCE ON COOPERATIVE INFORMATION SYSTEMS, COOPIS, XX, XX, 24 June 1997 (1997-06-24), pages 160-169, XP002099173 *

Also Published As

Publication number Publication date
AU2000265180A8 2002-01-10
AU2000265180A1 (en) 2001-10-23
WO2001077935A3 (en) 2004-02-19

Similar Documents

Publication Publication Date Title
US6012053A (en) Computer system with user-controlled relevance ranking of search results
US6826557B1 (en) Method and apparatus for characterizing and retrieving query results
US8495049B2 (en) System and method for extracting content for submission to a search engine
US6944612B2 (en) Structured contextual clustering method and system in a federated search engine
US8452766B1 (en) Detecting query-specific duplicate documents
US8290956B2 (en) Methods and systems for searching and associating information resources such as web pages
US6820075B2 (en) Document-centric system with auto-completion
US20080147716A1 (en) Information nervous system
US7194457B1 (en) Method and system for business intelligence over network using XML
Van Zwol et al. Faceted exploration of image search results
US20130060746A1 (en) Automatic Object Reference Identification and Linking in a Browseable Fact Respository
US20030033287A1 (en) Meta-document management system with user definable personalities
US20110184827A1 (en) System with user directed enrichment
US20090240674A1 (en) Search Engine Optimization
CN100462969C (zh) 利用互联网为公众提供和查询信息的方法
EP1428138A2 (en) Indexing a network with agents
JP2009271911A (ja) 情報のシンボルによるリンクとインテリジェントな分類を行う方法及びシステム
CA3088560A1 (en) Systems and methods for identifying documents with topic vectors
Croft et al. Search engines
WO2001077935A2 (en) Contextual e-commerce engine
Halpin A Query-Driven Characterization of Linked Data.
Watters et al. Rating news documents for similarity
Guan et al. Structure-based queries over the world wide Web
WO2022005272A1 (en) System and method for hot topics aggregation using relationship graph
Amitay et al. Multi-resolution disambiguation of term occurrences

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OFF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC(EPO COMMUNICATION= FORM 1205A DATED :19.05.2003)

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP