WO2009039311A2 - Method for a contextual, vector-based content-serving system - Google Patents

Method for a contextual, vector-based content-serving system

Info

Publication number
WO2009039311A2
WO2009039311A2 PCT/US2008/076905 US2008076905W WO2009039311A2 WO 2009039311 A2 WO2009039311 A2 WO 2009039311A2 US 2008076905 W US2008076905 W US 2008076905W WO 2009039311 A2 WO2009039311 A2 WO 2009039311A2
Authority
WO
WIPO (PCT)
Prior art keywords
terms
content
context
list
webpage
Prior art date
Application number
PCT/US2008/076905
Other languages
English (en)
Other versions
WO2009039311A3 (fr)
Inventor
Gregory Harrison
Original Assignee
Mpire Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mpire Corporation
Publication of WO2009039311A2
Publication of WO2009039311A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising

Definitions

  • the URL entry in the database is coded as such, so that further requests from this hosting page/URL can simply be referenced against this now-preclassified context to then serve appropriate content.
  • This commonly accepted model requires a very large database and substantial processing power at scale, because the system must maintain a list of every possible URL that hosts an ad placement.
  • the context value of the page is also limited by the frequency with which the hosting page/URL is re-evaluated for new content.
  • the present invention provides an alternative method of achieving contextual ad serving, without the need for this expansive infrastructure of storing every possible hosting URL, and ensuring always-current page context by evaluating the context of the hosting page/URL in real-time, for every ad impression.
  • This is achievable by reversing the model by identifying the corpus of all terms relevant to the available ad inventory (i.e. a selective set of terms) rather than attempting to evaluate the corpus of all terms residing in the hosting page/URL.
  • FIGURE 1 is a block diagram of an example system formed in accordance with an embodiment of the present invention.
  • FIGURES 2-4 are flow diagrams showing processes performed by the system components shown in FIGURE 1.
  • FIGURE 1 illustrates a network environment 20 that includes components coupled to a network 34 for performing the above described service.
  • the network environment 20 includes a web publishing system 36 that produces web content (web page(s) 38) available to user computer-based systems 40 over the network 34.
  • a server 30 with memory 32 provides ad content from the server 30 (memory 32) to a user computer-based system 40 that has accessed a web page 38 that includes an ad content request.
  • FIGURE 2 illustrates a process 60 performed by the server 30 for creating a data package that is to be sent upon request to a web browser running on the user computer-based system 40 that has received the web page 38 that includes an ad content request.
  • one or more context buckets are created.
  • the content of all the buckets is reduced, normalized and rejected based on predefined rules to produce a list.
  • a vector is created for each bucket based on the contents of each bucket and the list.
  • the list, the created vectors and an analyzer engine are included in the data package that is to be sent upon request.
  • the analyzer engine is described in FIGURE 3.
  • FIGURE 3 illustrates a process 80 performed at least partially at the user computer-based system 40 (by the analyzer engine).
  • a user requests a website (webpage 38) via a web browser running on their computer-based system 40. If, at decision block 84, the webpage 38 includes the ad request (i.e., a URL directed to the server 30), the data package is retrieved from the server 30 at block 88. Then, at block 90, the analyzer engine running in the web browser generates a list of terms by performing normalization and rejection of terms in the webpage 38. Next, at block 92, the analyzer engine generates a webpage vector by comparing the list of terms in the webpage 38 with the list included in the data package.
  • the analyzer engine determines which bucket vector is the closest match to the webpage vector, at block 94.
  • information related to the closest matching bucket vector is sent to the server 30.
  • the web browser receives ad content from the server that corresponds to the sent information from block 96 and the ad content is displayed in the webpage 38.
  • the CLASS method includes four basic object types: the TokenSpace, the ContextBucket, the Centroid, and the Document.
  • the ContextBucket serves as a named definition to be eventually associated with a collection of web content (ex: the advertisement content).
  • the ContextBucket has two pieces of member data: a Name and a set of n-grams, which are used as a basis for generating a Centroid.
  • the set of n-grams are descriptors for the ContextBucket.
  • the Centroid is a normalized representation of the ContextBucket. Normalization in this context is defined as one of many methods available for down-casting and/or stemming of n-grams combined with an accept/reject methodology for n-grams.
  • the TokenSpace is a union of all normalized n-grams of each Centroid, ordered by an ordering function (ex: a Latin alphabetical sort).
  • a source-document represents the content being evaluated for contextual mapping.
  • the Document represents the normalized version of the source-document that will be used for term-vector distancing against the Centroids in the TokenSpace.
  • the set of all defined ContextBuckets are iterated over and a Centroid is created for each ContextBucket.
  • the set of n-grams is iterated over and each n-gram is either accepted or rejected by a Centroid building function. Accepted n-grams are then normalized via one or more pluggable normalization providers and are then added to the Centroid.
  • One such example normalization would be keyword stemming (stemming is a process for reducing inflected (or sometimes derived) words to their stem, base or root form).
  • each Centroid is bound to the TokenSpace and a term vector is computed for each Centroid in the TokenSpace.
  • a term vector in this context is a simple list of integers corresponding to the TokenSpace, where each member of the list is equal to the count of the occurrences of the corresponding term from the TokenSpace in the provided Centroid or Document.
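The Centroid, TokenSpace, and term-vector steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the stop-word list, the plural-stripping `normalize` function (a crude stand-in for a real stemmer), the bucket names, and the sample n-grams are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical stop list; the source leaves the accept/reject rules configurable.
STOP_WORDS = {"the", "a", "of", "and"}


def accept(ngram: str) -> bool:
    """Assumed accept/reject rule: drop empty strings and stop words."""
    term = ngram.strip().lower()
    return bool(term) and term not in STOP_WORDS


def normalize(ngram: str) -> str:
    """Down-case, then strip a trailing 's' as a crude stand-in for stemming."""
    term = ngram.strip().lower()
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def build_centroid(ngrams) -> Counter:
    """Normalized, accepted n-grams of one ContextBucket, with their counts."""
    return Counter(normalize(g) for g in ngrams if accept(g))


def build_token_space(centroids) -> list:
    """Union of all Centroid terms, ordered alphabetically."""
    return sorted(set().union(*(c.keys() for c in centroids.values())))


def term_vector(terms: Counter, token_space) -> list:
    """Count of each TokenSpace term in the given Centroid or Document."""
    return [terms.get(t, 0) for t in token_space]


# Illustrative ContextBuckets: a name plus a set of descriptor n-grams.
buckets = {
    "cameras": ["Cameras", "Tripods", "the", "review"],
    "cycling": ["bikes", "helmet", "Review"],
}
centroids = {name: build_centroid(ngrams) for name, ngrams in buckets.items()}
token_space = build_token_space(centroids)
vectors = {name: term_vector(c, token_space) for name, c in centroids.items()}
```

Because every vector is indexed by the same ordered TokenSpace, Centroid and Document vectors can be compared position by position.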
  • When the system is asked to categorize a source-document, it passes the source-document to a Tokenizer.
  • the role of the Tokenizer is to present a set of n-gram candidates to a Document Builder.
  • the Tokenizer uses the same normalization and rejection functions as were configured for the generation of Centroids to process all keywords in the document. Only those normalized keywords/n-grams from the source document that exist in the TokenSpace can be represented as candidates.
  • the Document Builder then builds a Document to represent the source data.
  • the Document represents a normalized set of matching n-grams from the source-document.
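A rough sketch of the Tokenizer and Document Builder roles described above, under stated assumptions: the `TOKEN_SPACE` contents, the stop-word list, the `normalize` function, the example URL, and the dictionary layout of the Document are hypothetical, and only single-word n-grams are handled to keep the example short.

```python
import re
from collections import Counter

# Assumed TokenSpace, as it might come out of Centroid generation.
TOKEN_SPACE = ["bike", "camera", "helmet", "review", "tripod"]
# Hypothetical rejection list; must match the one used for the Centroids.
STOP_WORDS = {"the", "a", "of", "and"}


def normalize(term: str) -> str:
    """Same crude normalization assumed for the Centroids: down-case, strip 's'."""
    term = term.strip().lower()
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def tokenize(text: str, token_space) -> list:
    """Return normalized candidates, keeping only terms found in the TokenSpace."""
    allowed = set(token_space)
    candidates = []
    for raw in re.findall(r"[A-Za-z]+", text):
        if raw.lower() in STOP_WORDS:
            continue  # rejected term
        term = normalize(raw)
        if term in allowed:
            candidates.append(term)
    return candidates


def build_document(url: str, text: str, token_space) -> dict:
    """Bundle the source URL, matching n-grams, and term vector."""
    counts = Counter(tokenize(text, token_space))
    return {
        "url": url,
        "ngrams": sorted(counts),
        "vector": [counts.get(t, 0) for t in token_space],
    }


doc = build_document(
    "http://example.com/page",
    "A review of the best cameras and tripods. Cameras compared.",
    TOKEN_SPACE,
)
```

Terms such as "best" and "compared" never reach the Document because they are absent from the TokenSpace, which is what keeps the per-page work small.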
  • the document source URL, n-grams, and term-vector are constructed into a Document. Once this Document is constructed, it is passed to a BucketMapper, which categorizes the Document by mapping it to the Centroids in the system.
  • This mapping by the BucketMapper is performed by finding the Centroid with the "nearest-neighbor" term-vector to the requested Document in the TokenSpace.
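  • The angle between two term vectors a and b is $\theta_{a,b} = \arccos\left(\frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}\right)$.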
  • This formula is used to calculate the angles between each Centroid and the given Document, and the Centroid with the lowest angle is chosen as the Centroid for the Document. Since the Centroid is simply a normalized version of the ContextBucket, the desired mapping from source-document to ContextBucket exists.
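The lowest-angle selection described above can be sketched as below. The function names, the sample vectors, and the choice to treat an all-zero vector as orthogonal to everything are assumptions for illustration only; the source does not say how empty vectors are handled.

```python
import math


def angle(a, b) -> float:
    """Angle in radians between two term vectors: acos((a . b) / (|a| |b|))."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a vector with no matching terms is treated as orthogonal.
        return math.pi / 2
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def map_to_bucket(doc_vector, centroid_vectors) -> str:
    """Return the name of the Centroid with the lowest angle to the Document."""
    return min(centroid_vectors, key=lambda name: angle(doc_vector, centroid_vectors[name]))


# Hypothetical vectors over a five-term TokenSpace.
centroid_vectors = {
    "cameras": [0, 1, 0, 1, 1],
    "cycling": [1, 0, 1, 1, 0],
}
doc_vector = [0, 2, 0, 1, 1]
best = map_to_bucket(doc_vector, centroid_vectors)
```

Using the angle rather than a raw dot product makes the comparison insensitive to document length, since only the direction of each term vector matters.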
  • the ContextBucket can be used in association with the delivery of any desired web content.
  • all ContextBuckets can be associated with one or more pieces of ad content. Once the source-document has been mapped to a ContextBucket, the associated ad content can be delivered to the source-document.
  • While the preferred embodiment of the invention has been illustrated and described, as noted above, many changes can be made without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is not limited by the disclosure of the preferred embodiment. Instead, the invention should be determined entirely by reference to the claims that follow.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to methods and systems for providing contextual advertisements without the need for an expansive infrastructure that stores every possible hosting URL, while ensuring always-current page context by evaluating the context of the hosting page/URL in real time for every ad impression. This is achieved by reversing the model: identifying the corpus of all terms relevant to the available ad inventory (i.e., a selective set of terms) rather than attempting to evaluate the corpus of all terms residing in the hosting page/URL.
PCT/US2008/076905 2007-09-18 2008-09-18 Method for a contextual, vector-based content-serving system WO2009039311A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US97339307P 2007-09-18 2007-09-18
US60/973,393 2007-09-18
US98668007P 2007-11-09 2007-11-09
US60/986,680 2007-11-09

Publications (2)

Publication Number Publication Date
WO2009039311A2 (fr) 2009-03-26
WO2009039311A3 (fr) 2009-05-22

Family

ID=40468773

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/076905 WO2009039311A2 (fr) 2007-09-18 2008-09-18 Method for a contextual, vector-based content-serving system

Country Status (1)

Country Link
WO (1) WO2009039311A2 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040181525A1 (en) * 2002-07-23 2004-09-16 Ilan Itzhak System and method for automated mapping of keywords and key phrases to documents
US20040059708A1 (en) * 2002-09-24 2004-03-25 Google, Inc. Methods and apparatus for serving relevant advertisements
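KR20040104060A (ko) * 2003-06-02 2004-12-10 송재현 Method for advertising and linking related sites through keyword analysis of blog content
KR20070029389A (ko) * 2005-09-09 2007-03-14 주식회사 엠퓨처 Method and system for providing an advertisement service using core keywords, and recording medium on which a program for implementing the same is recorded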

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017008552A1 (fr) * 2015-07-15 2017-01-19 腾讯科技(深圳)有限公司 Multimedia information pop-up window processing method and device, and computer storage medium
US10579690B2 (en) 2015-07-15 2020-03-03 Tencent Technology (Shenzhen) Company Limited Multimedia information pop-up window processing method and device, and computer storage medium

Also Published As

Publication number Publication date
WO2009039311A3 (fr) 2009-05-22

Similar Documents

Publication Publication Date Title
US20210103964A1 (en) Account manager virtual assistant using machine learning techniques
KR101109236B1 (ko) Related term suggestion for multi-sense query
US10891699B2 (en) System and method in support of digital document analysis
CN113673262B (zh) Machine translation between different languages using statistical streaming data
CA2504181C (fr) Verifying relevance between keywords and web site contents
Svore et al. Enhancing single-document summarization by combining RankNet and third-party sources
US7890503B2 (en) Method and system for performing secondary search actions based on primary search result attributes
US8495049B2 (en) System and method for extracting content for submission to a search engine
EP1540514B1 (fr) System and method for automated mapping of keywords and key phrases to documents
US20030225763A1 (en) Self-improving system and method for classifying pages on the world wide web
US20070174270A1 (en) Knowledge management system, program product and method
US20020087515A1 (en) Data acquisition system
CN107729336A (zh) Data processing method, device and system
US20080071753A1 (en) Synthesizing information-bearing content from multiple channels
CN110637316B (zh) System and method for prospective object identification
CN107958014B (zh) Search engine
US20160267184A1 (en) Query generation for searchable content
JPH1125108A (ja) Automatic related-keyword extraction device, document retrieval device, and document retrieval system using them
EP2041669A2 (fr) Text categorization using external knowledge
CN107967290A (zh) Knowledge graph network construction method, system and medium based on massive scientific research data
CN107515904B (zh) Job search method and computing device
JP2004523838A (ja) Method and system for symbolic linking and intelligent categorization of information
KR20220034134A (ko) Analysis of intellectual property data relating to products and services
JP2013182466A (ja) Web search system and Web search method
US20090234794A1 (en) Method for a contextual, vector-based content-serving system

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08831979

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 08831979

Country of ref document: EP

Kind code of ref document: A2