EP2245553A1 - Method to search for a user generated content web page - Google Patents

Method to search for a user generated content web page

Info

Publication number
EP2245553A1
EP2245553A1 EP08737749A EP08737749A EP2245553A1 EP 2245553 A1 EP2245553 A1 EP 2245553A1 EP 08737749 A EP08737749 A EP 08737749A EP 08737749 A EP08737749 A EP 08737749A EP 2245553 A1 EP2245553 A1 EP 2245553A1
Authority
EP
European Patent Office
Prior art keywords
lexical units
expressions
subset
web page
lexical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP08737749A
Other languages
German (de)
English (en)
Inventor
Eric De Barry
Bertrand Wolf
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alterbuzz
Original Assignee
Alterbuzz
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alterbuzz
Publication of EP2245553A1
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • The present invention concerns a method to search for a user generated content web page and software to practice the same.
  • The usual method consists of choosing some words suited to the object of the search. These words are entered into the query page of an internet search engine such as those offered by Google Inc. or Yahoo Inc.
  • The search engine lists a set of web pages by their title and an automatically generated short abstract. A link gives access to each web page.
  • Search engines use internal, and often secret, algorithms to sort the list of web pages so that the pages expected to be most pertinent appear at the beginning of the list.
  • A method to search for a user generated content web page comprises preparing a set of lexical units, inputting said set of lexical units into an internet search engine, and consolidating the results of said internet search engine.
  • The method further comprises a preliminary step of creating a database of lexical units comprising at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions. The preparation of the set of lexical units comprises the assembly of at least one expression of each subset.
  • The method has the advantage of preferentially selecting web pages containing opinions about the selected matter.
  • The database comprises phonetic transcriptions and misspelled versions of said expressions; the subsets of feeling expressions and action expressions are arranged as thesauri in the database of lexical units.
  • Each set of lexical units in the list is inputted into the internet search engine, and the results for all sets are consolidated into a weighted list of web pages.
  • The weight of each web page is a combination of the order of appearance of the web page in each result and the number of occurrences of the web page in all results.
  • A device to search for a web page comprises means for storing lexical units, means for inputting a set of lexical units into an internet search engine, and means for consolidating results of said internet search engine, wherein the means for storing lexical units comprises a database of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the means for inputting said set of lexical units is adapted to input a set of lexical units comprising at least one expression of each subset.
  • The means for storing lexical units comprises means for storing phonetic transcriptions and misspelled versions of said expressions.
  • A computer program product to search for a web page comprises program instructions to execute the steps of the above method when the computer program product is executed on a computer.
  • Fig. 1 is a schematic view of a terminal connected to the internet to practice an embodiment of the invention.
  • Fig. 2 is a flowchart of a method according to an embodiment of the invention.
  • Fig. 3 is a functional view of a terminal practicing an embodiment of the invention.
  • A computer 1 is connected to the internet network 3. Through the network 3, the computer 1 is connected to a server 5 on which a search engine is running.
  • The server 5 symbolizes the infrastructure of search companies such as Google Inc. or Yahoo Inc. In practice, these companies use server farms containing hundreds of computers distributed around the world.
  • A server 7 is also connected to the internet network 3 and hosts a web page which is of interest to the user of the computer 1 but whose address is not known to the computer 1.
  • The web page contains opinions on a product or service of interest to the user of the computer 1. It is a user generated content web page such as a blog, wiki or forum page.
  • The computer 1 is a conventional personal computer. It comprises interface means such as a display 9, a keyboard
  • The storage means 15 contains a computer software product which, when executed by the processing means 17, makes the computer 1 execute the steps of a method to search for a web page according to an embodiment of the invention.
  • The method starts with the creation, at step 20, of a database of lexical units to search for.
  • The database is stored in the storage means 15.
  • The database comprises at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions.
  • Feeling expressions are lexical units related to the mood or feeling of a human being. For instance, words such as "trouble", "happy/unhappy", "unpleasant/pleasant", etc. define a certain state of mind. Generally, they are the reason for which a user has posted a message.
  • Action expressions are lexical units which define an action of a user, such as "online reservation", "booking", "sales", "offers", etc.
  • Context expressions are lexical units which define the context or the specificity of the search. For instance, if the search concerns the comments of travellers having crossed the Channel, the context expressions include terms like "ferry (ies)", "Channel
  • The subsets of feeling expressions and action expressions are structured in the form of a thesaurus so that a user can easily pick a set of words with similar meanings.
  • Each lexical unit is preferably stored in the database with all its lexical variations, such as singular/plural, and with some misspelled forms.
  • The most usual misspelled forms are stored in the database because the searched web pages are written by ordinary users with varying levels of education, or by users writing in a foreign language. It may therefore be useful to be able to select pages even if the words are misspelled.
  • A particular misspelled form, often used by teenagers accustomed to mobile phone short messages, is the phonetic form.
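The database described above could be sketched as follows. This is only an illustrative layout, not the structure used by the applicant: the `LexicalUnit` class, its field names and the sample entries (including the misspellings) are assumptions made for the example; only the three subset names come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    """One expression plus the alternative forms stored alongside it."""
    canonical: str
    variations: list[str] = field(default_factory=list)    # e.g. singular/plural
    misspellings: list[str] = field(default_factory=list)  # common errors, phonetic forms

    def all_forms(self) -> list[str]:
        # Canonical form first, then the other forms, duplicates removed.
        seen: dict[str, None] = {}
        for form in [self.canonical, *self.variations, *self.misspellings]:
            seen.setdefault(form, None)
        return list(seen)


# Hypothetical sample content for the three subsets named in the description.
database = {
    "feeling": [LexicalUnit("unhappy", misspellings=["unhapy"])],
    "action": [LexicalUnit("booking", variations=["bookings"])],
    "context": [LexicalUnit("ferry", variations=["ferries"], misspellings=["fery"])],
}

print(database["context"][0].all_forms())
```

Keeping every form attached to its canonical expression makes it straightforward to expand a selected expression into all the spellings that should be searched.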
  • A set of lexical units is prepared at step 22.
  • The set includes at least one lexical unit of each subset.
  • A set of lexical units is, for instance, "unhappy experience in channel crossing", where "unhappy" belongs to the feeling subset, "experience" belongs to the action subset and "channel crossing" belongs to the context subset.
  • The preparation is, advantageously, a guided automatic process: the user selects some groups of lexical units in each subset, and all the sets combining elements of these groups, together with their different forms, are automatically generated to form a list.
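The automatic generation of the list of sets can be illustrated with a Cartesian product over the groups selected by the user. This is a minimal sketch: the function name `generate_sets` and the sample words are hypothetical, and joining the forms with a space is an assumption about how a set is written as a query.

```python
from itertools import product


def generate_sets(feelings, actions, contexts):
    """Return every set combining one form from each of the three groups."""
    return [" ".join(combo) for combo in product(feelings, actions, contexts)]


# Hypothetical groups chosen by the user, including one misspelled form.
feelings = ["unhappy", "unhapy"]
actions = ["experience"]
contexts = ["channel crossing", "ferry"]

queries = generate_sets(feelings, actions, contexts)
for query in queries:
    print(query)
```

With 2 feeling forms, 1 action form and 2 context forms, the list contains 2 x 1 x 2 = 4 sets, each holding one expression of each subset.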
  • Each set of lexical units is inputted into an internet search engine, such as the Google engine (www.google.com) or the Yahoo engine (www.yahoo.com).
  • This step is done automatically, either by building an HTTP request with the appropriate syntax or by using an Application Programming Interface (API) provided by these companies to automate internet searches with their engine.
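Building the HTTP request for one set of lexical units might look like the sketch below. The endpoint `https://search.example.com/search` and the parameter names `q` and `start` are placeholders chosen for illustration; a real search engine or API defines its own syntax, which is not specified in the description.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the syntax of the engine actually used.
SEARCH_ENDPOINT = "https://search.example.com/search"


def build_search_url(lexical_set, page=0, per_page=10):
    """Encode one set of lexical units as a GET request URL."""
    params = {"q": lexical_set, "start": page * per_page}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("unhappy experience channel crossing")
print(url)
```

`urlencode` takes care of escaping spaces and special characters, so each generated set can be passed through unchanged.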
  • The computer 1 receives the results of the requests in the usual way, as a list of links to web pages. For each request, i.e. for each set of lexical units, the search engine returns at least one web page containing a list of links. Therefore, the computer 1 may receive a huge number of links.
  • These lists of links are consolidated. Identical links found on different lists are merged, advantageously with an increased weight associated with the link, since the fact that a page is selected through different search requests may be an indication of relevance.
  • The internet search engines sort the found pages so as to present the "most relevant" page to the user at the top of the list.
  • Each internet search engine has its own algorithm to sort the pages, and this algorithm is often kept secret.
  • The sorted list is also used to give a higher weight to the pages that are "most relevant" in the sense of the internet search engine.
  • A combination of the weights derived from the internet search engine ranking and from the number of occurrences is used to sort the consolidated list by decreasing weight.
  • The sorted list contains page addresses as hyperlink fields. Therefore, each page can be opened by the user to be read.
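The consolidation step can be sketched as follows. The description does not state how the two weights are combined, so the formula here is an assumption: each appearance of a link at 0-based position p in a result list adds 1 / (p + 1) to its weight, which rewards both a high rank and repeated occurrences. The function name and the sample links are hypothetical.

```python
from collections import defaultdict


def consolidate(result_lists):
    """Merge ranked link lists into one list sorted by decreasing weight.

    Assumed weighting (not taken from the description): an appearance at
    0-based position p contributes 1 / (p + 1) to the link's weight.
    """
    weights = defaultdict(float)
    for links in result_lists:
        for position, link in enumerate(links):
            weights[link] += 1.0 / (position + 1)
    # Sort by decreasing weight; ties are broken alphabetically by link.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results returned for two different sets of lexical units.
results = [
    ["http://a.example/post", "http://b.example/thread"],
    ["http://b.example/thread", "http://c.example/blog"],
]
ranked = consolidate(results)
for link, weight in ranked:
    print(f"{weight:.2f}  {link}")
```

In this example the page found by both requests ends up first, ahead of a page that ranked first in only one result list.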
  • From a functional point of view (Fig. 3), the computer 1 comprises means 30 for storing lexical units.
  • The means 30 for storing lexical units comprises a database 32 of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions.
  • The computer 1 also comprises means 34 for inputting a set of lexical units into an internet search engine, the set of lexical units comprising at least one expression of each subset.
  • The computer 1 further comprises means 35 for consolidating the results sent by the internet search engine.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method to search for a user generated content web page, comprising: preparing (22) a set of lexical units; inputting (24) said set of lexical units into an internet search engine; and consolidating (28) the results of said internet search engine. The method further comprises a preliminary step of creating (20) a database of lexical units comprising at least three subsets of lexical units, namely a subset of feeling expressions, a subset of action expressions and a subset of context expressions; the preparation of the set of lexical units consists of assembling expressions by including at least one expression of each subset.
EP08737749A 2008-01-29 2008-01-29 Method to search for a user generated content web page Withdrawn EP2245553A1 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2008/051310 WO2009095746A1 (fr) 2008-01-29 2008-01-29 Method to search for a user generated content web page

Publications (1)

Publication Number Publication Date
EP2245553A1 (fr) 2010-11-03

Family

ID=39710949

Family Applications (1)

Application Number Title Priority Date Filing Date
EP08737749A Withdrawn EP2245553A1 (fr) 2008-01-29 2008-01-29 Method to search for a user generated content web page

Country Status (2)

Country Link
EP (1) EP2245553A1 (fr)
WO (1) WO2009095746A1 (fr)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL118580A0 (en) * 1995-06-30 1996-10-16 Massachusetts Inst Technology Method and apparatus for item recommendation using automated collaborative filtering
US7143089B2 (en) * 2000-02-10 2006-11-28 Involve Technology, Inc. System for creating and maintaining a database of information utilizing user opinions
US7519605B2 (en) * 2001-05-09 2009-04-14 Agilent Technologies, Inc. Systems, methods and computer readable media for performing a domain-specific metasearch, and visualizing search results therefrom
AU2003283172A1 (en) * 2003-12-09 2005-06-29 Swiss Reinsurance Company System and method for aggregation and analysis of decentralised stored multimedia data
US20070294230A1 (en) * 2006-05-31 2007-12-20 Joshua Sinel Dynamic content analysis of collected online discussions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2009095746A1 *

Also Published As

Publication number Publication date
WO2009095746A1 (fr) 2009-08-06


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20100820

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA MK RS

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20140801