EP2245553A1 - Method to search for a user generated content web page - Google Patents
Method to search for a user generated content web page
- Publication number
- EP2245553A1 (application EP08737749A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- lexical units
- expressions
- subset
- web page
- lexical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Definitions
- The present invention concerns a method to search for a user generated content web page and software to practice the same.
- The usual method consists of choosing some words suited to the object of the search. These words are entered into the query page of an internet search engine such as those offered by Google Inc. or Yahoo Inc.
- The search engine lists a set of web pages by their title and an automatically generated short abstract. A link gives access to each web page.
- Search engines use internal, and often secret, algorithms to sort the list of web pages so that, ideally, the most pertinent pages appear at the beginning of the list.
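- A method to search for a user generated content web page comprises preparing a set of lexical units, inputting said set of lexical units into an internet search engine, and consolidating the results of said internet search engine.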
- The method further comprises a preliminary step of creating a database of lexical units comprising at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions. The preparation of the set of lexical units comprises the assembly of at least one expression of each subset.
- The method has the advantage of preferentially selecting web pages containing opinions about the selected matter.
- The database comprises phonetic transcriptions and misspelled versions of said expressions; the subsets of feeling expressions and action expressions are arranged as thesauri in the database of lexical units.
- Each set of lexical units in the list is input into the internet search engine, and the results for all sets are consolidated into a weighted list of web pages.
- The weight of each web page is a combination of the order of appearance of the web page in each result and the number of occurrences of the web page in all results.
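One possible reading of this weighting, given only as an illustrative assumption because the text does not fix a formula, sums a rank-based score over every result list in which the page appears. The symbols are introduced here for the illustration: $R_1, \dots, R_n$ are the result lists returned for the $n$ sets of lexical units, and $r_i(p)$ is the position of page $p$ in list $R_i$ (1 for the first link).

```latex
w(p) = \sum_{i \,:\, p \in R_i} \frac{1}{r_i(p)}
```

Under this assumed form, a page gains weight both by appearing near the top of a list (large $1/r_i(p)$) and by appearing in many lists (more terms in the sum). Other combinations, such as a weighted sum of an occurrence count and a mean rank, would fit the description equally well.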
- A device to search for a web page comprises means for storing lexical units, means for inputting a set of lexical units into an internet search engine, and means for consolidating results of said internet search engine, wherein the means for storing lexical units comprise a database of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the means for inputting said set of lexical units are adapted to input a set of lexical units comprising at least one expression of each subset.
- The means for storing lexical units comprise means for storing phonetic transcriptions and misspelled versions of said expressions.
- A computer program product to search for a web page comprises program instructions to execute the steps of the above method when the computer program product is executed on a computer.
- Fig. 1 is a schematic view of a terminal connected to the internet to practice an embodiment of the invention.
- Fig. 2 is a flowchart of a method according to an embodiment of the invention.
- Fig. 3 is a functional view of a terminal practicing an embodiment of the invention.
- A computer 1 is connected to the internet network 3. Through the network 3, the computer 1 is connected to a server 5 on which a search engine is running.
- The server 5 symbolizes the infrastructure of search companies such as Google Inc. or Yahoo Inc. In practice, these companies use server farms containing hundreds of computers distributed around the world.
- A server 7 is also connected to the internet network 3 and hosts a web page which is of interest to the user of the computer 1, but whose address is not known to the computer 1.
- The web page contains opinions on a product or service of interest to the user of the computer 1. It is a user generated content web page such as a blog, wiki or forum page.
- The computer 1 is a conventional personal computer. It comprises interface means such as a display 9, a keyboard
- The storage means 15 contains a computer software product which, when executed by the processing means 17, makes the computer 1 execute the steps of a method to search for a web page according to an embodiment of the invention.
- The method starts with the creation, step 20, of a database of lexical units to search for.
- The database is stored in the storage means 15.
- The database comprises at least three subsets of lexical units:
- Feeling expressions are lexical units related to the mood or feeling of a human being. For instance, words such as "trouble", "happy/unhappy", "unpleasant/pleasant", etc. define a certain state of mind. Generally, they reflect the reason why a user has posted a message.
- Action expressions are lexical units which define an action of a user, such as "online reservation", "booking", "sales", "offers", etc.
- Context expressions are lexical units used to define the context or the specificity of the search. For instance, if the search concerns the comments of travellers having crossed the Channel, the context expressions include terms like "ferry (ies)", "Channel
- The subsets of feeling expressions and action expressions are structured as a thesaurus so that a user can easily pick a set of words with similar meanings.
- Each lexical unit is preferably stored in the database with all its lexical variations, such as singular/plural forms, and with some misspelled forms.
- The most usual misspelled forms are stored in the database because the searched web pages are written by ordinary users with different levels of education, or by users writing in a foreign language. It is therefore useful to be able to select pages even if the words are misspelled.
- A particular misspelled form, often used by teenagers accustomed to mobile phone short messages, is the phonetic form.
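The database described above can be sketched as a small in-memory structure. This is a minimal illustration under stated assumptions, not the storage layout of the invention: the class names `LexicalUnit` and `LexicalDatabase`, the thesaurus group names and the misspelled form "unhapy" are all hypothetical, and the context subset is grouped like the other two only to keep the sketch uniform.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    """One expression plus the alternative forms stored with it (hypothetical layout)."""
    text: str
    variants: list[str] = field(default_factory=list)  # plural, misspelled, phonetic forms

    def all_forms(self) -> list[str]:
        # Canonical form first, then the variants, with duplicates dropped.
        seen: dict[str, None] = {}
        for form in [self.text, *self.variants]:
            seen.setdefault(form, None)
        return list(seen)


@dataclass
class LexicalDatabase:
    """Three subsets; each maps a thesaurus group name to its lexical units."""
    feeling: dict[str, list[LexicalUnit]] = field(default_factory=dict)
    action: dict[str, list[LexicalUnit]] = field(default_factory=dict)
    context: dict[str, list[LexicalUnit]] = field(default_factory=dict)

    def forms(self, subset: str, group: str) -> list[str]:
        # Every searchable form of every unit in one thesaurus group.
        units = getattr(self, subset)[group]
        return [form for unit in units for form in unit.all_forms()]


# Example content; the group names and the misspelling are illustrative assumptions.
db = LexicalDatabase(
    feeling={"dissatisfaction": [LexicalUnit("unhappy", ["unhapy"]),
                                 LexicalUnit("trouble", ["troubles"])]},
    action={"booking": [LexicalUnit("booking", ["bookings"]),
                        LexicalUnit("online reservation")]},
    context={"channel": [LexicalUnit("ferry", ["ferries"])]},
)
```

Grouping units under a thesaurus key lets a user pick one group and obtain every stored form of every expression in it with a single call to `forms`.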
- A set of lexical units is prepared at step 22.
- The set includes at least one lexical unit of each subset.
- A set of lexical units is, for instance, "unhappy experience in channel crossing", where "unhappy" belongs to the feeling subset, "experience" belongs to the action subset and "channel crossing" belongs to the context subset.
- The preparation is, advantageously, a guided automatic process in which the user selects some groups of lexical units in each subset; all the sets combining elements of these groups, together with their different forms, are then automatically generated, forming a list.
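The automatic generation of all sets can be sketched as a Cartesian product over the forms selected in each subset. This is a minimal sketch, assuming each set is simply joined into one query string; the function name `generate_sets` and the sample forms (including the misspelling "unhapy") are hypothetical.

```python
from itertools import product


def generate_sets(feelings, actions, contexts):
    """Return every (feeling, action, context) combination as one query string."""
    return [" ".join(combo) for combo in product(feelings, actions, contexts)]


# Forms selected by the user in each subset (illustrative values).
queries = generate_sets(
    ["unhappy", "unhapy"],
    ["experience"],
    ["channel crossing", "ferry"],
)
for query in queries:
    print(query)
```

Because every combination is generated, the list grows multiplicatively with the number of forms per subset, which is one reason the later consolidation step has to merge a large number of links.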
- Each set of lexical units is input into an internet search engine, such as the Google engine (www.google.com) or the Yahoo engine (www.yahoo.com).
- This step is performed automatically, either by building HTTP requests with the appropriate syntax or by using an Application Programming Interface (API) provided by these companies to automate internet searches with their engines.
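Building the HTTP request for one set of lexical units can be sketched with the Python standard library. The endpoint `https://search.example.com/search` and the parameter name `q` are placeholders chosen for this illustration, not the syntax of any particular engine; a real engine or its API defines its own URL format and terms of use.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name, used only for illustration.
SEARCH_ENDPOINT = "https://search.example.com/search"


def build_search_url(lexical_set: str, endpoint: str = SEARCH_ENDPOINT) -> str:
    """Encode one set of lexical units as the query string of a GET request."""
    return f"{endpoint}?{urlencode({'q': lexical_set})}"


url = build_search_url("unhappy experience in channel crossing")
print(url)
```

The sketch only constructs the URL; sending the request and parsing the returned list of links depend on the chosen engine and are left out.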
- The computer 1 receives the results of the requests in the usual way, as a list of links to web pages. For each request, i.e. for each set of lexical units, the search engine returns at least one web page containing a list of links. Therefore, the computer 1 may receive a huge number of links.
- These lists of links are consolidated. Identical links found in different lists are merged, advantageously with an increased weight associated with the link, since the fact that a page is selected through different search requests may be an indication of relevance.
- Internet search engines sort the pages found so as to present the "most relevant" pages at the top of the list.
- Each internet search engine has its own algorithm to sort the pages, and this algorithm is often kept secret.
- The sorted list is also used to give a higher weight to the pages that are "most relevant" in the sense of the internet search engine.
- A combination of the weights derived from the internet search engine's ordering and from the number of occurrences is used to sort the consolidated list by decreasing weight.
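The consolidation can be sketched as follows. The description does not specify how the two weights are combined, so this sketch assumes one simple choice: each appearance of a link contributes the reciprocal of its rank, and the contributions are summed over all result lists. The function name `consolidate` and the sample links are hypothetical.

```python
from collections import defaultdict


def consolidate(result_lists):
    """Merge ranked link lists into (link, weight) pairs sorted by decreasing weight.

    Assumed weighting: each appearance adds 1/rank, so a link gains weight
    both by ranking high in a list and by occurring in many lists.
    """
    weights = defaultdict(float)
    for links in result_lists:
        for rank, link in enumerate(links, start=1):
            weights[link] += 1.0 / rank
    # Ties are broken alphabetically so the output order is deterministic.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


# One ranked list per set of lexical units (illustrative links).
results = [
    ["http://a.example", "http://b.example", "http://c.example"],
    ["http://b.example", "http://a.example"],
    ["http://b.example", "http://d.example"],
]
ranked = consolidate(results)
for link, weight in ranked:
    print(f"{weight:.2f}  {link}")
```

In this example `http://b.example` comes first because it appears in all three lists and tops two of them, which matches the idea that a page selected through different search requests is more likely to be relevant.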
- The sorted list contains page addresses as hyperlink fields, so the user can open each page to read it.
- From a functional point of view, Fig. 3, the computer 1 comprises means 30 for storing lexical units.
- The means 30 for storing lexical units comprise a database 32 of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions.
- The computer 1 also comprises means 34 for inputting a set of lexical units into an internet search engine, the set of lexical units comprising at least one expression of each subset.
- Finally, the computer 1 comprises means 35 for consolidating the results sent by the internet search engine.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method to search for a user generated content web page, comprising: preparing (22) a set of lexical units; inputting (24) said set of lexical units into an internet search engine; and consolidating (28) the results of said internet search engine. The method further comprises a preliminary step of creating (20) a database of lexical units comprising at least three subsets of lexical units, namely a subset of feeling expressions, a subset of action expressions and a subset of context expressions; the preparation of the set of lexical units consists of assembling expressions, including at least one expression of each subset.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2008/051310 WO2009095746A1 (fr) | 2008-01-29 | 2008-01-29 | Method to search for a user generated content web page |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2245553A1 (fr) | 2010-11-03 |
Family
ID=39710949
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP08737749A Withdrawn EP2245553A1 (fr) | Method to search for a user generated content web page | 2008-01-29 | 2008-01-29 |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP2245553A1 (fr) |
WO (1) | WO2009095746A1 (fr) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL118580A0 (en) * | 1995-06-30 | 1996-10-16 | Massachusetts Inst Technology | Method and apparatus for item recommendation using automated collaborative filtering |
US7143089B2 (en) * | 2000-02-10 | 2006-11-28 | Involve Technology, Inc. | System for creating and maintaining a database of information utilizing user opinions |
US7519605B2 (en) * | 2001-05-09 | 2009-04-14 | Agilent Technologies, Inc. | Systems, methods and computer readable media for performing a domain-specific metasearch, and visualizing search results therefrom |
AU2003283172A1 (en) * | 2003-12-09 | 2005-06-29 | Swiss Reinsurance Company | System and method for aggregation and analysis of decentralised stored multimedia data |
US20070294230A1 (en) * | 2006-05-31 | 2007-12-20 | Joshua Sinel | Dynamic content analysis of collected online discussions |
- 2008-01-29: EP application EP08737749A (EP2245553A1), not active, Withdrawn
- 2008-01-29: WO application PCT/IB2008/051310 (WO2009095746A1), active, Application Filing
Non-Patent Citations (1)
Title |
---|
See references of WO2009095746A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2009095746A1 (fr) | 2009-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11176575B2 (en) | Dynamic content aggregation | |
CN102246167B (zh) | Providing search results | |
US7849081B1 (en) | Document analyzer and metadata generation and use | |
KR101171405B1 (ko) | Customization of placement content ordering in search results | |
US8306962B1 (en) | Generating targeted paid search campaigns | |
Morris et al. | Enhancing collaborative web search with personalization: groupization, smart splitting, and group hit-highlighting | |
US20070255702A1 (en) | Search Engine | |
US20100306249A1 (en) | Social network systems and methods | |
US20120095834A1 (en) | Systems and methods for using a behavior history of a user to augment content of a webpage | |
US20050222989A1 (en) | Results based personalization of advertisements in a search engine | |
US20140046921A1 (en) | Context-based person search | |
US20120158693A1 (en) | Method and system for generating web pages for topics unassociated with a dominant url | |
US20120254149A1 (en) | Brand results ranking process based on degree of positive or negative comments about brands related to search request terms | |
US20130226950A1 (en) | Generalized edit distance for queries | |
JP2009508267A (ja) | Ranking blog documents | |
KR20060059986A (ko) | Method and system for determining the meaning of a document in order to match the document to content | |
WO2011062598A1 (fr) | System and method for automated filtering of reviews for marketability | |
US20070233563A1 (en) | Web-page sorting apparatus, web-page sorting method, and computer product | |
US8024323B1 (en) | Natural language search for audience | |
US10339191B2 (en) | Method of and a system for processing a search query | |
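JP4859893B2 (ja) | Advertisement distribution apparatus, advertisement distribution method, and advertisement distribution control program | |
KR100954842B1 (ko) | Web page classification method using category tag information, system therefor, and recording medium recording the same | |
JP5151368B2 (ja) | Information processing apparatus and information processing program | |
JP4912384B2 (ja) | Document retrieval apparatus, document retrieval method, and document retrieval program | |
JP4825669B2 (ja) | Method and system for determining the meaning of a document and matching the document to content |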
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20100820 |
| AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR |
| AX | Request for extension of the european patent | Extension state: AL BA MK RS |
| DAX | Request for extension of the european patent (deleted) | |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20140801 |