EP2245553A1 - Method to search for a user generated content web page - Google Patents
Method to search for a user generated content web page
- Publication number
- EP2245553A1 (application EP08737749A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- lexical units
- expressions
- subset
- web page
- lexical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method to search for a user generated content web page comprises
• preparing (22) a set of lexical units,
• inputting (24) said set of lexical units into an internet search engine,
• consolidating (28) results of said internet search engine.
The method further comprises a preliminary step of creating (20) a database of lexical units comprising at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the preparation of the set of lexical units comprises the assembly of at least one expression of each subset.
Description
METHOD TO SEARCH FOR A USER GENERATED CONTENT WEB PAGE
Field of the invention
The present invention concerns a method to search for a user generated content web page and software to practice the same.
Background of the invention
Nowadays, to search for a web page, the usual method consists of choosing some words suited to the object of the search. These words are entered into the query page of an internet search engine such as those offered by Google Inc. or Yahoo Inc.
As a result, the search engine lists a set of web pages by their title and an automatically generated short abstract. A link gives access to each web page.
Search engines contain internal, and often secret, algorithms to sort the list of web pages so that the pages expected to be most pertinent are shown to the user at the beginning of the list.
However, this method is not very efficient when the search concerns users' opinions about a product or a service.
Indeed, when a user would like to buy a product or a service it is nowadays a common practice to search for the opinions of the prior buyers or users. This information may be found on blogs, wikis, forums and any other web site where a "standard" user may post a message. These opinions around a product or service generate a buzz which has a positive, or negative, impact on the success of the product/service.
It is therefore important for marketing departments, as well as for users, to have an efficient method for finding the web pages containing opinions on a defined product or service while leaving aside "classical" web pages about the product/service, such as pages of merchant web sites, price comparators, etc.
Summary of the invention
To better address one or more concerns, in a first aspect of the invention, a method to search for a user generated content web page comprises
• preparing a set of lexical units,
• inputting said set of lexical units into an internet search engine,
• consolidating results of said internet search engine.
The method further comprises a preliminary step of creating a database of lexical units comprising at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the preparation of the set of lexical units comprises the assembly of at least one expression of each subset.
Therefore, the method has the advantage of preferentially selecting web pages containing opinions about the selected matter.
In particular embodiments :
- the database comprises phonetic transcriptions and misspelled versions of said expressions;
- the subsets of feeling expressions and action expressions are arranged as a thesaurus in the database of lexical units;
- based on a first set of selected lexical units, a list of lexical unit sets to be inputted is prepared, combining various phonetic transcriptions, misspelled versions and synonyms of each selected expression;
- each lexical unit set of the list is inputted into the internet search engine, and the results for all sets are consolidated into a weighted list of the web pages;
- the weight of each web page is a combination of the order of appearance of the web page in each result and the number of occurrences of the web page in all results.
Aspects of these embodiments may be combined or modified as appropriate or desired, however.
In a second aspect of the invention a device to search for a web page comprises
• means for storing lexical units,
• means for inputting a set of lexical units into an internet search engine,
• means for consolidating results of said internet search engine,
wherein the means for storing lexical units comprises a database of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the means for inputting said set of lexical units is adapted to input a set of lexical units comprising at least one expression of each subset.
In a particular embodiment, the means for storing lexical units comprises means for storing phonetic transcriptions and misspelled versions of said expressions.
In a third aspect of the invention, a computer program product to search for a web page comprises program instructions to execute the steps of the above method when the computer program product is executed on a computer.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiment described hereafter where:
- Fig. 1 is a schematic view of a terminal connected to the internet to practice an embodiment of the invention;
- Fig. 2 is a flowchart of a method according to an embodiment of the invention; and
- Fig. 3 is a functional view of a terminal practicing an embodiment of the invention.
Detailed description
In reference to Fig. 1, a computer 1 is connected to an internet network 3. Through the network 3, the computer 1 is connected to a server 5 on which a search engine is running. The person skilled in the art understands that the server 5 symbolizes the infrastructure of search companies such as Google Inc. or Yahoo Inc. In fact, these companies use server farms containing hundreds of computers distributed around the world.
A server 7 is also connected to the internet network 3 and contains a web page which is of interest for the user of the computer 1 but whose address is not known by the computer 1. The web page contains opinions on a product/service of interest for the user of the computer 1 and is a user generated content web page such as a blog, wiki or forum page.
The computer 1 is a classical personal computer. It comprises interface means such as a display 9, a keyboard 11 and a mouse 13 or the like.
It also comprises storage means 15 and processing means 17 such as, for instance, hard disk drives and a motherboard.
The storage means 15 contains a computer software product which, when executed by the processing means 17, makes the computer 1 execute the steps of a method to search for a web page according to an embodiment of the invention.
In reference to Fig. 2, the method starts with the creation, step 20, of a database of lexical units to search for.
The database is stored in the storage means 15.
The database comprises at least three subsets of lexical units:
- a subset of feeling expressions;
- a subset of action expressions; and
- a subset of context expressions.
Feeling expressions mean lexical units which are related to the mood or feeling of a human being. For instance, words such as "trouble", "happy/unhappy", "unpleasant/pleasant", etc. define a certain state of mind. Generally, they are the reasons for which a user has posted a message.
Action expressions mean lexical units which define an action of a user, such as "online reservation", "booking", "sales", "offers", etc.
Context expressions mean lexical units which are used to define the context or the specificity of the search. For instance, if the search concerns the comments of travellers having crossed the Channel, the context expressions include terms like "ferry (ies)", "Channel", etc.
Advantageously, the subsets of feeling expressions and action expressions are structured in the form of a thesaurus so that a user can easily pick a set of words with similar meanings.
Each lexical unit is preferably stored in the database with all its lexical variations, such as singular/plural, and with some misspelled forms. Specifically, the most usual misspelled forms are stored in the database because the searched web pages are written by ordinary users with different levels of education, or by users writing in a foreign language. Therefore, it may be useful to be able to select pages even if the words are misspelled. A particular misspelled form, often used by teenagers accustomed to mobile phone short messages, is the phonetic form.
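As an illustration only, the database of step 20 could be laid out as follows. This is a minimal sketch, not the applicant's implementation: the `LexicalUnit` class, its field names, the `DATABASE` dictionary and the sample misspellings are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    """One expression plus the surface forms it may take on user pages (hypothetical layout)."""
    canonical: str
    variants: list[str] = field(default_factory=list)      # e.g. singular/plural
    misspellings: list[str] = field(default_factory=list)  # usual typos, phonetic forms

    def all_forms(self) -> list[str]:
        # Canonical form first, then the other forms, without duplicates.
        forms: list[str] = []
        for form in [self.canonical, *self.variants, *self.misspellings]:
            if form not in forms:
                forms.append(form)
        return forms


# The three subsets named in the description; the entries are illustrative.
DATABASE: dict[str, list[LexicalUnit]] = {
    "feeling": [
        LexicalUnit("unhappy", misspellings=["unhapy"]),
        LexicalUnit("trouble", variants=["troubles"]),
    ],
    "action": [
        LexicalUnit("booking", variants=["bookings"], misspellings=["bokking"]),
    ],
    "context": [
        LexicalUnit("ferry", variants=["ferries"], misspellings=["ferri"]),
    ],
}

ferry_forms = DATABASE["context"][0].all_forms()
```

Keeping every form attached to its canonical expression lets later steps expand a user's selection into all spellings without a second lookup.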
After the creation of the database, at least one set of lexical units is prepared at step 22. The set includes at least one lexical unit of each subset.
In the example of the Channel crossing, a set of lexical units is, for instance, "unhappy experience in channel crossing" where "unhappy" belongs to the feeling subset, "experience" belongs to the action subset and "channel crossing" belongs to the context subset.
The preparation is, advantageously, a guided automatic process in which the user selects some groups of lexical units in each subset and all the sets combining elements of these groups as well as their different forms are automatically generated, forming a list.
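The automatic generation of the list can be pictured as a Cartesian product over the groups selected in each subset. The sketch below assumes that a set is rendered as a space-separated query string; the function name `build_query_sets` and the sample terms are hypothetical.

```python
from itertools import product


def build_query_sets(feelings, actions, contexts):
    """Return every (feeling, action, context) combination as one query string."""
    return [" ".join(combo) for combo in product(feelings, actions, contexts)]


# Groups selected by the user, already expanded with their misspelled forms.
queries = build_query_sets(
    feelings=["unhappy", "unhapy"],
    actions=["experience"],
    contexts=["channel crossing", "ferry"],
)
```

With two feeling forms, one action and two context forms, the list holds 2 × 1 × 2 = 4 sets, which shows how quickly the number of requests grows as more variants are selected.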
At step 24, each set of lexical units is inputted into an internet search engine, such as the Google engine (www.google.com) or the Yahoo engine (www.yahoo.com). This step is done automatically, either by building an HTTP request with the appropriate syntax or by using an Application Programming Interface (API) provided by these companies to automate internet searches with their engine.
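Building the HTTP request amounts to URL-encoding the set of lexical units into a query string. The sketch below only constructs the URL and sends nothing; the endpoint `https://search.example.com/search` and the parameter name `q` are placeholders, since the actual syntax depends on the engine or API used.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real engine or API defines its own URL and parameters.
SEARCH_ENDPOINT = "https://search.example.com/search"


def build_search_url(query: str, endpoint: str = SEARCH_ENDPOINT) -> str:
    """Encode one set of lexical units as the query string of a GET request."""
    return f"{endpoint}?{urlencode({'q': query})}"


url = build_search_url("unhappy experience channel crossing")
```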
At step 26, the computer 1 receives, in the usual way, the results of the requests as lists of links to web pages. For each request, i.e. for each set of lexical units, the search engine returns at least one web page containing a list of links. Therefore, the computer 1 may receive a huge number of links.
At step 28, these lists of links are consolidated. Identical links found on different lists are merged, advantageously with an increased weight associated with the link, since the fact that a page is selected through different search requests may be an indication of relevance.
As known by the person skilled in the art, internet search engines sort the found pages so as to present the "most relevant" pages at the top of the list. Each internet search engine has its own algorithm to sort the pages, and this algorithm is often kept secret.
During the consolidation step 28, the order of each sorted list is also used to give a higher weight to the pages that the internet search engine considers "most relevant".
A combination of weights coming from the internet search engine sort and from the number of occurrences is used to sort the consolidated list by decreasing weight.
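The description does not fix a formula for combining the two weights. One possible choice, shown here purely as an assumption, is to add the reciprocal of a page's position in every result list where it appears: a page gains weight both from ranking high and from occurring in many lists. The function name `consolidate` and the sample URLs are hypothetical.

```python
from collections import defaultdict


def consolidate(result_lists):
    """Merge ranked link lists into one list sorted by decreasing weight.

    Assumed weighting: each occurrence of a link at position p adds 1/p.
    """
    weights = defaultdict(float)
    for links in result_lists:
        seen = set()
        for position, link in enumerate(links, start=1):
            if link in seen:  # count a link once per result list
                continue
            seen.add(link)
            weights[link] += 1.0 / position
    # Sort by decreasing weight; ties are broken alphabetically for stability.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


results = [
    ["https://forum.example/a", "https://blog.example/b", "https://wiki.example/c"],
    ["https://blog.example/b", "https://forum.example/a"],
    ["https://blog.example/b", "https://forum.example/d"],
]
ranking = [link for link, _ in consolidate(results)]
```

Here `https://blog.example/b` comes first because it appears in all three lists and tops two of them; other combinations, such as a weighted sum with separate coefficients for rank and occurrence count, would fit the description equally well.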
The sorted list contains page addresses as hyperlink fields. Therefore, each page can be addressed by the user to be read.
By using the sorted consolidated list, the user has a good chance of reading web pages which are relevant to his/her search for users' opinions on a product/service.
From a functional point of view, as shown in Fig. 3, computer 1 comprises means 30 for storing lexical units. Typically, means 30 for storing lexical units comprises a database 32 of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions.
Computer 1 also comprises means 34 for inputting a set of lexical units into an internet search engine, the set of lexical units comprising at least one expression of each subset.
Finally, computer 1 comprises means 35 for consolidating the results sent by the internet search engine.
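The functional view of Fig. 3 can be read as three cooperating components. The sketch below wires them together with a fake search engine so that it runs offline; the class `SearchDevice`, the `fake_engine` and `dedupe` helpers, and the example URLs are assumptions made for illustration, and the consolidation shown is a plain merge without weighting.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchDevice:
    database: dict[str, list[str]]                       # storing means (database of three subsets)
    submit: Callable[[str], list[str]]                   # inputting means (one query, one link list)
    consolidate: Callable[[list[list[str]]], list[str]]  # consolidating means

    def run(self, sets: list[tuple[str, str, str]]) -> list[str]:
        results = []
        for feeling, action, context in sets:
            # Each set must take one expression from each subset.
            for subset, expression in (("feeling", feeling), ("action", action), ("context", context)):
                if expression not in self.database[subset]:
                    raise ValueError(f"{expression!r} is not in the {subset} subset")
            results.append(self.submit(f"{feeling} {action} {context}"))
        return self.consolidate(results)


def fake_engine(query: str) -> list[str]:
    # Stand-in for a real search engine: returns canned links.
    return [f"https://forum.example/{query.split()[0]}", "https://blog.example/ferry"]


def dedupe(result_lists: list[list[str]]) -> list[str]:
    # Merge identical links, keeping first-seen order.
    merged: list[str] = []
    for links in result_lists:
        for link in links:
            if link not in merged:
                merged.append(link)
    return merged


device = SearchDevice(
    database={"feeling": ["unhappy", "trouble"], "action": ["booking"], "context": ["ferry"]},
    submit=fake_engine,
    consolidate=dedupe,
)
pages = device.run([("unhappy", "booking", "ferry"), ("trouble", "booking", "ferry")])
```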
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative and exemplary and not restrictive; the invention is not limited to the disclosed embodiment.
Other variations to the disclosed embodiment can be understood and effected by those skilled in the art in practising the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements and the indefinite article "a" or "an" does not exclude a plurality.
Claims
1. Method to search for a user generated content web page comprising
• preparing (22) a set of lexical units,
• inputting (24) said set of lexical units into an internet search engine,
• consolidating (28) results of said internet search engine,
wherein the method comprises a preliminary step of creating (20) a database of lexical units comprising at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions, and the preparation of the set of lexical units comprises the assembly of at least one expression of each subset.
2. Method according to claim 1, wherein the database comprises phonetic transcriptions and misspelled versions of said expressions.
3. Method according to claim 1 or 2, wherein subsets of feeling expressions and action expressions are arranged as a thesaurus in the database of lexical units.
4. Method according to claim 2 or 3, wherein, based on a first set of selected lexical units, a list of lexical unit sets to be inputted is prepared, combining various phonetic transcriptions, misspelled versions and synonyms of each selected expression.
5. Method according to claim 4, wherein each lexical unit set of the list is inputted into the internet search engine, and the results for all sets are consolidated into a weighted list of the web pages.
6. Method according to claim 5, wherein the weight of each web page is a combination of the order of appearance of the web page in each result and the number of occurrences of the web page in all results.
7. Device to search for a user generated content web page comprising
• means (30) for storing lexical units,
• means (34) for inputting a set of lexical units into an internet search engine,
• means (36) for consolidating results of said internet search engine, wherein means for storing lexical units comprises a database (32) of lexical units having at least three subsets of lexical units: a subset of feeling expressions, a subset of action expressions and a subset of context expressions and means for inputting said set of lexical units is adapted to input set of lexical units comprising at least one expression of each subset.
8. Device according to claim 7, wherein means for storing lexical units comprises means for storing phonetic transcriptions and misspelled versions of said expressions.
9. A computer program product to search for a user generated content web page comprising program instructions to execute the steps of the method according to any one of claims 1 to 6 when said computer program product is executed on a computer.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2008/051310 WO2009095746A1 (en) | 2008-01-29 | 2008-01-29 | Method to search for a user generated content web page |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2245553A1 (en) | 2010-11-03 |
Family
ID=39710949
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP08737749A Withdrawn EP2245553A1 (en) | 2008-01-29 | 2008-01-29 | Method to search for a user generated content web page |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP2245553A1 (en) |
WO (1) | WO2009095746A1 (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL118580A0 (en) * | 1995-06-30 | 1996-10-16 | Massachusetts Inst Technology | Method and apparatus for item recommendation using automated collaborative filtering |
EP1272942A4 (en) * | 2000-02-10 | 2008-09-10 | Involve Technology Inc | System for creating and maintaining a database of information utilizing user opinions |
US7519605B2 (en) * | 2001-05-09 | 2009-04-14 | Agilent Technologies, Inc. | Systems, methods and computer readable media for performing a domain-specific metasearch, and visualizing search results therefrom |
AU2003283172A1 (en) * | 2003-12-09 | 2005-06-29 | Swiss Reinsurance Company | System and method for aggregation and analysis of decentralised stored multimedia data |
WO2007142998A2 (en) * | 2006-05-31 | 2007-12-13 | Kaava Corp. | Dynamic content analysis of collected online discussions |
-
2008
- 2008-01-29 WO PCT/IB2008/051310 patent/WO2009095746A1/en active Application Filing
- 2008-01-29 EP EP08737749A patent/EP2245553A1/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of WO2009095746A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2009095746A1 (en) | 2009-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
 | 17P | Request for examination filed | Effective date: 20100820 |
 | AK | Designated contracting states | Kind code of ref document: A1. Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR |
 | AX | Request for extension of the European patent | Extension state: AL BA MK RS |
 | DAX | Request for extension of the European patent (deleted) | |
 | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
 | 18D | Application deemed to be withdrawn | Effective date: 20140801 |