GB2405709A - Search engine optimization using automated target market user profiles - Google Patents

Search engine optimization using automated target market user profiles

Info

Publication number
GB2405709A
Authority
GB
United Kingdom
Prior art keywords
web
pages
site
search
queries
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0320583A
Other versions
GB0320583D0 (en)
Inventor
Peter H Mowforth
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TELEIT Ltd
Original Assignee
TELEIT Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TELEIT Ltd
Priority to GB0320583A (GB2405709A)
Publication of GB0320583D0
Priority to PCT/GB2004/003780 (WO2005024661A2)
Priority to GB0604247A (GB2419993A)
Publication of GB2405709A
Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9532Query formulation

Abstract

A means for continuously adjusting a Web site or Web pages in order to achieve optimization with respect to search engines, comprises first, means for automatically clustering relevant pages and sites according to target market segment via enumerated permutations of keywords and phrases and the use of searches for relevant web sites, and second means for customer profiling based on the clustering of the first means and processing of aggregated queries to search engines. The first and second means are used by an evaluation means which provides an automatic numerical ranking of pages and sites, and which provides continuous adjustment and optimization of web sites and web pages.

Description

Automatic Target Market User Profiles
The invention relates to the development and maintenance of a Web site or other information resource, where statistical data from the Web and from search engine usage is used for customer profiling and market analysis, and this information is used on an ongoing basis to optimize the positioning of the Web site or information resource.
Background
The World Wide Web (the Web) is a source of information unprecedented in its scope and availability. The decentralized nature of the Web makes searching for information to satisfy a specific requirement challenging. This challenge has been met by the Search Engine. Search engines scan the Web and index Web sites and permit searches to be conducted over these indexes. Since, in many cases, information can only be obtained from search engines, visibility in searches becomes of critical importance for web sites.
Methods for optimizing a site for discovery via search engines are well known. Such methods include market segmentation and competitive analysis based on information from search engines. These methods suffer from the disadvantage of being manual, and considerable effort is needed to update sites. Updating on a frequent basis is necessary to cope with rapid changes in the structure and content of the Web.
Related work
The World Wide Web (the Web), initially developed in 1989 by Tim Berners-Lee at CERN, is a "universe of network-accessible information, the embodiment of human knowledge".
The Web has been turned to commercial purposes since the mid-1990s and an increasing volume of traffic on the Web is devoted to commerce.
Users of the Web, for both commercial and non-commercial use, often need to find specific information from the Web. An increasingly common tool for searching out specific information requirements is the Search Engine. Search Engines provide an automated index of the Web by systematically exploring the Web and recording and indexing the Web sites they visit.
To find information via a search engine a query is submitted, consisting of a set of search terms, and a ranked list of results is returned.
It is in the interest of a Web site to be as highly ranked as possible with respect to relevant queries. The process of tuning a web site in order to maximize its ranking is known as Search Engine Optimization.
Search Engine Optimization is a largely manual process, which can involve some or all of the following steps:
1. Identifying the market served by the web site.
2. Identifying competitors.
3. Selecting an appropriate set of search keywords.
4. Designing the site to maximize its search engine "visibility".
The invention described herein improves on the manual process by automating steps 1, 2 and 3 and helping to automate step 4.
Summary of the Invention
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 shows the overall control flow of the invention.
Figure 2 shows the means for determining overall market segmentation, and a measure of similarity between Web pages.
Figure 3 shows the means for determining customer profiles.
Figure 4 shows the means for determining a numerical evaluation of Web pages.
The principal object of this invention is to provide a methodology for continuous adjustment to a site in order to optimize it with respect to search engines. Accordingly, this invention provides a means for providing a quantitative ranking for a Web site by evaluating the site with respect to a given market segment and competitive environment.
The ranking mechanism, the competitive environment and the market segmentation are produced automatically by appropriate interaction with one or more search engines. The automatic quantitative ranking allows a search procedure to continually optimize the ranking by suggesting small changes which can be continuously evaluated.
The central idea behind the invention is that the principal external driver for optimization is the difference between the information and services which are currently available, as indicated by search engines, and what the users/customers want, as indicated by searches and by the choices users make as a result of searches.
Market Segmentation
The objective of the competitive analysis is to apply Statistical and Machine Learning techniques to develop clusters of web sites, where each cluster represents a related set of competitors.
The starting point for analysis is a basic set of keywords and phrases relevant to the domain. In addition web sites which are known to fall in the set can be used, as described below.
Algorithm One
1. The keywords and phrases are used to generate a set of permutations of subsets of the words and phrases. If there are $n$ keywords, then there are $\binom{n}{k}$ subsets of length $k$ and $\binom{n}{k}\,k!$ permutations of these subsets.
2. Each permutation is presented to a series of search engines.
3. Each web page, retrieved from the first N hits on the search engine, is saved, indexed by its associated permutation and ranked from 1 to N.
4. The web pages are processed to remove common words, and the words are stemmed using the Porter algorithm [Porter, 1980].
5. The web pages retrieved in this way are filtered for duplicates and then Latent Semantic Analysis [Deerwester et al., 1990] is used to create a similarity measure between sites. The similarity measure is the Euclidean distance between pages with respect to the first N latent components, where N is 50 in the preferred implementation.
6. The web pages are clustered using k-means clustering [MacQueen, 1965]. The clustering metric is the similarity measure determined by the Latent Semantic Analysis. Each cluster represents a sector of competition for the "product" defined via the keywords and phrases.
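As an illustration of step 1 only, the query set can be enumerated with the Python standard library. This is a minimal sketch: the function names and the seed keywords are hypothetical, and submitting the queries to search engines (step 2) is left out.

```python
from itertools import permutations
from math import comb, factorial


def keyword_queries(keywords, k):
    """Return every ordered arrangement of k distinct keywords as a query string."""
    return [" ".join(p) for p in permutations(keywords, k)]


def all_queries(keywords, max_len):
    """Collect the queries for every subset length from 1 up to max_len."""
    queries = []
    for k in range(1, max_len + 1):
        queries.extend(keyword_queries(keywords, k))
    return queries


keywords = ["garden", "furniture", "teak"]  # hypothetical seed terms
pairs = keyword_queries(keywords, 2)

# For n keywords and length k the count is C(n, k) * k! = n! / (n - k)!
expected = comb(len(keywords), 2) * factorial(2)
print(len(pairs), expected)
```

The count grows quickly with the number of keywords, so in practice the maximum subset length would probably need to be capped.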
Discussion
The use of Latent Semantic Analysis (LSI) on web pages retrieved by searches on permutations of the original keywords and phrases reduces dependence of results on the initial terms and phrases, since all the information in the retrieved pages is used to determine a clustering metric and the effects of polysemy and synonymy are reduced.
The number of dimensions used from the Singular Value Decomposition in LSI is variable. In the preferred implementation of this invention, 50 dimensions are used.
The clusters can be decomposed in various ways, for instance geographically, orthogonally to the original construction via semantic analysis of keywords.
The clustering analysis uses only Web addresses acquired from searches. More sophisticated strategies for clustering, involving further search using web-bots to explore links not pursued by the standard search engines, can be used where the market is specialized.
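Once each page has been reduced to a vector of latent coordinates, the k-means step with a Euclidean metric can be sketched in plain Python. The sketch below assumes the reduced vectors are already available (two-dimensional toy vectors stand in for the 50 latent dimensions), and it uses farthest-first seeding, which is an assumption; the source does not say how initial centers are chosen.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def initial_centers(vectors, k):
    """Farthest-first seeding (an assumption, not taken from the source)."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(
            max(vectors, key=lambda v: min(euclidean(v, c) for c in centers))
        )
    return centers


def kmeans(vectors, k, max_iter=50):
    """Assign each vector to its nearest center until assignments stop changing."""
    centers = initial_centers(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centers[c])) for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers


# Hypothetical latent coordinates for four pages
pages = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels, centers = kmeans(pages, k=2)
```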
Customer Profile
The objective of customer profiling is to determine a set of different customer information requirements, each requirement relating to a particular "customer profile". As with Market Segmentation, the starting point is a set of keywords and phrases. In the case of Market Segmentation we are interested in the totality of available information within the range defined by the keywords and phrases. With customer profiling we are interested in clustering search queries, chosen from our set of keywords and phrases, in such a way that each cluster contains queries used by customers in search of a particular class of information resource. This approach is related to those of [Wen et al., 2002] and [Beeferman and Berger, 2000] among others.
The information available consists of triples (Q, U, R) where Q is a query, U is the URL (Uniform Resource Locator) which was selected in response to that query and R is the rank of the selected URL with respect to the query. This is termed "clickthrough" data and is available from search engines.
The value of this data for clustering queries is shown by the following related observations. If two different users search with terms "fly" and "ant" but select the same URL, there is evidence that the search terms are related to a common information requirement.
Similarly, if two distinct users search on the same term "ant" and visit different URLs, there is some evidence that these two URLs are related. Note that such evidence is statistical; the term "law", for example, might relate to either the legal system or physics.
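The clickthrough triples map naturally onto two adjacency tables, one keyed by query and one keyed by URL. The following is a minimal sketch; the function name and the example URLs are hypothetical.

```python
from collections import defaultdict


def build_click_graph(triples):
    """Index (query, url, rank) triples in both directions."""
    query_to_urls = defaultdict(set)
    url_to_queries = defaultdict(set)
    for query, url, _rank in triples:
        query_to_urls[query].add(url)
        url_to_queries[url].add(query)
    return query_to_urls, url_to_queries


# Hypothetical clickthrough records
clicks = [
    ("fly", "http://example.com/insects", 3),
    ("ant", "http://example.com/insects", 1),
    ("ant", "http://example.com/pest-control", 2),
]
q2u, u2q = build_click_graph(clicks)

# "fly" and "ant" share a selected URL, which is evidence the queries are related
shared = q2u["fly"] & q2u["ant"]
```

Here `u2q["http://example.com/insects"]` holds both queries, and `q2u["ant"]` holds both URLs, mirroring the two observations in the text.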
There are three kinds of information available for clustering.
1. The similarity between queries
2. The similarity between URLs
3. The link structure between queries and URLs
Similarity between queries can be defined as the proportion of words or phrases which they have in common. Similarity between URLs is defined in terms of the distance measure described above in terms of Market Segmentation. The link structure between queries and URLs can be described as a bipartite graph. The "white" nodes of the graph are the unique queries, and the "black" nodes are URLs. The similarity between two nodes of the same color is the proportion of links they share compared to their total number of links.
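Two of these measures can be sketched as set-overlap ratios. The text does not fix the normalisation, so the sketch assumes a Jaccard ratio (shared items divided by all distinct items) for both; the function names are hypothetical.

```python
def word_similarity(query_a, query_b):
    """Proportion of words two queries have in common (Jaccard, an assumption)."""
    a, b = set(query_a.lower().split()), set(query_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_similarity(links_a, links_b):
    """Proportion of shared links between two same-color nodes.

    Each argument is the set of opposite-color nodes that a node links to.
    """
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


s_words = word_similarity("cheap garden furniture", "garden furniture sale")
s_links = link_similarity({"u1", "u2"}, {"u2", "u3"})
```

Because the word measure works on sets, two queries that are permutations of each other score 1.0.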
It is also possible to assign a weight to links in this graph, the weight being a function of the ranking of the URL selected from a query presented to a search engine.
There are some natural variations of these similarity measures. For example, search queries which are permutations of each other can be considered equal, and queries can be clustered by containment in a natural manner.
A similarity measure between the queries (white vertices), based on a combination of the above similarities, and a complementary measure between the URLs can be generated by a weighted combination of the individual similarities.
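A weighted combination of this kind might look as follows. Everything specific here is an assumption made for illustration: the reciprocal rank weight, the weighted-Jaccard form of the link similarity, and the mixing parameter `alpha` are not given in the source.

```python
def rank_weight(rank):
    """Hypothetical link weight: higher-ranked (smaller rank) clicks count more."""
    return 1.0 / rank


def weighted_link_similarity(links_a, links_b):
    """Weighted Jaccard over two {neighbour: weight} dictionaries."""
    keys = set(links_a) | set(links_b)
    shared = sum(min(links_a.get(k, 0.0), links_b.get(k, 0.0)) for k in keys)
    total = sum(max(links_a.get(k, 0.0), links_b.get(k, 0.0)) for k in keys)
    return shared / total if total else 0.0


def combined_similarity(content_sim, link_sim, alpha=0.5):
    """Convex combination of a content-based and a link-based similarity."""
    return alpha * content_sim + (1.0 - alpha) * link_sim


# Two queries described by the URLs selected for them and the click ranks
query_a = {"u1": rank_weight(1), "u2": rank_weight(2)}
query_b = {"u2": rank_weight(1)}
link_sim = weighted_link_similarity(query_a, query_b)
score = combined_similarity(0.5, link_sim, alpha=0.5)
```

The same two functions apply unchanged to URLs, with the neighbours being queries and the content similarity coming from the Market Segmentation distance.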
The clustering algorithm, which clusters both URLs and queries, proceeds as follows. It is a version of Hierarchical Agglomerative Clustering (HAC) [Ward, Jr., 1963]. It proceeds on the assumption that the number of URLs is very considerably larger than the number of queries.
Algorithm Two
1. The two query nodes with the greatest similarity are merged. Record this merger.
2. The most similar URLs are merged. This is done a reasonably large number of times, since there are many more URLs than queries. Record these mergers.
3. Go to step 1 unless the number of queries has been reduced below a threshold.
The end result of this algorithm is a hierarchical clustering of both queries and URLs. For any query cluster there is an associated set of URL clusters.
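The alternating merge procedure can be sketched as below. This is a simplified illustration, not the patented implementation: similarity is taken to be link overlap only, pairs with zero similarity are never merged (an added stopping rule), and the parameters `min_queries` and `url_merges_per_round` are hypothetical names for the threshold and the "reasonably large number" of URL merges.

```python
from itertools import combinations


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbours(cluster, links, other_clusters):
    """Indices of opposite-side clusters linked to any member of `cluster`."""
    linked = set().union(*(links[m] for m in cluster))
    return {i for i, c in enumerate(other_clusters) if c & linked}


def merge_best(clusters, links, other_clusters):
    """Merge the most similar pair in place; return it, or None if nothing overlaps."""
    if len(clusters) < 2:
        return None
    nbrs = [neighbours(c, links, other_clusters) for c in clusters]
    i, j = max(
        combinations(range(len(clusters)), 2),
        key=lambda p: jaccard(nbrs[p[0]], nbrs[p[1]]),
    )
    if jaccard(nbrs[i], nbrs[j]) == 0.0:
        return None
    pair = (clusters[i], clusters[j])
    del clusters[j]  # j > i, so delete the later index first
    del clusters[i]
    clusters.append(pair[0] | pair[1])
    return pair


def cocluster(triples, min_queries=1, url_merges_per_round=2):
    q_links, u_links = {}, {}
    for q, u, _rank in triples:
        q_links.setdefault(q, set()).add(u)
        u_links.setdefault(u, set()).add(q)
    q_clusters = [frozenset([q]) for q in sorted(q_links)]
    u_clusters = [frozenset([u]) for u in sorted(u_links)]
    history = []  # record of mergers, in order
    while len(q_clusters) > max(min_queries, 1):
        pair = merge_best(q_clusters, q_links, u_clusters)  # step 1
        if pair is None:
            break
        history.append(("query", pair))
        for _ in range(url_merges_per_round):  # step 2
            pair = merge_best(u_clusters, u_links, q_clusters)
            if pair is not None:
                history.append(("url", pair))
    return q_clusters, u_clusters, history


clicks = [
    ("fly", "insects.example", 3), ("ant", "insects.example", 1),
    ("ant", "pests.example", 2),
    ("mortgage", "loans.example", 1), ("home loan", "loans.example", 2),
]
q_clusters, u_clusters, history = cocluster(clicks, min_queries=2, url_merges_per_round=1)
```

The recorded history gives the hierarchy; each query cluster's associated URL clusters can be read off with `neighbours`.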
Discussion
We make the assumption, for any query cluster, that each URL cluster corresponds to a distinct market for the query cluster.
Optimization
The goal of optimization is to position an information resource (a Web site, viewed as a set of web pages) so as to maximize its initial value, and to continually evaluate the situation on an ongoing basis so as to maximize its continuing value.
The goal is not to maximize the number of visitors to the site, but to maximize the number of visitors who pay to consume the site's resources. This may mean making a purchase, or just taking the time to read articles and information on the site.
It is difficult to determine the conversion rate of visitors who consume unless one can investigate in detail the behavior of visitors and perform experiments. We assume that it is possible to optimize conversion rate separately from visitor rate once the initial positioning of the site or page has been determined.
The goal of optimization is to generate a numerical measure of the "fitness" of a web page/site. The following assumptions are made:
Pages which are semantically similar with respect to the distance measure described in the section titled "Market Segmentation" and which lie in the same cluster with respect to the Customer Profile, and with similar strengths, will attract a similar number of visitors. This is a base assumption, in that it justifies the use of a numerical measure of Web site utility.
The ratio of number of visitors to cluster size is a direct determinant of the value of a cluster, as larger values imply more visitors for any site in the cluster.
A query cluster which relates to a URL cluster with low average rank is interesting since users have consistently chosen low-ranking URLs from search queries.
These observations can be used to assign a numerical rank to web pages and web sites.
The factors used to determine ranking are:
1. The cluster value (visitors/element).
2. The within-cluster ranking, determined by ranking score on queries within the cluster's associated queries. This ranking is comparative.
3. The relevance of the cluster to the product being sold.
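The first two factors can be computed directly from visitor counts and search ranks. The sketch below is illustrative only: the function names and figures are hypothetical, using the mean search rank as the "ranking score" is an assumption, and factor 3 is left out because the source does not say how relevance is measured.

```python
def cluster_value(visitors, cluster_size):
    """Factor 1: visitors per element of the cluster."""
    return visitors / cluster_size


def within_cluster_ranking(ranks_by_page):
    """Factor 2: order pages by mean search rank over the cluster's queries.

    ranks_by_page maps a page to its ranks (1 is best) on each associated query.
    The result is comparative: a list of pages, best first.
    """
    mean_rank = {page: sum(r) / len(r) for page, r in ranks_by_page.items()}
    return sorted(mean_rank, key=mean_rank.get)


value = cluster_value(visitors=1200, cluster_size=40)
order = within_cluster_ranking({"a": [1, 3], "b": [2, 5], "c": [1, 1]})
```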
We now describe the optimization process. Optimization is typically performed for either a web site or a small group of interrelated pages, for example those describing a particular product.
1. Produce an initial set of keywords and phrases.
2. Produce a market segmentation consisting of a similarity measure between web pages, and a clustering of sites/pages as described in Algorithm One above.
3. Produce a Customer Profile, as described in Algorithm Two above.
4. Create initial site and web pages.
5. Determine the degree of membership of pages in the clusters produced in Step 3.
6. Assign pages to clusters based on a determination of how their numerical ranking can be optimized. This is a manual process, consisting of the following steps.
(a) Estimate the clusters most relevant to the product being sold.
(b) Modify the page in terms of keywords and language to minimize its distance to the cluster center.
7. Make the pages/site live.
8. Monitor the numerical ranking of pages. This is necessary to determine factor 2 above.
9. Monitor the ranking of the site on a continual basis by repeating steps 2, 3 and 8.
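Step 6(b) amounts to checking whether an edit moves a page closer to the center of its target cluster. A minimal sketch, assuming pages are already represented as coordinate vectors in the similarity space (the vectors and function names here are hypothetical):

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance_to_center(page, cluster_vectors):
    """Euclidean distance from a page vector to the cluster's centroid."""
    center = centroid(cluster_vectors)
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(page, center)))


cluster = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]]  # centroid is [1, 1]
before = distance_to_center([4.0, 5.0], cluster)
after = distance_to_center([1.0, 2.0], cluster)

# An edit is kept only if it reduces the distance to the target cluster
improved = after < before
```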
References
[Beeferman and Berger, 2000] Doug Beeferman and Adam Berger. Agglomerative clustering of a search engine query log. In Raghu Ramakrishnan, Sal Stolfo, Roberto Bayardo, and Ismail Parsa, editors, Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-00), pages 407-416, N.Y., August 20-23 2000. ACM Press.
[Deerwester et al., 1990] Scott Deerwester, Susan Dumais, George Furnas, Thomas Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
[MacQueen, 1965] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281-297, Berkeley, CA, 1965. University of California Press.
[Porter, 1980] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, July 1980.
[Ward, Jr., 1963] J. H. Ward, Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58:236-244, 1963.
[Wen et al., 2002] Ji-Rong Wen, Jian-Yun Nie, and Hong-Jiang Zhang. Query clustering using user logs. ACM Transactions on Information Systems, 20(1):59-81, January 2002.

Claims (5)

  1. The automatic positioning of a web site with respect to a set of search engines and methods for continuous adjustment of the site to optimize positioning.
  2. A method, as in Claim 1, for the assignment of a numerical ranking to a web site or sub-site of a web site based on customer profile, market segmentation and searches conducted via a search engine.
  3. A method, as in Claim 1, for adjustment of a web site in order to maximize a numerical ranking, as in Claim 2.
  4. A method for producing customer profiles, as in Claim 2, based on analysis of queries placed with search engines.
  5. A method for market segmentation, as in Claim 2, based on analysis of the web, based on specialized search engines.
GB0320583A 2003-09-03 2003-09-03 Search engine optimization using automated target market user profiles Withdrawn GB2405709A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB0320583A GB2405709A (en) 2003-09-03 2003-09-03 Search engine optimization using automated target market user profiles
PCT/GB2004/003780 WO2005024661A2 (en) 2003-09-03 2004-09-03 Improved search engine optimisation
GB0604247A GB2419993A (en) 2003-09-03 2004-09-03 Improved search engine optimisation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0320583A GB2405709A (en) 2003-09-03 2003-09-03 Search engine optimization using automated target market user profiles

Publications (2)

Publication Number Publication Date
GB0320583D0 GB0320583D0 (en) 2003-10-01
GB2405709A true GB2405709A (en) 2005-03-09

Family

ID=28686802

Family Applications (2)

Application Number Title Priority Date Filing Date
GB0320583A Withdrawn GB2405709A (en) 2003-09-03 2003-09-03 Search engine optimization using automated target market user profiles
GB0604247A Withdrawn GB2419993A (en) 2003-09-03 2004-09-03 Improved search engine optimisation

Family Applications After (1)

Application Number Title Priority Date Filing Date
GB0604247A Withdrawn GB2419993A (en) 2003-09-03 2004-09-03 Improved search engine optimisation

Country Status (2)

Country Link
GB (2) GB2405709A (en)
WO (1) WO2005024661A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170665A (en) * 2017-11-29 2018-06-15 有米科技股份有限公司 Keyword expanding method and device based on comprehensive similarity

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7487144B2 (en) 2006-05-24 2009-02-03 Microsoft Corporation Inline search results from user-created search verticals
US8650191B2 (en) * 2010-08-23 2014-02-11 Vistaprint Schweiz Gmbh Search engine optimization assistant
US10127314B2 (en) 2012-03-21 2018-11-13 Apple Inc. Systems and methods for optimizing search engine performance
FR3032291B1 (en) * 2015-02-04 2022-06-24 Jalis TOOL AND METHOD FOR IMPROVING THE REFERENCING OF A WEBSITE
FR3034893A1 (en) * 2015-04-10 2016-10-14 Pixalione METHOD FOR DETERMINING SEMANTIC GAP, DEVICE AND PROGRAM THEREOF
EP3155533A4 (en) * 2015-06-29 2018-04-11 Nowfloats Technologies Pvt. Ltd. System and method for optimizing and enhancing visibility of the website

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1050830A2 (en) * 1999-05-05 2000-11-08 Xerox Corporation System and method for collaborative ranking of search results employing user and group profiles
US20020107853A1 (en) * 2000-07-26 2002-08-08 Recommind Inc. System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models
US6564210B1 (en) * 2000-03-27 2003-05-13 Virtual Self Ltd. System and method for searching databases employing user profiles

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1050830A2 (en) * 1999-05-05 2000-11-08 Xerox Corporation System and method for collaborative ranking of search results employing user and group profiles
US6564210B1 (en) * 2000-03-27 2003-05-13 Virtual Self Ltd. System and method for searching databases employing user profiles
US20020107853A1 (en) * 2000-07-26 2002-08-08 Recommind Inc. System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Berger et al, "Agglomerative clustering of a search engine query log" [online], 2000. Available from: http://www-2.cs.cmu.edu/~aberger/pdf/aggclust.pdf *
Daria Goetsch, "Search Engine Optimization Consultants" [online], 30/10/2002. Available from: http://www.seoconsultants.com/articles/1468/getting-honest.htm [24/02/04] *
Tedeschi et al, "SEO in the press" [online], 2001. Available from: http://www.2disc.com/Proof/SEO_in_press.htm [24/02/04] *
Wen et al, "Query Clustering Using User Logs" [online], 2002. Available from: http://research.microsoft.com/users/jrwen/jrwen_files/publications/QC-TOIS.pdf [24/02/04] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170665A (en) * 2017-11-29 2018-06-15 有米科技股份有限公司 Keyword expanding method and device based on comprehensive similarity
CN108170665B (en) * 2017-11-29 2021-06-04 有米科技股份有限公司 Keyword expansion method and device based on comprehensive similarity

Also Published As

Publication number Publication date
GB0604247D0 (en) 2006-04-12
GB2419993A (en) 2006-05-10
WO2005024661A2 (en) 2005-03-17
GB0320583D0 (en) 2003-10-01
WO2005024661A8 (en) 2006-01-05

Similar Documents

Publication Publication Date Title
US6112203A (en) Method for ranking documents in a hyperlinked environment using connectivity and selective content analysis
US6560600B1 (en) Method and apparatus for ranking Web page search results
Diligenti et al. Focused Crawling Using Context Graphs.
Yom-Tov et al. Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval
US8312035B2 (en) Search engine enhancement using mined implicit links
Xue et al. Scalable collaborative filtering using cluster-based smoothing
US20030014501A1 (en) Predicting the popularity of a text-based object
CN105045875B (en) Personalized search and device
US7552109B2 (en) System, method, and service for collaborative focused crawling of documents on a network
CN111191122A (en) Learning resource recommendation system based on user portrait
US6584460B1 (en) Method of searching documents and a service for searching documents
US20080263022A1 (en) System and method for searching and displaying text-based information contained within documents on a database
US20050060290A1 (en) Automatic query routing and rank configuration for search queries in an information retrieval system
Viles et al. Dissemination of collection wide information in a distributed information retrieval system
CN1702654A (en) Method and system for calculating importance of a block within a display page
US20040083205A1 (en) Continuous knowledgebase access improvement systems and methods
Zeng et al. A unified framework for clustering heterogeneous web objects
WO2004097568A2 (en) Method and apparatus for machine learning a document relevance function
EP1668549A1 (en) Methods and systems for improving a search ranking using related queries
US20070016545A1 (en) Detection of missing content in a searchable repository
US20030088553A1 (en) Method for providing relevant search results based on an initial online search query
US7257766B1 (en) Site finding
Hamdi SOMSE: A semantic map based meta-search engine for the purpose of web information customization
GB2405709A (en) Search engine optimization using automated target market user profiles
JP2000508450A (en) How to organize information retrieved from the Internet using knowledge-based representations

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)