WO2010131013A1

WO2010131013A1 - Collaborative search engine optimisation

Info

Publication number: WO2010131013A1
Application number: PCT/GB2010/000975
Authority: WO
Inventors: Han Lun Tan; Tsu Hoon Hoh; Chin Chin Wong
Original assignee: British Telecommunications Public Limited Company
Priority date: 2009-05-15
Filing date: 2010-05-14
Publication date: 2010-11-18

Abstract

A method for identifying collaborative searching groups in a computer system comprising: a network of one or more computers; one or more user interfaces; a keyword identifier; a search engine database; a user histories database comprising a user search history database comprising information regarding historical queries of the search engine database and a user access history database comprising content parameters comprising information about content accessed by one or more users. The method comprises the steps of: a first user inputting a search term into a first user interface; identifying one or more keywords from the inputted search term using the keyword identifier thereby establishing one or more keywords associated with the search; querying the search engine database with the one or more keywords and returning one or more search result matches to the first user interface. The user search history database is then searched to identify a first group of one or more different users who have submitted identical or similar keyword queries to the search engine database. The user access history database is searched to determine the user access history for one or more members of the identified first group. Subsequently, the first user selects one or more of the search result matches returned to the first user interface, and one or more content parameters associated with the search result match selected by the first user are determined. From the first group of users, there is identified a subset of users who have identical or similar instances of said content parameters in their user access history, thereby forming a second group of users, as a subset of the first group of users, who have identical or similar search and access histories to the first user.

Description

Collaborative search engine optimisation

Field of the invention

The present invention relates to, in particular but not exclusively, the field of search engine optimisation. In particular, the invention relates to the field of collaborative searching, and methods for the identification and optimisation of collaborative searching groups. Additionally, methods and apparatus for overcoming problems with the technical implementation of such systems in the prior art are discussed.

Background of the invention

Single-user search engines like Google (RTM) are very commonly used when an individual is looking for information on the internet. The majority of common search engines and web browsers focus on single-user scenarios.

It is also known for groups of users to perform collaborative searches, e.g. in schools or groups with a common purpose. Collaborative searching enables the users to increase the efficiency of their search by searching with multiple users. Problems with single user search engines in a collaborative group searching scenario relate to:

• Desire to handle tasks in parallel without unnecessary duplication of effort

• Difficulty in situations where there may be remote collaborators

• Access to optimised historical searches performed by collaborators

• Inadequacy of search user interfaces (UIs) for teaching search skills or assisting novice users

Creating UIs that support collaboration during exploratory search has the potential to improve search experiences and outcomes in several ways:

• Better coverage of the space of relevant, high-quality sites

• Higher user confidence in the completeness and/or correctness of the search

• Exposure to varying search strategies and query syntax

• Increased productivity due to a decrease in redundant information-seeking

A key feature of the collaborative search is the identification of the relevant users to form the collaborative search group. The identification of the group should be based around the requirement of the user and the material searched.

It is a known aim to be able to identify collaborative groups, in order to improve the interaction of the user with the other members of the group according to their similar interests or searches. There are known implementations of collaborative browsing systems that are able to meet the above aims. However, programmers implementing such systems have identified technical problems in the implementation of the prior art systems.

For example US 2007/0226183, Hart et al., discloses a method for performing collaborative searches in which a first user submits a search request to a search engine, and the search engine then identifies other different users who have submitted similar requests to the first user. The first user is then presented with a list of the identified users with similar requests. The first user can then select and view the search history of one or more of these users and is further enabled to initiate contact via an online chat facility, thereby having selected and created a collaborative searching group.

However, given the aim of creating collaborative search groups it is found that the method is non-optimised and requires a level of user interaction which is time consuming and inefficient. In order to create collaborative groups, current technical implementations rely on the end user to view the search history of the members of the group, and the groups that are formed are reliant on the user's analysis of the data. Such decisions are subjective and are inherently prone to errors in the user's judgement as to whether the search histories of other users are relevant. Additionally, these systems require the user to examine potentially tens or hundreds of users' histories to create the group.

It is therefore desirable to be able to implement the identification of collaborative groups in a manner that is fast, consistent and does not rely on user judgement, by removing the end user decision from the process.

It is also noted by programmers that the known methods for identifying groups based on the search terms do not allow for distinguishing between heteronyms and regional/international variations in search terms. For example, if a user was searching for "bass" it could relate to a musical instrument, a voice type, a fish or a brand of beer. Identifying a collaborative group around the inputted data could create a collaborative search group involving people interested in music, fishing and beer. Similar problems are known to occur for geographical variations in the definition of words, e.g. "football" in Europe and Latin America is an entirely different game to "football" in North America or "football" in some parts of Australia. A collaborative group formed around the term "football" could contain people interested in different sports.

Similar problems exist in the implementations of the prior art which are unable to disambiguate the queries. For example, a user may wish to find information regarding "John Smith" which, due to the popularity of the name, would result in a collaborative group that covered a wide range of interests in persons called "John Smith". If a user wished to obtain information regarding a particular John Smith from other members of the group she would need to identify which other members of the group are interested in that particular John Smith, which would be a lengthy process.

Additionally, it has been noted by those implementing the systems that, because of these problems relating to heteronyms, for a user to be sure that the identified group is indeed related to the correct subject matter she will often have to either refine her search queries or interrogate other users to see if they have the same interests.

It is therefore desirable to be able to improve the methods for identifying collaborative groups based on user intentions using the available data. Such improvements would also require less interaction from the user when using a search engine, as she would not need to input further clarifying terms to form the collaborative group or to initiate communication with other users who do not have similar interests.

Therefore, it is desirable to have a system that can implement the requirement of forming collaborative browsing groups but does not suffer from the problems in the technical implementation of such systems in the prior art. Such desired improvements in the implementation include not relying on end user judgement to identify members to form a collaborative group, optimising the formation of the group by further understanding the intentions of the user from the data presented, and reducing the level of human interaction required to form an optimised collaborative search group.

Furthermore, it is noted that an increased level of user interaction and functionality of the User Interface would allow the user to better interact with the invention, therefore rendering such a system easier to use.

Accordingly, there is provided a method for optimising search engine results in a computer system comprising: a network of one or more computers; one or more user interfaces; a keyword identifier; a search engine database; a user search history database comprising information regarding historical queries of the search engine database; a user access history database comprising content parameters comprising information about content accessed by one or more users; the method comprising the steps of: a first user inputting a search term into a first user interface; identifying one or more keywords from the inputted search term using the keyword identifier thereby establishing one or more keywords associated with the search; querying the search engine database with the one or more keywords and returning one or more search result matches to the first user interface; searching the user search history database to identify a first group of one or more different users who have submitted identical or similar keyword queries to the search engine database; searching the user access history database to determine the user access history for one or more members of the identified first group; subsequently, the first user then selecting one or more of the search result matches returned to the first user interface; determining one or more content parameters associated with the search result match selected by the first user; identifying from the first group of users a subset of users who have identical or similar instances of said content parameters in their user access history, thereby forming a second group of users, as a subset of the first group of users, who have identical or similar search and access histories to the first user.
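The two-stage narrowing described above can be illustrated with a minimal sketch. It is not the claimed implementation: the `UserHistory` record, the function names and the use of simple set overlap as the test for "identical or similar" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class UserHistory:
    """Hypothetical per-user record combining search and access history."""
    user_id: str
    searched_keywords: Set[str] = field(default_factory=set)
    accessed_content_params: Set[str] = field(default_factory=set)


def first_group(histories: List[UserHistory], query_keywords: Set[str],
                requester_id: str) -> List[UserHistory]:
    """Other users whose past queries share at least one keyword."""
    return [h for h in histories
            if h.user_id != requester_id
            and h.searched_keywords & query_keywords]


def second_group(group: List[UserHistory],
                 content_params: Set[str]) -> List[UserHistory]:
    """Subset of the first group whose access history shares at least one
    content parameter with the match selected by the first user."""
    return [h for h in group if h.accessed_content_params & content_params]
```

For the "bass" example, a first group built from the keyword alone would mix users interested in music and in fishing; selecting a result whose content parameters include "music" would then keep only the users whose access history also includes "music".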

Preferably but optionally the searches relate to Internet content and the matches are websites; more preferably the parameters associated with the content are keywords related to the content.

Preferably but optionally the identities of the first and/or second group of identified users are returned to the first user interface, and/or the further keyword is determined from metadata associated with the match, and/or the further keyword is descriptive of the content of the match.

Preferably but optionally the user is enabled to communicate with one or more members of the first and/or second group of users via the user interface.

Preferably but optionally, where the system is internet based, the content parameters are automatically identified by metadata tags of the selected webpage.

Preferably but optionally the content of the webpage is searched to identify content parameters from the page content.

Preferably but optionally the user interface is a web browser and the search engine database is a web search database.

Preferably but optionally the user is able to select and/or deselect the content parameters that are associated with the match before the step of identifying the subset of users from the first group of users.

Preferably but optionally the keyword selection of the search term is based on the verbs and nouns of the search term.

Preferably but optionally the identified group members are listed with information relating to their access history from the user access history database.

Preferably but optionally the identified users form contact lists, listing a name and an identifier based on their viewed URLs.

Preferably but optionally the user histories database is updated whenever a user inputs a search term or accesses a search result. More preferably but optionally the updated contact lists are stored in the user histories database.

Preferably but optionally the computer system further comprises a server storing user login details and profiles.

Preferably but optionally the user is further enabled to select one or more members of the first or second groups of users to initiate a communication protocol.

Preferably but optionally identified group members can be blocked or deleted from the group of users by the user.

Preferably but optionally the presence of a group member is displayed in the group list.

There is also described apparatus to implement the system described above.

There is also provided a method for forming collaborative groups for searching, the method comprising: a first user searching for a first item on a search engine, the search engine returning matches to the search item, identifying a first group of other users who have made identical or similar searches to the first user, the first user then accessing one or more of the matches returned by the search engine, identifying content keywords associated with the one or more accessed matches, and identifying a second group of other users who are a subset of the first group of other users and who have accessed similar content as identified by the content keywords, thereby forming a collaborative group of users who have searched and accessed similar items.

Further aims and aspects of the invention will be apparent from the appended claims.

Brief description of the drawings

An embodiment of the invention is now described, by way of example with reference to the accompanying drawings in which:

Figure 1 is a data flow diagram of an embodiment of the invention;

Figure 2 is an example of the system architecture of the invention according to an embodiment of the invention;

Figure 3 is a flow chart showing the registration process;

Figure 4 is a flow chart of the login process;

Figure 5 is a flow chart of the overall process of defining a first collaborative group according to an aspect of the invention;

Figure 6 is a flow chart of the overall process of defining a refined collaborative group according to an aspect of the invention;

Figures 7a and 7b show examples of Euler diagrams of the sets defined by the collaborative groups;

Figure 8 shows an example of a user interface according to an embodiment of the invention; and

Figures 9a and 9b show further examples of the user interface.

Detailed description of an embodiment

The invention provides an improved technical implementation of the selection and formation of groups for collaborative browsing. In particular it enables the creation of such groups without the need for user interaction, overcomes problems in identifying the user's requirements, in particular those associated with heteronyms and variations in definitions, and increases the functionality of a User Interface (UI) to improve the level of user interaction with the invention.

In Figure 1 there is shown a data flow diagram according to an aspect of the invention, comprising: a plurality of client devices 10, 12, 14 each connected to the internet 16 and a web server 18. The web server 18 comprises an XMPP server 20, a subset identifier 22, a collaborative search engine 24 and a user histories database 26 comprising a user search history database 28 and a user access history database 29. The term database is used to describe an organised collection of data and may describe the individual tables that form a database.

In the preferred embodiment the invention relates to a method of collaborative browsing on the Internet. However, the same principles and methods may be used to browse data from a series of databases, intranets or the like.

The client devices 10, 12, 14 can be any form of known client device that is able to access the Internet, such as personal computers, laptops, mobile phones, etc., with associated features such as display and data input devices. The invention is enabled to function with known devices that can access the Internet via a browser; the preferred implementation of the invention functions as an application that is used in conjunction with known web-browser technology. Preferably the browser is enabled with JavaScript support, though those skilled in the art will understand that other languages and scripting languages may be used. The devices are enabled to browse the Internet 16 through any known means or web browsers. Preferably the browser utilises a standardised language, e.g. XML, to allow for communication between users. The client devices 10, 12 and 14 connect to the internet 16 through known means, e.g. wireless connection, LAN etc.

The web server 18 and its components may be a standalone server or may be distributed across a plurality of servers in a known fashion. The web server 18 comprises in the preferred embodiment an XMPP server 20. In further embodiments any language or protocol that is enabled to carry instant messaging and presence information can be used. The web server 18 additionally comprises a subset identifier 22 and a collaborative search engine 24. The subset identifier 22 and the collaborative search engine 24 are discussed in further detail with reference to Figures 5 and 6.

There is also a user histories database 26 comprising a user search history database 28, containing information regarding the search terms inputted into the collaborative search engine 24 (i.e. details on what searches a particular user has made), and a user access history database 29 (i.e. the websites visited by a user), as well as identifiers to enable a user to be associated with a particular search made or link visited by that user. The user access history database 29 comprises information identifying the site viewed (a URL) as well as content parameters/keywords that identify the content of the site. The information stored on the user histories database 26 allows the details of a particular search or access to a website to be associated with a particular user.
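One possible layout for the two history stores is sketched below using Python's built-in `sqlite3` module. The table and column names are hypothetical and are not taken from the specification; the sketch only shows how a user identifier can tie a search or a visited URL (with its content parameters) back to a particular user.

```python
import sqlite3


def create_user_histories_db() -> sqlite3.Connection:
    """In-memory stand-in for the user histories database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE search_history (
            user_id TEXT NOT NULL,
            keyword TEXT NOT NULL
        );
        CREATE TABLE access_history (
            user_id       TEXT NOT NULL,
            url           TEXT NOT NULL,
            content_param TEXT NOT NULL
        );
    """)
    return conn


def record_search(conn, user_id, keywords):
    # One row per keyword so each keyword can be matched individually.
    conn.executemany("INSERT INTO search_history VALUES (?, ?)",
                     [(user_id, k) for k in keywords])


def record_access(conn, user_id, url, content_params):
    # One row per content parameter of the visited page.
    conn.executemany("INSERT INTO access_history VALUES (?, ?, ?)",
                     [(user_id, url, p) for p in content_params])


def users_who_searched(conn, keyword):
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM search_history "
        "WHERE keyword = ? ORDER BY user_id", (keyword,))
    return [row[0] for row in rows]


def content_params_for(conn, user_id):
    rows = conn.execute(
        "SELECT DISTINCT content_param FROM access_history "
        "WHERE user_id = ? ORDER BY content_param", (user_id,))
    return [row[0] for row in rows]
```

Storing one row per keyword and per content parameter keeps both lookups (who searched for a keyword, what a user has accessed) as simple equality queries.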

In a preferred embodiment the invention requires a login so that the user can create and access a profile from any location. The login facilitates identification of the user's search profile from any device or location. In further embodiments the invention does not require the user to login. If the user does not log in, his search histories will not be saved. Nevertheless, his interest for that particular session will still be shared with others (e.g. via a guest account indicated with a unique id for the session). When the session ends, the search histories are no longer available, thereby preventing the formation of a group with a user that cannot be identified.

Figure 2 shows the architecture of the present invention. There is shown client devices 10, 12, 14, the User Interface 30, login screen 32, query service screen 34, chat boxes 36 comprising friends lists 38 and collaborative group lists 40, a query input 42 and query results 44, and the server 50 which comprises the user management processor 52, user database 54, the search engine server 60, comprising the search engine database 62, the user histories database 26 (comprising the user search history database 28 and the user access history database 29, not shown) and the subset identifier 66.

In a preferred embodiment the invention is implemented as a standalone application in devices (e.g. as an executable program on a personal computer) or can be launched through a web application. In further embodiments the system is implemented as a toolbar in the web application or a gadget in a system (e.g. Gadget (RTM)). The skilled man will realise that the present invention may be implemented in a number of different ways without deviation from the inventive technical concepts. In the context of this specification the invention will be described in relation to its embodiment as a web based application, to form collaborative groups whilst browsing the Internet.

A user utilises a client device 10, 12, 14 to access the invention. The user is presented with a User Interface (UI) 30 on the display of the client device. The content and functionality of the UI 30 are described with reference to Figures 7 and 8. The UI comprises a login screen 32 and query service screen 34. The login screen allows a user to input a username and password in order to access a profile. The login and registration process are described in further detail with reference to Figures 3 and 4.

The query service screen 34 comprises chat boxes 36 comprising friends lists 38 and collaborative group lists 40. Friends lists 38 contain information about selected individuals, preferably with information regarding their online presence, e.g. online, offline, hidden etc. The friends lists 38 are equivalent to those found in known instant messaging applications. The collaborative group lists 40 contain the list of members that define a collaborative group. The method of defining a collaborative group is described in further detail with reference to Figures 5 and 6.

The query service screen 34 further comprises a query input 42 and query results 44. Search strings, or query inputs, are entered at the query input 42 and the results are returned at the query results 44.

There is also shown the server 50 which comprises the user management processor 52, user database 54, the search engine server 60, comprising the search engine database 62, the user histories database 26 and the subset identifier 66.

The details entered at the user login screen 32 are transmitted to the user management processor 52 where the details are compared to those stored in the user database 54. Therefore, the invention provides apparatus to identify users. The friends list is also managed by the user management processor 52, such management occurring in the preferred embodiment via known XMPP protocols.

In use, the user inputs a search query, e.g. "speaker", into the query input 42. The input is sent to the search engine server 60 whereupon keywords are extracted from the search query and are searched in the search engine database 62. The identification of the keywords is discussed in detail below with reference to the figures. Subsequently, the search queries are saved into the user search history database 28 along with an identifier to identify the user who inputted the search query. The user search history database 28 is queried for instances of similar or identical search queries and the results are returned to the collaborative group list 40. The extracted keywords are used to search the database against search histories. Preferably the keywords are matched and provided with a weighting figure that indicates how relevant a match is to the current search interest. The identification of similar search queries is discussed in further detail with reference to Figure 4. Therefore the collaborative group list 40 displayed will contain users who have inputted identical or similar search terms into the query input 42, and the membership of the group is determined by the interests of the users.

The invention also beneficially comprises a subset identifier 66, which is connected to the user histories database 26. The subset identifier 66 is enabled to optimise the selection of the collaborative groups 40 to overcome the problems identified in the prior art. The function of the server 50 and in particular the user histories database 26 and subset identifier 66 are discussed in further detail with reference to Figures 5, 6 and 7.
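The specification does not say how the weighting figure is calculated. Purely as an illustration, the sketch below assumes a Jaccard overlap between the current keywords and each stored query; the function names and the choice of measure are assumptions, not the patented method.

```python
def relevance_weight(current_keywords, past_keywords):
    """Jaccard overlap in [0, 1]; an assumed stand-in for the weighting figure."""
    current, past = set(current_keywords), set(past_keywords)
    if not current or not past:
        return 0.0
    return len(current & past) / len(current | past)


def rank_users(current_keywords, past_queries):
    """Return (user_id, weight) pairs, highest weight first.

    past_queries is an iterable of (user_id, keywords) pairs. Each user keeps
    the best weight over all of their stored queries; users with no overlap
    are left out.
    """
    best = {}
    for user_id, keywords in past_queries:
        weight = relevance_weight(current_keywords, keywords)
        if weight > 0:
            best[user_id] = max(weight, best.get(user_id, 0.0))
    # Sort by descending weight, then by user id for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

Any other similarity measure could be substituted for `relevance_weight` without changing the ranking step.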

Figure 3 shows a flow chart outlining the process of a user registering with the system. In the preferred embodiment a user is required to register before commencing a collaborative search. This allows a user's information to be saved, allows the user to keep updated friends and contacts lists, and also increases the security associated with the system. As collaborative browsing allows for a level of interaction between users it is preferable to make users login to avoid potential problems with abusive or offensive users, or those whose intention is to "spam" the various chat facilities.

There is shown the step of registering a user name and password, and storing the details in the XMPP server at step S102. Both steps occur via known methods and means for registering a user to a computer based service. Preferably the details are stored on an encrypted server.

Figure 4 is a flow chart showing the login procedure used in the preferred embodiment. Once a user is registered they are able to login to a service hosting the invention. This allows a user to create or access collaborative groups in a manner that is not terminal or workstation specific. Additional benefits arise from the ability to access previously defined collaborative groups, thereby increasing the functionality of the invention.

There is shown the step of logging in with the user name and password at step S103, verifying the details at the XMPP server at step S104, checking friends lists and statuses at step S105, returning lists at step S106 and showing a login error at step S107.

In the preferred embodiment, a user accesses his profile and historically created friends list via a login screen. Such profiles and lists are similar to those found in instant messaging (IM) services. The user enters his login name and password at step S103, which are then verified against the details stored in the XMPP server 20 at step S104. The XMPP server 20 returns any previously stored friends list, or the like, and uses known presence information protocols to identify which friends or members of a collaborative group are online at step S105. This list of users and their statuses is returned to the user interface at step S106. If the details entered at step S103 are incorrect then a login error is shown at step S107.

The status of a logged-in user is preferably set to "online" in the XMPP server 20, and the status (presence information) of his friends list will also be loaded (online or offline). Meanwhile, any offline messages (messages sent while the user was offline) will also be displayed. Thus the XMPP server 20 handles protocols for the presence information and instant messaging. In a further embodiment the user is not required to login. The user is named as "Guest" in the chat box and no friends list is displayed.
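The login flow of steps S103 to S107 can be sketched with an in-memory object standing in for the XMPP server 20. This is not an XMPP client: the class, its methods and the unsalted SHA-256 digest are assumptions chosen to keep the example short, and a real deployment would rely on the server's own authentication and a proper password hashing scheme.

```python
import hashlib
import hmac


class LoginError(Exception):
    """Raised when the supplied details do not match (step S107)."""


def _digest(password: str) -> str:
    # Illustrative only; not a recommended way to store passwords.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


class FakePresenceServer:
    """Hypothetical in-memory stand-in for the XMPP server 20."""

    def __init__(self):
        self.credentials = {}   # user name -> password digest
        self.friends = {}       # user name -> list of friend user names
        self.online = set()     # user names currently logged in

    def register(self, username, password):
        self.credentials[username] = _digest(password)
        self.friends.setdefault(username, [])

    def login(self, username, password):
        stored = self.credentials.get(username)
        # S104: verify the details against the stored record.
        if stored is None or not hmac.compare_digest(stored, _digest(password)):
            raise LoginError("invalid user name or password")  # S107
        self.online.add(username)
        # S105/S106: look up each friend's presence and return the list.
        return {friend: ("online" if friend in self.online else "offline")
                for friend in self.friends[username]}
```

A successful call to `login` returns the friends list with each friend's presence, while a wrong password raises `LoginError`, mirroring the two branches of the flow chart.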

User resource management is managed by a non-volatile storage database server, e.g. MySQL, Oracle, PostgreSQL and SQL. For compatibility reasons, the database server must preferably be able to structure given records according to the database table utilised by the XMPP server 20.

Any type of XMPP server 20 may be used as long as it is able to deliver presence information (user online, offline, away, etc.) and message information between end-to-end client and server. In the preferred embodiment, an Opei XMPP server is used to provide the presence and messaging functionalities, and transmission between client and server occurs in encrypted XML form.

On the client device 10, 12, 14, the UI 30 is accessible through a web browser served by the web application residing on the web server 18. Any type of browser application can be used, e.g. FireFox (RTM), Internet Explorer (RTM), C (RTM), etc. The user may interact with the web server 18 through a user-friendly interface. Preferably, a dynamic server is able to support multimedia elements and dynamic changes of web service. In the preferred embodiment, script languages and packaging technology are used in the web server, preferably JavaScript and AJAX. These are light-weight and dynamic languages that provide change features to the web content.

In the UI 30, presence information and message information are preferably loaded in real time without interfering with the display and behaviour of the existing page. In the preferred embodiment a pop up, or a message, appears beside the web interface to show that there is an incoming message for the user, or a change in presence information (status, buddy online/offline) of friends in the contact list. Similarly, the interface provides a means to minimise, maximise or close/open a chat box and the friends list, to create a tab for a chat, etc. Furthermore, the web interface is enabled to embed multimedia elements, such as animations, sounds etc., for every incoming message, an audio/video call between two or more people, a message avatar etc.

Figure 5 shows a flow chart of the formation of an initial collaborative group. This group is selected in an automated manner, as will now be described, thereby minimising the user interaction and returning an optimised collaborative group without the need for further user interaction.

There is shown the sending of a query or search request at step S201, the breaking down of the request into keywords at step S202, searching for the keywords in the search engine database at step S203, returning any matches at step S204, searching for identical or similar keywords in the user search history database at step S205, retrieving the status of other users at step S206, creating a collaborative group at step S207, storing new contacts in a list at step S208 and storing the user record in the user history database at step S209.

In the preferred embodiment, the user inputs a search query at the query service screen 34 (not shown) at step S201. The query is firstly broken down into smaller objects in the search engine model at step S202. This preferably occurs in S Search Engine (7). The objects that the search query is broken down to are called keywords. Use of keywords is preferable to the use of the full text of the inputted search query as they optimise the implementation of the formation of the collaborative groups. In further embodiments, keywords are not identified from the inputted search query, which may itself be taken to be the keywords.

For example, if a user has inputted the query "how to build a speaker", the query contains features such as verbs, nouns, etc. The invention is enabled to identify the key content of the query in order to optimise the formation of the collaborative groups. In the preferred embodiment keyword relevancy algorithms are used. The search term is pre-processed to convert all uppercase characters to lowercase. Then "stopwords" are removed from the list. The list of stopwords preferably contains words such as "about", "all", "alone", "also", "am", "and", "as", "at", "because", "before", "beside", "besides", "between", "but", "by", "etc", "for", "i", "of", "other", "others", "so", "than", "that", "though", "to", "too", "through", and "until". In a further embodiment, the list of stopwords is updated when words that are considered superfluous to the search term are identified. Once the stopwords have been removed, the remaining words form the keywords for the request.

The keywords identified in the search query are utilised to form the collaborative group. In the preferred embodiment, words that are verbs and nouns are identified as keywords. Such identification of verbs and nouns occurs by comparing the individual words that form the search query to a dictionary database and identifying the class of word from the associated entry.

In further embodiments, other classes of words may be used, or the words may be compared to a list of previously identified keywords. The method of identification of the keywords may be any known method for identifying key terms in a string.
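The lowercase-then-filter step described above can be sketched as follows. The stopword set reproduces the words listed in the text; the article "a" is added here as an assumption so that the sketch yields the keywords given for the example query, and the regular-expression tokenisation and the function name are likewise illustrative choices.

```python
import re

# Stopwords listed in the description, plus "a" (an assumption, see lead-in).
STOPWORDS = frozenset({
    "about", "all", "alone", "also", "am", "and", "as", "at", "because",
    "before", "beside", "besides", "between", "but", "by", "etc", "for",
    "i", "of", "other", "others", "so", "than", "that", "though", "to",
    "too", "through", "until",
    "a",
})


def extract_keywords(query, stopwords=STOPWORDS):
    """Lowercase the query, split it into words and drop the stopwords."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [word for word in words if word not in stopwords]
```

Passing a different `stopwords` set models the further embodiment in which the list is updated as superfluous words are identified. The verb and noun classification against a dictionary database is not modelled here.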

In the above example the keywords are identified as "how", "build" and "speaker" at step S202. Once identified these keywords are looked up in the search engine database 62 at step S203. Other methods of querying the search engine database may also be used, therefore the invention could be used with different search engines available today. The search engine used may also be a known single-user search engine. The links or results that match the identified keywords are returned to the user and displayed at the query results 44 at step S204.

Therefore steps S201 to S204 identify a method of querying the search engine database 62 and returning the results to a user.

The invention also identifies collaborative groups during the initial search stage. Once the keywords have been identified at step S202 these keywords are searched in the user search history database 28 at step S205. As discussed with reference to Figure 1 the user search history database 28 comprises information regarding the various historical search queries made by a user. The purpose of the search is to identify all previous instances of similar searches having been inputted into the search engine database 62. The user search history database 28 thus contains entries detailing queries made and the user who made the query. To avoid the problem of server capacity overrun, the oldest histories are replaced by new histories in a "First In First Out (FIFO)" manner. Any searches performed by an individual would result in the matching of the latest historical browsing information as described above.
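The FIFO replacement of old histories can be sketched with a bounded `collections.deque`, which discards the oldest entry automatically once its `maxlen` is reached. The class name, method names and capacity value are hypothetical; the specification does not state how the capacity limit is chosen.

```python
from collections import deque


class SearchHistoryStore:
    """Bounded store of (user_id, keywords) entries with FIFO eviction."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry when a new one is
        # appended to a full store.
        self._entries = deque(maxlen=capacity)

    def record(self, user_id, keywords):
        self._entries.append((user_id, tuple(keywords)))

    def users_matching(self, keyword):
        """Users with a retained query containing the keyword, oldest first."""
        matched = []
        for user_id, keywords in self._entries:
            if keyword in keywords and user_id not in matched:
                matched.append(user_id)
        return matched

    def __len__(self):
        return len(self._entries)
```

With a capacity of two, recording a third query evicts the first, so only the most recent histories take part in matching.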

In the preferred embodiment, the query of the user search history database 28 is not limited to instances of identical keywords, but also encompasses similar keywords. Similar keywords are identified using known similar word sources e.g. a thesaurus. In the preferred embodiment, an open source thesaurus, or a separate database is used to provide synonyms for the inputted keywords, particularly heteronyms. For example, the term "speaker" would be linked with "lecturer", "talker" and "electro acoustical transducer".

In the preferred embodiment each keyword is searched individually and as part of a combined search. Therefore, each keyword inputted at step S201 will eventually result in instances of previously searched for identical or similar keywords. The users who inputted the queries are also identified from the relevant database entry in the user search history database 28. These identified users form the initial collaborative group at step S205.
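As a rough sketch of how identical and similar keywords might be matched to form the initial collaborative group, the example below expands each keyword with synonyms and looks for any overlap with other users' past queries. The thesaurus contents beyond the "speaker" entry, the search history data and the function names are all hypothetical.

```python
# Hypothetical thesaurus; only the "speaker" entry comes from the description.
THESAURUS = {
    "speaker": {"lecturer", "talker", "electro acoustical transducer"},
}

# Hypothetical search history: user ID -> keywords previously searched.
SEARCH_HISTORY = {
    "sam": {"speaker", "build"},
    "kim": {"lecturer"},
    "rob": {"talker"},
    "ann": {"recipe"},
}


def expand(keyword: str) -> set[str]:
    """Return the keyword together with its synonyms, if any are known."""
    return {keyword} | THESAURUS.get(keyword, set())


def initial_group(keywords: list[str], current_user: str) -> set[str]:
    """Users (other than the searcher) with an identical or similar keyword."""
    wanted: set[str] = set()
    for keyword in keywords:
        wanted |= expand(keyword)
    return {
        user
        for user, searched in SEARCH_HISTORY.items()
        if user != current_user and searched & wanted
    }


print(initial_group(["how", "build", "speaker"], "chris"))
```

Because the match is on any shared term, users interested in public speaking and in loudspeakers land in the same initial group, which is the ambiguity the later refinement step addresses.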

At step S206 the status of the users who were identified at step S205 is ascertained. The status of the user indicates whether the particular user is currently online/logged into their account, and preferably if they are willing to be contacted by other users who may have a similar interest. In the preferred embodiment the invention utilises an XMPP protocol which is able to return such information.

Once the status of the identified users has been ascertained at step S206 a contact list of this collaborative group is returned to the user at step S207. The user is able to contact the members of this identified collaborative group via known methods, for example instant messaging protocols.

Preferably, members of the identified collaborative group are also assigned a tag, or identifier, which is indicative of their interests. The method of assigning the tag is explained in further detail with reference to Figure 9.

This contact list is stored in the session component at step S208, preferably consisting of user IDs (identities) that make it possible to contact corresponding individuals with similar interests.

Each of the search actions performed also triggers the storage of a new record in the user search history database 28 at step S209. Therefore the user may also be identified by any subsequent user provided that the subsequent user inputs a similar search term. This saved session also forms the basis of the refinement process discussed below with reference to Figure 6.

Figure 5 therefore describes the first level search where a list of collaborative users is identified and provided based on the inputted search term and the information from the user search history database 28. This level of searching therefore provides some optimisation over the prior art in that the groups are identified automatically and do not require any user interaction beyond the inputting of the search term to form the group. However, it is found that this level of search does not optimally use the available data and that the groups may also suffer from problems with heteronyms, ambiguous search queries etc.

Figure 6 describes the process of optimisation of the collaborative groups and the use of the available data to overcome the problems in the implementation of collaborative systems in the prior art.

The invention utilises the content viewed by the user to further define the collaborative group to ensure that the members of the collaborative group consist of users that have identical interests. The user has selected a link or result from the search results thereby accessing the desired webpage.

There is shown the user selecting a link (URL) and the URL and UserID (e.g. user name) being sent to the subset identifier at step S301, the subset identifier receiving the information at step S302, forwarding a UserID to a session identifier at step S303, obtaining contact information associated with a session at step S304, forwarding the contents of the webpage identified by the URL to a metadata checker at step S305, retrieving content parameters such as metadata or keywords for the URL at step S306, searching for the metadata or keywords retrieved in step S306 in the user access history database 29 using the subset identifier at step S307, returning an updated contact list at step S308 and storing the record in the user histories database 26 at step S309.

Figure 6 shows the sequences for the next iterations of collaborative search after the first search is done as described with reference to Figure 5. The user has selected a link or result that he wishes to view. Details regarding the content of the webpage identified by the URL and the user ID are sent to the subset identifier at step S301 and received at step S302.

The subset identifier operates as a filter based on the content of the webpage that the user has selected. In this operation, the content of the webpage of the clicked URL will be studied for its metadata and contents for commonality amongst all contacts in the collaborative group list formed at step S207. A match of metadata or contents of the webpage within the same group of people will be used as the next input to be shown on the UI. This will optimise the membership of the collaborative group.

The received data are forwarded to a session data check at step S303 and a metadata checker and content parameter/keyword identifier at step S305. The session data for the user will already contain the data as identified at steps S208 and S209 discussed with reference to Figure 4.

At step S304 the contact lists generated by the session (as stored in the database from steps S208 and S209) are retrieved. The contact lists comprise an identifier of the members of the collaborative group (e.g. a username) as displayed at step S207. Therefore, the list of members of the collaborative group created from the initial input of the search term by the user is recovered.

At step S306 content parameters or keywords to describe the content of the link are determined. It is to be understood that content parameters will encompass terms that describe the content of the webpage accessed by the user and as such, may take the form of keywords. The use of the term keywords when describing the content of a webpage implicitly implies content parameters.

It is known for websites to contain metadata tags that describe the content of the page. These are often used by search engines to determine the relevance of a link or page. In an embodiment the metadata found in the header of the clicked URL is retrieved and keywords are identified from the metadata. As with the inputted queries, verbs and nouns present in the metadata are used as the keywords.

However, the metadata associated with a website may not comprise the optimal keywords to describe a URL. For example, it is known to manipulate the metadata so that a given webpage appears at the top of a search engine's results for a given query. Such manipulated metadata may not optimally represent the content. In a preferred embodiment, the content of the webpage URL is analysed and content parameters/keywords are extracted from the content. In a preferred embodiment the textual content of the page is analysed and keywords (defining content parameters) are extracted from the text.

The identification of the keywords from the content of the webpage identified by the URL is preferably performed using known web text and content mining techniques. In the preferred embodiment these include data extraction and segmenting webpages. Before the data extraction and webpage segmentation process takes place, some text pre-processing should be done. Firstly, the content of the page will be analysed to change all upper case characters to lower case characters.
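Retrieving keywords from the metadata in a page header could look something like the sketch below, which reads a `<meta name="keywords">` tag using Python's standard `html.parser` module. The class and function names and the sample page are hypothetical, and the sketch simply splits the tag content on commas rather than selecting verbs and nouns as the embodiment describes.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the content attribute of a <meta name="keywords"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords_content = ""

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values may be None when the attribute has no value.
        if (attributes.get("name") or "").lower() == "keywords":
            self.keywords_content = attributes.get("content") or ""


def metadata_keywords(html: str) -> list[str]:
    """Return the lowercased, comma-separated metadata keywords of a page."""
    parser = MetaKeywordParser()
    parser.feed(html)
    terms = parser.keywords_content.split(",")
    return [term.strip().lower() for term in terms if term.strip()]


# Hypothetical page used only for illustration.
PAGE = """<html><head>
<title>How Speakers Work</title>
<meta name="keywords" content="Speaker, Audio, Amplifier">
</head><body></body></html>"""

print(metadata_keywords(PAGE))
```

Since metadata can be manipulated, a page with no usable keywords tag returns an empty list here, at which point the content-based extraction of the preferred embodiment would take over.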

Then, webpage segmentation techniques are used to determine the content of the web page. Such techniques are well known in the field of search engine optimisation and are preferred as they ignore content that is likely to be irrelevant to key themes of the page e.g. advertising, links and notices. In the preferred embodiment the invention removes all instances of stopwords (words that are not indexed in webpages and not used in word identification), such as "about", "all", "alone", "also", "am", "and", "as", "at", "because", "before", "beside", "besides", "between", "but", "by", "etc", "for", "i", "of", "on", "other", "others", "so", "than", "that", "though", "to", "too", "trough", and "until" in the text.

From the remaining text, keywords are extracted using known data extraction techniques. The data extraction techniques used are preferably those used in mining programmes to extract key phrases and text clusters which are indicative of the content of the webpage. Preferably, computer science language pattern recognition and knowledge discovery techniques are used to determine the content of the page.
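The description leaves the data extraction technique open. As a stand-in, the sketch below uses a plain term-frequency count over the lowercased, stopword-filtered page text; this is a deliberately simple substitute for the key phrase and text cluster extraction mentioned above, and the function name and sample text are hypothetical.

```python
import re
from collections import Counter

# Stopword list quoted in the description.
STOPWORDS = {
    "about", "all", "alone", "also", "am", "and", "as", "at", "because",
    "before", "beside", "besides", "between", "but", "by", "etc", "for",
    "i", "of", "on", "other", "others", "so", "than", "that", "though",
    "to", "too", "trough", "until",
}


def content_keywords(text: str, top_n: int = 2) -> list[str]:
    """Return the top_n most frequent non-stopword terms in the page text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical page text used only for illustration.
TEXT = "Recipe for sea bass and tips for cooking sea bass. Bass for dinner."

print(content_keywords(TEXT))
```

Without the stopword filter the word "for" would rank alongside "bass" in this text, which shows why the filtering step precedes extraction.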

Preferably, the keywords used in the initial inputted search query are also used in the extraction of key terms. Instances of the search terms and their synonyms are searched for in the text. For instance if the search query was "speaker", instances of words such as "lecturer", "talker" and "electro acoustical transducer" would be searched for in the text of the webpage. Matches to these words would also be used to determine the precise nature of the search and overcome problems such as those associated with heteronyms.

The identification of the key themes and keywords from the text may be performed using known programs, such as those found in automatic advertisement generating programs.

After obtaining the keywords at step S306 and the contact list of the initial collaborative group identified at step S207 these are forwarded to the subset identifier at step S307. The content parameters/keywords identified for the URL at step S306 form the search term for the subset identifier 66. The subset identifier 66 queries the user access history database 29 at step S307 for instances of the content parameters/keywords in the access history of the users who form the initial collaborative group identified at step S207.

Therefore, the invention searches for instances of the content parameters/keywords in the link clicked by the user in the URLs visited by other users. Optionally, all users (regardless of whether or not they appeared in the initial collaborative group) who have clicked on that particular URL may also be identified. Not all members of the collaborative group identified at step S207 will have visited URLs that have similar content, as derived by their keywords (or content parameters), as the end user and therefore they are not recovered by the search of the user access history database 29. Therefore, the initial collaborative group is reduced in size and is optimised according to the URLs viewed by the user.
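The filtering performed by the subset identifier can be sketched as a set comprehension over the initial group. The access history data and the function name `refine_group` are hypothetical, and the sketch matches only identical keywords; matching of similar keywords is left out for brevity.

```python
# Hypothetical user access history: user ID -> keywords of pages viewed.
ACCESS_HISTORY = {
    "sam": {"audio", "speaker", "amplifier"},
    "henry": {"audio", "hifi"},
    "kim": {"debate", "lecturer"},
    "rob": {"politician"},
    "zoe": {"audio"},  # viewed audio pages but is not in the initial group
}


def refine_group(initial_group: set[str], page_keywords: set[str]) -> set[str]:
    """Keep members whose access history shares a keyword with the clicked page."""
    return {
        user
        for user in initial_group
        if ACCESS_HISTORY.get(user, set()) & page_keywords
    }


initial = {"sam", "henry", "kim", "rob"}
refined = refine_group(initial, {"audio", "speaker"})

print(refined)
```

Because only members of the initial group are examined, the refined group is always a subset of the initial group; the optional variant that also identifies users outside the initial group would iterate over the whole access history instead.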

The newly identified collaborative group is then returned to the UI at step S308. Additionally, the new group is stored in the user histories database 26 with the user ID. Therefore, the user may repeat the process and further refine the group or keep the same group the next time they utilise the invention.

The identification of the keywords to describe the content is a key feature of the successful identification and optimisation of collaborative groups from user search and viewing histories. By associating further keywords to a user, based on their behaviour regarding their access history (as recorded in the user access history database 29) any ambiguities that may have occurred are easily resolved and the further information may be used to improve the identification of suitable members of a given collaborative group.

In the example given in the introduction, a user searching for "bass" may have links relating to musical instruments, singing, fish and beer returned to them at the UI 30 at step S204. The collaborative group identified at step S207 will contain users with interests in musical instruments, singing, fish and beer. If the user at step S301 clicked on URLs of webpages relating to say, fishing rods, sea bass and a recipe for cooking sea bass, it would be clear that they are interested in fishing, not music, singing or beer. Therefore any ambiguities regarding heteronyms or regional/geographical differences in definitions are overcome with the use of the additional information. The collaborative group identified at step S307 and returned to the user at step S308 would reflect this knowledge and as such only contain people interested in the fish.

Similarly, the invention provides a method of disambiguating the query and grouping together users who are interested in a particular subject. In the example given in the introduction, a user searching for "John Smith" would create an initial collaborative group at step S207 that would include people who were interested in a person named John Smith. However, if the user were to click on a URL at step S301 and the keywords identified from the webpage identified by the URL at step S306 related to, say, the composer of the "Star Spangled Banner", the collaborative group formed at step S307 would only contain users who were interested in the composer John Smith.

Furthermore, the use of keywords allows for the collaborative group identified by the search and viewed webpages of the clicked URLs to contain users who have viewed different webpages but have the same interests. For example, a user makes an initial search query for "bass" and subsequently views a recipe from a cooking website. A different user with a similar search query may view a similar recipe for bass from a different cooking website. The keywords associated with the clicked URLs would be similar and possibly identical (e.g. cooking, fish), therefore whilst the users may not have viewed the same website they would be identified as having similar keywords in their user access history as stored in the user access history database 29 and therefore form part of the same collaborative group. This use of further data serves to further understand the user requirements, optimise the process of formation of collaborative groups and overcome any ambiguities that there may be in the query. Additionally, it removes the requirement present in the prior art for a user to interrogate other users to verify if the group formed is indeed relevant.

Finally, the process repeats at step S301 when the user clicks on one or more further URLs returned by the query. The further keywords associated with the clicked URLs (as determined by the content or metadata of the webpages identified by the URL) at step S306 are used to refine the membership of the collaborative group. The group that is then refined by the subsequent keywords from the webpage identified from the URL may be either the initial group (as determined at step S207) or the refined group (as determined at step S307).

An aspect of the invention is that the collaborative group identified at step S307 is a subset of the initial group identified at step S207. Therefore, the groups become more relevant to the user with each clicked URL as they are based on the viewed content as well as the searched content.

Figures 7a and 7b are Euler diagrams, illustrating the refinement process described in Figure 6.

Figure 7a shows the reduction in the initial collaborative group to a refined group i.e. the user has only clicked on one URL. The initial group as identified by the inputted search query at step S207 is represented as group A. The subset of group A, group B, is as identified in step S307 from the keywords associated with the clicked URL. As can be seen, the users identified by their viewed URL user histories at step S304 belong to the users in the initial group as defined at step S207, therefore, group B must be a subset of group A.

Figure 7b shows the possible outcomes if the user clicks on further URLs generating a further collaborative group C. There are shown the three possible outcomes, cases 1, 2 and 3.

In the preferred embodiment, the entire contact list that is stored in the session (i.e. the list defined at step S207) will be used to search for similarity in interests instead of only searching the second level contact list (i.e. the list defined at step S307).

The use of the further identified keywords may result in one of three scenarios:

• Case 1: The result of contact list (C) is a subset of B. In this case, the client will get a more accurate/filtered list of contacts who have similar interests;

• Case 2: The result of C is overlapping B. In this case, the overlapping list of people will be shown to the user, instead of C; and

• Case 3: The result of C has no relationship with the result of B. In this case, the user might return to the first interaction (by returning to the previous search result, or selecting the tab where the first search results lie, or going back to the second level search results to search again). As such, C will be refreshed with a new contact list based on the search result again.

Therefore, the invention provides a new collaborative group that has similar interests (as identified by the terms of searched keywords and the keywords/content parameters associated with the clicked URL). In this way, the search topic is narrowed down into the relevant social circle so that the user is able to interact with the correct person in order to obtain an enhanced search experience.
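The three cases above can be distinguished with ordinary set operations on groups B and C. The sketch below is illustrative only: the function name `classify` is hypothetical, and returning an empty list for case 3 is an assumption standing in for the user going back and searching again.

```python
def classify(b: set[str], c: set[str]) -> tuple[str, set[str]]:
    """Return the case label and the contact list to show to the user."""
    if c and c <= b:
        # Case 1: C is a subset of B, so the filtered list C is shown.
        return "case 1", c
    if b & c:
        # Case 2: C overlaps B, so only the overlapping members are shown.
        return "case 2", b & c
    # Case 3: no relationship between C and B. Nothing is shown here
    # (an assumption); the list is refreshed once the user searches again.
    return "case 3", set()


print(classify({"sam", "henry", "jason"}, {"sam", "henry"}))
print(classify({"sam", "henry"}, {"henry", "kim"}))
print(classify({"sam"}, {"kim"}))
```

An empty C is treated as case 3 in this sketch, since there is no member in common to display.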

Figures 8 and 9 show an example of the UI of the invention in use.

Figure 8 is an example of a User Interface used in an embodiment of the present invention. There is shown the interface 70, the search bar 72, search tabs 74, 76, results from the search 77, friends list 78, collaborative group list 80, collaborative group member's name 82, collaborative group member's identifier 84, a second collaborative group list 86, user ID 88, and user logout 90.

The user is identified as Chris at the user ID 88. Chris will have logged in and his friends list 78 populated using the method described with reference to Figu and 4. Chris has searched for keywords "speaker", as shown at tab 74, and "amplifier" as shown at tab 76, on two different instances. For each search, there is a community which is performing similar searches. The initial search for speaker will associate Chris into a generic "speaker" community as defined at step S207, that has varied interests in audio speakers or public speaking. These results are shown in collaborative group list 80.

In order to improve the functionality of the UI 70 during the formation of the initial collaborative group the members of the groups are listed by name 82 and an identifier 84. The identifier 84 provides the user with an indication as to the relevance of the member of the group. The identifier 84 is based on the keywords associated with the webpages viewed by that user. Therefore, in the example user Rob has viewed one or more webpages relating to a politician and therefore their identifier 84 is "politician". The second collaborative group list based on the keyword "amplifier" has a similar structure but has been minimised.

As with known instant messaging applications the user Chris is able to select a member of his friends list 78 or of the collaborative groups 80 and 86 and initiate contact with them. Offline messages can also be sent to another user who is offline and shown on the page. The message will be stored in the XMPP server and sent to the user once they log in to the system.

Figures 9a and 9b show the further functionality of the UI. There are shown the features of the UI as described in Figure 8 and the new collaborative group list 92 listing member's names 94 and identifiers 96, and an instant messaging box 98.

The user Chris has selected a URL from the results tab for speakers 74. The URL is the 'How Speakers Work' link which is rendered on the UI. The metadata and content of the webpage have been identified as per step S306 and used to identify a subset of users from the initial collaborative group as shown in Figure 8 to form a refined new collaborative group list 92. As can be seen, the list now identifies the user names 94 Sam, Henry and Jason, all of whom are identified by their identifiers 96 as "Audiophiles". Therefore, these users have all made a query relating to music/audio and have viewed webpages whose content also relates to audio. The users in the initial group 80 who were interested in Debaters (Kim) and politicians (Bill and Rob) are no longer displayed as their user histories have shown that they do not have the same viewing histories (and therefore interests) as Chris.

In Figure 9b, Chris has approached a member of the refined collaborative group and has initiated a chat session with Henry via an instant messaging protocol.
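The description states that a member's identifier is based on the keywords associated with the webpages that member has viewed, without fixing a rule. One possible rule, shown purely as an assumption, is to use the most frequent keyword in the member's access history. The data and the function name `identifier_for` are hypothetical.

```python
from collections import Counter

# Hypothetical access history: user ID -> keywords of each page viewed.
VIEWED_KEYWORDS = {
    "rob": ["politician", "election", "politician"],
    "sam": ["audiophile", "speaker", "audiophile"],
    "new_user": [],
}


def identifier_for(user: str) -> str:
    """Return the user's most frequent viewed-page keyword, or '' if none."""
    counts = Counter(VIEWED_KEYWORDS.get(user, []))
    if not counts:
        return ""
    keyword, _ = counts.most_common(1)[0]
    return keyword


for name in ("rob", "sam"):
    print(name, identifier_for(name))
```

A user with no access history receives an empty identifier in this sketch; how such users are labelled in the UI is not stated in the description.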

Other possible components could also be attached to the system, for example a firewall and data replication. Firewall servers will ensure the system security by only allowing trustworthy interactions that might be carried on between users and servers. This will attempt to prevent the system from being hacked by intruders, for example by denial-of-service attacks and unlimited pings. A data replication system will help to store the database servers from time to time in order to prevent disaster, such as when a database server is corrupted. Hence, backup data will help the system to restore the stored data.

The above invention has been described with specific reference to the use of search engines on the Internet. However, those skilled in the art will appreciate that the ideas and inventive concepts of the present invention may also be applied to other searches performed on databases used by a plurality of users.

Claims

1. A method for optimising search engine results in a computer system which comprises a network of one or more computers; one or more user interfaces; a keyword identifier; a search engine database; a user histories database comprising a user search history database comprising information regarding historical queries of the search engine database and a user access history database comprising content parameters comprising information about content accessed by one or more users, the method comprising:
a first user inputting a search term into a first user interface;
identifying one or more keywords from the inputted search term using the keyword identifier thereby establishing one or more keywords associated with the search;
querying the search engine database with the one or more keywords and returning one or more search result matches to the first user interface;
searching the user search history database to identify a first group of one or more different users who have submitted identical or similar keyword queries to the search engine database;
searching the user access history database to determine the user access history for one or more members of the identified first group;
subsequently, the first user then selecting one or more of the search result matches returned to the first user interface;
determining one or more content parameters associated with the search result match selected by the first user;
identifying from the first group of users, a subset of users who have identical or similar instances of said content parameters in their user access history, thereby forming a second group of users, as a subset of the first group of users, who have identical or similar search and access histories to the first user.

2. The method of claim 1, where the searches relate to Internet content and the matches are websites.

3. The method of either of the preceding claims, wherein the parameters associated with the content are keywords related to the content.

4. The method of any one of the preceding claims, wherein the first and second group of identified users are returned to the first user interface.

5. The method of any one of the preceding claims, wherein the further keywords are determined from metadata associated with each of the one or more selected matches.

6. The method of any one of the preceding claims, wherein the further keywords are descriptive of the content of the one or more selected matches.

7. The method of any one of the preceding claims, wherein the searches relate to internet content and the one or more selected matches are websites.

8. The method of any one of the preceding claims where the user is enabled to communicate to one or more members of the first and/or second group of users via the user interface.

9. The method of any one of claims 2 to 7 where the content parameters are automatically identified by metadata tags of the selected websites.

10. The method of any one of claims 2 to 8 where the content of the websites is searched to identify the content parameters from the website content.

11. The method of any one of claims 2 to 8 where the user interface is a web browser and the search engine database is a web search database.

12. The method of any one of the preceding claims further including the step of the user being able to select and/or deselect the further content parameters that are associated with the one or more selected matches before the step of identifying the subset of users from the first group of users.

13. The method of any one of the preceding claims wherein the keyword selection of the search term is based on the verbs and the nouns present in the search term.

14. The method of any one of claims 3 to 12 wherein the identified group members are listed with tags relating to their access history from the user access history database.

15. The method of any one of the preceding claims wherein the identified users form contact lists, listing a user name and an identifier based on their viewed webpages.

16. The method of any one of the preceding claims wherein the user histories database is updated whenever a user inputs a search term or accesses a search result.

17. The method of any one of the preceding claims wherein an updated contact list is stored in the user histories database.

18. The method of any one of the preceding claims wherein the computer system further comprises a server comprising user login details and profiles.

19. The method of any one of the preceding claims wherein the user is further enabled to select one or more members of the first or second group of users to initiate a communication protocol.

20. The method of any one of the preceding claims wherein identified group members can be blocked or deleted from the group of users by the user.

21. The method of any one of the preceding claims wherein the presence of a group member is displayed.

22. Apparatus for optimising search engine results in a computer system comprising: a network of one or more computers; one or more user interfaces; a keyword identifier; a search engine database; a user search history database comprising information regarding historical queries of the search engine database; a user access history database comprising keyword information about content accessed by one or more users; wherein the apparatus is suitable for:
a first user inputting a search term into the user interface;
identifying one or more keywords from the inputted search term using the keyword identifier thereby establishing keywords associated with the search;
querying the search engine database with the keywords and returning one or more matches to the user interface;
searching the user search history database to identify a first group of one or more different users who have submitted identical or similar keyword queries to the search engine database;
searching the user access history database to determine the user access history for one or more members of the identified first group;
subsequently, the first user selecting one or more of the matches returned to the user interface, and determining one or more further keywords associated with the content of the match selected by the first user;
identifying from the first group of users, a subset of users who have identical or similar instances of said further keywords in their user access history, thereby forming a second group of users who have identical or similar search and access histories to the first user.

23. Apparatus for optimising search engine results comprising:
a user interface enabled for a user to input a search term;
a key word identifier enabled to identify one or more key words from the inputted search term;
a search engine database suitable for identifying and returning matches to the key words to the user interface;
a first identifier suitable for identifying a first group of one or more users who have inputted identical or similar identified key words in the search engine database, in a user search history database;
a second identifier suitable for identifying the access history of the first group of users from a user access history database;
the user interface further configured to allow the user to access a match;
an identifier suitable for identifying key words or metadata associated with the match accessed by the user;
an identifier suitable for identifying users from the first group of users, having identical or similar instances of the key words or metadata associated with the match from the user access history database, thereby identifying a second group of users from said first group of users who have identical or similar search and access histories to said first user.