GB2426838A

GB2426838A - Peer to peer network searching using XML metadata to personalise the search query and hit messages

Info

Publication number: GB2426838A
Application number: GB0511114A
Authority: GB
Inventors: Richard Daniel Foster; Isabel Delacour; Paul Maurice Otto Gugenheim
Original assignee: Sony United Kingdom Ltd
Current assignee: Sony Europe Ltd
Priority date: 2005-05-31
Filing date: 2005-05-31
Publication date: 2006-12-06
Also published as: GB0511114D0

Abstract

In a peer-to-peer network of peers 10 which are interconnected for sharing content data, searching peers 10 perform searches for content data by generating and transmitting search query messages which specify search criteria for passage through the network, by receiving hit messages generated in response to the search query message by peers 10 storing content data and by displaying to a user descriptive information contained in the hit messages. The behaviour of each respective searching peer 10 in performing searches is variable in accordance with a predetermined personality of the respective searching peer 10, for example relating to a type of content data which it is desired to locate.

Description

Storage of Content Data in a Peer-to-Peer Network

The present invention relates to a peer-to-peer network, in particular to searching for content data on a peer-to-peer network.

A peer-to-peer network is a network of interconnected computers known as peers having a topology and architecture to facilitate sharing of content data or resources between the peers. In general terms, the peers have equivalent responsibilities and, for at least some tasks, communicate directly with each other to share content data or resources, resulting in sharing of infrastructure and bandwidth.

This contrasts with a traditional client/server model in which a server stores resources accessed by a large number of clients.

In a pure (or decentralised) peer-to-peer network, all peers have identical responsibilities and all communication is symmetric. Examples of pure peer-to-peer protocols are Gnutella and Freenet. In a centralised peer-to-peer network, such as Napster, peers connect to a central server storing an index of all the connected peers together with their available files to carry out searches, but then make direct connections to other peers to download content data located in the search.

The present invention is concerned with improving the searching capabilities of peers in a peer-to-peer network.

The present invention provides a peer-to-peer network of peers which are interconnected for sharing content data, wherein the peers are arranged to pass messages through the network, and respective searching peers in the network are arranged to perform searches for content data by generating and transmitting search query messages which specify search criteria for passage through the network, by receiving hit messages generated in response to the search query message by peers storing content data and by displaying to a user descriptive information contained in the hit messages, the behaviour of each respective searching peer in performing searches being variable in accordance with a predetermined personality of the respective searching peer.

By providing for the behaviour of each respective searching peer in performing searches to be variable, it is possible to apply an appropriate behaviour to the desires of a given user. This allows the peer-to-peer network to be more effective in locating content data desired by a user. In general, there will be a wide range of content data stored on a given peer-to-peer network, but given users will typically desire to locate different types of content data. For example, given users may require content data in different media formats, for example video content data, audio content data or document content data. Similarly, given users may predominantly desire to search for content data from a given source or of a given genre. By allowing variation in the behaviour of the searching peers, it is possible to select a searching behaviour which enhances the ability to locate the desired type of content data. Thus, the predetermined personality of the respective searching peer may relate to a type of content data.

It is to be noted that this variation of the behaviour in performing searches differs from existing personalization of "skins" for peer-to-peer applications in which the user-interface is adapted to vary the look and feel of the application to the user, for example by varying graphics, layouts and fonts displayed by the application. In such existing variation of the "skin" of a peer-to-peer application, the actual behaviour of the peer in performing the search is not modified or enhanced for any particular task.

Some examples of ways in which the search behaviour may be varied are as follows.

A first example is that the combination of types of search information included in a search query message is varied. This allows the search criteria to be adapted to the type of content data being searched for in accordance with the predetermined personality. In this case, the searching peer may be arranged to accept inputs from a user of data to form the search criteria in a displayed window having a variable combination of input fields each corresponding to the respective type of search information. This effectively varies the searching template.

Another example of the variable searching behaviour is variation in the types of descriptive information taken from the hit messages and displayed to a user.

Preferably, the predetermined personality of the respective searching peer is represented by personality data stored in the respective searching peer and the predetermined personality is definable by a user.

Besides the peer-to-peer network as a whole, the present invention also provides an individual peer arranged to connect into a peer-to-peer network, a peer-to-peer application capable of execution on a computer system to cause the computer system to act as the peer, and a storage medium storing the peer-to-peer application.

To allow better understanding, an embodiment of the present invention will now be described by way of non-limitative example, with reference to the accompanying drawings.

In the drawings:
Fig. 1 is a schematic view of a personal computer capable of acting as a peer in a peer-to-peer network;
Fig. 2 is a diagram of a peer-to-peer network illustrating the passage of ping and pong messages;
Fig. 3 is a tree diagram of an example of a file structure;
Fig. 4 is an example of an XML data file;
Fig. 5 is a window of the application providing a view on the shared content folders;
Fig. 6 is a window of the application for accepting inputs of descriptive information;
Fig. 7 is a diagram of the format of a basic search message;
Fig. 8 is a diagram of the format of a basic query message;
Fig. 9 is a flow chart of the searching process;
Fig. 10 is a diagram of the format of an advanced search message;
Fig. 11 is a window of the application for accepting inputs of search criteria;
Fig. 12 is a diagram of the format of an advanced query message;
Fig. 13 is a window of the application for displaying search results;
Fig. 14 is a diagram of the format of a file list query; and
Fig. 15 is a diagram of the format of a file list response message.

There will now be described a protocol for a peer-to-peer network which embodies the present invention. The present protocol may be implemented in a peer-to-peer application to be executed on a computer system, for example a conventional personal computer 1 as shown in Fig. 1 and including a processor 2, a memory 3, a drive 4 for reading a recording medium 5, and an interface 6 for connecting the personal computer to other computer systems, for example over the Internet 7 or a private connection. The personal computer 1 also has a display 8. The peer-to-peer application may be stored on a storage medium, for example the memory 3 and/or the recording medium 5. On execution of the peer-to-peer application, the computer system acts as a peer in a peer-to-peer network by connecting via the interface 6 to other computer systems executing an equivalent application.

The present protocol is a modified version of the Gnutella protocol, which is widely known and has several open-source applications, for example MFC Gnucleus.

Accordingly, the Gnutella protocol will not be described in detail, but for clarity there will first be given a brief overview of the relevant aspects of the Gnutella protocol which apply equally to the present protocol.

The Gnutella protocol is a pure peer-to-peer protocol using a distributed model without relying on a central server with its associated costs and risk of failure.

A peer provides searching capabilities as well as file serving capabilities. As a peer may be thought of as acting as both a client and a server at different times, it is sometimes referred to as a servent. Peers connect to the network by connecting to a peer already in the network, for example using an index of other users (a host cache).

Each peer stores content files of content data. In general, the content data may be of any type including, but not exclusively, image data, audio data, documents and software.

The Gnutella protocol is message-based. Peers generate messages which are passed through the network by means of each peer transmitting messages it generates or receives to one or more peers to which it is connected. Two types of routing of messages are applied. The first type of routing is for passing query messages originating from a requesting peer. In this case, a broadcast routing mechanism is used in which each peer transmits messages to all peers having a connection thereto.

Any peer receiving a query message can generate a response message. The second type of routing is for a response message. In this case, the response message is routed to pass back to the requesting peer.

As an example of the routing of query messages and response messages, Fig. 2 shows the routing in a network of five peers 10 labelled A to E of a ping message which is a query message for checking the connection status of previously connected peers and pong messages which are response messages in response to a ping message. In Fig. 2, a ping message is generated and transmitted by peer A and is passed through the network to all the other peers B to E. Pong messages are generated and transmitted in response and are passed back through the network to the requesting peer A. In order to prevent messages from travelling forever, the Gnutella protocol implements the idea of message decay. A message starts out with a number of hops it may travel before expiring, called the TTL (time to live). At each transmission from one peer to the next, the TTL is decremented. When the TTL reaches zero, the request message is no longer transmitted and is said to have reached the horizon for the requesting peer.

Networks propagating request messages with a high time-to-live can incur scaling problems. The messages can spread exponentially and flood the system with requests. This can cause great disruption to the network, so typical implementations limit the TTL to 7. This imposes a horizon on each user, but reduces the chance of network overload.
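The message decay described above can be illustrated with a minimal sketch. The five-peer topology `GRAPH`, the function name `flood` and the duplicate-suppression rule are assumptions made for illustration only; the actual topology of Fig. 2 is not reproduced here.

```python
from collections import deque

# Hypothetical five-peer topology; the connections are an assumption.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
}

def flood(graph, origin, ttl=7):
    """Return {peer: hops} for every peer a broadcast query reaches."""
    reached = {origin: 0}
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer is on the requester's horizon
        for neighbour in graph[peer]:
            if neighbour not in reached:  # a peer ignores a message it has already seen
                reached[neighbour] = reached[peer] + 1
                queue.append((neighbour, remaining - 1))  # TTL decremented per hop
    return reached
```

With a TTL of 1 in this sketch, only the direct neighbours of the origin are reached, which shows how a low TTL shrinks the horizon.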

Messages are distinguished using unique identifiers and are not tagged with IP addresses. As messages pass through each peer, the message identifiers are saved in dynamic routing tables stored in memory. These tables are used to look up the peers to which packets should be sent next. This system makes it difficult to determine which peer is the requesting peer, except by rigorously examining successive dynamic routing tables. However, this anonymity is lost when results messages are returned or download requests are made, in which cases the IP addresses are revealed.
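As a rough illustration of routing by message identifier, the following sketch has each peer remember only the neighbour from which a query arrived and pass responses back along that reverse path. The class `Peer`, its method names and the in-memory `inbox` are hypothetical and stand in for real network connections.

```python
class Peer:
    """Toy peer that routes responses using a per-message routing table."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []
        self.routes = {}  # message id -> neighbour the query came from (None at the requester)
        self.inbox = []   # responses that arrived back at this peer as requester

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)

    def receive_query(self, msg_id, sender, ttl):
        if msg_id in self.routes:
            return  # duplicate message: already recorded and forwarded
        self.routes[msg_id] = sender
        if sender is not None:
            self.route_back(msg_id, f"pong from {self.name}")
        if ttl > 0:
            for neighbour in self.neighbours:
                if neighbour is not sender:
                    neighbour.receive_query(msg_id, self, ttl - 1)

    def route_back(self, msg_id, payload):
        if msg_id not in self.routes:
            return  # unknown message id: drop the response
        previous = self.routes[msg_id]
        if previous is None:
            self.inbox.append(payload)  # this peer originated the query
        else:
            previous.route_back(msg_id, payload)
```

For example, with peers A, B and C connected in a line, a query from A yields a response from each of B and C in A's `inbox`, while C's routing table records only that the query arrived from B, not that A originated it.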

There is no idea of a persistent connection between any two arbitrary peers on a Gnutella network. They are both on the network but not directly connected to each other in any predictable or stable fashion. Ping messages are sent out periodically by peers to check the status of the connections to other peers. Those peers that reply with a pong message within a given time limit remain in the connections list. Those peers that do not respond are assumed to be disconnected. When a peer leaves the network, it does not disrupt the network. The peers connected to the departing peer simply clean up their memories to forget the departing peer and things continue as normal. Over time the network adapts its shape to long-lived peers, but even if the longest-lived, highest capacity peer were to disappear, there would be no lasting adverse effects.

The Gnutella protocol copes with the constantly changing infrastructure by creating an ad hoc backbone. There is a varying speed of Internet connections. Some are slow, for example 56Kbs modems, and others are fast, for example T3 lines. Over time, the fast peers migrate towards the centre of the network and carry the bulk of the traffic, while the slow peers move out toward the fringes of the network where they will not carry as much traffic.

Searching is implemented by a requesting peer generating a search query message, which specifies search criteria, and transmitting the search query message for passage through the network. For example, in the network shown in Fig. 2, the search query messages are transmitted in the same manner as the ping messages. On receipt of a search query message, the respective peers (as well as transmitting the message on) use the search criteria to search their own stored content data. In the event of a match, the respective peers generate and transmit back a hit message. The hit message is a response message which is passed back through the network to the respective requesting peer. For example in the network shown in Fig. 2, the hit messages are transmitted in the same manner as the pong messages except of course that hit messages are only generated in peers where the search produces a match.

In the Gnutella protocol, the hit message identifies the located content file and address of the respective peer which generated the hit message, thereby allowing the requesting peer to connect directly thereto and download the content file. This is an aspect of the Gnutella protocol which is modified in the present protocol as described below.

The modifications of the Gnutella protocol in the present protocol will now be described.

The present protocol was designed for the purpose of sharing audio/visual (A/V) data in a professional environment either within an organisation or between users in multiple organisations. As such, the modifications provide particular benefits in this context.

The present protocol uses an approach which abstracts any number (one or more) of content files of related content data as a single object. Hereinafter, the one or more content files will be called a "clip". Typically, a clip is a piece of footage that consists of one or more A/V files to represent the sequence of frames. For example, a clip may be a number of bitmap files, or it may be a single AVI file. A clip could also be a single still image file. Of course, in general the content files may contain any type of content data.

To achieve this abstraction, the present protocol uses a file structure comprising a plurality of content folders. Each content folder can contain any number of content files of related content data, which together constitute a clip. Each of the content folders is arranged in a common shared folder. This improves security as all content files in the shared folder may be shared, but there are no content files to be shared outside the content folder.

Fig. 3 illustrates an example of such a file structure having a single shared folder 20 in which three content folders 21, 22 and 23 are arranged. In the first content folder 21, there are several content files 24, in this case bit-map files. In the second content folder 22, there is a single content file 25, in this case an AVI file.

In contrast, the Gnutella protocol has no concept of grouping files into an abstract object and gives no indication as to where files should be stored on each peer. Most applications have a default folder where shared files can be stored. Even if the user modifies the file structure, the files are maintained in a single indexed list regardless of the underlying folder hierarchy.

In addition in the present protocol, each content folder contains a description file in XML format. In the example shown in Fig. 3, the first content folder 21 has a description file 26 and the second content folder 22 has a description file 27.
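The file structure described above might be interpreted by an application along the following lines. This is only a sketch: the function name `list_clips`, the `.xml` suffix test and the rule that a content folder must hold exactly one description file are assumptions, not details taken from the protocol.

```python
from pathlib import Path

def list_clips(shared_folder):
    """Map each content folder name to (description file name, content file names)."""
    clips = {}
    for folder in sorted(Path(shared_folder).iterdir()):
        if not folder.is_dir():
            continue  # assumption: loose files outside a content folder are ignored
        files = sorted(p for p in folder.iterdir() if p.is_file())
        descriptions = [p for p in files if p.suffix.lower() == ".xml"]
        if len(descriptions) != 1:
            continue  # assumption: exactly one description file per content folder
        content = [p.name for p in files if p not in descriptions]
        clips[folder.name] = (descriptions[0].name, content)
    return clips
```

Each entry in the returned mapping corresponds to one clip, regardless of how many content files the folder holds.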

Each description file contains metadata, in particular identification information and descriptive information describing the content file or files stored in the same content folder.

The identification information of each description file is unique within the network and serves to identify the content folder and hence the clip stored therein.

The identification information may be a SMPTE Unique Material Identifier (UMID). Unique naming of groups facilitates re-sharing, as there will not be conflicts with existing shared subfolders. In the hereinafter described examples, the identification information is called the "clip ID".

The descriptive information is used for searching. Providing descriptive information separately from the content files themselves facilitates a powerful searching technique. The use of XML means that the description file is extensible and could be modified to contain any kind of data. It is simple to find and extract fields from the files for display or searching purposes. In contrast, the Gnutella protocol relies on file names or properties or else data extracted from inside files to describe content. However, with A/V files in particular, it is infeasible to store all the information required for display and search purposes within the filenames of clips, and there is no descriptive data within the files, so the searching is ineffective.

An example of a description file in XML format is shown in Fig. 4. The text in bold within each XML element is the data which is specific to each description file. The first XML element identified by the tag <id> is the clip ID. The further XML elements are each a respective type of descriptive information, for example the name of the clip and keywords for the clip. As the protocol is concerned particularly with content data which is image data, the descriptive information includes information specifying the nature of the image data, for example the format, resolution and colour space of the image data. Encrypted metadata can also be included in the file, for example, rights and billing information. The description file does not need to contain rudimentary file information, as this is detected by the application. The file names, creation dates and file sizes are loaded into an internal file list when the application starts up. This means that if files are edited, removed or added to the subfolder, the index file does not need to be updated.
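To make the structure of a description file concrete, the following sketch parses a small XML document of the kind described. Only the `<id>` tag is taken from the text; the root element name `clip`, the other tag names and all of the values are illustrative assumptions, and Fig. 4 may differ.

```python
import xml.etree.ElementTree as ET

# Illustrative description file; every tag except <id> is a guess.
DESCRIPTION_XML = """<clip>
  <id>example-clip-id-0001</id>
  <name>Harbour at dusk</name>
  <keywords>harbour, boats, sunset</keywords>
  <format>AVI</format>
  <resolution>720x576</resolution>
  <colourspace>RGB</colourspace>
</clip>"""

def parse_description(xml_text):
    """Split a description file into its clip ID and its descriptive fields."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    clip_id = fields.pop("id")  # identification information
    return clip_id, fields      # remaining elements are descriptive information
```

Because the fields are read generically by tag name, adding a new element to the description file needs no change to the parsing code, which reflects the extensibility attributed to XML above.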

When a clip is downloaded, the content folder naming and structure is maintained and the description file is also downloaded and saved to the content folder. This facilitates the re-sharing of files, as the descriptive information does not have to be re-entered and is not subject to change.

In the present protocol, the application interprets the shared directory hierarchy and maintains a list of clips and information relating to those clips.

Individual content files are uniquely identified on a peer by their associated clip ID and their file index.

The user interface in the application has been tailored to reflect these clip groupings.

The application has a window, for example as shown in Fig. 5, displayed on the display 8 of the peer 10 and providing a view on the local content folders by displaying a subset of the descriptive information extracted from the description file.

This abstraction hides the detail of the individual files, whilst enabling a user to manage all the files from a single place. This view also incorporates a clip previewer.

The application manages the storage of the content files and provides an interface which allows a user to input at least some of the descriptive information.

The application provides a window allowing the user to specify one or more files to be stored as a clip and a further window, for example as shown in Fig. 6, containing fields for accepting inputs from the user of respective elements of the descriptive information. Based on these inputs, the application generates the description file in XML format. In addition to the descriptive information input by the user, the application may generate some of the descriptive information automatically, for example based on the properties of the content files. The application also generates the clip ID as a UMID. Then, the application stores the generated description file and the content file or files specified by the user in the respective content folder.
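The generation of a description file from user inputs could be sketched as follows. The function name `build_description`, the tag names and the automatically derived `filecount` element are hypothetical, and `uuid4` is used purely as a stand-in for a clip ID: it is not a SMPTE UMID.

```python
import uuid
import xml.etree.ElementTree as ET

def build_description(user_fields, content_files):
    """Return description-file XML text for one clip."""
    root = ET.Element("clip")
    # Stand-in unique identifier; a real implementation would generate a UMID.
    ET.SubElement(root, "id").text = uuid.uuid4().hex
    for tag, value in user_fields.items():
        ET.SubElement(root, tag).text = value  # descriptive information input by the user
    # Example of information derived automatically from the content files.
    ET.SubElement(root, "filecount").text = str(len(content_files))
    return ET.tostring(root, encoding="unicode")
```

The resulting text would then be saved alongside the content files in the respective content folder.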

The descriptive information may be input as part of a water-marking process for the content files.

There will now be described the searching process of the present protocol.

Searching uses search query messages and hit messages (being response messages in response to a search query message) which are passed through the network in the same manner as in the Gnutella protocol, as described above. In particular, requesting peers generate and transmit search query messages which are passed through the network. Peers storing content data, as well as passing the message on, search for content data matching the search criteria, and in the event of a match generate and transmit a hit message which is passed back through the network to the respective requesting peer from which the search message originated.

However, there are modifications to the nature of the messages and the processing performed within each peer in response to the messages, as follows.

The present protocol in fact implements two types of search, referred to as basic and advanced.

The basic search is fundamentally the same as the search in the Gnutella protocol and does not use the descriptive information in the description file.

Fig. 7 illustrates the basic search query message 30 generated and transmitted by a requesting peer. In Fig. 7, and also Figs. 8, 10 and 12 which show further messages, fields present in the corresponding Gnutella message are shaded lightly and fields added in the present protocol are shaded darkly. The basic search query message 30 includes the following fields in common with the Gnutella protocol: query header field 31 which includes a unique message identifier, message type (in this case a query), and TTL value; speed field 32; search string field 33; null field 34; query hit descriptor field 35 and peer ID field 36. In addition, the basic search query message 30 includes a query type field 37 indicating the query as being of the basic type.

The search performed by peers on receipt of the basic search query message is the same as in the Gnutella protocol. In the event of a match, those peers generate a basic hit message 40 which is illustrated in Fig. 8. The basic hit message includes the following fields in common with the Gnutella protocol: query hit header field 41; query hit descriptor field 42; and peer ID field 43; and also, in a portion repeated for each match: file index field 44; file size field 45; file name field 46; and two null fields 47 and 48. In addition, the basic hit message 40 includes a response type field 49 indicating the response as being of the basic type. Lastly, the basic hit message 40 includes, in each portion repeated for each match, the clip ID field 50 and a character 51 to separate the clip ID field 50 from the file name field 46.

Thus individual content files are uniquely identified by the clip ID in the clip ID field 50 and the file index in the file index field 44.

On receipt of hit messages, the requesting peer may download the individual content files identified in a respective hit message by connecting directly to the respective peer from which the hit message originated using an HTTP connection.

The download of each content file is managed by a respective shell, which in turn is responsible for monitoring the progress of a single socket receiving the download.

The advanced search uses the descriptive information in the description file, in order to support the abstraction of content files into clips. The advanced search is illustrated in Fig. 9.

In step S1, a requesting peer 11 generates and transmits an advanced search query message 60. Fig. 10 illustrates the advanced search query message 60. The advanced search query message 60 includes the following fields in common with the Gnutella protocol: query header 61; speed 62; null 63; query hit descriptor 64 and peer ID 65. In addition, the advanced search query message 60 includes a query type field 66 indicating the query as being of the advanced type.

Lastly, the advanced search query message 60 replaces the search string 33 in the basic search query message 30 by a search criteria field 67 containing search criteria in XML format. In particular, the search criteria field 67 has basically the same format as the description file described above and shown in Fig. 4, except without the clip ID. Thus, each XML element in the search criteria field is a respective type of search information, for example the name of the clip, keywords for the clip and/or information specifying the nature of the content data, for example the format, resolution and colour space in the case of image data.

The application provides an advanced search window, for example as shown in Fig. 11, displayed on the display 8 of the peer 10 and capable of accepting inputs from the user to generate the search criteria field 67 for a particular search. The advanced search window contains input fields for accepting inputs from the user of respective types of search information. The data input in each input field forms a respective type of search information as an XML element in the search criteria field 67. Thus, each input field corresponds to a respective type of search information in the search criteria field 67.

As described in more detail below, the types of search information included in the search criteria field 67, and hence the input fields in the advanced search window are variable in accordance with a predetermined personality of the requesting peer 10.

Each respective peer 12 storing content data performs steps S2 to S6 on receipt of an advanced search query message 60. Although only one respective peer 12 storing content data is shown in Fig. 9 for clarity, in reality there are several.

In step S2, the respective peer 12 records the advanced search query.

In step S3, the respective peer 12 transmits on the advanced search query message 60 for passage through the network.

In step S4, the respective peer 12 performs the search by comparing the search criteria in the search criteria field 67 of the advanced search query message 60 with the descriptive information in the description file of each content folder stored on the respective peer 12. In particular, the descriptive information which is in XML format is parsed with respect to the data in the search criteria field 67. This new searching facility allows detailed searches to be carried out based on any of the descriptive information. In the event of the comparison giving a match between the search criteria and the descriptive information, the respective peer 12 responds by proceeding to steps S5 and S6.
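The comparison in step S4 might look something like the sketch below. The matching rule is an assumption, since the text does not define one: here every non-empty element of the search criteria must occur, as a case-insensitive substring, in the element of the same name in the description file. The function name `matches` and the tag names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

def matches(criteria_xml, description_xml):
    """Return True if the description satisfies every non-empty search criterion."""
    criteria = ET.fromstring(criteria_xml)
    description = ET.fromstring(description_xml)
    for wanted in criteria:
        text = (wanted.text or "").strip().lower()
        if not text:
            continue  # an empty input field places no constraint on the search
        found = description.find(wanted.tag)
        if found is None or text not in (found.text or "").lower():
            return False
    return True
```

A peer would apply such a test to the description file of each content folder and proceed to steps S5 and S6 only for the folders that match.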

In step S5, the origin of the advanced search query message 60 is looked up.

In step S6, the respective peer 12 generates an advanced hit message 70 and transmits it for passage back through the network to the requesting peer 11. Fig. 12 illustrates the advanced hit message 70. The advanced hit message 70 includes a query hit descriptor field 71 in common with the Gnutella protocol, but otherwise is different. The advanced hit message 70 includes a response type field 72 indicating the response as being of the advanced type. The advanced hit message 70 further includes a subset of the descriptive information taken from the description file for each matching content file. In particular, the advanced hit message 70 includes, in a portion repeated for each match, the following fields: frames field 73 indicating the number of frames in the clip; clip size field 74 indicating the size of the clip; format field 75 indicating the format of the clip; resolution field 76 indicating the resolution of the clip; clip ID field 77; a character 78 to separate the clip ID field 77 from the following field; description field 79; and two null fields 80 and 81.

Thus the content file or files in the matching content folder are not individually identified but are abstracted as a whole, by means of the clip ID in the clip ID field 77 and other descriptive information taken from the description file.

This provides sufficient information to distinguish between clips at a high level, but at the same time limits the amount of data in the advanced hit message 70. In this way, the abstraction into a clip reduces the amount of bandwidth occupied by searches. This is particularly advantageous for A/V content data.
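The per-match portion of the advanced hit message can be pictured as one record per clip rather than one per file. The sketch below is not the wire format of Fig. 12: the class `AdvancedHit`, the helper `make_hit`, the field types and the dictionary keys are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class AdvancedHit:
    """One per-clip entry of an advanced hit message (illustrative only)."""
    frames: int
    clip_size: int
    format: str
    resolution: str
    clip_id: str
    description: str

def make_hit(clip_id, info, file_sizes):
    """Summarise a matching content folder as a single hit entry."""
    return AdvancedHit(
        frames=int(info.get("frames", 0)),
        clip_size=sum(file_sizes),  # whole clip, not the individual files
        format=info.get("format", ""),
        resolution=info.get("resolution", ""),
        clip_id=clip_id,
        description=info.get("description", ""),
    )
```

However many content files the folder contains, the entry stays the same size, which is the source of the bandwidth saving noted above.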

The requesting peer 11 receives advanced hit messages 70 from any number of respective peers 12 storing content data, which may be none, one or, more typically, a plural number. In step S7, the requesting peer 11 filters the advanced hit messages 70 in accordance with the predetermined personality of the requesting peer 10, as described in more detail below.

In step S8, the requesting peer 11, in respect of each received advanced hit message not filtered out in step S7, displays the clip ID in the clip ID field 77 together with the other descriptive information within the advanced hit message 70.

For example, the application may provide a results window as shown in Fig. 13 in which the results for three matching content folders are displayed. As described in more detail below, the types of descriptive information displayed in the results window are variable in accordance with the predetermined personality of the requesting peer 10.
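Steps S7 and S8 could be sketched as a filter followed by a column selection, both driven by the personality. The table `PERSONALITIES`, its keys, the accepted formats and the chosen columns are invented for illustration, as is the rule that filtering is done on the format of the clip; the text does not specify the filtering criterion.

```python
# Hypothetical personality data: which hits to keep and which fields to show.
PERSONALITIES = {
    "video": {
        "accept_formats": {"AVI", "MPEG"},
        "columns": ["clip_id", "format", "resolution", "frames"],
    },
    "audio": {
        "accept_formats": {"MP3", "WAV"},
        "columns": ["clip_id", "format", "description"],
    },
}

def present_hits(hits, personality):
    """Filter hits (step S7) and pick the fields to display (step S8)."""
    profile = PERSONALITIES[personality]
    rows = []
    for hit in hits:
        if hit.get("format") not in profile["accept_formats"]:
            continue  # filtered out for this personality
        rows.append([hit.get(column, "") for column in profile["columns"]])
    return rows
```

The same list of hits therefore yields different result rows for peers with different personalities.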

The requesting peer 10 is provided with a predetermined personality. In accordance with this personality the behaviour of the requesting peer 10 in performing searches is variable. The predetermined personality is represented by personality data stored by the peer-to-peer application in the requesting peer. The personality data may identify one of a plural number of different personalities. The predetermined personality may be definable by a user, for example by the requesting peer 10 accepting inputs from a user. The user can define the personality by preferences including, but not exhaustively:
* display files as clips, PAL or NTSC as standard
* preferred document formats such as Word or WordPerfect
* whether to display the results as a table, spreadsheet, or list etc.
* whether to archive search requests and results

The idea behind the use of personalities is that there will be a wide variety of content data stored on the peer-to-peer network, but users of different requesting peers 10 will typically desire to locate different types of content data. The personality of the requesting peer may be selected to improve the effectiveness of the searching for the type of content data desired by the user of the requesting peer 10.

In particular, the personality may optimise the performance of a search for a given type of content data.

The different personalities may be based on numerous different aspects of the desired content data. A particular possibility is that the predetermined personality relates to a type of content data, for example one of audio content data, video content data or document content data. In this case, the performance of the search is varied to optimize the searching capabilities for that type of content data. Another possibility is for the personality to relate to a material genre, for example motoring, electronics or gaming.

The behaviour of the requesting peer 10 in performing searches may be varied in a number of different ways, as will now be described, but it is not essential to use each of these, and other ways of varying the searching behaviour are envisaged.

The first way in which the searching behaviour may be variable is to vary the combination of types of search information included in the search criteria field 67 of an advanced search query message 60, as described above. As an example, typical types of search information to be combined for a personality relating to video content data might be the video format (encoding/broadcast), the bit rate, the genre, the duration, and the age classification. Similarly, typical types of search information to be included for a personality relating to audio content data might be the format (e.g. Windows Media, MP3), the artist, whether an album or a single, and whether the content data is free or requires payment of a copyright fee.

As the input fields of the advanced search window each correspond to a respective type of search information included in the search criteria field 67, the advanced search window is also varied in accordance with the predetermined personality. This is perceived by the user as a variation in the searching template of the user interface. This could be accompanied by a change in the appearance of the user interface, but that is not essential. In addition, this could be accompanied by different settings and search features depending on the personality. For example, customised dialogue could prompt users for more relevant information in accordance with the predetermined personality.

As an example, the advanced search window shown in Fig. 11 could be applied to a predetermined personality related to video content data, whereas the predetermined personality related to audio content data would use an advanced search window having different input fields.

In addition, in accordance with some personalities it is possible for the data of a given type of search information to be pre-selected, for example for selecting initial data in the input fields of the advanced search window. For example, in the case of a predetermined personality for video content data, a search field for the video bit rate in megabits might be pre-selected with the choices 2, 4, 6, 8 or other. Similarly, a key word such as "film" might be pre-selected.
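The variation of the search criteria with personality can be sketched as follows. This is only an illustrative sketch: the XML element names, the `search_criteria` root element and the table layout are assumptions, since the concrete schema of the search criteria field 67 is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical input fields per personality; each becomes one XML element.
SEARCH_FIELDS = {
    "video": ["keyword", "format", "bit_rate", "genre", "duration",
              "age_classification"],
    "audio": ["keyword", "format", "artist", "release_type", "payment"],
}

# Hypothetical pre-selected initial data for the input fields.
PRESELECTED = {"video": {"keyword": "film"}}

# Hypothetical fixed choices offered for a field under a given personality.
CHOICES = {"video": {"bit_rate": ("2", "4", "6", "8", "other")}}


def build_search_criteria(personality: str, user_inputs: dict) -> str:
    """Build XML search criteria containing only the fields of the personality."""
    # User inputs override the pre-selected initial data.
    values = {**PRESELECTED.get(personality, {}), **user_inputs}
    root = ET.Element("search_criteria")
    for name in SEARCH_FIELDS[personality]:
        if name not in values:
            continue
        value = str(values[name])
        allowed = CHOICES.get(personality, {}).get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}")
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")
```

Inputs that do not belong to the current personality (for example an artist under the video personality) are silently dropped, which mirrors the input field not being present in the corresponding search window.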

A second way to change the searching behaviour in accordance with the predetermined personality is to vary the extent to which the requesting peer is identified in a basic search query message 30 and an advanced search query message 60. As described above, in accordance with the Gnutella protocol the messages are identified by a unique message identifier, for example in the query header field 31 of the basic search query message 30, but the peer 10, 12 from which the message originates is not otherwise identified. However, in the present protocol, the peer 10 from which the message originates may be explicitly identified in the header field of the message. This is appropriate given the purpose of the present protocol of sharing A/V data in a professional environment. In this case, identification of the originating peer avoids the requesting peer being anonymous and provides an audit trail of activity which is often desirable in a professional environment.

There are a number of ways in which the requesting peer may be identified, for example by giving its IP address, or by using a digital signature. When security is important, a registration system for peers may be used so that peers are authenticated before they are permitted to join the peer-to-peer network. For example, either a central authentication site or third-party certification services could be used to verify the identity of the peers 10. Subsequently, a digital signature infrastructure could be adopted to uniquely identify peers by including the appropriate digital signature in the messages. Such techniques are useful, because messages can then be verified by peers receiving a request message prior to providing any response. Such precautions can also protect against rogue peers or denial of service attacks.

The extent to which the various techniques for identifying the requesting peer 10 are applied may thus be varied in accordance with the predetermined personality of the peer. For example, the degree to which the user or their organisation or department is identified might be varied, so that they are either anonymous or identified. Another possibility is for the degree to which a digital signature is applied to be varied, depending on whether the personality is intended to search for secure encrypted content data or for non-confidential data. Another possibility is for the user to be identified by data appropriate for display on the respective peers 12 storing the content data, for example by a picture stamp.
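The graded identification of the originating peer might be sketched as below. The header layout and the three identity levels are assumptions for illustration. Note in particular that the sketch uses an HMAC over a shared key purely as a stand-in so that it runs with the standard library alone; a digital signature infrastructure as described would instead use public-key signatures issued after registration.

```python
import hashlib
import hmac
from enum import Enum


class IdentityLevel(Enum):
    ANONYMOUS = 0   # message identifier only, as in the Gnutella protocol
    IDENTIFIED = 1  # originating peer named in the header
    SIGNED = 2      # originating peer named and the header authenticated


def _payload(header: dict) -> bytes:
    return f"{header['message_id']}|{header['origin']}".encode()


def build_header(message_id: str, level: IdentityLevel,
                 peer_address: str, key: bytes) -> dict:
    """Build a hypothetical query header for the given identity level."""
    header = {"message_id": message_id}
    if level is IdentityLevel.ANONYMOUS:
        return header
    header["origin"] = peer_address
    if level is IdentityLevel.SIGNED:
        # Stand-in for a digital signature (assumption: shared-key HMAC).
        header["signature"] = hmac.new(key, _payload(header),
                                       hashlib.sha256).hexdigest()
    return header


def verify_header(header: dict, key: bytes) -> bool:
    """Check the authentication tag before any response is provided."""
    signature = header.get("signature")
    if signature is None or "origin" not in header:
        return False
    expected = hmac.new(key, _payload(header), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)
```

A personality intended for secure content data would select `IdentityLevel.SIGNED`, whereas one intended for non-confidential data could use a lower level.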

A third way of varying the searching behaviour is to vary the routing of search requests. As described above, in the Gnutella protocol peers connect to a peer already in the network using an index of other users (a host cache). This may be varied by providing different indexes in respect of different personalities. Then, the basic search query message 30 and the advanced search query message 60 are transmitted to the peers identified in the index in respect of the predetermined personality. The different indexes identify peers expected to store content data relevant to the respective personality.
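As a rough sketch of personality-dependent routing, a separate host cache can be kept per personality. The addresses below are documentation-range placeholders and the `send` callable is a hypothetical transport hook; neither is part of the described protocol.

```python
from typing import Callable, Dict, List

# Hypothetical host caches (indexes of peers) kept per personality.
HOST_CACHES: Dict[str, List[str]] = {
    "video": ["192.0.2.10:6346", "192.0.2.11:6346"],
    "audio": ["198.51.100.5:6346"],
}

# Fallback index used when no index exists for a personality (assumption).
DEFAULT_HOSTS: List[str] = ["203.0.113.1:6346"]


def hosts_for(personality: str) -> List[str]:
    """Return the index of peers expected to hold relevant content data."""
    return HOST_CACHES.get(personality, DEFAULT_HOSTS)


def route_query(personality: str, message: str,
                send: Callable[[str, str], None]) -> int:
    """Transmit a search query message to every peer in the selected index."""
    hosts = hosts_for(personality)
    for host in hosts:
        send(host, message)
    return len(hosts)
```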

A fourth way of varying the searching behaviour is to vary the filtering performed in step S7 described above. The filtering may be used to remove search results which are not relevant to the predetermined personality.
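A filter of this kind could be as simple as the following sketch, which assumes that each hit has already been parsed into a dictionary carrying a hypothetical `content_type` entry; the actual test applied in step S7 may well differ.

```python
from typing import Dict, Iterable, List


def filter_hits(hits: Iterable[Dict[str, str]],
                personality_type: str) -> List[Dict[str, str]]:
    """Keep only hits whose content type matches the personality."""
    return [hit for hit in hits
            if hit.get("content_type") == personality_type]
```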

A fifth way of varying the searching behaviour is to vary the types of descriptive information which are displayed to a user in step S8 to identify the content data located. In this way, the search results may be displayed in a manner appropriate to the personality. For example, in respect of a personality related to video content data, the search could display the information shown in Fig. 13, and could optionally also make use of returned picture stamps and/or proxy video trailers to generate a key stamp representation of the content data located and to show these as a timeline representational synopsis. For other personalities, different types of descriptive information might be displayed. In the case of a personality related to audio data, the types of descriptive information displayed could be the artist, the track names and a picture stamp showing the album cover, and optionally also samples of the audio tracks themselves. In the case of a personality related to document content data, the types of descriptive information displayed could be a synopsis of the document, and possibly the first page or cover page.

Documents may then be ordered in accordance with the degree of matching with the original search information, which is indicative of the likely relevance to the search.

Another way of varying the types of descriptive information is to show the search results as a spreadsheet table identifying the number of likely target groups according to the identity or location of the peers 12 returning the results.
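The selection of descriptive information for display can be sketched as follows. The field names per personality and the `match_score` entry are assumptions for the example; the fields actually shown in Fig. 13 are not reproduced here.

```python
from typing import Dict, List

# Hypothetical types of descriptive information displayed per personality.
DISPLAY_FIELDS = {
    "video": ("title", "duration", "picture_stamp"),
    "audio": ("artist", "track_names", "picture_stamp"),
    "document": ("synopsis", "first_page"),
}


def rows_for_display(hits: List[Dict], personality: str) -> List[Dict]:
    """Order hits by match score and keep only the fields to be displayed."""
    fields = DISPLAY_FIELDS[personality]
    # Higher scores first; hits without a score are treated as score 0.
    ordered = sorted(hits, key=lambda h: h.get("match_score", 0), reverse=True)
    return [{f: hit[f] for f in fields if f in hit} for hit in ordered]
```

The same hit messages can thus yield different result windows on peers having different personalities, without any change to the peers 12 that generated them.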

There will now be described the download process used for an advanced search.

Downloading is controlled by the requesting peer 11. In general terms, after the requesting peer 11 receives an advanced hit message 70, it connects directly to the respective peer 12 from which the hit message originated using an HTTP connection, and then downloads one or more content files from the matching content folder identified by the clip ID 77 in the advanced hit message 70. There are two options for doing this.

The first option is for the requesting peer 11 to send a folder request message to the respective peer 12 storing the matching content folder. The respective peer 12, on receipt of the folder request message, is then responsible for locating the relevant files and pushing them to the requesting peer 11 by connecting to the requesting peer 11 using an HTTP connection. Although simple for the requesting peer 11, this puts a heavy load on the respective peer 12 storing the matching content folder, in particular forcing it to manage the progress of all download requests as well as performing searches.

The second option is as follows. The general approach is that the requesting peer 11 manages the downloads and the respective peer 12 storing the matching content folder deals only with individual file requests. This avoids placing the burden of managing downloads on the respective peer 12 storing the matching content folder and so is preferred to the first option.

Given that the advanced hit message 70 abstracts the clip of one or more content files, the first stage of the download process is for the requesting peer 11 to retrieve a file list of each content file in the identified content folder. To achieve this the requesting peer 11 generates a file list query message 90, as shown in Fig. 14, and transmits it for passage through the network to the respective peer 12 from which the advanced hit message originated. The file list query message 90 contains the following fields: query header field 91; speed field 92; query type field 93; clip ID field 94; null field 95; query hit descriptor field 96 and peer ID field 97. The clip ID in the clip ID field 94 is taken from the clip ID field 77 of the advanced hit message and therefore identifies the matching content folder in the respective peer 12. The requesting peer 11 logs the purpose of the query (for example for download or for display if the user merely wants to see the content files within a given matching content folder).
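The generation of the file list query message and the logging of its purpose might look like the sketch below. It models only some of the fields: the speed, null and query hit descriptor fields are omitted because their contents are not detailed in this passage. The use of a UUID as the message identifier in the header and the `"file_list"` query type value are assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FileListQuery:
    message_id: str  # carried in the query header field 91 (assumed form)
    query_type: str  # query type field 93 (assumed value)
    clip_id: str     # clip ID field 94, copied from clip ID field 77 of the hit
    peer_id: str     # peer ID field 97


def make_file_list_query(hit: Dict[str, str], peer_id: str, purpose: str,
                         purpose_log: Dict[str, str]) -> FileListQuery:
    """Create a file list query for a hit and log why it was sent."""
    if purpose not in ("download", "display"):
        raise ValueError("purpose must be 'download' or 'display'")
    message_id = uuid.uuid4().hex
    # The requesting peer logs the purpose so the response can be processed.
    purpose_log[message_id] = purpose
    return FileListQuery(message_id, "file_list", hit["clip_id"], peer_id)
```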

On receipt of the file list query message 90, the respective peer 12 from which the advanced hit message 70 originated looks up the content folder identified by the clip ID field 94, and then generates a file list response message 100, as shown in Fig. 15, and transmits it for passage back through the network to the requesting peer 11.

The file list response message 100 includes the following fields: query hit header field 101; response type field 102; clip ID field 103; and two null fields 104 and 105.

The clip ID in the clip ID field 103 is the same as in the clip ID field 94 in the file list query message 90. The file list response message 100 includes the following fields in portions repeated for each file in the matching content folder: file index field 106; file size field 107; file name field 108; and two null fields 109 and 110. Each file is uniquely identified by the respective file index in the file index field 106 together with the clip ID in the clip ID field 103.

On receipt of the file list response message 100, the requesting peer 11 from which the file list query message 90 originated looks up the purpose of the response in its log and processes the results accordingly. If the purpose is display, the individual files identified in the file list response message are displayed. If the purpose is a download, the requesting peer 11 manages the downloads of the description file and each content file in the content folder, or alternatively individual content files selected based on inputs from the user. In a commercial context, this latter alternative is important, as users will only want to pay for the files they want to use. For example, video editors will only pay for the rights to the individual frames they need.
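The processing of a file list response can be sketched as follows. The record types are simplified stand-ins for the message fields, and the assumption that the response carries the same message identifier as the query (so that the logged purpose can be looked up) is made for the example only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FileEntry:
    file_index: int  # file index field 106
    file_size: int   # file size field 107
    file_name: str   # file name field 108


@dataclass(frozen=True)
class FileListResponse:
    message_id: str  # assumed to match the identifier of the query
    clip_id: str     # clip ID field 103
    files: Tuple[FileEntry, ...]


def file_key(response: FileListResponse, entry: FileEntry) -> Tuple[str, int]:
    """A file is uniquely identified by clip ID together with file index."""
    return (response.clip_id, entry.file_index)


def handle_response(response: FileListResponse, purpose_log: Dict[str, str],
                    selected: Optional[Set[int]] = None):
    """Look up the logged purpose and process the results accordingly."""
    purpose = purpose_log.pop(response.message_id)
    if purpose == "display":
        return ("display", [e.file_name for e in response.files])
    # Download: all files, or only those selected by the user.
    wanted = [e for e in response.files
              if selected is None or e.file_index in selected]
    return ("download", [file_key(response, e) for e in wanted])
```

Passing `selected` corresponds to the alternative in which only individual content files chosen by the user are downloaded.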

To effect the download, the requesting peer 11 creates respective download shells to download each desired file. These shells cause the requesting peer 11 to connect to the respective peer 12 storing the content data using an HTTP connection and to download the respective file. Thus within the requesting peer 11 each shell manages download of a single file and monitors a respective socket.
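One way to picture the download shells is as one worker thread per file, as in the sketch below. The `fetch` callable is a hypothetical hook standing in for the HTTP connection to the respective peer 12; the request format used over that connection is not specified in this passage, so it is deliberately left out.

```python
import threading
from typing import Callable, Dict, Iterable, Optional, Tuple

FileKey = Tuple[str, int]  # (clip ID, file index)


class DownloadShell(threading.Thread):
    """Manages the download of a single file."""

    def __init__(self, key: FileKey, fetch: Callable[[FileKey], bytes]):
        super().__init__(daemon=True)
        self.key = key
        self._fetch = fetch
        self.data: Optional[bytes] = None
        self.error: Optional[Exception] = None

    def run(self) -> None:
        try:
            # In a real peer this would open the connection and read the file.
            self.data = self._fetch(self.key)
        except Exception as exc:  # record the failure for the managing peer
            self.error = exc


def download_all(keys: Iterable[FileKey],
                 fetch: Callable[[FileKey], bytes]) -> Dict[FileKey, bytes]:
    """Start one shell per desired file and collect the successful results."""
    shells = [DownloadShell(key, fetch) for key in keys]
    for shell in shells:
        shell.start()
    for shell in shells:
        shell.join()
    return {s.key: s.data for s in shells if s.error is None}
```

Because each shell handles exactly one file, a failure of one download does not disturb the others, and the requesting peer 11 retains control over which files to retry.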

By downloading the description file together with the content files all data relating to the clip is passed on and could later be shared out and searched from the requesting peer 11. However, initially the downloaded files are stored in a download folder outside the shared folder storing content data on the requesting peer 11, in order to prevent downloads being shared automatically.
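The separation of the download folder from the shared folder could be enforced with a check such as the following. This is a sketch only; the function name and the decision to reject a download folder nested inside the shared folder are assumptions.

```python
from pathlib import Path


def store_download(download_dir: Path, shared_dir: Path,
                   name: str, data: bytes) -> Path:
    """Store a downloaded file in a folder that is not shared automatically."""
    download_dir = download_dir.resolve()
    shared_dir = shared_dir.resolve()
    # Refuse a download folder that is, or lies inside, the shared folder.
    if download_dir == shared_dir or shared_dir in download_dir.parents:
        raise ValueError("download folder must lie outside the shared folder")
    download_dir.mkdir(parents=True, exist_ok=True)
    # Use only the final path component so the file stays in the folder.
    target = download_dir / Path(name).name
    target.write_bytes(data)
    return target
```

Moving a clip into the shared folder afterwards then remains an explicit step rather than a side effect of downloading.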

In the protocol described above, each peer executes the same peer-to-peer application. Thus each peer is capable of both (1) storing content data and (2) downloading content data. However, this is not essential. A given peer might only download content data, or might only store content data. In either case, the given peer would only need part of the application described above.

The protocol described above is decentralised, in that there is no central peer storing an index of the content stored in a plurality of other peers. The file structure of the present invention provides particular advantage in such a decentralised network.

However, in principle, the file structure could be applied to a centralised peer-to-peer network, in which case similar advantages in searching would be achieved.

Claims

1. A peer-to-peer network of peers which are interconnected for sharing content data, wherein the peers are arranged to pass messages through the network; and respective searching peers in the network are arranged to perform searches for content data by generating and transmitting search query messages which specify search criteria for passage through the network, by receiving hit messages generated in response to the search query message by peers storing content data and by displaying to a user descriptive information contained in the hit messages, the behaviour of each respective searching peer in performing searches being variable in accordance with a predetermined personality of the respective searching peer.
2. A peer-to-peer network according to claim 1, wherein the search query messages specify the search criteria as a plurality of types of search information, and said behaviour of each respective searching peer in performing searches which is variable comprises the combination of types of search information included in a search query message.
3. A peer-to-peer network according to claim 2, wherein the respective searching peers are arranged to accept inputs from a user of the data to form the search criteria, the inputs being accepted in a window displayed by the peer having a variable combination of input fields each corresponding to a respective type of search information in the variable combination of types of search information included in a search query message.
4. A peer-to-peer network according to claim 2 or 3, wherein search criteria is in XML format, each type of search information being a respective XML element.
5. A peer-to-peer network according to any one of the preceding claims, wherein said behaviour of each respective searching peer in performing searches which is variable comprises the extent to which the respective searching peer is identified in a search query message.
6. A peer-to-peer network according to any one of the preceding claims, wherein the descriptive information contained in the hit messages is of a plurality of types, and said behaviour of each respective searching peer in performing searches which is variable comprises the types of descriptive information contained in the hit messages which are displayed to a user.
7. A peer-to-peer network according to claim 6, wherein descriptive information contained in the hit messages is in XML format, each type of search information being a respective XML element.
8. A peer-to-peer network according to any one of the preceding claims, wherein said predetermined personality of the respective searching peer is definable by a user.
9. A peer-to-peer network according to any one of the preceding claims, wherein predetermined personality of the respective searching peer is represented by personality data stored in the respective searching peers.
10. A peer-to-peer network according to any one of the preceding claims, wherein the predetermined personality of the respective searching peer relates to a type of content data.
11. A peer arranged to connect into a peer-to-peer network of peers which are interconnected for sharing content data, wherein the peer is arranged to perform searches for content data by generating and transmitting search query messages which specify search criteria for passage through the network, by receiving hit messages generated in response to the search query message by peers storing content data and by displaying to a user descriptive information contained in the hit messages, the behaviour of the peer in performing searches being variable in accordance with a predetermined personality of the peer.
12. A peer according to claim 11, wherein the search query messages specify the search criteria as a plurality of types of search information, and said behaviour of each respective searching peer in performing searches which is variable comprises the combination of types of search information included in a search query message.
13. A peer according to claim 12, wherein the respective searching peers are arranged to accept inputs from a user of the data to form the search criteria, the inputs being accepted in a window displayed by the peer having a variable combination of input fields each corresponding to a respective type of search information in the variable combination of types of search information included in a search query message.
14. A peer according to claim 12 or 13, wherein search criteria is in XML format, each type of search information being a respective XML element.
15. A peer according to claims 11 to 14, wherein said behaviour of each respective searching peer in performing searches which is variable comprises the extent to which the respective searching peer is identified in a search query message.
16. A peer according to claims 11 to 15, wherein the descriptive information contained in the hit messages is of a plurality of types, and said behaviour of each respective searching peer in performing searches which is variable comprises the types of descriptive information contained in the hit messages which are displayed to a user.
17. A peer according to claim 16, wherein descriptive information contained in the hit messages is in XML format, each type of search information being a respective XML element.
18. A peer according to claims 11 to 17, wherein said predetermined personality of the respective searching peer is definable by a user.
19. A peer according to claims 11 to 18, wherein predetermined personality of the respective searching peer is represented by personality data stored in the respective searching peers.
20. A peer according to claims 11 to 19, wherein the predetermined personality of the respective searching peer relates to a type of content data.
21. A peer-to-peer application capable of execution on a computer system and capable, when so executed, of causing the computer system to act as a peer according to any one of claims 11 to 20.
22. A storage medium storing a peer-to-peer application according to claim 21.
23. A peer-to-peer network constructed and arranged to operate substantively as hereinbefore described with reference to the accompanying drawings.
24. A peer constructed and arranged to operate substantively as hereinbefore described with reference to the accompanying drawings.
25. A peer-to-peer application substantively as hereinbefore described with reference to the accompanying drawings.