EP1938219A1

EP1938219A1 - Method for sorting a set of electronic documents

Info

Publication number: EP1938219A1
Application number: EP06808294A
Authority: EP
Inventors: Jérôme GALTIER
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2005-09-20
Filing date: 2006-09-07
Publication date: 2008-07-02
Also published as: US7827173B2; WO2007034096A1; CN101268465B; CN101268465A; JP2009509259A; US20080208860A1

Abstract

The invention concerns a method for sorting a set of electronic documents, including the following steps: determining (S110), for each pair of documents {u, v} of the set, the degree of correlation ω(u, v) between the documents u and v; determining a projection function X between the set of documents and a sphere of the set R^d, where d is a positive integer, the function X being such that, for at least one document u, the distance in R^d between two points X(u) and X(v), where v is a document correlated with u, is smaller the higher the degree of correlation; performing a sorting operation (S140) on at least part of the set of documents based on the values taken by the function X.

Description

Method of sorting a set of electronic documents

The invention relates to the field of telecommunications, and in particular to search engines for electronic documents.

More specifically, the invention relates to a method of sorting a set of electronic documents. Such a set results, for example, from a search carried out by a user with a search engine on an Internet-type network, the electronic documents being in this case Web pages (from "World Wide Web"), which can be accessed locally via a local storage medium or remotely via the network.

Search engines use several techniques for sorting or ranking the pages resulting from a search. Among the known techniques for exploring a set of Web pages, some are based on semantics, a page being considered all the more relevant as it contains a large number of occurrences of the searched word or words. These techniques are sensitive to a practice known by the English term "spamming", which consists in inserting into a given page, a very large number of times, the words commonly used by users in their search queries, so that the page frequently appears relevant.

Other techniques are based on the topological structure of the Web. These techniques take into account both the links between the pages considered and the properties of the pages themselves, such as the membership of a page in a domain or subdomain of the Web. They generally rely on a graph representation of the pages to be processed, and are suited to classifying pages according to given topological properties of the graph. These techniques are sensitive to a variant of "spamming" that consists in pointing to a given page a very large number of times, which locally distorts the topological characteristics of the Web graph.

Some of the techniques exploiting the topological structure of the Web consist in classifying Web pages by assigning to each page a rank that depends on the relationships of that page with the others.

An example of such a method, known as "PageRank", is used in the Google™ search engine and is described in: "The PageRank Citation Ranking: Bringing Order to the Web" by L. Page, S. Brin, R. Motwani and T. Winograd, Technical Report, Computer Science Department, Stanford University, 1998.

The PageRank method orders pages according to their visibility on the Web. It simulates a random navigation from page to page along the hypertext links, corresponding to a user who, on each page viewed, randomly activates one of its hypertext links to reach another page. The method carries out a probabilistic analysis of this simulated navigation in order to determine the probability that the user is on a given page during such random navigation. The more often a page is cited by other pages, the higher its rank.
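The random-surfer analysis described above can be sketched as a power iteration over the link graph. This is a minimal illustration of the prior-art idea, not the search engine's actual implementation; the function name, the damping factor of 0.85 (probability of following a link rather than jumping to a random page) and the uniform handling of pages without outgoing links are assumptions not stated in this text.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the probability that a random surfer is on each page.

    links: dict mapping each page to the list of pages it links to.
    damping: probability of following a link instead of jumping to a
             random page (0.85 is an assumed, commonly quoted value).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random jumps spread (1 - damping) uniformly over all pages.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Follow one of the page's links, chosen uniformly.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its mass uniformly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank
```

For links = {"a": ["c"], "b": ["c"], "c": ["a"]}, page c, cited by two pages, obtains the highest rank, and page b, cited by none, the lowest.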

Such a method provides a rank that is not necessarily relevant to the search performed by a user: the top-ranked pages (those of highest rank) are not necessarily the pages that best match the user's expectations.

In addition, this method does not make it possible to identify, in the set of documents, thematic communities or communities of interest that could lead the user more quickly to an interesting page.

Finally, when a user identifies, among the documents presented, a document of particular interest, a list of documents ordered simply by rank does not make it easy to determine whether other documents, similar to the document of interest or related to it in some way, are present in the set.

The object of the invention is therefore, in particular, to overcome the aforementioned drawbacks of the state of the art by proposing a technique for sorting electronic documents, for example Web pages, which in particular makes it possible to detect spamming, which is applicable to a large set of documents while being fast to implement, and which makes it simple to sort documents not by ranking them but by building communities of documents, that is, subsets of documents close to one another, this notion of proximity being defined according to the semantic content of the documents, the hypertext links between them, or in some other way.

For this purpose, according to a first aspect, the invention provides a method for sorting a set of electronic documents, comprising:

a step of counting hypertext links or cocitations present between each pair of documents {u, v} of said set,

a step of determining, for each pair of documents {u, v} of said set, a degree of correlation ω(u, v) between the documents u and v, said degree of correlation being a function of the number of links obtained at the end of the counting step,

a step of determining, for each document u of said set, an associated point X(u) located on a sphere of the set R^d, where R is the set of real numbers and d is a positive integer, such that, for at least one document u1 of said set, the distance in R^d between the associated points X(u1) and X(u2), where u2 is a document correlated with u1, is smaller the higher the degree of correlation between the documents u1 and u2,

a step of sorting at least a part of said set of documents according to the points determined on said sphere.

Using a sphere to position the points is novel in that it makes it simple to define the positions of the points obtained, and therefore of the associated documents, relative to one another. In this mode of representation, no point is favored over another. Consequently, the relative position of two points, and therefore the distance between them, can be used to represent the degree of correlation between the two documents associated with these points. The representation thus obtained reflects the correlations or links between the documents concerned.

Having a representation of all the documents on a sphere, for example a sphere in a three-dimensional space, makes it possible to envisage all kinds of sorting operations (selection, classification, filtering), since each document is now represented by a simple d-tuple of coordinates in a d-dimensional space (for example by a coordinate triplet in a three-dimensional space).

The applications of the method are multiple: building clusters of documents, filing or selecting documents. These operations are performed in the space R^d as a function of the spatial position of the projections of the documents, or on the basis of distance measurements, that is to say by taking into account their degree of correlation or proximity as determined.

The method according to the invention can for example be used to perform all sorts of sorting and classification operations on Web pages resulting from a search carried out with a search engine, the most original pages, that is, those furthest from the others, being for example ranked first.
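Ranking the most original pages first can be sketched as follows, assuming that "furthest from the others" is measured by the distance from a projection X(u) to its nearest neighbour. This measure and the function name rank_by_originality are illustrative choices, not taken from the text.

```python
import math

def rank_by_originality(points):
    """Order documents from most to least original.

    points: dict mapping a document id to its projection X(u), a tuple of
            d floats (at least two documents are expected).
    Originality is measured here, as an assumption, by the distance from
    X(u) to the closest other projection.
    """
    def nearest_distance(u):
        return min(math.dist(points[u], points[v]) for v in points if v != u)

    return sorted(points, key=nearest_distance, reverse=True)
```

An isolated projection, far from every other one, is listed first; documents sitting in a tight group come last.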

As an alternative or in combination, the pages are sorted by group, each group corresponding to a set of pages whose projections by the function X lie in a predefined spatial zone of the sphere of the space R^d. Preferably, in this variant, a partition of this sphere into spatial zones is defined, and the documents are classified according to which zone of the partition their projection belongs to. The method according to the invention can also be used to detect the presence of "Spam", that is to say pages that point to each other, because the projections of all these pages on the sphere S lie substantially close to each other.
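A partition of the sphere into zones can take many forms. As one hypothetical example, the sketch below uses the 2^d orthants of R^d (the sign pattern of the coordinates of X(u)) as the zones. The function name and the treatment of zero coordinates as positive are assumptions.

```python
from collections import defaultdict

def group_by_orthant(points):
    """Group documents by the orthant of R^d that contains their projection.

    points: dict mapping a document id to its projection X(u) (d floats).
    The zone key is a tuple of booleans, True for a coordinate >= 0.
    """
    groups = defaultdict(list)
    for doc, x in points.items():
        zone = tuple(coordinate >= 0 for coordinate in x)
        groups[zone].append(doc)
    return dict(groups)
```

Any other partition (latitude bands, cells of a grid on the sphere) can replace the orthants by changing how the zone key is computed.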

The method according to the invention can also be used to generate a visual representation of the WEB pages resulting from a search carried out by means of a search engine.

According to a first variant of the method, in which at least one of the documents has at least one hypertext link to at least one other document, the degree of correlation between two documents u and v is determined as a function of the number of hypertext links and/or the number of cocitation links present between the documents u and v, the degree of correlation being higher the higher this number, and the absence of correlation corresponding to the absence of links.

This first variant advantageously makes it possible to carry out sorting operations that take into account the hypertext links or cocitations between documents.

According to a second variant of the method, the degree of correlation between two documents u and v is determined as a function of a measure of proximity of the semantic contents of the documents u and v, the degree of correlation being higher the smaller this measure, the absence of correlation corresponding to a measurement lower than a predefined threshold. This second variant advantageously makes it possible to carry out sorting operations that take into account the semantic content of documents.

According to a third variant, the degree of correlation is determined according to the favorite pages defined by a plurality of users. In this case, each user is associated with a set of documents (their favorite pages), the degree of correlation between two documents u and v being determined as the number of such sets to which both documents u and v belong.
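The third variant can be sketched as a pair count over the users' sets of favorite pages. The function name and the data layout (one Python set of document ids per user) are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def favorites_correlation(favorite_sets):
    """Degree of correlation from users' favorite pages (third variant).

    favorite_sets: list of sets of document ids, one set per user.
    Returns a Counter keyed by frozenset({u, v}) giving the number of
    users whose favorites contain both u and v (missing pairs count 0).
    """
    counts = Counter()
    for favorites in favorite_sets:
        for u, v in combinations(sorted(favorites), 2):
            counts[frozenset((u, v))] += 1
    return counts
```

Because the iterative step described later assumes 0 ≤ ω(u, v) ≤ 1, these raw counts would have to be rescaled, for instance by dividing by the number of users; that normalization is also an assumption.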

This third variant advantageously makes it possible to take account of user profiles in determining the degree of correlation between pages.

The three variants can also be combined with one another to determine a degree of correlation that takes into account hypertext links, semantic content and/or user preferences. Any other type of link between two documents can also be used to define a degree of correlation.

According to a particular embodiment, the method further comprises:

a step of defining an initial function X_0 for projecting said set onto said sphere,

a step of determining a function X for projecting said set onto said sphere, said projection function X being obtained from the initial function X_0 in at least one iteration, each iteration consisting in determining a function X_i from the function X_{i-1} obtained at the preceding iteration by replacing, for at least one document u of said set, the value X_{i-1}(u) by the value X_i(u) that optimizes a predefined criterion, which is a function of the value X_{i-1}(u) as well as of the values X_{i-1}(v) and of the degrees of correlation ω(u, v) between the documents u and v for any document v belonging to said set.

The method according to the invention lends itself to an iterative determination of the function X, which simplifies its implementation and makes it possible to control the convergence of the method precisely.

Preferably, the function X_0 is defined randomly. Starting from a random function statistically improves the speed of convergence towards the desired function X, without requiring prior knowledge of the function to be obtained.

In this embodiment, the optimization of the predefined criterion consists in maximizing, for the document u, the value of a quantity Δ(u) equal to:

$$\Delta(u) = \sum_{v \in V \setminus \{u\}} \delta(u,v)\,\lVert X_i(u) - X_{i-1}(v) \rVert^2$$

with δ(u, v) = 1 − ω(u, v), 0 ≤ ω(u, v) ≤ 1, and ω(u, v) = 0 in the absence of correlation between the documents u and v. The value X_i(u) is equal to

$$X_i(u) = -\frac{Y(u)}{\lVert Y(u) \rVert} \quad \text{with} \quad Y(u) = \sum_{v \in V \setminus \{u\}} \delta(u,v)\,X_{i-1}(v)$$

if Y(u) ≠ 0, and X_i(u) is equal to X_{i-1}(u) if Y(u) = 0.
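The closed form for X_i(u) can be checked directly, assuming, as the normalization by ‖Y(u)‖ implies, that S is the unit sphere centred at the origin. Write x for the candidate value X_i(u), with Δ(u) = Σ_{v ∈ V∖{u}} δ(u, v)‖x − X_{i-1}(v)‖² and Y(u) = Σ_{v ∈ V∖{u}} δ(u, v) X_{i-1}(v). Since every point of S has norm 1:

$$\lVert x - X_{i-1}(v) \rVert^2 = \lVert x \rVert^2 + \lVert X_{i-1}(v) \rVert^2 - 2\,x \cdot X_{i-1}(v) = 2 - 2\,x \cdot X_{i-1}(v)$$

$$\Delta(u) = 2 \sum_{v \in V \setminus \{u\}} \delta(u,v) \;-\; 2\,x \cdot Y(u)$$

The first term does not depend on x, so maximizing Δ(u) amounts to minimizing x · Y(u). By the Cauchy-Schwarz inequality, x · Y(u) ≥ −‖Y(u)‖ for ‖x‖ = 1, with equality exactly when x = −Y(u)/‖Y(u)‖ if Y(u) ≠ 0. When Y(u) = 0, Δ(u) takes the same value for every x on S, so keeping X_{i-1}(u) loses nothing.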

The invention also relates to a computer program on an information medium readable by a computer system, said program comprising instructions for implementing a method according to the invention as briefly defined above, when this program is loaded into and then executed by a computer system.

The invention also relates to a data processing device comprising data processing means for executing the steps of a method according to the invention. Such a device is for example a computer server implementing a document search engine.

The invention also relates to a recording medium, readable by a computer system, comprising a program comprising program code instructions for implementing a method according to the invention when said program is executed by a computer system.

Other objects, features and advantages of the invention will become apparent from the following description, given solely by way of non-limiting example and with reference to the appended drawing, in which FIG. 1 is a flowchart of an embodiment of the method according to the invention.

The method according to the invention is applied to a set of electronic documents, in particular a set of WEB pages, some of which contain one or more hypertext links to one or more other pages.

In the embodiment chosen and illustrated, the degree of correlation between two documents u and v of the set of documents V is determined as a function of the number of hypertext links and cocitation links existing between the documents u and v. To determine the number of hypertext links between two documents, the direction of the hypertext links is not taken into account and "symmetrized" hypertext links are considered, that is to say the case where the document u has a link to the document v is treated in the same way as the case where the document v has a link to the document u.

Two documents u and v have a cocitation link if there is at least one other document w such that:

there is at least one hypertext link pointing from w to u, and

there is at least one hypertext link pointing from w to v.

The steps of the method according to the invention are now described in more detail with reference to FIG. 1.
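Counting the symmetrized hypertext links and the cocitation links defined above can be sketched as follows. The function name, the input format (a list of (source, target) pairs, a repeated pair meaning several links) and the choice to count each citing document w once per cocited pair are assumptions.

```python
from collections import Counter
from itertools import combinations

def count_links(hyperlinks):
    """Count symmetrized hypertext links and cocitation links.

    hyperlinks: list of (source, target) pairs, one per hypertext link.
    Returns two Counters keyed by frozenset({u, v}):
      - links: hypertext links between u and v, whatever their direction;
      - cocitations: number of documents w that point to both u and v
        (counting each citing document w once is an assumption).
    """
    links = Counter()
    cites = {}  # document w -> set of documents it points to
    for source, target in hyperlinks:
        if source == target:
            continue  # a self-link relates no pair of distinct documents
        links[frozenset((source, target))] += 1
        cites.setdefault(source, set()).add(target)
    cocitations = Counter()
    for targets in cites.values():
        for u, v in combinations(sorted(targets), 2):
            cocitations[frozenset((u, v))] += 1
    return links, cocitations
```

Keying both counters by an unordered frozenset is what makes the link count symmetric: the links a→b and b→a land on the same key.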

Step S100 consists in determining, for any pair (u, v) of documents of the set V, a weight ω_1(u, v) which is a function of the number of hypertext links between the documents u and v. Preferably, ω_1(u, v) is an increasing function of the number of hypertext links between the documents u and v.

Preferably, the value of ω_1(u, v) lies between a predefined minimum value (typically 0) and a predefined maximum value (typically 1). In this case, the minimum value corresponds to the absence of a hypertext link between the documents u and v, and the maximum value corresponds, for example, to the presence of a predetermined minimum number of hypertext links between the documents u and v.

According to a first example, the value of ω_1(u, v) is set to 0 in the absence of a hypertext link and to 1 in the presence of at least one hypertext link between the documents u and v.

According to a second example, the value of ω_1(u, v) is set to 0 in the absence of a hypertext link, to 0.5 in the presence of a single hypertext link between the documents u and v, and to 1 in the presence of two or more hypertext links between the documents u and v.

According to a third example, the value of ω_1(u, v) is defined as a continuously increasing function of the number N_h of hypertext links between the documents u and v, for example a function in which a threshold N_hmax caps the number N_h of hypertext links.

Step S105 consists in determining, for any pair (u, v) of documents of the set V, a weight ω_2(u, v) which is a function of the number of cocitation links between the documents u and v. Preferably, ω_2(u, v) is an increasing function of the number of cocitation links between the documents u and v. The examples of function definitions given for ω_1(u, v) can be transposed to ω_2(u, v). For example, the value of ω_2(u, v) is set to 0 in the absence of a cocitation link and to 1 in the presence of at least one cocitation link between the documents u and v.
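The example weight functions for ω_1, which can be transposed to ω_2, can be written as below. The first two follow the text directly. The exact formula of the third example is not reproduced in this text, so omega1_capped uses min(N_h, N_hmax)/N_hmax purely as a hypothetical function that increases with N_h and is capped at N_hmax. All function names are illustrative.

```python
def omega1_binary(n_links):
    """First example: 0 without a hypertext link, 1 with at least one."""
    return 1.0 if n_links >= 1 else 0.0

def omega1_stepped(n_links):
    """Second example: 0, then 0.5 for one link, 1 for two or more."""
    if n_links == 0:
        return 0.0
    return 0.5 if n_links == 1 else 1.0

def omega1_capped(n_links, n_max):
    """Hypothetical third example: linear in N_h up to n_max, then 1.

    min(N_h, N_hmax) / N_hmax is only one function matching the
    description; it is not necessarily the original formula.
    """
    return min(n_links, n_max) / n_max
```

All three stay within [0, 1], which the later iterative step relies on through δ(u, v) = 1 − ω(u, v).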

Step S110 consists in determining, for each pair (u, v) of documents, the degree of correlation ω(u, v) associated with the pair {u, v} by the relation

$$\omega(u,v) = k_1\,\omega_1(u,v) + k_2\,\omega_2(u,v)$$

where k_1 and k_2 are real coefficients such that 0 < k_1 ≤ 1, 0 ≤ k_2 ≤ 1 and k_1 + k_2 = 1.

The degree of correlation ω (u, v) thus takes real values between 0 and 1, the value 0 corresponding to the absence of links.
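Step S110 combines the two weights linearly. The following minimal sketch also checks the stated constraints on k_1 and k_2; the function name and the default coefficients are arbitrary assumptions.

```python
def correlation(omega1, omega2, k1=0.5, k2=0.5):
    """Degree of correlation omega = k1 * omega1 + k2 * omega2 (step S110).

    omega1, omega2: weights in [0, 1] for hypertext and cocitation links.
    k1, k2: coefficients with 0 < k1 <= 1, 0 <= k2 <= 1 and k1 + k2 = 1;
            the defaults are arbitrary.
    """
    if not (0 < k1 <= 1 and 0 <= k2 <= 1 and abs(k1 + k2 - 1) < 1e-12):
        raise ValueError("expected 0 < k1 <= 1, 0 <= k2 <= 1 and k1 + k2 = 1")
    return k1 * omega1 + k2 * omega2
```

Because k_1 + k_2 = 1 and both weights lie in [0, 1], the result is a convex combination and therefore also lies in [0, 1].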

The higher the value given to the coefficient k_1, the more importance is given to the presence of hypertext links; likewise, the higher the value given to the coefficient k_2, the more importance is given to the presence of cocitation links.

This method of determining the degree of correlation between the documents makes it possible to take into account, in the document classification method according to the invention, two types of links between documents: the hypertext links and the cocitation links.

This method can be generalized to other types of links. For example, two documents u and v can be defined as linked by an indirect hypertext link if there exists a chain of hypertext links leading from u to v, the number of hypertext links in the chain being in this case greater than or equal to 2. According to another example, semantic links between the documents can be considered. In this case, the degree of correlation between two documents is determined on the basis of an analysis and comparison of the semantic content of both documents, for which known semantic content comparison methods are applicable. The degree of correlation then represents a measure of the semantic proximity between the two documents, and can be determined, for example, on the basis of a statistical analysis and comparison of the words contained in each of the documents. As a variant, a distance between two documents can be defined and the degree of correlation defined as a decreasing function of this distance, so that the smaller the distance between two documents, the higher their degree of correlation. Finally, the method can be generalized to any number of links, whatever their type. The degree of correlation between two documents is then determined as a weighted sum of elementary degrees of correlation, for example the sum of a degree of correlation that is a function of the number of cocitation links between the two documents and a degree of correlation that is a function of the semantic contents of both documents. The method thus makes it possible to take into account simultaneously the information provided by the hypertext links between documents and by the semantic content of the documents.
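As one example of the known semantic comparison methods mentioned above, the sketch below uses the cosine similarity of word-count vectors, which lies in [0, 1] and grows with semantic proximity. The choice of cosine similarity, the tokenization by the regular expression \w+, the threshold value and the function name are all assumptions for illustration.

```python
import math
import re
from collections import Counter

def semantic_correlation(text_u, text_v, threshold=0.1):
    """Word-count cosine similarity used as a semantic degree of correlation.

    Similarities below the (arbitrary) threshold are treated as the
    absence of correlation and mapped to 0.
    """
    words_u = Counter(re.findall(r"\w+", text_u.lower()))
    words_v = Counter(re.findall(r"\w+", text_v.lower()))
    # A Counter returns 0 for missing words, so absent terms add nothing.
    dot = sum(count * words_v[word] for word, count in words_u.items())
    norm_u = math.sqrt(sum(c * c for c in words_u.values()))
    norm_v = math.sqrt(sum(c * c for c in words_v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    similarity = dot / (norm_u * norm_v)
    return similarity if similarity >= threshold else 0.0
```

Such a semantic degree can then enter the weighted sum described above alongside the link-based degrees.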

Returning to FIG. 1, the following steps S120 to S135 consist in determining a projection function X between the set V of documents and a sphere S of the set R^d (the d-th Cartesian power of R, where R denotes the set of real numbers and d is a positive integer). Preferably, d is chosen equal to 2 or 3.

The function X determined is such that, for at least one document u, the distance in R^d between two points X(u) and X(v), where v is a document correlated with u, is smaller the higher the degree of correlation.

According to a particular embodiment, an iterative process is used to determine the function X. Each iteration of this process consists in determining a function X_i from the function X_{i-1} obtained at the preceding iteration, by replacing, for at least one document u of the set V, the value X_{i-1}(u) by the value X_i(u) that optimizes a predefined criterion. This criterion is, on the one hand, a function of the value X_{i-1}(u) obtained for the document u considered and of the values X_{i-1}(v) obtained for any document v of the set V and, on the other hand, a function of the degrees of correlation ω(u, v) between the document u and any document v of the set V. The criterion is chosen so as to make the sequence of functions X_i converge to a function X having the properties listed above. Preferably, the optimization of said predefined criterion consists in maximizing, for a given document u, the value of a quantity Δ(u) equal to

$$\Delta(u) = \sum_{v \in V \setminus \{u\}} \delta(u,v)\,\lVert X_i(u) - X_{i-1}(v) \rVert^2$$

with δ(u, v) = 1 − ω(u, v), 0 ≤ ω(u, v) ≤ 1, and ω(u, v) = 0 in the absence of correlation between the documents u and v.

In step S120, the initial projection function X_0 is determined. Preferably, the initial function X_0 takes random values on the sphere S. The iterative process is then applied starting from the current function X_i = X_0.
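Step S120 needs random initial points on S. A common way to draw them uniformly, used here as an assumption, is to normalize a vector of independent standard Gaussian samples. Taking S as the unit sphere centred at the origin is also an assumption, consistent with the normalization X_i(u) = −Y(u)/‖Y(u)‖. The function names are hypothetical.

```python
import math
import random

def random_point_on_sphere(d, rng):
    """Draw a point uniformly on the unit sphere of R^d.

    A vector of independent standard Gaussians has a uniformly distributed
    direction, so normalizing it gives a uniform point on the sphere.
    """
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in x))
        if norm > 0:  # retry in the (practically impossible) zero case
            return [c / norm for c in x]

def initial_projection(documents, d=3, seed=None):
    """Step S120: random initial function X_0 mapping each document to S."""
    rng = random.Random(seed)
    return {u: random_point_on_sphere(d, rng) for u in documents}
```

Passing a seed makes the random start reproducible, which helps when comparing runs.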

The iterative process of determining the projection function X begins at step S125. An iteration corresponds to the execution of steps S125, S130 and S135, and the iterations are indexed by i. At the end of step S120, the index i takes its initial value 0.

In step S125 this index is incremented: i = i + 1.

In step S130, the following operations are performed for at least one document u:

- the value $Y(u) = \sum_{v \in V \setminus \{u\}} \delta(u,v)\,X_{i-1}(v)$ is determined;

- if Y(u) ≠ 0, X_i(u) is calculated from Y(u) by $X_i(u) = -Y(u)/\lVert Y(u) \rVert$;

- if Y(u) = 0, X_i(u) is taken equal to X_{i-1}(u).
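Step S130 for a single document u can be sketched as follows, with X_{i-1} stored as a dict of coordinate lists and ω supplied as a function. These data structures and the function name are assumptions.

```python
import math

def update_document(u, X_prev, omega):
    """Step S130 for one document u.

    X_prev: dict doc -> point of X_{i-1} on the unit sphere (list of floats).
    omega: function (u, v) -> degree of correlation in [0, 1].
    Returns X_i(u) = -Y(u)/||Y(u)||, or X_{i-1}(u) when Y(u) = 0.
    """
    d = len(X_prev[u])
    y = [0.0] * d
    for v, x_v in X_prev.items():
        if v == u:
            continue
        delta = 1.0 - omega(u, v)  # delta(u, v) = 1 - omega(u, v)
        for k in range(d):
            y[k] += delta * x_v[k]
    norm = math.sqrt(sum(c * c for c in y))
    if norm == 0.0:
        return list(X_prev[u])  # Y(u) = 0: keep the previous value
    return [-c / norm for c in y]
```

A fully correlated document (ω = 1, so δ = 0) exerts no pull on u, while uncorrelated documents (δ = 1) push X_i(u) towards the side of the sphere opposite to them.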

In step S135, it is determined whether the iterative process terminates. Preferably, the process is iterated a sufficient number of times for the function X to be modified at least once for each document u of the set V.

Since the sequence of functions X_i converges quickly, even from a random initial function, it suffices to iterate a limited number of times over all the documents.

The decision to stop the iterations can also be based on:

- the number of iterations already carried out;

- a measure of the convergence of the function, carried out after each iteration.

This measure of convergence can be carried out by calculating, after each iteration, the sum

$$\Delta_i = \sum_{u \in V} \lVert X_i(u) - X_{i-1}(u) \rVert$$

and setting a threshold value, possibly depending on the number of documents u in the set V, below which the iterative process stops.

If, in step S135, the decision to stop the iterative process is taken, then step S140 is executed; otherwise, the next iteration is executed from step S125.
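Putting steps S120 to S135 together gives the following sketch, which assumes the unit sphere, a callable ω, and a stopping test on the total movement of the points, in the spirit of the sum Δ_i above. The function name and the default values of d, tol and max_sweeps are hypothetical.

```python
import math
import random

def project_documents(documents, omega, d=3, tol=1e-9, max_sweeps=100, seed=None):
    """Sketch of steps S120 to S135, updating one document per iteration.

    documents: list of document ids (the set V).
    omega: function (u, v) -> degree of correlation in [0, 1].
    A sweep applies step S130 to every document in turn, so that X is
    modified at least once per document. The loop stops when the total
    movement sum_u ||X_new(u) - X_old(u)|| over a sweep falls below tol,
    or after max_sweeps sweeps; d, tol and max_sweeps are illustrative.
    """
    rng = random.Random(seed)
    X = {}
    for u in documents:  # step S120: random initial function X_0 on S
        x = [0.0] * d
        while not any(x):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in x))
        X[u] = [c / norm for c in x]
    for _ in range(max_sweeps):
        movement = 0.0
        for u in documents:  # steps S125/S130: update X at document u
            y = [0.0] * d
            for v in documents:
                if v != u:
                    delta = 1.0 - omega(u, v)
                    for k in range(d):
                        y[k] += delta * X[v][k]
            norm = math.sqrt(sum(c * c for c in y))
            if norm > 0.0:  # otherwise X(u) keeps its previous value
                new = [-c / norm for c in y]
                movement += math.dist(new, X[u])
                X[u] = new
        if movement < tol:  # step S135: stop once X no longer moves
            break
    return X
```

Updating the documents one after another, each update already seeing the previous ones, is a deliberate choice in this sketch. If every document were updated simultaneously from X_{i-1}, two uncorrelated documents starting at the same point would both jump to the antipodal point at every iteration and never settle.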

In step S140, a sorting operation is performed on at least a part of the set V of the documents as a function of the values taken by the function X obtained at the last iteration.

Thanks to the determined projection function X, the position of a point X (u) on the sphere S is a function of the links of the document u with the other documents. In particular, the distance between two points is representative of the degree of correlation between the documents corresponding to these two points.

Other mathematical criteria could be used to make the initial random function converge to such a function.

In the case where the set V is a set of Web pages resulting from a search carried out with a search engine, this sorting operation may aim to:

- select the most original pages, by detecting the pages whose projections are farthest from the other projections;

- filter out the pages containing "Spam" (pages that point to each other), by detecting the pages whose projections are substantially close to the projections of a group of pages;

- select the pages whose projections meet a certain criterion.

According to a first variant, the sorting operation comprises the following operations:

computation, for any pair {u, v} of the set V, of the distance d(u, v) = ‖X(u) − X(v)‖,

determination of at least one subset V_1 of the set V for which the values d(u, v) meet a predefined criterion, for example being greater or less than a predefined threshold.
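Building subsets V_1 from the distances d(u, v) can be done in many ways. The sketch below assumes that a cluster is a connected component of the relation d(u, v) < threshold, and finds the components with a small union-find structure. The function name and the threshold are illustrative.

```python
import math

def clusters_within(points, threshold):
    """Group documents linked by chains of close projections.

    points: dict doc -> projection X(u) (tuple or list of d floats).
    Two documents are joined when d(u, v) = ||X(u) - X(v)|| < threshold;
    each returned list is one connected component of that relation.
    """
    docs = list(points)
    parent = {u: u for u in docs}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for i, u in enumerate(docs):
        for v in docs[i + 1:]:
            if math.dist(points[u], points[v]) < threshold:
                parent[find(u)] = find(v)
    groups = {}
    for u in docs:
        groups.setdefault(find(u), []).append(u)
    return list(groups.values())
```

A large cluster of pages whose projections nearly coincide is the kind of group the text associates with "Spam" pages pointing to each other.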

This first variant makes it possible to detect clusters of points on the sphere and thus to determine the corresponding clusters of documents.

According to a second variant, the sorting operation comprises determining a subset V_1 of documents u whose points X(u) all belong to a given set, for example a predefined zone of the space R^d.

This zone may be, for example, the interior volume of a sphere or of a cube, or a surface defined on the sphere S of R^d. By repeating this operation for several predefined zones, the set of documents can be partitioned or segmented.
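Selecting the documents whose projection lies in a predefined zone can be sketched with zones given as predicates. The ball and cube zones follow the examples above; the function names are hypothetical.

```python
import math

def in_ball(center, radius):
    """Zone given by the interior volume of a ball of R^d."""
    return lambda x: math.dist(x, center) < radius

def in_cube(lower, upper):
    """Zone given by an axis-aligned cube (or box) of R^d."""
    return lambda x: all(lo <= c <= hi for c, lo, hi in zip(x, lower, upper))

def select_in_zone(points, zone):
    """Subset V_1 of the documents whose projection X(u) lies in the zone."""
    return [u for u, x in points.items() if zone(x)]
```

Applying select_in_zone with several disjoint zones that together cover the sphere yields the partition of the documents described above.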

The method according to the invention thus makes it possible to perform all sorts of sorting operations on a set of documents, on the basis of the values taken by the determined function X. In addition, it can be shown that the process of determining the function X converges quickly.

In addition, the computation time of one iteration of this process is proportional to the number of hypertext links when the degree of correlation is determined as a function of this number of hypertext links. The method of the invention can therefore be applied to a large number of pages.
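The text does not spell out how the cost of an iteration stays proportional to the number of links, since Y(u) sums over every other document. One way to achieve it, offered here as our reading rather than as the authors' method, is to maintain the running sum T of all points and use Y(u) = (T − X(u)) − Σ_{v : ω(u,v) > 0} ω(u, v) X(v). Only correlated pairs are then visited, and a full sweep costs O(d·(|V| + number of correlated pairs)). The function and parameter names are hypothetical.

```python
import math

def sweep_sparse(X, neighbors, total):
    """One sweep of step S130 that visits only the nonzero correlations.

    X: dict doc -> point on the unit sphere (list of floats), updated in place.
    neighbors: dict doc -> {v: omega(u, v)} listing only pairs with omega > 0.
    total: running sum of all points X(v) (list of floats), updated in place.
    Uses Y(u) = (total - X(u)) - sum over neighbors v of omega(u, v) * X(v),
    which equals sum over v != u of (1 - omega(u, v)) * X(v).
    """
    d = len(total)
    for u in X:
        y = [total[k] - X[u][k] for k in range(d)]
        for v, w in neighbors.get(u, {}).items():
            for k in range(d):
                y[k] -= w * X[v][k]
        norm = math.sqrt(sum(c * c for c in y))
        if norm > 0.0:
            new = [-c / norm for c in y]
            for k in range(d):
                total[k] += new[k] - X[u][k]  # keep the running sum exact
            X[u] = new
```

The running sum must be initialized to the sum of all points before the first sweep; after that, each update adjusts it in constant time per coordinate.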

Finally, if the set of electronic documents is modified (by adding a document, deleting a document or modifying the links between documents), it suffices to start from the function X obtained for the unmodified set and then to execute step S130 for selected documents (preferably at least the documents that have been modified or added), in order to determine a corrected function X that takes into account the modified set of documents. The invention is therefore particularly suitable for processing sets containing a large number of documents, part of which is regularly updated.

In a variant of the method according to the invention, a graphical representation of the function X is generated, that is to say a representation of said sphere and of the points X(u) located on it. Generating such a graphical representation makes it easier for the user to select relevant sets of documents. This representation can take, for example, the form of a two-dimensional cartographic representation, in which each document is represented by a graphic symbol corresponding to the value of the function X determined for this document.
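The update of the function X after the set of documents has been modified can be sketched by reusing the previous projection and reapplying step S130 only to selected documents. Starting added documents from a random point, making a single pass, and the function name are assumptions.

```python
import math
import random

def refresh_projection(X_old, documents, omega, changed, d=3, seed=None):
    """Correct an existing projection after the document set has changed.

    X_old: projection X obtained for the unmodified set (doc -> point).
    documents: the modified set V; deleted documents are simply absent.
    omega: function (u, v) -> degree of correlation on the modified set.
    changed: documents to which step S130 is applied again (at least the
             modified and added ones). Added documents start from a random
             point and a single pass is made; both choices are assumptions.
    """
    rng = random.Random(seed)
    X = {u: list(X_old[u]) for u in documents if u in X_old}
    for u in documents:
        if u not in X:  # added document: random starting point on S
            x = [0.0] * d
            while not any(x):
                x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            norm = math.sqrt(sum(c * c for c in x))
            X[u] = [c / norm for c in x]
    for u in changed:  # step S130 restricted to the selected documents
        y = [0.0] * d
        for v in documents:
            if v != u:
                delta = 1.0 - omega(u, v)
                for k in range(d):
                    y[k] += delta * X[v][k]
        norm = math.sqrt(sum(c * c for c in y))
        if norm > 0.0:
            X[u] = [-c / norm for c in y]
    return X
```

Documents outside the changed list keep their previous positions, so the cost of the correction grows with the size of the update rather than with the size of the whole set.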

The invention thus lends itself to an embodiment in which this graphical representation is displayed on a user's computer terminal comprising a display screen and a graphical selection tool (for example a mouse used in combination with a pointer for defining graphic areas on the screen), this tool being suitable for selecting at least a portion of the graphical representation.

The user is then able to make a selection of one or more parts of the graphical representation corresponding to one or more sets, chosen by him, of documents. The terminal obtains via the graphical selection tool data defining the selected parts. According to these data, the terminal sorts the set V of the documents. For example, it generates a reduced list of documents, corresponding to documents whose projection is in the parts selected by the user. Alternatively, the documents whose projection is in the selected parts are instead eliminated. From the list of documents retained by the user, additional sorting operations can be performed, these operations being performed automatically on the basis of document properties or their degree of correlation, or performed manually, based on new parts selected within the initially selected parts.
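The selection workflow above can be sketched with a longitude/latitude (equirectangular) map of a sphere in R^3, using a rectangular selection in place of the area drawn with the mouse. The choice of map projection, the rectangular selection and the function names are assumptions; the text does not specify them.

```python
import math

def to_map(x):
    """Longitude and latitude (degrees) of a point (x, y, z) on the unit sphere."""
    lon = math.degrees(math.atan2(x[1], x[0]))
    lat = math.degrees(math.asin(max(-1.0, min(1.0, x[2]))))  # clamp rounding
    return lon, lat

def select_on_map(points, lon_range, lat_range, keep=True):
    """Sort documents according to a rectangle selected on the 2-D map.

    points: dict doc -> point X(u) on the unit sphere of R^3.
    lon_range, lat_range: (min, max) bounds of the selected rectangle.
    keep=True returns the reduced list of documents inside the selection;
    keep=False eliminates them and returns the other documents instead.
    """
    def inside(x):
        lon, lat = to_map(x)
        return (lon_range[0] <= lon <= lon_range[1]
                and lat_range[0] <= lat <= lat_range[1])

    return [u for u, x in points.items() if inside(x) == keep]
```

Calling select_on_map again on the retained documents, with a smaller rectangle, corresponds to the manual refinement within the initially selected parts.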

This way of viewing the results of a search performed by a search engine is particularly ergonomic for the user. It brings out communities of documents as sets of points close to each other.

The projection representation as defined in the invention therefore makes it possible to sort or classify documents either visually and manually, by means of a graphical selection tool, or automatically, according to predefined criteria related to the position of these documents in the generated representation.

According to a preferred implementation, the steps of the method of sorting electronic documents according to the invention are determined by the instructions of a computer program.

The term "computer program" herein refers to one or more computer programs forming a set (software) whose purpose is the implementation of the invention when it is executed by an appropriate computer system. The method according to the invention is then implemented when the aforesaid program is loaded in computer means incorporated, for example, in a user terminal connected if necessary to an Internet type network and equipped with Internet browser software. Accordingly, the invention also relates to such a computer program, particularly in the form of software stored on an information carrier. Such an information carrier may be constituted by any entity or device capable of storing a program according to the invention.

For example, the medium in question may comprise a hardware storage means, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, or a magnetic recording means, for example a hard disk. As a variant, the information carrier may be an integrated circuit in which the program is incorporated, the circuit being adapted to execute, or to be used in the execution of, the method in question. The information medium can also be a transmissible immaterial medium, such as an electrical or optical signal that can be conveyed via an electrical or optical cable, by radio or by other means. A program according to the invention can in particular be downloaded from an Internet-type network.

From a design point of view, a computer program according to the invention can use any programming language and be in the form of source code, object code, or code intermediate between source code and object code (for example, a partially compiled form), or in any other form desirable for implementing a method according to the invention.

Claims

1. A method of sorting a set of electronic documents, comprising:

a determination step (S110), for each pair of documents {u, v} of said set, of a degree of correlation ω(u, v) between the documents u and v, said degree of correlation being a function of the number of links obtained at the end of the counting step,

a determination step (S120, S125, S130, S135), for each document u of said set, of an associated point X(u) located on a sphere of the set R^d, where R is the set of real numbers and d is a positive integer, such that, for at least one document u1 of said set, the distance in R^d between the associated points X(u1) and X(u2), where u2 is a document correlated with u1, is smaller the higher the degree of correlation between the documents u1 and u2,

a step of sorting (S140) at least a part of said set of documents according to the points determined on said sphere.

2. Method according to claim 1, wherein at least one of said documents has at least one hypertext link to at least one other document, the degree of correlation between two documents u and v being determined according to the number of hypertext links and/or the number of cocitation links present between the documents u and v, the degree of correlation being higher the higher the number of links, and the absence of correlation corresponding to the absence of links.

3. Method according to claim 1 or 2, in which the degree of correlation between two documents u and v is a function of a measure of proximity of the semantic contents of the documents u and v, the degree of correlation being higher the smaller said measurement, the absence of correlation corresponding to a measurement lower than a predefined threshold.

4. Method according to claim 1 or 2, further comprising:

a step (S120) of defining an initial function X_0 for projecting said set onto said sphere,

a determination step (S125, S130, S135) of a function X for projecting said set onto said sphere, said projection function X being obtained from the initial function X_0 in at least one iteration, each iteration comprising determining a function X_i from the function X_{i-1} obtained at the preceding iteration by replacing, for at least one document u of said set, the value X_{i-1}(u) by the value X_i(u) that optimizes a predefined criterion, which is a function of the value X_{i-1}(u) as well as of the values X_{i-1}(v) and of the degrees of correlation ω(u, v) between the documents u and v for any document v belonging to said set.

5. Method according to claim 3 or 4, wherein the optimization of said predefined criterion consists in maximizing, for the document u, the value of a quantity Δ(u) equal to

$$\Delta(u) = \sum_{v \in V \setminus \{u\}} \delta(u,v)\,\lVert X_i(u) - X_{i-1}(v) \rVert^2$$

with δ(u, v) = 1 − ω(u, v), 0 ≤ ω(u, v) ≤ 1, and ω(u, v) = 0 in the absence of correlation between the documents u and v, the value X_i(u) being equal to

$$X_i(u) = -\frac{Y(u)}{\lVert Y(u) \rVert} \quad \text{with} \quad Y(u) = \sum_{v \in V \setminus \{u\}} \delta(u,v)\,X_{i-1}(v)$$

if Y(u) ≠ 0, and the value X_i(u) being equal to X_{i-1}(u) if Y(u) = 0.

6. The method of any one of the preceding claims, further comprising a step of generating a graphical representation of said sphere and of the points X(u) located on said sphere.

7. The method of claim 6, further comprising the steps of:

displaying said graphic representation on a terminal,

providing a user of the terminal with an appropriate graphical selection tool for graphically selecting at least a portion of said graphical representation,

obtaining data defining said at least one portion selected by said user,

sorting said set of documents according to said data.

8. A program comprising program code instructions recorded on a computer-readable medium for implementing a method as claimed in any one of claims 1 to 7 when said program is executed by a computer system.

9. Data processing apparatus comprising data processing means for executing the steps of a method according to any one of claims 1 to 7.

10. A recording medium, readable by a computer system, comprising a program comprising program code instructions for implementing a method according to any one of claims 1 to 7 when said program is executed by a computer system.