AU2007231745A1 - A method of analyzing and processing requests applied to a search engine - Google Patents

A method of analyzing and processing requests applied to a search engine

Info

Publication number
AU2007231745A1
Authority
AU
Australia
Prior art keywords
words
index
word
additional
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2007231745A
Inventor
Ronald Moreno
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovatron SA
Original Assignee
Innovatron SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/598,229 external-priority patent/US20080113801A1/en
Application filed by Innovatron SA filed Critical Innovatron SA
Publication of AU2007231745A1 publication Critical patent/AU2007231745A1/en
Abandoned legal-status Critical Current


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Description

AUSTRALIA
Patents Act 1990
COMPLETE SPECIFICATION
Standard Patent
Applicant(s): INNOVATRON
Invention Title: A METHOD OF ANALYZING AND PROCESSING REQUESTS APPLIED TO A SEARCH ENGINE
The following statement is a full description of this invention, including the best method for performing it known to me/us:

A METHOD OF ANALYZING AND PROCESSING REQUESTS APPLIED TO A SEARCH ENGINE

The invention relates to a method of analyzing and processing requests applied to a search engine.
More precisely, it seeks to improve the performance of search engines so as to extract looked-for information in a manner that is more efficient and less expensive (in terms of computer resources).
The recent development of applications associated with using the Internet has made the use of search engines such as Google (trademark of Google Inc.) or Exalead (trademark of Exalead SA) popular when searching for information, because of the ability of such search engines to index the content of several billions of pages that are available for consultation on all sorts of Internet sites.
These search engines (referred to below as "engines") are used by sending them a request containing a word or a "phrase", i.e. a plurality of words (in the absence of particular operators such as AND, OR, etc., the words are deemed to be associated by logic ANDs).
The engine replies to the request within a fraction of a second, giving a numerical value (labeled "results") that is representative of the number of occurrences ("hits") of pages containing the word (or the words of the phrase) amongst the set of pages indexed by the engine, together with a list of corresponding pertinent Internet sites.
The number of occurrences (referred to below as the "result") given in return by the engine depends very greatly on the words selected to constitute the request.
For some words, this number can reach tens or even hundreds of thousands or millions, whereas for other words it is a very small number (a few tens, or even a few units). If the user submits terms that are very general, the result will be extremely high; conversely, if the user is very precise, then the result will be much smaller.
The first generations of search engines operated by analyzing the indexed pages on the basis of the criterion of an exact, literal match with the word provided in the request.
That led users to proceed by successive approximations, thereby increasing the number of requests, with the terms presented to the search engine being modified in those requests, with certain terms being added or removed, etc., in a manner that was completely empirical and without any certainty that the attempted modifications might contribute to causing the engine genuinely to converge on the topic of the search.
In addition, experience shows that users are poor at evaluating the discriminating power of words for computer search engines in their own search processes, thus leading to results that are erratic and poorly controlled.
In addition to the risk of pertinent responses being omitted, that also results from a technical point of view in pointless multiplication of the number of requests, with a corresponding increase in traffic on the network, and for the engine a considerable increase in the resources needed to process the stream of requests in a satisfactory manner and in a reasonable time.
The ever increasing presence of the Internet and the use of search engines by the general public, and also the concentration of requests on fewer and fewer engines that are considered as having the best performance, makes it likely that a considerable increase in computer resources will be necessary for those search engines in the years to come.
Those computer resources relate both to the computation power needed for processing the requests proper, and to the management of queues, which present complexity and storage space requirements that increase with increasing incident traffic.
Present developments are tending towards searching not on the basis of an exact match with a word, nor even with variants of the word (singular/plural, declined form, variant spelling, etc.), but on words that are related, i.e. that belong to the same semantic field as the initial word, so as to increase the search potential and make it possible to converge more quickly on the topic of the request.
One such algorithm is proposed under the name Google Sets (trademark of Google Inc.), which is predictive software serving to generate words (common nouns or proper nouns), e.g. a list of fifteen words, on the basis of only two or three words given by a user, with the list that is given being supposed to belong to the same semantic field as the words given by the user. For example, the user types in the names of two or three US states, and the generator produces a list of all the states of that country. In the same manner, it can generate the names of the presidents of the United States, of a company, etc., starting from two or three examples given thereto. The software is perfectly capable of distinguishing between {Johnson Pfizer}, which should give a list of pharmaceutical companies, and {Johnson Washington}, which would give a list of the names of presidents of the United States.
WO-A-2004/031916 (Google Inc.) describes an algorithm of that type, for selecting clusters of words that are conceptually related to one or more words constituting the input parameters to the algorithm. The clusters are selected from a complex probabilistic model, so as to characterize, from the semantic point of view, the document from which the initial words are extracted.
However, that algorithm is essentially a generator of words or of clusters of related words; it does not seek to evaluate the semantic proximity, relative to an initial word, of a given word coming from an external source.
There thus exists a need for a tool capable of evaluating the semantic pertinence of a given word (referred to below as an "additional word"), or of a family of words considered collectively, relative to another given word (referred to below as the "starting word"), representative of the general semantic field of the search.
The invention proposes such a tool that can be used with words (starting words and/or additional words) produced by an operator or automatically by a word generator, or else selected and submitted by a user.
These words constitute input parameters to the algorithm implemented by the method of the invention, which does not set out to produce words, but to evaluate their relative pertinence from the semantic point of view.
Another object of the invention is to propose such a method that makes it possible in non-empirical manner to improve the process whereby interrogation requests applied to a search engine are processed, in particular giving the following results: reducing processing time, and correspondingly accelerating the final response to the user; reducing the overall number of requests, thus enabling the management of queues and the associated storage resources to be optimized; and reducing overall traffic, thus making traffic more fluid.
In addition, as well as these concrete and quantitative improvements directly associated with the operation of the search engine and of the network, the user will incidentally benefit from an improvement in the quality of the search, since the method of the invention can give the user a new browsing heuristic, replacing the present method that is empirical and uncertain in the results it achieves.
These improvements can be of benefit in particular to people using documentation databases, in applications such as managing spare parts or archive documents, etc., and specifically, according to a particular characteristic of the invention, these improvements can be of benefit by optimizing access to the references that are most used.
Incidentally, the method of the invention can even be used for teaching or entertainment purposes, for example helping a user when confronted with a given starting word to look for additional words that are the most pertinent from the point of view of their semantic distance. The method can be used to return to the user, for each submitted additional word, a quantification of its pertinence relative to the starting word. The method can also be implemented between a plurality of users, e.g. to compare their respective abilities at looking for pertinent words.
More precisely, the method of the invention is a method implemented by a computer system of the general type corresponding to the precharacterizing portion of claim 1, and comprising the specific steps set out in the characterizing portion of said claim 1. The subclaims relate to particular, advantageous implementations of the method.
Various implementations of the invention are described in greater detail below with reference to the sole accompanying figure, which is a block diagram of a computer system suitable for implementing the method of the invention.
In Figure 1, reference 10 designates computer terminals, each comprising a microcomputer 12 connected via an interface 14 to a telecommunications network 16, which may be the wired telephone network (whether in switched mode or in ADSL mode), a cable TV network, or indeed an Internet connection via a server that is common to a plurality of stations.
The system may also have some number of cell phone terminals 20 connected by a wireless link to an interface 30 of the mobile telephone switching type. The telephone terminals 20 are provided with functions enabling them to exchange data in digital form, in particular text, with the interface 30 using a variety of well-known technologies such as SMS, WAP, GPRS, UMTS, etc. The user keys in data that is sent to the interface 30, and receives messages therefrom that are displayed on the screen of the telephone.
The system also includes a central computer site 40, typically an Internet site. Functionally, the site comprises a unit 42 forming a search engine (or coupled to a remote search engine), associated with a formatting unit 44 having two subunits 46 and 48 capable of formatting messages as a function of the type of terminal used. More precisely, the subunit 46 formats messages so that they can be received and displayed by the microcomputers 12, e.g. in the form of web pages displayable by a browser, and then it sends them to the network 16 over a connection 50. In contrast, the subunit 48 performs formatting adapted to display on mobile telephones, e.g. in the form of WAP pages sent to the interface 30 via a connection 52. The interface is also connected to the unit 42 via a connection 54 enabling the data received from the telephone terminals connected to the network to be transferred to the unit 42. It should be observed that the content of the messages formatted by the units 46 and 48, i.e. the content of the messages exchanged respectively with the computer terminals 10 and the telephone terminals 20, is identical; the only change lies in the layout with which the information is to be displayed by one or the other type of terminal.
It should be observed that the invention can be implemented by computer systems other than the system described above, which presents no limiting character.
In particular, the method of the invention can be implemented either by a dedicated device, or it can be integrated in a pre-existing application such as a browser or a spreadsheet, by a software module or by macro-instructions including a specific function serving to execute the steps of the method described below. The formulae and the macro-instructions of a spreadsheet such as Excel (trademark of Microsoft Inc.) driven by a macro-language such as QuicKeys (trademark filed by CE Software Inc.) are particularly suitable for executing the various calculations and classifications described below, and also for presenting the results and enabling the user to input various parameters and data.
There follows a description of the manner in which the computer system implements the invention.
For simplicity and clarity of description, the description below assumes that the various terms being analyzed are words selected by people (users) who input these words, e.g. via a computer keyboard or a mobile telephone for transmission to the central site, and also receive from the central site the results of the processing, which they then make use of themselves for their own purposes. However, the invention is equally applicable to configurations in which the terms are produced by word-generator algorithms, as in above-mentioned WO-A-2004/031916, which algorithms are capable of automatically delivering clusters of words that are conceptually linked by applying a probabilistic model.
Similarly, the invention is applicable to situations in which the results delivered by the method of the invention are reused in automatic processing, e.g. to optimize searching for groups of words belonging to a common semantic field, for comparing the relative pertinence of various groups of words, for improving the convergence of a search, etc.
In other words, the method of the invention can be implemented in full or in part with a "user" who is not necessarily a physical person, i.e. the method can be interfaced equally well upstream (parameter input) and downstream (making use of the results) with a distinct device that performs automatic processing on all or some of the data.
In general, the starting point of the invention lies in the observation that a search engine provides not only links to all of the pages containing a given word, but also provides additional information, itself of very great value, namely the number of pages on which the word appears. This information generally appears under the heading "Results" as returned by the engines.
For example, {capacitor} gives a result of 1,140,000 occurrences (by convention, the curly braces used in the text of the present description specify the term or the series of terms submitted to the engine). This result is better than the result for {multiplexer} (188,000), but smaller than that for {resistor} (2,020,000). A more refined function of engines consists in searching not for occurrences of a single word, but of two, three, or more words that are united in a "phrase" in a single request. For example, {resistor inductor} gives 104,000 occurrences. From this it can be deduced that of the 2,020,000 pages that contain {resistor}, only about one in twenty also contain the word {inductor}.
It is this property of engines that the method of the present invention exploits specifically.
Essentially, the basic idea of the method of the invention consists, after selecting a first word (the "starting word"), in finding another word (the "additional word") that produces the greatest result.
Thus, the starting word {transport} gives about 48,000,000 occurrences. It is then up to the user to find an additional word which, in combination with {transport} will produce the largest possible number of occurrences.
For example, {beach} is a poor choice: the engine finds only about 2,300 occurrences for the phrase {transport beach}, whereas {airplane}, {subway}, and above all {train} provide much better results:

Word(s)              Result
transport            48,000,000
transport airplane   585,000
transport subway     1,250,000
transport train      2,800,000
transport journey    1,140,000
transport car        611,000

It can be seen that it is {train} that is the "best" word, in that its result is greater than the result for {subway} and {car}.
However, when looking a little deeper into what appears to the user to be the lexical field of {transport}, it is possible to find something better than {train}:

Word(s)              Result
transport bus        3,400,000

I.e. {bus} gives a result that is better than that for {train}. It is thus finally the word {bus} that is the "best" word, at least for these six attempts.
A composite term in the form of a combination of terms, e.g. {"railway line"} in the above example, can be deemed to be an additional word constituted by a single term (where the quotation marks inside the curly braces correspond to the syntax used for interrogating engines, i.e. the engine will search for occurrences of the non-separable string of words between the quotation marks).
Advantageously, prior to delivering the result of the phrase containing the selected additional word, the method provides for verifying the result of the additional word considered in isolation, and for refusing the additional word if, on its own, it presents a result that is greater than that of the starting word. This serves to avoid the method being biased by selecting as additional words terms that are relatively meaningless, such as an article or an adverb, etc., which would (often) give a large result.
The running of the method can be expressed in particular by the simplified metalanguage flow chart given below, which is given purely by way of illustration (SW stands for starting word, AW stands for additional word):

    input (SW)
    get result (SW)
    count = 0
    while not STOP
        input (AW)
        test result (AW) < result (SW)
        get result (SW AW)
        if result (SW AW) > Highest then Highest = result (SW AW)
        count = count + 1
        if count = 10 then END
    wend

The result may be presented either as a number of occurrences for the combination {(starting word) (additional word)}, or in the form of a ratio of that result divided by the result for the starting word in isolation.
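By way of illustration only, the loop above can be sketched in Python as follows. The get_result() helper is hypothetical: it stands for whatever interface is used to submit a request to the engine and read back the number of occurrences it reports; no particular search-engine API is implied.

    def get_result(query: str) -> int:
        # Hypothetical helper: submit `query` to the search engine and
        # return the number of occurrences ("Results") it reports.
        raise NotImplementedError  # depends on the engine actually used

    def best_additional_word(starting_word, candidate_words, max_attempts=10):
        # Follow the flow chart: try up to `max_attempts` additional words and
        # keep the one whose combination with the starting word scores highest.
        sw_result = get_result(starting_word)
        highest, best = 0, None
        for aw in candidate_words[:max_attempts]:
            # Refuse an additional word that, on its own, scores higher than
            # the starting word (articles, adverbs and other empty terms).
            if get_result(aw) > sw_result:
                continue
            combined = get_result(starting_word + " " + aw)
            if combined > highest:
                highest, best = combined, aw
        return best, highest

With the {transport} figures quoted above, the call best_additional_word("transport", ["beach", "airplane", "subway", "train", "journey", "car"]) would retain {train} (2,800,000 occurrences), before {bus} is tried.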
Thus, in the following example corresponding to requests concerning key components in electronic engineering (the above-mentioned ratio being multiplied by 100 to make it more readable):

Word(s)                   Result      Ratio*100
engineering               1,100,000   100.0000
engineering memory        595,000     54.0909
engineering register      159,000     14.4545
engineering relay         150,000     13.6364
engineering meter         71,100      6.4636
engineering buffer        54,700      4.9727
engineering integrator    54,300      4.9364
engineering decoder       44,200      4.0182
engineering shielding     41,400      3.7636
engineering multiplexer   15,500      1.4091
engineering schottky      936         0.0851

For a ratio of less than 1‰ (i.e. a displayed Ratio*100 value of less than 0.1), the additional word is considered as being "off subject", not pertinent, and not considered as belonging to the semantic field of the starting word.
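Continuing the sketch, the ratio presentation and the "off subject" test of the preceding paragraph might be coded as follows; the 0.1 cut-off on the displayed Ratio*100 value is the one quoted in the text, and the figures are those of the table above.

    def pertinence_table(sw_result, aw_results, cutoff=0.1):
        # Compute Ratio*100 for each additional word and flag words whose
        # displayed ratio falls below the cut-off as "off subject".
        rows = []
        for word, result in sorted(aw_results.items(), key=lambda kv: kv[1], reverse=True):
            ratio = 100.0 * result / sw_result
            rows.append((word, result, round(ratio, 4), ratio < cutoff))
        return rows

    table = pertinence_table(1_100_000, {"memory": 595_000, "schottky": 936})
    # -> [('memory', 595000, 54.0909, False), ('schottky', 936, 0.0851, True)]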
This property can be used in particular in a variant seeking to find words for which the relative semantic distance is the greatest, for example to isolate certain words for which it is desired to establish with certainty and automatically that their probability of belonging to the same semantic field as the starting word is tiny.
In other words, under such circumstances, the method is no longer based on the "sense" of words, but on the contrary on "non-sense".
In yet another variant, instead of initiating the method with a starting word constituted by an isolated term (e.g. {engineering}), it is possible for the starting word to be a combination of words (e.g. {engineering memory}).
Thus, after a first execution of the method begun with an isolated starting word, it is possible to use one of the combinations {starting word, additional word} that have been submitted to the search engine, and to run the algorithm again with this particular combination of words as the new starting word.
Weighting may optionally be applied to take account of the number of attempts, with the result being decreased by an amount that increases with increasing number of attempts. Conversely, a bonus could be applied if the "best" word is found on the first attempt.
In a particular implementation, for a given starting word (e.g. {transport}), the central site selects a plurality of words (e.g. {bus}, {train}, and {subway}) and presents these words to the user, without giving the results and in an arbitrary order, asking the user to select amongst those words the word which the user believes will give the highest result, or in a variant the best two or indeed the best three results, i.e. under such circumstances the method amounts to classifying the words by order of decreasing result. It is also possible to ask the user to select or classify three, four, or five words amongst ten or twenty words proposed.
The responses may be weighted by a bonus that increases with increasing pertinence of the selection and/or the classification, and vice versa.
In another aspect, the invention can be applied to analyzing not a succession of words considered in isolation, but a set of words considered together. It is difficult to evaluate intuitively, and a fortiori to quantify, the overall pertinence of a group of words, and to distinguish a group of words that are close to the sense, from a group of words that are far from the sense (where the "sense" is defined by the starting word).
The idea is then to estimate, relative to a given starting word, the relative overall pertinence of two (or more) lists of n words (e.g. n = 10 words), i.e. to determine which is the richest, semantically speaking.
To do this, a "belonging index" is calculated for each list, which index is representative of the pertinence of the list relative to the starting word.
This belonging index then makes it possible to compare the various lists with one another and also to eliminate or retain a given list by comparing its index with a given threshold, which threshold may itself be parameterizable as a function of the severity required for the discrimination.
There follows an example of this technique being implemented, given with reference to the following Table 1.
Table 1

Starting word   Additional word   Result      Relative result
electronic      -                 9,210,000   100.0000
electronic      memory            1,720,000   18.6754
electronic      relay             1,200,000   13.0293
electronic      register          1,160,000   12.5950
electronic      counter           1,090,000   11.8350
electronic      decoder           664,000     7.2096
electronic      integrator        379,000     4.1151
electronic      shielding         245,000     2.6602
electronic      buffer            155,000     1.6830
electronic      multiplexer       87,700      0.9522
electronic      zener             49,400      0.5364
Average of 10 words: 7.329
Belonging index (MAX/AV): 2.55   [index too high relative to threshold: list not pertinent]

Starting word   Additional word   Result      Relative result
electronic      -                 9,210,000   100.0000
electronic      comparator        1,750,000   19.0011
electronic      gate              1,620,000   17.5896
electronic      resistor          1,280,000   13.8979
electronic      amplifier         1,240,000   13.4636
electronic      choke             1,050,000   11.4007
electronic      diode             687,000     7.4593
electronic      decoder           664,000     7.2096
electronic      radiator          514,000     5.5809
electronic      transformer       475,000     5.1574
electronic      capacitor         445,000     4.8317
Average of 10 words: 10.559
Belonging index (MAX/AV): 1.80   [LIST PERTINENT]
In this example, the starting word is electronic. From this starting word, the search engine gives a result of 9,200,000 occurrences.
The first step consists in establishing two lists of additional words, e.g. two lists of ten words each.
In the example of Table 1, the first list comprises the words: memory, relay, register, ..., multiplexer, zener. Similarly, the second list comprises the words: comparator, gate, resistor, ..., transformer, capacitor.
The following step consists in classifying the various additional words in each list by order of decreasing result. The two lists of Table 1 are presented in this way, with results being expressed as a percentage (18.67%, 13.02%, etc.) relative to the result for the starting word (100%).
The following step consists in calculating the average of the ten words (or of the ten best words if a larger number of words were tested), i.e. 7.329% for the first list and 10.559% for the second list.
This average could be used as the belonging index, to evaluate the pertinence of the first list (average: 7.329) compared with the second list (average: 10.559). However, experience shows that in practice that criterion does not always optimize the evaluation of the relative pertinence of the list.
The invention proposes further improving discrimination by applying weighting.
This weighting can in particular be a function of the highest result obtained for the various words in the list, i.e. a function of the result for the "best" additional word.
For this purpose, the belonging index may be defined as being the quotient of the result obtained with the best additional word in the list (MAX) divided by the average for all of the words in the same list (AV). In the example of the two lists of Table 1, for the first list the belonging index as calculated in this way is: 18.6754 / 7.329 = 2.55, whereas for the second list it is: 19.0011 / 10.559 = 1.80.
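A short sketch of this calculation, using the relative results of Table 1 (the function is purely illustrative; the text defines the index only as the quotient MAX/AV):

    def belonging_index(relative_results):
        # MAX/AV: quotient of the best relative result by the average of
        # the relative results of the additional words in the list.
        avg = sum(relative_results) / len(relative_results)
        return max(relative_results) / avg

    list_1 = [18.6754, 13.0293, 12.5950, 11.8350, 7.2096,
              4.1151, 2.6602, 1.6830, 0.9522, 0.5364]   # memory ... zener
    list_2 = [19.0011, 17.5896, 13.8979, 13.4636, 11.4007,
              7.4593, 7.2096, 5.5809, 5.1574, 4.8317]   # comparator ... capacitor

    print(f"{belonging_index(list_1):.2f}")  # 2.55
    print(f"{belonging_index(list_2):.2f}")  # 1.80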
The index as calculated in this way is a number greater than 1, and the pertinence of the list is greater for decreasing value of the index (closeness to 1), and vice versa. It should be observed that if the index were to be calculated in the reciprocal manner, i.e. AV/MAX, then its value would lie in the range 0 to 1 and high pertinence would be represented by a high value close to unity.
In any event, a value close to unity indicates strong pertinence, representative of words all presenting high scores, close to the maximum score.
Advantageously, the belonging index is compared with a given threshold that is set as a function of the looked-for selectivity: a threshold close to unity corresponds to severe requirements for the pertinence criterion, whereas a threshold that is greater will enable a larger number of lists to be considered as being pertinent.
With the example of Table 1, it can thus be seen that if the threshold is set at 2.5, then the first list will be considered as not being sufficiently pertinent (its belonging index is 2.55), whereas the second list is clearly more pertinent (belonging index: 1.80). Most advantageously, the threshold can be set by the user, e.g. via a graphics interface including arrows or a cursor of a linear potentiometer.
The fact of the pertinence criterion being satisfied or not, i.e. the position of the belonging index for each list relative to the threshold, can be displayed on the same interface in a manner that is immediately perceptible in the form of a Boolean flag, for example a change of color or displaying the word "PERTINENT" in a box (as shown in Table 1).
Using the example of Table 1, setting the threshold at 2.5 causes the message "PERTINENT" to be displayed for the second list, and for the second list only, since the belonging index of the first list (2.55) is too high relative to the threshold, such that the first list is considered as being not pertinent.
If the user modifies the threshold, e.g. by raising it from 2.5 to 2.6, then the message "PERTINENT" will be displayed for the first list also. Conversely, if the user lowers the threshold, e.g. from 2.5 to 1.7, then the message "PERTINENT" will disappear from both lists.
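A minimal sketch of this threshold test follows; the index values and the three threshold settings are those of the example, while the display itself (color change, "PERTINENT" box) is left to the interface.

    def is_pertinent(belonging_index, threshold):
        # Boolean flag: a list is pertinent when its MAX/AV index does not
        # exceed the user-set threshold (an index closer to 1 is better).
        return belonging_index <= threshold

    for threshold in (2.5, 2.6, 1.7):
        flags = ["PERTINENT" if is_pertinent(idx, threshold) else "-"
                 for idx in (2.55, 1.80)]
        print(threshold, flags)
    # 2.5 -> ['-', 'PERTINENT']          only the second list
    # 2.6 -> ['PERTINENT', 'PERTINENT']
    # 1.7 -> ['-', '-']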
Such an interactive display gives the user the possibility of responding directly, because the result that is achieved can be seen immediately, varying depending on how the threshold is set, thus enabling the method of the invention to be implemented in particularly effective manner.
In a more elaborate variant, the weighting of the belonging index can also take account of the difference between the result for the starting word and the corresponding average of the results for the various words in the list.
Such weighting serves in particular to take account of the fact that for a starting word that gives an initial result that is low (in the absolute), it is easy to find additional words that give results that are close (in relative value) to the initial result, leading to an index that is relatively favorable, due to the small standard deviation of the additional words. In contrast, for a starting word that gives an initial result that is high, it is difficult to find additional words that give results that are close to the initial result, thus leading to an index that is relatively mediocre. Taking account of the difference between firstly the result for the starting word and secondly the average of the results for the additional words reflects this increasing difficulty and enables a weighted index value to be given that is more meaningful.
It is also possible to take account of the number of words in the list, considered as a variable and not as a parameter that is frozen from the start. Under such circumstances, the belonging index is weighted so as to appear better for increasing number of words in the list: it is easier to find a pertinent list that is short than one that is long, since there is then an increase in probability of finding terms that are not very pertinent, and that will cause the average to drop. This increased difficulty is taken into account if the index is weighted by the number of additional words in the list.
By way of example, an index of this type could be calculated from the following formula:
INDEX = (MAX/AV) * N
MAX being the result obtained with the best additional word;
AV being the average of the results obtained for all of the words in the list; and
N being the number of words in the list.
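Expressed as code (again using the relative results of Table 1; the formula is transcribed as printed above):

    def n_weighted_index(relative_results):
        # INDEX = (MAX/AV) * N: the MAX/AV quotient weighted by the
        # number N of additional words in the list.
        n = len(relative_results)
        av = sum(relative_results) / n
        return (max(relative_results) / av) * n

    # With the two ten-word lists of Table 1 this gives approximately
    # 2.55 * 10 = 25.5 for the first list and 1.80 * 10 = 18.0 for the second.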
In general, there are numerous ways of implementing the various weightings described above.
By way of non-limiting example, it is possible to calculate a belonging index from the following generalized formula:
INDEX = [1/(SW - MAX)] * [1/(SW - AV)] * N
SW being the result for the starting word;
MAX being the result obtained with the best additional word;
AV being the average of the results for all of the words in the list; and
N being the number of words in the list.
In this index:
  • the factor [1/(SW - MAX)] is representative of the quality of the various additional words found; it reflects the difficulty in establishing a pertinent list when the "best" additional word gives a high result;
  • the factor [1/(SW - AV)] is representative of the uniformity of the results for the various additional words; it reflects low dispersion of these words around the average; and
  • the factor N, corresponding to the length of the list, reflects the more or less verbose nature of the process for generating additional words.
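A sketch of this generalized index follows. The text gives no worked numerical example for it, so the illustration below feeds it the relative results of Table 1 (so that SW = 100); that choice of scale is an assumption made only for the illustration.

    def generalized_index(sw, additional_results):
        # INDEX = [1/(SW - MAX)] * [1/(SW - AV)] * N, with SW the result for
        # the starting word and the N additional-word results on the same scale.
        n = len(additional_results)
        mx = max(additional_results)
        av = sum(additional_results) / n
        return (1.0 / (sw - mx)) * (1.0 / (sw - av)) * n

    list_1 = [18.6754, 13.0293, 12.5950, 11.8350, 7.2096,
              4.1151, 2.6602, 1.6830, 0.9522, 0.5364]
    list_2 = [19.0011, 17.5896, 13.8979, 13.4636, 11.4007,
              7.4593, 7.2096, 5.5809, 5.1574, 4.8317]
    print(generalized_index(100.0, list_1))  # ~0.00133
    print(generalized_index(100.0, list_2))  # ~0.00138

In this illustration the second, pertinent list of Table 1 scores slightly higher than the first, consistent with the weighting factors described above.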
This weighted index makes it possible to classify a plurality of lists of words relative to one another, from the most meaningful to the least meaningful, i.e. from the richest to the poorest, semantically speaking.
Very many variant ways of calculating the index could naturally be envisaged, for example starting from the median value instead of the average value, by introducing non-linearities by raising certain terms to an integer or a fractional power, by applying thresholds or ceilings, etc., so as to optimize the pertinence of the index as a function of the results of practical experiments.
In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word "comprise" or variations such as "comprises" or "comprising" is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.

Claims (12)

  1. 2. The method of claim 1, further comprising, after step f), a step of: g) allocating a belonging index in application of a rule that is a function of said N results; said belonging index being an overall index allocated to the family of N generated additional words; and said belonging index also being a relative index, allocated in consideration of the starting word.
  2. 3. The method of claim 2, in which said belonging index is an index that is a function of the average (AV) of the results for the additional words.
  3. 4. The method of claim 2, in which said belonging index is an index weighted by the value (MAX) of the result corresponding to the additional word giving the highest result. 5. The method of claim 2, in which said belonging index is an index weighted by the value (SW) of the result corresponding to the starting word.
  4. 6. The method of claim 2, in which said belonging index is an index weighted by the number N of additional words.
  5. 7. The method of claims 3 and 4 taken in combination, in which said belonging index is an index that is a function of the quotient (MAX/AV) of the result (MAX) corresponding to the additional word giving the highest result divided by the average (AV) of the results of the additional words.
  6. 8. The method of claims 3 and 5 taken in combination, in which said belonging index is an index that is a function of the difference (SW - AV) between the value (SW) of the result corresponding to the starting word and the average (AV) of the results for the additional words. 9. The method of claims 4 and 5 taken in combination, in which said belonging index is an index that is a function of the difference (SW - MAX) between the value (SW) of the result corresponding to the starting word and the value (MAX) of the result corresponding to the additional word giving the highest result. 10. The method of claims 3 to 6 taken in combination, in which said belonging index is an index that is a function of the expression: [1/(SW - MAX)]*[1/(SW - AV)]*N; SW being the result for the starting word; MAX being the result obtained with the best additional word; AV being the average of the results for all of the words in the list; and N being the number of words in the list.
  7. 11. The method of claim 2, further including, after step g), a step of: h) comparing the belonging index with a given threshold, and producing a Boolean flag whose value reflects whether or not the threshold has been crossed.
  8. 12. The method of claim 1, in which the computer system further includes: at least one terminal (10, 20) suitable for presenting data to a user and enabling the user to provide data to the system, the terminal including said means for submitting requests to the search engine and for receiving in response a numerical result; and a central site (40) coupled to the terminal or to each of the terminals, and to the search engine.
  9. 13. The method of claim 12, in which step a) of determining the starting word is a step of a user selecting a word from a predefined set of words. 14. The method of claim 12, in which step a) of determining the starting word is a step of the central site selecting a word from a predefined set of words. 15. The method of claim 12, in which the system is implemented by a plurality of users and said additional words generated in step c) are words provided by the users to the system, each user providing at least one respective additional word.
  10. 16. The method of claim 12, in which the starting word and the additional words are generated by the system, and the method further includes, after step f), the following steps: i) the central site presenting the user with N additional words in an arbitrary order; j) the user establishing an estimated classification of M words selected from the N additional words presented, where M ≤ N, and communicating the estimated classification to the central site; and k) allocating a pertinence score as a function of the proximity between the estimated classification as communicated by the user and the real classification as determined in step f).
  11. 17. The method of claim 12, in which said terminal is a cell phone and said central site (40) is a remote site connected by telecommunications means (30, 52, 54) to a plurality of said terminals.
  12. 18. The method of claim 12, in which said terminal is a personal computer and said central site (40) is a remote site connected by telecommunications means (14, 16, 50) to a plurality of said terminal sites.
AU2007231745A 2006-11-13 2007-11-07 A method of analyzing and processing requests applied to a search engine Abandoned AU2007231745A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11/598,229 2006-11-13
US11/598,229 US20080113801A1 (en) 2006-11-13 2006-11-13 Game based on combinations of words and implemented by computer means
EP07290334.7 2007-03-03
EP07290334A EP1921550A1 (en) 2006-11-13 2007-03-20 Method of analysing and processing requests applied to a search engine

Publications (1)

Publication Number Publication Date
AU2007231745A1 true AU2007231745A1 (en) 2008-05-29

Family

ID=39400573

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2007231745A Abandoned AU2007231745A1 (en) 2006-11-13 2007-11-07 A method of analyzing and processing requests applied to a search engine

Country Status (2)

Country Link
AU (1) AU2007231745A1 (en)
CA (1) CA2610517A1 (en)

Also Published As

Publication number Publication date
CA2610517A1 (en) 2008-05-13

Similar Documents

Publication Publication Date Title
US10339161B2 (en) Expanding network relationships
US8290924B2 (en) Providing answer to keyword based query from natural owner of information
CN102163198B (en) A method and a system for providing new or popular terms
CN102279887B (en) A kind of Document Classification Method, Apparatus and system
WO2020164276A1 (en) Webpage data crawling method, apparatus and system, and computer-readable storage medium
US20150149447A1 (en) Content processing systems and methods
US20110137882A1 (en) Search Engine Device and Methods Thereof
KR20090100430A (en) Seeking answers to questions
JP2008171395A (en) Analysis and processing method of request applied to search engine
US11361167B1 (en) Determining conversational structure from speech
WO2006081835A1 (en) Method and apparatus for mobile information access in natural language
CN103023753A (en) Method, client-side and system for interactive content correlation output in instant messaging interaction
CN109947952A (en) Search method, device, equipment and storage medium based on english knowledge map
CN102004772A (en) Method and equipment for sequencing search results according to terms
JP2010262343A (en) Dialogue control system and program, and multi-dimension ontology processing system and program
US20170199939A1 (en) Method of and a system for website ranking using an appeal factor
Xu et al. Novelty and topicality in interactive information retrieval
Mahata et al. From chirps to whistles: discovering event-specific informative content from twitter
CN114996549A (en) Intelligent tracking method and system based on active object information mining
WO2020155496A1 (en) Public opinion tracking method and device for combined video-text data, and computer apparatus
Ma et al. Keyword-based semantic analysis of microblog for public opinion study in online collective behaviors
JP4728125B2 (en) Document search method using index file, document search server using index file, and document search program using index file
CN111523034B (en) Application processing method, device, equipment and medium
Ruichen [Retracted] The Basic Principles of Marxism with the Internet as a Carrier
AU2007231745A1 (en) A method of analyzing and processing requests applied to a search engine

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period