WO2002041174A1 - Method for searching, selecting and cartographically representing web pages (Procédé de recherche, de sélection et de représentation cartographique de pages web) - Google Patents
- Publication number: WO2002041174A1 (application PCT/FR2001/003561)
- Authority: WO, WIPO (PCT)
- Prior art keywords: sites, pages, links, intersite, site
Classifications
- G06F16/9538: Presentation of query results (hierarchy: G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing; G06F16/00 Information retrieval, database structures therefor, file system structures therefor; G06F16/90 Details of database functions independent of the retrieved data types; G06F16/95 Retrieval from the web; G06F16/953 Querying, e.g. by the use of web search engines)
- G06F16/951: Indexing; Web crawling techniques (same hierarchy down to G06F16/95 Retrieval from the web)
Definitions
- The present invention relates to browsing the Internet, and more particularly to searching for web pages that match a search equation.
- Two kinds of tools are used to find web pages: directories and search engines. Directories find web pages from a classification of pages made manually by human operators.
- Search engines are computer "robots" that crawl the pages of the Web and make it possible to search for web pages from a search equation, and thus to "find one's way" through the vast set of websites that makes up the Internet.
- Various tools with large computing power, such as AltaVista, Yahoo!, Lycos, Excite and Google, are accessible to the public from any microcomputer equipped with an Internet connection and browser software.
- A search engine consists of one or more computers holding a large database in which millions of web pages are indexed; this database is continuously enriched and updated by the engine's crawls of the Web.
- The information stored in the database generally includes the address (URL) and content of each page, the title and keywords describing the website to which the page belongs, the popularity index of the page (an indicator built from the number of web pages that designate the page by hypertext links), the addresses of the web pages designated by the hypertext links contained in the page, etc.
- A search engine selects relevant web pages from its database by applying selection criteria that may vary from one engine to another but are generally based on the number of occurrences of the terms of the search equation in the pages examined, their position in the pages, the analysis of tags (keywords present in the pages, page titles, etc.) and the popularity index of the page.
- The search result is returned as a list of web pages, each page being presented to the user as a hypertext address (URL), often accompanied by other information such as a summary of the page or the keywords of the search equation shown in context within the page.
- A notable disadvantage of search engines is that the list of web pages returned to the user is generally very long, possibly several hundred pages, arranged in an order of relevance that in practice rarely proves satisfactory. The user is thus forced to read the information provided with the address of each page and, in most cases, to "visit" many pages from the list before finding the one he is looking for or the one that interests him most.
- A general objective of the present invention is to provide a method that reduces the number of web pages presented to a user in response to a search equation, that is simple to implement and that is statistically reliable as regards the relevance of the retained pages.
- A more particular objective of the present invention is to provide a method for selecting web pages from an initial set of pages that may include a very large number of web pages selected by means of one or more search engines.
- The present invention is based on the premise that a page designated by many other pages and/or designating many other pages is likely to be more relevant than an isolated page unrelated to the other pages of the Web.
- Since analyzing the hypertext links in a set of web pages is complex and requires considerable computing power, a first idea of the present invention is to reduce an initial set of web pages to a first set of websites in which the sites are linked by intersite links.
- Another idea of the present invention is to apply filtering based on intersite links to the websites of such a set of sites, in order to obtain a result set comprising a reduced number of sites that form one or more cores of the initial set.
- The present invention provides a method for searching and selecting web pages in relation to a search equation, comprising: a step of determining, via at least one search engine, an initial set of web pages; a step of determining a first set of websites comprising the sites that correspond to the web pages of the initial set, in which the sites are linked by intersite links, one site being linked to another by an intersite link when there is at least one hypertext link between web pages of the two sites considered; and at least one filtering operation based on intersite links, applied to the first set of sites and comprising the elimination of the sites linked to the other sites of the first set by fewer than N_L intersite links, N_L being a filtering parameter at least equal to 1, to obtain at least a first reduced set of sites comprising at least one core of rank N_L of the first set of sites.
- A site is linked to another site by one and only one intersite link when there are several hypertext links in the same direction between web pages of the two sites considered.
- A site is linked to another site by one and only one intersite link when there are hypertext links in opposite directions between web pages of the two sites considered.
- The filtering operation is carried out by leaf stripping and comprises repeating a step of eliminating the sites connected by fewer than N intersite links, for increasing values of N starting from an initial value N_0 and up to at least the value N_L, which defines the filtering depth.
- The method comprises at least a second filtering operation, applied to the first set of sites from which the sites belonging to the first reduced set of sites have been removed, to obtain at least a second reduced set of sites comprising cores of lower rank formed by sites linked by fewer than N_L intersite links.
- The method comprises a step of weighting the intersite links of the first set of sites, consisting of assigning a specific weight to each intersite link.
- The method comprises weighting the sites by assigning to each site a weight equal to the sum of the weights of the intersite links of that site.
- The weighting of an intersite link comprises a step of assigning a determined weight to the hypertext links connecting the respective pages of the two sites considered, and a step of summing the weights of the hypertext links that underlie the intersite link.
- The weighting of an intersite link depends on the rank of the core or cores to which the sites linked by that intersite link belong.
- The method comprises a step of ranking the sites according to the weights of their intersite links.
- The method comprises a step of presenting, on a display means, the sites of at least one reduced set of sites, or the pages of the initial set of pages that belong to sites of at least one reduced set of sites.
- The method comprises presenting the websites on a display means as interactive objects selectable by a user, the selection of a site object by the user triggering the display, as selectable interactive objects, of the web pages that belong both to the selected site and to the initial set of pages.
- The method comprises presenting the websites on a display means, with the intersite links displayed in a visual form comprehensible to the user.
- The steps of determining an initial set of pages and a first set of sites comprise the steps of: searching for pages likely to be relevant with regard to a search equation, to form a first primary set of pages; determining the sites corresponding to the pages of the first primary set of pages, to form a first primary set of sites; searching for pages linked by hypertext links to the pages of the first primary set of pages and/or to the sites of the first primary set of sites, to form at least a second primary set of pages; determining the sites corresponding to the pages of the second primary set of pages, to form at least a second primary set of sites; merging the first and second primary sets of pages to form the initial set of pages; and merging the first and second primary sets of sites to form the first set of sites.
- The second primary set of pages comprises pages that designate pages belonging to the sites of the first primary set of sites.
- The second primary set of pages comprises pages designated by pages belonging to the sites of the first primary set of sites.
- The present invention also relates to a digital computer programmed to execute the method according to the invention.
- The present invention also relates to a computer program recorded on a medium and loadable into the memory of a digital computer, containing program code executable by the computer and arranged to execute the steps of the method according to the invention.
- FIG. 1 is a flowchart describing the general organization of the method of the invention
- FIG. 2 schematically represents the Internet network and illustrates an example of implementation of the method according to the invention
- FIG. 3 is a flowchart describing steps for forming an initial set of web pages and a first set of websites
- FIG. 4 schematically illustrates the method described by the flow diagram of FIG. 3
- FIGS. 5A to 5C illustrate a method according to the invention for determining intersite links and for weighting these links
- FIG. 6 illustrates a simplified example of a set of websites comprising sites linked by intersite links
- FIG. 7 illustrates a filtering method according to the invention
- FIG. 8 is a flow chart describing the filtering method according to the invention.
- FIGS. 9A to 9C illustrate a step of cartographic representation of the result of a filtering according to the invention.
- FIG. 1 describes the general organization of the process of searching and selecting web pages according to the invention.
- Step 10 forms an initial set EPI of web pages from a search equation, and step 20 forms a first set ES1 of sites corresponding to the pages of the initial set EPI.
- In step 25, the intersite links between the sites of the set ES1 are determined.
- The method according to the invention comprises a filtering step, called "filtering for core search", which is applied to a set of websites referenced ES2 that initially contains all or part of the sites of the set ES1.
- At the end of the filtering, a reduced set of sites ES2' is obtained, comprising a small number of sites that form one or more cores of the set ES1; the number of sites depends on the topography of the first set of sites ES1 on the one hand and on the chosen filtering depth on the other.
- Filtering can yield several result sets, obtained by modifying the filtering configuration or the topography of the starting set.
- The result is then displayed. This display consists of presenting the selected sites as interactive site objects: the web pages of the initial set EPI that belong to a site can be viewed by selecting its site object with a screen pointer, and a viewed web page can then be selected to access it directly.
- Such an interactive presentation of the results constitutes an efficient and practical human-machine interface for finding sought-after Web pages, as will become clear later.
- The method according to the invention is executed by a microcomputer 10 that is connected to the Internet 20 and can access various search engines and various websites. Three search engines E1, E2, E3 and four websites ST1, ST2, ST3, ST4 are represented in FIG. 2, the site ST4 being a hosting site that hosts the sites STA, STB and STC.
- The microcomputer 10 conventionally comprises a central unit 11, a screen 12, a keyboard 13, a mouse 14 or any other means of controlling a screen pointer, and a means 15 of connection to the Internet, such as a modem or a router.
- The central unit 11 comprises various elements not shown but well known to those skilled in the art, in particular a microprocessor, a random access memory (RAM), a ROM and/or FLASH EEPROM memory containing the operating system of the microprocessor, and a mass memory such as a hard disk containing the operating system of the microcomputer and various application programs.
- The mass memory notably contains a web browsing program and a program for searching and selecting websites according to the invention.
- This program is loaded onto the hard disk of the central unit from a program medium, for example a CD-ROM or DVD-ROM 16.
- The program according to the invention can also be loaded into the central unit through a private intranet. It could also, in the future, be downloaded via the Internet.
- Each of the sites ST1 to ST4 represented comprises a plurality of web pages 30 that can be accessed directly by means of their addresses, called "URLs" (Uniform Resource Locators).
- The address of a website generally constitutes the root of the addresses of the pages of this site.
- The address of a website can be extracted from the address of a web page by searching for the root of the address by means of a subroutine called a "parser", known per se to those skilled in the art.
- Such a parser reads the page address from its first character until it finds the first slash "/" following the double slash "//" of the http (HyperText Transfer Protocol) prefix, which makes it possible to extract the address of the site.
- For sites hosted by a hosting site, such as the sites STA, STB and STC hosted by the site ST4, extracting the site address from the address of a page requires parsing further, up to the second slash after the http prefix, because the first root of the page addresses is then the address of the hosting site, which one does not wish to retain as a site address.
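The parsing rule above can be sketched in Python. This is a minimal illustration, not the patent's parser: the function name `site_address` and the `hosting_sites` argument (a set of host names known to be hosting sites, such as ST4) are assumptions introduced here, since the text does not say how hosting sites are recognized.

```python
from __future__ import annotations


def site_address(page_url: str, hosting_sites: frozenset[str] = frozenset()) -> str:
    """Return the site address (root) of a page URL, following the slash-based rule.

    `hosting_sites` is a hypothetical set of host names treated as hosting sites;
    for those, parsing continues up to the second slash after the http prefix.
    """
    # Skip the "//" of the http prefix if present.
    start = page_url.find("//")
    start = 0 if start == -1 else start + 2
    # The first slash after "//" ends the host part.
    first_slash = page_url.find("/", start)
    if first_slash == -1:
        return page_url  # no path: the URL is already a site address
    host = page_url[start:first_slash]
    if host in hosting_sites:
        # Hosted site: the first root is the hosting site, so read one more segment.
        second_slash = page_url.find("/", first_slash + 1)
        return page_url if second_slash == -1 else page_url[:second_slash]
    return page_url[:first_slash]


print(site_address("http://www.example.com/docs/page.html"))  # http://www.example.com
print(site_address("http://host.example/alice/page.html", frozenset({"host.example"})))
# http://host.example/alice
```

A real implementation would more likely rely on a URL library to split the authority from the path; the character scan is kept here because it mirrors the description.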
- Steps 10 and 20 comprise interleaved steps 100 to 130 and 200 to 230, respectively.
- Steps 100, 110 and 120 are steps for searching for web pages and steps 200, 210 and 220 are steps for extracting websites from the addresses of the web pages found in steps 100, 110 and 120.
- Steps 130 and 230 are steps for merging the results.
- The search steps 100, 110 and 120 are carried out by means of a search engine E, for example one of the engines E1, E2, E3 represented in FIG. 2.
- In step 100, the user formulates a question, or search equation R1, using the keyboard 13 of the microcomputer 10.
- The search equation is sent to the search engine E by the central unit 11 and conventionally comprises one or more combined terms (letters, words, numbers, symbols, etc.).
- The search engine E returns the addresses of various web pages, forming a first primary set P1 of web pages, represented in FIG. 4.
- The pages of the set P1 are extracted from the database of the search engine E in a conventional manner, for example according to the number of occurrences of the terms of the search equation in the pages examined, their position in the pages and various other criteria that may differ from one search engine to the next.
- In step 200, the central unit extracts the addresses of the sites s_i corresponding to the pages p_i of the set P1, by the syntactic analysis method mentioned above, to form a primary set S1 of websites.
- Steps 110 and 210 ("option 1") are performed in parallel with steps 120 and 220 ("option 2").
- the method according to the invention can indeed be implemented by executing only steps 110 and 210 or only steps 120 and 220. Steps 110, 210 and 120, 220 can also be combined.
- Step 110 includes a main step 110a and a complementary step 110b.
- In step 110a, the central unit sends the search engine E a series of requests R2a, each request being accompanied by the address of one of the sites s_i of the primary set S1.
- Each request R2a asks for the addresses of the web pages that designate, by hypertext links, at least one page of the site s_i and that satisfy the search equation R1.
- The request R2a is for example formulated by means of a LINK_A command, as follows:
- R2a: LINK_A <site address s_i> + <R1> - HOST <site address s_i>
- For each request R2a, the search engine E returns a list of addresses of web pages that designate a page of the specified site s_i (accompanied by information on these pages and on the sites to which they belong). This list may of course be empty if no web page links to the site concerned.
- At the end of step 110a, the central unit has a second primary set of pages P2.
- In step 110b, the central unit sends the search engine E a series of requests R2b, each accompanied by the address of a page p_i of the set P1.
- Each request R2b asks for the addresses of the web pages that designate the specified page p_i by hypertext links and that satisfy the search equation R1.
- The request R2b is for example formulated as follows:
- R2b: LINK_A <page address p_i> + <R1> - HOST <site address s_i>
- At the end of step 110b, the central unit has a primary set P2' consisting exclusively of pages that designate pages belonging to the set P1 while satisfying the search equation.
- The set P2' is included in the set P2, because the latter comprises both pages that designate pages of the set P1 (set P2') and pages that designate pages belonging to the sites of the set S1 but not to the set P1 (set P2 minus set P2').
- The purpose of determining the set P2' in step 110b is to distinguish between two types of hypertext links: those that point to pages of the set P1, and those that point only to pages of a site of the set S1 not belonging to the set P1. This distinction is used in the step of weighting intersite links described below.
- Step 110b could be omitted in an embodiment of the method of the invention in which one does not wish to identify the hypertext links whose end point does not belong to the set P1.
- In step 210, the central unit determines the addresses of the sites corresponding to the pages of the set P2, again by syntactic analysis, to obtain a second primary set S2 of websites.
- Steps 120 and 220 complement steps 110 and 210 and aim to extract the pages designated by pages belonging to the sites of the set S1.
- Step 120 comprises a main step 120a, during which the central unit sends the search engine a series of requests R3a to form a set of pages P3, and a complementary step 120b, during which the central unit sends the search engine a series of requests R3b to determine a set of pages P3'.
- The requests R3a and R3b are for example formulated by means of a LINK_g command, which searches for the pages designated downstream by hypertext links:
- R3a: LINK_g <site address s_i> + <R1> - HOST <site address s_i>
- R3b: LINK_g <page address p_i> + <R1> - HOST <site address s_i>
- The set P3 comprises pages designated by pages of the set P1 (set P3') as well as pages designated exclusively by pages that belong to the sites of the set S1 but not to the set P1 (set P3 minus set P3').
- Step 120b could be omitted in an embodiment of the method of the invention in which one does not wish to identify the hypertext links whose starting point does not belong to the set P1.
- In step 220, the central unit determines the addresses of the sites corresponding to the pages of the set P3 to obtain a primary set S3 of websites.
- The final steps 130 and 230 consist in merging the primary sets of pages and the primary sets of sites to obtain, respectively, the initial set of pages EPI and the first set of websites ES1, which serves as the basis for filtering.
- Merging here means combining the sets of pages, or the sets of sites, while eliminating duplicates.
- The set ES1 is equal to the result of merging the sets S1, S2 and S3 if options 1 and 2 are both chosen. Otherwise, the set ES1 is equal to the result of merging the sets S1 and S2 when only option 1 is chosen, or of merging the sets S1 and S3 when only option 2 is chosen.
- Correspondingly, the initial set of web pages EPI computed in step 130 is equal to the result of merging the sets P1, P2 and P3, or of merging the sets P1 and P2, or P1 and P3.
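Steps 100 to 230 can be sketched as plain set operations. In this sketch the search engine is replaced by two in-memory dictionaries, `backlinks` (page to the pages that designate it) and `forwardlinks` (page to the pages it designates), assumed to contain only pages that already satisfy the search equation R1; the `site_of` callable stands in for the parser. All of these names are hypothetical. Pages of the same site as the linked page are excluded, which is our reading of the `- HOST <site address s_i>` term, and the P2'/P3' distinction is ignored because it only matters for weighting.

```python
from __future__ import annotations

from typing import Callable


def initial_sets(
    p1: set[str],
    backlinks: dict[str, set[str]],
    forwardlinks: dict[str, set[str]],
    site_of: Callable[[str], str],
    option1: bool = True,
    option2: bool = True,
) -> tuple[set[str], set[str]]:
    """Return (EPI, ES1) built from the primary set P1."""
    s1 = {site_of(p) for p in p1}                      # step 200: primary set S1
    epi, es1 = set(p1), set(s1)
    if option1:
        # Step 110: pages designating a page of a site of S1, other than that site's own pages.
        p2 = {q for page, qs in backlinks.items() if site_of(page) in s1
              for q in qs if site_of(q) != site_of(page)}
        epi |= p2                                      # step 130: merge, duplicates dropped
        es1 |= {site_of(q) for q in p2}                # steps 210 and 230
    if option2:
        # Step 120: pages designated by a page of a site of S1, other than that site's own pages.
        p3 = {q for page, qs in forwardlinks.items() if site_of(page) in s1
              for q in qs if site_of(q) != site_of(page)}
        epi |= p3
        es1 |= {site_of(q) for q in p3}                # steps 220 and 230
    return epi, es1


def site(page: str) -> str:
    """Toy stand-in for the parser: the site is the text before the first slash."""
    return page.split("/")[0]


epi, es1 = initial_sets(
    {"a/1", "b/1"},
    backlinks={"a/1": {"c/1", "a/2"}, "b/1": {"a/1"}},
    forwardlinks={"b/1": {"e/1"}, "x/1": {"f/1"}},
    site_of=site,
)
print(sorted(es1))  # ['a', 'b', 'c', 'e']
```

Using Python sets makes the "merge while eliminating duplicates" of steps 130 and 230 a plain union.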
- At the end of these search steps, the central unit thus has the first set of sites ES1 stored in the form of a matrix A with m rows and m columns, "m" designating the number of sites of the set ES1, in which the intersite links appear.
- One and only one intersite link is defined between two sites when there is at least one hypertext link between two pages of the sites considered, whatever the pages and whatever the direction of the hypertext link.
- In the example of FIG. 5B, each of the sites s1, s2, s3 is linked to the other two by an intersite link, L(1,2), L(1,3) and L(2,3) respectively, because there is at least one hypertext link between pages of each pair of sites.
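The rule that one undirected intersite link exists between two sites as soon as at least one hypertext link joins their pages, in either direction, can be sketched as follows. The input format (a list of (source page, target page) pairs plus a page-to-site mapping) and the 0/1 encoding of matrix A are assumptions of this sketch; the text only says that the intersite links appear in an m × m matrix A.

```python
from __future__ import annotations


def intersite_links(hyperlinks: list[tuple[str, str]], site_of: dict[str, str]) -> set[frozenset[str]]:
    """One undirected intersite link per pair of sites joined by at least one hypertext link."""
    links: set[frozenset[str]] = set()
    for source, target in hyperlinks:
        a, b = site_of[source], site_of[target]
        if a != b:  # links between pages of the same site are not intersite links
            links.add(frozenset((a, b)))  # direction and multiplicity are ignored
    return links


def matrix_a(sites: list[str], links: set[frozenset[str]]) -> list[list[int]]:
    """Symmetric m x m 0/1 matrix with A[i][j] = 1 when sites i and j are linked."""
    index = {site: i for i, site in enumerate(sites)}
    a = [[0] * len(sites) for _ in sites]
    for link in links:
        x, y = sorted(link)
        a[index[x]][index[y]] = a[index[y]][index[x]] = 1
    return a


site_of = {"p1": "s1", "p2": "s1", "p3": "s2", "p4": "s3"}
hyperlinks = [("p1", "p3"), ("p3", "p1"), ("p2", "p4"), ("p4", "p3"), ("p1", "p2")]
links = intersite_links(hyperlinks, site_of)
print(len(links))                            # 3
print(matrix_a(["s1", "s2", "s3"], links))  # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```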
- An array A corresponding to the example of FIG. 5B is shown below by way of example.
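- Likewise, the central unit has the initial set of pages EPI stored in the form of a matrix B with n + m rows and n + m columns in which the hypertext links appear, "n" designating the number of pages of the set EPI and the m additional rows and columns corresponding to one anonymous page per site of ES1.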
- the matrix B takes the form described below.
- The pages p(s1), p(s2), p(s3) are anonymous pages that do not belong to the set EPI although they belong to one of the sites s1, s2, s3 of the set ES1. Including these pages makes it possible to take into account the hypertext links whose starting point or end point does not belong to the set EPI, these links having been identified by steps 110b and 120b described above. Taking such hypertext links into account plays a role, on the one hand, in the definition of intersite links (optionally) and, on the other hand, in the preferred embodiment of the intersite link weighting method described below.
- The method according to the invention is of course open to various alternative embodiments as regards the definition of the intersite links and of the sets EPI and ES1.
- One variant consists in extending the search for pages linked to those of the primary set P1 even further upstream and downstream, by searching for the pages that designate the pages of the set P2 and/or P3 and the pages that are designated by the pages of the set P3 and/or P2, etc.
- In another variant, the transformation of hypertext links into intersite links consists in defining two intersite links when there are hypertext links in opposite directions between the two sites considered.
- For example, the sites s1 and s2 are linked by two intersite links L(1,2) and L(2,1), because at least one page of the site s1 points to a page of the site s2 and at least one page of the site s2 points to a page of the site s1.
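For this directed variant, the only change to the undirected rule is that an intersite link becomes an ordered pair of sites. The sketch below assumes the same hypothetical input format as a list of (source page, target page) pairs with a page-to-site mapping.

```python
from __future__ import annotations


def directed_intersite_links(hyperlinks: list[tuple[str, str]], site_of: dict[str, str]) -> set[tuple[str, str]]:
    """Variant: one intersite link per direction, so opposite hypertext links give two links."""
    links: set[tuple[str, str]] = set()
    for source, target in hyperlinks:
        a, b = site_of[source], site_of[target]
        if a != b:
            links.add((a, b))  # ordered pair: L(a, b) is distinct from L(b, a)
    return links


site_of = {"p1": "s1", "p2": "s1", "p3": "s2"}
print(sorted(directed_intersite_links([("p1", "p3"), ("p2", "p3"), ("p3", "p1")], site_of)))
# [('s1', 's2'), ('s2', 's1')]
```

Two hypertext links in the same direction (p1 to p3 and p2 to p3) still give a single link L(1,2), while the reverse link p3 to p1 adds L(2,1).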
- This variant in the definition of the intersite links significantly modifies the topography of the set ES1 and may, in certain cases, change the result of the filtering step.
- A filtering applied to a set of sites of the type represented in FIG. 5B and a filtering applied to a set of sites of the type represented in FIG. 5C could therefore be combined in an embodiment of the invention, in order to present the user with two complementary results.
Filtering for core search
- FIG. 6 schematically represents another example of the first set of sites ES1, to which reference will be made below to illustrate the filtering step.
- For readability of the figure, the set ES1 represented comprises a small number of sites s_i; in practice it may include several hundred or even several thousand sites.
- The set ES1 is represented as a graph comprising "vertices" (the sites s_i) connected by undirected links, or "edges", which represent the intersite links.
- As shown in FIG. 8, the filtering operation is applied to a set of sites ES2 that is initially set equal to the set ES1 (step 300).
- A selection among the sites of the set ES1 may be made before the filtering operation begins, for example by applying a pre-filtering carried out by means of any other algorithm.
- The filtering consists in a kind of stripping (peeling) of the set ES2 and comprises a step 301 of eliminating the sites connected to the other sites by fewer than N intersite links, starting from an initial value N_0 of N, here set to 1, which is then incremented.
- The filtering parameter N is incremented by one in step 304 and step 301 is repeated; for example, when N reaches 3, the sites with fewer than 3 links are deleted, such as the site s5 in FIG. 6, then the site s6.
- After a certain number of increments of the parameter N, the central processing unit reaches and then goes past the core of the set ES2, so that this set no longer contains any site; this is detected in a verification step 303 that precedes each step 304.
- The limit value N_Z of N for which the set ES2 no longer contains any site is then known.
- A limit value N_L of the filtering parameter N is then calculated in a step 305 by means of the relation:
- $N_L = N_Z - S$
- where S is a selectivity parameter defining the filtering depth, whose value is a natural integer.
- The sites eliminated during the last S filtering steps are reintroduced into the set ES2 in a step 306, to form a reduced set designated ES2', which is the result of the filtering.
- The parameter S is preferably set equal to 1, so that the reduced set ES2' comprises the highest-rank core present in the set ES2.
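The leaf-stripping loop of FIG. 8 (steps 300 to 306) corresponds to what graph theory calls a k-core decomposition; with S = 1 it returns the maximum core. The sketch below assumes that ES2 is given as a symmetric adjacency dictionary (site to set of linked sites) and that step 301 is repeated until no remaining site has fewer than N links, which is how the cascading deletion of s5 then s6 reads. The function names are illustrative only.

```python
from __future__ import annotations


def strip(es2: dict[str, set[str]], n: int) -> dict[str, set[str]]:
    """Step 301: remove sites with fewer than n intersite links, until none remain."""
    g = {site: set(nb) for site, nb in es2.items()}  # work on a copy
    weak = [site for site, nb in g.items() if len(nb) < n]
    while weak:
        for site in weak:
            for other in g.pop(site):  # delete the site and its links
                g[other].discard(site)
        weak = [site for site, nb in g.items() if len(nb) < n]  # deletions may cascade
    return g


def core_filter(es1: dict[str, set[str]], s: int = 1, n0: int = 1) -> tuple[int, set[str]]:
    """Return (N_L, ES2'), where N_L = N_Z - S and ES2' is the set left after stripping at N_L."""
    es2 = {site: set(nb) for site, nb in es1.items()}   # step 300: ES2 = ES1
    remaining = {n0 - 1: set(es2)}                      # ES2 before any stripping
    n = n0
    while es2:                                          # step 303: is ES2 empty?
        es2 = strip(es2, n)                             # step 301
        remaining[n] = set(es2)
        n += 1                                          # step 304
    n_z = n - 1                                         # first N that emptied ES2
    n_l = max(n_z - s, n0 - 1)                          # step 305: N_L = N_Z - S (S >= 0)
    return n_l, remaining[n_l]                          # step 306: undo the last S steps


# A triangle a-b-c, a pendant site d attached to a, and an isolated site e.
es1 = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}, "e": set()}
n_l, core = core_filter(es1)
print(n_l, sorted(core))  # 2 ['a', 'b', 'c']
```

Keeping a snapshot of ES2 for each N makes step 306 a lookup instead of an explicit reinsertion; only sets and dictionaries are needed, in line with the remark that no matrix product is required.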
- The set ES2 may comprise several independent cores, each consisting of a group of sites each linked to at least N_L other sites of the group; these cores may be linked to one another by fewer than N_L intersite links.
- In this case, the reduced set ES2' comprises all the cores of rank N_L of the set ES2.
- FIG. 7 represents the set ES2 in the form of concentric layers.
- The reduced set ES2' obtained at the end of the filtering operation is presented to the user during the display step described below.
- This filtering method according to the invention is open to various variants and embodiments.
- An alternative core search method is described in the attached Table 3B.
- This variant consists in replacing step 303 of detecting the empty set with a step 303' of determining the complexity of the set ES2, and in stopping the filtering when the link density is sufficiently high.
- Link density can be assessed using the following DI complexity indicator:
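- $DI = \frac{2\,N_{LINK}}{N_{SITE}\,(N_{SITE}-1)}$, where N_LINK is the number of intersite links and N_SITE the number of sites of the set ES2, so that DI equals 1 when every site is linked to every other site.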
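The variant with step 303' can be sketched with the indicator DI, read here as the ratio of the number of intersite links to the largest possible number of undirected links among the sites of ES2. The stopping threshold `min_density` is a hypothetical parameter (the text only says "sufficiently high"), and `strip` assumes that ES2 is a symmetric adjacency dictionary whose deletions at a given N cascade.

```python
from __future__ import annotations


def density(es2: dict[str, set[str]]) -> float:
    """DI = 2 * N_LINK / (N_SITE * (N_SITE - 1)) for an undirected set of sites."""
    n_site = len(es2)
    if n_site < 2:
        return 0.0  # convention chosen here: no pair of sites, no density
    n_link = sum(len(nb) for nb in es2.values()) // 2  # each link is counted twice
    return 2 * n_link / (n_site * (n_site - 1))


def strip(es2: dict[str, set[str]], n: int) -> dict[str, set[str]]:
    """Step 301: remove sites with fewer than n intersite links, until none remain."""
    g = {site: set(nb) for site, nb in es2.items()}
    weak = [site for site, nb in g.items() if len(nb) < n]
    while weak:
        for site in weak:
            for other in g.pop(site):
                g[other].discard(site)
        weak = [site for site, nb in g.items() if len(nb) < n]
    return g


def filter_until_dense(es1: dict[str, set[str]], min_density: float, n0: int = 1) -> dict[str, set[str]]:
    """Strip with increasing N until DI reaches min_density (step 303') or ES2 is empty."""
    es2, n = {site: set(nb) for site, nb in es1.items()}, n0
    while es2 and density(es2) < min_density:
        es2 = strip(es2, n)
        n += 1
    return es2


es1 = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}, "e": set()}
print(density(es1))                          # 0.4
print(sorted(filter_until_dense(es1, 0.9)))  # ['a', 'b', 'c']
```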
- According to another variant, the filtering process is applied again to the set ES2 after the sites of the reduced set ES2', that is the core or cores identified by the first filtering, have been removed from it.
- This second filtering makes it possible to find one or more "sub-cores", or cores of lower rank, that were eliminated during the first filtering, that is, cores corresponding to a filtering depth N_L' lower than the depth N_L that yielded the core or cores of higher rank.
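Repeating the filtering after removing the cores already found can be sketched as below. `strip` and `core_filter` restate the basic leaf-stripping filter with N_0 = 1, assuming a symmetric adjacency dictionary and cascading deletions; `rounds`, the number of successive filterings, is a hypothetical parameter.

```python
from __future__ import annotations


def strip(es2: dict[str, set[str]], n: int) -> dict[str, set[str]]:
    """Remove sites with fewer than n intersite links, repeating until none remain."""
    g = {site: set(nb) for site, nb in es2.items()}
    weak = [site for site, nb in g.items() if len(nb) < n]
    while weak:
        for site in weak:
            for other in g.pop(site):
                g[other].discard(site)
        weak = [site for site, nb in g.items() if len(nb) < n]
    return g


def core_filter(es2: dict[str, set[str]], s: int = 1) -> tuple[int, set[str]]:
    """Leaf stripping from N = 1: return (N_L, reduced set) with N_L = N_Z - S."""
    remaining, g, n = {0: set(es2)}, es2, 1
    while g:
        g = strip(g, n)
        remaining[n] = set(g)
        n += 1
    n_l = max(n - 1 - s, 0)
    return n_l, remaining[n_l]


def cores_by_rank(es1: dict[str, set[str]], rounds: int = 2, s: int = 1) -> list[tuple[int, set[str]]]:
    """Find the highest-rank core(s), remove them from ES2, and filter again."""
    es2 = {site: set(nb) for site, nb in es1.items()}
    found = []
    for _ in range(rounds):
        if not es2:
            break
        n_l, core = core_filter(es2, s)
        found.append((n_l, core))
        # Remove the core's sites and every link pointing to them before the next pass.
        es2 = {site: nb - core for site, nb in es2.items() if site not in core}
    return found


# A complete graph on a, b, c, d joined by one link (d-e) to a triangle e, f, g.
es1 = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"}, "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
for n_l, core in cores_by_rank(es1):
    print(n_l, sorted(core))
# 3 ['a', 'b', 'c', 'd']
# 2 ['e', 'f', 'g']
```

The triangle e, f, g disappears in the first filtering because it is a core of rank 2 only; the second filtering recovers it as a lower-rank core.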
- One or more cores of higher rank and one or more cores of lower rank can thus be determined.
- The filtering operation according to the invention does not require any complex mathematical computation such as a matrix product, and can therefore be performed by a medium-power PC-type microcomputer.
- In the matrix A representing the intersite links, deleting a site during the filtering process consists in removing the site from every cell of the matrix in which it appears, and deleting the row in which the site appears as the reference site.
- In the weighting step, each intersite link is assigned a weight equal to the number of hypertext links underlying it, in order to highlight the sites that are strongly linked together. It is advantageous to first assign a weight to each of the hypertext links underlying an intersite link, and then to assign to the intersite link a weight equal to the sum of the weights of these hypertext links.
- This second method (equivalent to the first when each hypertext link is assigned the same weight) makes it possible to refine the weighting of intersite links by giving different weight values to the various hypertext links.
- Preferably, the weight of a hypertext link connecting two pages that belong to the initial set EPI is chosen higher than the weight of a hypertext link connecting two pages of which at least one does not belong to the set EPI.
- This second type of link was identified during the formation of the sets EPI and ES1 and appears in the matrix B described above (links between an anonymous page and a page of the set EPI, an anonymous page being a page that does not belong to the initial set EPI although it belongs to a site of the set ES1).
- For example, a weight w1 is assigned to the hypertext links that connect pages belonging to the initial set of pages EPI, and a weight w2 lower than w1 is assigned to a hypertext link whose starting or end point is an anonymous page.
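- The weight W(1,2) assigned to the link L(1,2) connecting the sites s1 and s2 is thus $W(1,2) = n_1 w_1 + n_2 w_2$, where n_1 is the number of hypertext links between pages of s1 and s2 that join two pages of the set EPI, and n_2 the number of those that involve an anonymous page.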
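The weighting rule can be sketched as follows: each hypertext link between two different sites contributes w1 when both of its pages belong to EPI and w2 otherwise, and the weight W of the intersite link is the sum of these contributions. The default values of `w1` and `w2` are placeholders (the text only requires w2 < w1), and pages absent from `epi` stand for the anonymous pages p(s1), p(s2), and so on.

```python
from __future__ import annotations

from collections import defaultdict


def intersite_weights(
    hyperlinks: list[tuple[str, str]],
    site_of: dict[str, str],
    epi: set[str],
    w1: float = 1.0,
    w2: float = 0.5,
) -> dict[frozenset[str], float]:
    """W(i, j) = sum of the weights of the hypertext links joining sites i and j."""
    if not w2 < w1:
        raise ValueError("links involving an anonymous page must weigh less (w2 < w1)")
    weights: defaultdict[frozenset[str], float] = defaultdict(float)
    for source, target in hyperlinks:
        a, b = site_of[source], site_of[target]
        if a == b:
            continue  # intra-site links do not underlie any intersite link
        both_in_epi = source in epi and target in epi
        weights[frozenset((a, b))] += w1 if both_in_epi else w2
    return dict(weights)


site_of = {"p1": "s1", "anon1": "s1", "p2": "s2", "p3": "s2"}
epi = {"p1", "p2", "p3"}
hyperlinks = [("p1", "p2"), ("p3", "p1"), ("anon1", "p2")]
print(intersite_weights(hyperlinks, site_of, epi)[frozenset({"s1", "s2"})])  # 2.5
```

Here n_1 = 2 and n_2 = 1, so W(1,2) = 2 × 1.0 + 1 × 0.5 = 2.5.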
- Hypertext links can also be weighted according to criteria that give them more or less value.
- Among the criteria that can be used, examples include the age of a site and the number of pages a site contains.
- A hypertext link connecting two pages has more "value" when at least one of the two pages belongs to a recent site than when both pages belong to an old site.
- A hypertext link has more value when at least one of the two pages belongs to a site with a small number of pages than when both pages belong to a very large site.
- The pages of Annex 1 and Annex 2 describe two examples of algorithms implemented by the central unit for weighting hypertext links and intersite links.
- The weights w_{i,j} assigned to hypertext links are computed as a linear combination of criteria such as the nature of the link, the age of the page and the size of the site.
- Cross-site links can also be weighted by the results obtained by filtering.
- the weights of the intersite links involving sites that belong to the highest-ranking core or cores are multiplied by a first value k1.
- the weights of the hypertext links between pages of sites that belong to the highest-ranking core or cores are multiplied by the same value k1.
- the weights of the intersite links between sites that belong to the lower-ranking core or cores are multiplied by a value k2 lower than k1.
- the weights of the hypertext links between pages of sites that belong to the lower-ranking core or cores are multiplied by the value k2 lower than k1. This step is repeated for cores of successively lower rank, decreasing the corrective value k each time.
- intersite links connecting cores of different ranks can be weighted by a parameter k equal to the average of the values k assigned to the intersite links within each of these cores.
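The rank-based correction above can be sketched as follows. The sketch assumes a geometric schedule for k (k1 for rank 1, then multiplied by a decay factor for each lower rank). It also assumes that a link with an endpoint outside every core keeps its weight unchanged. Both assumptions, and the names `rank_factor` and `apply_core_ranks`, are hypothetical. The text only requires k to decrease with rank and a link between cores of different ranks to use the average k.

```python
def rank_factor(rank, k1=2.0, decay=0.5):
    """Hypothetical schedule: k1 for rank 1, decreasing geometrically after."""
    return k1 * decay ** (rank - 1)

def apply_core_ranks(weights, core_rank, k1=2.0, decay=0.5):
    """Correct intersite weights by the rank of the cores of their endpoints.

    weights   -- {(site_a, site_b): intersite weight}
    core_rank -- {site: rank of its core}, 1 being the highest rank
    """
    out = {}
    for (a, b), w in weights.items():
        if a not in core_rank or b not in core_rank:
            out[(a, b)] = w  # assumption: no correction outside the cores
            continue
        ka = rank_factor(core_rank[a], k1, decay)
        kb = rank_factor(core_rank[b], k1, decay)
        # Same rank -> that rank's k; different ranks -> average of the two k
        out[(a, b)] = w * (ka + kb) / 2
    return out


w = {("s1", "s2"): 1.0, ("s2", "s3"): 1.0, ("s3", "s9"): 4.0}
print(apply_core_ranks(w, {"s1": 1, "s2": 1, "s3": 2}))
# {('s1', 's2'): 2.0, ('s2', 's3'): 1.5, ('s3', 's9'): 4.0}
```

When both endpoints share a rank, averaging (k + k) / 2 reduces to that rank's own k. A single formula therefore covers both the intra-core and the inter-core cases.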
- the weighting of intersite links can also be converted into a weighting of sites, for example by assigning each site a weight equal to the sum of the weights of the intersite links it has.
- in the example considered, the weight assigned to the site s2 is equal to the sum of the weights W(2,6), W(2,5), W(2,4), W(2,3) and W(2,1) of the links connecting s2 to the other sites of the set ES2.
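The conversion from intersite-link weights to site weights described above can be sketched as follows. The sketch assumes the intersite weights are stored in a dictionary keyed by site pairs, which is a hypothetical representation. The sample data reuse the links of the s2 example, with made-up weight values.

```python
from collections import defaultdict

def site_weights(intersite):
    """Weight of each site = sum of the weights of its intersite links.

    intersite -- {(site_a, site_b): weight}; each link counts for both sites.
    """
    totals = defaultdict(float)
    for (a, b), w in intersite.items():
        totals[a] += w
        totals[b] += w
    return dict(totals)


# Hypothetical values for W(2,1), W(2,3), W(2,4), W(2,5) and W(2,6)
intersite = {("s1", "s2"): 1.0, ("s2", "s3"): 2.0, ("s2", "s4"): 0.5,
             ("s2", "s5"): 1.5, ("s2", "s6"): 3.0}
print(site_weights(intersite)["s2"])  # 8.0
```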
- weighting intersite links and/or weighting sites has the advantage of allowing sites to be re-ranked according to the weight of their intersite links (or according to their own weight, if weights are assigned to the sites).
- sites that are not part of the highest-ranking core or cores can have intersite links of higher weight than sites that are part of these cores, because they are connected to several cores of different ranks.
- since the cores are defined by the relationships within each core, ignoring any links they may receive from other cores, taking links between cores into account refines the site selection.
- thus, a site belonging to a core that has no relationship with the other cores is weakened relative to a site belonging to a core of the same size that is related to other cores.
- the results are presented on the screen 12 of the user's microcomputer 10.
- the results can be presented conventionally, for example as a list of Web pages that first includes the pages of the initial set EPI belonging to the sites of the reduced set ES2.
- the list can then include the pages of the initial set belonging to sites of lower-ranking cores, for example the pages of the reduced set ES2'', and so on, lowering the rank of the cores considered each time.
- the list can also present the sites of the set ES2 in decreasing order of the weights of their intersite links, these weights having been calculated and weighted beforehand as described above.
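The list ordering described above can be sketched as follows. Sites are sorted first by the rank of their core (highest rank first), then by decreasing site weight, and each site's EPI pages are listed in turn. The function name, the alphabetical tie-break and the optional `max_rank` cut-off are hypothetical. `max_rank` loosely mirrors the user-adjustable limit rank of the cores to display.

```python
def ordered_result_list(epi_pages, site_of, core_rank, site_weight, max_rank=None):
    """EPI pages grouped by site, sites ordered by core rank then weight.

    epi_pages   -- iterable of pages of the initial set EPI
    site_of     -- {page: site}
    core_rank   -- {site: rank of its core}, 1 being the highest rank
    site_weight -- {site: weight}, e.g. the sum of its intersite weights
    Pages whose site is in no core (or beyond max_rank) are left out.
    """
    sites = {site_of[p] for p in epi_pages if site_of[p] in core_rank}
    if max_rank is not None:
        sites = {s for s in sites if core_rank[s] <= max_rank}
    # Highest rank first; within a rank, heaviest site first; name breaks ties
    ordered_sites = sorted(sites, key=lambda s: (core_rank[s], -site_weight.get(s, 0.0), s))
    result = []
    for s in ordered_sites:
        result.extend(sorted(p for p in epi_pages if site_of[p] == s))
    return result


pages = {"p1", "p2", "p3", "p4"}
site_of = {"p1": "a", "p2": "b", "p3": "b", "p4": "c"}
ranks = {"a": 2, "b": 1, "c": 1}
weights = {"a": 5.0, "b": 1.0, "c": 3.0}
print(ordered_result_list(pages, site_of, ranks, weights))  # ['p4', 'p2', 'p3', 'p1']
```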
- the sites of the reduced set ES2 'and possibly other reduced sets comprising cores of lower ranks are presented in the form of selectable interactive objects, by simultaneously representing the intersite links between the sites in a form understandable by the user, for example in the form of lines.
- FIG. 9A shows the display of the results of a search made with the following search equation:
- the filtering result is represented as site objects in the form of selectable rectangles containing the site addresses, the intersite links between site objects being drawn as arrows.
- This graphic representation, combined with the display of intersite links, immediately reveals the sites forming the core of the ES2 set. It makes the graphic very clear and directs the user straight to the central sites.
- the number of sites linked by intersite links to the central sites is shown, for information, as a circled number. As can be seen in FIG.
- interactively selecting a site brings up the web pages of the initial set EPI that belong to the selected site, along with information about these pages (only one page is shown in FIG. 9B because the selected site contains only one page of the initial set EPI).
- the pages that appear after a site is selected are themselves selectable objects giving direct access to the page content.
- intersite links are also interactive objects. Selecting one displays information (not shown), for example the number of hypertext links underlying the intersite link or information about the sites it connects.
- Intersite links are drawn as bidirectional arrows when the underlying hypertext links run in opposite directions, and as unidirectional arrows when the underlying hypertext links all run in the same direction. Finally, intersite links are shown in different colors to indicate the number of underlying hypertext links. For example, black is reserved for intersite links with the largest number of hypertext links, red for intersite links with fewer hypertext links, and so on.
- alternatively, the color can represent the weight assigned to the intersite links rather than the number of underlying hypertext links.
- link thickness can also be used, an intersite link being drawn more or less thick depending on the number of underlying hypertext links or on their weight.
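The visual encoding described above (arrow direction, color and thickness) can be sketched by generating Graphviz DOT text. This is a swapped-in rendering choice, not the rendering used in the figures. The `shape`, `dir`, `color` and `penwidth` attributes are standard Graphviz attributes. The color thresholds, the third color "gray" and the thickness formula are hypothetical.

```python
def intersite_dot(counts, black_min=10, red_min=3):
    """Render intersite links as Graphviz DOT text.

    counts -- {(src_site, dst_site): number of hypertext links from src to dst}
    Colour thresholds black_min / red_min are hypothetical.
    """
    pairs = {}
    for (a, b), n in counts.items():
        if a == b or n <= 0:
            continue  # ignore intra-site links and empty entries
        pairs.setdefault(tuple(sorted((a, b))), {})[(a, b)] = n

    lines = ["digraph sites {", "  node [shape=box];"]  # site objects as rectangles
    for site in sorted({s for pair in pairs for s in pair}):
        lines.append(f'  "{site}";')
    for (a, b), by_dir in sorted(pairs.items()):
        total = sum(by_dir.values())
        if len(by_dir) == 2:
            src, dst, direction = a, b, "both"  # hypertext links in opposite directions
        else:
            src, dst = next(iter(by_dir))       # all hypertext links point the same way
            direction = "forward"
        color = "black" if total >= black_min else "red" if total >= red_min else "gray"
        penwidth = 1 + total / 5  # thicker line for more underlying hypertext links
        lines.append(f'  "{src}" -> "{dst}" [dir={direction}, color={color}, penwidth={penwidth:g}];')
    lines.append("}")
    return "\n".join(lines)


print(intersite_dot({("s1", "s2"): 8, ("s2", "s1"): 4, ("s3", "s1"): 1}))
```

The output can be passed to the Graphviz `dot` tool to draw the site map. Replacing the count with a link weight as the input of `color` and `penwidth` gives the weight-based variant.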
- such a display can of course be varied, the site objects being representable in various forms, in a two- or three-dimensional space.
- various options can be offered to the user to adjust the presentation of the results on the screen, in particular options relating to the filtering itself.
- for example, the user can be given the option of changing, at any time, the selectivity parameter "S" described above and/or the limit rank of the cores to be displayed. Configuring the filtering in this way allows the user to increase or decrease the number of sites presented on the screen.
- steps 10, 20 and the filtering step are performed by the central unit of a microcomputer.
- these steps can also be performed by a search engine, for example one of the engines E1, E2 or E3 shown in FIG. 1.
- the user's terminal is then relieved of the calculation and filtering tasks and can take forms other than a microcomputer, for example a mobile telephone or a television set connected to the Internet.
- in this case, the user's terminal acts as the "client", which transmits a search equation and receives the results of the filtering operation in response.
- the features of the invention relating to the display of results as site objects remain optional with respect to those relating to filtering. This applies in particular when they cannot be implemented for technical reasons, as is the case when the user searches from a device with only a small display, such as a mobile telephone connected to the Internet. In this case, the results can be displayed as a list of websites, or even as a conventional list of web pages.
- the present invention provides a set of tools for analyzing and ranking an initial set of web pages with a given topography, with reduced computation time and computing resources.
- these tools include:
  - working on websites linked by intersite links;
  - finding the core or cores of the set of websites, possibly from the highest-ranking cores down to lower-ranking ones;
  - optionally weighting intersite links;
  - weighting intersite links according to the rank of the cores to which the sites belong.
- [Flowchart labels: search for Web pages designating at least one page belonging to a site of the set SI and satisfying the search equation; 120a search for Web pages belonging to a site designated by at least one page of the set SI and satisfying the search equation; 110b search for Web pages designating at least one page of the set PI; 120b search for Web pages satisfying the search equation and designated by at least one page of the set PI; steps 210 and 220.]
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002218366A AU2002218366A1 (en) | 2000-11-15 | 2001-11-14 | Method for searching, selecting and mapping web pages |
EP01996802A EP1334444A1 (fr) | 2000-11-15 | 2001-11-14 | Method for searching, selecting and mapping web pages |
US10/436,599 US20040059732A1 (en) | 2000-11-15 | 2003-05-13 | Method for searching for, selecting and mapping web pages |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0014744A FR2816734B1 (fr) | 2000-11-15 | 2000-11-15 | Method for searching, selecting and mapping web pages |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/436,599 Continuation US20040059732A1 (en) | 2000-11-15 | 2003-05-13 | Method for searching for, selecting and mapping web pages |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2002041174A1 (fr) | 2002-05-23 |
Family
ID=8856509
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/FR2001/003561 WO2002041174A1 (fr) | 2000-11-15 | 2001-11-14 | Method for searching, selecting and mapping web pages |
Country Status (5)
Country | Link |
---|---|
US (1) | US20040059732A1 (fr) |
EP (1) | EP1334444A1 (fr) |
AU (1) | AU2002218366A1 (fr) |
FR (1) | FR2816734B1 (fr) |
WO (1) | WO2002041174A1 (fr) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030131005A1 (en) * | 2002-01-10 | 2003-07-10 | International Business Machines Corporation | Method and apparatus for automatic pruning of search engine indices |
US7284195B2 (en) * | 2002-01-31 | 2007-10-16 | International Business Machines Corporation | Structure and method for linking within a website |
US7076477B2 (en) * | 2002-12-19 | 2006-07-11 | International Business Machines Corporation | Fast and robust optimization of complex database queries |
US7346839B2 (en) * | 2003-09-30 | 2008-03-18 | Google Inc. | Information retrieval based on historical data |
US7707265B2 (en) * | 2004-05-15 | 2010-04-27 | International Business Machines Corporation | System, method, and service for interactively presenting a summary of a web site |
US7904440B2 (en) * | 2007-04-26 | 2011-03-08 | Microsoft Corporation | Search diagnostics based upon query sets |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6112203A (en) * | 1998-04-09 | 2000-08-29 | Altavista Company | Method for ranking documents in a hyperlinked environment using connectivity and selective content analysis |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5694594A (en) * | 1994-11-14 | 1997-12-02 | Chang; Daniel | System for linking hypermedia data objects in accordance with associations of source and destination data objects and similarity threshold without using keywords or link-difining terms |
US6745181B1 (en) * | 2000-05-02 | 2004-06-01 | Iphrase.Com, Inc. | Information access method |
2000
- 2000-11-15 FR FR0014744A patent/FR2816734B1/fr not_active Expired - Fee Related

2001
- 2001-11-14 WO PCT/FR2001/003561 patent/WO2002041174A1/fr not_active Application Discontinuation
- 2001-11-14 AU AU2002218366A patent/AU2002218366A1/en not_active Abandoned
- 2001-11-14 EP EP01996802A patent/EP1334444A1/fr not_active Withdrawn

2003
- 2003-05-13 US US10/436,599 patent/US20040059732A1/en not_active Abandoned
Non-Patent Citations (3)
Title |
---|
CARRIERE S J ET AL: "WebQuery: searching and visualizing the Web through connectivity", COMPUTER NETWORKS AND ISDN SYSTEMS, NORTH HOLLAND PUBLISHING, vol. 29, no. 8-13, 1 September 1997 (1997-09-01), AMSTERDAM, NL, pages 1257 - 1267, XP004095322, ISSN: 0169-7552 * |
MUKHERJEA S: "WTMS: a system for collecting and analyzing topic-specific Web information", COMPUTER NETWORKS, ELSEVIER SCIENCE PUBLISHERS B.V., vol. 33, no. 1-6, June 2000 (2000-06-01), AMSTERDAM, NL, pages 457 - 471, XP004304785, ISSN: 1389-1286 * |
TERVEEN L ET AL: "Constructing, organizing, and visualizing collections of topically related Web resources", ACM TRANSACTIONS ON COMPUTER-HUMAN INTERACTION, ACM, USA, vol. 6, no. 1, March 1999 (1999-03-01), pages 67 - 94, XP002173294, ISSN: 1073-0516 * |
Also Published As
Publication number | Publication date |
---|---|
FR2816734A1 (fr) | 2002-05-17 |
FR2816734B1 (fr) | 2003-03-14 |
US20040059732A1 (en) | 2004-03-25 |
EP1334444A1 (fr) | 2003-08-13 |
AU2002218366A1 (en) | 2002-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20090144240A1 (en) | | Method and systems for using community bookmark data to supplement internet search results |
KR101077699B1 (ko) | | System and method for generating concept units from search queries |
Van Zwol et al. | | Faceted exploration of image search results |
RU2324220C2 (ru) | | Equipping a user interface with search query expansion |
JP4467791B2 (ja) | | Information management and retrieval |
FR2802671A1 (fr) | | Method, system and product for ranking search results using an audience index |
FR2973134A1 (fr) | | Method for refining the results of a search in a database |
US20020032677A1 (en) | | Methods for creating, editing, and updating searchable graphical database and databases of graphical images and information and displaying graphical images from a searchable graphical database or databases in a sequential or slide show format |
WO2003057648A9 (fr) | | Methods and systems for searching for and associating information resources such as web pages |
JP2010541074A (ja) | | System and method for including interactive elements on a search results page |
EP1368756A1 (fr) | | Method for navigating by computing groups of documents, receiver implementing the method, and graphical interface for presenting the method |
WO2002073463A1 (fr) | | Indexing of digitized entities |
EP1184796A1 (fr) | | Method for associative navigation in multimedia databases |
EP1238323A2 (fr) | | Method for marketing goods or services by electronic means over Internet-type networks |
FR3043816B1 (fr) | | Method for suggesting content extracted from a set of information sources |
CN105095175A (zh) | | Method and device for obtaining truncated web page titles |
WO2002041174A1 (fr) | | Method for searching, selecting and mapping web pages |
EP1170677A2 (fr) | | Method and system for returning context-weighted information to improve information retrieval results |
JP5450135B2 (ja) | | Search modeling system and method using a relevance dictionary |
WO2001077890A1 (fr) | | Hypermedia resource search engine and associated indexing method |
FR2975553A1 (fr) | | Assistance with searching for video content on a communication network |
FR2917518A1 (fr) | | Method for sorting information |
BE1013153A3 (fr) | | Method and system for collecting information |
WO2020229760A1 (fr) | | Method for multidimensional indexing of text content |
EP1408428A1 (fr) | | System and method for processing and visualizing the results of searches performed by an indexing-based search engine, and corresponding interface model and meta-model |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2001996802. Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 10436599. Country of ref document: US |
WWP | WIPO information: published in national office | Ref document number: 2001996802. Country of ref document: EP |
REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
WWW | WIPO information: withdrawn in national office | Ref document number: 2001996802. Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: JP |
WWW | WIPO information: withdrawn in national office | Country of ref document: JP |