WO2002044946A1 - Search engine

Info

Publication number: WO2002044946A1
Application number: PCT/JP2000/008430
Authority: WO
Inventors: Motoharu Mizutani
Original assignee: Kabushiki Kaisha Toshiba
Priority date: 2000-11-29
Filing date: 2000-11-29
Publication date: 2002-06-06
Also published as: JP3586272B2; KR100496384B1; JPWO2002044946A1; KR20020070293A; KR20050004274A

Abstract

The update date of an index page acquired by a traveling robot (3) is compared with the update date of a linked Web site. If the update date of the linked Web site is more recent, the update date of the index page is replaced with the update date of the linked Web site. A keyword extracted from the linked Web site is added to the keywords of the index page extracted by the traveling robot.

Description

Prepared by the International Searching Authority as shown below.

Search engine

Technical field

The present invention relates to a search engine for searching data distributed on a network, a search system, a database creation method in the search system, and a storage medium.

Background art

A number of search engines on the network, such as AltaVista (http://www.altavista.com/), Lycos (http://www.lycos.com/), and Yahoo! (http://www.yahoo.com/), use programs called robots, which mechanically collect information on the network. The collected data is then converted into a database (morphological analysis is performed on the page information, and an index table is created and stored in the database) so that users can search it.

The robot searches the network for text written in HTML (HyperText Markup Language) and follows the link destinations described in that text. Among the databases that collect data existing on the network, some support full-text search, while others search only titles or only URLs.

The above databases may be decentralized because of their large volume. However, this is a division made simply to cope with volume; the data is not divided according to any other criterion.

In the above search, a keyword search is performed. In other words, the user searches for a word that is expected to appear in the text to be found.

On the other hand, a mirror site can be set up to decentralize access to popular sites and reduce traffic. For example, the I-Server (http://www.pointcast.com/products/iserver.html) of PointCast Network (PCN) periodically contacts the PCN headquarters to refresh its information and manages the mirror site.

Conventionally, there have been the following problems in a search engine for data distributed on a network.

(1) It is becoming increasingly difficult to handle the growing volume of data. For example, page data on the World Wide Web (WWW) exceeded 4.0 million pages worldwide in 1996 and is expected to increase exponentially in the future. At present, both the number of pages and the amount of data per page tend to increase rapidly. Managing a database of such rapidly increasing data is extremely difficult if the data is simply divided by volume.

(2) Information that is updated less frequently tends to be accessed less. Pages that are updated infrequently tend to be out of date and attract fewer accesses. For this reason, a search system that preferentially displays frequently updated pages is effective.

(3) Conventionally, when a domain or URL is registered in the search engine, the robot traverses the domain or URL and extracts files during the traversal. When the search keywords are extracted from an extracted file, the update date is obtained at the same time. The robot then determines how new the file is from the obtained update date, and this is used to prioritize the display of search results.

In the case of index pages composed of frame tags, the index page itself is not updated even if the linked page in each frame is updated. Unless the index page itself is updated, its update date remains old, and the search results do not match the actual content. In addition, in a system that excludes infrequently updated pages from the search target, frame-based pages are put at a particular disadvantage.

DISCLOSURE OF THE INVENTION

An object of this invention is to provide a search engine, a search system, a database creation method in the search system, and a storage medium that, for the huge amount of search target data scattered on the network, can obtain accurate update frequency information by changing the update date of an index page to the latest update date of its linked pages.

Another object of this invention is to provide a search engine, a search system, a database creation method in the search system, and a storage medium in which the keywords of a linked page can be added to the keywords of the index page stored in the database.

To achieve the above objects, the search engine of the present invention comprises: a database that stores index pages of information on the network, each including at least a URL (Uniform Resource Locator) or a domain, an update date, and a keyword; and a traveling robot that traverses the database based on a specified domain or URL, obtains the update date of the index page and the update date of a page on the website linked from the index page, and uses the latest of these update dates as the update date of the index page.

Further, the search engine of the present invention comprises: a database that stores index pages on a network, each including at least a URL (Uniform Resource Locator) or a domain, and a keyword; and a traveling robot that traverses the database based on a specified domain or URL, obtains a keyword of the index page and a keyword of a page linked from the index page, and adds the obtained keyword of the linked page to the keywords of the index page.

Also, the search system of the present invention comprises: a database that stores index pages of information on a network, each including at least a URL (Uniform Resource Locator) or a domain, an update date, and a keyword; a traveling robot that traverses the database based on a specified domain or URL, obtains the update date of the index page and the update date of a page on the website linked from the index page, and sets the most recent update date as the update date of the index page; and an engine that searches the database based on a specified keyword.

Further, the search system of the present invention comprises: a database that stores index pages on a network, each including at least a URL (Uniform Resource Locator) or a domain, and a keyword; a traveling robot that traverses the database based on a specified domain or URL, obtains a keyword of the index page and a keyword of a page linked from the index page, and adds the obtained keyword of the linked page to the keywords of the index page; and an engine that searches the database based on a specified keyword.

Further, the present invention provides a database creation method in a search system that has a database storing index pages of information on a network, each including at least a URL (Uniform Resource Locator) or a domain, an update date, and a keyword. The method is characterized by traversing the database based on a specified domain or URL, obtaining the update date of the index page and the update date of a page on a website linked from the index page, and setting the latest of the obtained update dates as the update date of the index page.

Further, the present invention provides a database creation method in a search system that has a database storing index pages of information on a network, each including at least a URL or a domain, an update date, and a keyword, and that performs a database search in response to a search request. The method is characterized by traversing the database based on a specified domain or URL, obtaining a keyword of the index page and a keyword of a page linked from the index page, and adding the obtained keyword of the linked page to the keywords of the index page.

Further, the present invention provides a storage medium storing a program for causing a computer to create a database in a search system that has a database storing index pages of information on a network, each including at least a URL (Uniform Resource Locator) or a domain, an update date, and a keyword, and that performs a database search in response to a search request. The stored program causes the computer to execute a procedure for traversing the database based on a specified domain or URL and obtaining the update date of the index page and the update date of a page on the website linked from the index page, and a procedure for setting the latest of the obtained update dates as the update date of the index page.

Further, the present invention provides a storage medium storing a program for causing a computer to create a database in a search system that has a database storing index pages of information on a network, each including at least a URL (Uniform Resource Locator) or a domain, an update date, and a keyword, and that performs a database search in response to a search request. The stored program causes the computer to execute a step of traversing the database based on a specified domain or URL and obtaining a keyword of the index page and a keyword of a page linked from the index page, and a step of adding the obtained keyword of the linked page to the keywords of the index page.

In addition, the traversal of the database is performed within the same domain as the index page.

The index page and the link destination pages are composed of frame tags, and the latest update date of the pages in the frames is set as the update date of the index page.

According to this invention, the update date of the index page acquired by the traveling robot is compared with the update date of the linked page, and if the update date of the linked page is newer, the update date of the index page is replaced with the update date of the linked page. In addition, the keyword extracted from the link destination page is added to the index page keywords extracted by the traveling robot.

The inventions relating to the devices described above also hold as inventions relating to methods.

In addition, the above invention also holds as a machine-readable medium storing a program for causing a computer to execute the corresponding procedures or means.

In a robot-type search engine such as FreshEye, a frame-based site is treated as if it were updated very infrequently, because the index page itself is rarely updated and it is mainly the linked pages in each frame that are updated. According to the present invention, even a frame-based page can receive the same search service as a non-frame page.

In addition, from the viewpoint of database efficiency, the larger the database capacity, the more pages can be searched, so the amount of information increases and the hit rate also rises. However, if the number of registrations is increased indefinitely, the number of pages returned for a single keyword also grows, and it becomes more difficult for the searcher to extract the necessary information from among them. According to the present invention, since search information can be consolidated in the index page, efficient search becomes possible.

Brief description of the drawings

FIG. 1 is a diagram showing an example configuration of a search engine according to an embodiment of the present invention.

FIG. 2 is a diagram showing the structure of the index page.

FIG. 3 is a flowchart showing the operation of the embodiment of the present invention.

FIG. 4 is a flowchart showing the operations of the traveling robot, web server, and user.

FIG. 5 is a diagram showing an example of a screen for inputting a domain or URL to be registered.

FIG. 6 is a diagram showing an example of a registered URL screen.

FIG. 7 is a diagram illustrating a screen example when a keyword is input.

FIG. 8 is a diagram illustrating a screen example of a search result obtained by a search engine.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the invention will be described with reference to the drawings. First, terms will be defined.

A page means a single unit of hypertext. In the WWW world, each page has a unique URL.

A URL (Uniform Resource Locator) is the notation necessary for accessing page data. A URL includes protocol, domain name, port number, and path name information.
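The four components named above can be seen by splitting a URL with Python's standard library. This is only an illustration; the URL below, including its port number, is a made-up example and does not come from the specification.

```python
from urllib.parse import urlsplit

# Hypothetical URL chosen only to show the four parts listed in the text.
parts = urlsplit("http://www.domain.com:8080/index.html")

print(parts.scheme)    # protocol
print(parts.hostname)  # domain name
print(parts.port)      # port number
print(parts.path)      # path name
```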

Robots read documents written in hypertext, such as HyperText Markup Language (HTML) and Standard Generalized Markup Language (SGML), and collect the documents on the network while mechanically extracting the links written there. They are implemented in software. Robots are sometimes also called spiders or wanderers.

The basic operation of the robot is as follows.

(Step 1) Register the specified home page in the visiting list.

(Step 2) The robot acquires a page according to the visiting list.

(Step 3) Analyze the acquired page and extract URLs.

(Step 4) Add the extracted URL to the visiting list (however, do not duplicate the URL).

Thereafter, steps 2 to 4 are repeated. The acquisition frequency of the page may be determined according to the frequency of updating the page.
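Steps 1 to 4 can be sketched as a short loop. This is a minimal sketch, not the robot of the embodiment: the function name `crawl`, the caller-supplied `fetch` function, the `max_pages` limit, and the in-memory `site` dictionary are all assumptions made so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the targets of <a href> and <frame src> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "frame": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """fetch(url) is a hypothetical helper returning HTML text or None."""
    visiting_list = deque([start_url])      # Step 1: register the home page
    seen = {start_url}
    pages = {}
    while visiting_list and len(pages) < max_pages:
        url = visiting_list.popleft()       # Step 2: acquire the next page
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()            # Step 3: analyze and extract URLs
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:        # Step 4: add without duplicates
                seen.add(absolute)
                visiting_list.append(absolute)
    return pages


# Hypothetical three-page site held in memory instead of on a web server.
site = {
    "http://www.domain.com/index.html":
        '<frameset><frame src="menu.html"><frame src="title.html"></frameset>',
    "http://www.domain.com/menu.html": '<a href="index.html">home</a>',
    "http://www.domain.com/title.html": "<h1>Title</h1>",
}
pages = crawl("http://www.domain.com/index.html", site.get)
print(list(pages))
```

The `seen` set is what keeps the link from menu.html back to index.html from being queued a second time.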

Next, the present embodiment will be described.

In this embodiment, a page is treated as an example of data distributed in a network.

FIG. 1 shows a configuration diagram of an entire search system including a search engine of the present invention. As shown in the figure, web servers 9 and 11, a user PC 13, a search server 19, and a search engine 21 are connected to the network 1. The search engine 21 is composed of a traveling robot 3, a database 5, and an engine 17. The traveling robot 3 accesses the registered domain or URL, obtains the update date, and extracts keywords. It also accesses the linked pages, obtains their update dates, and extracts keywords. It then registers the obtained update date and the extracted keywords in the database 5. The database 5 stores the index pages and the visiting list. As shown in FIG. 2, for example, an index page includes a URL, keywords, and attribute information, and the attribute information includes an update date. The engine 17 searches the database 5 based on the specified keyword. The search server 19 is, for example, a search server typified by Infoseek.
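As a rough illustration of the index page record of FIG. 2 and of a keyword search by the engine 17, the following sketch may help. The class name `IndexPage`, the function `search`, the sample entries, and the newest-first ordering of results are assumptions for illustration; the specification does not prescribe a data layout or a ranking rule.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class IndexPage:
    """One database entry: URL, keywords, and attribute information."""
    url: str
    keywords: set = field(default_factory=set)
    update_date: Optional[date] = None  # attribute information (update date)


def search(database, keyword):
    """Return entries containing the keyword, newest update date first."""
    hits = [page for page in database if keyword in page.keywords]
    return sorted(hits, key=lambda page: page.update_date or date.min, reverse=True)


# Hypothetical entries, for illustration only.
database = [
    IndexPage("http://www.example.com/a.html", {"robot", "search"}, date(2000, 3, 14)),
    IndexPage("http://www.example.com/b.html", {"search"}, date(2000, 8, 8)),
]
results = search(database, "search")
print([page.url for page in results])
```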

Next, the operation of the search engine of the present invention will be described with reference to FIGS.

First, it is assumed that the user has created a homepage including a frame and has uploaded it to web server 9 (11).

In step S1 of FIG. 3, the user registers a domain or URL. That is, a domain or URL input screen (a search engine registration screen) such as that shown in FIG. 5 is displayed on the screen of the user PC 13. The user enters the domain or URL to be searched and selects the registration button 15. As a result, as shown in FIG. 4, the traveling robot 3 registers the domain or URL input by the user in the visiting list in the database 5.

Next, in step S3 of FIG. 3, the index page is accessed. That is, as shown in FIG. 4, the traveling robot 3 sends the registered domain or URL to the web server 11, and the web server 11 accesses the index page based on the received domain or URL and sends it to the traveling robot 3.

The traveling robot 3 obtains the update date A of the index page transmitted from the web server 11. Next, in step S7 of FIG. 3, keywords registered in the index page are extracted.

Next, in step S9 in FIG. 3, the link destination is accessed. That is, as shown in FIG. 4, the traveling robot 3 transmits a link destination address included in the index page to the web server 9 (11). The web server 9 (11) accesses the link destination page on the web server 9 (11) based on the link destination address and transmits the page to the traveling robot 3.

Next, in step S11 of FIG. 3, the update date B is obtained. That is, as shown in FIG. 4, the traveling robot 3 obtains the update date B of the link destination page and further extracts a keyword.

Then, in step S13 of FIG. 3, the update dates A and B are compared, and in step S15 the update date is updated. That is, as shown in FIG. 4, if the update date B of the link destination page is larger (that is, newer) than the update date A of the index page, B is set as the update date of the index page.

Then, in step S17 of FIG. 3, the keyword is extracted, and in step S19 it is added to the keywords of the index page.

Then, in step S21, it is determined whether or not the traversal has been completed. If it has not been completed, the process returns to step S9, and steps S9 to S21 are repeated.

On the other hand, if it is determined in step S21 that the traversal has been completed, the traveling robot 3 registers the obtained update date and keywords in the database 5 in step S23.
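The flow from step S3 to step S23 can be sketched as follows. The function `patrol_index_page`, the `fetch_page` helper that returns an update date, keywords, and links for a URL, the returned dictionary, and the sample pages are hypothetical names and data introduced for illustration; the step numbers in the comments follow FIG. 3 as described above.

```python
from datetime import date


def patrol_index_page(index_url, fetch_page):
    """fetch_page(url) -> (update_date, keywords, links); a hypothetical helper."""
    update_a, keywords, links = fetch_page(index_url)    # S3 to S7: index page
    latest = update_a
    merged = set(keywords)
    for link in links:                                   # S9: access each link
        update_b, link_keywords, _ = fetch_page(link)    # S11: update date B
        if update_b > latest:                            # S13: compare A and B
            latest = update_b                            # S15: adopt newer date
        merged |= set(link_keywords)                     # S17, S19: add keywords
    # S21 is the loop exit; S23 registers the result in the database.
    return {"url": index_url, "update_date": latest, "keywords": merged}


# Hypothetical in-memory pages standing in for web servers 9 and 11.
pages = {
    "index.html": (date(2000, 3, 14), {"search"}, ["menu.html"]),
    "menu.html": (date(2000, 8, 1), {"robot"}, []),
}
entry = patrol_index_page("index.html", pages.__getitem__)
print(entry["update_date"], sorted(entry["keywords"]))
```

Keeping a running `latest` value means the index page ends up with the newest date among itself and all linked pages, whatever order the links are visited in.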

FIG. 6 is a diagram illustrating an example in which the traveling robot 3 uses the latest update date of the pages in the frames as the update date of the index page. That is, it is assumed that the user has registered www.domain.com/index.html using the domain or URL registration screen shown in FIG. 5. It is also assumed that the current update date of the index page is March 14, 2000. The index page is assumed to consist of the link destination page title.html with an update date of February 14, 2000, the link destination page menu.html with an update date of August 1, 2000, and the link destination page welcom.html with an update date of August 8, 2000. The traveling robot 3 obtains the update dates of these linked pages, compares them, and sets the latest update date, August 8, 2000, as the update date of the index page.
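The dates in this example can be checked with a few lines of Python. The variable names are illustrative; the dates and file names are those given in the paragraph above.

```python
from datetime import date

index_update = date(2000, 3, 14)          # current update date of index.html
linked_updates = {
    "title.html": date(2000, 2, 14),
    "menu.html": date(2000, 8, 1),
    "welcom.html": date(2000, 8, 8),
}

# The newest date among the index page and its framed pages is adopted.
new_index_update = max([index_update, *linked_updates.values()])
print(new_index_update.isoformat())  # 2000-08-08
```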

In addition, when the user performs a search, a keyword is entered on a keyword input screen for searching, as shown for example in FIG. 7, which is provided on a page of the search server 19 (for example, a homepage provided by FreshEye, Infoseek, or the like). When the search button 17 is selected after the keyword is entered, a keyword search is performed by the engine 17 shown in FIG. 1, and a search result such as that shown in FIG. 8 is displayed. In this example, the search result "www.domain.com/index.html Updated August 8, 2000" is displayed on the page of the search server 19.

Note that the range over which the traveling robot 3 traverses may be limited to pages linked by the specification of each frame. It may also be limited to the same domain.
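One way to restrict the traversal to the same domain is to compare host names before following a link. This is a sketch under the assumption that "same domain" means an identical host name; the specification does not define the comparison, and the function name and sample links are hypothetical.

```python
from urllib.parse import urlsplit


def same_domain(index_url, link_url):
    """True when both URLs share a host name (one reading of 'same domain')."""
    return urlsplit(index_url).hostname == urlsplit(link_url).hostname


index_url = "http://www.domain.com/index.html"
links = [
    "http://www.domain.com/menu.html",
    "http://other.example.org/page.html",
]
# Only links on the index page's own host remain in the traversal range.
in_scope = [link for link in links if same_domain(index_url, link)]
print(in_scope)
```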

Industrial applicability

The present invention is applicable to a search system on a network using a robot.

Claims

The scope of the claims

1. A search engine characterized by comprising:

a database that stores index pages of information on the network, each including at least a URL (Uniform Resource Locator) or a domain, an update date, and a keyword; and

a traveling robot that traverses the database based on a specified domain or URL, obtains the update date of the index page and the update date of a page on the website linked from this index page, and sets the latest update date as the update date of the index page.

2. A search engine characterized by comprising:

a database that stores index pages on the network, each including at least a URL (Uniform Resource Locator) or a domain, and a keyword; and

a traveling robot that traverses the database based on a specified domain or URL, obtains a keyword of the index page and a keyword of a page linked from the index page, and adds the obtained keyword of the linked page to the keywords of the index page.

3. The search engine according to claim 1, wherein the traveling robot traverses the same domain as the index page.

4. The search engine according to claim 2, wherein the traveling robot traverses the same domain as the index page.

5. The search engine according to claim 1, wherein the index page and the link destination page are composed of frame tags, and the traveling robot sets the latest update date of the pages in the frames as the update date of the index page.

6. A search system characterized by comprising:

a database that stores index pages of information on the network, each including at least a URL (Uniform Resource Locator) or a domain, and an update date;

a traveling robot that traverses the database based on a specified domain or URL, obtains the update date of the index page and the update date of a page on a website linked from the index page, and sets the latest update date as the update date of the index page; and

an engine that searches the database based on a specified keyword.

7. A search system characterized by comprising:

a database that stores index pages on the network, each including at least a URL (Uniform Resource Locator) or a domain, and a keyword;

a traveling robot that traverses the database based on a specified domain or URL, obtains a keyword of the index page and a keyword of a page linked from the index page, and adds the obtained keyword of the linked page to the keywords of the index page; and

an engine that searches the database based on a specified keyword.

8. The search system according to claim 6, wherein the traveling robot traverses the same domain as the index page.

9. The search system according to claim 7, wherein the traveling robot traverses the same domain as the index page.

10. The search system according to claim 6, wherein the index page and the link destination page are composed of frame tags, and the traveling robot sets the latest update date of the pages in the frames as the update date of the index page.

11. A database creation method in a search system that has a database storing index pages of information on the network, each including at least a URL (Uniform Resource Locator) or a domain, an update date, and a keyword, and that performs a database search in response to a search request, the method characterized by:

traversing the database based on the specified domain or URL, and obtaining the update date of the index page and the update date of a page on the website linked from this index page; and

setting the latest update date of the obtained update dates as the update date of the index page.

12. The database creation method according to claim 11, wherein the tour of the database is performed on the same domain as the index page.

13. The database creation method in the search system according to claim 11, wherein the index page and the link destination page are composed of frame tags, and the latest update date of the pages in the frames is set as the update date of the index page.

14. A database creation method in a search system that has a database storing index pages of information on the network, each including at least a URL or a domain, an update date, and a keyword, and that performs a database search in response to a search request, the method characterized by:

traversing the database based on a specified domain or URL, and obtaining a keyword of the index page and a keyword of a page linked from the index page; and

adding the obtained keyword of the linked page to the keywords of the index page.

15. The database creation method according to claim 14, wherein the patrol of the database is performed on the same domain as the index page.

16. A computer-readable storage medium storing a program for causing a computer to create a database in a search system that has a database storing index pages of information on the network, each including at least a URL (Uniform Resource Locator) or a domain, a keyword, and an update date, and that performs a database search in response to a search request, the program causing the computer to execute:

a step of traversing the database based on a specified domain or URL, and obtaining the update date of the index page and the update date of a page on the website linked from this index page; and

a step of setting the latest update date among the obtained update dates as the update date of the index page.

17. The storage medium according to claim 16, wherein when the computer circulates through the database, the computer circulates the same domain as the index page.

18. The storage medium according to claim 16, wherein the index page and the link destination page are composed of frame tags, and the computer is caused to set the latest update date of the pages in the frames as the update date of the index page.

19. A computer-readable storage medium storing a program for causing a computer to create a database in a search system that has a database storing index pages of information on the network, each including at least a URL (Uniform Resource Locator) or a domain, an update date, and a keyword, and that performs a database search in response to a search request, the program causing the computer to execute:

a step of traversing the database based on a specified domain or URL, and obtaining a keyword of the index page and a keyword of a page linked from the index page; and a step of adding the obtained keyword of the linked page to the keywords of the index page.

20. The storage medium according to claim 19, wherein, when the computer circulates through the database, the computer circulates the same domain as the index page.