JP2004280405A - System and method for providing information, and computer program - Google Patents

System and method for providing information, and computer program

Info

Publication number
JP2004280405A
Authority
JP
Japan
Prior art keywords
user
page
prefetch
information
prefetch target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2003070157A
Other languages
Japanese (ja)
Inventor
Yasushi Fukuda
Takashi Nozaki
Shinichiro Sega
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp
Priority to JP2003070157A
Publication of JP2004280405A
Legal status: Pending


Abstract

PROBLEM TO BE SOLVED: To provide information with a relatively short response time to a request from a specific user by caching WWW information.

SOLUTION: Prefetching is performed for each user, using that user's request log, so that the WWW information requested by a specific user can be cached efficiently. In addition, a Bayesian network model is used in the prefetch mechanism, so that even when the access pattern changes along with the user's ever-changing preferences, the prefetch targets can be obtained stochastically in response to that change, the set of prefetch targets can be narrowed, and the system can adapt dynamically to the user.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an information providing system, an information providing method, and a computer program for providing information requested by a user, and more particularly to an information providing system, an information providing method, and a computer program for efficiently providing, from among the information offered to an unspecified number of users in the WWW information space, the information requested by a specific user.
[0002]
More specifically, the present invention relates to an information providing system, an information providing method, and a computer program that provide information with a relatively short response time to a specific user's request by caching WWW information, and in particular to an information providing system, an information providing method, and a computer program that shorten the response time to a user's request by prefetching and caching information the user is expected to request.
[0003]
[Prior art]
Recently, network computing technology has evolved rapidly. In a network-connected environment, cooperative work such as the sharing of computer resources and the sharing, distribution, and exchange of information can proceed smoothly.
[0004]
There are various forms of network for interconnecting computers: for example, a locally laid LAN (Local Area Network) such as Ethernet (registered trademark), and the Internet, which has literally grown into a world-wide network as networks have been interconnected again and again. In particular, the Internet has become widespread along with the spread of broadband communication and always-on connections.
[0005]
On the Internet, servers are interconnected on the basis of TCP/IP (Transmission Control Protocol/Internet Protocol), and services such as WWW (World Wide Web), News, TELNET (TELetypewriter NETwork), and FTP (File Transfer Protocol) are open to the public.
[0006]
Of these, the WWW is a wide-area information retrieval system that provides an information space with a hyperlink structure, and has been the largest factor in the explosive growth and rapid spread of the Internet. On the WWW, information content of various media such as text, images, and sound is published. A user of the WWW service can connect from a client device via the Internet to a server that provides WWW information and acquire that information.
[0007]
WWW information is described in a hypertext description language called HTML (Hyper Text Markup Language). On top of TCP/IP, an information resource is identified by a URL (Uniform Resource Locator), and an HTML document can be transferred according to the well-known HTTP (Hyper Text Transfer Protocol) protocol.
[0008]
WWW information is usually provided by a server in units called pages. On the client side, WWW information can be downloaded page by page using a WWW browser and displayed on screen as a WWW page. A WWW page described in hypertext format has mutual reference relationships, through hyperlinks, with pages provided on the same server or on other servers.
[0009]
Here, when WWW information is used, there is the problem that the response time when information is obtained directly from the server is slow, owing to the server's processing time and the bandwidth of the Internet.
[0010]
To solve this problem, a technique has been adopted in which a WWW information providing device called a cache server is installed on the Internet, between a user's client device and the device providing the WWW information (for example, see Patent Document 1). The cache server stores, within itself, WWW information that is frequently accessed among the WWW information requested by an unspecified number of users. If the information requested by a user is present in the device (a cache hit), it is provided to the user; if it is not (a cache miss), the cache server requests the information from the server providing the WWW information and then provides it to the requesting user.
[0011]
However, a conventional cache server is generally shared by an unspecified number of users and has a finite storage capacity, so it manages (deletes) data according to a policy such as FIFO (First In First Out) or LRU (Least Recently Used). In such a case, the cache server stores frequently accessed WWW information based on the access patterns of an unspecified number of users, so the information requested by a specific user is not necessarily held in the cache server. A cache miss then forces the cache server to obtain the information from the server providing the WWW information, and as a result the response-time problem is not solved.
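For illustration only (this code is not part of the patent's disclosure), the LRU replacement policy mentioned above can be sketched in a few lines of Python; the class and method names are hypothetical:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()  # url -> cached page body

    def get(self, url):
        if url not in self._store:
            return None  # cache miss: the caller must fetch from the origin server
        self._store.move_to_end(url)  # mark as most recently used
        return self._store[url]

    def put(self, url, body):
        if url in self._store:
            self._store.move_to_end(url)
        self._store[url] = body
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
```

Because eviction looks only at global recency, a page that matters to one specific user is evicted as soon as other users' traffic pushes it out, which is exactly the weakness described above.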
[0012]
There is also a technique in which link information to other pages is obtained from within the reference page of the WWW information requested by the user, and all of the referenced pages are prefetched to reduce the response time (see, for example, Non-Patent Document 1).
[0013]
In this case, however, since every link destination in every page is prefetched, pages that are unlikely ever to be accessed are prefetched as well, which is inefficient and wastes bandwidth.
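The indiscriminate "prefetch every link" strategy of Non-Patent Document 1 can be sketched as follows (an illustrative reconstruction, not the cited document's code), using Python's standard HTML parser:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects every href in a page -- the naive 'prefetch all links' strategy."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def links_to_prefetch(page_html):
    """Return every link in the page; all of them would be fetched,
    including destinations the user will likely never visit."""
    parser = LinkExtractor()
    parser.feed(page_html)
    return parser.links
```

Every returned URL triggers a fetch, so the bandwidth cost grows with the number of links rather than with the probability that they are actually visited.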
[0014]
There is also a method in which importance and priority are set for a user or for a reference page of WWW information, and prefetching is performed accordingly (for example, see Patent Document 2). For example, a mechanism is provided that aggregates and analyzes the proxy server's history of WWW data accesses, and a priority is assigned to the data cached in the cache server based on the analysis result. A condition is then set, according to the priority, as to whether or not each requested item of data should be prefetched. This makes it possible to improve the cache hit rate through prefetching while suppressing the increase in network load that prefetching causes.
[0015]
In this case, however, since the importance and priority are uniform, the prefetch targets also become uniform. In other words, when the user's preferences change and the access pattern changes dynamically, the cache server cannot keep up.
[0016]
[Patent Document 1]
JP-A-10-21174
[Patent Document 2]
JP-A-11-149405
[Non-patent document 1]
Full technical explanation of Web server, Nikkei BP
[0017]
[Problems to be solved by the invention]
An object of the present invention is to provide an excellent information providing system, information providing method, and computer program capable of efficiently providing, from among the information offered to an unspecified number of users in the WWW information space, the information requested by a specific user.
[0018]
A further object of the present invention is to provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a specific user's request by caching WWW information.
[0019]
A further object of the present invention is to provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a user's request by prefetching the information the user is likely to request.
[0020]
A further object of the present invention is to provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a user's request by prefetching the information the user will request in a manner that follows the user's ever-changing preferences.
[0021]
Means and Action for Solving the Problems
The present invention has been made in view of the above problems, and a first aspect thereof is an information providing system for providing information composed of pages having mutual reference relationships across a plurality of sites, the system comprising:
log extraction means for extracting, for each user, a request log of requests for information;
prefetch target page selection means for predicting the next page to be accessed based on the access sequence of pages included in the request log and obtaining, for each user, the pages to be prefetched;
prefetch target site selection means for predicting the next site to be accessed based on the access sequence of sites included in the request log and obtaining, for each user, the sites to be prefetched; and
prefetch means for prefetching information based on the pages and sites selected by the prefetch target page selection means and the prefetch target site selection means.
[0022]
The term “system” as used herein refers to a logical assembly of a plurality of devices (or functional modules that realize specific functions); it does not matter whether the devices or functional modules are housed in a single enclosure.
[0023]
The information providing system according to the present invention provides information with a relatively short response time to a user's request by caching WWW information published on a WWW server.
[0024]
At this time, by performing prefetching for each user using that user's request log, the WWW information requested by a specific user, rather than by an unspecified many, can be cached efficiently.
[0025]
Here, the prefetch target page selection unit and/or the prefetch target site selection unit predicts the page and/or site the user will access next using a per-user probabilistic network model that describes the user's access sequence.
[0026]
That is, by using a Bayesian network model, one kind of probabilistic network model, in the mechanism for prefetching WWW information, even if the access pattern changes along with the user's ever-changing preferences, the prefetch targets can be obtained stochastically in response to that change, the set of prefetch targets can be narrowed, and the system can respond dynamically to the user's changes.
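As an illustration of the idea (not the patent's actual model), a first-order conditional probability table P(next page | current page), updated from each observed transition, is a simplified stand-in for the per-user Bayesian network: the predicted distribution shifts as the user's access pattern shifts. All names below are hypothetical:

```python
from collections import defaultdict

class NextPagePredictor:
    """Per-user table of transition counts; predictions are the empirical
    probabilities P(next page | current page), so they track changes in
    the user's access pattern as new observations arrive."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, current_page, next_page):
        """Update the model from one observed page transition."""
        self.counts[current_page][next_page] += 1

    def predict(self, current_page, threshold=0.0):
        """Return (page, probability) pairs, most probable first,
        keeping only pages whose probability exceeds the threshold."""
        nexts = self.counts[current_page]
        total = sum(nexts.values())
        if total == 0:
            return []
        ranked = sorted(((page, n / total) for page, n in nexts.items()),
                        key=lambda t: t[1], reverse=True)
        return [(page, prob) for page, prob in ranked if prob > threshold]
```

The `threshold` parameter plays the role of narrowing the prefetch targets: only transitions the model currently considers likely are prefetched, and the ranking changes as the user's behavior changes.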
[0027]
In addition, the prefetch target page selection unit and/or the prefetch target site selection unit can update the probabilistic network model based on the user's subsequent access results.
[0028]
Further, the prefetch means may perform the prefetch operation on the prefetch targets in consideration of the load on the network and the like. For example, the priorities of the pages and/or sites selected for prefetching on the basis of the prediction may be rearranged in view of the network load.
[0029]
Further, cache information such as prefetched information may be stored in a storage area allocated to each user. Alternatively, cache information may be managed on the same storage space without distinguishing users.
[0030]
A second aspect of the present invention is a computer program described in a computer-readable format so as to execute, on a computer system, processing for providing information composed of pages having mutual reference relationships across a plurality of sites, the program comprising:
a log extraction step of extracting, for each user, a request log of requests for information;
a prefetch target page selection step of predicting the next page to be accessed based on the access sequence of pages included in the request log and obtaining, for each user, the pages to be prefetched;
a prefetch target site selection step of predicting the next site to be accessed based on the access sequence of sites included in the request log and obtaining, for each user, the sites to be prefetched; and
a prefetch step of prefetching information based on the pages and sites selected in the prefetch target page selection step and the prefetch target site selection step.
[0031]
The computer program according to the second aspect of the present invention is a computer program defined in a computer-readable format so as to realize predetermined processing on a computer system. In other words, by installing the computer program according to the second aspect into a computer system, cooperative action is exerted on the computer system, and the same operation and effect as those of the information providing system according to the first aspect of the present invention can be obtained.
[0032]
Further objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments of the present invention described below and the accompanying drawings.
[0033]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0034]
According to the present invention, a cache server is arranged on the Internet, between a user's client device and a device providing WWW information, and the cache server performs a prefetch operation, for each user, on the information that user is likely to access next. In this way, the WWW information requested by a specific user, not by an unspecified number of users, is provided efficiently.
[0035]
FIG. 1 schematically shows a configuration example of a WWW information providing system to which the present invention is applied. In the illustrated example, a terminal 100 serving as a client from which a user requests information, a Web server 103 providing WWW information in page units, and a cache server device 104 storing the WWW information provided by the Web server 103 are connected to the Internet 102 via a network 101. In the cache server device 104, a cache system 105 (described later) according to an embodiment of the present invention is constructed.
[0036]
The terminal 100 is a client terminal used when a user browses WWW information on the Internet, and corresponds to, for example, a PC (Personal Computer), a PDA (Personal Digital Assistant), or a mobile phone.
[0037]
The Web server 103 is a device installed by a party that provides a service supplying WWW information on the Internet, in order to provide WWW information to users.
[0038]
WWW information is described using a hypertext description language called HTML. It is identified by a URL and can be transferred according to the HTTP protocol. WWW information is usually provided by a server in units called pages (hereinafter, “Web pages”). On the client side, WWW information can be downloaded page by page using a WWW browser and displayed on screen as a Web page. A Web page described in hypertext format has mutual reference relationships, through hyperlinks, with pages provided on the same server or on other servers.
[0039]
The cache server device 104 is provided to reduce the delay that occurs when a user obtains WWW information such as a Web page directly from the Web server 103. For each user request for WWW information, the cache system 105 caches the requested Web page or other information, and also predicts and pre-caches (prefetches) pages that may be accessed after the requested page. The illustrated cache server device 104 has a configuration called a “backside cache”.
[0040]
In the illustrated form, a user request flows from the terminal 100 to the Internet 102 through the network 101 and reaches the cache server device 104. In the cache server device 104, a user request is passed to the cache system 105.
[0041]
In the cache system 105, if the page requested by the user is cached (a cache hit), that page is passed to the cache server device 104; if it is not cached (a cache miss), the page requested by the user is acquired from the Web server 103 and then passed to the cache server device 104.
[0042]
In addition, when the cache system 105 receives a user request, it performs processing to cache in advance (prefetch) pages the user may access next. This point will be described in detail later.
[0043]
FIG. 2 schematically shows another configuration example of a WWW information providing system to which the present invention is applied. In the illustrated example, a terminal 200 serving as a client from which a user requests information, an in-house LAN 203, and the like are connected to the Internet 202 through a network 201. The in-house LAN 203 here refers to a local area network within a company that provides a service supplying Web pages on the Internet.
[0044]
A Web server 206 that provides WWW information in page units and a cache server device 204 that stores the WWW information provided by the Web server 206 are connected to the in-house LAN 203. In the cache server device 204, a cache system 205 according to an embodiment of the present invention is constructed. The illustrated cache server device 204 has a configuration called a “front side cache”.
[0045]
In the illustrated form, a user request flows from the terminal 200 to the Internet 202 via the network 201 and reaches the cache server device 204 via the in-house LAN 203. In the cache server device 204, a user request is passed to the cache system 205.
[0046]
In the cache system 205, if the page requested by the user is cached (a cache hit), that page is passed to the cache server device 204; if it is not cached (a cache miss), the page requested by the user is acquired from the Web server 206 and then passed to the cache server device 204.
[0047]
In addition, when the cache system 205 receives a user request, it performs processing to predict pages that the user is likely to access next and cache them in advance (prefetch). This point will be described in detail later.
[0048]
FIG. 3 schematically shows still another configuration example of the WWW information providing system to which the present invention is applied. In the example shown in the figure, a terminal 300 as a client from which a user requests information and a Web server 304 for providing WWW information in page units are connected to the Internet 303 via a network 302. In the terminal 300, a cache system 301 according to an embodiment of the present invention is constructed.
[0049]
In the illustrated form, a user request is first passed from the terminal 300 to the cache system 301. If the page requested by the user is cached (a cache hit), the cache system 301 passes that page to the terminal 300; if it is not cached (a cache miss), the cache system 301 acquires the page requested by the user from the Web server 304 via the network 302 and the Internet 303 and passes it to the terminal 300.
[0050]
Also, when the cache system 301 receives a user request, it performs processing to predict pages that the user is likely to access next and cache them in advance (prefetch). This point will be described in detail later.
[0051]
FIG. 4 schematically shows a functional configuration of a cache system according to an embodiment of the present invention.
[0052]
The request processing unit 1001 records the user request in the access log 1002, instructs the log extraction unit 1003 to process the log, obtains the page requested by the user from the cache management unit 1004, and returns the obtained page to the user. If the corresponding page cannot be obtained from the cache management unit 1004, the page requested by the user is obtained from an external Web server, the cache management unit 1004 is instructed to write the obtained page, and the retrieved page is returned to the user.
[0053]
FIG. 5 shows an example of the data structure of a log recorded in the access log 1002. In the illustrated example, a log record is generated for each request from a user, and each record has fields recording the date and time of access, the IP address of the requesting user, the request type, the URL of the requested Web page, the document type of the requested URL, and so on.
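A minimal sketch of such a log record, mirroring the fields of FIG. 5, might look as follows (the class name, field names, and the tab-separated file layout are assumptions for illustration; the patent does not fix a file format):

```python
from dataclasses import dataclass

@dataclass
class AccessLogRecord:
    """One record per user request, with the fields described for FIG. 5."""
    access_time: str    # date and time of access
    client_ip: str      # IP address of the requesting user
    request_type: str   # request type, e.g. "GET"
    url: str            # URL of the requested Web page
    document_type: str  # document type of the requested URL, e.g. "text/html"

def parse_log_line(line, sep="\t"):
    """Parse one line of a (hypothetical) tab-separated access log file."""
    fields = line.rstrip("\n").split(sep)
    return AccessLogRecord(*fields)
```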
[0054]
In response to an instruction from the request processing unit 1001, the log extraction unit 1003 extracts from the access log 1002 the information that the prefetch target page selection unit 1007 and the prefetch target site selection unit 1009 each need in order to predict the pages or sites to be prefetched (for example, the client IP address, URL, and access time), transfers the extracted information to the prefetch target page selection unit 1007 and the prefetch target site selection unit 1009, and instructs each of them to perform its selection processing.
[0055]
In response to a request from the request processing unit 1001 or the prefetch unit 1013, the cache management unit 1004 searches whether the requested file exists on the cache disk 1006, performs disk access processing such as reading a file from the cache disk 1006 or writing a file to the cache disk 1006, and records cache management information such as the file list of the cache disk 1006 and access dates and times in the cache management table 1005.
[0056]
FIG. 6 shows an example of the data structure of the cache management table 1005. In the illustrated example, cache management information is managed for each requesting client user. Each client is identified by an IP address and has a dedicated cache storage capacity assigned to it. The cache management table 1005 manages, for each client, the allocated storage capacity and the storage capacity currently in use, and records a list containing, for each requested Web page, the last access date and time, the URL, the file name on the cache disk 1006, and so on. The list of requested Web pages is arranged, for example, in descending order of last access date and time for each client.
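The per-client table of FIG. 6 can be sketched as follows (an illustrative data-structure sketch; the names and the quota value are assumptions, and capacity accounting is omitted):

```python
from dataclasses import dataclass, field

@dataclass
class CacheEntry:
    url: str            # URL of the requested Web page
    file_name: str      # file name on the cache disk
    last_access: float  # last access date and time (epoch seconds here)

@dataclass
class ClientCacheInfo:
    """Per-client row of the cache management table, as in FIG. 6."""
    allocated_bytes: int                          # storage capacity assigned to the client
    used_bytes: int = 0                           # storage capacity currently in use
    entries: list = field(default_factory=list)   # requested pages, newest first

class CacheManagementTable:
    def __init__(self, per_client_quota=10_000_000):
        self.per_client_quota = per_client_quota
        self.clients = {}  # client IP address -> ClientCacheInfo

    def touch(self, client_ip, url, file_name, now):
        """Record an access, keeping the client's page list in
        descending order of last access date and time."""
        info = self.clients.setdefault(
            client_ip, ClientCacheInfo(allocated_bytes=self.per_client_quota))
        info.entries = [e for e in info.entries if e.url != url]
        info.entries.insert(0, CacheEntry(url, file_name, now))
        return info
```

Because each client has its own entry list, the same URL can appear under several clients at once, which is the duplication drawback discussed next.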
[0057]
When the cache management unit 1004 searches for a file, it refers to the cache management table 1005; files requested from terminals, files acquired from Web servers, and the like are therefore registered in the cache management table 1005. In the configuration example of the cache management table illustrated in FIG. 6, each user terminal is cached separately. That is, the allocated capacity of the cache disk 1006 can be determined for each user terminal, and caching can be performed per terminal. On the other hand, duplicate files may be cached across the per-terminal cache areas, so the cache disk 1006 may not be used efficiently.
[0058]
On the other hand, a method of managing files without distinguishing among the users' terminals is also conceivable; a general cache server employs such a management method. FIG. 24 illustrates an example of the data structure of the cache management table 1005 when users' terminals are not distinguished. In the illustrated example, individual clients are not identified on the cache management table 1005, and there is no per-client cache area allocation. The cache management table 1005 keeps track of the allocated capacity and the currently used capacity of the entire cache disk 1006, and records a list of the last access date and time, URL, and file name on the cache disk 1006 of each requested Web page, without identifying the requesting client. In this case, there is the disadvantage that caching is not performed per user terminal, but more frequently accessed items remain in the cache, so the utilization efficiency of the cache disk 1006 improves.
[0059]
According to the present invention, every time a request is received from a user terminal, pages or sites that may be requested after the requested page are predicted, obtained from the Web server (prefetched), and cached in advance on the cache disk 1006 (described later). Regardless of whether the cache management table 1005 has the configuration of FIG. 6 or that of FIG. 24, pages and the like that may be requested next are thus always present on the cache management table 1005 and the cache disk 1006.
[0060]
The prefetch target page selection unit 1007 receives the information extracted from the access log 1002 by the log extraction unit 1003, predicts pages that the user may access next based on this information, and selects them as prefetch target pages. More specifically, pages likely to be accessed after the page requested by the user are computed with probability values in the range of 0 to 1, selected together with an acquisition deadline by which the prefetch must complete, and the selection results are recorded in the prefetch target page list 1008. The unit then notifies the prefetch scheduler unit 1011 that the prefetch target page list 1008 has been updated.
[0061]
FIG. 7 shows an example of the data structure of the prefetch target page list 1008. In the illustrated example, pages predicted to be likely to be accessed are listed in the prefetch target page list 1008, for example in order of probability value, and for each page the IP address of the requesting client, the page URL, the prefetch deadline, and so on are recorded.
[0062]
The prefetch target site selection unit 1009 receives the information extracted from the access log 1002 by the log extraction unit 1003, predicts sites that the user may access next based on this information, that is, on the user's access sequence, and selects them as prefetch target sites. More specifically, sites likely to be accessed after the site containing the page requested by the user are computed with probability values in the range of 0 to 1, selected together with an acquisition deadline by which the prefetch must complete, and the selection results are recorded in the prefetch target site list 1010. The unit then notifies the prefetch scheduler unit 1011 that the prefetch target site list 1010 has been updated.
[0063]
FIG. 8 shows an example of the data structure of the prefetch target site list 1010. In the illustrated example, sites predicted to be likely to be accessed are listed in the prefetch target site list 1010, for example in order of probability value, and for each site the IP address of the requesting client, the site URL, the prefetch deadline, and so on are recorded.
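An entry of either list (FIGS. 7 and 8 share the same shape: requesting client, URL, probability value, prefetch deadline) can be sketched as follows; the class and helper names are illustrative, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class PrefetchTarget:
    """One entry of the prefetch target page list or site list."""
    client_ip: str      # IP address of the requesting client
    url: str            # page or site URL
    probability: float  # probability value in the range 0 to 1
    deadline: float     # acquisition deadline by which the prefetch must finish

def record_targets(target_list, new_targets):
    """Add new selection results and keep the list ordered by
    probability value, highest first, as in FIGS. 7 and 8."""
    target_list.extend(new_targets)
    target_list.sort(key=lambda t: t.probability, reverse=True)
    return target_list
```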
[0064]
In response to the notification from the prefetch target page selection unit 1007, the prefetch scheduler unit 1011 reads the prefetch target page list 1008 and records the pages to be prefetched in the prefetch schedule table 1012 in order of probability value. Likewise, in response to the notification from the prefetch target site selection unit 1009, it reads the prefetch target site list 1010 and records the sites to be prefetched in the prefetch schedule table 1012 in order of probability value. It then notifies the prefetch unit 1013 that the prefetch schedule table 1012 has been updated.
[0065]
FIG. 9 schematically shows the data structure of the prefetch schedule table 1012. In the illustrated example, the pages and sites selected as prefetch targets are listed in order of probability value, and for each prefetch target page or site the prefetch start time, prefetch deadline, URL of the page or site, data to be prefetched, bandwidth used for prefetching, probability value, prefetch type, prefetch status, and so on are recorded.
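The scheduler's merge of the two target lists into one probability-ordered schedule can be sketched as follows (an illustrative sketch: targets are plain dicts with at least a "probability" key, and the field names are assumptions based on FIG. 9):

```python
def build_schedule(page_targets, site_targets):
    """Merge the prefetch target page list and site list into one
    schedule ordered by probability value, tagging each entry with
    its prefetch type as recorded in the schedule table of FIG. 9."""
    schedule = ([dict(t, prefetch_type="page") for t in page_targets] +
                [dict(t, prefetch_type="site") for t in site_targets])
    schedule.sort(key=lambda t: t["probability"], reverse=True)
    return schedule
```

The prefetch unit would then work down this list from the top, so the most probable pages and sites are fetched first while the deadline and bandwidth fields bound the work done for each entry.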
[0066]
Based on the prefetch schedule table 1012, the prefetch unit 1013 acquires the prefetch target pages in the schedule table 1012 from external Web servers, and sorts the prefetch schedule table 1012 in response to notifications from the prefetch scheduler unit 1011.
[0067]
Subsequently, a processing operation executed in each unit in the cache system 1000 according to the present embodiment will be described.
[0068]
FIG. 10 shows a processing procedure executed by the request processing unit 1001 in the form of a flowchart.
[0069]
The request processing unit 1001 waits for a request from the user (step S10), and upon receiving the request (step S11), first writes the request to the access log 1002 (step S12).
[0070]
After writing the request in the access log 1002, an instruction is issued to the log extracting unit 1003 to process the access log 1002 (step S13).
[0071]
Next, it inquires of the cache management unit 1004 whether a Web page or the like requested by the user is cached in the cache disk 1006 (step S14).
[0072]
If the data is cached on the cache disk 1006 (a cache hit), a read request is issued to the cache management unit 1004, and after the data is read (step S15), the requested Web page or the like is returned to the requesting user (step S16).
[0073]
On the other hand, if the requested page is not cached on the cache disk 1006 (a cache miss), the requested Web page or the like is acquired from the external server (step S17); after the cache management unit 1004 is instructed to save the acquired Web page or the like (step S18), the Web page or the like is returned to the requesting user (step S16).
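The hit/miss flow of FIG. 10 can be condensed into a few lines of illustrative Python (names are hypothetical; `cache` is a plain dict and `fetch_from_origin` a caller-supplied function standing in for the external Web server):

```python
def handle_request(url, cache, fetch_from_origin, access_log):
    """Steps S12 and S14-S18 of FIG. 10 in miniature: log the request,
    serve from the cache on a hit, fetch and store on a miss."""
    access_log.append(url)      # step S12: write the request to the access log
    body = cache.get(url)       # step S14: ask whether the page is cached
    if body is None:            # cache miss: steps S17-S18
        body = fetch_from_origin(url)
        cache[url] = body       # instruct the cache manager to save the page
    return body                 # step S16: return the page to the user
```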
[0074]
FIG. 11 shows a processing procedure executed in the log extracting unit 1003 in the form of a flowchart.
[0075]
The log extraction unit 1003 waits for an access log processing request from the request processing unit 1001 (step S20). When an access log processing request arrives (step S21), it first reads the latest user request from the access log 1002 (step S22).
[0076]
Next, observation values are extracted from the read user request (step S23). For an access log such as that shown in FIG. 5, the access date and time, client IP address, URL, and document type are extracted as observation values, for example.
[0077]
The log extraction unit 1003 first sends the extracted information to the prefetch target page selection unit 1007 and instructs it to select the pages to be prefetched (step S24).
[0078]
Next, focusing on the URL in the extracted information, it is determined whether the site has changed from that of the previous URL (step S25). If the site differs from that of the previous URL, the extracted information is sent to the prefetch target site selection unit 1009, which is instructed to perform the prefetch target site selection operation (step S26).
[0079]
For example, if the URL previously requested by the same user was “http://www.aaa.co.jp/bbb.htm” and the URL requested this time is “http://www.bbb.ne.jp/index.htm”, the log extracting unit 1003 detects that the access has moved from the site “www.aaa.co.jp” to the site “www.bbb.ne.jp”, and, in order to select a site that may be accessed after “www.bbb.ne.jp”, sends the extracted information to the prefetch target site selection unit 1009 to instruct the prefetch target site selection operation.
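The site-change test of step S25 amounts to comparing the host parts of two consecutive URLs. A sketch using Python's standard urllib follows; the function name is an assumption.

```python
from urllib.parse import urlparse

def site_changed(prev_url, cur_url):
    """Return True when two consecutive requests address different sites,
    i.e. when the host part of the URL differs (step S25)."""
    return urlparse(prev_url).hostname != urlparse(cur_url).hostname
```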
[0080]
FIG. 12 shows, in the form of a flowchart, a processing procedure executed in the cache management unit 1004.
[0081]
The cache management unit 1004 waits for a command from the request processing unit 1001 or the prefetch unit 1013 (step S30). When asked whether a file corresponding to the requested URL exists on the cache disk 1006 (step S31), it checks the cache management table 1005 as shown in FIG. 6 to see whether the file exists on the cache disk 1006 (step S32).
[0082]
If, according to the cache management table 1005, the file corresponding to the requested URL exists on the cache disk 1006, a message indicating that a file corresponding to the URL exists is returned (step S33). If the corresponding file does not exist on the cache disk 1006, a message indicating that there is no file corresponding to the URL is returned (step S34).
[0083]
If the command is a read request (step S35), the cache management table 1005 as shown in FIG. 6 is checked to confirm whether a file corresponding to the requested URL exists on the cache disk 1006 (step S36).
[0084]
If the file corresponding to the URL exists on the cache disk 1006, the last access date and time in the cache management table 1005 as shown in FIG. 6 is updated to the current date and time (step S37), and the file corresponding to the URL is returned (step S38). If the file corresponding to the URL does not exist on the cache disk 1006, a message indicating that there is no file corresponding to the URL is returned (step S34).
[0085]
If the command is a write request (step S39), the cache management table 1005 as shown in FIG. 6 is first checked to determine whether the sum of the current cache size and the size of the file to be written exceeds the allocated capacity specified in the table 1005 (step S40).
[0086]
If the allocated capacity is not exceeded, a file corresponding to the URL is written in the cache area (step S42), and the cache management table 1005 is updated (step S43).
[0087]
On the other hand, if the sum exceeds the allocated capacity, files are erased starting from the one with the oldest last access date and time until enough space for the file corresponding to the written URL has been secured in the cache area (step S41), the file corresponding to the URL is written in the cache area (step S42), and the cache management table 1005 is updated (step S43).
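The eviction of step S41, erasing files oldest-first by last access date until the new file fits, can be sketched as follows. The table layout (URL mapped to size and last access time) is an illustrative assumption, not the patent's actual cache management table format.

```python
def ensure_space(table, capacity, needed):
    """Evict files in ascending order of last access date and time until
    `needed` bytes fit within `capacity` (step S41).
    table: dict mapping url -> (size, last_access_time)."""
    used = sum(size for size, _ in table.values())
    for url in sorted(table, key=lambda u: table[u][1]):  # oldest first
        if used + needed <= capacity:
            break
        used -= table[url][0]
        del table[url]
    return used + needed <= capacity  # True when the write can proceed
```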
[0088]
FIG. 13 is a flowchart illustrating a processing procedure performed by the prefetch target page selection unit 1007.
[0089]
The prefetch target page selection unit 1007 waits for a command (step S50). When it receives observation values from the log extraction unit 1003 and is instructed to perform the prefetch target selection processing (step S51), it sets the observation values in a Bayesian network model as shown in FIG. 14 or FIG. 15 (step S52).
[0090]
For a general description of the Bayesian network model, see, for example, the paper by Yoichi Motomura, “Information Representation for Uncertainty Modeling: Bayesian Networks” (BN2001 Bayesian Network Tutorial Lecture Collection, pp. 5-13 (2001), Japanese Society for Artificial Intelligence, SIG on Fundamental Problems in Artificial Intelligence).
[0091]
After the observation value setting process is completed, the conditional probabilities are learned (step S53). Here, the conditional probabilities are learned by the EM algorithm using the observation counts recorded during the observation value setting process. For details of the EM algorithm, see, for example, the volume edited by Watanabe and Yamaguchi, “The EM Algorithm and Problems of Incomplete Data” (Taga Publishing).
[0092]
Following the learning of the conditional probabilities, the posterior probabilities are calculated (step S54). Here, the posterior probabilities are calculated only for the nodes on which the do-not-calculate flag was not set during the observation value setting processing. For the calculation of posterior probabilities, see, for example, the above-cited paper by Yoichi Motomura, “Information Representation for Uncertainty Modeling: Bayesian Networks” (BN2001 Bayesian Network Tutorial Lecture Collection, pp. 5-13 (2001), Japanese Society for Artificial Intelligence, SIG on Fundamental Problems in Artificial Intelligence).
[0093]
After the calculation of the posterior probabilities, the probability value of each URL is acquired from the “next URL node” 4002 shown in FIG. 14 or FIG. 15, and the URLs are arranged in descending order of probability (described later). Among these, URLs whose probability exceeds a certain threshold are set as prefetch target pages, and each URL and its probability value are temporarily recorded in the prefetch target page list. The threshold used here is, for example, the average value (1 divided by the number of URLs in the “next URL node”) or α times that average value.
[0094]
Next, a prefetch time limit (in seconds) is obtained from the “access interval (second) node” for each URL in the prefetch target page list created above. To find it, a tentative observation value is set in the “next URL node” for each URL in the prefetch target page list (the entry corresponding to the tentative observation is set to 1 and the others to 0), and the posterior probabilities of the “access interval (second) node” are calculated. Among the calculated posterior probabilities, the interval having the maximum probability value is taken.
[0095]
After the prefetch time limit (seconds) has been obtained for each URL in the prefetch target page list, it is recorded in the list, completing the prefetch target page list (step S55).
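The deadline lookup above can be sketched as follows, assuming the posterior over access-interval slots obtained by tentatively clamping the “next URL node” to each candidate URL is already available as a table; the names and table layout are illustrative assumptions.

```python
def prefetch_deadline(interval_posterior_given_url, url):
    """For a candidate URL, return the access-interval slot (in seconds) with
    the maximum posterior probability; that slot serves as the prefetch time
    limit recorded in the prefetch target page list."""
    posterior = interval_posterior_given_url[url]
    return max(posterior, key=posterior.get)
```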
[0096]
After creating the prefetch target page list, the prefetch scheduler unit 1011 is notified that the list has been created (step S56).
[0097]
Here, the Bayesian network models and the user-preference-based prefetch algorithm using them will be described below with reference to the examples shown in FIGS. 14 and 15.
[0098]
FIGS. 14 and 15 are both Bayesian network models that exploit the fact that the user's preference appears in the relevance between URLs accessed by the user in time series (hereinafter referred to as an “access sequence”). Two accesses in the sequence are used: the one made last time and the one made this time. For example, it is possible to capture stochastically the tendency of a certain user to access a sports site A every time and then access pages of genre B within site A.
[0099]
The difference between the Bayesian network model shown in FIG. 14 and that shown in FIG. 15 is whether the selection of pages to be prefetched takes into account the user's preference by season and time of day, that is, seasonal and temporal changes in the request contents. For example, it is possible to capture stochastically that a certain user tends to view pages related to high-school baseball on the sports newspaper site A in summer but tends to view pages related to soccer in the other seasons. Therefore, the model of FIG. 15 is used when seasonality and time-of-day changes appear in the user's access behavior.
[0100]
The prefetch target page selection unit 1007 substitutes the observed value of the URL requested by the user into the Bayesian network model shown in FIG. 14 or FIG. 15, performs the conditional probability learning operation to calculate the conditional probability values, obtains the posterior probabilities at the individual nodes in the model, and finally determines the URL likely to be accessed next after the URL requested by the user by referring to the probability values at the “next URL” node 4002 in the model of FIG. 14 or FIG. 15.
[0101]
Here, a conditional link between nodes is written X_i → X_j, where X_i is called the parent node and X_j the child node. In FIGS. 14 and 15, the arrows between nodes represent these parent-child relationships. If the set of parent nodes of a child node X_j is π(X_j) = {X_1, ..., X_i}, the conditional probability is defined by the following equation.
[0102]
(Equation 1)
P(X_j | π(X_j)) = P(X_j | X_1, ..., X_i)
[0103]
If each node has a plurality of state variables, the conditional probabilities are represented by a conditional probability table. For example, the conditional probability table at the “next URL” node 4002 in FIG. 14 is as shown in FIG. 16.
[0104]
Learning here means the work of obtaining the conditional probability values from the number of observations in which each combination X_i → X_j was observed. The number of times each combination X_i → X_j was observed is stored in the conditional observation count table. For example, for the “next URL” node 4002 in FIG. 14, the conditional observation count table is as shown in FIG. 17. Since learning is performed each time an observation value is obtained, the conditional probability values are recalculated with each new observation and change sequentially.
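With fully observed counts, the step from the conditional observation count table to the conditional probability table reduces to normalizing each row; the patent learns with the EM algorithm, which also handles incomplete data, so the following is only a simplified stand-in with illustrative names.

```python
def conditional_probability_table(counts):
    """Derive a conditional probability table from a conditional observation
    count table by normalizing the counts over each parent state.
    counts: dict mapping parent_state -> {child_state: observation_count}."""
    cpt = {}
    for parent_state, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[parent_state] = {c: n / total for c, n in child_counts.items()}
    return cpt
```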
[0105]
The posterior probability is the probability of the possible actual values of a random variable one wishes to know, given the definite values of the observed variables (evidence: e). Let e+ denote the evidence given to the node whose posterior probability is being calculated from its parent nodes, and e- the evidence given to that node from its child nodes. Then, from Bayes' theorem, the posterior probability of node X_j is obtained, for example, by the following equation.
[0106]
(Equation 2)
P(X_j | e) = α P(e- | X_j) P(X_j | e+)   (α: normalizing constant)
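A small numeric illustration of this combination of parent-side evidence e+ (a prior over the node's states) and child-side evidence e- (likelihoods), with the normalizing constant α realized by renormalization; the function and argument names are illustrative assumptions.

```python
def posterior(prior_from_parents, likelihood_from_children):
    """Combine parent-side evidence e+ with child-side evidence e- and
    renormalize, as in Equation 2. Both arguments map state -> value."""
    unnorm = {s: prior_from_parents[s] * likelihood_from_children[s]
              for s in prior_from_parents}
    z = sum(unnorm.values())  # 1/alpha, the normalizing constant
    return {s: v / z for s, v in unnorm.items()}
```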
[0107]
For a general description of the Bayesian network model itself, the model representation method, and the mathematical calculation method, see, for example, the paper by Yoichi Motomura, “Information Representation for Uncertainty Modeling: Bayesian Networks” (BN2001 Bayesian Network Tutorial Lecture Collection, pp. 5-13 (2001), Japanese Society for Artificial Intelligence, SIG on Fundamental Problems in Artificial Intelligence).
[0108]
Next, the operation of selecting prefetch target pages reflecting the user's preference using the Bayesian network model will be described. First, the setting of observation values is described below, taking FIG. 14 as an example.
[0109]
When an observation value is set in the “client IP address node” 4000, the probability value of the observed client IP address is set to 1 and the others to 0, and a flag is set so that the posterior probability of the “client IP address node” 4000 is not calculated in the subsequent posterior probability calculation. If the observed client IP address is not present in the “client IP address node” 4000, the newly observed client IP address is added to the node; on addition, the probability value of the added entry is set to 1 and the others to 0, and the flag suppressing the posterior probability calculation is likewise set.
[0110]
When an observation value is set in the “next URL node” 4002, the number of times each URL has been observed at the node is recorded so that it can later be used in the conditional probability calculation (EM algorithm). If the observed URL is not present in the “next URL node” 4002, the newly observed URL is added to the node, with its observation count initialized to one.
[0111]
When an observation value is set in the “current URL node” 4001, the probability value of the observed URL is set to 1 and the others to 0, and a flag is set so that the posterior probability of the “current URL node” 4001 is not calculated in the subsequent posterior probability calculation. If the observed URL is not present in the “current URL node” 4001, the newly observed URL is added to the node; its probability value is set to 1 and the others to 0, and the flag suppressing the posterior probability calculation is likewise set.
[0112]
When an observation value is set in the “access interval (second) node” 4003, the number of times each access interval (in seconds) has been observed at the node is recorded so that it can later be used in the conditional probability calculation (EM algorithm). The access interval (in seconds) is calculated from the difference between the date and time of the previous observation and that of the current observation, so that it falls into the corresponding slot of the “access interval (second) node” 4003. For example, if the difference is 38 seconds, it falls into the 30-second slot.
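The slotting of the access interval can be sketched as follows; the 10-second slot width is an assumption inferred from the 38-seconds-to-30-seconds example, not stated in the patent.

```python
def interval_bucket(seconds, width=10):
    """Map an access interval in seconds onto its slot in the
    "access interval (second) node" by flooring to the slot boundary.
    With the assumed 10-second width, 38 seconds falls in the 30-second slot."""
    return (seconds // width) * width
```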
[0113]
In the case of the Bayesian network model shown in FIG. 15, for the “season node” 4101, the season is derived from the date of the observation value (for example, March to May is spring, June to August is summer, September to November is fall, and December to February is winter), the corresponding entry is set to 1 and the others to 0 to set the observation value, and a flag is set so that the posterior probability of the “season node” 4101 is not calculated. Similarly, for the “time node” 4102, the time of day is derived from the time of the observation value (for example, 6:00 to 9:00 is early morning, 9:00 to 12:00 is morning, 12:00 to 13:00 is midday, 13:00 to 16:00 is afternoon, 16:00 to 18:00 is evening, 18:00 to 24:00 is night, and 24:00 to 6:00 is late night), the corresponding entry is set to 1 and the others to 0 to set the observation value, and a flag is set so that the posterior probability of the “time node” 4102 is not calculated.
[0114]
After the observation value setting process is completed, the conditional probabilities are learned (described above). Here, the conditional probabilities are learned by the EM algorithm using the observation counts recorded during the observation value setting process.
[0115]
After the learning of the conditional probabilities, the posterior probabilities are calculated (described above). Here, the posterior probabilities are calculated only for the nodes on which the do-not-calculate flag was not set during the observation value setting processing.
[0116]
After the calculation of the posterior probabilities, the probability values of the respective URLs are obtained from the “next URL node” 4002 in FIG. 14 or FIG. 15 and arranged in descending order of probability. Among them, URLs whose probability exceeds a certain threshold are set as prefetch target pages, and each URL and its probability value are temporarily recorded in the prefetch target page list 1008. The threshold referred to here is, for example, the average value (1 divided by the number of URLs in the “next URL node”) or α times that average value.
[0117]
Next, for each URL in the prefetch target page list 1008 created above, the prefetch time limit (in seconds) is obtained from the “access interval (second) node”. To find it, a tentative observation value is set in the “next URL node” for each URL in the prefetch target page list 1008 (the entry corresponding to the tentative observation is set to 1 and the others to 0), and the posterior probabilities of the “access interval (second) node” are calculated. Of the calculated posterior probabilities, the interval having the maximum probability value is taken. Once the prefetch time limit (seconds) has been obtained for each URL in the prefetch target page list 1008, it is recorded in the list, completing the prefetch target page list 1008 (described above).
[0118]
FIG. 18 shows, in the form of a flowchart, the processing procedure executed by the prefetch target site selection unit 1009. The operation of the prefetch target site selection unit 1009 is the same as that of the prefetch target page selection unit 1007; only the given observation values differ.
[0119]
The prefetch target site selection unit 1009 waits for a command (step S60). When it receives observation values from the log extraction unit 1003 and is instructed to perform the selection processing (step S61), it sets the observation values in the Bayesian network model (step S62). The method of setting the observation values is the same as that of the prefetch target page selection unit 1007.
[0120]
When the setting of the observation values is completed, the conditional probabilities are learned (step S63), the posterior probabilities are then calculated (step S64), and the prefetch target site list 1010 is created (step S65). The learning of the conditional probabilities, the calculation of the posterior probabilities, and the method of creating the prefetch target site list 1010 are the same as in the prefetch target page selection unit 1007.
[0121]
After creating the prefetch target site list 1010, the prefetch scheduler unit 1011 is notified that the list has been created (step S66).
[0122]
In the prefetch target page list 1008 described above, there is a correlation between the pages accessed by the user in chronological order, and the degree of that correlation (the probability value) reflects the user's preference between the pages. Since a higher degree of correlation (probability value) indicates a stronger user preference, the pages written in the list 1008 are those estimated to match the user's preference.
[0123]
Similarly, in the prefetch target site list 1010, there is a correlation between the sites accessed by the user in time series, and the degree of that correlation (the probability value) reflects the user's preference between the sites. A higher degree of correlation (probability value) indicates a stronger user preference, so the sites written in the list 1010 are those estimated to match the user's preference.
[0124]
Further, the conditional observation count table is updated each time a request (observation value) arrives from the user, so the conditional probability table is updated as well. When the user's requests change in accordance with a change in the user's preference, the conditional observation count table and the conditional probability table change accordingly, and so do the posterior probabilities. Therefore, the pages and sites written in the prefetch target page list 1008 and the prefetch target site list 1010 change in step with the user's preference. Based on the degree of correlation (probability value), the pages and sites expected to be accessed next are prefetched in sequence and stored on the cache disk 1006, so that content can be cached in accordance with the user's preference as it changes from moment to moment.
[0125]
When such a cache operation is performed, the cache management table 1005 manages the cache area without distinguishing users; yet even if a file accessed by a certain user has disappeared from the cache disk 1006, a page or site with a high preference value that is likely to be accessed next is obtained from the Web server, based on the requests estimated for each user by the prefetch target page selection unit 1007 and the prefetch target site selection unit 1009, and is always stored on the cache disk 1006. Therefore, files with a strong preference (a high probability value) that each user is likely to access are always present on the cache disk 1006.
[0126]
FIG. 19 shows, in the form of a flowchart, a processing procedure executed in the prefetch scheduler unit 1011.
[0127]
The prefetch scheduler unit 1011 waits for a list creation notification from the prefetch target page selection unit 1007 or the prefetch target site selection unit 1009 (step S70). When a list creation completion notification arrives (step S71), the URL, the prefetch time limit (seconds), the probability value, and the current time (as the prefetch start time) are additionally registered in the existing prefetch schedule table 1012 (step S72). After the registration in the prefetch schedule table 1012, the prefetch unit 1013 is notified (step S73).
[0128]
FIG. 20 is a flowchart illustrating a processing procedure executed in the prefetch unit 1013.
[0129]
The prefetch unit 1013 waits for a notification from the prefetch scheduler unit 1011 (step S80). When the notification arrives (step S81), it searches the prefetch schedule table 1012 for entries whose prefetch start time is older than the current time and whose prefetch has not been completed, and stops their prefetch if any are found (step S82). Next, the entries whose prefetch start time is older than the current time are deleted from the prefetch schedule table 1012 (step S83).
[0130]
If no notification is received from the prefetch scheduler unit 1011, then for each URL in the schedule table for which prefetch has not been performed, the cache management unit 1004 is queried as to whether the corresponding URL exists on the cache disk 1006 (step S84).
[0131]
If the corresponding URL exists on the cache disk 1006 (step S85), the Web server holding that URL is asked whether the URL has been updated (step S86). If the URL has not been updated (step S87), the process returns to waiting for a registration notification (step S80).
[0132]
If the corresponding URL does not exist in the cache (step S85), or if the corresponding URL has been updated (step S87), the size of the target URL in the list is first checked by querying the Web server having the target URL using the HTTP protocol.
[0133]
When the size is checked, the approximate bandwidth is also obtained from the response of the Web server having the target URL, and the size and the bandwidth are recorded in the prefetch schedule table 1012 (step S88). The time required for the prefetch is obtained from the size and the bandwidth and compared with the prefetch time limit.
[0134]
If the prefetch time limit would be exceeded, prefetches currently in progress are temporarily stopped in ascending order of probability value, and the response is measured again to determine whether the prefetch now fits within the time limit. If it still does not fit within the prefetch time limit, the entry is set not to be prefetched.
[0135]
Next, the prefetches with low probability values that were temporarily stopped are resumed (step S89).
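A simplified sketch of the deadline check of steps S88 and S89: entries are processed in descending probability order, and an entry whose estimated transfer time (size divided by measured bandwidth) exceeds its prefetch time limit is skipped. The re-measurement and resume loop is omitted, and all names and the entry layout are assumptions.

```python
def plan_prefetch(entries, bandwidth_bytes_per_sec):
    """entries: list of (url, probability, size_bytes, deadline_seconds).
    Return (plan, skipped): URLs to prefetch in descending probability order,
    and URLs whose estimated transfer time exceeds their prefetch time limit."""
    plan, skipped = [], []
    for url, prob, size, deadline in sorted(entries,
                                            key=lambda e: e[1], reverse=True):
        if size / bandwidth_bytes_per_sec <= deadline:
            plan.append(url)
        else:
            skipped.append(url)  # set not to perform prefetch
    return plan, skipped
```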
[0136]
Next, for the URLs in the schedule table that have not been prefetched, the set of files corresponding to each URL, such as the page and the images within it, is acquired in descending order of the probability values in the schedule table (step S90). In the course of acquiring the files, the cache management unit 1004 is requested to write them in the order of acquisition (step S91).
[0137]
FIGS. 21 and 22 show operation sequences for acquiring a page on the Web server from the user's terminal via the cache system 1000 according to the present embodiment.
[0138]
FIG. 21 illustrates an operation sequence when the requested page exists on the cache system 1000.
[0139]
In this case, first, the WWW cache system receives a page acquisition request from the terminal.
[0140]
Next, a search is made as to whether the page requested by the terminal exists on the cache in the WWW cache system. If it exists in the cache, the page requested by the terminal is returned to the terminal.
[0141]
Next, the WWW cache system predicts the page that is likely to be accessed next after the page requested by the terminal, according to the above-described procedure, and requests it from the WWW server. The WWW server returns the requested page to the WWW cache system.
[0142]
When receiving the requested page from the WWW server, the WWW cache system writes the requested page on a cache in the WWW cache system.
[0143]
FIG. 22 illustrates an operation sequence when the requested page does not exist on the cache system 1000.
[0144]
In this case, first, the WWW cache system receives a page acquisition request from the terminal. Next, it is searched whether or not the page requested by the terminal exists on the cache in the WWW cache system.
[0145]
If the page does not exist in the cache, the WWW cache system requests the page from the WWW server. The WWW server returns the requested page to the WWW cache system.
[0146]
Upon receiving the requested page from the WWW server, the WWW cache system transmits the page received from the WWW server to the terminal and writes the requested page on a cache area in the WWW cache system.
[0147]
Next, the WWW cache system predicts the page that is likely to be accessed next after the page requested by the terminal, according to the above-described procedure, and requests it from the WWW server.
[0148]
The WWW server returns the page requested from the WWW cache system to the WWW cache system. When receiving the requested page from the WWW server, the WWW cache system writes the requested page on a cache in the WWW cache system.
[0149]
Finally, a hardware environment for operating the cache system according to the present embodiment will be described with reference to FIG.
[0150]
The CPU 5000 is configured using, for example, a “Pentium (registered trademark)” processor chip manufactured by Intel Corporation of the United States, and has the function of comprehensively controlling the overall operation of the cache server device and performing various arithmetic processes. Various application programs, including the cache server application according to the present embodiment, run on the CPU 5000 in the execution environment provided by the operating system (OS).
[0151]
The cache memory 5001 is a small-capacity, high-speed memory constituted by, for example, SRAM (Static RAM); it temporarily stores commands, data, and other information frequently accessed by the CPU 5000, and speeds up the system by exchanging information directly with the CPU 5000.
[0152]
The system controller 5002 implements the interface protocol between the host bus 5008, which is directly connected to the local pins of the CPU 5000, and the PCI (Peripheral Component Interconnect) bus 5009, which serves as a peripheral bus, and adjusts the timing of the entire system among the CPU 5000, the main memory 5003, the cache memory 5001, and other resources (such as the hard disk and flexible disk). The system controller 5002 is configured using, for example, the TRITON (430FX) chipset manufactured by Intel Corporation of the United States.
[0153]
The main memory 5003 is a semiconductor memory device configured using, for example, DRAM (Dynamic RAM); program code executed by the CPU 5000 is loaded into it, and it is used as a storage area for the work data of executing programs. For example, the cache server application according to the present invention is loaded into the main memory 5003 and executed by the CPU 5000. The access log 1002, the cache management table 1005, the prefetch target page list 1008, the prefetch target site list 1010, the prefetch schedule table 1012, and the like are also temporarily stored in the main memory 5003.
[0154]
Writing and reading of information to and from the main memory 5003 are performed according to instructions from the CPU 5000 or the system controller 5002. The main memory 5003 is connected to the CPU and the various resources through the system controller 5002 and stores information in response to their requests.
[0155]
The host bus 5008 is a means for transmitting information directly connected to the CPU 5000, and can exchange information with the cache memory 5001, the system controller 5002, and the like.
[0156]
The PCI bus 5009 is a means for transmitting information separated from the host bus 5008, and is interconnected by the system controller 5002. Then, the CPU 5000 can access various hardware resources connected on the PCI bus 5009 via the system controller 5002.
[0157]
The HD controller 5004 is connected to a hard disk drive (HDD) 5005 and the PCI bus 5009, and controls the operation of writing and reading information to and from specific areas of the disk 5005 in response to disk access requests via the PCI bus 5009. Programs executed by the CPU 5000 are installed on the hard disk, and computer files such as programs and data are stored there. For example, the cache server application according to the present invention is installed on the hard disk, and the access log 1002, the cache management table 1005, the prefetch target page list 1008, the prefetch target site list 1010, the prefetch schedule table 1012, and the like are stored on the hard disk.
[0158]
The FD controller 5006 is connected to a flexible disk drive 5007, into which a flexible disk is removably loaded, and to the PCI bus 5009, and controls the operation of writing and reading information to and from specific areas of the flexible disk in response to disk access requests via the PCI bus 5009.
[0159]
Alternatively, a media drive that loads portable media other than a flexible disk and performs access operations may be connected to the PCI bus 5009. This type of portable medium is used for transferring programs and data between systems. For example, the cache server application according to the present invention, as well as the access log 1002, the cache management table 1005, the prefetch target page list 1008, the prefetch target site list 1010, the prefetch schedule table 1012, and the like used in the cache server operation, can be moved between multiple systems via portable media.
[0160]
The mouse controller 5010 connects a mouse 5011 (and a keyboard, not shown) serving as user input devices to the PCI bus 5009, and transmits mouse movements made by the operator and other user input operations to the CPU 5000 according to a predetermined procedure. On the CPU 5000 side, for example, a GUI (Graphical User Interface) environment is provided, in which input information for moving the mouse pointer (arrow pictogram) relative to the image displayed on the CRT display 5013 can be generated. The CRT display 5013 is connected to the CRTC 5012 and displays the state created by the CPU and other information as images.
[0161]
The CRTC (CRT controller) 5012 is connected to the PCI bus 5009 and draws figures and other drawing information on the CRT display 5013 based on instructions from the CPU 5000 and the like.
[0162]
The network interface 5014 connects the system to an external network such as a LAN or the Internet. The system operates as the cache server described above on the network; via the network interface 5014, it receives requests from terminals, delivers files, obtains files from WWW servers, analyzes the access sequence of each user, and performs operations such as file prefetching.
[0163]
On the network, programs and data are transferred (downloaded) between systems. For example, a cache server application according to the present invention, as well as the access log 1002, cache management table 1005, prefetch target page list 1008, prefetch target site list 1010, prefetch schedule table 1012, and the like used in cache server operation, can be transferred between systems via the network.
[0164]
[Supplement]
The present invention has been described in detail above with reference to specific embodiments. It is obvious, however, that those skilled in the art can modify or substitute for the embodiments without departing from the scope of the present invention. That is, the present invention has been disclosed by way of example, and the contents of this specification should not be interpreted restrictively. To determine the gist of the present invention, the appended claims should be taken into consideration.
[0165]
[Effects of the Invention]
As described in detail above, the present invention can provide an excellent information providing system, information providing method, and computer program capable of efficiently providing information in response to a request from a specific user, out of the information offered to an unspecified number of users in the WWW information space.
[0166]
Further, the present invention can provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a request from a specific user by caching WWW information.
[0167]
Further, the present invention can provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a user's request by prefetching the information the user will request.
[0168]
Further, the present invention can provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a user's request by prefetching the information the user will request in a manner that follows the user's preferences as they change from moment to moment.
[0169]
According to the present invention, by performing prefetching for each user using the user's request log, the WWW information requested by a specific user, rather than by an unspecified number of users, can be cached efficiently. In addition, by using a Bayesian network model, one of the probabilistic network models, in the prefetch mechanism, even if the access pattern changes as the user's preferences shift from moment to moment, prefetch targets can be determined stochastically in accordance with that change; the prefetch targets can thus be narrowed down, and the system can respond dynamically to changes in the user.
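As an informal illustration (not part of the patent), the per-user prefetch-target selection summarized above can be approximated with a first-order page-transition model — a deliberate simplification of the Bayesian network of FIGS. 14 through 17 — in which a conditional-probability observation-count table is updated from the request log, and pages whose estimated probability exceeds a threshold become prefetch targets. All names here (PrefetchPredictor, observe, prefetch_targets) are hypothetical:

```python
from collections import defaultdict


class PrefetchPredictor:
    """Per-user first-order probabilistic model of page transitions.

    A simplified stand-in for the patent's Bayesian-network approach:
    an observation-count table of next-page-given-current-page
    (cf. the conditional probability tables of FIGS. 16-17) is updated
    from the request log, and prefetch targets are limited to pages
    whose estimated probability exceeds a threshold.
    """

    def __init__(self):
        # counts[user][current][next] = times 'next' followed 'current'
        self.counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int)))

    def observe(self, user, access_sequence):
        # Update observation counts from consecutive page pairs in the
        # user's request log, tracking the user's changing preferences.
        for cur, nxt in zip(access_sequence, access_sequence[1:]):
            self.counts[user][cur][nxt] += 1

    def prefetch_targets(self, user, current_page, threshold=0.3):
        # Estimate P(next | current) from counts; only sufficiently
        # likely pages become prefetch targets, which limits the amount
        # of speculative fetching, ordered by descending likelihood.
        nexts = self.counts[user][current_page]
        total = sum(nexts.values())
        if total == 0:
            return []
        return sorted((p for p, c in nexts.items() if c / total >= threshold),
                      key=lambda p: -nexts[p])


pred = PrefetchPredictor()
pred.observe("user1", ["/a", "/b", "/a", "/b", "/a", "/c"])
targets = pred.prefetch_targets("user1", "/a")
```

Because the table is rebuilt incrementally from each user's own log, a shift in that user's access pattern is reflected in the probabilities on the next update, which is the dynamic adaptation the passage above describes.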
[Brief description of the drawings]
FIG. 1 is a diagram schematically showing a configuration example of a WWW information providing system to which the present invention is applied.
FIG. 2 is a diagram schematically illustrating another configuration example of a WWW information providing system to which the present invention is applied.
FIG. 3 is a diagram schematically showing still another configuration example of the WWW information providing system to which the present invention is applied.
FIG. 4 is a diagram schematically showing a functional configuration of a cache system 1000 according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of a data structure of a log recorded in an access log 1002.
FIG. 6 is a diagram showing an example of a data structure of a cache management table 1005.
FIG. 7 is a diagram showing an example of a data structure of a prefetch target page list 1008.
FIG. 8 is a diagram showing an example of a data structure of a prefetch target site list 1010.
FIG. 9 is a diagram schematically showing a data structure of a prefetch schedule table 1012.
FIG. 10 is a flowchart showing a processing procedure executed in the request processing unit 1001.
FIG. 11 is a flowchart showing a processing procedure executed in the log extracting unit 1003.
FIG. 12 is a flowchart showing a processing procedure executed in the cache management unit 1004.
FIG. 13 is a flowchart showing a processing procedure executed in a prefetch target page selection unit 1007.
FIG. 14 is a diagram illustrating a configuration example of a Bayesian network model.
FIG. 15 is a diagram illustrating a configuration example of a Bayesian network model.
FIG. 16 is a diagram showing a conditional probability table.
FIG. 17 is a diagram showing a conditional probability observation count table.
FIG. 18 is a flowchart showing a processing procedure executed in a prefetch target site selection unit 1009.
FIG. 19 is a flowchart showing a processing procedure executed in a prefetch scheduler unit 1011.
FIG. 20 is a flowchart showing a processing procedure executed in a prefetch unit 1013.
FIG. 21 is a diagram showing an operation sequence in which a user terminal obtains a page on a Web server through the intervention of the cache system 1000 (in the case where the requested page exists on the cache system 1000).
FIG. 22 is a diagram showing an operation sequence in which a user terminal obtains a page on a Web server through the intervention of the cache system 1000 (in the case where the requested page does not exist on the cache system 1000).
FIG. 23 is a diagram showing an example of a hardware configuration for operating the cache system according to the present invention.
FIG. 24 is a diagram showing an example of a data structure of a cache management table 1005 when terminals of all users are not distinguished.
[Explanation of symbols]
1000 ... Cache system
1001 ... Request processing unit
1002 ... Access log
1003 ... Log extraction unit
1004 ... Cache management unit
1005 ... Cache management table
1006 ... Cache disk
1007 ... Prefetch target page selection unit
1008 ... Prefetch target page list
1009 ... Prefetch target site selection unit
1010 ... Prefetch target site list
1011 ... Prefetch scheduler unit
1012 ... Prefetch schedule table
1013 ... Prefetch unit
5000 ... CPU, 5001 ... Cache memory
5002 ... System controller, 5003 ... Main memory
5004 ... HD controller, 5005 ... HDD
5006 ... FD controller, 5007 ... Flexible disk drive
5008 ... Host bus, 5009 ... PCI bus
5010 ... Mouse controller, 5011 ... Mouse
5012 ... CRT controller (CRTC), 5013 ... CRT display
5014 ... Network interface

Claims (13)

  1. An information providing system for providing information consisting of pages having a mutual reference relationship over a plurality of sites,
    A log extraction unit that extracts a request log of information requests for each user;
    A prefetch target page selection unit that predicts the next page to be accessed based on the access sequence of pages included in the request log, and obtains a page to be prefetched for each user;
    A prefetch target site selection unit that predicts the next site to be accessed based on the access sequence of sites included in the request log, and obtains a site to be prefetched for each user;
    A prefetch unit that prefetches information based on the pages and sites selected by the prefetch target page selection unit and the prefetch target site selection unit;
    An information providing system comprising:
  2. The information providing system according to claim 1, wherein the prefetch target page selection unit and/or the prefetch target site selection unit predicts the page and/or site the user will access next, using a per-user probability network model that describes the user's access sequence.
  3. The information providing system according to claim 2, wherein the prefetch target page selection unit and/or the prefetch target site selection unit updates the probability network model based on the result of the user's next access.
  4. The information providing system according to claim 1, wherein the prefetch unit performs a prefetch operation on the prefetch targets in consideration of network load and the like.
  5. The information providing system according to claim 4, wherein the prefetch unit rearranges the priorities of the pages and/or sites to be prefetched based on the prediction, in consideration of network load.
  6. The information providing system according to claim 1, wherein the prefetch unit stores the prefetched information in a storage area allocated to each user.
  7. An information providing method for providing information including pages having a mutual reference relationship over a plurality of sites,
    A log extracting step of extracting a request log requesting information for each user;
    A prefetch target page selection step of predicting the next page to be accessed based on the access sequence of pages included in the request log, and obtaining a page to be prefetched for each user;
    A prefetch target site selection step of predicting the next site to be accessed based on the access sequence of the site included in the request log and obtaining a site to be prefetched for each user;
    A prefetch step of prefetching information based on the page and site selected by the prefetch target page selection step and the prefetch target site selection step;
    An information providing method, comprising:
  8. The information providing method according to claim 7, wherein, in the prefetch target page selection step and/or the prefetch target site selection step, the page and/or site the user will access next is predicted using a per-user probability network model that describes the user's access sequence.
  9. The information providing method according to claim 7, wherein, in the prefetch target page selection step and/or the prefetch target site selection step, the probability network model is updated based on the result of the user's next access.
  10. The information providing method according to claim 7, wherein, in the prefetch step, a prefetch operation on the prefetch targets is performed in consideration of network load and the like.
  11. The information providing method according to claim 7, wherein, in the prefetch step, the priorities of the pages and/or sites to be prefetched based on the prediction are rearranged in consideration of network load.
  12. The information providing method according to claim 7, wherein, in the prefetch step, the prefetched information is stored in a storage area allocated to each user.
  13. A computer program written in a computer-readable format so as to execute a process for providing information consisting of pages having a mutual reference relationship over a plurality of sites on a computer system,
    A log extracting step of extracting a request log requesting information for each user;
    A prefetch target page selection step of predicting the next page to be accessed based on the access sequence of pages included in the request log, and obtaining a page to be prefetched for each user;
    A prefetch target site selection step of predicting the next site to be accessed based on the access sequence of the site included in the request log and obtaining a site to be prefetched for each user;
    A prefetch step of prefetching information based on the page and site selected by the prefetch target page selection step and the prefetch target site selection step;
    A computer program comprising:
JP2003070157A 2003-03-14 2003-03-14 System and method for providing information, and computer program Pending JP2004280405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003070157A JP2004280405A (en) 2003-03-14 2003-03-14 System and method for providing information, and computer program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003070157A JP2004280405A (en) 2003-03-14 2003-03-14 System and method for providing information, and computer program

Publications (1)

Publication Number Publication Date
JP2004280405A true JP2004280405A (en) 2004-10-07

Family

ID=33286979

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003070157A Pending JP2004280405A (en) 2003-03-14 2003-03-14 System and method for providing information, and computer program

Country Status (1)

Country Link
JP (1) JP2004280405A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007243552A (en) * 2006-03-08 2007-09-20 Keio Gijuku Information processor, method and program
JP2008541239A (en) * 2005-05-04 2008-11-20 Venturi Wireless Incorporated Method and apparatus for increasing HTTP performance of long latency links
JP2009009484A (en) * 2007-06-29 2009-01-15 Nomura Research Institute Ltd Web server, web display terminal, and web browsing program
JP2009123047A (en) * 2007-11-16 2009-06-04 Nec Corp Terminal cache management apparatus, terminal cache management method and program
JP2009541877A (en) * 2006-06-30 2009-11-26 International Business Machines Corporation Method, system, apparatus and computer program for controlling web objects (method and apparatus for caching broadcast information)
JP2010033112A (en) * 2008-07-25 2010-02-12 Fujitsu Ltd Content reproduction device, content reproduction method, and content reproduction program
EP2302527A1 (en) 2009-09-17 2011-03-30 Sony Corporation Information processing apparatus, data acquisition method, and program
EP2312469A1 (en) 2009-09-17 2011-04-20 Sony Corporation Information processing apparatus, data display method, and program
EP2320336A2 (en) 2009-09-17 2011-05-11 Sony Corporation Information processing apparatus, data acquisition method, and program
US8010580B2 (en) 2006-07-18 2011-08-30 Canon Kabushiki Kaisha Information browser, method of controlling same, and program
JP2013077046A (en) * 2011-09-29 2013-04-25 Fujitsu Ltd Data storage control device, data storage control program and data storage control method
WO2014155663A1 (en) * 2013-03-29 2014-10-02 Rakuten, Inc. Data provision device, data provision method, and data provision program
JP2015194832A (en) * 2014-03-31 2015-11-05 パイオニア株式会社 Content output device, content distribution server, content output method and content output program
JP2016533594A (en) * 2014-08-13 2016-10-27 シャオミ・インコーポレイテッド Web page access method, web page access device, router, program, and recording medium
JP2017507385A (en) * 2013-12-22 2017-03-16 インターデイジタル パテント ホールディングス インコーポレイテッド Accelerate web applications with personalized caching or pre-rendering
US10069925B2 (en) 2013-12-04 2018-09-04 Sony Corporation Server device and information processing method

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008541239A (en) * 2005-05-04 2008-11-20 Venturi Wireless Incorporated Method and apparatus for increasing HTTP performance of long latency links
JP2007243552A (en) * 2006-03-08 2007-09-20 Keio Gijuku Information processor, method and program
JP2009541877A (en) * 2006-06-30 2009-11-26 International Business Machines Corporation Method, system, apparatus and computer program for controlling web objects (method and apparatus for caching broadcast information)
US8407260B2 (en) 2006-06-30 2013-03-26 International Business Machines Corporation Method and apparatus for caching broadcasting information
US8010580B2 (en) 2006-07-18 2011-08-30 Canon Kabushiki Kaisha Information browser, method of controlling same, and program
JP2009009484A (en) * 2007-06-29 2009-01-15 Nomura Research Institute Ltd Web server, web display terminal, and web browsing program
JP2009123047A (en) * 2007-11-16 2009-06-04 Nec Corp Terminal cache management apparatus, terminal cache management method and program
JP2010033112A (en) * 2008-07-25 2010-02-12 Fujitsu Ltd Content reproduction device, content reproduction method, and content reproduction program
EP2312469A1 (en) 2009-09-17 2011-04-20 Sony Corporation Information processing apparatus, data display method, and program
EP2320336A2 (en) 2009-09-17 2011-05-11 Sony Corporation Information processing apparatus, data acquisition method, and program
US8504695B2 (en) 2009-09-17 2013-08-06 Sony Corporation Information processing apparatus, data acquisition method, and program
US8972852B2 (en) 2009-09-17 2015-03-03 Sony Corporation Two-stage rendering of web page containing scripts
EP2302527A1 (en) 2009-09-17 2011-03-30 Sony Corporation Information processing apparatus, data acquisition method, and program
US8478882B2 (en) 2009-09-17 2013-07-02 Sony Corporation Information processing apparatus, data acquisition method, and program
JP2013077046A (en) * 2011-09-29 2013-04-25 Fujitsu Ltd Data storage control device, data storage control program and data storage control method
WO2014155663A1 (en) * 2013-03-29 2014-10-02 Rakuten, Inc. Data provision device, data provision method, and data provision program
US10282482B2 (en) 2013-03-29 2019-05-07 Rakuten, Inc. Data provision device, data provision method, and data provision program
US10069925B2 (en) 2013-12-04 2018-09-04 Sony Corporation Server device and information processing method
JP2017507385A (en) * 2013-12-22 2017-03-16 インターデイジタル パテント ホールディングス インコーポレイテッド Accelerate web applications with personalized caching or pre-rendering
JP2015194832A (en) * 2014-03-31 2015-11-05 パイオニア株式会社 Content output device, content distribution server, content output method and content output program
JP2016533594A (en) * 2014-08-13 2016-10-27 シャオミ・インコーポレイテッド Web page access method, web page access device, router, program, and recording medium

Similar Documents

Publication Publication Date Title
US10182127B2 (en) Application-driven CDN pre-caching
US9530099B1 (en) Access to network content
US9497256B1 (en) Static tracker
US9304928B2 (en) Systems and methods for adaptive prefetching
EP2653987B1 (en) Displaying web pages without downloading static files
CA2845121C (en) Managing information associated with network resources
JP5869124B2 (en) Improved web browsing with cloud computing
Olston et al. Web crawling
Wang et al. How far can client-only solutions go for mobile browser speed?
Saadat et al. PDDRA: A new pre-fetching based dynamic data replication algorithm in data grids
US9189553B2 (en) Methods and systems for prioritizing a crawl
Balasubramanian et al. Web search from a bus
Venkataramani et al. The potential costs and benefits of long-term prefetching for content distribution
KR20160030381A (en) Method, device and router for access webpage
US6542967B1 (en) Cache object store
EP1362289B1 (en) Multi-tier caching system
US6836827B2 (en) Delay cache method and apparatus
Ali et al. A survey of web caching and prefetching
US5727129A (en) Network system for profiling and actively facilitating user activities
US8352597B1 (en) Method and system for distributing requests for content
US7363291B1 (en) Methods and apparatus for increasing efficiency of electronic document delivery to users
DE69834129T2 (en) Process and system for promoting information
US9065835B2 (en) Redirecting web content
US7269608B2 (en) Apparatus and methods for caching objects using main memory and persistent memory
Pallis et al. A clustering-based prefetching scheme on a Web cache environment

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20060201

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20081125

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20090108

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20090303