WO2001042990A2 - File transmission from a first web server agent to a second web server agent - Google Patents

Publication number
WO2001042990A2
Authority
WO
WIPO (PCT)
Prior art keywords
web
file
web server
server agent
computer
Prior art date
Application number
PCT/US2000/042745
Other languages
English (en)
French (fr)
Other versions
WO2001042990A3 (en)
Inventor
Freeland Abbott
Marco Lara
Depankar Neogi
Geoff Hardy
Original Assignee
Inktomi Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/532,483 external-priority patent/US7143193B1/en
Application filed by Inktomi Corporation filed Critical Inktomi Corporation
Priority to GB0215573A priority Critical patent/GB2374699B/en
Priority to JP2001544209A priority patent/JP2003516586A/ja
Priority to DE10085300T priority patent/DE10085300T1/de
Priority to AU47161/01A priority patent/AU4716101A/en
Publication of WO2001042990A2 publication Critical patent/WO2001042990A2/en
Publication of WO2001042990A3 publication Critical patent/WO2001042990A3/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/10015 Access to distributed or replicated servers, e.g. using brokers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034 Reaction to server failures by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection

Definitions

  • This invention relates to managing multiple web servers, and more particularly to a web service system that allows a system operator to collect content from each web server to a single computer in the web service system.
  • web servers are used to respond to users' web page requests, which are transmitted over the computer network.
  • Web page requests, also referred to as content requests, typically are made by a browser running on a user's computer.
  • a web server monitors one or more computer network address/port endpoints for web page requests and responds to the web page requests by transmitting web pages to the requester.
  • Web servers may be special purpose devices, or they may be implemented with a software program running on a general purpose computer. The service capacity of a web server limits the number of web page requests that may be received and responded to in a given time interval.
  • a web service system may include one web server or more than one web server.
  • the web service system is designed so that the multiple web servers each respond to web page requests.
  • a user's web page request is directed towards one of the web servers, and that web server responds to that web page request. It is also typical for web service systems designed to receive a large number of web page requests to include many web servers.
  • a system operator or operators manage the content offered by the various web servers.
  • a system operator may sometimes wish to access the data that has been generated and stored on each web server.
  • a system operator may want to access the web server log files generated by each web server. This can be difficult and time-consuming, because it can be awkward to gather and access files located on different computers.
  • a system and method for distributing content directly from each web server to a single computer is useful to a system operator. For example, it is often desirable to transfer files generated on web servers to a central location for access by a system operator. If files generated by multiple web servers are aggregated on a single computer, processing and analysis can be performed more easily on all of the files.
  • the invention relates to a system and method for transmitting content from one computer to another in a web service system.
  • the web service system includes web servers which provide web pages in response to web page requests.
  • First and second web server agents provide an interface between the web service system and first and second computers, respectively.
  • the first web server agent runs on the first computer and identifies at least a portion of a file for transmission to the second web server agent running on the second computer. At least a portion of the file from the first web server agent is transmitted to the second web server agent and then stored by the second web server agent.
  • the file is identified, transmitted, and stored in its entirety. In another embodiment, the file is identified, transmitted, and stored repeatedly. In yet another embodiment, the method includes identifying a portion of the file that was not previously transmitted. In still another embodiment, the method includes identifying a portion of the file that contains content added subsequent to any previous identification. In another embodiment, the method also includes executing a computer program that operates on the file. In another embodiment, the file to be transmitted is a log file containing information about user requests to the one or more web servers in the web service system.
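The embodiment that identifies only the portion of a file not previously transmitted can be illustrated with a short sketch. It assumes the sending agent remembers a byte offset per file; the function name `read_new_portion` and the offset bookkeeping are hypothetical and are not taken from the patent text.

```python
import os


def read_new_portion(path, offset):
    """Return (data, new_offset) for the bytes appended to `path` since `offset`.

    Hypothetical helper: the byte-offset bookkeeping is an assumption made
    for illustration, not a mechanism described in the source.
    """
    if os.path.getsize(path) < offset:
        # The file is shorter than what was already sent (for example, it was
        # truncated or replaced), so start again from the beginning.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)
```

Calling the function repeatedly with the offset returned by the previous call yields only the content added since the last identification, which is the portion the receiving agent would append to its copy.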
  • In general, in another aspect, the invention relates to a method for transmitting content from one computer to another in a web service system.
  • a first web server agent running on a first computer in the web service system determines that a first file with a first name was renamed to a second name such that the first file has the second name and a second file has the first name.
  • the first file having the second name is then identified.
  • a second web server agent runs on a second computer and is notified that the first file was renamed. At least a portion of the first file is transmitted from the first web server agent to the second web server agent.
  • the second file having the first name is identified and a portion of the second file is transmitted from the first web server agent to the second web server agent.
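The rename-detection steps above can be sketched as follows. The source does not say how the first agent determines that a file was renamed; this sketch assumes the agent recorded the file's inode number (`st_ino`) earlier and looks for that inode under a new name. The function names and the step labels (`notify_rename`, `send_remainder`, `send_new_file`) are hypothetical.

```python
import os


def current_name_of(directory, inode):
    """Return the name in `directory` that currently refers to `inode`, or None."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # os.stat() is used rather than entry.stat() because
            # DirEntry.stat() reports st_ino as zero on Windows.
            if entry.is_file() and os.stat(entry.path).st_ino == inode:
                return entry.name
    return None


def plan_rotation_transfer(directory, first_name, known_inode):
    """Return the ordered steps for the sending agent after a possible rotation."""
    first_path = os.path.join(directory, first_name)
    if os.path.exists(first_path) and os.stat(first_path).st_ino == known_inode:
        return []  # the first name still refers to the same file: no rotation
    second_name = current_name_of(directory, known_inode)
    if second_name is None:
        return []  # the original file no longer exists; nothing to follow
    steps = [
        ("notify_rename", first_name, second_name),  # tell the second agent
        ("send_remainder", second_name),             # rest of the renamed file
    ]
    if os.path.exists(first_path):
        steps.append(("send_new_file", first_name))  # the new file with the old name
    return steps
```

Sending the remainder of the renamed file before starting on the new file keeps the receiving copy complete, since content written just before the rename would otherwise be lost.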
  • In general, in yet another aspect, the invention relates to a method for transmitting content from one computer to another in a web service system.
  • the web service system provides web pages in response to web page requests.
  • a first web server agent provides an interface between the web service system and a first computer.
  • the first web server agent runs on the first computer, and identifies and runs a computer program.
  • the output of the computer program is then transmitted from the first web server agent to a second web server agent running on a second computer, and the output is then stored by the second web server agent.
  • In general, in still another aspect, the invention relates to a method for transmitting content from one computer to another in a web service system.
  • a first web server agent provides an interface between the web service system and a first computer.
  • the first web server agent runs on the first computer and identifies at least a portion of a file for transmission to a second web server agent running on a second computer.
  • the identified portion of the file is transmitted from the first web server agent to the second web server agent.
  • the second web server agent provides the received portion of the transmitted file as input to a computer program.
  • In general, in still another aspect, the invention relates to a method for transmitting content from one computer to another in a web service system.
  • a first web server agent provides an interface between the web service system and the first computer.
  • the first web server agent runs on a first computer and identifies and runs a computer program.
  • the output of the computer program is then transmitted from the first web server agent to a second web server agent running on a second computer.
  • the second web server agent provides the received portion of the transmitted file as input to a computer program.
  • In general, in still another aspect, the invention relates to a method for transmitting content from one computer to another in a web service system.
  • a first web server agent runs on a first computer and provides an interface between the web service system and the first computer.
  • the first web server agent identifies at least a portion of a file for transmission to a second web server agent running on a second computer.
  • the first computer transmits at least the portion of the file from the first web server agent to the second web server agent.
  • the second computer includes a storage medium for storing, by the second web server agent, at least the portion of the transmitted file.
  • In general, in still another aspect, the invention relates to a computer program embodied on a computer-readable medium.
  • the computer program includes an identification code segment for identifying, by a first web server agent running on a first computer in a web service system, at least a portion of a file for transmission to a second web server agent running on a second computer.
  • the web service system includes web servers that provide web pages in response to web page requests.
  • the first and second web server agents each provide an interface between the web service system and the first and second computers, respectively.
  • the computer program also includes a transmitting code segment for transmitting at least the portion of the file from the first web server agent to the second web server agent.
  • FIG. 1 is a block diagram of an embodiment of a web service system according to the invention.
  • FIG. 2 is a more detailed block diagram of an embodiment of a web service system.
  • FIG. 3 is a flowchart of file transfer according to an embodiment of the invention.
  • FIG. 4 is a flowchart of file portion transfer according to another embodiment of the invention.
  • FIG. 5 depicts file rotation.
  • FIG. 6 is a flow chart of file transfer which detects file rotation (FIG. 5) according to an embodiment of the invention.
  • a system for serving web pages has a plurality of web servers and provides a system operator with features and tools to coordinate the operation of multiple web servers.
  • the system might have only one web server, but typically it includes more than one.
  • the system can manage traffic by directing web page requests, which originate, generally, from web browsers on client computers, to available web servers, thus balancing the web page request service load among the multiple servers.
  • the system can collect data on web page requests and web server responses to those web page requests, and provides reporting of the data as well as automatic and manual analysis tools.
  • the system can monitor for specific events, and can act automatically upon the occurrence of such events.
  • the events include predictions or thresholds that indicate impending system problems.
  • the system can include crisis management capability to provide automatic error recovery, and to guide a system operator through the possible actions that can be taken to recover from events such as component failure or network environment problems.
  • the system can present current information about the system operation to a system operator.
  • the system can manage content replication with version control and data updates. Some or all of this functionality can be provided in specific embodiments.
  • an embodiment of a web service system 90 receives web page requests from a browser 1.
  • a web page is electronic content that can be made available on a computer network 2 in response to a request.
  • An example of a web page is a data file that includes computer executable or interpretable information, graphics, sound, text, and/or video, that can be displayed, executed, played, processed, streamed, and/or stored and that can contain links, or pointers, to other web pages.
  • Requests typically originate from web browsers 1 and are communicated across a communications network 2.
  • the communications network 2 is an intranet.
  • the communications network 2 is the global communications network known as the Internet.
  • a browser 1 can be operated by users to make web page requests. Browsers 1 can also be operated by a computer or computer program, and make requests automatically based on the computer's programming.
  • the web page requests can be made using hypertext transfer protocol ("http") format, and also can be made using other protocols that provide request capability.
  • an embodiment of a web service system 90 includes various components 100-126.
  • the components communicate over one or more computer networks.
  • the physical location of the components does not impact the capability or the performance of the system, as long as the communications links between the various components have sufficient data communication capability.
  • the web service system 90 can function across firewalls of various designs, and can be configured and administered remotely.
  • the web service system 90 manages one or more hosts 100. Two hosts 100 are shown as an example. An embodiment of the web service system 90 can have any number of hosts 100. Each host 100 can be a computer system commercially available and capable of using a multithreaded operating system such as UNIX or WINDOWS NT™. Each host 100 can have at least one network connection to a computer network, for example the Internet or an intranet, or any other network, that allows the host 100 to provide web pages in response to web page requests. Each host 100 includes at least one web server 102.
  • the web server 102 can be any web server that serves web pages in response to web page requests received over a computer network.
  • Two examples of such web servers are commercially available as the NETSCAPE ENTERPRISE SERVER, available from Netscape Communications Corporation of Mountain View, California and the MICROSOFT INTERNET INFORMATION SERVICES SERVER, available from Microsoft Corporation of Redmond, Washington.
  • the web server 102 is capable of receiving web page requests 113 from web clients, also referred to as browsers and/or web page requesters.
  • a web page request 113 from a browser is also referred to as a content request or, from the point of view of a web server, as a "hit."
  • the web page requests are part of a series of communications with the web server 102 involving several requests and responses.
  • a session is an extended interaction with the web server.
  • a shorter interaction for example the purchase of an item, is referred to as a transaction.
  • a session could involve several transactions.
  • the user interacts with a web server 102 by making an initial request 113 of the web server 102, which results in the web server 102 sending a web page in response.
  • the web page can contain information, and also, or alternatively, pointers to other requests that the user can make of the web server 102 or, perhaps, other web servers.
  • the requests are for information that must be retrieved from a database, and sometimes the request includes information to be stored in a database.
  • the request requires processing by the web server 102, or interaction with another computer system.
  • Sophisticated web servers and browsers can interact in various ways.
  • One example of an application is a set of pages providing information about a company.
  • Another example of an application is a series of pages that allow a user to conduct transactions with her savings bank.
  • Two sets of web pages may be considered a single application, or they can be considered two separate applications.
  • a set of web pages might provide information about a bank, and a customer service set of web pages might allow transaction of business with the bank. Whether a set of web pages is considered to be one application or several applications is a decision made by the application designer.
  • the web service system 90 is capable of delivering one or more applications to users.
  • the web service system 90 can be configured so that some subset of the web servers 102 exclusively serve a single application. In one embodiment, some web servers 102 serve a subset of the available applications, and other web servers 102 serve other applications. In another embodiment, all web servers 102 serve all available applications.
  • the web pages that are presented to the user in response to web page requests 113 from the user's web browser can be stored on the host 100 file system or on another file system accessible to the web server 102. Some or all of the web page content can be generated by the web server 102 by processing data available to the web server 102. For example, for web pages that are documents about a topic, the web pages can be created (designed) and stored in the web server 102 file system. In response to a web page request, such a web page can be sent to the user just as it is stored in the file system. In a banking transaction system, however, it is likely that information about the user's bank account will be stored in a database. The web server 102 can generate a web page containing the user's account information by making database requests each time the user requests the page. Often, web pages are stored partly in the file system and partly generated by the web server 102 when the request is made.
  • Various techniques are used to store status information, also referred to as the "state" of a user's session with the web server 102.
  • the user can develop a state during her interaction with the web server 102 via the requests made to the web server 102 and the web pages received in response to those requests.
  • the user's state can, as one example, include information identifying the user.
  • the state can include information specifying web pages the user has already requested, or the options the user has selected in her interaction with the system.
  • the state can include items the user has selected for purchase from a commercial sales application.
  • some information about or identifying the state of the session is stored in the client web browser, for example as a cookie that identifies the user to the web service system 90, and some information can be stored in the web server 102.
  • Each web server 102 can generate and maintain a log file of all the requests 113 for web pages made to the web server, the web server responses to these requests, as well as of various events occurring during the web server's operation, such as a status of computer programs running on the server, component failures or network environment problems.
  • each web server 102 is capable of receiving other information from a browser, and storing it on the host 100.
  • a host 100 can have any number of web servers 102 running on it, depending on host capacity, performance, and cost considerations.
  • the host 100 includes one web server 102.
  • a host includes more than one web server 102.
  • the one web server 102 on the host 100 in FIG. 2 is a simplified illustrative example and is not intended to limit the number of possible web servers that could run on a host.
  • Each web server 102 monitors at least one network address and port, also referred to as an endpoint.
  • a particular address and port is called an endpoint because it is a virtual endpoint for communication.
  • a network connection is made between one address/port endpoint and another.
  • the web server 102 receives requests directed to one of its endpoints and responds to those requests with data in the form of web pages.
  • a web server 102 that accepts requests at multiple network address/port endpoints can perform as if it were a plurality of distinct web servers 102 even though it is actually implemented as one web server 102.
  • Such a web server is referred to as a multiple endpoint web server.
  • a multiple endpoint web server can be described as if it were in fact multiple web servers 102 with each web server 102 receiving requests on a network address/port endpoint.
  • such a multiple endpoint web server has one web server interface 104 that is the interface for all of the multiple endpoints.
  • Each web server 102 can have associated with it a web server interface 104.
  • the web server interface can be a plug-in, filter, or other software associated with the web server 102 that serves as an interface between the web server 102 and other components of web service system 90.
  • the term web server interface is distinct from the network interface that can be present on the host 100.
  • the web server 102 has a web server interface 104.
  • Each web server interface 104 on a host 100 can communicate with an agent 106.
  • Each host 100 includes an agent 106.
  • the agent 106 provides a web service system 90 interface with the host 100, serving as an intermediary between the manager 110 and any other software running on host 100, including the operating system.
  • the agent 106 links the web server interface 104 (if present) with the web service system 90.
  • the agent 106 also links the host 100 with the web service system 90.
  • the agent 106 is implemented in software using the JAVA programming language.
  • the agent 106 can run in the background. On a UNIX system it can run as a daemon; on WINDOWS NT™ it can run as a service. Even on a host that has multiple web servers, there is generally only one agent 106 running on the host 100, although it is possible to have more than one.
  • Each agent 106 has access to a database 108, which contains information about the system components.
  • the agent 106 communicates with the one or more web servers 102 on a host 100 via the web server interface 104 associated with each web server 102.
  • the web server interface 104 provides the agent 106 with information about the web page requests received from users, and the pages sent in response to the requests.
  • communication from the web server interfaces 104 to the agent 106 takes place over a shared memory channel.
  • the agent 106 reserves shared memory, and the web server interfaces 104 are able to write data into the shared memory. This has the advantage of being faster than using sockets, and allows the agent 106 to receive data from all web server interfaces 104 at one buffer. This communication link could also be implemented with sockets or other interprocess communication.
  • the agent 106 on a host 100 communicates with a web service system manager 110.
  • the manager 110 receives information from the agents 106 about the status of the hosts 100 and the web servers 102.
  • the manager 110 can send commands to the agents 106 to configure the hosts 100, to start, stop, or pause the web servers 102, and to manage the load on the web servers 102.
  • the manager 110 has access to a logging database 114 that is used for logging system activity and events.
  • the manager 110 also has access to a managed object database 112, used for storing information about the various components of the system.
  • the manager 110 is also in communication with one or more consoles 116A-116X, generally referred to as 116.
  • the consoles 116 provide a user interface for the system operator.
  • the system operator can monitor the status of the system and configure the system via a console.
  • the manager 110 can be run on the same host 100 as other web service system 90 components, such as one of the web servers 102 or a traffic manager 120, or
  • the agents 106 have the capability to communicate directly with each other, as shown by link 127.
  • communication takes place over a TCP/IP socket, opened from a first one of the agents 106 to a second agent 106. Messages can be sent on that socket to communicate files and information about files.
  • the first agent 106 may not be able to open a socket to the second agent 106, because of a firewall between them.
  • the first agent 106 opens a socket to the manager 110.
  • the first agent 106 sends a message, via the manager 110, requesting that the second agent 106 open a socket to the first agent 106.
  • the manager passes on this request to the second agent 106, and the second agent 106 opens a socket to the first agent 106.
  • the first agent 106 can then use this socket to send messages to the second agent.
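The connection fallback described above can be modeled without any real networking. The following is a toy sketch of the message flow only: the class names, the `can_dial` reachability set standing in for firewall rules, and the return values are all assumptions made for illustration, and it assumes every agent can reach the manager.

```python
class Agent:
    """Toy agent: `can_dial` lists the peers it is allowed to open sockets to."""

    def __init__(self, name, can_dial=()):
        self.name = name
        self.can_dial = set(can_dial)
        self.links = {}  # peer name -> name of the agent that opened the link

    def dial(self, peer):
        """Try to open a link to `peer`; a firewall is modeled as a refusal."""
        if peer.name not in self.can_dial:
            return False
        self.links[peer.name] = self.name
        peer.links[self.name] = self.name
        return True


class Manager:
    """Toy manager that relays a 'please connect to me' request."""

    def __init__(self):
        self.agents = {}

    def register(self, agent):
        self.agents[agent.name] = agent

    def relay_callback(self, requester, target_name):
        # Pass the request on: the target opens the link to the requester.
        return self.agents[target_name].dial(requester)


def connect(first, second, manager):
    """Return how the link was established: 'direct', 'callback', or 'failed'."""
    if first.dial(second):
        return "direct"
    if manager.relay_callback(first, second.name):
        return "callback"
    return "failed"
```

In the callback case the link is opened by the second agent, but once it exists the first agent can use it to send its messages, which matches the sequence in the text.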
  • the communication protocol allows a first agent 106 to transfer an entire file to the second agent 106. It also allows the first agent 106 to transfer a portion of a file to the second agent 106 to be appended to a file on the second agent. It also allows the first agent 106 to instruct the second agent 106 to rename a file on the second agent. It also allows the first agent 106 to delete a file on the second agent.
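The four operations of the agent-to-agent protocol (transfer a whole file, append a portion, rename, delete) can be sketched from the receiving agent's side. The patent does not give a wire format, so the dictionary message shape, the operation names `put`, `append`, `rename`, and `delete`, and the function `apply_message` are hypothetical.

```python
import os


def _safe_path(root, name):
    """Resolve `name` inside `root`, rejecting anything but a plain file name."""
    if name in ("", ".", "..") or os.path.basename(name) != name:
        raise ValueError(f"unsafe file name: {name!r}")
    return os.path.join(root, name)


def apply_message(root, message):
    """Apply one protocol message to the receiving agent's storage directory."""
    op = message["op"]
    target = _safe_path(root, message["name"])
    if op == "put":        # transfer of an entire file
        with open(target, "wb") as f:
            f.write(message["data"])
    elif op == "append":   # portion to be appended to an existing file
        with open(target, "ab") as f:
            f.write(message["data"])
    elif op == "rename":   # mirror a rename that happened on the sender
        os.replace(target, _safe_path(root, message["new_name"]))
    elif op == "delete":
        os.remove(target)
    else:
        raise ValueError(f"unknown operation: {op!r}")
```

For example, a `put` followed by an `append` leaves the receiver with the concatenated content, and a later `rename` keeps the receiver's copy aligned with a log file that was rotated on the sender.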
  • the manager 110 communicates with a traffic manager 120, also referred to as an interceptor.
  • a traffic manager 120 directs web page requests 113 to a web server 102. It is not necessary for a web service system to include a traffic manager 120, or any particular type of traffic manager 120.
  • the traffic manager 120 receives information and commands from the manager 110.
  • the traffic manager 120 also receives information and commands from a traffic manager control program 122.
  • the traffic manager control program can be on the same computer system as the traffic manager 120, or alternatively it can run on another system.
  • the traffic manager 120 receives web page requests 113 and refers the requests to one of the web servers 102.
  • the traffic manager 120 sends a message to the browser in response to the web page request, referring the browser to one of the web servers 102.
  • the browser then makes the request 113 directly to the web server 102.
  • the traffic manager may pass the request through to the web server 102, and pass the response back to the browser (not shown in the figure).
  • Part of the management capability of the web service system 90 is accomplished by monitoring the web page requests made of the web servers 102 and the resulting load on the web servers 102 and the hosts 100. Web page requests can be directed to balance the load among the web servers 102.
  • the traffic manager 120 is the point of first contact for a user. The traffic manager 120 receives a web page request from a user and "refers" the user's web browser to an appropriate web server 102 for that request. The user's web browser is referred by responding to the web page request with a referral to a web page on an appropriate web server 102. This referral capability can be accomplished with a capability incorporated into the hypertext transfer protocol, but can also be accomplished in other ways.
  • the user may or may not be aware that the web browser has been referred to an appropriate web server 102.
  • the user accesses the application on that web server 102 and receives responses to its web page request from that web server 102.
  • that web server 102 under the direction of the manager 110, can refer the user back to the traffic manager 120 or to another web server 102 capable of delivering the application.
  • the traffic manager 120 receives requests from users and redirects the user's requests to web servers 102.
  • the traffic manager 120 is used to direct all users to one web server 102, such as another traffic manager 120 or a single endpoint. In this manner, the traffic manager 120 acts as a shunt, meaning it directs all requests directed towards one or more web servers on a host to another web server 102.
  • the traffic manager 120 receives status information from the manager 110 and uses that information to redirect users.
  • the status information includes server availability and load, administrator's changes, and application or web server 102 start and shut down actions.
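The referral described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ServerStatus` record, its fields, and the use of an HTTP `302 Found` response with a `Location` header are choices made for the example. The description says only that the referral can use a capability of the hypertext transfer protocol and that status information includes availability and load.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ServerStatus:
    """Hypothetical status record, as might be reported by the manager."""
    base_url: str    # e.g. "http://b.example.com" (illustrative only)
    available: bool  # False once the server is shut down or unreachable
    load: float      # any comparable load metric; lower means less busy


def choose_server(statuses: Iterable[ServerStatus]) -> Optional[ServerStatus]:
    """Return the least-loaded available server, or None if none is available."""
    candidates = [s for s in statuses if s.available]
    return min(candidates, key=lambda s: s.load) if candidates else None


def build_referral(path: str, statuses: Iterable[ServerStatus]) -> str:
    """Build a raw HTTP response referring the browser to the chosen server."""
    target = choose_server(statuses)
    if target is None:
        return "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n\r\n"
    location = target.base_url.rstrip("/") + "/" + path.lstrip("/")
    return f"HTTP/1.1 302 Found\r\nLocation: {location}\r\nContent-Length: 0\r\n\r\n"
```

Picking the minimum load is one possible policy; the same structure would accommodate any other selection rule driven by the status information.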
  • the traffic manager 120 is designed for speed and security.
  • the traffic manager 120 is often the front door to the system, and so its performance affects the perceived performance of the entire web service system 90. It may be useful to locate the traffic manager 120 as close, in the network topology sense, to the backbone as possible. It is then necessarily the most exposed component of the web service system 90.
  • the traffic manager 120 is implemented in hardware. In another embodiment, the traffic manager 120 is a software program running on a host computer. In one software embodiment, the traffic manager 120 is a standalone program that runs on a server-class computer capable of running a multi-threaded operating system. Under UNIX, for example, the traffic manager 120 can run as a daemon. Under WINDOWS NT™, the traffic manager 120 can run as a service.
  • the traffic manager 120 is an internet protocol bridge or router that directs requests made to one endpoint to the endpoint belonging to a web server 102. In this way, the traffic manager 120 directs the web page requests to one or more web servers 102.
  • An example of such a traffic manager is the LOCALDIRECTOR available from Cisco Systems, Inc. of San Jose, California.
  • the traffic manager 120 is a web switch, such as a CONTENT SMART WEB SWITCH available from Arrowpoint Communications, Inc. of Westford, Massachusetts. The traffic manager 120 receives each web page request and, based on the request, directs the request to a web server.
  • the web service system 90 also includes a version controller, also referred to as a content distributor 125.
  • a content distributor 125 manages version and content replication, and may provide content updates for the various web servers 102 in the web service system 90.
  • a system operator interface to the content distributor 125 is provided by a content control 126.
  • the content distributor 125 and the content control 126 are each a stand-alone process that operates on the host 100.
  • the content distributor 125 and the content control 126 operate on the same host as the manager 110.
  • the content distributor 125 and the content control 126 operate on other hosts.
  • the content distributor 125 and the content control 126 can operate on the same host, or on a different host.
  • the content distributor 125 is incorporated into the functionality of the manager 110, or other components of the system 90.
  • a file is identified on a web server 102 (STEP 500) by an agent 106, referred to as the transmitting agent.
  • the file can be any type of file, and contain any type of content.
  • a file can be identified in various ways.
  • a file can be identified manually by the system operator, for example by identifying the file system path and file name.
  • a file can be identified by a computer, for example by matching a set of predefined attributes to all files (or a set of files) until a match is found. The file that matches the predefined attributes is then identified.
  • the file may be the output from a predefined application program, system utility, JAVA class, or other process. In one embodiment, the identified file is the output from such a computer program which is streamed directly to the agent.
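Identification by matching predefined attributes, as described above, might look like the following sketch. The two attributes used here (a file name pattern and a minimum size) and the function name `find_matching_files` are assumptions for illustration; the description does not say which attributes are matched.

```python
import fnmatch
import os
from typing import Iterator


def find_matching_files(root: str, name_pattern: str, min_size: int = 0) -> Iterator[str]:
    """Yield paths under root whose name matches name_pattern and whose
    size in bytes is at least min_size."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if fnmatch.fnmatch(name, name_pattern) and os.path.getsize(path) >= min_size:
                yield path
```

Because the function is a generator, a caller that wants only the first match can stop iterating as soon as one file is found.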
  • the identified file may be processed (STEP 501).
  • the identified file may be provided as input to an application program, operating system utility, JAVA class, or other process, for processing.
  • the processing may modify the format of the file, compress the file, substitute addresses in one format for another (e.g. resolve IP addresses into DNS names), or otherwise prepare the file for transmission, or extract information from the file.
  • the agent is implemented in JAVA, and the identification, processing, and transmission functions are implemented as JAVA methods.
  • a system operator can provide a JAVA class other than the default to implement additional functions. If the system operator provides a JAVA class other than the default, that JAVA class is used. This allows the agent to, for example, use the output from a process instead of an input file, perform various types of processing, or use a particular communications protocol for transmission.
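The choice between a default class and an operator-provided class can be illustrated with a small sketch. It is written in Python rather than JAVA, and the `Processor` interface, the class names, and the use of zlib compression are all assumptions made for the example, not details taken from the description.

```python
import zlib
from typing import Optional, Protocol


class Processor(Protocol):
    """Hypothetical interface for the processing step (STEP 501)."""

    def process(self, data: bytes) -> bytes: ...


class DefaultProcessor:
    """Default behaviour in this sketch: pass the data through unchanged."""

    def process(self, data: bytes) -> bytes:
        return data


class CompressingProcessor:
    """Example of an operator-provided replacement that compresses the data."""

    def process(self, data: bytes) -> bytes:
        return zlib.compress(data)


def prepare_for_transmission(data: bytes, custom: Optional[Processor] = None) -> bytes:
    """Use the operator-provided processor if one is given, else the default."""
    processor = custom if custom is not None else DefaultProcessor()
    return processor.process(data)
```

A receiving agent would need the matching inverse step (here `zlib.decompress`) configured on its side.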
  • the file is transmitted to a receiving computer (STEP 502).
  • the receiving computer can be one of the hosts 100, the content distributor 125, or another computer in the web service system 90.
  • the receiving computer could be selected by a system operator before a particular file transmission, or the same receiving computer can be used for all transmissions.
  • transmission is accomplished using the agent-agent protocol described above.
  • file transmission can be accomplished with various other protocols.
  • the file is received by an agent, referred to as a receiving agent, which is running on the receiving computer (STEP 503).
  • the file is processed by the receiving computer (STEP 504).
  • the received file may be provided as input to an application program, operating system utility, JAVA class, or other process.
  • the processing can include converting the transmitted file into a different file format, uncompressing the file, and so on.
  • the processing can include incorporating the data from the transmitted file into a database.
  • the processing can include incorporating the data from the transmitted file into a file that includes data from files from more than one host.
  • the received file is stored by the receiving computer (STEP 505).
  • the file can be stored in the receiving computer's file system with the same name as the file had on the transmitting computer's file system.
  • the file can also be stored with a different name, for example with a file name that includes the name of the transmitting computer.
  • the file storage is accomplished by providing the file as the input to an application program, system utility, JAVA class, or other process.
  • the agent is implemented in JAVA, and the receiving, processing, and storage functions are implemented as JAVA methods.
  • a system operator can provide a JAVA class other than the default to implement additional functions. If the system operator provides a JAVA class other than the default, that JAVA class is used. This allows the agent to, for example, use a particular protocol for communication, perform various types of processing, or provide output to a process instead of a file.
  • a system operator configures a "job," which is a one-time or repeating transfer that takes place at a particular time or time interval.
  • the specification of a job includes: the job name, which is an identifier assigned by the system operator for easy recognition of the job; the source host, which is the name of the host on which the transmitting agent is operating; the source filename, which is the file system path and file name; a transmitting computer program identifier, which is (in one embodiment) a JAVA class that is either the default or a class provided by the system operator; a schedule, which may be a time or time interval, or require manual initiation; attributes, including whether the source file is continuously updated and whether it is a rotated file (as described with reference to FIG.
  • the destination host, which is the host on which the receiving agent is running;
  • the destination file name, which is the file system path and file name; and
  • a receiving computer program identifier, which is a JAVA class that is either the default or a class provided by the system operator.
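The job specification listed above maps naturally onto a simple record. The sketch below is one possible representation; every field name, the `"default"` marker for the program identifiers, and the encoding of the schedule as an optional interval in seconds are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical record holding the fields of a content collection job."""
    name: str                       # identifier assigned by the system operator
    source_host: str                # host on which the transmitting agent runs
    source_filename: str            # file system path and file name
    destination_host: str           # host on which the receiving agent runs
    destination_filename: str       # file system path and file name
    transmitter_class: str = "default"       # transmitting program identifier
    receiver_class: str = "default"          # receiving program identifier
    interval_seconds: Optional[int] = None   # None: manual initiation only
    continuously_updated: bool = False       # source file grows over time
    rotated: bool = False                    # source file is periodically renamed
```

For example, a daily collection of a rotated log might be written as `Job("weblog", "host2", "/logs/SERVER.LOG", "host1", "/collect/host2-SERVER.LOG", interval_seconds=86400, continuously_updated=True, rotated=True)`, where the host names and paths are invented.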
  • When an agent 106 is started, it scans all pending jobs, determining which files on the web server 102 need to be transmitted according to the defined schedules. At the appropriate time, the agent 106 attempts to connect to an agent on the receiving host. Once the connection to the receiving agent is initiated, the file may be processed by the source agent (e.g. compressed if the job is configured for compression), transmitted to the receiving agent, processed by the receiving agent (e.g. uncompressed if so configured), and installed into the destination location. In another embodiment, after being transmitted to a receiving agent and uncompressed, the file is fed into a computer program for processing, and the result is then stored on the receiving computer.
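The scan of pending jobs performed when an agent starts could be sketched as follows. The `ScheduledJob` record, the `last_run` mapping, and the rule that a job is due once its interval has elapsed are assumptions for the example; the description does not specify how schedules are evaluated.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ScheduledJob:
    """Hypothetical minimal view of a job, holding only what the scan needs."""
    name: str
    interval_seconds: Optional[float]  # None: manual initiation only


def due_jobs(jobs: List[ScheduledJob], last_run: Dict[str, float], now: float) -> List[str]:
    """Return the names of jobs whose interval has elapsed since their last run.

    Jobs that have never run are treated as due; manual jobs are never
    started by the scan.
    """
    due = []
    for job in jobs:
        if job.interval_seconds is None:
            continue
        previous = last_run.get(job.name)
        if previous is None or now - previous >= job.interval_seconds:
            due.append(job.name)
    return due
```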
  • some files may be continuously updated as time passes.
  • An example of a file that is continuously updated is a log file, which is a running record of events and/or status reported by a computer program such as a web server or operating system.
  • such files are updated as events occur, periodically to record status, or both.
  • such a file is updated by an application that makes changes to a file by appending some event or status information to the end of a file.
  • the system can take appropriate action. For example, the portion of the file (if any) that has not been previously transmitted is identified (STEP 510). If no changes have been made, no further action is taken. This determination can be accomplished by storing the length of a file that was previously transmitted, and identifying any appended portion (i.e. changes) included in the file since the previous transmission.
  • the changes are processed (STEP 511) as described above with regard to STEP 501 of FIG. 3, and transmitted (STEP 512) to a receiving computer.
  • the receiving computer processes the received data (STEP 513) in the manner described above with regard to STEP 504 of FIG. 3.
  • the file changes can be processed for inclusion in the file (for example decompressed) or processed for inclusion in a database or other aggregation of data.
  • the changes are stored (STEP 515).
  • the changes are made to the copy of the changed file that is located on the receiving computer.
  • the changes are stored in a separate file.
  • the changes are provided to a computer program, as described above.
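The length-based determination of STEP 510, followed by storage of the changes in the receiving computer's copy, can be sketched as below. The function names are hypothetical, and the sketch assumes the file only ever grows by appending, as the description states for log files.

```python
import os


def read_new_portion(path: str, sent_length: int) -> bytes:
    """Return the bytes appended since the last transmission (empty if none).

    sent_length is the stored length of the file as previously transmitted.
    """
    current_length = os.path.getsize(path)
    if current_length <= sent_length:
        return b""  # no changes, so no further action is needed
    with open(path, "rb") as source:
        source.seek(sent_length)
        return source.read(current_length - sent_length)


def apply_changes(copy_path: str, changes: bytes) -> None:
    """Append received changes to the receiving computer's copy of the file."""
    with open(copy_path, "ab") as copy:
        copy.write(changes)
```

After each successful transmission the sender would add `len(changes)` to its stored length, so that the next call picks up exactly where this one stopped.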
  • a web server or other application will "rotate" or rename files that are continuously updated.
  • a file that is continuously updated may be renamed from its first, original name to a second name.
  • a new file is given the original name and this new file is updated from that time forward.
  • the renamed file becomes an archive, and the new file receives the continuous updates.
  • This technique often is used for log files, because it prevents the files from growing infinitely long, and makes it possible to identify the file in which particular data might be located.
  • a web server log file called "SERVER.LOG" 517A is renamed every day at noon with a name that includes the date.
  • a new file with the original name "SERVER.LOG" 518B, 519C is created.
  • the "SERVER.LOG" file 517A is renamed to "SERVER.LOG-01-11-99" 517B to indicate that the log was renamed on November 1, 1999.
  • a new "SERVER.LOG" file 518B is created at that time. This new "SERVER.LOG" file 518B receives the continuous updates from noon on November 1, 1999 until 11:50am on November 2, 1999.
  • an agent determines that rotation occurred, and transfers the file data accordingly. Changes from the rotated file that were not previously transmitted are transmitted, and changes from the new file are transmitted as well.
  • the file is identified (STEP 520) as before.
  • the agent determines if the identified file is the same file as the agent accessed previously, in other words, the agent determines if rotation has occurred (STEP 521). For example, on a UNIX system, the file system assigns inodes to files in the file system; when a file is changed it is given a new inode number. On a UNIX system, the agent can determine if a file has rotated by determining whether the file has the same inode number. If the file has a different inode number, rotation has occurred.
  • If the file system does not use inodes, another mechanism, such as comparing the creation or modification date of the current file with that of the previous version, must be used.
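The inode comparison of STEP 521 can be sketched with `os.stat`. On POSIX file systems a rename leaves a file's inode number unchanged, while a newly created file with the original name receives a different one, so a changed inode under the original name indicates rotation. The function names are hypothetical, and pairing the inode with the device number is an extra precaution added for the example.

```python
import os
from typing import Tuple


def file_identity(path: str) -> Tuple[int, int]:
    """Return (device, inode) for the file currently found at path."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)


def has_rotated(path: str, previous_identity: Tuple[int, int]) -> bool:
    """True if the file now at path is not the file that was seen previously."""
    return file_identity(path) != previous_identity
```

The agent would record `file_identity(path)` after each transmission and pass it back in on the next run.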
  • the original file that is renamed is referred to as the first file, and the newly created empty file is referred to as the second file.
  • the agent identifies the first file (STEP 522), which was renamed. In one embodiment, this occurs by searching for all files that have the same prefix characters as the original file, and determining which of the files is the most recent, for example by comparing the inode, creation or modification date, or initial contents of the file to recorded known values of the first file. If they match, the candidate file is identified as the first file. In another embodiment, the identification occurs by searching for all files that have the same suffix characters as the original file.
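The prefix search of STEP 522 might be sketched as follows, here using the recorded inode number as the known value to compare against. The function name `find_first_file` is hypothetical, and comparing only the inode is a simplification; the description also allows comparing dates or initial contents.

```python
import glob
import os
from typing import Optional


def find_first_file(original_path: str, known_inode: int) -> Optional[str]:
    """Search files whose names start with the original name for the one
    whose inode matches the recorded inode of the first file."""
    pattern = glob.escape(original_path) + "*"
    for candidate in sorted(glob.glob(pattern)):
        if candidate == original_path:
            continue  # this is the new (second) file, not the renamed one
        if os.stat(candidate).st_ino == known_inode:
            return candidate
    return None
```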
  • the agent determines whether changes (updates) to the file were made since the last transmission. If changes were made, they are identified (STEP 523).
  • the changes optionally are processed (STEP 524) as described above, and the changes, whether processed or unprocessed, are transmitted to the receiving agent (STEP 525).
  • the receiving agent receives the changes (STEP 526) and may process the changes (STEP 527).
  • the receiving agent stores the changes, whether processed or unprocessed (STEP 528), as described above.
  • the receiving agent is notified about the new file name (STEP 529). In one embodiment, the notification is an instruction to rename the file so that it has the same name as the renamed file on the transmitting host.
  • the receiving agent renames the file.
  • the transmitting agent determines what changes were made to the first source file name, and the same changes are made to the first destination file name, thus renaming the destination file. Note that if it was determined (in STEP 523) that changes were not made to the previous file, no changes are transmitted (STEP 525), received (STEP 526), or stored (STEP 528), and processing continues at STEP 529. In either case, the second file is identified (STEP 530).
  • If rotation occurred, the second file is identified (STEP 530), and if rotation did not occur, the first file is identified (STEP 520). In either case, processing continues as in FIG. 4, and changes to the file are identified (STEP 531). If rotation occurred, the entire file is a "change." If rotation did not occur, then changes since the last transmission are identified. Optionally, the changes are processed (STEP 532) as described above.
  • the (processed or unprocessed) changes are transmitted to the receiving agent (STEP 533).
  • the receiving agent receives the changes (STEP 534), and optionally processes the changes (STEP 535) as described above.
  • the (processed or unprocessed) changes are stored by the receiving agent (STEP 536), also as described above.
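The sender side of the rotation-aware flow above can be drawn together in one sketch: if rotation occurred, the untransmitted tail of the renamed first file is collected, and then the whole of the new second file; otherwise only the appended portion is collected. The function names and the returned tuple layout are assumptions, rotation is detected by inode as in the UNIX example, and the rename notification of STEP 529 is left out.

```python
import glob
import os
from typing import List, Optional, Tuple


def _read_from(path: str, offset: int) -> bytes:
    """Read a file from the given byte offset to its end."""
    with open(path, "rb") as source:
        source.seek(offset)
        return source.read()


def collect_changes(
    path: str, known_inode: Optional[int], sent: int
) -> Tuple[List[Tuple[str, bytes]], int, int]:
    """Return (changes, inode, sent) where changes is a list of
    (file name, untransmitted bytes) pairs in transmission order."""
    changes: List[Tuple[str, bytes]] = []
    current_inode = os.stat(path).st_ino
    if known_inode is not None and current_inode != known_inode:
        # Rotation occurred: locate the renamed first file by prefix and inode.
        for candidate in sorted(glob.glob(glob.escape(path) + "*")):
            if candidate != path and os.stat(candidate).st_ino == known_inode:
                tail = _read_from(candidate, sent)
                if tail:
                    changes.append((candidate, tail))
                break
        sent = 0  # the entire second file counts as a change
    data = _read_from(path, sent)
    if data:
        changes.append((path, data))
    return changes, current_inode, sent + len(data)
```

The returned inode and length are the values the agent would store for its next run.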
  • a number of the hosts 550-1, 550-2, 550-3, 550-4, 550-5, generally 550, in a distributed system each communicate files to one host 550-1.
  • the distributed system may be the web service system of FIG. 2, or it can be another type of web service system, another type of service system (for example a file service system), or can be any system that includes multiple hosts 550.
  • Each host 550 includes an agent 556-1, 556-2, 556-3, 556-4, 556-5, generally 556.
  • the agents 556 can be the agents 106 as described above, which provide an interface to the web service system 90, or in other systems the agents can have other functions in addition to content collection. Alternatively, the agents 556 can be agents 556 used solely for the purpose of content collection.
  • Each host 550 includes a file system 553-1, 553-2, 553-3, 553-4, 553-5, generally 553.
  • the file systems 553 are accessible by the agents 556.
  • the file systems 553 can be implemented on media used for temporary and permanent data storage, for example a hard disk, floppy disk, removable disk, optical disk, RAM, ROM, FLASH ROM, CD-R, CD-RW, and so on.
  • the file system is generally implemented by the operating system running on the host 550.
  • the file system may be physically part of the host 550, or accessible to and in communication with the host 550 over a serial bus, communications network or other such link.
  • each of the agents 556-2, 556-3, 556-4, 556-5 communicates files to one of the agents 556-1 using at least one of the methods described above.
  • Each of the agents has content collection jobs assigned to it.
  • the agent determines which of the content collection jobs are to be executed and scheduled, and executes and schedules the jobs as appropriate.
  • content collection is performed using one or more of the methods of FIG. 3, FIG. 4, and FIG. 5.
  • some subset (or all) of the files stored on the file systems 553-2, 553-3, 553-4, 553-5 are replicated on the file system 553-1 of the receiving host 550-1.
  • the transfer may involve processing, file data conversion, integration of the file data into a table or database, or other changes.
  • the result is that the files on the hosts 550 are all collected onto a single system. The system operator then only needs to look in one place to access the files from the various hosts 550.
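When files with the same name arrive from several hosts, the receiving host needs a way to keep them apart. One option mentioned earlier is a destination file name that includes the name of the transmitting computer. The sketch below shows that option; the `host-filename` naming scheme and the function names are assumptions for illustration.

```python
import os


def destination_path(collect_dir: str, source_host: str, source_path: str) -> str:
    """Build a destination name that includes the transmitting host's name."""
    return os.path.join(collect_dir, f"{source_host}-{os.path.basename(source_path)}")


def store_received(collect_dir: str, source_host: str, source_path: str, data: bytes) -> str:
    """Store received file data on the collecting host and return its path."""
    os.makedirs(collect_dir, exist_ok=True)
    target = destination_path(collect_dir, source_host, source_path)
    with open(target, "wb") as out:
        out.write(data)
    return target
```

With this scheme, two hosts that both send a file named `server.log` produce two distinct files in the collection directory.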

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)
PCT/US2000/042745 1999-12-13 2000-12-12 File transmission from a first web server agent to a second web server agent WO2001042990A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
GB0215573A GB2374699B (en) 1999-12-13 2000-12-12 Content collection
JP2001544209A JP2003516586A (ja) 1999-12-13 2000-12-12 コンテンツ収集
DE10085300T DE10085300T1 (de) 1999-12-13 2000-12-12 Dateiübertragung von einem ersten Web-Serveragenten an einen zweiten Web-Serveragenten
AU47161/01A AU4716101A (en) 1999-12-13 2000-12-12 Content collection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/532,483 1999-12-13
US09/532,483 US7143193B1 (en) 1998-05-29 1999-12-13 Content collection

Publications (2)

Publication Number Publication Date
WO2001042990A2 (en) 2001-06-14
WO2001042990A3 (en) 2002-02-14

Family

ID=24122007

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/042745 WO2001042990A2 (en) 1999-12-13 2000-12-12 File transmission from a first web server agent to a second web server agent

Country Status (5)

Country Link
JP (3) JP2003516586A
AU (1) AU4716101A
DE (1) DE10085300T1
GB (1) GB2374699B
WO (1) WO2001042990A2

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7143193B1 (en) 1998-05-29 2006-11-28 Yahoo! Inc. Content collection
US6976093B2 (en) 1998-05-29 2005-12-13 Yahoo! Inc. Web server content replication
US7581006B1 (en) 1998-05-29 2009-08-25 Yahoo! Inc. Web service
JP4506841B2 (ja) * 2008-01-23 2010-07-21 日本電気株式会社 ファイル管理方法、ファイル管理装置およびプログラム
JP2013062627A (ja) * 2011-09-12 2013-04-04 Nippon Telegr & Teleph Corp <Ntt> ネットワーク情報蓄積装置及び方法及びプログラム
CN103701773A (zh) * 2013-11-29 2014-04-02 金蝶软件(中国)有限公司 基于Web Service的文件传输、服务调用方法和系统

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0954719A (ja) * 1995-08-14 1997-02-25 Meidensha Corp クライアント・サーバ型データベース
JP2830826B2 (ja) * 1996-03-27 1998-12-02 日本電気株式会社 分散ファイルの同期システムと方法
JP3042600B2 (ja) * 1996-08-22 2000-05-15 日本電気株式会社 分散ファイルの同期方式
JPH10224349A (ja) * 1997-02-03 1998-08-21 Hitachi Ltd ネットワークアクセス分析システム
JP4134357B2 (ja) * 1997-05-15 2008-08-20 株式会社日立製作所 分散データ管理方法
GB2327783A (en) * 1997-07-26 1999-02-03 Ibm Remotely assessing which of the software modules installed in a server are active
JPH11238036A (ja) * 1998-02-19 1999-08-31 Nec Corp 分散処理障害ログ自動管理システム
NL1008926C1 (nl) * 1998-04-17 1999-10-19 Koninkl Kpn Nv Netwerkserver en werkwijze voor het aanpassen van verwijzingsgegevens.
JPH11327964A (ja) * 1998-05-14 1999-11-30 Toshiba Tec Corp 店舗監視システム
JP2000076218A (ja) * 1998-08-27 2000-03-14 Ntt Data Corp データベース間の同期化システム
ATE373276T1 (de) * 1999-11-05 2007-09-15 Media Transfer Ag Caching-verfahren und cachesystem

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004519757A (ja) * 2000-09-06 2004-07-02 オラクル・インターナショナル・コーポレイション 媒介物に記憶されるデータへのサービスからのアクセス
GB2388450A (en) * 2002-05-08 2003-11-12 Hewlett Packard Co Relevance feedback for advanced text search
GB2388450B (en) * 2002-05-08 2005-07-20 Hewlett Packard Co Neural network feedback for enhancing text search
GB2454777A (en) * 2007-11-13 2009-05-20 Intuit Inc Managing a network of autonomous software agents
GB2454777B (en) * 2007-11-13 2013-02-13 Intuit Inc System and method for managing an agent network
US8489668B2 (en) 2007-11-13 2013-07-16 Intuit Inc. Open platform for managing an agent network
AU2008237538B2 (en) * 2007-11-13 2014-01-09 Intuit Inc. Open platform for managing an agent network
US9420050B1 (en) * 2011-08-16 2016-08-16 Verizon Digital Media Services Inc. Log reporting for a federated platform

Also Published As

Publication number Publication date
JP2012079350A (ja) 2012-04-19
DE10085300T1 (de) 2002-10-31
JP2003516586A (ja) 2003-05-13
GB2374699A (en) 2002-10-23
WO2001042990A3 (en) 2002-02-14
JP2010003311A (ja) 2010-01-07
GB2374699B (en) 2004-07-14
AU4716101A (en) 2001-06-18
JP4958951B2 (ja) 2012-06-20
GB0215573D0 (en) 2002-08-14

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase

Ref document number: 2001 544209

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 200215573

Country of ref document: GB

Kind code of ref document: A

RET De translation (de og part 6b)

Ref document number: 10085300

Country of ref document: DE

Date of ref document: 20021031

WWE Wipo information: entry into national phase

Ref document number: 10085300

Country of ref document: DE

122 Ep: pct application non-entry in european phase
REG Reference to national code

Ref country code: DE

Ref legal event code: 8607