WO2001088708A2 - Methods, apparatus, and articles-of-manufacture for network-based distributed computing - Google Patents


Info

Publication number
WO2001088708A2
Authority
WO
WIPO (PCT)
Prior art keywords
processor
worker
network
computing system
distributed computing
Prior art date
Application number
PCT/US2001/015247
Other languages
French (fr)
Other versions
WO2001088708A3 (en)
Inventor
James Bernardin
Peter Lee
Original Assignee
Datasynapse, Inc.
Priority date
Filing date
Publication date
Priority claimed from US09/583,244 external-priority patent/US6757730B1/en
Priority claimed from US09/777,190 external-priority patent/US20020023117A1/en
Application filed by Datasynapse, Inc. filed Critical Datasynapse, Inc.
Priority to AU2001263056A priority Critical patent/AU2001263056A1/en
Publication of WO2001088708A2 publication Critical patent/WO2001088708A2/en
Publication of WO2001088708A3 publication Critical patent/WO2001088708A3/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/465 Distributed object oriented systems

Definitions

  • the present invention relates generally to the fields of distributed computing and Internet-based applications and services. More particularly, the invention relates to methods, apparatus and articles-of-manufacture relating to the collection, organization, maintenance, management and commercial exploitation of network-connected, distributed computing resources.
  • Distributed (or parallel) computing is a well-established field. Over the past few decades, thousands of distributed computing architectures have been proposed, and hundreds have been constructed and evaluated.
  • Distributed computing architectures are typically characterized as being either “coarse-grained” or “fine-grained,” depending upon the size or complexity of the individual processing elements (or “nodes”) that perform the computational work.
  • the individual processing elements are generally fully functional computing elements, such as single-chip microprocessors, capable of individually performing a variety of useful tasks.
  • the fine-grained approach typically relies on a large number of processing elements, each of which has very limited computational capabilities.
  • An example of the fine-grained approach is the Connection Machine, manufactured by the now-defunct Thinking Machines Corporation.
  • In the Connection Machine, thousands of very simple processing elements were connected by a highly efficient message-routing network. Even though individual processing elements lacked the ability to perform much useful work on their own, the efficiency of the message-routing network made it possible to cooperatively deploy large numbers of processing elements on certain problems.
  • coarse-grained parallel computing systems seldom have the luxury of communication networks that operate with latencies at, or near, the clock speed of individual processing elements.
  • the more sophisticated processing elements used in coarse-grained distributed processing systems typically cannot be packed into a small volume, like a single chip, board or chassis.
  • communications must travel across chip-to-chip, board-to-board, or even chassis-to-chassis boundaries, as well as greater physical distances, all of which causes inherent and unavoidable increases in latency.
  • coarse-grained parallel architectures have, for many years, been viewed by persons skilled in the art as useful only for "computationally intensive" (as opposed to "communication intensive") tasks.
  • a typical computationally intensive task is prime factorization of large integers.
  • Alexandrov, SuperWeb: Towards a Global Web-Based Parallel Computing Infrastructure, citeseer.nj.nec.com/cachedpage/80115 (1997), at 1 ("We expect this approach to work well for noncommunication intensive applications, such as prime number factorization, Monte-Carlo and coarse-grained simulations, and others").
  • one object of the present invention relates to software infrastructure designed to capture a generalized problem-solving capability, handle data throughputs in excess of 100x the productivity of the SETI effort, require no access to a worker's local disk drive (to assuage security concerns), and motivate retail Internet users to participate by paying them in cash or higher-value non-monetary compensation (frequent flyer miles, lottery, discounted products/services, etc.).
  • Another object of the invention relates to a distributed networking software system that enables thousands, potentially scaled to millions, of consumer PCs on the Internet to be networked together to function as a Virtual Super Computer (“VSC").
  • Another object of the invention relates to a distributed networking software system that enables operation of a "CPU/bandwidth" website exchange, which anonymously and securely brokers demand from web-centric applications seeking integrated (i) data processing and/or (ii) high-bandwidth access to the Internet with the retail supply of such resources.
  • Such a "brokering" platform aggregates CPU capability at a commercially-significant unit-cost advantage versus the equivalent CPU horsepower of a high-end supercomputer, and aggregates Internet bandwidth access at a commercially-significant cost advantage versus T1, T3, or OC3 high-speed connectivity.
  • Another object of the invention relates to a distributed networking software system that enables on-demand computing power, with functionality similar to an electric utility, where corporate users can "plug in” to the network's website exchange for powering a wide range of web applications, thereby capitalizing on a distributed problem-solving approach.
  • Another object of the invention relates to a distributed networking software system that enables flexibility in client deployment, where clients who have unique security concerns, or applications that do not require Web integration, can license the software platform for deployment on an intranet or extranet basis only.
  • Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture that facilitate an always-live distributed computing system.
  • Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture that provide substantially continuous monitoring of worker processor activity and/or task progress in a distributed computing environment.
  • Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture that provide prompt alerts of worker processor status changes that can affect the always-live operation of a network-based distributed computing system.
  • Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture for providing reliable and/or predictable resource deployment and processing activity in a wide-area network based distributed computing system.
  • Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture for providing reliable and/or predictable quality-of-service in a peer-to-peer network-based distributed computing system.
  • the enabling platform is preferably a lightweight overlay that is intended to integrate into virtually any corporate network configuration.
  • each retail PC downloads a lightweight installation which invisibly runs in the background and automatically visits a broker website seeking work, whenever the PC's screensaver is activated.
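The worker-client behavior described above can be sketched as follows. This is an illustrative sketch only, not code from the patent; the broker interface (`request_task`, `submit_result`) and the idle threshold are assumed names, and a real client would hook OS screensaver/idle events rather than track timestamps itself.

```python
import time

class ScreensaverWorker:
    """Sketch of a retail worker client that visits the broker seeking
    work only while the host PC is idle (screensaver-style trigger)."""

    def __init__(self, broker, idle_threshold_secs=300):
        self.broker = broker                      # hypothetical broker interface
        self.idle_threshold = idle_threshold_secs
        self.last_user_input = time.time()

    def host_is_idle(self):
        # A real client would hook OS screensaver/idle events instead.
        return time.time() - self.last_user_input >= self.idle_threshold

    def poll_once(self):
        """Request one task from the broker, run it, and return the result
        (or None if the host is busy or no work is available)."""
        if not self.host_is_idle():
            return None
        task = self.broker.request_task()
        if task is None:
            return None
        result = task.run()
        self.broker.submit_result(task.task_id, result)
        return result
```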
  • Another aspect of the invention relates to a multi-tiered server/meta-server architecture configured to support various system management, load balancing, disaster recovery, and security features of the invention.
  • Another aspect of the invention relates to security.
  • Security is preferably thoroughly integrated into the framework of the invention using a conservative "restricted" approach.
  • the invention contemplates use by a variety of parties with potentially conflicting commercial interests; thus, the invention cannot rely on a network of friendly volunteers, and must assume that any one of its retail participants could potentially be hostile in intent.
  • the invention preferably makes heavy use of security features available in the Java 2 Platform.
  • the Java Secure Socket Extension (JSSE) and Java Authentication and Authorization Service (JAAS), as well as other Internet- standard cryptography and security APIs provide a rich set of tools to use in connection with the invention.
  • the customer is given flexibility in selecting among different levels of security protection.
  • the customer's code will be "cloaked" to protect it from decompilation attacks, and signed using a private key and a predetermined certificate authority ("CA").
  • lightweight messages between the customer and servers/brokers will be encrypted. These messages may contain the disguised locations of Java instruction code and data inputs.
  • these heavyweight items are then also authenticated.
  • Result data is preferably signed by the client program; the server/broker acts as the CA for the worker and customer relationships, and preferably only accepts completed work from registered workers who have their private keys registered with the system.
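The result-signing and registered-worker checks described in the preceding bullets can be sketched as follows. The patent contemplates Java public-key signatures with the server/broker acting as CA; this simplified sketch substitutes per-worker HMAC keys registered with the broker, and all class and method names are illustrative.

```python
import hashlib
import hmac

class BrokerRegistry:
    """Sketch of a broker that accepts completed work only from workers
    whose keys are registered with the system."""

    def __init__(self):
        self._keys = {}  # worker_id -> registered key

    def register_worker(self, worker_id, key):
        self._keys[worker_id] = key

    def accept_result(self, worker_id, result_bytes, signature):
        key = self._keys.get(worker_id)
        if key is None:
            return False  # unregistered workers are rejected outright
        expected = hmac.new(key, result_bytes, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)

def sign_result(key, result_bytes):
    """Worker-side signing of result data before it is uploaded."""
    return hmac.new(key, result_bytes, hashlib.sha256).hexdigest()
```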
  • one aspect of the invention relates to a method for performing distributed, bandwidth- intensive computational tasks, comprising: providing Internet access to at least one broker processor, the at least one broker processor configured to receive jobs from Internet-connected customers; receiving a job from a customer via the Internet; in response to receipt of the job from the customer, directing a plurality of Internet-connected worker processors to perform a plurality of worker tasks related to the received job; awaiting execution of the worker tasks, the execution characterized by a predominance of worker processor-Internet communication activity; and, upon completion of the execution, confirming the completion of the execution via the Internet.
  • the plurality of worker processors are preferably collectively utilizing, on average, at least 25% of their total available communication bandwidth, and possibly as much as
  • At least part of a worker's execution may include: (i) searching the Internet in accordance with a search query supplied by the customer; (ii) creating an index; (iii) creating a database; (iv) updating a database; (v) creating a report; (vi) creating a backup or archival file; (vii) performing software maintenance operations; (viii) comparing objects downloaded from the Internet; (ix) processing signals or images downloaded from the Internet; (x) broadcasting audio and/or video to a plurality of destinations on the Internet; and/or (xi) sending e-mail to a plurality of destinations on the Internet.
  • another aspect of the invention relates to a method for reducing the cost of performing a bandwidth-intensive job on the Internet, the method comprising: (i) transmitting a job execution request to a broker processor over the Internet; (ii) selecting, in response to the job execution request, a plurality of Internet-connected worker processors to be used in executing the job, the selection of worker processors being performed, at least in part, based on one or more bandwidth-related consideration(s); and (iii) using the selected worker processors to execute, at least in part, the job.
  • the worker processor selection is preferably based, at least in part, on one or more bandwidth-related consideration selected from the list of: (i) the types of Internet connections installed on candidate worker processors; (ii) the locations of candidate worker processors; (iii) the time of day; and (iv) historical performance statistics of candidate worker processors.
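A minimal sketch of bandwidth-aware worker selection along the lines of the considerations just listed (connection type, location/time of day via a UTC offset, historical performance). The weights, field names, and scoring formula are assumptions for illustration; the patent lists the considerations but prescribes no formula.

```python
def score_worker(profile, hour_utc):
    """Score a candidate worker on bandwidth-related considerations.
    All weights and profile fields are hypothetical."""
    connection_weight = {"dialup": 0.1, "dsl": 0.6, "cable": 0.7,
                         "t1": 0.9, "lan": 1.0}
    score = connection_weight.get(profile["connection"], 0.3)
    # Historical throughput statistic, normalized to [0, 1].
    score += min(profile.get("avg_mbits", 0.0) / 10.0, 1.0)
    # Favor workers for whom it is night-time locally, when the
    # connection is more likely to be idle.
    local_hour = (hour_utc + profile.get("utc_offset", 0)) % 24
    if local_hour < 6 or local_hour >= 22:
        score += 0.5
    return score

def select_workers(candidates, hour_utc, n):
    """Pick the n highest-scoring candidate workers."""
    ranked = sorted(candidates, key=lambda p: score_worker(p, hour_utc),
                    reverse=True)
    return ranked[:n]
```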
  • another aspect of the invention relates to a method for exploiting unused computational resources on the Internet, comprising: (i) recruiting prospective worker processors over the Internet, the recruiting including: (a) providing Internet-accessible instructions; (b) providing Internet-downloadable worker processor software; (c) providing an Internet-accessible worker processor operating agreement; and (d) storing a plurality of worker processor preferences; (ii) maintaining a registry of worker processors, the maintaining including: (a) storing a plurality of URLs used to address the worker processors; (b) storing a plurality of worker processor profiles, the profiles including information related to hardware and software configurations of the worker processors; and (c) storing a plurality of worker processor past performance metrics; (iii) selecting a plurality of worker processors to collectively execute a job, the selecting being based, at least in part, on worker processor past performance metrics maintained by the worker processor registry; and (iv) using the selected plurality of worker processors to execute the job.
  • At least some of the prospective worker processors may be connected to the Internet via a satellite connection, a fixed wireless connection, and/or a mobile wireless connection.
  • Recruiting prospective worker processors may further include specifying the type and amount of compensation to be provided in exchange for use of worker processor resources, and/or providing an on-line means of accepting the worker processor operating agreement.
  • Maintaining a registry of worker processors may further include determining the performance of worker processors listed in the registry by executing one or more benchmark programs on the worker processors, and optionally updating the worker processor past performance metrics in accordance with measured benchmark program performance statistics.
  • Selecting may be further based, at least in part, on one or more bandwidth-related consideration(s) selected from the list of: (i) the types of Internet connections installed on the worker processors; (ii) the locations of the worker processors; (iii) the time of day; and (iv) one or more of the stored preferences.
  • another aspect of the invention relates to a method for reselling Internet bandwidth associated with individual DSL-connected Internet workstations, the method comprising: (i) entering on-line-completed operating agreements with a plurality of DSL-connected Internet users, the agreements providing for use of a plurality of DSL-connected Internet workstations controlled by the users; (ii) executing a customer's distributed task, using a plurality of the DSL-connected Internet workstations; (iii) storing, for each of the DSL-connected Internet workstations used in the distributed task execution, a bandwidth utilization metric; (iv) compensating the DSL-connected Internet users whose workstations were used in the distributed task execution, the compensation being determined, at least in part, based upon the bandwidth utilization metrics associated with the workstations used in the distributed task execution; and (v) charging the customer whose distributed task was executed using the DSL-connected Internet workstations.
  • Executing a customer's distributed task may include: (i) receiving an execution request message from the customer over the Internet; (ii) processing the execution request using an Internet-connected broker processor; and (iii) initiating distributed execution of the task by sending messages, over the Internet, to a plurality of the DSL-connected Internet workstations.
  • the compensation is preferably determined, at least in part, by one or more metric(s) selected from the list consisting of: (i) the amount of real time used by the DSL-connected Internet workstations in executing the distributed task; (ii) the amount of processor time used by the DSL-connected Internet workstations in executing the distributed task; (iii) the amount of primary storage used by the DSL-connected Internet workstations in executing the distributed task; (iv) the amount of secondary storage used by the DSL-connected Internet workstations in executing the distributed task; (v) the time of day during which the execution occurred; and (vi) the geographic location(s) of the DSL-connected Internet workstations.
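As a sketch, compensation based on the metrics enumerated above might be computed as a weighted sum of per-worker usage figures. The rate names and the peak-hours multiplier are hypothetical; the patent lists candidate metrics but gives no formula.

```python
def compute_compensation(usage, rates):
    """Illustrative per-worker compensation as a weighted sum of the
    usage metrics enumerated above. All field and rate names are
    assumptions for the sketch."""
    amount = (usage.get("real_time_secs", 0) * rates.get("per_real_second", 0)
              + usage.get("cpu_secs", 0) * rates.get("per_cpu_second", 0)
              + usage.get("primary_mb", 0) * rates.get("per_primary_mb", 0)
              + usage.get("secondary_mb", 0) * rates.get("per_secondary_mb", 0))
    # Time-of-day factor: execution during peak hours may earn a premium.
    if usage.get("peak_hours", False):
        amount *= rates.get("peak_multiplier", 1.0)
    return round(amount, 2)
```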
  • the plurality of DSL-connected Internet workstations may operate in accordance with any one of the following protocols: ADSL, HDSL, IDSL, MSDSL, RADSL, SDSL, and VDSL (or other similar, or future, protocols).
  • another aspect of the invention relates to a method for reselling Internet bandwidth associated with individual cable modem-connected Internet workstations, the method comprising: (i) enrolling a plurality of cable modem-connected Internet users by installing worker processor software on a plurality of cable modem-connected Internet workstations controlled by the users; (ii) using the installed worker processor software to execute a distributed task on a plurality of the cable modem-connected Internet workstations; (iii) using the installed worker processor software to compute, for each workstation used in the distributed task execution, a billing metric determined, at least in part, by the amount of data communication involved in executing the distributed task; (iv) compensating the cable modem-connected Internet users whose workstations were used in the distributed task execution.
  • another aspect of the invention relates to a method for executing jobs, comprised of a plurality of tasks, in a networked computing environment, the method comprising: (i) providing networked access to at least one broker processor, the broker processor configured to receive a job from a user, unpack the job into a plurality of executable tasks, and direct a plurality of worker processors to initiate execution of the tasks; (ii) maintaining performance metrics for worker processors; (iii) monitoring completion of tasks by the worker processors and, upon completion, updating the performance metrics; (iv) using the performance metrics to select, at least in part, worker processors to initiate execution of additional tasks; and (v) using the performance metrics to determine, at least in part, the charges to be billed to the user for execution of the job.
  • the method may further include (a) using said performance metrics to detect aberrant performance of worker processors executing tasks; and (b) terminating execution of tasks on worker processors that display aberrant performance.
  • another aspect of the invention relates to a method for operating a distributed computing system, the system including a multiplicity of network-connected worker processors and at least one supervisory processor, the supervisory processor configured to assign tasks to, and monitor the status of, the worker processors, the method comprising: assigning tasks to a plurality of the worker processors by sending task-assignment messages, via the network, from the at least one supervisory processor to the plurality of worker processors; and monitoring, on a substantially continuous basis, the status of at least each of the plurality of assigned worker processors until each processor completes its assigned task.
  • Monitoring, on a substantially continuous basis, the status of at least each of the plurality of assigned worker processors may involve receiving status messages from at least each of the plurality of assigned worker processors until each processor completes its assigned task. Monitoring, on a substantially continuous basis, the status of at least each of the plurality of worker processors may also involve detecting abnormalities in the operation of the plurality of assigned worker processors, and/or their associated network connections, by detecting an absence of expected status message(s) received by the at least one supervisory processor.
  • Detection of an absence of expected status message(s) received by the at least one supervisory processor may be repeated at least once every ten minutes, once every five minutes, once every two minutes, once each minute, once every thirty seconds, once every ten seconds, once every second, once every tenth of a second, once every hundredth of a second, once each millisecond, or at whatever interval is needed to assure the continuity-of-service demanded by the client.
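Supervisory-side detection of absent heartbeat/status messages, at a client-configurable interval, can be sketched as below. The class and method names are illustrative; the clock value is passed in explicitly so the sketch stays deterministic and testable.

```python
class HeartbeatMonitor:
    """Sketch of a supervisory-processor monitoring module: a worker is
    flagged when no status message has arrived within the configured
    interval (which may range from minutes down to milliseconds,
    depending on the continuity-of-service the client demands)."""

    def __init__(self, interval_secs):
        self.interval = interval_secs
        self.last_seen = {}  # worker_id -> timestamp of last heartbeat

    def record_heartbeat(self, worker_id, now):
        self.last_seen[worker_id] = now

    def overdue_workers(self, now):
        """Workers whose expected status messages are absent for longer
        than the interval; candidates for task reassignment."""
        return [w for w, t in self.last_seen.items()
                if now - t > self.interval]
```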
  • Monitoring, on a substantially continuous basis, the status of at least each of the plurality of assigned worker processors may also involve detecting the presence of non-assigned-task-related activity on the worker processors.
  • Detecting the presence of non-assigned-task-related activity on the worker processors may involve running an activity monitor program on each of the assigned worker processors.
  • the activity monitor programs running on each of the assigned worker processors may behave substantially like screen saver programs.
  • the activity monitor programs running on each of the assigned worker processors may send, in response to detection of keyboard activity (or mouse activity, pointer activity, touchscreen activity, voice activity, local execution of substantial non-assigned-task-related processes, or any combination thereof), a message to at least one of the at least one supervisory processor(s).
  • Detecting the presence of non-assigned-task-related activity on the worker processors may also involve determining, in response to an activity monitor message received by at least one of the at least one supervisory processor(s), that at least one of the assigned worker processors is undertaking non-assigned-task-related activity.
  • the activity monitor message may be generated by an activity monitor program running on one of the assigned worker processors.
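A sketch of the worker-side activity monitor described in the bullets above: on detecting local user activity, it sends a message toward the supervisory processor. The event names and the `notify` callback are assumptions; a real monitor would hook OS input events the way a screensaver does.

```python
class ActivityMonitor:
    """Sketch of an activity monitor program running on an assigned
    worker processor. Detected user activity triggers a message to the
    supervisory processor so the task can be reassigned."""

    USER_EVENTS = {"keyboard", "mouse", "pointer", "touchscreen", "voice"}

    def __init__(self, worker_id, notify):
        self.worker_id = worker_id
        self.notify = notify  # callable delivering a message to the supervisor

    def on_event(self, event_type):
        """Report non-assigned-task-related activity; ignore other events."""
        if event_type in self.USER_EVENTS:
            self.notify({"worker": self.worker_id,
                         "status": "local-activity",
                         "event": event_type})
            return True
        return False
```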
  • another aspect of the invention relates to a method for operating an always-live distributed computing system comprising: providing a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; providing at least one supervisory processor, also connected to the always-on, peer-to-peer computer network; using the at least one supervisory processor to monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks; and using the at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks.
  • Providing a pool of worker processors may further involve ensuring that each of the worker processors is linked to the always-on, peer-to-peer computer network through a high-bandwidth connection having, for example, a data rate of at least 100 kilobits/sec, 250 kilobits/sec, 1 megabit/sec, 10 megabits/sec, 100 megabits/sec, 1 gigabit/sec, or whatever particular bandwidth may be demanded by the client's needs (e.g., required throughput and data intensiveness of the application).
  • Using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks may involve sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks.
  • the process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is preferably repeated at least once every minute, second, tenth of a second, hundredth of a second, millisecond or other interval, as needed to meet client requirements.
  • Using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks may also involve periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks.
  • the preselected frequency interval may be set at or less than one minute, ten seconds, one second, one tenth of a second, one hundredth of a second, one millisecond, or other appropriate value, as needed.
  • Using the at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks may also involve: detecting aberrant behavior among the worker processors expected to be engaged in the processing of assigned tasks; and reassigning tasks expected to be completed by the aberrant-behaving worker processor(s) to other available processor(s) in the worker processor pool.
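Reassignment of tasks away from aberrant-behaving workers, as described in the bullet above, can be sketched as a simple pool operation. The data shapes (one task per worker, a FIFO pool of idle replacements) are simplifying assumptions.

```python
def reassign_aberrant_tasks(assignments, aberrant, idle_pool):
    """Move tasks held by aberrant workers to idle workers from the pool,
    so processing of assigned tasks continues substantially uninterrupted.
    assignments: dict worker_id -> task; aberrant: iterable of worker_ids;
    idle_pool: list of available worker_ids. Returns a new assignment map."""
    assignments = dict(assignments)  # leave the caller's map untouched
    idle = list(idle_pool)
    for worker in aberrant:
        task = assignments.pop(worker, None)
        if task is not None and idle:
            replacement = idle.pop(0)
            assignments[replacement] = task
    return assignments
```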
  • another aspect of the invention relates to a method for operating a network-connected processor as a processing element in a distributed processing system, the method comprising: installing software that enables the network-connected processor to receive tasks from, and provide results to, one or more independent, network-connected resource(s); and using the software installed on the network-connected processor to provide substantially continuous status information to an independent, network-connected resource.
  • Using the software installed on the network-connected processor to provide substantially continuous status information to an independent, network-connected resource may involve sending a heartbeat message to the independent, network- connected resource at least once every second, tenth of a second, hundredth of a second, millisecond, etc.
  • Using the software installed on the network-connected processor to provide substantially continuous status information to an independent, network-connected resource may also involve responding to status-request messages, received from the independent, network-connected resource, within a predetermined response time, such as one second, one tenth of a second, one hundredth of a second, one millisecond, etc.
  • Using the software installed on the network-connected processor to provide substantially continuous status information to an independent, network-connected resource may also involve sending, in response to a change in status of the network-connected processor, a status-update message to the independent, network-connected resource within a preselected update interval, such as one second, one tenth of a second, one hundredth of a second, one millisecond, etc.
  • the change in status that initiates the sending of a status-update message may include any local activity indicator (such as keyboard activity, other processes in the process queue, etc.) that indicates additional demand for the processing resources of the network- connected processor.
  • a distributed computing system comprising: a multiplicity of worker processors; at least one supervisory processor, configured to assign tasks to, and monitor the status of, the worker processors; an always-on, peer-to-peer computer network linking the worker processors and the supervisory processor(s); and at least one of the at least one supervisory processor(s) including a monitoring module, which monitors the status of worker processors expected to be executing assigned tasks, so as to ensure that the distributed computing system maintains always-live operation.
  • the monitoring module may receive status messages from at least each of the worker processors expected to be executing assigned tasks.
  • the monitoring module may be used to detect abnormalities in the operation of the worker processors expected to be executing assigned tasks, and/or their associated network connections, by, for example, detecting an absence of expected status messages received from the worker processors.
  • the monitoring module may repeatedly check for an absence of expected status messages at a frequency of at least once each minute, at least once every ten seconds, at least once each second, at least once every tenth of a second, etc.
  • the monitoring module may also be used to detect the presence of non-assigned-task-related activity on the worker processors expected to be executing assigned tasks.
  • Activity monitor programs may be run on each of the worker processors expected to be executing assigned tasks.
  • the activity monitor programs may comprise screensaver programs.
  • the activity monitor programs may be configured to detect one or more of the following types of non-assigned- task-related activity: keyboard activity; mouse activity; pointer activity; touchscreen activity; voice activity; and execution of substantial non-assigned-task-related processes.
  • an always-live distributed computing system comprising: a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; and at least one supervisory processor, also connected to the always-on, peer-to-peer computer network, and configured to assign tasks to the worker processors, monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks and reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks.
  • the computer network may have a bandwidth of at least 250 kilobits/second, at least 1 megabit/second, etc.
  • the at least one supervisory processor may monitor the status of worker processors expected to be engaged in the processing of assigned tasks by sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks. Such status-request message(s) may be sent at a frequency of at least once every 10 seconds, at least once each second, at least twenty times each second, etc.
  • the at least one supervisory processor may monitor the status of worker processors expected to be engaged in the processing of assigned tasks by periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks.
  • the preselected frequency interval may be, for example, one second, one tenth of a second, one hundredth of a second, one millisecond, etc.
  • another aspect of the invention relates to a processing element for use in a distributed processing system, the processing element comprising: at least one processor; memory; at least one high-bandwidth interface to a computer network; and worker processor software, configured to receive tasks via the high-bandwidth interface and to provide substantially continuous status information via the high-bandwidth interface.
  • the substantially continuous status information may be provided by sending periodic heartbeat messages.
  • the substantially continuous status information may also be provided by sending prompt responses to received status-request messages.
  • the substantially continuous status information may also be provided by promptly sending a status-update message in response to changes in status.
  • another aspect of the invention relates to article(s)-of-manufacture for use in connection with a network-based distributed computing system, the article(s)-of- manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: assignment of tasks to a plurality of worker processors via the network; and monitoring, on a substantially continuous basis, of the status of at least each of the plurality of assigned worker processors until each such processor completes its assigned task.
  • another aspect of the invention relates to article(s)-of-manufacture for use in connection with an always-live distributed computing system, the article(s)-of-manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: a pool of worker processors to install worker processor software provided via an always-on, peer-to-peer computer network; communication paths to be provided between the worker processors and at least one supervisory processor via the always-on, peer-to-peer computer network; the at least one supervisory processor to monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks; and the at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks.
  • another aspect of the invention relates to article(s)-of-manufacture for use in connection with a processing element constituting a part of a distributed computing system, the article(s)-of-manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: worker processor software to be installed that permits the processing element to receive tasks from, and provide results to, one or more independent, network-connected resource(s); and the installed worker processor software to be executed and provide substantially continuous status information to one or more of the independent, network-connected resource(s).
  • An important concept underlying certain aspects of the present invention is the idea of using redundancy, either full or partial, to mitigate quality-of-service problems that plague many distributed computing approaches.
  • In most distributed processing jobs, there will exist one or more "critical tasks" for which a delay in task completion can disproportionately affect the overall completion time for the job; in other words, a delay in completion of a critical task will have a greater effect than a comparable delay in completion of at least some other "non-critical" task.
  • a "critical task” may be a task which, for whatever reason (e.g. historical behavior, need to retrieve data o via unreliable connections, etc.), poses an enhanced risk of delay in completion, even if such delay will not disproportionately impact overall job completion.)
  • Certain aspects of the present invention are based, at least in part, on the inventors' recognition that quality-of-service problems in distributed computing are frequently caused by delays in completing critical task(s), and that such quality-of-service problems can be effectively mitigated through redundancy.
  • One aspect of the invention provides methods, apparatus and articles-of-manufacture for assigning additional (i.e., redundant) resources to ensure timely completion of a job's critical task(s).
  • the critical task(s) may receive redundant resource assignment; alternatively, a job's various tasks may be assigned redundant resources in accordance with their relative criticality: e.g., marginally critical tasks are each assigned to two processors, critical tasks are each assigned to three processors, and the most critical tasks are each assigned to four or more processors.
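The tiered redundancy scheme just described might be sketched as below; the tier names, copy counts, and round-robin worker selection are illustrative assumptions, not part of the disclosure.

```python
import itertools

def assign_with_redundancy(tasks, workers, copies_by_tier):
    """Assign each task to one or more worker processors according to its
    criticality tier; more critical tiers receive more redundant copies.

    tasks          -- list of (task_id, tier) pairs
    workers        -- list of available worker ids
    copies_by_tier -- tier name -> number of processors per task
    """
    assignments = {}
    pool = itertools.cycle(workers)  # simple round-robin; reuses workers if short
    for task_id, tier in tasks:
        copies = copies_by_tier.get(tier, 1)  # non-critical tasks get one worker
        assignments[task_id] = [next(pool) for _ in range(copies)]
    return assignments
```

With a tier map such as `{"marginal": 2, "critical": 3, "most": 4}`, each task receives the number of processors its tier calls for, while unlisted tasks default to a single processor.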
  • Another aspect of the invention provides methods, apparatus and articles-of-manufacture for selectively assigning higher-capability (e.g., faster, more memory, greater network bandwidth, etc.) processing elements/resources to a job's more critical tasks.
  • another aspect of the invention relates to methods, apparatus or articles-of-manufacture for improving quality-of-service in a distributed computing system including, for example, a multiplicity of network-connected worker processors and at least one supervisory processor, the supervisory processor configured to assign tasks to the worker processors, the methods/apparatus/articles-of-manufacture involving, for example, the following: identifying one or more of the tasks as critical task(s); assigning each of the tasks, including the critical task(s), to a worker processor; redundantly assigning each of the one or more critical task(s) to a worker processor; and monitoring the status of the assigned tasks to determine when all of the tasks have been completed by at least one worker processor.
  • the methods, apparatus or articles of manufacture may further involve monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s).
  • Monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s) may include receiving status messages from at least each of the worker processor(s) that have been assigned non-critical task(s) until each such processor completes its assigned task.
  • Monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s) may also involve detecting abnormalities in the operation of the worker processor(s) that have been assigned non-critical task(s), and/or their associated network connections, by detecting an absence of expected status message(s) received by the at least one supervisory processor.
  • Such act of detecting an absence of expected status message(s) received by the at least one supervisory processor is preferably repeated at a preselected interval, such as at least once every ten minutes, at least once each minute, at least once each second, at least once every tenth of a second, or at any other appropriate interval selected to maintain an expected quality-of-service.
  • Such activity monitor programs may also be configured to send a message to at least one of the at least one supervisory processor(s) in response to detection of any of the following: (i) mouse activity; (ii) pointer activity; (iii) touchscreen activity; (iv) voice activity; and/or (v) execution of substantial non-assigned-task-related processes.
  • another aspect of the invention relates to methods, apparatus or articles-of-manufacture for operating a peer-to-peer distributed computing system, involving, for example, the following: providing a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; providing at least one supervisory processor, also connected to the always-on, peer-to-peer computer network; using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks; and using the at least one supervisory processor to redundantly assign one or more critical task(s) to one or more additional worker processors.
  • Providing a pool of worker processors may also involve ensuring that each of the worker processors is linked to the always-on, peer-to-peer computer network through a high-bandwidth connection at, for example, a data rate of at least 100 kilobits/sec, at least 250 kilobits/sec, at least 1 megabit/sec, at least 10 megabits/sec, at least 100 megabits/sec, or at least 1 gigabit/sec.
  • Using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks may include sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks.
  • the process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is preferably repeated on a regular basis, such as at least once every second, at least once every tenth of a second, at least once every hundredth of a second, or at least once every millisecond.
  • Using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks may involve periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks.
  • the preselected frequency interval may be less than ten minutes, less than two minutes, less than one minute, less than twenty seconds, less than one second, less than one tenth of a second, less than one hundredth of a second, less than one millisecond, etc.
  • another aspect of the invention relates to methods, apparatus or articles-of-manufacture for performing a job using a peer-to-peer network-connected distributed computing system, the job illustratively comprised of a plurality of tasks, the methods/apparatus/articles-of-manufacture involving, for example, the following: initiating execution of each of the plurality of tasks on a different processor connected to the peer-to-peer computer network; initiating redundant execution of at least one of the plurality of tasks on yet a different processor connected to the peer-to-peer computer network; and once each of the plurality of tasks has been completed by at least one processor, reporting completion of the job via the peer-to-peer computer network.
  • At least one of the plurality of tasks that is/are redundantly assigned is/are critical task(s).
  • the methods/apparatus/articles-of-manufacture may further involve monitoring, on a periodic basis, to ensure that progress is being made toward completion of the job. Such monitoring may be performed at least once every five minutes, at least once every two minutes, at least once each minute, at least once every ten seconds, at least once a second, at least once every tenth of a second, at least once every hundredth of a second, at least once every millisecond, etc.
  • another aspect of the invention relates to methods, apparatus or articles-of-manufacture for performing a job using a plurality of independent, network-connected processors, the job illustratively comprising a plurality of tasks, the methods/apparatus/articles-of-manufacture involving, for example, the following: assigning each of the plurality of tasks to a different processor connected to the computer network; redundantly assigning at least some, but preferably not all, of the plurality of tasks to additional processors connected to the computer network; and using the computer network to compile results from the assigned tasks and report completion of the job.
  • Redundantly assigning at least some of the plurality of tasks to additional processors may involve assigning critical tasks to additional processors, and preferably involves assigning at least one critical task to at least two processors.
  • the methods/apparatus/articles-of-manufacture may further involve generating a heartbeat message from each processor executing an assigned task, preferably on a regular basis, such as at least once every second, at least once every tenth of a second, at least once every hundredth of a second, at least once every millisecond, etc.
  • another aspect of the invention relates to methods, apparatus or articles-of-manufacture for performing a job using a pool of network-connected processors, the job illustratively comprising a plurality of tasks, the number of processors in the pool greater than the number of tasks in the job, the methods/apparatus/articles-of-manufacture involving, for example, the following: assigning each of the plurality of tasks to at least one processor in the pool; redundantly assigning at least some of the plurality of tasks until all, or substantially all, of the processors in the pool have been assigned a task; and using the computer network to compile results from the assigned tasks and report completion of the job. Redundantly assigning at least some of the plurality of tasks preferably includes redundantly assigning a plurality of critical tasks.
  • another aspect of the invention relates to methods, apparatus or articles-of-manufacture for using redundancy, in a network-based distributed processing environment, to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, the methods/apparatus/articles-of-manufacture involving, for example, the following: receiving a job request, from a client, over the network; processing the job request to determine the number, K, of individual tasks to be assigned to individual network-connected processing elements; determining a subset, N, of the K tasks whose completion is most critical to the overall completion of the job; and assigning each of the K tasks to an individual network-connected processing element; and redundantly assigning at least some of the N task(s) in the subset to additional network-connected processing element(s).
  • Determining the subset, N, of the K tasks whose completion is most critical to the overall completion of the job may include one or more of the following: (i) assigning, to the subset, task(s) that must be completed before other task(s) can be commenced; (ii) assigning, to the subset, task(s) that supply data to other task(s); (iii) assigning, to the subset, task(s) that is/are likely to require the largest amount of memory; (iv) assigning, to the subset, task(s) that is/are likely to require the largest amount of local disk space; (v) assigning, to the subset, task(s) that is/are likely to require the largest amount of processor time; and/or (vi) assigning, to the subset, task(s) that is/are likely to require the largest amount of data communication over the network.
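The heuristics (i)-(vi) above can be combined into a single ranking, for example by scoring each task and taking the top N. The weights below are arbitrary assumptions; the disclosure does not specify how the heuristics are weighted or combined.

```python
def select_critical_subset(tasks, n):
    """Rank candidate tasks by an illustrative criticality score and
    return the N most critical. Each task is a dict whose fields
    estimate the quantities in heuristics (i)-(vi) above."""
    def score(task):
        return (10 * len(task.get("blocks", []))   # (i) tasks gated on this one
                + 5 * len(task.get("feeds", []))   # (ii) tasks consuming its data
                + task.get("memory_mb", 0) / 100   # (iii) memory demand
                + task.get("disk_mb", 0) / 100     # (iv) local disk demand
                + task.get("cpu_seconds", 0) / 10  # (v) processor time
                + task.get("network_mb", 0) / 10)  # (vi) network traffic
    return sorted(tasks, key=score, reverse=True)[:n]
```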
  • the methods/apparatus/articles-of-manufacture may further involve: determining, based on completions of certain of the K tasks and/or N redundant task(s), that sufficient tasks have been completed to compile job results; and reporting job results to the client over the network.
  • another aspect of the invention relates to methods, apparatus or articles-of-manufacture for using a group of network-connected processing elements to process a job, the job illustratively comprised of a plurality of tasks, one or more of which are critical tasks, the methods/apparatus/articles-of-manufacture involving, for example, the following: identifying one or more higher-capacity processing elements among the group of network-connected processing elements; assigning at least one critical task to at least one of the identified higher-capacity processing elements; assigning other tasks to other processing elements such that each task in the job has been assigned to at least one processing element; and communicating results from the assigned tasks over the network.
  • Identifying one or more higher-capacity processing elements among the group of network-connected processing elements may involve one or more of the following: (i) evaluating the processing capacity of processing elements in the group based on their execution of previously-assigned tasks; (ii) determining the processing capacity of processing elements in the group through use of assigned benchmark tasks; and/or (iii) evaluating hardware configurations of at least a plurality of processing elements in the group.
  • the methods/apparatus/articles-of-manufacture may further involve (i) ensuring that each critical task in the job is assigned to a higher-capacity processing element and/or (ii) storing the amount of time used by the processing elements to execute the assigned tasks and computing a cost for the job based, at least in part, on the stored task execution times.
  • Computing a cost for the job based, at least in part, on the stored task execution times may involve charging a higher incremental rate for time spent executing tasks on higher-capability processing elements than for time spent executing tasks on other processing elements.
  • Such computed costs are preferably communicated over the network.
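The two-rate charging model described above amounts to a weighted sum over the stored execution times. The record format and rates below are assumptions for illustration.

```python
def job_cost(task_records, standard_rate, premium_rate):
    """Compute a cost for a job from stored task execution times,
    charging a higher incremental rate for time spent executing tasks
    on higher-capability processing elements.

    task_records -- iterable of (seconds, ran_on_high_capability) pairs
    rates        -- price per processor-second
    """
    total = 0.0
    for seconds, high_capability in task_records:
        rate = premium_rate if high_capability else standard_rate
        total += seconds * rate
    return total
```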
  • another aspect of the invention relates to methods, apparatus or articles-of-manufacture for distributed computing, including, for example, the following: a multiplicity of worker processors; at least one supervisory processor, configured to assign tasks to, and monitor the status of, the worker processors, the at least one supervisory processor further configured to assign each critical task to at least two worker processors; an always-on, peer-to-peer computer network linking the worker processors and the supervisory processor(s); and at least one of the at least one supervisory processor(s) including a monitoring module, which monitors the status of worker processors expected to be executing assigned tasks to ensure that the distributed computing system maintains always-live operation.
  • the monitoring module preferably receives status messages from at least each of the worker processors expected to be executing assigned tasks, and preferably detects abnormalities in the operation of the worker processors expected to be executing assigned tasks, and/or their associated network connections, by detecting an absence of expected status messages received from the worker processors.
  • the monitoring module checks for an absence of expected status messages at predetermined intervals, such as at least once each minute, at least once each second, etc.
  • the monitoring module may be configured to detect the presence of non-assigned-task-related activity on the worker processors expected to be executing assigned tasks, preferably through use of activity monitor programs running on each of the worker processors expected to be executing assigned tasks.
  • Such activity monitor programs may comprise screensaver programs, and may be configured to detect one, two, three or more of the following types of non-assigned-task-related activity: keyboard activity; mouse activity; pointer activity; touchscreen activity; voice activity; and/or execution of substantial non-assigned-task-related processes.
  • Additional aspects of the invention relate to systems, structures and articles-of-manufacture used, or useful, in connection with all, or part, of the above-described methods. Still further aspects of the invention relate to different combinations or sub-combinations of the above-described elements and process steps.
  • FIG. 1 exemplifies the communication between various workers and brokers/servers in the datasynapse/WebProc environment;
  • FIG. 2 illustrates further details of the datasynapse/WebProc environment;
  • FIG. 3 illustrates aspects of the datasynapse/WebProc tasking API;
  • FIG. 4 illustrates aspects of the datasynapse/WebProc job submission process;
  • FIG. 5 illustrates further aspects of the datasynapse/WebProc job submission process;
  • FIG. 6 illustrates aspects of the datasynapse/WebProc job submission process, from a customer perspective;
  • FIG. 7 illustrates aspects of the datasynapse/WebProc job verification process, from a job space perspective;
  • FIG. 8 illustrates aspects of the datasynapse/WebProc job registration process;
  • FIG. 9 illustrates aspects of the datasynapse/WebProc job unpacking process;
  • FIG. 10 illustrates aspects of the datasynapse/WebProc task management process;
  • FIG. 11 illustrates aspects of the datasynapse/WebProc worker interface;
  • FIG. 12 illustrates aspects of the datasynapse/WebProc task return process;
  • FIG. 13 illustrates aspects of the datasynapse/WebProc job collation process;
  • FIG. 14 illustrates aspects of the datasynapse/WebProc job return process;
  • FIGs. 15-16 depict aspects of the datasynapse/WebProc security architecture;
  • FIG. 17 contains exemplary datasynapse/WebProc TaskInput code;
  • FIG. 18 contains exemplary datasynapse/WebProc TaskOutput code;
  • FIG. 19 contains exemplary datasynapse/WebProc Task code;
  • FIG. 20 contains exemplary datasynapse/WebProc TaskInputProcess code;
  • FIG. 21 contains exemplary datasynapse/WebProc TaskOutputProcess code;
  • FIG. 22 depicts an exemplary network-based distributed processing system in which the present invention may be employed;
  • FIG. 23 contains a flowchart illustrating the operation of an exemplary always-live distributed processing system in accordance with the invention; and
  • FIG. 24 is a flowchart illustrating the operation of an exemplary redundancy-based, always-live distributed processing system in accordance with the invention.
  • an illustrative distributed computing network comprises at least one Beekeeper server 1, a plurality of Queen bee servers 2a-f, each in communication with a beekeeper server, and a plurality of Worker bee PC's 3a-x, each in communication with one or more queen bee servers.
  • the datasynapse network of Beekeeper(s) and Queen bees is preferably managed by a facilities outsource provider, and incorporates all of the redundancy and security features which other mission-critical users are afforded, including mirroring of servers, 24/7/365 uptime, etc.
  • Beekeeper 1 acts as the central exchange and preferably has three core responsibilities: (i) maintain customer registry; (ii) maintain worker bee registry; and (iii) load balance among the cluster of Queen bees.
  • Beekeeper 1 is preferably designed to scale according to network supply. Because Queen bees 2a-f are each typically absorbing the bulk of high-level tasking and data throughput, Beekeeper 1 is able to concentrate on efficiently maintaining a master registry and load balance.
  • Beekeeper 1 automatically interrogates worker bees 3a-x that visit the datasynapse.com website and responds according to whether the worker is a narrowband visitor, a broadband but unregistered visitor, or an authorized broadband visitor. Once registered, worker bees 3a-x automatically visit Beekeeper 1 upon activation to solicit the list of designated Queen bees 2a-f where the worker should seek work. This enables the datasynapse network to dynamically interrogate a worker and load balance, assigning a designated Queen bee server for all future interaction with the datasynapse network, defaulting to a secondary backup in the event the primary Queen bee experiences difficulties or has no work to perform. This designation relieves Beekeeper 1 from congestion issues and accelerates the overall distributed network throughput.
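The Beekeeper's designation of a primary Queen bee plus a secondary backup for each worker might look like the following sketch; the deterministic character-sum spread is an assumed stand-in for the actual load-balancing logic, which the text leaves open.

```python
def designate_queens(worker_id, queens):
    """Assign a worker a primary Queen bee and a secondary backup, so the
    worker can fail over if its primary has difficulties or no work.
    A deterministic character-sum spread stands in for real load data."""
    if len(queens) < 2:
        raise ValueError("failover requires at least two Queen bees")
    i = sum(map(ord, worker_id)) % len(queens)
    primary = queens[i]
    secondary = queens[(i + 1) % len(queens)]  # distinct backup server
    return primary, secondary
```

Because the designation is computed once at registration, the worker thereafter contacts its Queen bees directly, relieving the Beekeeper of per-task traffic.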
  • the Queen bees 2a-f manage the brokering of jobs from clients (not depicted) to worker bees 3a-x, once a client has been registered at Beekeeper 1 and designated to a Queen bee, similar to the Worker bee process outlined above.
  • Each Queen bee 2a-f is preferably designed to scale up to at least 10,000 Worker bees 3a-x.
  • FIG. 2 illustrates further details of the datasynapse/WebProc environment.
  • the datasynapse software seamlessly and easily integrates within existing or new applications which can capitalize on distributed processing.
  • Tasking API 8 requests the user to organize its distributed problem in five intuitive classes 9, which collectively capture a simple yet flexible tasking semantic.
  • the datasynapse software permits the user to bind a tasking implementation to a specific run-time job submission via a WebProc markup language (see, also, FIGs. 17-21).
  • the customer downloads and installs, anywhere in its network, the lightweight WebProc customer stub 12, which supports the above-described API.
  • Customer Engine 11 automatically packages tasks into job entries.
  • Customer stub 12 will preferably automatically download the most recent WebProc engine 11 at time of job submission. Such engine download enables datasynapse to update and enhance its functionality on a continuous basis, without interfering with customer applications 8, or forcing the customer to continuously re-install the software.
  • JobSpace is the virtual boundary for the total datasynapse.com exchange of work components.
  • Illustrative worker engine 6y automatically processes tasks.
  • Worker engine 6y is preferably automatically downloaded by a corresponding worker stub 5y at the start of executing a task. Such engine download enables datasynapse to update and enhance its functionality on a continuous basis without interfering with worker applications, or forcing the worker 3y to continuously re-install the software.
  • Worker stub 5y is downloaded from the datasynapse.com website at registration. It is preferably a lightweight program which senses when the worker's screensaver program is on, and then visits the designated Queen bee server at intervals to take work, if available.
  • FIG. 3 illustrates aspects of datasynapse/WebProc's tasking API.
  • Five user inputs - illustratively depicted as TaskInputProcess 9a, TaskInput(s) 9b-e, Task 9f, TaskOutput(s) 9g-j and TaskOutputProcess 9k - provide a customer flexibility to determine how best to extract task inputs from its database and return task outputs to its application.
  • a feature of datasynapse's API is that it permits nested and recursive parallel computations, enabling a user to submit multiple jobs in batches rather than in sequence.
  • FIG. 4 illustrates aspects of the datasynapse/WebProc job submission process.
  • a theme underpinning the datasynapse network design is the concept of implementing a loosely coupled system. In such a system, the coordination between each link is standardized in such a way that data can be passed between links without a need for complex messaging, or centralized coordination extending deep into successive links.
  • JobSpace 14a reacts dynamically to job submissions, breaking them down into discrete tasks and queuing them for execution 20, and reacts dynamically to customer requests for job results, by combing the queue for finished job entries 23 and passing them along accordingly 22.
  • the worker 15a reports to JobSpace for work, if available, and takes, processes and returns work accordingly.
  • JobSpace dynamically matches high-capacity workers (e.g. the worker registry can differentiate workers in terms of CPU speed, bandwidth, security profile, track record, and average times on-line, etc.) with equivalent utilization tasks, whenever feasible.
  • a job 16 is comprised of a series of Job Entries 26-29. There may be one or more Job Entries for each specific run-time Job submission.
  • Each Job Entry 26-29 includes an element descriptor 26a-29a and one or more Task Entries 26b-f, 27b-d, 28b, 29b-e.
  • When a job comprises a single Job Entry, the element descriptor will register "All." Otherwise, the first Job Entry 26a will register "Head" and the last 29a will register "Tail." Job Entries in between 27a-28a will each be registered as "Segment."
  • Each Job Entry can contain one or more tasks.
  • the customer does not get involved in determining how tasks are packed into job entries for submission to datasynapse; the customer only identifies the number of tasks and the priority of its overall job mission to the WebProc customer engine, which automatically packs tasks into job entries according to datasynapse's efficient packing algorithms.
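The packing of tasks into Job Entries, with the Head/Segment/Tail/All element descriptors described above, can be sketched as follows; the fixed entry size is an assumed stand-in for datasynapse's packing algorithms, which are not detailed here.

```python
def pack_job(task_ids, tasks_per_entry):
    """Split a job's tasks into Job Entries and label each with its
    element descriptor: 'All' for a single-entry job, otherwise
    'Head', zero or more 'Segment's, and 'Tail'."""
    chunks = [task_ids[i:i + tasks_per_entry]
              for i in range(0, len(task_ids), tasks_per_entry)]
    entries = []
    for i, chunk in enumerate(chunks):
        if len(chunks) == 1:
            descriptor = "All"
        elif i == 0:
            descriptor = "Head"
        elif i == len(chunks) - 1:
            descriptor = "Tail"
        else:
            descriptor = "Segment"
        entries.append({"descriptor": descriptor, "tasks": chunk})
    return entries
```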
  • Tasks are preferably subject to the following attributes:
  • FIG. 6 illustrates aspects of the datasynapse/WebProc job submission process, from a customer perspective 13. From a customer perspective, only three actions are necessary to integrate to datasynapse's platform.
  • the stub must be installed 30 on a server. This process takes about 10 minutes, requires no training, and once installed provides a live interface for the user to implement its tasking API.
  • the installed customer stub is used to automatically download the Customer Engine 30a, automatically package a Job Entry 30b, and automatically submit a packaged Job to the Job Space 30j.
  • Packaging a job 30b preferably includes assembling a record that includes a job id 30c, customer id 30d, instruction set (preferably in bytecode) 30f, element descriptor 30g, task entries 30k, and priority 30i.
  • the Tasking API is explicitly designed to capture generalized data in a means which can most readily integrate with existing applications.
  • Implementing the tasking API preferably includes creating a task input 31a, creating a task output 31b, creating one or more task(s) 31c, creating a task input process 31d and creating a task output process 31e.
  • a WebProc Markup Language (XML) file enables a user to bind 32 the tasking implementation for job submission.
  • the customer is focused on his/her application, and is thinking at "task" level.
  • the customer is not concerned about aggregating problems into “jobs” because this is automatically done by the engine.
  • the customer must decide: (i) how many tasks to break its overall problem into, as the more tasks, the more efficient a solution; and (ii) what priority to assign its submission. Higher service levels incur higher charges.
  • the customer engine takes over and automates the transmission of the job to datasynapse.com. This process is analogous to packing a suitcase - the contents are determined by the customer, but the engine fits it into one or more suitcases for travel.
  • the engine will send it to JobSpace in one or more Job Entries, with each Job Entry containing one or more tasks.
  • Process 33a illustratively comprises decrypting a Job Entry 33b, recognizing customer and user id's 33c-d, matching one or more password(s) 33e, and determining whether the job's instructions are properly signed and/or verified 33f. Exceptions are handled by an exception handler 33n.
  • After passing the initial check(s), JobSpace automatically recognizes 33k if a job submission is new (first Job Entry) or is part of an ongoing job submission 33m. If new, the job moves to the registration phase 33d. If ongoing, the job is unpacked 33n and the tasks are organized into a queue. This verification and evaluation process tightly coordinates the front and back-office issues necessary to broker jobs on a continuous basis in a secure manner.
  • FIG. 8 illustrates aspects of the datasynapse/WebProc job registration process 34 from the JobSpace perspective 14.
  • the Job is assigned an ID 34b and status record 34c, and acknowledged in the master job registry 34k.
  • a job status record illustratively includes an indication of total tasks 34d, completed tasks 34e, downloaded tasks 34f, CPU time 34g, total data input 34h, total data output 34i and task completion time 34j.
  • JobSpace can monitor the job registry on a macro basis to ensure that there are no job exceptions, and to fine tune network performance as required.
  • FIG. 9 illustrates aspects of the datasynapse/WebProc job unpacking process 35 from a JobSpace perspective 14.
  • JobSpace detaches its master instruction set to a Java archive (Jar) file 35c and records the URL address for the instruction set in each specific task entry. This jar file is accessible on the web to worker bees involved in executing job-related tasks.
  • the task entries are detached 35d. This involves detaching the data input 35e and recording its URL address in its place 35f.
  • In this way, JobSpace makes each task entry a lightweight hand-off. It also decentralizes the storage of instructions and data outside the JobSpace activity circle, which is one of the reasons why JobSpace can scale so comfortably to internet levels of coordination.
  • This methodology of detaching data has a further advantage in that the customer has flexibility to preempt the sending of data in the original job submission, and can instead keep its data in a remote location.
  • the customer has flexibility to escalate its security levels as well, in that both the instruction set and data can be encrypted, if so required.
  • the detached task entry is put into the task queue 35f where it waits for a worker to pick it up. Records are stored regarding the time such a task entry was created, as well as when a worker received the task for processing, and completed the task. This enables datasynapse to measure both the transmission latency time and the raw task-crunching processing time as distinct time elements.
  • a task entry record may include job id 35i, class id 35j, task id 35k, job priority 35l, instruction set URL 35m, data URL 35n, one or more time stamp(s) 35o (including time of receipt 35p and time of completion 35q), and a worker bee id 35r.
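The task entry record, and the two distinct time elements described above (transmission latency versus raw processing time), might look like the following sketch; all field names are assumptions keyed to the reference numerals.

```java
public class TaskEntrySketch {
    // Illustrative task entry (reference numerals 35i-35r above);
    // not code from the patent's figures.
    static class TaskEntry {
        String jobId;             // 35i
        String classId;           // 35j
        String taskId;            // 35k
        int jobPriority;          // 35l
        String instructionSetUrl; // 35m
        String dataUrl;           // 35n
        String workerBeeId;       // 35r
        long createdMillis;       // time entry was created
        long receivedMillis;      // time of receipt (35p)
        long completedMillis;     // time of completion (35q)

        // The two distinct time elements the text measures:
        long transmissionLatency() { return receivedMillis - createdMillis; }
        long processingTime()      { return completedMillis - receivedMillis; }
    }

    public static void main(String[] args) {
        TaskEntry t = new TaskEntry();
        t.createdMillis = 0;
        t.receivedMillis = 400;
        t.completedMillis = 2400;
        System.out.println(t.transmissionLatency() + " " + t.processingTime()); // prints "400 2000"
    }
}
```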
  • FIG. 10 illustrates aspects of the datasynapse/WebProc task management process 21, from a JobSpace perspective 14.
  • An important aspect of the WebProc software platform is its ability to dynamically broker demand with supply, efficiently matching resources.
  • the task queue is the mechanism from which workers take their next tasks.
  • JobSpace matches tasks 24c to workers in the waiting queue 21k according to the most appropriate fit, depending on priority, ability, and latency in the pending queue. This capability accounts for the robustness of the WebProc software in terms of fault tolerance. JobSpace sweeps the pending queue 21l and compares how long a task has been waiting relative to the average time it has taken other similar tasks in the same job class to be processed. If too long a delay has occurred, JobSpace resubmits the task from the pending queue to the waiting queue, and re-prioritizes its ranking if necessary.
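The pending-queue sweep described above might be sketched as follows; the threshold factor (how much longer than the job-class average a task may wait before resubmission) is an assumption, since the text does not specify one.

```java
import java.util.*;

public class PendingSweep {
    // Return the IDs of pending tasks that have waited longer than
    // `factor` times the average completion time for their job class;
    // these would be resubmitted to the waiting queue. The factor is
    // an illustrative assumption.
    static List<String> sweep(Map<String, Long> pendingSinceMillis,
                              long avgClassMillis, long nowMillis, double factor) {
        List<String> resubmit = new ArrayList<>();
        for (Map.Entry<String, Long> e : pendingSinceMillis.entrySet()) {
            if (nowMillis - e.getValue() > factor * avgClassMillis) {
                resubmit.add(e.getKey());
            }
        }
        return resubmit;
    }

    public static void main(String[] args) {
        Map<String, Long> pending = new LinkedHashMap<>();
        pending.put("t1", 0L);     // has waited 10 seconds
        pending.put("t2", 9000L);  // has waited 1 second
        // Class average is 2 seconds; anything pending > 4 seconds is resubmitted.
        System.out.println(sweep(pending, 2000, 10000, 2.0)); // prints "[t1]"
    }
}
```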
  • When workers misbehave, JobSpace takes note and blocks them from accessing the network, for performance and/or security reasons. As tasks are completed, they are placed in the completed queue 21m. All queues are kept up to date 21g-i.
  • FIG. 11 illustrates aspects of the datasynapse WebProc worker interface 24a, from a worker perspective 15. Similar to the customer, a registered worker needs to install a lightweight worker stub 24b in order to access the datasynapse JobSpace. Workers need do nothing else, however, once this download has been installed. The engine automatically interacts with JobSpace to get task entries 24c, download jar files 24f, download data 24g, perform work 24h, and return completed tasks 24i thereafter.
  • the worker is verified upon taking 24k and submitting 24j tasks.
  • Workers build a profile in the Beekeeper registry so that JobSpace can determine how best to match specific tasks against specific workers on a rolling basis.
  • This type of expert system balances matching against waiting time to optimize network arbitrage opportunities, and provides a basis for assessing worker performance, thereby enabling detection of aberrant performance.
  • Worker security is preferably implemented using a "sandbox" approach.
  • the rules of the datasynapse sandbox preferably dictate: (i) no local disk access while processing datasynapse.com jobs; (ii) strict compliance with security features where the worker bee cannot pass on its data to any other URL address other than datasynapse's Beekeeper server, or the designated Queen bee connection; (iii) registered worker bees cannot be activated unless the specific instruction set received has been signed by datasynapse, verified, and encrypted; and (iv) no printing, no manipulation of local environment networks, and absolutely no content can be executed.
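Sandbox rules (i)-(iv) above might be approximated, in the Java 2 security model the document relies on, with a custom SecurityManager; the broker hostname and the specific checks shown are illustrative assumptions, not the patent's actual implementation.

```java
// Hedged sketch of a worker-side sandbox: deny local disk access,
// printing, and any network connection other than the broker.
public class Sandbox extends SecurityManager {
    private final String brokerHost; // the only permitted destination (assumed name)

    Sandbox(String brokerHost) { this.brokerHost = brokerHost; }

    @Override public void checkRead(String file) {
        throw new SecurityException("no local disk access: " + file);   // rule (i)
    }
    @Override public void checkWrite(String file) {
        throw new SecurityException("no local disk access: " + file);   // rule (i)
    }
    @Override public void checkConnect(String host, int port) {
        if (!host.equals(brokerHost))                                    // rule (ii)
            throw new SecurityException("only the broker may be contacted: " + host);
    }
    @Override public void checkPrintJobAccess() {
        throw new SecurityException("printing is not permitted");        // rule (iv)
    }

    public static void main(String[] args) {
        Sandbox sb = new Sandbox("broker.example.com");
        sb.checkConnect("broker.example.com", 443); // allowed
        try {
            sb.checkRead("/etc/passwd");
        } catch (SecurityException e) {
            System.out.println("blocked: local disk read");
        }
    }
}
```

Rule (iii), signature verification of the instruction set, would be handled separately at class-loading time rather than by the SecurityManager.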
  • Worker processor performance metrics may be used to detect aberrant performance by processors executing tasks. In other words, if a processor is expected to complete a task in 1 minute, and the task is not completed in 2 minutes, one may conclude that the processor is (or may be) exhibiting aberrant performance.
  • Another way to detect aberrant performance is to compare the performance of multiple worker processors executing similar tasks. In other words, when similar processors spend significantly different amounts of time (either real time or CPU time) executing similar jobs, it may be concluded that those significantly slower processors are exhibiting some sort of aberrant performance. Because aberrant performance may suggest a security breach on the aberrant-performing worker processor(s), such processor(s) may be selectively disabled and precluded from receiving further task allocations.
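The cross-worker comparison described above might be sketched as follows; the outlier threshold (a multiple of the group mean, here 2x, echoing the 1-minute versus 2-minute example) is an assumption.

```java
import java.util.*;

public class AberrantDetector {
    // Flag workers whose time on similar tasks is at least `factor`
    // times the group mean; such workers could then be disabled and
    // precluded from further task allocations.
    static List<String> flag(Map<String, Long> taskMillisByWorker, double factor) {
        double mean = taskMillisByWorker.values().stream()
                .mapToLong(Long::longValue).average().orElse(0);
        List<String> aberrant = new ArrayList<>();
        for (Map.Entry<String, Long> e : taskMillisByWorker.entrySet()) {
            if (e.getValue() >= factor * mean) aberrant.add(e.getKey());
        }
        return aberrant;
    }

    public static void main(String[] args) {
        Map<String, Long> times = new LinkedHashMap<>();
        times.put("w1", 60_000L);   // 1 minute
        times.put("w2", 62_000L);
        times.put("w3", 300_000L);  // far slower on a similar task
        System.out.println(flag(times, 2.0)); // prints "[w3]"
    }
}
```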
  • FIG. 13 illustrates aspects of the datasynapse/WebProc job collation process 22, from a JobSpace perspective 14.
  • JobSpace collates tasks by Job ID once a customer returns to JobSpace to take back its processed job.
  • the interrogation of JobSpace by a customer seeking to take back a particular job triggers a search of the completed task queue and a re-packing of tasks into job entry format for transport back to the customer's application level.
  • This "collating" process is highly efficient, responding dynamically to customer demand by returning completed tasks as they roll in, rather than waiting for the whole job to be accomplished. As with unpacking, this enables the customer to begin integrating results immediately as they accumulate, and expedites overall throughput of the JobSpace system.
  • Job entries, once packed with completed tasks, can be processed by the customer's existing applications.
  • the customer may take 17 a job by getting the completed Job Entry(ies) 17a and processing 17b it (or them).
  • JobSpace preferably does not integrate further, because: (i) it is likely that a customer will seek to keep its end result proprietary, and take the intermediate results obtained through datasynapse and finalize analysis in its own environment; and (ii) it is unlikely that the processing of intermediate results will in itself be a parallelizable task and should therefore not be handled within the confines of JobSpace.
  • FIGs. 15-16 illustrate aspects of the WebProc security architecture, which permits secure message transmission, identification and authentication of the various client 41 and server 40 components of the datasynapse distributed processing network.
  • FIGs. 17-21 contain exemplary code segments, which segments will be self-explanatory to persons skilled in the art. These segments exemplify the previously-described TaskInput 9b-e (FIG. 17), TaskOutput 9g-j (FIG. 18), Task 9f (FIG. 19), TaskInputProcess 9a (FIG. 20) and TaskOutputProcess 9k (FIG. 21) aspects of the datasynapse/WebProc tasking API.
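Since the actual code segments appear only in FIGs. 17-21 (not reproduced here), the following is a hedged guess at what the five named pieces of the tasking API might look like; every signature is an assumption, with a trivial worked Task (squaring an integer) for illustration.

```java
import java.io.Serializable;
import java.util.Iterator;

// Assumed shapes of the five tasking-API pieces named in FIGs. 17-21.
interface TaskInput extends Serializable {}
interface TaskOutput extends Serializable {}
interface Task { TaskOutput execute(TaskInput input); }            // one unit of work
interface TaskInputProcess { Iterator<TaskInput> inputs(); }       // splits a job into inputs
interface TaskOutputProcess { void collate(TaskOutput output); }   // collates results as they arrive

public class TaskApiDemo {
    static class IntInput implements TaskInput {
        final int value;
        IntInput(int v) { value = v; }
    }
    static class IntOutput implements TaskOutput {
        final int value;
        IntOutput(int v) { value = v; }
    }

    // A trivial Task implementation: square one integer input.
    static int runSquare(int v) {
        Task square = in -> new IntOutput(((IntInput) in).value * ((IntInput) in).value);
        return ((IntOutput) square.execute(new IntInput(v))).value;
    }

    public static void main(String[] args) {
        System.out.println(runSquare(7)); // prints "49"
    }
}
```

Keeping inputs and outputs Serializable is consistent with the lightweight hand-off and URL-detachment scheme described earlier.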
  • FIG. 22 depicts an exemplary context in which the method(s), apparatus and/or article(s)-of-manufacture of the invention may be applied. A computer network 201 is shown connecting a plurality of processing resources.
  • Computer network 201 may utilize any type of transmission medium (e.g., wire, coax, fiber optics, RF, satellite, etc.) and any network protocol. However, in order to realize the principal benefit(s) of the present invention, computer network 201 should provide a relatively high bandwidth (e.g., at least 100 kilobits/second) and preferably, though not necessarily, should provide an "always on" connection to the processing resources involved in distributed processing activities.
  • one or more supervisory processor(s) 213 may communicate with a plurality of worker processors 210 via computer network 201. Supervisory processor(s) 213 perform such tasks as:
  • Plurality 213 of supervisory processors 211 and 212 may operate collaboratively as a group, independently (e.g., each handling different job(s), task(s) and/or worker processor pool(s)) and/or redundantly (thus providing enhanced reliability). However, to realize a complete distributed processing system in accordance with the invention, only a single supervisory processor (e.g., 211 or 212) is needed. Still referring to FIG. 22, plurality 210 of worker processors illustratively comprises worker processors 202, 204, 206 and 208, each connected to computer network 201 through network connections 203, 205, 207 and 209, respectively. These worker processors communicate with supervisory processor(s) 213 via network 201, and preferably include worker processor software that enables substantially continuous monitoring of worker processor status and/or task execution progress by supervisory processor(s) 213.
  • a received job request 320 is initially assigned 321 to a plurality of available worker processors. Then, until the client's job is completed, processor(s) working on assigned task(s) are continuously monitored to ensure that the job is completed in a substantially uninterrupted (or "always live") manner. In particular, a monitoring module repeatedly asks whether all assigned tasks have been completed 322. If so, then the job is complete, and results can be reported 323. If not, then the monitoring module inquires about the status 324 of processor(s) expected to be working on not-yet-completed tasks.
  • affected task(s) are immediately reassigned 325 to ensure that the system remains "live" and the client's work gets completed in a timely manner. This process is repeated with a frequency sufficient to ensure that worker processor problems will not cause undue delay in completing the overall job.
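One pass of the monitoring loop just described (inquire 322/324, reassign 325, report 323 when done) might be sketched as follows; the status values and data structures are illustrative assumptions.

```java
import java.util.*;

public class AlwaysLiveMonitor {
    enum Status { RUNNING, COMPLETED, UNRESPONSIVE }

    // One pass of the FIG. 23 monitoring loop: any task whose worker
    // is unresponsive is reassigned immediately; returns true once all
    // assigned tasks have completed (322), at which point results can
    // be reported (323).
    static boolean monitorOnce(Map<String, Status> taskStatus,
                               List<String> reassigned) {
        boolean allDone = true;
        for (Map.Entry<String, Status> e : taskStatus.entrySet()) {
            if (e.getValue() == Status.COMPLETED) continue;
            allDone = false;
            if (e.getValue() == Status.UNRESPONSIVE) {
                reassigned.add(e.getKey());  // reassign 325
                e.setValue(Status.RUNNING);  // task is back in flight
            }
        }
        return allDone;
    }

    public static void main(String[] args) {
        Map<String, Status> st = new LinkedHashMap<>();
        st.put("t1", Status.COMPLETED);
        st.put("t2", Status.UNRESPONSIVE);
        List<String> re = new ArrayList<>();
        System.out.println(monitorOnce(st, re) + " " + re); // prints "false [t2]"
    }
}
```

The outer loop would repeat this pass at whatever frequency the desired quality-of-service demands.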
  • a job request is received 221 via the computer network.
  • the received job request typically includes a multiplicity of subordinate tasks.
  • the set of tasks is examined or analyzed to identify 222 critical tasks.
  • identification 222 of critical tasks may take several forms, including, but by no means limited to, the following:
  • Identification 223 of available processing resources includes determining the available pool of potential worker processors, and may also include determining the capabilities (e.g., processor speed, memory, network bandwidth, historical performance) of processing resources in the identified pool.
  • Each task is then assigned 224 to at least one processing element. Such task assignment may optionally involve assigning critical task(s) to higher- capability processing elements.
  • Some (and preferably all) critical task(s) are also assigned 225 to additional (i.e., redundant) processing elements. (Note that although 224 and 225 are depicted as discrete acts, they can be (and preferably are) performed together.)
  • Task executions are monitored, preferably on a substantially continuous basis, as described in connection with FIG. 23. Once such monitoring reveals that each of the job's tasks has been completed 226 by at least one of the assigned processing resources, then the results are collected and reported 227 to the client.
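The combined assignment of acts 224 and 225 (every task to at least one worker, critical tasks to an additional redundant worker) might be sketched as a simple round-robin; the assignment policy shown is an assumption, since the text leaves the matching strategy open.

```java
import java.util.*;

public class RedundantAssigner {
    // Assign every task to one worker round-robin; critical tasks get
    // a second, distinct worker for redundancy (acts 224 and 225
    // performed together, as the text prefers).
    static Map<String, List<String>> assign(List<String> tasks,
                                            Set<String> criticalTasks,
                                            List<String> workers) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        int i = 0;
        for (String t : tasks) {
            List<String> assigned = new ArrayList<>();
            assigned.add(workers.get(i++ % workers.size()));
            if (criticalTasks.contains(t)) {
                assigned.add(workers.get(i++ % workers.size())); // redundant copy
            }
            assignment.put(t, assigned);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = assign(
                Arrays.asList("t1", "t2"),
                new HashSet<>(Arrays.asList("t2")),   // t2 is critical
                Arrays.asList("w1", "w2", "w3"));
        System.out.println(a); // prints "{t1=[w1], t2=[w2, w3]}"
    }
}
```

In a fuller implementation the worker order would reflect capability rankings, so that critical tasks land on higher-capability processing elements first.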
  • a claim that contains more than one computer-implemented means-plus-function element should not be construed to require that each means-plus-function element must be a structurally distinct entity (such as a particular piece of hardware or block of code); rather, such claim should be construed merely to require that the overall combination of hardware/firmware/software which implements the invention must, as a whole, implement at least the function(s) called for by the claims.

Abstract

A network-based, secure, distributed task-brokering and parallel-processing method/system/article-of-manufacture advantageously leverages under-utilized network-based computing resources for bandwidth-intensive and/or computationally-intensive problems, and provides significant cost and reliability advantages over traditional coarse-grained parallel computing techniques.

Description

METHODS, APPARATUS, AND ARTICLES-OF-MANUFACTURE FOR NETWORK-BASED DISTRIBUTED COMPUTING FIELD OF THE INVENTION
The present invention relates generally to the fields of distributed computing and Internet-based applications and services. More particularly, the invention relates to methods, apparatus and articles-of-manufacture relating to the collection, organization, maintenance, management and commercial exploitation of network-connected, distributed computing resources.
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority from the following co-pending U.S. Patent Applications: (i) S/N 60/203,719, Methods, Apparatus, and Articles-of-Manufacture for Network-Based Distributed Computing, filed May 12, 2000; (ii) S/N 09/583,244, Methods, Apparatus, and Articles-of-Manufacture for Network-Based Distributed Computing, filed May 31, 2000; (iii) S/N 09/711,634, Methods, Apparatus and Articles-of-Manufacture for Providing Always-Live Distributed Computing, filed November 13, 2000; (iv) S/N 09/777,190, Redundancy-Based Methods, Apparatus and Articles-of-Manufacture for
Providing Improved Quality-of-Service in an Always-Live Distributed Computing Environment, filed February 2, 2001; (v) S/N 60/266,185, Methods, Apparatus and Articles-of-Manufacture for Network-Based Distributed Computing, filed February 2, 2001. Each of the co-pending applications (i)-(v) is hereby incorporated by reference herein.
BACKGROUND OF THE INVENTION
Distributed (or parallel) computing is a well-established field. Over the past few decades, thousands of distributed computing architectures have been proposed, and hundreds have been constructed and evaluated.
Distributed computing architectures are typically characterized as being either "coarse-grained" or "fine-grained," depending upon the size or complexity of the individual processing elements (or "nodes") that perform the computational work. In a typical coarse-grained, distributed computing architecture, the individual processing elements are generally fully functional computing elements, such as single-chip microprocessors, capable of individually performing a variety of useful tasks. The fine-grained approach, by contrast, typically relies on a large number of processing elements, each of which has very limited computational capabilities.
Perhaps the best known fine-grained parallel architecture is The Connection Machine, manufactured by the now-defunct Thinking Machines Corporation. In The Connection Machine, thousands of very simple processing elements were connected by a highly efficient message-routing network. Even though individual processing elements lacked the ability to perform much useful work on their own, the efficiency of the message-routing network made it possible to cooperatively deploy large numbers of processing elements on certain problems.
Unlike The Connection Machine, coarse-grained parallel computing systems seldom have the luxury of communication networks that operate with latencies at, or near, the clock speed of individual processing elements. The more sophisticated processing elements used in coarse-grained distributed processing systems typically cannot be packed into a small volume, like a single chip, board or chassis. As a result, communications must travel across chip-to-chip, board-to-board, or even chassis-to-chassis boundaries, as well as greater physical distances, all of which cause inherent and unavoidable increases in latency. Because of these inherent communication limitations, coarse-grained parallel architectures have, for many years, been viewed by persons skilled in the art as useful only for "computationally intensive," as opposed to "communication intensive," tasks. A typical computationally intensive task is prime factorization of large integers.
Recently, there have been several efforts to exploit the resources of the world's largest coarse-grained distributed computing system: the Internet. The thrust of these efforts has been to apply traditional coarse-grained distributed processing approaches to utilize idle processing resources connected to the World-Wide Web ("www"). The first reported application of these www-based methods was signal analysis, as part of a search for extra-terrestrial intelligence ("SETI"). Several years later, a group at the University of California, Santa Barbara, described the use of web-based distributed parallelism for prime factorization, and other computationally intensive problems.
Both of these reported prior-art efforts clearly embrace and exemplify traditional, coarse-grained parallelism thinking, namely, that such parallelism is only useful for computationally intensive, as opposed to communication intensive, problems. See G. Moritz, SETI and Distributed Computing, www.people.fas.harvard.edu/~gmoritz/papers/s7.html (1998) ("Distributed computing is well suited to the search for extraterrestrial civilizations for several reasons. First, the problem itself consists of small blocks of data which each require a large amount of processing. Since CPU time, not bandwidth, is the major requirement of the SERENDIP data analysis, distributed computing via the Internet will be very feasible"); A. D. Alexandrov, SuperWeb: Towards a Global Web-Based Parallel Computing Infrastructure, citeseer.nj.nec.com/cachedpage/80115 (1997), at 1 ("We expect this approach to work well for noncommunication intensive applications, such as prime number factorization, Monte-Carlo and coarse-grained simulations, and others").
The explosive growth of the Web over the past few years has created a huge demand for high-performance, web-centric computing services. Today, such services are typically rendered using mainframe computers (or other high- performance servers), connected to the Web via a T1 line (operating at 1.544 Mb/s). Unfortunately, T1 connectivity is very costly.
At the same time, consumers are increasingly migrating toward high-bandwidth, always-on Web connections, such as those offered by DSL and cable-modem providers. The inventors herein have observed that, as consumer connections to the Internet get faster and cheaper, the ratio of bandwidth-to-cost is far more favorable in the consumer (e.g., DSL and cable-modem) market than in the high-performance corporate (e.g., T1) market. In other words, even at the present time, individuals with high-speed Internet connections are paying far less per unit of bandwidth than high-demand corporate users of T1 lines. Moreover, economies of scale are likely to further drive down the cost of mass-marketed, high-speed Internet connections, thus making the existing cost disparity even greater.
It would be highly desirable if users of high-performance, web-centric computing services could take advantage of the increasingly cheaper, high-speed, mass-marketed Internet connection services. It would also be highly desirable if such users could take advantage of the millions of often-idle computing resources (e.g., PCs, workstations and other devices) linked to the Internet through such always-on, high-speed, mass-marketed connections. And, it would be highly desirable if owners of such often-idle computing resources could be compensated for use of their resources' always-on, high-speed Internet connections during otherwise idle periods of time.
One troublesome aspect of prior-art Web-based distributed computing systems is their inability to guarantee timely results. While it may be no problem for the SETI@home researchers to wait days or weeks for results from a particular data set, commercial customers simply cannot afford to have overnight processing jobs run unexpectedly into the next business day. Therefore, in order to realize the full commercial potential of network-based distributed computing, it is necessary to ensure that the clients' work gets processed in a substantially continuous and uninterrupted manner, so that a service provider can assure his/her client that assigned work will be completed within a commercially reasonable time period (e.g., an hour, four hours, eight hours, etc.). The invention, as described below, also addresses this problem.
SUMMARY OF THE INVENTION
In light of the above, one object of the present invention relates to software infrastructure designed to capture a generalized problem-solving capability, handle data throughputs in excess of 100x the productivity of the
SETI effort, require no access to a worker's local disk drive (to assuage security concerns) and motivate retail Internet users to participate by paying them in cash or higher-value non-monetary compensation (frequent flyer miles, lottery, discounted products/services, etc.).
Another object of the invention relates to a distributed networking software system that enables thousands, potentially scaled to millions, of consumer PCs on the Internet to be networked together to function as a Virtual Super Computer ("VSC").
Another object of the invention relates to a distributed networking software system that enables a "CPU/bandwidth" website exchange to be operated, which website anonymously and securely brokers demand from web-centric applications seeking integrated (i) data processing and/or (ii) high-bandwidth access to the Internet with retail supply of such resources. Such "brokering" platform aggregates CPU capability at a commercially-significant unit cost advantage versus the equivalent CPU horsepower of a high-end supercomputer, and aggregates Internet bandwidth access at a commercially-significant cost advantage versus T1, T3, or OC3 high-speed connectivity.
Another object of the invention relates to a distributed networking software system that enables on-demand computing power, with functionality similar to an electric utility, where corporate users can "plug in" to the network's website exchange for powering a wide range of web applications, thereby capitalizing on a distributed problem-solving approach.
Another object of the invention relates to a distributed networking software system that enables flexibility in client deployment, where clients who have unique security concerns or applications which do not require Web integration can license the software platform for deployment on an intranet or extranet basis only.
Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture that facilitate an always-live distributed computing system. Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture that provide substantially continuous monitoring of worker processor activity and/or task progress in a distributed computing environment.
Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture that provide prompt alerts of worker processor status changes that can affect the always-live operation of a network-based distributed computing system.
Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture for providing reliable and/or predictable resource deployment and processing activity in a wide-area network based distributed computing system.
Another general object of the invention relates to computer-based methods, apparatus and articles-of-manufacture for providing reliable and/or predictable quality-of-service in a peer-to-peer network-based distributed computing system. In accordance with the invention, the enabling platform is preferably a lightweight overlay that is intended to integrate into virtually any corporate network configuration. Similarly, each retail PC downloads a lightweight installation which invisibly runs in the background and automatically visits a broker website seeking work, whenever the PC's screensaver is activated. Another aspect of the invention relates to a multi-tiered server/meta-server architecture configured to support various system management, load balancing, disaster recovery, and security features of the invention.
Another aspect of the invention relates to security. Security is preferably thoroughly integrated into the framework of the invention using a conservative "restricted" approach. Unlike the SETI effort, the invention contemplates use by a variety of parties with potentially conflicting commercial interests; thus, the invention cannot rely on a network of friendly volunteers, and must assume that any one of its retail participants could potentially be hostile in intent.
The invention preferably makes heavy use of security features available in the Java 2 Platform. The Java Secure Socket Extension (JSSE) and Java Authentication and Authorization Service (JAAS), as well as other Internet-standard cryptography and security APIs, provide a rich set of tools to use in connection with the invention. The customer is given flexibility in selecting among different levels of security protection. Preferably, the customer's code will be "cloaked" to protect it from decompilation attacks, and signed using a private key and a predetermined certificate authority ("CA"). In addition, lightweight messages between the customer and servers/brokers will be encrypted. These messages may contain the disguised locations of Java instruction code and data inputs. In turn, these heavyweight items are then also authenticated. Result data is preferably signed by the client program; the server/broker acts as the CA for the worker and customer relationships, and preferably only accepts completed work from registered workers who have their private keys registered with the system.
Now, generally speaking, and without intending to be limiting, one aspect of the invention relates to a method for performing distributed, bandwidth-intensive computational tasks, comprising: providing Internet access to at least one broker processor, the at least one broker processor configured to receive jobs from Internet-connected customers; receiving a job from a customer via the Internet; in response to receipt of the job from the customer, directing a plurality of Internet-connected worker processors to perform a plurality of worker tasks related to the received job; awaiting execution of the worker tasks, the execution characterized by a predominance of worker processor-Internet communication activity; and, upon completion of the execution, confirming the completion of the execution via the Internet. During execution, the plurality of worker processors are preferably collectively utilizing, on average, at least 25% of their total available communication bandwidth, and possibly as much as
30%, 35%, 40%, 50%, 60%, 70% or more. At least part of a worker's execution may include: (i) searching the Internet in accordance with a search query supplied by the customer; (ii) creating an index; (iii) creating a database; (iv) updating a database; (v) creating a report; (vi) creating a backup or archival file; (vii) performing software maintenance operations; (viii) comparing objects downloaded from the Internet; (ix) processing signals or images downloaded from the Internet; (x) broadcasting audio and/or video to a plurality of destinations on the Internet; and/or (xi) sending e-mail to a plurality of destinations on the Internet.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a method for reducing the cost of performing a bandwidth-intensive job on the Internet, the method comprising: (i) transmitting a job execution request to a broker processor over the Internet; (ii) selecting, in response to the job execution request, a plurality of Internet-connected worker processors to be used in executing the job, the selection of worker processors being performed, at least in part, based on one or more bandwidth-related consideration(s); and (iii) using the selected worker processors to execute, at least in part, the job. The worker processor selection is preferably based, at least in part, on one or more bandwidth-related consideration(s) selected from the list of: (i) the types of Internet connections installed on candidate worker processors; (ii) the locations of candidate worker processors; (iii) the time of day; and (iv) historical performance statistics of candidate worker processors.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a method for exploiting unused computational resources on the Internet, comprising: (i) recruiting prospective worker processors over the Internet, the recruiting including: (a) providing Internet-accessible instructions; (b) providing Internet-downloadable worker processor software; (c) providing an Internet-accessible worker processor operating agreement; and (d) storing a plurality of worker processor preferences; (ii) maintaining a registry of worker processors, the maintaining including: (a) storing a plurality of URLs used to address the worker processors; (b) storing a plurality of worker processor profiles, the profiles including information related to hardware and software configurations of the worker processors; and (c) storing a plurality of worker processor past performance metrics; (iii) selecting a plurality of worker processors to collectively execute a job, the selecting being based, at least in part, on worker processor past performance metrics maintained by the worker processor registry; and (iv) using the selected plurality of worker processors to execute the job. At least some of the prospective worker processors may be connected to the Internet via a satellite connection, a fixed wireless connection, and/or a mobile wireless connection. Recruiting prospective worker processors may further include specifying the type and amount of compensation to be provided in exchange for use of worker processor resources, and/or providing an on-line means of accepting the worker processor operating agreement. Maintaining a registry of worker processors may further include determining the performance of worker processors listed in the registry by executing one or more benchmark programs on the worker processors, and optionally updating the worker processor past performance metrics in accordance with measured benchmark program performance statistics.
Selecting may be further based, at least in part, on one or more bandwidth-related consideration(s) selected from the list of: (i) the types of Internet connections installed on the worker processors; (ii) the locations of the worker processors; (iii) the time of day; and (iv) one or more of the stored preferences.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a method for reselling Internet bandwidth associated with individual DSL-connected Internet workstations, the method comprising: (i) entering on-line-completed operating agreements with a plurality of DSL-connected Internet users, the agreements providing for use of a plurality of DSL-connected Internet workstations controlled by the users; (ii) executing a customer's distributed task, using a plurality of the DSL-connected Internet workstations; (iii) storing, for each of the DSL-connected Internet workstations used in the distributed task execution, a bandwidth utilization metric; (iv) compensating the DSL-connected Internet users whose workstations were used in the distributed task execution, the compensation being determined, at least in part, based upon the bandwidth utilization metrics associated with the workstations used in the distributed task execution; and (v) charging the customer whose distributed task was executed using the DSL-connected Internet workstations. The customer is preferably charged, at least in part, based upon the bandwidth utilization metrics associated with the workstations used in executing the customer's distributed task. Executing a customer's distributed task may include: (i) receiving an execution request message from the customer over the Internet; (ii) processing the execution request using an Internet-connected broker processor; and (iii) initiating distributed execution of the task by sending messages, over the Internet, to a plurality of the DSL-connected Internet workstations.
The compensation is preferably determined, at least in part, by one or more metric(s) selected from the list consisting of: (i) the amount of real time used by the DSL-connected Internet workstations in executing the distributed task; (ii) the amount of processor time used by the DSL-connected Internet workstations in executing the distributed task; (iii) the amount of primary storage used by the DSL-connected Internet workstation in executing the distributed task; (iv) the amount of secondary storage used by the DSL-connected Internet workstation in executing the distributed task; (v) the time of day during which the execution occurred; and (vi) the geographic location(s) of the DSL-connected Internet workstations. The plurality of DSL-connected Internet workstations may operate in accordance with any one of the following protocols: ADSL, HDSL, IDSL, MSDSL, RADSL, SDSL, and VDSL (or other similar, or future, protocols). Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a method for reselling Internet bandwidth associated with individual cable modem-connected Internet workstations, the method comprising: (i) enrolling a plurality of cable modem-connected Internet users by installing worker processor software on a plurality of cable modem-connected Internet workstations controlled by the users; (ii) using the installed worker processor software to execute a distributed task on a plurality of the cable modem-connected Internet workstations; (iii) using the installed worker processor software to compute, for each workstation used in the distributed task execution, a billing metric determined, at least in part, by the amount of data communication involved in executing the distributed task; (iv) compensating the cable modem-connected Internet users whose workstations were used in the distributed task execution; (v) charging a customer who requested execution of the distributed task; and (vi) wherein
the compensating and charging are performed, at least in part, using one or more of the computed billing metric(s), and wherein, for each distributed task executed, the amount charged to the customer exceeds the sum of all amounts paid to the cable modem-connected Internet users.
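By way of illustration only, and not as a definitive implementation of the claimed method, the settlement step described above (paying each user from a bandwidth-based billing metric while charging the customer more than the sum of all payouts) might be sketched as follows; the payout rate, the markup, and all identifiers are hypothetical:

```python
# Illustrative sketch of bandwidth-based settlement: compensate each
# workstation owner in proportion to megabytes transferred, then charge the
# customer the total payout plus a margin, so the amount charged always
# exceeds the sum of all amounts paid out.
RATE_PER_MB = 0.02   # hypothetical payout rate, dollars per megabyte moved
MARGIN = 1.25        # hypothetical markup ensuring charge > total payout

def settle_task(bandwidth_mb_by_worker):
    """bandwidth_mb_by_worker: {worker_id: megabytes transferred}."""
    payouts = {w: mb * RATE_PER_MB for w, mb in bandwidth_mb_by_worker.items()}
    total_paid = sum(payouts.values())
    customer_charge = total_paid * MARGIN
    return payouts, customer_charge

payouts, charge = settle_task({"dsl-worker-1": 120.0, "dsl-worker-2": 80.0})
```

With any positive markup, the margin condition stated in the text (charge exceeding total payouts) holds by construction.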
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a method for executing jobs, comprised of a plurality of tasks, in a networked computing environment, the method comprising: (i) providing networked access to at least one broker processor, the broker processor configured to receive a job from a user, unpack the job into a plurality of executable tasks, and direct a plurality of worker processors to initiate execution of the tasks; (ii) maintaining performance metrics for worker processors; (iii) monitoring completion of tasks by the worker processors and, upon completion, updating the performance metrics; (iv) using the performance metrics to select, at least in part, worker processors to initiate execution of additional tasks; and (v) using the performance metrics to determine, at least in part, the charges to be billed to the user for execution of the job. The method may further include (a) using said performance metrics to detect aberrant performance of worker processors executing tasks; and (b) terminating execution of tasks on worker processors that display aberrant performance.
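A minimal sketch of the broker-side bookkeeping described above, assuming a simple throughput metric (an exponential moving average of completed tasks per second); the metric, its smoothing factor, and all names are illustrative choices rather than anything mandated by the text:

```python
# Illustrative broker: records task completions as a per-worker throughput
# metric and selects the best-performing available workers for new tasks.
class Broker:
    def __init__(self):
        self.metrics = {}          # worker_id -> average tasks/sec

    def record_completion(self, worker_id, task_seconds):
        # Exponential moving average keeps the metric current.
        rate = 1.0 / task_seconds
        prev = self.metrics.get(worker_id, rate)
        self.metrics[worker_id] = 0.8 * prev + 0.2 * rate

    def select_workers(self, available, n):
        # Prefer workers with the best recorded throughput.
        ranked = sorted(available,
                        key=lambda w: self.metrics.get(w, 0.0),
                        reverse=True)
        return ranked[:n]

broker = Broker()
broker.record_completion("w1", 2.0)   # 0.5 tasks/sec
broker.record_completion("w2", 10.0)  # 0.1 tasks/sec
best = broker.select_workers(["w1", "w2"], 1)
```

The same stored metrics could also feed the billing step, since the text ties both worker selection and user charges to the maintained performance data.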
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a method for operating a distributed computing system, the system including a multiplicity of network-connected worker processors and at least one supervisory processor, the supervisory processor configured to assign tasks to, and monitor the status of, the worker processors, the method comprising: assigning tasks to a plurality of the worker processors by sending task-assignment messages, via the network, from the at least one supervisory processor to the plurality of worker processors; and monitoring, on a substantially continuous basis, the status of at least each of the plurality of assigned worker processors until each processor completes its assigned task. Monitoring, on a substantially continuous basis, the status of at least each of the plurality of assigned worker processors may involve receiving status messages from at least each of the plurality of assigned worker processors until each processor completes its assigned task. Monitoring, on a substantially continuous basis, the status of at least each of the plurality of worker processors may also involve detecting abnormalities in the operation of the plurality of assigned worker processors, and/or their associated network connections, by detecting an absence of expected status message(s) received by the at least one supervisory processor. Detection of an absence of expected status message(s) received by the at least one supervisory processor may be repeated at least once every ten minutes, once every five minutes, once every two minutes, once each minute, once every thirty seconds, once every ten seconds, once every second, once every tenth of a second, once every hundredth of a second, once each millisecond, or at whatever interval is needed to assure the continuity-of-service demanded by the client.
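The absence-detection step described above can be sketched as follows; the ten-second interval and all identifiers are illustrative assumptions, since the text permits intervals anywhere from minutes down to milliseconds:

```python
# Illustrative supervisory-processor monitor: records the timestamp of each
# status message and flags any assigned worker whose last message is older
# than the expected reporting interval.
HEARTBEAT_INTERVAL = 10.0  # seconds; any client-appropriate value could be used

class Supervisor:
    def __init__(self, interval=HEARTBEAT_INTERVAL):
        self.interval = interval
        self.last_seen = {}        # worker_id -> timestamp of last status msg

    def on_status_message(self, worker_id, now):
        self.last_seen[worker_id] = now

    def overdue_workers(self, assigned, now):
        # A worker never heard from, or silent too long, is suspect.
        return [w for w in assigned
                if now - self.last_seen.get(w, float("-inf")) > self.interval]

sup = Supervisor()
sup.on_status_message("w1", now=100.0)
sup.on_status_message("w2", now=95.0)
late = sup.overdue_workers(["w1", "w2"], now=106.0)
```

A real deployment would run this check on a timer at whatever frequency the client's continuity-of-service requirement dictates.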
Monitoring, on a substantially continuous basis, the status of at least each of the plurality of assigned worker processors may also involve detecting the presence of non-assigned-task-related activity on the worker processors. Detecting the presence of non-assigned-task-related activity on the worker processors may involve running an activity monitor program on each of the assigned worker processors. The activity monitor programs running on each of the assigned worker processors may behave substantially like screen saver programs. The activity monitor programs running on each of the assigned worker processors may send, in response to detection of keyboard activity (or mouse activity, pointer activity, touchscreen activity, voice activity, local execution of substantial non-assigned-task-related processes, or any combination thereof), a message to at least one of the at least one supervisory processor(s). Detecting the presence of non-assigned-task-related activity on the worker processors may also involve determining, in response to an activity monitor message received by at least one of the at least one supervisory processor(s), that at least one of the assigned worker processors is undertaking non-assigned-task-related activity. The activity monitor message may be generated by an activity monitor program running on one of the assigned worker processors.
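A hedged sketch of the screen-saver-like activity monitor described above; the event names and the notification callback are assumptions made for illustration, not details drawn from the disclosure:

```python
# Illustrative activity monitor: when local input activity is detected on a
# worker, it sends a message to the supervisory processor so the assigned
# task can be handled (e.g., reassigned).
LOCAL_EVENTS = {"keyboard", "mouse", "pointer", "touchscreen", "voice"}

class ActivityMonitor:
    def __init__(self, worker_id, notify_supervisor):
        self.worker_id = worker_id
        self.notify = notify_supervisor   # callable(worker_id, event)

    def on_event(self, event):
        # Only non-assigned-task-related local activity triggers a message.
        if event in LOCAL_EVENTS:
            self.notify(self.worker_id, event)
            return True
        return False

sent = []
mon = ActivityMonitor("w7", lambda w, e: sent.append((w, e)))
mon.on_event("keyboard")
mon.on_event("task-progress")   # assigned-task activity: no message sent
```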
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a method for operating an always-live distributed computing system, comprising: providing a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; providing at least one supervisory processor, also connected to the always-on, peer-to-peer computer network; using the at least one supervisory processor to monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks; and using the at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks. Providing a pool of worker processors may further involve ensuring that each of the worker processors is linked to the always-on, peer-to-peer computer network through a high-bandwidth connection having, for example, a data rate of at least 100 kilobits/sec, 250 kilobits/sec, 1 megabit/sec, 10 megabits/sec, 100 megabits/sec, 1 gigabit/sec, or whatever particular bandwidth may be demanded by the client's needs (e.g., required throughput and data intensiveness of the application). Using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks may involve sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks. The process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is preferably repeated at least once every minute, second, tenth of a second, hundredth of a second, millisecond or other interval, as needed to meet client requirements.
Using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks may also involve periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks. The preselected frequency interval may be set at or less than one minute, ten seconds, one second, one tenth of a second, one hundredth of a second, one millisecond, or other appropriate value, as needed. Using the at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks may also involve: detecting aberrant behavior among the worker processors expected to be engaged in the processing of assigned tasks; and reassigning tasks expected to be completed by the aberrant-behaving worker processor(s) to other available processor(s) in the worker processor pool.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a method for operating a network-connected processor as a processing element in a distributed processing system, the method comprising: installing software that enables the network-connected processor to receive tasks from, and provide results to, one or more independent, network-connected resource(s); and using the software installed on the network-connected processor to provide substantially continuous status information to an independent, network-connected resource. Using the software installed on the network-connected processor to provide substantially continuous status information to an independent, network-connected resource may involve sending a heartbeat message to the independent, network-connected resource at least once every second, tenth of a second, hundredth of a second, millisecond, etc. Using the software installed on the network-connected processor to provide substantially continuous status information to an independent, network-connected resource may also involve responding to status-request messages, received from the independent, network-connected resource, within a predetermined response time, such as one second, one tenth of a second, one hundredth of a second, one millisecond, etc. Using the software installed on the network-connected processor to provide substantially continuous status information to an independent, network-connected resource may also involve sending, in response to a change in status of the network-connected processor, a status-update message to the independent, network-connected resource within a preselected update interval, such as one second, one tenth of a second, one hundredth of a second, one millisecond, etc. The change in status that initiates the sending of a status-update message may include any local activity indicator (such as keyboard activity, other processes in the process queue, etc.)
that indicates additional demand for the processing resources of the network-connected processor. Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a distributed computing system comprising: a multiplicity of worker processors; at least one supervisory processor, configured to assign tasks to, and monitor the status of, the worker processors; an always-on, peer-to-peer computer network linking the worker processors and the supervisory processor(s); and at least one of the at least one supervisory processor(s) including a monitoring module, which monitors the status of worker processors expected to be executing assigned tasks, so as to ensure that the distributed computing system maintains always-live operation. The monitoring module may receive status messages from at least each of the worker processors expected to be executing assigned tasks. The monitoring module may be used to detect abnormalities in the operation of the worker processors expected to be executing assigned tasks, and/or their associated network connections, by, for example, detecting an absence of expected status messages received from the worker processors. The monitoring module may repeatedly check for an absence of expected status messages at a frequency of at least once each minute, at least once every ten seconds, at least once each second, at least once every tenth of a second, etc. The monitoring module may also be used to detect the presence of non-assigned-task-related activity on the worker processors expected to be executing assigned tasks. Activity monitor programs may be run on each of the worker processors expected to be executing assigned tasks. The activity monitor programs may comprise screensaver programs.
The activity monitor programs may be configured to detect one or more of the following types of non-assigned-task-related activity: keyboard activity; mouse activity; pointer activity; touchscreen activity; voice activity; and execution of substantial non-assigned-task-related processes.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to an always-live distributed computing system, comprising: a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; and at least one supervisory processor, also connected to the always-on, peer-to-peer computer network, and configured to assign tasks to the worker processors, monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks and reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks. The computer network may have a bandwidth of at least 250 kilobits/second, at least 1 megabit/second, etc. The at least one supervisory processor may monitor the status of worker processors expected to be engaged in the processing of assigned tasks by sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks. Such status-request message(s) may be sent at a frequency of at least once every 10 seconds, at least once each second, at least twenty times each second, etc. The at least one supervisory processor may monitor the status of worker processors expected to be engaged in the processing of assigned tasks by periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks. The preselected frequency interval may be, for example, one second, one tenth of a second, one hundredth of a second, one millisecond, etc. 
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to a processing element for use in a distributed processing system, the processing element comprising: at least one processor; memory; at least one high-bandwidth interface to a computer network; and worker processor software, configured to receive tasks via the high-bandwidth interface and to provide substantially continuous status information via the high-bandwidth interface. The substantially continuous status information may be provided by sending periodic heartbeat messages. The substantially continuous status information may also be provided by sending prompt responses to received status-request messages. The substantially continuous status information may also be provided by promptly sending a status-update message in response to changes in status.
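The worker processor software described above (periodic heartbeats plus prompt responses to status requests) might be sketched as follows; the transport is abstracted as a send callback, and the one-second interval is an illustrative value chosen from the ranges the text permits:

```python
# Illustrative worker-side status reporter: emits a heartbeat whenever one is
# due and answers status requests immediately.
class WorkerStatusReporter:
    def __init__(self, send, interval=1.0):
        self.send = send              # callable(message dict); transport assumed
        self.interval = interval
        self.next_beat = 0.0
        self.status = "idle"

    def tick(self, now):
        # Called from the worker's main loop; emits a heartbeat when due.
        if now >= self.next_beat:
            self.send({"type": "heartbeat", "status": self.status, "t": now})
            self.next_beat = now + self.interval

    def on_status_request(self, now):
        # Respond immediately, within the required response time.
        self.send({"type": "status-reply", "status": self.status, "t": now})

out = []
rep = WorkerStatusReporter(out.append, interval=1.0)
rep.tick(0.0)
rep.tick(0.5)    # not due yet: no heartbeat emitted
rep.tick(1.0)
rep.on_status_request(1.2)
```

A status-update message on state change, the third mechanism the text mentions, would follow the same pattern: a send call triggered from wherever `self.status` is mutated.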
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to article(s)-of-manufacture for use in connection with a network-based distributed computing system, the article(s)-of-manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: assignment of tasks to a plurality of worker processors via the network; and monitoring, on a substantially continuous basis, of the status of at least each of the plurality of assigned worker processors until each such processor completes its assigned task.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to article(s)-of-manufacture for use in connection with an always-live distributed computing system, the article(s)-of-manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: a pool of worker processors to install worker processor software provided via an always-on, peer-to-peer computer network; communication paths to be provided between the worker processors and at least one supervisory processor via the always-on, peer-to-peer computer network; the at least one supervisory processor to monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks; and the at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks. Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to article(s)-of-manufacture for use in connection with a processing element constituting a part of a distributed computing system, the article(s)-of-manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: worker processor software to be installed that permits the processing element to receive tasks from, and provide results to, one or more independent, network-connected resource(s); and the installed worker processor software to be executed and provide substantially continuous status information to one or more of the independent, network-connected resource(s).
An important concept underlying certain aspects of the present invention is the idea of using redundancy, either full or partial, to mitigate quality-of-service problems that plague many distributed computing approaches. In most distributed processing jobs, there will exist one or more "critical tasks" for which a delay in task completion can disproportionately affect the overall completion time for the job; in other words, a delay in completion of a critical task will have a greater effect than a comparable delay in completion of at least some other "non-critical" task. (Additionally, a "critical task" may be a task which, for whatever reason (e.g., historical behavior, need to retrieve data via unreliable connections, etc.), poses an enhanced risk of delay in completion, even if such delay will not disproportionately impact overall job completion.)
Certain aspects of the present invention are based, at least in part, on the inventors' recognition that quality-of-service problems in distributed computing are frequently caused by delays in completing critical task(s), and that such quality-of-service problems can be effectively mitigated through redundancy. One aspect of the invention provides methods, apparatus and articles-of-manufacture for assigning additional (i.e., redundant) resources to ensure timely completion of a job's critical task(s). Preferably, although not necessarily, only the critical task(s) receive redundant resource assignment; alternatively, a job's various tasks may be assigned redundant resources in accordance with their relative criticality — e.g., marginally critical tasks are each assigned to two processors, critical tasks are each assigned to three processors, and the most critical tasks are each assigned to four or more processors. Another aspect of the invention provides methods, apparatus and articles-of-manufacture for selectively assigning higher-capability (e.g., faster, more memory, greater network bandwidth, etc.) processing elements/resources to a job's more critical tasks.
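The tiered redundancy example above (two processors for marginally critical tasks, three for critical tasks, four for the most critical) can be sketched as a simple assignment plan; the tier names and all identifiers are illustrative:

```python
# Illustrative redundancy planner: each task receives a number of worker
# assignments determined by its criticality tier, mirroring the 2/3/4-copy
# example in the text.
REPLICAS = {
    "non-critical": 1,
    "marginally-critical": 2,
    "critical": 3,
    "most-critical": 4,
}

def plan_assignments(tasks, workers):
    """tasks: {task_id: criticality tier}. Returns {task_id: [worker_ids]}."""
    pool = iter(workers)
    plan = {}
    for task, tier in tasks.items():
        # Draw as many distinct workers from the pool as the tier requires.
        plan[task] = [next(pool) for _ in range(REPLICAS[tier])]
    return plan

plan = plan_assignments(
    {"t1": "non-critical", "t2": "critical"},
    ["w%d" % i for i in range(10)])
```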
Accordingly, generally speaking, and without intending to be limiting, another aspect of the invention relates to methods, apparatus or articles-of-manufacture for improving quality-of-service in a distributed computing system including, for example, a multiplicity of network-connected worker processors and at least one supervisory processor, the supervisory processor configured to assign tasks to the worker processors, the methods/apparatus/articles-of-manufacture involving, for example, the following: identifying one or more of the tasks as critical task(s); assigning each of the tasks, including the critical task(s), to a worker processor; redundantly assigning each of the one or more critical task(s) to a worker processor; and monitoring the status of the assigned tasks to determine when all of the tasks have been completed by at least one worker processor. The methods, apparatus or articles of manufacture may further involve monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s). Monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s) may include receiving status messages from at least each of the worker processor(s) that have been assigned non-critical task(s) until each such processor completes its assigned task. Monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s) may also involve detecting abnormalities in the operation of the worker processor(s) that have been assigned non-critical task(s), and/or their associated network connections, by detecting an absence of expected status message(s) received by the at least one supervisory processor.
Such act of detecting an absence of expected status message(s) received by the at least one supervisory processor is preferably repeated at a preselected interval, such as at least once every ten minutes, at least once each minute, at least once each second, at least once every tenth of a second, or at any other appropriate interval selected to maintain an expected quality-of-service. Monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s) may also involve detecting the presence of non-assigned-task-related activity on at least the worker processor(s) that have been assigned the non-critical task(s). Detecting the presence of non-assigned-task-related activity may include running an activity monitor program on at least each of the worker processor(s) that have been assigned non-critical task(s). Such activity monitor programs may behave substantially like screen saver programs, and may be configured to send, in response to detection of keyboard activity, a message to at least one of the at least one supervisory processor(s). Such activity monitor programs may also be configured to send a message to at least one of the at least one supervisory processor(s) in response to detection of any of the following: (i) mouse activity; (ii) pointer activity; (iii) touchscreen activity; (iv) voice activity; and/or (v) execution of substantial non-assigned-task-related processes.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to methods, apparatus or articles-of-manufacture for operating a peer-to-peer distributed computing system, involving, for example, the following: providing a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; providing at least one supervisory processor, also connected to the always-on, peer-to-peer computer network; using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks; and using the at least one supervisory processor to redundantly assign one or more critical task(s) to one or more additional worker processors. Providing a pool of worker processors may also involve ensuring that each of the worker processors is linked to the always-on, peer-to-peer computer network through a high-bandwidth connection at, for example, a data rate of at least 100 kilobits/sec, at least 250 kilobits/sec, at least 1 megabit/sec, at least 10 megabits/sec, at least 100 megabits/sec, or at least 1 gigabit/sec. Using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks may include sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks. The process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is preferably repeated on a regular basis, such as at least once every second, at least once every tenth of a second, at least once every hundredth of a second, or at least once every millisecond.
Using the at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks may involve periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks. The preselected frequency interval may be less than ten minutes, less than two minutes, less than one minute, less than twenty seconds, less than one second, less than one tenth of a second, less than one hundredth of a second, less than one millisecond, etc.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to methods, apparatus or articles-of-manufacture for performing a job using a peer-to-peer network-connected distributed computing system, the job illustratively comprised of a plurality of tasks, the methods/apparatus/articles-of-manufacture involving, for example, the following: initiating execution of each of the plurality of tasks on a different processor connected to the peer-to-peer computer network; initiating redundant execution of at least one of the plurality of tasks on yet a different processor connected to the peer-to-peer computer network; and once each of the plurality of tasks has been completed by at least one processor, reporting completion of the job via the peer-to-peer computer network. Preferably, at least one of the plurality of tasks that is/are redundantly assigned is/are critical task(s). The methods/apparatus/articles-of-manufacture may further involve monitoring, on a periodic basis, to ensure that progress is being made toward completion of the job. Such monitoring may be performed at least once every five minutes, at least once every two minutes, at least once each minute, at least once every ten seconds, at least once a second, at least once every tenth of a second, at least once every hundredth of a second, at least once every millisecond, etc.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to methods, apparatus or articles-of-manufacture for performing a job using a plurality of independent, network-connected processors, the job illustratively comprising a plurality of tasks, the methods/apparatus/articles-of-manufacture involving, for example, the following: assigning each of the plurality of tasks to a different processor connected to the computer network; redundantly assigning at least some, but preferably not all, of the plurality of tasks to additional processors connected to the computer network; and using the computer network to compile results from the assigned tasks and report completion of the job. Redundantly assigning at least some of the plurality of tasks to additional processors may involve assigning critical tasks to additional processors, and preferably involves assigning at least one critical task to at least two processors. The methods/apparatus/articles-of-manufacture may further involve generating a heartbeat message from each processor executing an assigned task, preferably on a regular basis, such as at least once every second, at least once every tenth of a second, at least once every hundredth of a second, at least once every millisecond, etc.
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to methods, apparatus or articles-of-manufacture for performing a job using a pool of network-connected processors, the job illustratively comprising a plurality of tasks, the number of processors in the pool greater than the number of tasks in the job, the methods/apparatus/articles-of-manufacture involving, for example, the following: assigning each of the plurality of tasks to at least one processor in the pool; redundantly assigning at least some of the plurality of tasks until all, or substantially all, of the processors in the pool have been assigned a task; and using the computer network to compile results from the assigned tasks and report completion of the job. Redundantly assigning at least some of the plurality of tasks preferably includes redundantly assigning a plurality of critical tasks.
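One way to realize the pool-filling strategy described above (every task assigned at least once, then the remaining processors spent on redundant copies of the more critical tasks) is sketched below; ordering the task list by criticality is an assumed input convention, not something specified by the text:

```python
# Illustrative pool-filling assignment: first give every task one worker,
# then hand out the surplus workers as redundant copies, cycling through the
# tasks with the most critical first, until the pool is exhausted.
def fill_pool(tasks_by_criticality, workers):
    """tasks_by_criticality: task ids, most critical first.
    workers: ids of available processors (len(workers) >= len(tasks))."""
    assignments = {t: [] for t in tasks_by_criticality}
    pool = list(workers)
    # First pass: every task is assigned at least one worker.
    for t in tasks_by_criticality:
        assignments[t].append(pool.pop(0))
    # Second pass: redundant copies, most critical tasks served first.
    i = 0
    while pool:
        t = tasks_by_criticality[i % len(tasks_by_criticality)]
        assignments[t].append(pool.pop(0))
        i += 1
    return assignments

a = fill_pool(["critical", "minor"], ["w1", "w2", "w3", "w4", "w5"])
```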
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to methods, apparatus or articles-of-manufacture for using redundancy, in a network-based distributed processing environment, to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, the methods/apparatus/articles-of-manufacture involving, for example, the following: receiving a job request, from a client, over the network; processing the job request to determine the number, K, of individual tasks to be assigned to individual network-connected processing elements; determining a subset, N, of the K tasks whose completion is most critical to the overall completion of the job; assigning each of the K tasks to an individual network-connected processing element; and redundantly assigning at least some of the N task(s) in the subset to additional network-connected processing element(s). Determining the subset, N, of the K tasks whose completion is most critical to the overall completion of the job may include one or more of the following: (i) assigning, to the subset, task(s) that must be completed before other task(s) can be commenced; (ii) assigning, to the subset, task(s) that supply data to other task(s); (iii) assigning, to the subset, task(s) that is/are likely to require the largest amount of memory; (iv) assigning, to the subset, task(s) that is/are likely to require the largest amount of local disk space; (v) assigning, to the subset, task(s) that is/are likely to require the largest amount of processor time; and/or (vi) assigning, to the subset, task(s) that is/are likely to require the largest amount of data communication over the network. The methods/apparatus/articles-of-manufacture may further involve: determining, based on completions of certain of the K tasks and/or N redundant task(s), that sufficient tasks have been completed to compile job results; and reporting job results to the client over the network.
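A sketch of one possible scoring rule for determining the subset N of the K tasks, combining several of the factors enumerated above (number of dependent tasks, estimated processor time, estimated network traffic); the weights and task attributes are illustrative assumptions, not part of the disclosure:

```python
# Illustrative criticality scoring: tasks that block other tasks weigh most
# heavily, with estimated CPU time and network traffic as secondary factors.
def criticality_score(task):
    return (10.0 * len(task.get("dependents", []))   # blocks other task(s)
            + task.get("est_cpu_seconds", 0.0) / 60.0
            + task.get("est_network_mb", 0.0) / 100.0)

def critical_subset(tasks, n):
    """tasks: {task_id: attribute dict}; returns the n highest-scoring ids."""
    ranked = sorted(tasks, key=lambda t: criticality_score(tasks[t]),
                    reverse=True)
    return set(ranked[:n])

subset = critical_subset({
    "t1": {"dependents": ["t2", "t3"], "est_cpu_seconds": 30},
    "t2": {"est_cpu_seconds": 600},
    "t3": {"est_network_mb": 50},
}, n=1)
```

The tasks in the returned subset would then be the ones redundantly assigned to additional processing elements.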
Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to methods, apparatus or articles-of-manufacture for using a group of network-connected processing elements to process a job, the job illustratively comprised of a plurality of tasks, one or more of which are critical tasks, the methods/apparatus/articles-of-manufacture involving, for example, the following: identifying one or more higher-capacity processing elements among the group of network-connected processing elements; assigning at least one critical task to at least one of the identified higher-capacity processing elements; assigning other tasks to other processing elements such that each task in the job has been assigned to at least one processing element; and communicating results from the assigned tasks over the network. Identifying one or more higher-capacity processing elements among the group of network-connected processing elements may involve one or more of the following: (i) evaluating the processing capacity of processing elements in the group based on their execution of previously-assigned tasks; (ii) determining the processing capacity of processing elements in the group through use of assigned benchmark tasks; and/or (iii) evaluating hardware configurations of at least a plurality of processing elements in the group. The methods/apparatus/articles-of-manufacture may further involve (i) ensuring that each critical task in the job is assigned to a higher-capacity processing element and/or (ii) storing the amount of time used by the processing elements to execute the assigned tasks and computing a cost for the job based, at least in part, on the stored task execution times.
Computing a cost for the job based, at least in part, on the stored task execution times may involve charging a higher incremental rate for time spent executing tasks on higher-capability processing elements than for time spent executing tasks on other processing elements. Such computed costs are preferably communicated over the network. Again, generally speaking, and without intending to be limiting, another aspect of the invention relates to methods, apparatus or articles-of-manufacture for distributed computing, including, for example, the following: a multiplicity of worker processors; at least one supervisory processor, configured to assign tasks to, and monitor the status of, the worker processors, the at least one supervisory processor further configured to assign each critical task to at least two worker processors; an always-on, peer-to-peer computer network linking the worker processors and the supervisory processor(s); and at least one of the at least one supervisory processor(s) including a monitoring module, which monitors the status of worker processors expected to be executing assigned tasks to ensure that the distributed computing system maintains always-live operation. The monitoring module preferably receives status messages from at least each of the worker processors expected to be executing assigned tasks, and preferably detects abnormalities in the operation of the worker processors expected to be executing assigned tasks, and/or their associated network connections, by detecting an absence of expected status messages received from the worker processors. The monitoring module checks for an absence of expected status messages at predetermined intervals, such as at least once each minute, at least once each second, etc. 
Alternatively, or additionally, the monitoring module may be configured to detect the presence of non-assigned-task-related activity on the worker processors expected to be executing assigned tasks, preferably through the use of activity monitor programs running on each of the worker processors expected to be executing assigned tasks. Such activity monitor programs may comprise screensaver programs, and may be configured to detect one, two, three or more of the following types of non-assigned-task-related activity: keyboard activity; mouse activity; pointer activity; touchscreen activity; voice activity; and/or execution of substantial non-assigned-task-related processes.
Additional aspects of the invention relate to systems, structures and articles-of-manufacture used, or useful, in connection with all, or part, of the above-described methods. Still further aspects of the invention relate to different combinations or sub-combinations of the above-described elements and process steps.
BRIEF DESCRIPTION OF THE FIGURES The above, as well as other, aspects, features and advantages of the present invention are exemplified below, with reference to a presently-preferred embodiment, the WebProc system (hosted by Website www.datasynapse.com), described under the heading Detailed Description below, which Description is intended to be read in conjunction with the following set of figures, in which:
FIG. 1 exemplifies the communication between various workers and brokers/servers in the datasynapse/WebProc environment; FIG. 2 illustrates further details of the datasynapse/WebProc environment; FIG. 3 illustrates aspects of the datasynapse/WebProc tasking API.
FIG. 4 illustrates aspects of the datasynapse/WebProc job submission process; FIG. 5 illustrates further aspects of the datasynapse/WebProc job submission process; FIG. 6 illustrates aspects of the datasynapse/WebProc job submission process, from a customer perspective; FIG. 7 illustrates aspects of the datasynapse/WebProc job verification process, from a job space perspective; FIG. 8 illustrates aspects of the datasynapse/WebProc job registration process;
FIG. 9 illustrates aspects of the datasynapse/WebProc job unpacking process; FIG. 10 illustrates aspects of the datasynapse/WebProc task management process; FIG. 11 illustrates aspects of the datasynapse/WebProc worker interface; FIG. 12 illustrates aspects of the datasynapse/WebProc task return process; FIG. 13 illustrates aspects of the datasynapse/WebProc job collation process; FIG. 14 illustrates aspects of the datasynapse/WebProc job return process; FIGs. 15-16 depict aspects of the datasynapse/WebProc security architecture; FIG. 17 contains exemplary datasynapse/WebProc TaskInput code; FIG. 18 contains exemplary datasynapse/WebProc TaskOutput code;
FIG. 19 contains exemplary datasynapse/WebProc Task code; FIG. 20 contains exemplary datasynapse/WebProc TaskInputProcess code; and, FIG. 21 contains exemplary datasynapse/WebProc TaskOutputProcess code.
FIG. 22 depicts an exemplary network-based distributed processing system in which the present invention may be employed; FIG. 23 contains a flowchart illustrating the operation of an exemplary always-live distributed processing system in accordance with the invention; and,
FIG. 24 is a flowchart illustrating the operation of an exemplary redundancy-based, always-live distributed processing system in accordance with the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Reference is made to FIG. 1, which illustrates several aspects of the datasynapse network. As depicted, an illustrative distributed computing network comprises at least one Beekeeper server 1, a plurality of Queen bee servers 2a-f, each in communication with a beekeeper server, and a plurality of Worker bee PCs 3a-x, each in communication with one or more queen bee servers. The datasynapse network of Beekeeper(s) and Queen bees is preferably managed by a facilities outsource provider, and incorporates all of the redundancy and security features afforded to other mission-critical users, including mirroring of servers, 24/7/365 uptime, etc.
Beekeeper 1 acts as the central exchange and preferably has three core responsibilities: (i) maintain customer registry; (ii) maintain worker bee registry; and (iii) load balance among the cluster of Queen bees. Beekeeper 1 is preferably designed to scale according to network supply. Because Queen bees 2a-f are each typically absorbing the bulk of high-level tasking and data throughput, Beekeeper 1 is able to concentrate on efficiently maintaining a master registry and load balance.
Beekeeper 1 automatically interrogates worker bees 3a-x that visit the datasynapse.com website and responds according to whether the worker is a narrowband visitor, a broadband but unregistered visitor, or an authorized broadband visitor. Once registered, worker bees 3a-x automatically visit Beekeeper 1 upon activation to solicit the list of designated Queen bees 2a-f where the worker should seek work. This enables the datasynapse network to dynamically interrogate a worker and load balance, assigning a designated Queen bee server for all future interaction with the datasynapse network, and defaulting to a secondary backup in the event the primary Queen bee experiences difficulties or has no work to perform. This designation relieves Beekeeper 1 from congestion issues and accelerates the overall distributed network throughput.
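The Queen-bee designation described above can be sketched as follows. The round-robin load-balancing policy, the class names, and the "next server as backup" rule are illustrative assumptions; the specification does not define the Beekeeper's balancing algorithm.

```java
import java.util.List;

// Hypothetical sketch of the Beekeeper's designation step: each newly
// registered worker receives a primary Queen bee plus a secondary backup
// used if the primary is down or has no work to perform.
public class BeekeeperSketch {
    private final List<String> queenBees;
    private int next = 0; // simple round-robin cursor (assumed policy)

    public BeekeeperSketch(List<String> queenBees) {
        this.queenBees = queenBees;
    }

    /** Returns [primary, backup] Queen bee servers for a registered worker.
     *  workerId would index the worker registry; unused in this sketch. */
    public synchronized String[] designate(String workerId) {
        String primary = queenBees.get(next % queenBees.size());
        String backup = queenBees.get((next + 1) % queenBees.size());
        next++;
        return new String[] { primary, backup };
    }
}
```

Because the worker thereafter talks only to its designated Queen bee, the Beekeeper itself stays out of the task-traffic path, consistent with the congestion-relief point above.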
The Queen bees 2a-f manage the brokering of jobs from clients (not depicted) to worker bees 3a-x, once a client has been registered at Beekeeper 1 and designated to a Queen bee, similar to the Worker bee process outlined above. Each Queen bee is preferably designed to scale up to at least 10,000 Worker bees 3a-x.
Reference is now made to FIG. 2, which illustrates further details of the datasynapse/WebProc environment. The datasynapse software integrates seamlessly and easily within existing or new applications that can capitalize on distributed processing. Tasking API 8 asks the user to organize its distributed problem into five intuitive classes 9, which collectively capture a simple yet flexible tasking semantic. The datasynapse software permits the user to bind a tasking implementation to a specific run-time job submission via a WebProc markup language (see, also, FIGs. 17-21). The customer downloads and installs, anywhere in its network, the lightweight WebProc customer stub 12, which supports the above-described API.
Customer Engine 11 automatically packages tasks into job entries. Customer stub 12 will preferably automatically download the most recent WebProc engine 11 at time of job submission. Such engine download enables datasynapse to update and enhance its functionality on a continuous basis, without interfering with customer applications 8, or forcing the customer to continuously re-install the software.
All communications are preferably transported by standard TCP/IP 4. JobSpace is the virtual boundary for the total datasynapse.com exchange of work components.
Illustrative worker engine 6y automatically processes tasks. Worker engine 6y is preferably automatically downloaded by a corresponding worker stub 5y at the start of executing a task. Such engine download enables datasynapse to update and enhance its functionality on a continuous basis without interfering with worker applications, or forcing the worker 3y to continuously re-install the software. Worker stub 5y is downloaded from the datasynapse.com website at registration. It is preferably a lightweight program which senses when the worker's screensaver program is on, and then visits the designated Queen bee server at intervals to take work, if available.

Reference is now made to FIG. 3, which illustrates aspects of datasynapse/WebProc's tasking API. Five user inputs -- illustratively depicted as TaskInputProcess 9a, TaskInput(s) 9b-e, Task 9f, TaskOutput(s) 9g-j and TaskOutputProcess 9k -- provide a customer with the flexibility to determine how best to extract task inputs from its database and return task outputs to its application. A feature of datasynapse's API is to permit nested and recursive parallel computations, which enables a user to submit multiple jobs in batches rather than in sequential processing.
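The five-class tasking semantic described above might be captured by interfaces along the following lines. The actual WebProc interfaces appear in FIGs. 17-21; the signatures below, and the `IntInput`/`IntOutput` demo types, are illustrative assumptions only.

```java
import java.util.Iterator;

// A minimal sketch (assumed signatures) of the five tasking-API classes:
// TaskInputProcess, TaskInput, Task, TaskOutput, TaskOutputProcess.
public interface TaskingApiSketch {

    interface TaskInput extends java.io.Serializable {}

    interface TaskOutput extends java.io.Serializable {}

    /** Extracts task inputs from the customer's database. */
    interface TaskInputProcess { Iterator<TaskInput> inputs(); }

    /** The unit of distributed work executed on a worker bee. */
    interface Task { TaskOutput execute(TaskInput in); }

    /** Returns task outputs to the customer's application. */
    interface TaskOutputProcess { void collect(TaskOutput out); }

    // Demo implementation (hypothetical): a Task that doubles an integer.
    record IntInput(int value) implements TaskInput {}
    record IntOutput(int value) implements TaskOutput {}

    static TaskOutput runDouble(int x) {
        Task doubler = in -> new IntOutput(((IntInput) in).value() * 2);
        return doubler.execute(new IntInput(x));
    }
}
```

Because `Task` operates only on its own input/output pair, arbitrarily many such tasks can run in parallel on different workers, which is what makes the nested and batched submissions mentioned above possible.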
Reference is now made to FIG. 4, which illustrates aspects of the datasynapse/WebProc job submission process. A theme underpinning the datasynapse network design is the concept of implementing a loosely coupled system. In such a system, the coordination between each link is standardized in such a way that data can be passed between links without a need for complex messaging, or centralized coordination extending deep into successive links.
Considering first the customer's perspective 13, a customer engine 13a submits a job 16 and takes back results 17. This proactive approach reduces the need at the JobSpace level to extend software coordination into the customer site beyond the stub and engine. It further simplifies the overall coupling, as JobSpace does no planning whatsoever, nor does it do any pushing back to the customer. From the JobSpace perspective 14, JobSpace 14a reacts dynamically to job submissions, breaking them down into discrete tasks and queuing them for execution 20, and reacts dynamically to customer requests for job results, by combing the queue for finished job entries 23 and passing them along accordingly 22. Similarly, from the worker perspective 15, the worker 15a reports to JobSpace for work, if available, and takes/processes/returns work
24-25 once finished. If a worker task is not completed within the expected amount of time, the task is preferably re-prioritized in the task queue for the next worker. (This provides an important security check, since out-of-the-ordinary worker execution time is likely to accompany an attempted security breach by the worker processor.) JobSpace dynamically matches high-capacity workers (e.g., the worker registry can differentiate workers in terms of CPU speed, bandwidth, security profile, track record, average times on-line, etc.) with equivalent utilization tasks, whenever feasible.
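The timeout-driven re-prioritization just described can be sketched as follows. The queue structures, millisecond units, and the rule of re-queuing an overdue task at the head of the waiting queue are illustrative assumptions, not details from the specification.

```java
import java.util.Deque;
import java.util.Iterator;

// Hypothetical sketch: a task pending longer than the expected execution
// time is pulled back from the pending queue and re-prioritized at the
// front of the waiting queue for the next available worker.
public class TaskTimeoutSketch {

    public static class PendingTask {
        final String taskId;
        final long startedAt; // when a worker took the task (ms)

        public PendingTask(String taskId, long startedAt) {
            this.taskId = taskId;
            this.startedAt = startedAt;
        }
    }

    /** Moves overdue pending tasks back to the front of the waiting queue;
     *  returns how many tasks were moved. */
    public static int requeueOverdue(Deque<PendingTask> pending,
                                     Deque<String> waiting,
                                     long now, long expectedMillis) {
        int moved = 0;
        for (Iterator<PendingTask> it = pending.iterator(); it.hasNext(); ) {
            PendingTask t = it.next();
            if (now - t.startedAt > expectedMillis) {
                it.remove();
                waiting.addFirst(t.taskId); // re-prioritized for next worker
                moved++;
            }
        }
        return moved;
    }
}
```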
Reference is now made to FIG. 5, which illustrates further aspects of the datasynapse/WebProc job submission process. A job 16 comprises a series of Job Entries 26-29. There may be one or more Job Entries for each specific run-time Job Submission. Each Job Entry 26-29 includes an element descriptor 26a-29a and one or more Task Entries 26b-f, 27b-d, 28b, 29b-e. In the event there is only one Job Entry, the element descriptor will register "All." Otherwise, the first Job Entry 26a will register "Head" and the last 29a will register "Tail." Job Entries in between 27a-28a will each be registered as "Segment."
Each Job Entry can contain one or more tasks. The customer does not get involved in determining how tasks are packed into job entries for submission to datasynapse; the customer only identifies the number of tasks and the priority of its overall job mission to the WebProc customer engine, which automatically packs tasks into job entries according to datasynapse's efficient packing algorithms.
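The packing behavior described above can be sketched as follows. The fixed tasks-per-entry bound is an illustrative assumption (the specification leaves datasynapse's packing algorithms unspecified); only the "All"/"Head"/"Segment"/"Tail" labeling follows the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the customer engine's packing step: N tasks are
// split into job entries of bounded size, and each entry is labeled with
// its element descriptor ("All" for a single entry; otherwise "Head",
// zero or more "Segment"s, and "Tail").
public class JobEntryPackerSketch {

    public static List<String> descriptors(int taskCount, int tasksPerEntry) {
        int entries = (taskCount + tasksPerEntry - 1) / tasksPerEntry; // ceil
        List<String> out = new ArrayList<>();
        if (entries == 1) {
            out.add("All");
            return out;
        }
        for (int i = 0; i < entries; i++) {
            if (i == 0) out.add("Head");
            else if (i == entries - 1) out.add("Tail");
            else out.add("Segment");
        }
        return out;
    }
}
```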
Tasks are preferably subject to the following attributes:
(1) They share the same Java archive ("jar") file. In other words, individual tasks can utilize different portions of the master instruction set, enabling distinct behavior (polymorphic), but must access one common instruction file.
(2) They are prepared and post-processed within the same customer virtual machine. This enables a customer maximum flexibility to integrate the results with their proprietary, in-house applications and data facilities. This preserves a further security wall in terms of revealing final aggregated and interpretive results only to a customer's in-house team. Datasynapse provides the time-consuming processing of the intermediate results.

(3) The tasks share the same job context. For example, parameters and properties at the job level are shared by all tasks. These attributes are set at job submission time.

(4) They can be arbitrarily distinct, subject to the attributes mentioned above. For example, different data parameters and polymorphic behavior are permitted.
Reference is now made to FIG. 6, which illustrates aspects of the datasynapse/WebProc job submission process, from a customer perspective 13. From a customer perspective, only three actions are necessary to integrate to datasynapse's platform. First, the stub must be installed 30 on a server. This process takes about 10 minutes, requires no training, and once installed provides a live interface for the user to implement its tasking API. The installed customer stub is used to automatically download the Customer Engine 30a, automatically package a Job Entry 30b, and automatically submit a packaged Job to the Job Space 30j. Packaging a job 30b preferably includes assembling a record that includes a job id 30c, customer id 30d, instruction set (preferably in bytecode) 30f, element descriptor 30g, task entries 30k, and priority 30i.
Next, the customer must implement 31 a tasking API. The Tasking API is explicitly designed to capture generalized data in a means which can most readily integrate with existing applications. Implementing a tasking API preferably includes creating a task input 31a, creating a task output 31b, creating one or more task(s) 31c, creating a task input process 31d and creating a task output process 31e.
Finally, a WebProc Markup Language (XML file) enables a user to bind 32 the tasking implementation for job submission.
At all times, the customer is focused on his/her application, and is thinking at "task" level. The customer is not worried about aggregating problems into "jobs" because this is automatically done by the engine. The customer must decide: (i) how many tasks to break its overall problem into, as the more tasks, the more efficient a solution; and (ii) what priority to assign its submission. Higher service levels incur higher charges. At this point, the customer engine takes over and automates the transmission of the job to datasynapse.com. This process is analogous to packing a suitcase - the contents are determined by the customer, but the engine fits it into one or more suitcases for travel. Depending on the size of the job, the engine will send it to JobSpace in one or more Job Entries, with each Job Entry containing one or more tasks.
Reference is now made to FIG. 7, which illustrates aspects of the datasynapse/WebProc job verification process, from a job space perspective 14. Every time a customer submits a job, there is a rigorous verification process 33a, as well as an evaluation process 33b, to ascertain the specific service objectives of a particular customer. Process 33a illustratively comprises decrypting a Job Entry 33b, recognizing customer and user id's 33c-d, matching one or more password(s) 33e, and determining whether the job's instructions are properly signed and/or verified 33f. Exceptions are handled by an exception handler 33n.
After passing the initial check(s), JobSpace automatically recognizes 33k if a job submission is new (first Job Entry) or is part of an ongoing job submission 33m. If new, the job moves to the registration phase 33d. If ongoing, the job is unpacked 33n and the tasks are organized into a queue. This verification and evaluation process tightly coordinates the front and back-office issues necessary to broker jobs on a continuous basis in a secure manner.
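The verification chain and the new-versus-ongoing routing described above might look like the following. The `JobEntry` fields, check order, and `Route` values are assumptions for illustration; the specification identifies the checks (decryption, id recognition, password match, signature verification) but not their implementation.

```java
// Hypothetical sketch of the job-entry verification chain: each check
// must pass before the entry is routed to registration (a new job's
// first entry) or to unpacking (part of an ongoing job submission).
public class JobVerifierSketch {

    public record JobEntry(String customerId, String userId, String password,
                           boolean signed, String descriptor) {}

    public enum Route { REGISTER, UNPACK, REJECT }

    public static Route verify(JobEntry e, java.util.Map<String, String> passwords) {
        // (1) recognize customer/user ids and match password
        String expected = passwords.get(e.customerId() + "/" + e.userId());
        if (expected == null || !expected.equals(e.password())) return Route.REJECT;
        // (2) the job's instructions must be properly signed and verified
        if (!e.signed()) return Route.REJECT;
        // (3) first entry of a new job registers; later entries are unpacked
        boolean firstEntry = e.descriptor().equals("All") || e.descriptor().equals("Head");
        return firstEntry ? Route.REGISTER : Route.UNPACK;
    }
}
```

Rejected entries would be handed to the exception handler mentioned above rather than silently dropped.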
Reference is now made to FIG. 8, which illustrates aspects of the datasynapse/WebProc job registration process 34 from the JobSpace perspective 14. At this stage, the Job is assigned an ID 34b and status record 34c, and acknowledged in the master job registry 34k. A job status record illustratively includes an indication of total tasks 34d, completed tasks 34e, downloaded tasks 34f, CPU time 34g, total data input 34h, total data output 34i and task completion time 34j. JobSpace can monitor the job registry on a macro basis to ensure that there are no job exceptions, and to fine tune network performance as required.

Reference is now made to FIG. 9, which illustrates aspects of the datasynapse/WebProc job unpacking process 35 from a JobSpace perspective 14. Once a job submission has been verified 33 and registered 34, JobSpace detaches its master instruction set to a Java archive (Jar) file 35c and records the URL address for the instruction set in each specific task entry. This jar file is accessible on the web to worker bees involved in executing job-related tasks. Next, the task entries are detached 35d. This involves detaching the data input 35e and recording its URL address in its place 35f. By detaching both the instruction set and accompanying data for each task entry, JobSpace is making each task entry a lightweight hand-off. It also decentralizes the storage of instructions and data outside the JobSpace activity circle, which is one of the reasons why JobSpace can scale so comfortably to Internet levels of coordination.
This methodology of detaching data has a further advantage in that the customer has flexibility to preempt the sending of data in the original job submission, and can instead keep its data in a remote location. The customer has flexibility to escalate its security levels as well, in that both the instruction set and data can be encrypted, if so required. The detached task entry is put into the task queue 35f where it waits for a worker to pick it up. Records are stored regarding the time such a task entry was created, as well as when a worker received the task for processing, and completed the task. This enables datasynapse to measure both the transmission latency time and the raw task-crunching processing time as distinct time elements. Illustratively, a task entry record may include job id 35i, class id 35j, task id 35k, job priority 35l, instruction set URL 35m, data URL 35n, one or more time stamp(s) 35o (including time of receipt 35p and time of completion 35q), and a worker bee id 35r.
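The lightweight task-entry record just enumerated, and the derivation of transmission latency versus raw processing time from its timestamps, can be sketched as follows. The field names and millisecond units are illustrative; only the fields themselves come from the enumeration above.

```java
// Hypothetical sketch of a detached task entry: instructions and data are
// replaced by URLs, and stored timestamps let the system separate queue
// latency from raw task-crunching time.
public class TaskEntrySketch {

    public record TaskEntry(String jobId, String taskId, int priority,
                            String instructionSetUrl, String dataUrl,
                            long createdAt, long receivedAt, long completedAt) {

        /** Time the entry waited before a worker took it (transmission latency). */
        public long transmissionLatency() { return receivedAt - createdAt; }

        /** Raw task-crunching time on the worker. */
        public long processingTime() { return completedAt - receivedAt; }
    }
}
```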
Reference is now made to FIG. 10, which illustrates aspects of the datasynapse/WebProc task management process 21, from a JobSpace perspective 14. An important aspect of the WebProc software platform is its ability to dynamically broker demand with supply, efficiently matching resources. The task queue is the mechanism where workers take the "unpacked suitcase items" for processing and return them when finished.
Once a worker has been appropriately verified 21a and its capabilities/track record assessed 21e, JobSpace matches tasks 24c to workers in the waiting queue 21k according to the most appropriate fit, depending on priority, ability, and latency in the pending queue. This capability accounts for the robustness of the WebProc software in terms of fault tolerance. JobSpace sweeps the pending queue 21l and compares how long a task has been waiting relative to the average time it has taken other similar tasks in the same job class to be processed. If too long a delay has occurred, JobSpace resubmits the task from the pending queue to the waiting queue, and re-prioritizes its ranking if necessary. To the extent workers fail the verification process or are otherwise aberrant in behavior 21g, JobSpace takes note and, for security reasons, shuts them down from accessing the network. As tasks are completed, they are placed in the completed queue 21m. All queues are kept up to date 21g-i.

Reference is now made to FIG. 11, which illustrates aspects of the datasynapse/WebProc worker interface 24a, from a worker perspective 15. Similar to the customer, a registered worker needs to install a lightweight worker stub 24b in order to access the datasynapse JobSpace. Workers need do nothing else, however, once this download has been installed. The engine automatically interacts with JobSpace to get task entries 24c, download jar files 24f, download data 24g, perform work 24h, and return completed tasks 24i thereafter.
The worker is verified upon taking 24k and submitting 24j tasks. Workers build a profile in the Beekeeper registry so that JobSpace can determine how best to match specific tasks against specific workers on a rolling basis. This type of expert system balances matching against waiting time to optimize network arbitrage opportunities, and provides a basis for assessing worker performance, thereby enabling detection of aberrant performance.
Worker security is preferably implemented using a "sandbox" approach. The rules of the datasynapse sandbox preferably dictate: (i) no local disk access while processing datasynapse.com jobs; (ii) strict compliance with security features where the worker bee cannot pass on its data to any other URL address other than datasynapse's Beekeeper server, or the designated Queen bee connection; (iii) registered worker bees cannot be activated unless the specific instruction set received has been signed by datasynapse, verified, and encrypted; and (iv) no printing, no manipulation of local environment networks, and absolutely no content can be executed.
Further aspects of the invention relate to detection of aberrant worker processor performance, and to use of such detected aberrant performance in maintaining system security and integrity in a distributed processing environment. Worker processor performance metrics may be used to detect aberrant performance by processors executing tasks. In other words, if a processor is expected to complete a task in 1 minute, and the task is not completed in 2 minutes, one may conclude that the processor is (or may be) exhibiting aberrant performance. Another way to detect aberrant performance is to compare the performance of multiple worker processors executing similar tasks. In other words, when similar processors spend significantly different amounts of time (either real time or CPU time) executing similar jobs, it may be concluded that those significantly slower processors are exhibiting some sort of aberrant performance. Because aberrant performance may suggest a security breach on the aberrant-performing worker processor(s), such processor(s) may be selectively disabled and precluded from receiving further task allocations.
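Both aberrance checks described above can be sketched as simple predicates. The factor-of-two thresholds are assumptions chosen to match the 1-minute/2-minute example; the specification does not fix numeric thresholds.

```java
import java.util.List;

// Hypothetical sketch of the two aberrance checks: (1) a task takes at
// least twice its expected time; (2) a worker is significantly slower
// than the mean of similar workers on similar tasks.
public class AberranceSketch {

    /** E.g., expected 1 minute, not completed in 2 minutes: flagged. */
    public static boolean exceedsExpected(long actualMillis, long expectedMillis) {
        return actualMillis >= 2 * expectedMillis;
    }

    /** Flagged if more than twice the mean time of peer workers. */
    public static boolean slowerThanPeers(long actualMillis, List<Long> peerMillis) {
        if (peerMillis.isEmpty()) return false;
        double mean = peerMillis.stream().mapToLong(Long::longValue).average().orElse(0);
        return actualMillis > 2 * mean;
    }
}
```

A worker flagged by either predicate could then be selectively disabled and excluded from further task allocations, as described above.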
Reference is now made to FIG. 12, which illustrates aspects of the datasynapse/WebProc task return process, from a JobSpace perspective 14. Once a worker passes the verification process 24j, its returned task is placed into the completed queue and the pending 24l and completed 24m queues are adjusted to reflect this. This elegant queuing management process isolates the core engine from burdensome and integrated data storage requirements.
When the worker returns its task to JobSpace, the task is still lightweight because the task results (data output) are written back to the Data URL. In this way, the return of a task is identical to the taking of a task. Both activities are explicitly designed to keep the JobSpace engine clean of data and highly scalable as jobs are processed.
Reference is now made to FIG. 13, which illustrates aspects of the datasynapse/WebProc job collation process 22, from a JobSpace perspective 14. After the queue has been adjusted, JobSpace collates tasks according to Job ID once a customer returns to JobSpace to take back its processed job. The interrogation of JobSpace by a customer seeking to take back a particular job triggers a search of the completed task queue and a re-packing of tasks into job entry format for transport back to the customer's application level. Task entries 22a and the job registry 22b are then appropriately updated, and the job registry is closed 22d if job status = finished 22c. This "collating" process is highly efficient, responding dynamically to customer demand by returning completed tasks as they roll in, rather than waiting for the whole job to be accomplished. Similar to the unpacking, this enables the customer to begin integrating the results immediately as they accumulate, and expedites overall throughput through the JobSpace system.
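The collation step can be sketched as a filter-and-batch over the completed-task queue. The batch size and record shape are illustrative assumptions; the text specifies only that completed tasks are matched by Job ID and re-packed into job-entry format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of job collation: when a customer asks for its job,
// the completed-task queue is searched for that Job ID and matching tasks
// are re-packed into job-entry-sized batches for transport.
public class JobCollatorSketch {

    public record CompletedTask(String jobId, String taskId) {}

    public static List<List<CompletedTask>> collate(List<CompletedTask> completedQueue,
                                                    String jobId, int batchSize) {
        List<List<CompletedTask>> entries = new ArrayList<>();
        List<CompletedTask> current = new ArrayList<>();
        for (CompletedTask t : completedQueue) {
            if (!t.jobId().equals(jobId)) continue; // other jobs stay queued
            current.add(t);
            if (current.size() == batchSize) {
                entries.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) entries.add(current);
        return entries;
    }
}
```

Because batches are returned as they fill, the customer can begin integrating partial results without waiting for the whole job, matching the dynamic behavior described above.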
Reference is now made to FIG. 14, which illustrates aspects of the datasynapse/WebProc job return process, from a customer perspective 13. Job entries, once packed with completed tasks, can be processed by the customer's existing applications. The customer may take 17 a job by getting the completed Job Entry(ies) 17a and processing 17b it (or them). JobSpace preferably does not integrate further, because: (i) it is likely that a customer will seek to keep its end result proprietary, and take the intermediate results obtained through datasynapse and finalize analysis in its own environment; and (ii) it is unlikely that the processing of intermediate results will in itself be a parallelizable task and should therefore not be handled within the confines of JobSpace.
FIGs. 15-16 illustrate aspects of the WebProc security architecture, which permits secure message transmission, identification and authentication of the various client 41 and server 40 components of the datasynapse distributed processing network.
FIGs. 17-21 contain exemplary code segments, which segments will be self-explanatory to persons skilled in the art. These segments exemplify the previously-described TaskInput 9b-e (FIG. 17), TaskOutput 9g-j (FIG. 18), Task 9f (FIG. 19), TaskInputProcess 9a (FIG. 20) and TaskOutputProcess 9k (FIG. 21) aspects of the datasynapse/WebProc tasking API.

Referring now to FIG. 22, which depicts an exemplary context in which the method(s), apparatus and/or article(s)-of-manufacture of the invention may be applied, a computer network 201 is shown connecting a plurality of processing resources. (Although, for clarity, only six processing resources are shown in FIG. 22, the invention is preferably deployed in networks connecting hundreds, thousands, tens of thousands or greater numbers of processing resources.) Computer network 201 may utilize any type of transmission medium (e.g., wire, coax, fiber optics, RF, satellite, etc.) and any network protocol. However, in order to realize the principal benefit(s) of the present invention, computer network 201 should provide a relatively high bandwidth (e.g., at least 100 kilobits/second) and preferably, though not necessarily, should provide an "always on" connection to the processing resources involved in distributed processing activities.
Still referring to FIG. 22, one or more supervisory processor(s) 213 may communicate with a plurality of worker processors 210 via computer network 201. Supervisory processor(s) 213 perform such tasks as:
• accepting job(s) from clients;
• assigning/reassigning tasks to (or among) worker processors;
• managing pools of available worker processors;
• monitoring the status of worker processors;
• monitoring the status of network connections;
• monitoring the status of job and task completions; and/or,
• resource utilization tracking, timekeeping and billing.

Still referring to FIG. 22, the depicted plurality 213 of supervisory processors 211 and 212 may operate collaboratively as a group, independently (e.g., each handling different job(s), task(s) and/or worker processor pool(s)) and/or redundantly (thus providing enhanced reliability). However, to realize a complete distributed processing system in accordance with the invention, only a single supervisory processor (e.g., 211 or 212) is needed.

Still referring to FIG. 22, plurality 210 of worker processors illustratively comprises worker processors 202, 204, 206 and 208, each connected to computer network 201 through network connections 203, 205, 207 and 209, respectively. These worker processors communicate with supervisory processor(s) 213 via network 201, and preferably include worker processor software that enables substantially continuous monitoring of worker processor status and/or task execution progress by supervisory processor(s) 213.
Referring now to FIG. 23, which depicts an exemplary "always-live" task monitoring/management process, a received job request 320 is initially assigned 321 to a plurality of available worker processors. Then, until the client's job is completed, processor(s) working on assigned task(s) are continuously monitored to ensure that the job is completed in a substantially uninterrupted (or "always live") manner. In particular, a monitoring module repeatedly asks whether all assigned tasks have been completed 322. If so, then the job is complete, and results can be reported 323. If not, then the monitoring module inquires about the status 324 of processor(s) expected to be working on not-yet-completed tasks. If potential bottlenecks are discovered, affected task(s) are immediately reassigned 325 to ensure that the system remains "live" and the client's work gets completed in a timely manner. This process is repeated with a frequency sufficient to ensure that worker processor problems will not cause undue delay in completing the overall job.
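One pass of the monitoring loop just described can be sketched as follows. The `Status` values and method names are assumptions; the specification describes the loop abstractly in FIG. 23 without prescribing data structures.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one "always-live" monitoring pass: while any
// assigned task is incomplete, the monitor checks each expected-working
// processor and flags stalled tasks for immediate reassignment.
public class AlwaysLiveSketch {

    public enum Status { RUNNING, STALLED, DONE }

    /** Returns the ids of tasks that should be reassigned immediately. */
    public static Set<String> monitorPass(Map<String, Status> taskStatus) {
        Set<String> toReassign = new HashSet<>();
        for (Map.Entry<String, Status> e : taskStatus.entrySet()) {
            if (e.getValue() == Status.STALLED) {
                toReassign.add(e.getKey()); // potential bottleneck
            }
        }
        return toReassign;
    }

    /** True once every task in the job has completed. */
    public static boolean jobComplete(Map<String, Status> taskStatus) {
        return taskStatus.values().stream().allMatch(s -> s == Status.DONE);
    }
}
```

A supervisory processor would call `monitorPass` at the predetermined interval (e.g., once per second or per minute, per the monitoring-module discussion earlier) until `jobComplete` returns true.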
Referring now to FIG. 24, which depicts an alternative, redundancy- based process in accordance with the invention, a job request is received 221 via the computer network. The received job request typically includes a multiplicity of subordinate tasks. The set of tasks is examined or analyzed to identify 222 critical tasks. Such identification 222 of critical tasks may take several forms, including, but by no means limited to, the following:
• identifying as critical any task(s) that the client has tagged as critical in the received job request;
• using data dependency analysis techniques (like those commonly used in optimizing compilers) to identify critical task(s);
• using execution dependency analysis techniques (like those commonly used in compilers and interpreters) to identify critical task(s);
• analyzing the operations called for by individual tasks to identify those critical task(s) most likely to demand the greatest processing and/or network resources;
• using past performance data for the job-in-question to identify critical task(s); and/or,
• any combination of the above, or any combination of the above with other techniques.

Identification 223 of available processing resources includes determining the available pool of potential worker processors, and may also include determining the capabilities (e.g., processor speed, memory, network bandwidth, historical performance) of processing resources in the identified pool. Each task is then assigned 224 to at least one processing element. Such task assignment may optionally involve assigning critical task(s) to higher-capability processing elements. Some (and preferably all) critical task(s) are also assigned 225 to additional (i.e., redundant) processing elements. (Note that although 224 and 225 are depicted as discrete acts, they can be (and are preferably) performed together.)
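The combined assignment of 224 and 225 can be sketched as follows: every task goes to at least one worker, and each critical task additionally goes to a redundant worker. The round-robin policy and naming are illustrative assumptions; the specification requires only the redundancy property.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of redundancy-based assignment (FIG. 24): each task
// is assigned to at least one worker; critical tasks also receive a
// redundant assignment to the next worker in round-robin order.
public class RedundantAssignSketch {

    public static Map<String, List<String>> assign(List<String> tasks,
                                                   Set<String> criticalTasks,
                                                   List<String> workers) {
        Map<String, List<String>> assignment = new HashMap<>();
        int cursor = 0;
        for (String task : tasks) {
            List<String> assigned = new ArrayList<>();
            assigned.add(workers.get(cursor++ % workers.size()));
            if (criticalTasks.contains(task)) {
                assigned.add(workers.get(cursor++ % workers.size())); // redundant copy
            }
            assignment.put(task, assigned);
        }
        return assignment;
    }
}
```

With this property in place, a critical task completes as soon as any one of its assigned workers finishes, so the failure of a single worker cannot stall the job on a critical path.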
Task executions are monitored, preferably on a substantially continuous basis, as described in connection with FIG. 23. Once such monitoring reveals that each of the job's tasks has been completed 226 by at least one of the assigned processing resources, then the results are collected and reported 227 to the client.
While the foregoing has described the invention by recitation of its various aspects/features and illustrative embodiment(s) thereof, those skilled in the art will recognize that alternative elements and techniques, and/or combinations and sub-combinations of the described elements and techniques, can be substituted for, or added to, those described herein. The present invention, therefore, should not be limited to, or defined by, the specific apparatus, methods, and articles-of-manufacture described herein, but rather by the appended claims, which are intended to be construed in accordance with well-settled principles of claim construction, including, but not limited to, the following:
Limitations should not be read from the specification or drawings into the claims (e.g., if the claim calls for a "chair," and the specification and drawings show a rocking chair, the claim term "chair" should not be limited to a rocking chair, but rather should be construed to cover any type of "chair").
The words "comprising," "including," and "having" are always open-ended, irrespective of whether they appear as the primary transitional phrase of a claim, or as a transitional phrase within an element or sub-element of the claim (e.g., the claim "a widget comprising: A; B; and C" would be infringed by a device containing 2A's, B, and 3C's; also, the claim "a gizmo comprising: A; B, including X, Y, and Z; and C, having P and Q" would be infringed by a device containing 3A's, 2X's, 3Y's, Z, 6P's, and Q). The indefinite articles "a" or "an" mean "one or more"; where, instead, a purely singular meaning is intended, a phrase such as "one," "only one," or "a single," will appear. Where the phrase "means for" precedes a data processing or manipulation "function," it is intended that the resulting means-plus-function element be construed to cover any, and all, computer implementation(s) of the recited "function" using any standard programming techniques known by, or available to, persons skilled in the computer programming arts. A claim that contains more than one computer-implemented means-plus-function element should not be construed to require that each means-plus-function element must be a structurally distinct entity (such as a particular piece of hardware or block of code); rather, such claim should be construed merely to require that the overall combination of hardware/firmware/software which implements the invention must, as a whole, implement at least the function(s) called for by the claims.

Claims

WHAT WE CLAIM IS:
1. A method for performing distributed, bandwidth-intensive computational tasks, comprising: providing Internet access to at least one broker processor, said at least one broker processor configured to receive jobs from Internet-connected customers; receiving a job from a customer via the Internet; in response to receipt of said job from said customer, directing a plurality of Internet-connected worker processors to perform a plurality of worker tasks related to the received job; awaiting execution of said worker tasks, said execution characterized by a predominance of worker processor-Internet communication activity; and, upon completion of said execution, confirming said completion of said execution via the Internet.
2. A method, as defined in claim 1, wherein, during said execution, said plurality of worker processors are collectively utilizing, on average, at least 25% of their total available communication bandwidth.

3. A method, as defined in claim 1, wherein, during said execution, said plurality of worker processors are collectively utilizing, on average, at least 30% of their total available communication bandwidth.

4. A method, as defined in claim 1, wherein, during said execution, said plurality of worker processors are collectively utilizing, on average, at least 35% of their total available communication bandwidth.

5. A method, as defined in claim 1, wherein, during said execution, said plurality of worker processors are collectively utilizing, on average, at least 40% of their total available communication bandwidth.

6. A method, as defined in claim 1, wherein, during said execution, said plurality of worker processors are collectively utilizing, on average, at least 50% of their total available communication bandwidth.
7. A method, as defined in claim 2, wherein said execution includes searching the Internet in accordance with a search query supplied by said customer.
8. A method, as defined in claim 2, wherein said execution includes creating an index.
9. A method, as defined in claim 2, wherein said execution includes creating a database.
10. A method, as defined in claim 2, wherein said execution includes updating a database.
11. A method, as defined in claim 2, wherein said execution includes creating a report.
12. A method, as defined in claim 2, wherein said execution includes creating a backup or archival file.
13. A method, as defined in claim 2, wherein said execution includes performing software maintenance operations.
14. A method, as defined in claim 2, wherein said execution includes comparing objects downloaded from the Internet.
15. A method, as defined in claim 2, wherein said execution includes processing signals or images downloaded from the Internet.
16. A method, as defined in claim 2, wherein said execution includes broadcasting audio and/or video to a plurality of destinations on the Internet.
17. A method, as defined in claim 2, wherein said execution includes sending e-mail to a plurality of destinations on the Internet.
18. A method for reducing the cost of performing a bandwidth-intensive job on the Internet, said method comprising: transmitting a job execution request to a broker processor over the Internet; selecting, in response to said job execution request, a plurality of Internet-connected worker processors to be used in executing said job, said selection of worker processors being performed, at least in part, based on one or more bandwidth-related consideration(s); and, using said selected worker processors to execute, at least in part, said job.
19. A method, as defined in claim 18, wherein said worker processor selection is based, at least in part, on at least one bandwidth-related consideration selected from the list of: (i) the types of Internet connections installed on candidate worker processors; (ii) the locations of candidate worker processors; (iii) the time of day; and (iv) historical performance statistics of candidate worker processors.
20. A method, as defined in claim 18, wherein said worker processor selection is based, at least in part, on at least two bandwidth-related considerations selected from the list of: (i) the types of Internet connections installed on candidate worker processors; (ii) the locations of candidate worker processors; (iii) the time of day; and (iv) historical performance statistics of candidate worker processors.
21. A method, as defined in claim 18, wherein said worker processor selection is based, at least in part, on at least three bandwidth-related considerations selected from the list of: (i) the types of Internet connections installed on candidate worker processors; (ii) the locations of candidate worker processors; (iii) the time of day; and (iv) historical performance statistics of candidate worker processors.
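The bandwidth-related considerations enumerated in claims 19-21 (connection type, location, time of day, historical performance) can be combined into a simple candidate score. The sketch below is purely illustrative; the weights, field names, and scoring formula are assumptions, not part of the claims.

```python
# Illustrative bandwidth-aware worker selection: score each candidate on
# connection type, an off-peak time-of-day bonus, and a historical
# throughput statistic, then pick the highest-scoring worker.

CONN_SCORE = {"dialup": 0, "dsl": 2, "cable": 2, "t1": 3}  # assumed weights

def bandwidth_score(worker, hour):
    """Higher is better; combines connection type, time of day, and
    measured historical throughput (avg_kbps)."""
    off_peak = 1 if hour < 8 or hour > 20 else 0  # assumed off-peak window
    return CONN_SCORE[worker["conn"]] + off_peak + worker["avg_kbps"] / 100.0

candidates = [
    {"id": "w1", "conn": "dialup", "avg_kbps": 40},
    {"id": "w2", "conn": "dsl", "avg_kbps": 300},
]
best = max(candidates, key=lambda w: bandwidth_score(w, hour=22))
print(best["id"])  # the DSL worker with the better history wins
```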
22. A method for exploiting unused computational resources on the Internet, comprising: recruiting prospective worker processors over the Internet, said recruiting including: providing Internet-accessible instructions; providing Internet-downloadable worker processor software; providing an Internet-accessible worker processor operating agreement; and storing a plurality of worker processor preferences; maintaining a registry of worker processors, said maintaining including: storing a plurality of URLs used to address said worker processors; storing a plurality of worker processor profiles, said profiles including information related to hardware and software configurations of said worker processors; and storing a plurality of worker processor past performance metrics; selecting a plurality of worker processors to collectively execute a job, said selecting being based, at least in part, on worker processor past performance metrics maintained by said worker processor registry; and, using said selected plurality of worker processors to execute said job.
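Claim 22's registry-driven selection, keyed on stored past-performance metrics, can be sketched as follows. The registry fields (`url`, `tasks_done`, `success_rate`) and the ranking rule are assumptions chosen for illustration.

```python
# Sketch of a worker registry and past-performance-based selection:
# pick the n workers with the best stored metrics (here, success rate,
# with task count as a tiebreaker).

registry = {
    "w1": {"url": "http://w1.example", "tasks_done": 50, "success_rate": 0.99},
    "w2": {"url": "http://w2.example", "tasks_done": 10, "success_rate": 0.80},
    "w3": {"url": "http://w3.example", "tasks_done": 40, "success_rate": 0.95},
}

def select_workers(registry, n):
    """Return the n worker ids with the best past-performance metrics."""
    ranked = sorted(registry,
                    key=lambda w: (registry[w]["success_rate"],
                                   registry[w]["tasks_done"]),
                    reverse=True)
    return ranked[:n]

print(select_workers(registry, 2))  # the two most reliable workers
```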
23. A method, as defined in claim 22, wherein at least some of said prospective worker processors are connected to the Internet via a satellite connection.
24. A method, as defined in claim 22, wherein at least some of said prospective worker processors are connected to the Internet via a fixed wireless connection.
25. A method, as defined in claim 22, wherein at least some of said prospective worker processors are connected to the Internet via a mobile wireless connection.
26. A method, as defined in claim 22, wherein recruiting prospective worker processors further includes specifying the type and amount of compensation to be provided in exchange for use of worker processor resources.
27. A method, as defined in claim 22, wherein recruiting prospective worker processors further includes providing an on-line means of accepting said worker processor operating agreement.
28. A method, as defined in claim 22, wherein maintaining a registry of worker processors further includes determining the performance of worker processors listed in said registry by executing one or more benchmark programs on said worker processors.
29. A method, as defined in claim 28, wherein maintaining a registry of worker processors further includes updating said worker processor past performance metrics in accordance with measured benchmark program performance statistics.
30. A method, as defined in claim 22, wherein said selecting is further based, at least in part, on at least one bandwidth-related consideration selected from the list of: (i) the types of Internet connections installed on said worker processors; (ii) the locations of said worker processors; (iii) the time of day; and (iv) one or more of said stored preferences.
31. A method, as defined in claim 22, wherein said selecting is further based, at least in part, on at least two bandwidth-related considerations selected from the list of: (i) the types of Internet connections installed on said worker processors; (ii) the locations of said worker processors; (iii) the time of day; and (iv) one or more of said stored preferences.
32. A method for reselling Internet bandwidth associated with individual DSL-connected Internet workstations, said method comprising: entering on-line-completed operating agreements with a plurality of DSL-connected Internet users, said agreements providing for use of a plurality of DSL-connected Internet workstations controlled by said users; executing a customer's distributed task, using a plurality of said DSL-connected Internet workstations; storing, for each of said DSL-connected Internet workstations used in said distributed task execution, a bandwidth utilization metric; compensating the DSL-connected Internet users whose workstations were used in said distributed task execution, said compensation being determined, at least in part, based upon the bandwidth utilization metrics associated with the workstations used in said distributed task execution; and, charging the customer whose distributed task was executed using said DSL-connected Internet workstations.
33. A method, as defined in claim 32, wherein said customer is charged, at least in part, based upon the bandwidth utilization metrics associated with the workstations used in executing the customer's distributed task.
34. A method, as defined in claim 32, wherein executing a customer's distributed task includes: receiving an execution request message from the customer over the Internet; processing said execution request using an Internet-connected broker processor; and initiating distributed execution of said task by sending messages, over the Internet, to a plurality of said DSL-connected Internet workstations.
35. A method, as defined in claim 32, wherein said compensation is further determined, at least in part, by at least one metric selected from the list consisting of: the amount of real time used by said DSL-connected Internet workstations in executing said distributed task; the amount of processor time used by said DSL-connected Internet workstations in executing said distributed task; the amount of primary storage used by said DSL-connected Internet workstations in executing said distributed task; the amount of secondary storage used by said DSL-connected Internet workstations in executing said distributed task; the time of day during which said execution occurred; and the geographic location(s) of said DSL-connected Internet workstations.

36. A method, as defined in claim 32, wherein said compensation is further determined, at least in part, by at least two metrics selected from the list consisting of: the amount of real time used by said DSL-connected Internet workstations in executing said distributed task; the amount of processor time used by said DSL-connected Internet workstations in executing said distributed task; the amount of primary storage used by said DSL-connected Internet workstations in executing said distributed task; the amount of secondary storage used by said DSL-connected Internet workstations in executing said distributed task; the time of day during which said execution occurred; and the geographic location(s) of said DSL-connected Internet workstations.
37. A method, as defined in claim 32, wherein said plurality of DSL-connected Internet workstations operate in accordance with one of the following protocols: ADSL, HDSL, IDSL, MSDSL, RADSL, SDSL, and VDSL.
38. A method for reselling Internet bandwidth associated with individual cable modem-connected Internet workstations, said method comprising: enrolling a plurality of cable modem-connected Internet users by installing worker processor software on a plurality of cable modem-connected Internet workstations controlled by said users; using said installed worker processor software to execute a distributed task on a plurality of said cable modem-connected Internet workstations; using said installed worker processor software to compute, for each workstation used in said distributed task execution, a billing metric determined, at least in part, by the amount of data communication involved in executing said distributed task; compensating the cable modem-connected Internet users whose workstations were used in said distributed task execution; charging a customer who requested execution of said distributed task; and, wherein said compensating and charging are performed, at least in part, using one or more of said computed billing metric(s), and wherein, for each distributed task executed, the amount charged to said customer exceeds the sum of all amounts paid to said cable modem-connected Internet users.
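Claim 38's settlement arrangement, per-workstation billing metrics driven by data volume, with the customer charge exceeding the total paid out, can be sketched numerically. The rate and margin values below are illustrative assumptions only.

```python
# Sketch of claim 38's settlement: compute per-worker compensation from a
# data-volume billing metric, and a customer charge that exceeds the sum
# of all payouts. The rate (per MB) and margin are assumed values.

def settle(mb_moved, worker_rate=0.01, margin=1.5):
    """Return (per-worker payouts, customer charge) from megabytes moved."""
    payouts = {w: mb * worker_rate for w, mb in mb_moved.items()}
    charge = sum(payouts.values()) * margin  # margin > 1 guarantees charge > payouts
    return payouts, charge

payouts, charge = settle({"w1": 100, "w2": 300})
print(payouts, charge)
```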
39. A method for executing jobs, comprised of a plurality of tasks, in a networked computing environment, said method comprising: providing networked access to at least one broker processor, said broker processor configured to receive a job from a user, unpack said job into a plurality of executable tasks, and direct a plurality of worker processors to initiate execution of said tasks; maintaining performance metrics for worker processors; monitoring completion of tasks by said worker processors and, upon completion, updating said performance metrics; using said performance metrics to select, at least in part, worker processors to initiate execution of additional tasks; and, using said performance metrics to determine, at least in part, the charges to be billed to the user for execution of the job.
40. A method for executing jobs, as defined in claim 39, further comprising (a) using said performance metrics to detect aberrant performance of worker processors executing tasks; and (b) terminating execution of tasks on worker processors that display aberrant performance.
41. A method for operating a distributed computing system, said system including a multiplicity of network-connected worker processors and at least one supervisory processor, said supervisory processor configured to assign tasks to, and monitor the status of, said worker processors, said method comprising: assigning tasks to a plurality of said worker processors by sending task-assignment messages, via said network, from said at least one supervisory processor to said plurality of worker processors; and, monitoring, on a substantially continuous basis, the status of at least each of said plurality of assigned worker processors until each said processor completes its assigned task.
42. A method for operating a distributed computing system, as defined in claim 41, wherein monitoring, on a substantially continuous basis, the status of at least each of said plurality of assigned worker processors comprises receiving status messages from at least each of said plurality of assigned worker processors until each said processor completes its assigned task.
43. A method for operating a distributed computing system, as defined in claim 42, wherein monitoring, on a substantially continuous basis, the status of at least each of said plurality of worker processors further comprises detecting abnormalities in the operation of said plurality of assigned worker processors, and/or their associated network connections, by detecting an absence of expected status message(s) received by said at least one supervisory processor.
44. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every ten minutes.
45. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every five minutes.
46. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every two minutes.
47. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once each minute.
48. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every thirty seconds.
49. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every ten seconds.
50. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every second.
51. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every tenth of a second.
52. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every hundredth of a second.
53. A method for operating a distributed computing system, as defined in claim 43, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once each millisecond.
54. A method for operating a distributed computing system, as defined in claim 41, wherein monitoring, on a substantially continuous basis, the status of at least each of said plurality of assigned worker processors comprises: detecting the presence of non-assigned-task-related activity on said worker processors.
55. A method for operating a distributed computing system, as defined in claim 54, wherein detecting the presence of non-assigned-task-related activity on said worker processors includes: running an activity monitor program on each of said assigned worker processors.
56. A method for operating a distributed computing system, as defined in claim 55, wherein: the activity monitor programs running on each of said assigned worker processors behave substantially like screen saver programs.
57. A method for operating a distributed computing system, as defined in claim 55, wherein: the activity monitor programs running on each of said assigned worker processors send, in response to detection of keyboard activity, a message to at least one of said at least one supervisory processor(s).

58. A method for operating a distributed computing system, as defined in claim 55, wherein: the activity monitor programs running on each of said assigned worker processors send, in response to detection of mouse activity, a message to at least one of said at least one supervisory processor(s).

59. A method for operating a distributed computing system, as defined in claim 55, wherein: the activity monitor programs running on each of said assigned worker processors send, in response to detection of pointer activity, a message to at least one of said at least one supervisory processor(s).

60. A method for operating a distributed computing system, as defined in claim 55, wherein: the activity monitor programs running on each of said assigned worker processors send, in response to detection of touchscreen activity, a message to at least one of said at least one supervisory processor(s).

61. A method for operating a distributed computing system, as defined in claim 55, wherein: the activity monitor programs running on each of said assigned worker processors send, in response to detection of voice activity, a message to at least one of said at least one supervisory processor(s).

62. A method for operating a distributed computing system, as defined in claim 55, wherein: the activity monitor programs running on each of said assigned worker processors send, in response to detection of execution of substantial non-assigned-task-related processes, a message to at least one of said at least one supervisory processor(s).
63. A method for operating a distributed computing system, as defined in claim 54, wherein detecting the presence of non-assigned-task-related activity on said worker processors includes: determining, in response to an activity monitor message received by at least one of said at least one supervisory processor(s), that at least one of said assigned worker processors is undertaking non-assigned-task-related activity.
64. A method for operating a distributed computing system, as defined in claim 63, wherein the activity monitor message is generated by an activity monitor program running on one of said assigned worker processors.
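The activity-monitor behavior of claims 55-64, a worker-side program that notifies a supervisory processor when non-assigned-task-related activity is detected, can be sketched as an event filter. The event names and message format are assumptions for illustration; a real monitor would hook OS input and process APIs.

```python
# Illustrative activity monitor: scan worker-side events and forward any
# that indicate non-assigned-task-related activity (keyboard, mouse,
# pointer, touchscreen, voice, or a substantial foreign process) to a
# supervisory processor's inbox (simulated here as a list).

SUPERVISOR_INBOX = []

def activity_monitor(worker_id, events):
    """Report reportable events to the supervisor; ignore task-related ones."""
    reportable = {"keyboard", "mouse", "pointer", "touchscreen",
                  "voice", "foreign_process"}
    for event in events:
        if event in reportable:
            SUPERVISOR_INBOX.append({"worker": worker_id, "activity": event})

activity_monitor("w7", ["task_tick", "keyboard", "task_tick", "mouse"])
print(SUPERVISOR_INBOX)  # only the two user-activity events are reported
```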
65. A method for operating an always-live distributed computing system, comprising: providing a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; providing at least one supervisory processor, also connected to said always-on, peer-to-peer computer network; using said at least one supervisory processor to monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks; and, using said at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks.
66. A method for operating an always-live distributed computing system, as defined in claim 65, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network through a high-bandwidth connection.
67. A method for operating an always-live distributed computing system, as defined in claim 65, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 100 kilobits/sec.
68. A method for operating an always-live distributed computing system, as defined in claim 65, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 250 kilobits/sec.
69. A method for operating an always-live distributed computing system, as defined in claim 65, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 1 megabit/sec.
70. A method for operating an always-live distributed computing system, as defined in claim 65, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 10 megabits/sec.
71. A method for operating an always-live distributed computing system, as defined in claim 65, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 100 megabits/sec.
72. A method for operating an always-live distributed computing system, as defined in claim 65, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 1 gigabit/sec.
73. A method for operating an always-live distributed computing system, as defined in claim 65, wherein using said at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks includes: sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks.
74. A method for operating an always-live distributed computing system, as defined in claim 73, wherein said process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every second.
75. A method for operating an always-live distributed computing system, as defined in claim 73, wherein said process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every tenth of a second.
76. A method for operating an always-live distributed computing system, as defined in claim 73, wherein said process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every hundredth of a second.
77. A method for operating an always-live distributed computing system, as defined in claim 73, wherein said process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every millisecond.
78. A method for operating an always-live distributed computing system, as defined in claim 65, wherein using said at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks includes: periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks.
79. A method for operating an always-live distributed computing system, as defined in claim 78, wherein said preselected frequency interval is less than one second.
80. A method for operating an always-live distributed computing system, as defined in claim 78, wherein said preselected frequency interval is less than one tenth of a second.
81. A method for operating an always-live distributed computing system, as defined in claim 78, wherein said preselected frequency interval is less than one hundredth of a second.
82. A method for operating an always-live distributed computing system, as defined in claim 78, wherein said preselected frequency interval is less than one millisecond.
83. A method for operating an always-live distributed computing system, as defined in claim 65, wherein using said at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks comprises: detecting aberrant behavior among the worker processors expected to be engaged in the processing of assigned tasks; and, assigning tasks expected to be completed by said aberrant-behaving worker processor(s) to other available processor(s) in said worker processor pool.
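The heartbeat check and task reassignment of claims 78-83 can be sketched together: a supervisor scans the last heartbeat time of each assigned worker and hands any stale worker's task to an idle one. Timestamps are simulated and the data shapes are assumptions; a real supervisor would track wall-clock arrivals of network messages.

```python
# Sketch of heartbeat-driven reassignment: a worker whose last heartbeat
# is older than `interval` seconds is treated as aberrant, and its task is
# reassigned to an idle worker from the pool (claims 78 and 83).

def reassign_stale(assignments, last_heartbeat, now, interval, pool):
    """Mutate and return `assignments`, replacing workers that have gone
    silent for longer than `interval` with idle workers from `pool`."""
    idle = [w for w in pool if w not in assignments.values()]
    for task, worker in list(assignments.items()):
        if now - last_heartbeat.get(worker, 0) > interval:
            assignments[task] = idle.pop(0)  # hand off to a live worker
    return assignments

assignments = {"t1": "w1", "t2": "w2"}
heartbeats = {"w1": 9.9, "w2": 4.0}   # w2 went silent six seconds ago
result = reassign_stale(assignments, heartbeats, now=10.0, interval=1.0,
                        pool=["w1", "w2", "w3"])
print(result)  # t2 is reassigned from the silent w2 to the idle w3
```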
84. A method for operating a network-connected processor as a processing element in a distributed processing system, the method comprising: installing software that enables said network-connected processor to receive tasks from, and provide results to, one or more independent, network-connected resource(s); and, using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource.
85. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: sending a heartbeat message to said independent, network-connected resource at least once every second.
86. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: sending a heartbeat message to said independent, network-connected resource at least once every tenth of a second.
87. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: sending a heartbeat message to said independent, network-connected resource at least once every hundredth of a second.
88. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: sending a heartbeat message to said independent, network-connected resource at least once every millisecond.
89. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: responding to status-request messages, received from said independent, network-connected resource, within one second.
90. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: responding to status-request messages, received from said independent, network-connected resource, within one tenth of a second.
91. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: responding to status-request messages, received from said independent, network-connected resource, within one hundredth of a second.
92. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: responding to status-request messages, received from said independent, network-connected resource, within one millisecond.
93. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: sending, in response to a change in status of said network-connected processor, a status-update message to said independent, network-connected resource within one second.
94. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: sending, in response to a change in status of said network-connected processor, a status-update message to said independent, network-connected resource within one tenth of a second.
95. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 84, wherein using the software installed on said network-connected processor to provide substantially continuous status information to an independent, network-connected resource includes: sending, in response to a change in status of said network-connected processor, a status-update message to said independent, network-connected resource within one hundredth of a second.
96. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 93, wherein the change in status that initiates the sending of a status-update message is additional demand for the processing resources of the network-connected processor.
97. A method for operating a network-connected processor as a processing element in a distributed processing system, as defined in claim 93, wherein the change in status that initiates the sending of a status-update message is user input-related activity on the network-connected processor.
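Claims 87–97 above pair two reporting modes: periodic heartbeats at a minimum frequency, and prompt status-update messages triggered by a change in status. The following Python sketch illustrates one way a worker-side reporter could combine both modes; the `Supervisor` stub, the message tuples, and the timing values are illustrative assumptions, not part of the claims.

```python
class Supervisor:
    """Illustrative stand-in for the independent, network-connected resource."""
    def __init__(self):
        self.messages = []

    def receive(self, msg):
        self.messages.append(msg)

class Worker:
    def __init__(self, worker_id, supervisor, heartbeat_interval=0.01):
        self.worker_id = worker_id
        self.supervisor = supervisor
        self.heartbeat_interval = heartbeat_interval  # e.g. once every hundredth of a second
        self.status = "idle"
        self._last_beat = float("-inf")

    def tick(self, now):
        # Periodic heartbeat (claims 87-88): send if a full interval has elapsed.
        if now - self._last_beat >= self.heartbeat_interval:
            self.supervisor.receive(("heartbeat", self.worker_id, self.status, now))
            self._last_beat = now

    def set_status(self, new_status, now):
        # Prompt status-update on a change in status (claims 93-95).
        if new_status != self.status:
            self.status = new_status
            self.supervisor.receive(("status-update", self.worker_id, new_status, now))

sup = Supervisor()
w = Worker("w1", sup)
w.tick(0.000)                 # first heartbeat goes out immediately
w.set_status("busy", 0.004)   # change of status -> immediate update
w.tick(0.005)                 # within the interval -> no heartbeat
w.tick(0.012)                 # interval elapsed -> next heartbeat
kinds = [m[0] for m in sup.messages]
print(kinds)  # ['heartbeat', 'status-update', 'heartbeat']
```

A real deployment would drive `tick` from a timer and replace `Supervisor.receive` with a network send; the claims leave both choices open.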
98. A distributed computing system comprising: a multiplicity of worker processors; at least one supervisory processor, configured to assign tasks to, and monitor the status of, said worker processors; an always-on, peer-to-peer computer network linking said worker processors and said supervisory processor(s); and, at least one of said at least one supervisory processor(s) including a monitoring module, which monitors the status of worker processors expected to be executing assigned tasks, so as to ensure that the distributed computing system maintains always-live operation.
99. A distributed computing system, as defined in claim 98, wherein the monitoring module receives status messages from at least each of the worker processors expected to be executing assigned tasks.
100. A distributed computing system, as defined in claim 99, wherein the monitoring module detects abnormalities in the operation of said worker processors expected to be executing assigned tasks, and/or their associated network connections, by detecting an absence of expected status messages received from said worker processors.
101. A distributed computing system, as defined in claim 100, wherein the monitoring module checks for an absence of expected status messages at least once each minute.
102. A distributed computing system, as defined in claim 100, wherein the monitoring module checks for an absence of expected status messages at least once every ten seconds.
103. A distributed computing system, as defined in claim 100, wherein the monitoring module checks for an absence of expected status messages at least once each second.
104. A distributed computing system, as defined in claim 100, wherein the monitoring module checks for an absence of expected status messages at least once every tenth of a second.
105. A distributed computing system, as defined in claim 98, wherein the monitoring module detects the presence of non-assigned-task-related activity on the worker processors expected to be executing assigned tasks.
106. A distributed computing system, as defined in claim 105, further comprising: activity monitor programs running on each of the worker processors expected to be executing assigned tasks.
107. A distributed computing system, as defined in claim 106, wherein the activity monitor programs comprise screensaver programs.
108. A distributed computing system, as defined in claim 105, wherein the activity monitor programs detect at least one of the following types of non-assigned-task-related activity: keyboard activity; mouse activity; pointer activity; touchscreen activity; voice activity; and execution of substantial non-assigned-task-related processes.
109. A distributed computing system, as defined in claim 105, wherein the activity monitor programs detect at least three of the following types of non-assigned-task-related activity: keyboard activity; mouse activity; pointer activity; touchscreen activity; voice activity; and execution of substantial non-assigned-task-related processes.
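Claims 105–109 describe activity-monitor programs that report non-assigned-task-related activity (keyboard, mouse, pointer, touchscreen, voice, or substantial foreign processes) to the monitoring module. A minimal illustrative sketch follows; event capture is simulated, and all names are hypothetical rather than taken from the claims.

```python
# Hypothetical activity monitor: real event capture would hook OS input APIs.
MONITORED = {"keyboard", "mouse", "pointer", "touchscreen", "voice", "process"}

class ActivityMonitor:
    def __init__(self, worker_id, notify):
        self.worker_id = worker_id
        self.notify = notify  # callback standing in for a send to the supervisory processor

    def on_event(self, kind):
        # Report only the non-assigned-task-related activity types of claim 108.
        if kind in MONITORED:
            self.notify({"worker": self.worker_id, "activity": kind})

reports = []
mon = ActivityMonitor("w7", reports.append)
for event in ["mouse", "task-progress", "keyboard", "heartbeat"]:
    mon.on_event(event)
seen = [r["activity"] for r in reports]
print(seen)  # ['mouse', 'keyboard']
```

Task-related events (`task-progress`, `heartbeat`) pass through unreported, matching the distinction the claims draw between assigned-task activity and user activity.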
110. An always-live distributed computing system, comprising: a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; and, at least one supervisory processor, also connected to said always-on, peer-to-peer computer network, and configured to assign tasks to said worker processors, monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks, and reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks.
111. An always-live distributed computing system, as defined in claim 110, wherein said computer network has a bandwidth of at least 250 kilobits/second.
112. An always-live distributed computing system, as defined in claim 110, wherein said computer network has a bandwidth of at least 1 megabit/second.
113. An always-live distributed computing system, as defined in claim 110, wherein the at least one supervisory processor monitors the status of worker processors expected to be engaged in the processing of assigned tasks by sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks.
114. An always-live distributed computing system, as defined in claim 110, wherein the process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every 10 seconds.
115. An always-live distributed computing system, as defined in claim 110, wherein the process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once each second.
116. An always-live distributed computing system, as defined in claim 110, wherein the process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least twenty times each second.
117. An always-live distributed computing system, as defined in claim 110, wherein the at least one supervisory processor monitors the status of worker processors expected to be engaged in the processing of assigned tasks by periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks.

118. An always-live distributed computing system, as defined in claim 117, wherein the preselected frequency interval is less than one second.

119. An always-live distributed computing system, as defined in claim 117, wherein the preselected frequency interval is less than one tenth of a second.

120. An always-live distributed computing system, as defined in claim 117, wherein the preselected frequency interval is less than one hundredth of a second.
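Claims 117–120 describe the supervisor-side check: periodically verify that a heartbeat has arrived from every expected worker within a preselected frequency interval, so that tasks held by silent workers can be reassigned. One way such a check could be sketched (timestamps, identifiers, and the reassignment hook are illustrative assumptions):

```python
class HeartbeatWatcher:
    """Flags workers whose most recent heartbeat is older than the
    preselected frequency interval, so their tasks can be reassigned."""
    def __init__(self, interval=1.0):
        self.interval = interval
        self.last_seen = {}  # worker_id -> timestamp of last heartbeat

    def heartbeat(self, worker_id, now):
        self.last_seen[worker_id] = now

    def overdue(self, expected_workers, now):
        # A worker never heard from at all is also overdue.
        return [w for w in expected_workers
                if now - self.last_seen.get(w, float("-inf")) > self.interval]

watch = HeartbeatWatcher(interval=1.0)
watch.heartbeat("w1", now=10.0)
watch.heartbeat("w2", now=10.2)
stale = watch.overdue(["w1", "w2"], now=11.1)
print(stale)  # ['w1'] -- w1 missed its interval; w2 is still live
```

The claims' tighter intervals (a tenth or a hundredth of a second) would simply shrink `interval`; the check itself is unchanged.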
121. A processing element for use in a distributed processing system, the processing element comprising: at least one processor; memory; at least one high-bandwidth interface to a computer network; and, worker processor software, configured to receive tasks via said high-bandwidth interface and to provide substantially continuous status information via said high-bandwidth interface.
122. A processing element, as defined in claim 121, wherein substantially continuous status information is provided by sending periodic heartbeat messages.

123. A processing element, as defined in claim 121, wherein substantially continuous status information is provided by sending prompt responses to received status-request messages.

124. A processing element, as defined in claim 121, wherein substantially continuous status information is provided by promptly sending a status-update message in response to a change in status.
125. Article(s)-of-manufacture for use in connection with a network-based distributed computing system, the article(s)-of-manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: assignment of tasks to a plurality of worker processors via said network; and, monitoring, on a substantially continuous basis, of the status of at least each of said plurality of assigned worker processors until each said processor completes its assigned task.
126. Article(s)-of-manufacture for use in connection with an always-live distributed computing system, the article(s)-of-manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: a pool of worker processors to install worker processor software provided via an always-on, peer-to-peer computer network; provide communication paths between said worker processors and at least one supervisory processor via said always-on, peer-to-peer computer network; cause said at least one supervisory processor to monitor, on a substantially continuous basis, the status of worker processors expected to be engaged in the processing of assigned tasks; and, cause said at least one supervisory processor to reassign tasks, as needed, to achieve substantially uninterrupted processing of assigned tasks.
127. Article(s)-of-manufacture for use in connection with a processing element constituting a part of a distributed computing system, the article(s)-of-manufacture comprising at least one computer-readable medium containing instructions which, when executed, cause: worker processor software to be installed that permits said processing element to receive tasks from, and provide results to, one or more independent, network-connected resource(s); and, said installed worker processor software to be executed and provide substantially continuous status information to one or more of said independent, network-connected resource(s).
128. A method for improving quality-of-service in a distributed computing system, said system including a multiplicity of network-connected worker processors and at least one supervisory processor, said supervisory processor configured to assign tasks to said worker processors, said method comprising: identifying one or more of said tasks as critical task(s); assigning each of said tasks, including said critical task(s), to a worker processor; redundantly assigning each of said one or more critical task(s) to a worker processor; and, monitoring the status of said assigned tasks to determine when all of said tasks have been completed by at least one worker processor.
129. A method for improving quality-of-service in a distributed computing system, as defined in claim 128, further comprising: monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s).
130. A method for improving quality-of-service in a distributed computing system, as defined in claim 129, wherein monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s) comprises receiving status messages from at least each of the worker processor(s) that have been assigned non-critical task(s) until each said processor completes its assigned task.
131. A method for improving quality-of-service in a distributed computing system, as defined in claim 129, wherein monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s) comprises detecting abnormalities in the operation of the worker processor(s) that have been assigned non-critical task(s), and/or their associated network connections, by detecting an absence of expected status message(s) received by said at least one supervisory processor.
132. A method for improving quality-of-service in a distributed computing system, as defined in claim 131, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every ten minutes.

133. A method for improving quality-of-service in a distributed computing system, as defined in claim 131, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once each minute.

134. A method for improving quality-of-service in a distributed computing system, as defined in claim 131, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once each second.

135. A method for improving quality-of-service in a distributed computing system, as defined in claim 131, wherein said act of detecting an absence of expected status message(s) received by said at least one supervisory processor is repeated at least once every tenth of a second.
136. A method for improving quality-of-service in a distributed computing system, as defined in claim 129, wherein monitoring, on a substantially continuous basis, the status of at least the worker processor(s) that have been assigned the non-critical task(s) comprises: detecting the presence of non-assigned-task-related activity on at least said worker processor(s) that have been assigned the non-critical task(s).
137. A method for improving quality-of-service in a distributed computing system, as defined in claim 136, wherein detecting the presence of non-assigned-task-related activity includes: running an activity monitor program on at least each of the worker processor(s) that have been assigned non-critical task(s).
138. A method for improving quality-of-service in a distributed computing system, as defined in claim 137, wherein: the activity monitor programs behave substantially like screen saver programs.
139. A method for improving quality-of-service in a distributed computing system, as defined in claim 137, wherein: the activity monitor programs send, in response to detection of keyboard activity, a message to at least one of said at least one supervisory processor(s).

140. A method for improving quality-of-service in a distributed computing system, as defined in claim 137, wherein: the activity monitor programs send, in response to detection of mouse activity, a message to at least one of said at least one supervisory processor(s).

141. A method for improving quality-of-service in a distributed computing system, as defined in claim 137, wherein: the activity monitor programs send, in response to detection of pointer activity, a message to at least one of said at least one supervisory processor(s).

142. A method for improving quality-of-service in a distributed computing system, as defined in claim 137, wherein: the activity monitor programs send, in response to detection of touchscreen activity, a message to at least one of said at least one supervisory processor(s).

143. A method for improving quality-of-service in a distributed computing system, as defined in claim 137, wherein: the activity monitor programs send, in response to detection of voice activity, a message to at least one of said at least one supervisory processor(s).

144. A method for improving quality-of-service in a distributed computing system, as defined in claim 137, wherein: the activity monitor programs send, in response to detection of execution of substantial non-assigned-task-related processes, a message to at least one of said at least one supervisory processor(s).
145. A method for improving quality-of-service in a distributed computing system, as defined in claim 136, wherein detecting the presence of non-assigned-task-related activity includes: determining, in response to an activity monitor message received by at least one of said at least one supervisory processor(s), that at least one of said worker processors is undertaking non-assigned-task-related activity.
146. A method for improving quality-of-service in a distributed computing system, as defined in claim 145, wherein the activity monitor message is generated by an activity monitor program running on one of said assigned worker processors.
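Claims 128 and following combine redundant assignment of critical tasks with completion tracking: the job is done once every task has been completed by at least one worker, whichever replica finishes first. An illustrative sketch under assumed data structures (the worker allocation order is an arbitrary choice, not dictated by the claims):

```python
def assign_with_redundancy(tasks, critical, workers):
    """Every task gets one worker; each critical task additionally gets a
    second, distinct worker (claim 128's redundant assignment)."""
    pool = list(workers)
    assignments = {}  # task -> list of assigned workers
    for t in tasks:
        assignments[t] = [pool.pop(0)]
    for t in critical:
        assignments[t].append(pool.pop(0))  # redundant replica
    return assignments

def job_done(assignments, completed):
    # Done when every task has been completed by at least one worker.
    return all(any((t, w) in completed for w in ws)
               for t, ws in assignments.items())

a = assign_with_redundancy(["t1", "t2", "t3"], critical=["t2"],
                           workers=["w1", "w2", "w3", "w4"])
done = job_done(a, completed={("t1", "w1"), ("t2", "w4"), ("t3", "w3")})
print(a["t2"], done)  # ['w2', 'w4'] True -- the replica finished first
```

Note that the primary assignee of `t2` never reported completion; the job still finishes because the redundant replica did, which is the delay-mitigation the claims are after.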
147. A method for operating a peer-to-peer distributed computing system, comprising: providing a pool of worker processors, each having installed worker processor software, and each connected to an always-on, peer-to-peer computer network; providing at least one supervisory processor, also connected to said always-on, peer-to-peer computer network; using said at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks; and, using said at least one supervisory processor to redundantly assign one or more critical task(s) to one or more additional worker processors.
148. A method for operating a peer-to-peer distributed computing system, as defined in claim 147, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network through a high-bandwidth connection.

149. A method for operating a peer-to-peer distributed computing system, as defined in claim 148, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 100 kilobits/sec.

150. A method for operating a peer-to-peer distributed computing system, as defined in claim 148, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 250 kilobits/sec.

151. A method for operating a peer-to-peer distributed computing system, as defined in claim 148, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 1 megabit/sec.

152. A method for operating a peer-to-peer distributed computing system, as defined in claim 148, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 10 megabits/sec.
153. A method for operating a peer-to-peer distributed computing system, as defined in claim 148, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 100 megabits/sec.
154. A method for operating a peer-to-peer distributed computing system, as defined in claim 148, wherein providing a pool of worker processors further includes ensuring that each of said worker processors is linked to said always-on, peer-to-peer computer network at a data rate of at least 1 gigabit/sec.
155. A method for operating a peer-to-peer distributed computing system, as defined in claim 147, wherein using said at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks includes: sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks.
156. A method for operating a peer-to-peer distributed computing system, as defined in claim 155, wherein said process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every second.
157. A method for operating a peer-to-peer distributed computing system, as defined in claim 155, wherein said process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every tenth of a second.
158. A method for operating a peer-to-peer distributed computing system, as defined in claim 155, wherein said process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every hundredth of a second.
159. A method for operating a peer-to-peer distributed computing system, as defined in claim 155, wherein said process of sending a status-request message to, and receiving a return acknowledgment from, each worker processor that is expected to be engaged in the processing of assigned tasks is repeated at least once every millisecond.
160. A method for operating a peer-to-peer distributed computing system, as defined in claim 147, wherein using said at least one supervisory processor to monitor the status of worker processors expected to be engaged in the processing of assigned tasks includes: periodically checking to ensure that a heartbeat message has been received, within a preselected frequency interval, from each worker processor that is expected to be engaged in the processing of assigned tasks.
161. A method for operating a peer-to-peer distributed computing system, as defined in claim 160, wherein said preselected frequency interval is less than one second.
162. A method for operating a peer-to-peer distributed computing system, as defined in claim 160, wherein said preselected frequency interval is less than one tenth of a second.
163. A method for operating a peer-to-peer distributed computing system, as defined in claim 160, wherein said preselected frequency interval is less than one hundredth of a second.
164. A method for operating a peer-to-peer distributed computing system, as defined in claim 160, wherein said preselected frequency interval is less than one millisecond.
165. A method for performing a job using a peer-to-peer network-connected distributed computing system, the job comprising a plurality of tasks, the method comprising: initiating execution of each of said plurality of tasks on a different processor connected to said peer-to-peer computer network; initiating redundant execution of at least one of said plurality of tasks on yet a different processor connected to said peer-to-peer computer network; and, once each of said plurality of tasks has been completed by at least one processor, reporting completion of said job via said peer-to-peer computer network.
166. A method for performing a job using a peer-to-peer network-connected distributed computing system, as defined in claim 165, wherein said at least one of said plurality of tasks that is/are redundantly assigned is/are critical task(s).
167. A method for performing a job using a peer-to-peer network-connected distributed computing system, as defined in claim 165, further comprising: monitoring, on a periodic basis, to ensure that progress is being made toward completion of said job.
168. A method for performing a job using a peer-to-peer network-connected distributed computing system, as defined in claim 167, wherein said monitoring is performed at least once every 10 seconds.
169. A method for performing a job using a peer-to-peer network-connected distributed computing system, as defined in claim 167, wherein said monitoring is performed at least once a second.
170. A method for performing a job using a peer-to-peer network-connected distributed computing system, as defined in claim 167, wherein said monitoring is performed at least once every tenth of a second.
171. A method for performing a job using a peer-to-peer network-connected distributed computing system, as defined in claim 167, wherein said monitoring is performed at least once every hundredth of a second.
172. A method for performing a job using a peer-to-peer network-connected distributed computing system, as defined in claim 167, wherein said monitoring is performed at least once every millisecond.
173. A method for performing a job using a plurality of independent, network-connected processors, the job comprising a plurality of tasks, the method comprising: assigning each of said plurality of tasks to a different processor connected to said computer network; redundantly assigning at least some, but not all, of said plurality of tasks to additional processors connected to said computer network; and, using said computer network to compile results from the assigned tasks and report completion of the job.
174. A method, as defined in claim 173, wherein redundantly assigning at least some of said plurality of tasks to additional processors comprises assigning critical tasks to additional processors.
175. A method, as defined in claim 173, wherein redundantly assigning at least some of said plurality of tasks to additional processors comprises assigning at least one critical task to at least two additional processors.
176. A method, as defined in claim 173, further comprising: generating a heartbeat message from each processor executing an assigned task at least once every second.
177. A method, as defined in claim 173, further comprising: generating a heartbeat message from each processor executing an assigned task at least once every tenth of a second.
178. A method, as defined in claim 173, further comprising: generating a heartbeat message from each processor executing an assigned task at least once every hundredth of a second.
179. A method, as defined in claim 173, further comprising: generating a heartbeat message from each processor executing an assigned task at least once every millisecond.
180. A method for performing a job using a pool of network-connected processors, the job comprising a plurality of tasks, the number of processors in the pool greater than the number of tasks in the job, the method comprising: assigning each of said plurality of tasks to at least one processor in said pool; redundantly assigning at least some of said plurality of tasks until all, or substantially all, of said processors in said pool have been assigned a task; and, using said computer network to compile results from the assigned tasks and report completion of the job.
181. A method, as defined in claim 180, wherein redundantly assigning at least some of said plurality of tasks includes redundantly assigning a plurality of critical tasks.
182. A method for using redundancy in a network-based distributed processing system to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, the method comprising: receiving a job request, from a client, over the network; processing the job request to determine the number, K, of individual tasks to be assigned to individual network-connected processing elements; determining a subset, N, of said K tasks whose completion is most critical to the overall completion of the job; assigning each of said K tasks to an individual network-connected processing element; and, redundantly assigning at least some of the N task(s) in said subset to additional network-connected processing element(s).
183. A method, as defined in claim 182, for using redundancy in a network-based distributed processing system to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, wherein determining the subset, N, of said K tasks whose completion is most critical to the overall completion of the job includes assigning, to the subset, task(s) that must be completed before other task(s) can be commenced.
184. A method, as defined in claim 182, for using redundancy in a network-based distributed processing system to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, wherein determining the subset, N, of said K tasks whose completion is most critical to the overall completion of the job includes assigning, to the subset, task(s) that supply data to other task(s).
185. A method, as defined in claim 182, for using redundancy in a network-based distributed processing system to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, wherein determining the subset, N, of said K tasks whose completion is most critical to the overall completion of the job includes assigning, to the subset, task(s) that is/are likely to require the largest amount of memory.
186. A method, as defined in claim 182, for using redundancy in a network-based distributed processing system to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, wherein determining the subset, N, of said K tasks whose completion is most critical to the overall completion of the job includes assigning, to the subset, task(s) that is/are likely to require the largest amount of local disk space.
187. A method, as defined in claim 182, for using redundancy in a network-based distributed processing system to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, wherein determining the subset, N, of said K tasks whose completion is most critical to the overall completion of the job includes assigning, to the subset, task(s) that is/are likely to require the largest amount of processor time.
188. A method, as defined in claim 182, for using redundancy in a network-based distributed processing system to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, wherein determining the subset, N, of said K tasks whose completion is most critical to the overall completion of the job includes assigning, to the subset, task(s) that is/are likely to require the largest amount of data communication over the network.
189. A method, as defined in claim 182, for using redundancy in a network-based distributed processing system to avoid or mitigate delays from failures and/or slowdowns of individual processing elements, further comprising: determining, based on completions of certain of said K tasks and/or N redundant task(s), that sufficient tasks have been completed to compile job results; and reporting job results to the client over the network.
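Claims 183–188 enumerate criteria for choosing the critical subset N of the K tasks: tasks that gate or feed other tasks, or that are expected to need the most memory, disk space, processor time, or network traffic. A hedged sketch of one possible scoring scheme combining two of those criteria follows; the weighting and the task-record format are assumptions, not part of the claims.

```python
def critical_subset(tasks, n):
    """Rank tasks by (number of dependent tasks, estimated processor time)
    and keep the top N; this weighting is one illustrative choice."""
    def score(t):
        return (len(t["dependents"]), t["est_cpu_seconds"])
    return sorted(tasks, key=score, reverse=True)[:n]

tasks = [
    {"name": "decompose", "dependents": ["a", "b", "c"], "est_cpu_seconds": 5},
    {"name": "a", "dependents": [], "est_cpu_seconds": 40},
    {"name": "b", "dependents": [], "est_cpu_seconds": 12},
    {"name": "c", "dependents": ["merge"], "est_cpu_seconds": 9},
]
top = [t["name"] for t in critical_subset(tasks, n=2)]
print(top)  # ['decompose', 'c'] -- gating tasks outrank merely expensive ones
```

Putting dependency count before estimated cost reflects claims 183–184: a cheap task that blocks three others delays the job more than one expensive leaf task.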
190. A method for using a group of network-connected processing elements to process a job, the job comprised of a plurality of tasks, one or more of which are critical tasks, the method comprising: identifying one or more higher-capacity processing elements among said group of network-connected processing elements; assigning at least one critical task to at least one of the identified higher-capacity processing elements; assigning other tasks to other processing elements such that each task in said job has been assigned to at least one processing element; and, communicating results from said assigned tasks over said network.
191. A method for using a group of network-connected processing elements to process a job, as defined in claim 190, wherein identifying one or more higher-capacity processing elements among said group of network-connected processing elements includes evaluating the processing capacity of processing elements in said group based on their execution of previously-assigned tasks.
192. A method for using a group of network-connected processing elements to process a job, as defined in claim 190, wherein identifying one or more higher-capacity processing elements among said group of network-connected processing elements includes determining the processing capacity of processing elements in said group through use of assigned benchmark tasks.
193. A method for using a group of network-connected processing elements to process a job, as defined in claim 190, wherein identifying one or more higher-capacity processing elements among said group of network-connected processing elements includes evaluating hardware configurations of at least a plurality of processing elements in said group.
194. A method for using a group of network-connected processing elements to process a job, as defined in claim 190, further comprising: ensuring that each critical task in the job is assigned to a higher-capacity processing element.
195. A method for using a group of network-connected processing elements to process a job, as defined in claim 190, further comprising: storing the amount of time used by said processing elements to execute the assigned tasks; and computing a cost for said job based, at least in part, on said stored task execution times.
196. A method for using a group of network-connected processing elements to process a job, as defined in claim 195, wherein computing a cost for said job based, at least in part, on said stored task execution times includes charging a higher incremental rate for time spent executing tasks on higher-capability processing elements than for time spent executing tasks on other processing elements.
197. A method for using a group of network-connected processing elements to process a job, as defined in claim 195, further comprising: communicating the computed cost for said job over said network.
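Claims 190-197 combine capacity-aware assignment with usage-based billing. As a minimal sketch, with invented names and rates (nothing here is taken from the application text): workers are ranked by throughput measured from assigned benchmark tasks (claim 192), critical tasks are routed to the highest-capacity workers (claim 194), and time on those workers is billed at a higher incremental rate (claim 196).

```python
# Hypothetical sketch of claims 190-197: rank workers by benchmark
# throughput, route critical tasks to the fastest, and bill time on
# higher-capacity workers at a premium rate. Rates are illustrative.

HIGH_RATE, BASE_RATE = 0.10, 0.04   # dollars per CPU-second (invented)

def rank_workers(benchmarks):
    """benchmarks: {worker_id: ops/sec from an assigned benchmark task}."""
    return sorted(benchmarks, key=benchmarks.get, reverse=True)

def assign(tasks, ranked_workers, n_fast):
    """Assign one task per worker; every critical task must land on one
    of the n_fast higher-capacity workers (claim 194)."""
    fast = set(ranked_workers[:n_fast])
    plan, pool = {}, list(ranked_workers)
    for t in sorted(tasks, key=lambda t: not t["critical"]):  # critical first
        w = next(w for w in pool if (w in fast) or not t["critical"])
        plan[t["id"]] = w
        pool.remove(w)
    return plan, fast

def job_cost(times, fast):
    """times: {worker_id: seconds spent executing its assigned task}.
    Higher-capacity workers are billed at the higher incremental rate."""
    return sum(s * (HIGH_RATE if w in fast else BASE_RATE)
               for w, s in times.items())
```

In this sketch the stored per-task execution times (claim 195) are the sole billing input; a fuller model might also meter data transfer or disk usage.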
198. A distributed computing system comprising: a multiplicity of worker processors; at least one supervisory processor, configured to assign tasks to, and monitor the status of, said worker processors, said at least one supervisory processor further configured to assign each critical task to at least two worker processors; an always-on, peer-to-peer computer network linking said worker processors and said supervisory processor(s); and, at least one of said at least one supervisory processor(s) including a monitoring module, which monitors the status of worker processors expected to be executing assigned tasks to ensure that the distributed computing system maintains always-live operation.
199. A distributed computing system, as defined in claim 198, wherein the monitoring module receives status messages from at least each of the worker processors expected to be executing assigned tasks.
200. A distributed computing system, as defined in claim 199, wherein the monitoring module detects abnormalities in the operation of said worker processors expected to be executing assigned tasks, and/or their associated network connections, by detecting an absence of expected status messages received from said worker processors.
201. A distributed computing system, as defined in claim 200, wherein the monitoring module checks for an absence of expected status messages at least once each minute.
202. A distributed computing system, as defined in claim 200, wherein the monitoring module checks for an absence of expected status messages at least once each second.
203. A distributed computing system, as defined in claim 199, wherein the monitoring module detects the presence of non-assigned-task-related activity on the worker processors expected to be executing assigned tasks.
204. A distributed computing system, as defined in claim 203, further comprising: activity monitor programs running on each of the worker processors expected to be executing assigned tasks.
205. A distributed computing system, as defined in claim 204, wherein the activity monitor programs comprise Screensaver programs.
206. A distributed computing system, as defined in claim 204, wherein the activity monitor programs detect at least one of the following types of non-assigned-task-related activity: keyboard activity; mouse activity; pointer activity; touchscreen activity; voice activity; and execution of substantial non-assigned-task-related processes.
207. A distributed computing system, as defined in claim 204, wherein the activity monitor programs detect at least two of the following types of non-assigned-task-related activity: keyboard activity; mouse activity; pointer activity; touchscreen activity; voice activity; and execution of substantial non-assigned-task-related processes.
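The monitoring module of claims 198-207 can be illustrated with a short sketch. This is an assumption-laden example, not the application's implementation: each worker is expected to send a periodic status message, and a worker's task becomes eligible for reassignment either when no message has arrived within a timeout (claim 200) or when the worker reports non-assigned-task-related activity such as keyboard or mouse input (claims 203-207). The class name, field names, and the 5-second timeout are all invented.

```python
# Hypothetical sketch of the claim 198-207 monitoring module: track the
# last status message from each worker, flag workers that fall silent
# past a timeout or whose owners are actively using the machine.
import time

class MonitoringModule:
    def __init__(self, timeout_seconds=5.0):
        self.timeout = timeout_seconds
        self.last_seen = {}        # worker_id -> time of last status message
        self.user_active = set()   # workers reporting keyboard/mouse activity

    def on_status(self, worker_id, user_activity=False, now=None):
        """Record a status message; user_activity marks non-assigned-task-
        related activity detected by the worker's activity monitor."""
        now = time.time() if now is None else now
        self.last_seen[worker_id] = now
        if user_activity:
            self.user_active.add(worker_id)
        else:
            self.user_active.discard(worker_id)

    def unavailable(self, now=None):
        """Workers whose tasks should be reassigned: silent past the
        timeout, or showing user activity."""
        now = time.time() if now is None else now
        silent = {w for w, t in self.last_seen.items()
                  if now - t > self.timeout}
        return silent | self.user_active
```

Checking `unavailable()` once per second would satisfy the tighter polling interval of claim 202; once per minute, the looser interval of claim 201.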
PCT/US2001/015247 2000-05-12 2001-05-11 Methods, apparatus, and articles-of-manufacture for network-based distributed computing WO2001088708A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001263056A AU2001263056A1 (en) 2000-05-12 2001-05-11 Methods, apparatus, and articles-of-manufacture for network-based distributed computing

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US20371900P 2000-05-12 2000-05-12
US60/203,719 2000-05-12
US09/583,244 US6757730B1 (en) 2000-05-31 2000-05-31 Method, apparatus and articles-of-manufacture for network-based distributed computing
US09/583,244 2000-05-31
US71163400A 2000-11-13 2000-11-13
US09/711,634 2000-11-13
US26618501P 2001-02-02 2001-02-02
US09/777,190 US20020023117A1 (en) 2000-05-31 2001-02-02 Redundancy-based methods, apparatus and articles-of-manufacture for providing improved quality-of-service in an always-live distributed computing environment
US60/266,185 2001-02-02
US09/777,190 2001-02-02

Publications (2)

Publication Number Publication Date
WO2001088708A2 true WO2001088708A2 (en) 2001-11-22
WO2001088708A3 WO2001088708A3 (en) 2003-08-07

Family

ID=27539488

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/015247 WO2001088708A2 (en) 2000-05-12 2001-05-11 Methods, apparatus, and articles-of-manufacture for network-based distributed computing

Country Status (2)

Country Link
AU (1) AU2001263056A1 (en)
WO (1) WO2001088708A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001014961A2 (en) * 1999-08-26 2001-03-01 Parabon Computation System and method for the establishment and utilization of networked idle computational processing power

Non-Patent Citations (4)

Title
LONDON ET AL: "POPCORN - A Paradigm for Global-Computing", Thesis, University of Jerusalem, June 1998 (1998-06), XP002159919 *
NEARY, M. O. ET AL: "Javelin: Parallel computing on the internet", Future Generations Computer Systems, Elsevier Science Publishers, Amsterdam, NL, vol. 15, no. 5-6, October 1999 (1999-10), pages 659-674, XP004176754, ISSN: 0167-739X *
TAKAGI, H. ET AL: "Ninflet: a migratable parallel objects framework using Java", Concurrency: Practice and Experience, John Wiley and Sons, GB, vol. 10, no. 11-13, February 1998 (1998-02), pages 1063-1078, XP002209552, ISSN: 1040-3108 *
WOODWARD, J. ET AL: "A broadband network architecture for residential video services featuring distributed control of broadband switches and video dial tone functionality", Proceedings of the International Conference on Communications (ICC), Geneva, 23-26 May 1993, IEEE, New York, US, vol. 2, 23 May 1993 (1993-05-23), pages 853-857, XP002142500, ISBN: 0-7803-0950-2 *

Also Published As

Publication number Publication date
AU2001263056A1 (en) 2001-11-26
WO2001088708A3 (en) 2003-08-07

Similar Documents

Publication Publication Date Title
US6757730B1 (en) Method, apparatus and articles-of-manufacture for network-based distributed computing
US7533170B2 (en) Coordinating the monitoring, management, and prediction of unintended changes within a grid environment
US10733010B2 (en) Methods and systems that verify endpoints and external tasks in release-pipeline prior to execution
JP4527976B2 (en) Server resource management for hosted applications
US7568199B2 (en) System for matching resource request that freeing the reserved first resource and forwarding the request to second resource if predetermined time period expired
US7640547B2 (en) System and method for allocating computing resources of a distributed computing system
JP4954089B2 (en) Method, system, and computer program for facilitating comprehensive grid environment management by monitoring and distributing grid activity
US7668741B2 (en) Managing compliance with service level agreements in a grid environment
US8135841B2 (en) Method and system for maintaining a grid computing environment having hierarchical relations
US6463457B1 (en) System and method for the establishment and the utilization of networked idle computational processing power
US7103628B2 (en) System and method for dividing computations
US8136118B2 (en) Maintaining application operations within a suboptimal grid environment
Jacob et al. Enabling applications for grid computing with globus
US20070124731A1 (en) System architecture for distributed computing
US20050138291A1 (en) System and method for caching results
US20060149652A1 (en) Receiving bid requests and pricing bid responses for potential grid job submissions within a grid environment
US7703029B2 (en) Grid browser component
CN109614227A (en) Task resource concocting method, device, electronic equipment and computer-readable medium
Balman et al. Data scheduling for large scale distributed applications
Belhajjame et al. Defining and coordinating open-services using workflows
US7568006B2 (en) e-Business on-demand for design automation tools
WO2001088708A2 (en) Methods, apparatus, and articles-of-manufacture for network-based distributed computing
Filho et al. A fully distributed architecture for large scale workflow enactment
Luo et al. A heterogeneous computing system for data mining workflows in multi‐agent environments
Noor et al. Bepari: A Cost-aware Comprehensive Agent Architecture for Opaque Cloud Services

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AU BA BB BG BR BZ CA CN CR CU CZ DM DZ EE GD GE HR HU ID IL IN IS JP KP KR LC LK LR LT LV MA MD MG MK MN MX NO NZ PL PT RO SG SI SK TR TT UA US UZ VN YU ZA

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP