US10116543B2 - Dynamic asynchronous communication management - Google Patents

Dynamic asynchronous communication management

Info

Publication number
US10116543B2
Authority
US
United States
Prior art keywords
time
response
message
thread
remote system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/619,389
Other versions
US20160234090A1 (en
Inventor
Mark Cameron Little
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Red Hat Inc
Original Assignee
Red Hat Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Red Hat Inc
Priority to US14/619,389
Assigned to RED HAT, INC. Assignment of assignors interest (see document for details). Assignors: LITTLE, MARK CAMERON
Publication of US20160234090A1
Priority to US16/173,402 (US11271839B2)
Application granted
Publication of US10116543B2
Active
Adjusted expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/0864 Round trip delays
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0715 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time

Definitions

  • the present disclosure relates generally to distributed computing systems, and more particularly to methods and systems for improving the efficiency of asynchronous communication between systems.
  • Asynchronous communication can provide a number of benefits over synchronous communication. For example, asynchronous systems may perform other tasks after sending a message and before receiving the response. In other words, asynchronous systems do not have to be idle while waiting for a response to a message.
  • Timeouts work by assuming that a response will not be received if it is not received during a specified amount of time. While use of timeouts can be helpful, it is desirable to find ways to improve the manner in which asynchronous systems communicate.
  • a method performed by a computing system includes: executing a thread, the thread comprising an instruction to send a message to a remote system; after sending the message, allowing continued execution of the thread; after a first period of time, checking for a response to the message; and, in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response.
  • the predetermined amount of time is based on collected data associated with a set of conditions that correspond to a current set of conditions related to the remote system.
  • a method performed by a computing system includes: with the computing system, monitoring response times for a plurality of messages sent from the computing system to a remote system, each response time corresponding to an amount of time between sending a message and receiving a response to that message; associating with each of the plurality of messages a set of conditions under which that message was sent; determining an expected time range for messages sent to the remote system, the expected time range being a function of the conditions; executing a thread, the thread comprising an instruction to send a first message to the remote system; after sending the first message, continuing execution of the thread; after a first period of time, checking for a response to the first message; and, in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response.
  • the predetermined amount of time is based on the expected time range based on a current set of conditions.
  • a system includes a processor and a memory, the memory comprising machine readable instructions that, when executed by the processor, cause the system to: execute a thread, the thread comprising an instruction to send a message to a remote system; after sending the message, continue execution of the thread; after a first period of time, check for a response to the message; and, in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, wait for an additional period of time for the response.
  • the predetermined amount of time is based on collected data associated with a set of conditions that correspond to a current set of conditions related to the remote system.
  • FIG. 1 is a diagram showing an illustrative computing system that communicates with a remote system, according to one example of principles described herein.
  • FIG. 2 is a flowchart showing an illustrative method for using dynamic asynchronous communication management, according to one example of principles described herein.
  • FIGS. 3A-3C are diagrams showing illustrative timing for reception of responses, according to one example of principles described herein.
  • FIG. 4 is a table showing an illustrative collection of data used to estimate response times, according to one example of principles described herein.
  • FIG. 5 is a graph showing an illustrative upper bound and lower bound for an expected response time as a function of a condition associated with the responding entity, according to one example of principles described herein.
  • FIG. 6 is a diagram showing an illustrative computing system that may be used to perform asynchronous communication, according to one example of principles described herein.
  • in an asynchronous system, when a computing system in a distributed computing environment sends a message to another system, the thread that sends that message is not blocked. Thus, the thread continues to execute while it waits for a response to the message. At some later point in time, the thread checks to see if there has been an answer to the message. If not, then the application assumes that there will be no response, because it cannot wait forever for the response and there is no good mechanism to determine whether a response will eventually come. But it may be that network traffic is heavy and the response will take just a few seconds longer. In that case, an application that has already assumed failure may cause errors by sending the same message multiple times.
  • a method for asynchronous communication management includes a dynamic approach that gives applications more information regarding how long a response should take based on accumulated historical data. Specifically, when a thread being processed by a computing system sends a message to another system, that thread is not blocked, and thus is allowed to continue execution. At some point in time, the thread checks to see if a response to that message has been received. If so, then execution continues as normal. But, if not, instead of assuming that a response will not come, the thread utilizes the historical data to determine an expected response time. If the expected response time has not yet elapsed, then the thread can wait an additional amount of time in accordance with the expected response time.
  • the historical data takes into account a variety of factors. Such factors may include the type of message and the type of service to which the message is being sent. The factors may also include the current network conditions or load on server systems that process the message and send the response. The historical data may thus provide a time range in which the response to a message is expected to be received under similar conditions.
  • FIG. 1 is a diagram showing an illustrative computing system 102 that communicates with a remote system 110.
  • the computing system 102 executes a thread 104 that sends a message 108 to the remote system 110 .
  • the computing system 102 receives a response from the remote system 110 .
  • the computing system 102 may be a node within a distributed computing system.
  • the remote system 110 may also be a node within the distributed computing system.
  • the computing system 102 and the remote system 110 may each include one or more processors and a memory such as the system that will be described in more detail below in the text accompanying FIG. 6 .
  • the computing system 102 is a client system that communicates with a remote service.
  • the remote system 110 may be a server such as a web server. The principles described herein may apply to other situations in which one computing system communicates with another in an asynchronous manner.
  • a thread 104 is a unit of program instructions that is scheduled for execution on a processor. Many processors have multithread capability and can thus execute multiple threads by time division multiplexing. Additionally, as some threads become blocked, the processor can execute other threads until the blocked thread becomes unblocked.
  • the thread 104 includes an instruction 106 for sending a message 108 to the remote system 110 .
  • the message 108 may be a request for services.
  • the message 108 may include various parameters and other information that is to be processed by the remote system 110 .
  • the remote system 110 then returns the requested information as a response 112 to the requesting computing system 102 .
  • after the message 108 is sent, the thread 104 is not blocked, and thus is allowed to continue execution. While the processor may switch to other threads and back, the thread 104 can continue execution as normal because it is not blocked. At some point in time, the thread 104 calls into the communication infrastructure to determine if a response has been received. Conventionally, if a response has not yet been received, the thread 104 assumes failure and acts accordingly, which in some cases may involve resending the message 108. But, according to principles described herein, the computing system 102 has access to a set of collected data 120 related to past response times for similar responses under similar conditions. Thus, if the expected time range has not yet elapsed, the computing system 102 can wait 118 for an additional period of time before assuming failure.
  • the computing system 102 is responsible for collecting the data 120. This may be done by monitoring the response times for multiple messages sent to various remote systems. Additionally, the computing system 102 determines the conditions under which each message was sent, such as network load, processor load of the remote system, and type of response. After a statistically relevant number of response times have been collected for various conditions, the computing system 102 can form an expected time range that is a function of a specified set of conditions. Thus, when the computing system 102 sends a new message, it can check the current conditions and consult the collected data to determine an expected response time based on the current conditions. The computing system 102 can also update the collected data by measuring the response time for the new message.
  • FIG. 2 is a flowchart showing an illustrative method for using dynamic asynchronous communication management.
  • the method includes a step 202 for sending a message to a remote system.
  • the thread that includes the send instruction is not blocked, and thus is allowed to execute.
  • the first period of time may vary depending on the application.
  • the first period of time is a standard timeout mechanism used in asynchronous systems.
  • the first period of time represents a point at which the thread is not able to proceed without the response.
  • if the first period of time has not yet elapsed, the thread continues unblocked and does not yet check for a response to the message. But, if the first period of time has elapsed, the thread calls into the communication infrastructure at step 206 to see if a response to the sent message has been received. If the response has in fact arrived, then there is no issue. The thread can continue as normal at step 214. The thread may even complete if no further responses are needed for the thread and the thread finishes executing its instructions. But, if the response has not yet been received, then the method proceeds to step 208.
  • at step 208, if the response has not yet been received, it is determined whether a second period of time has passed.
  • the second period of time corresponds to an upper bound of a time range in which the response is expected to arrive.
  • the expected response time is based on historical data collected under conditions that are similar to the current conditions. For example, historical data may indicate that, under a certain network load and a certain server load, this type of request typically receives a response within 3-4 seconds.
  • if the second period of time has not yet passed, the system will wait as indicated in step 216.
  • the system may respond in different ways based on the application. For example, if the thread cannot proceed without the response, then the thread can be blocked during this waiting period. If, however, the thread has other instructions that can be performed without waiting on the response, then the thread may be allowed to continue. During this time, the processor may switch to other threads as needed.
  • at step 210, it is determined whether the response has been received. If the response has not been received, then it can be assumed, at step 212, that the response will not come. The application associated with the thread can then handle the situation accordingly. If, however, the response has been received, then the method 200 proceeds to step 214, where execution of the thread continues as normal.
  • FIGS. 3A-3C are diagrams showing illustrative timing for reception of responses.
  • FIG. 3A illustrates the case where the response is received after the first period of time but before the upper bound of the expected time.
  • the vertical axis represents time.
  • Point 302 represents the point at which a message is sent to a remote system.
  • Point 304 represents the end of the first period of time.
  • the first period of time is the normal time at which the application checks to see if a response has been received, as is done conventionally without the use of principles described herein.
  • Point 306 represents the point in time at which a response to the sent message is received.
  • the computing system determines the current conditions, such as network load, processor load (of the remote system to which a message is sent), and type of message. Other conditions can be considered as well. Using the current conditions, the system consults historical data that has been collected on similar responses under similar conditions. The historical data can provide an expected range based on such conditions.
  • the first period of time elapses before the response is received.
  • the system that embodies principles described herein knows that the response may take longer based on current network conditions.
  • the system waits for an additional period of time 310 .
  • the additional period of time brings the total time to the upper bound of the expected range 308 .
  • the response is received before the additional period of time 310 expires.
  • the system did not assume failure and the response was received as desired.
  • FIG. 3B is an example of a case where the response is received after the additional time expires. Such a situation may occur due to a spike in network traffic or a spike in processor load. In such case, the response still comes too late and the system has already assumed failure, even after waiting for the additional period of time. Because a system cannot know if a late response will ever arrive, the system has to assume at some point that a response will not come and continue accordingly.
  • FIG. 3C illustrates an example in which the response 306 is received before the first period of time 304 elapses.
  • the system does not have to wait for the additional period of time 310 because the response has already been received.
  • the system can continue as normal.
  • FIG. 4 is a table showing an illustrative collection of data used to estimate response times.
  • the collected data may be organized and utilized in a variety of different ways.
  • the first column 402 corresponds to specific destinations to which messages have been sent in the past and for which the data has been collected.
  • the second column 404 indicates an upper bound of a time range in which a response from the corresponding system should be received under normal circumstances. For example, a response to a message sent to remote system A should not take more than about 20 milliseconds. Likewise, a response to a message sent to remote system B should not take more than 10 milliseconds.
  • the third column 406 and the fourth column 408 indicate an amount of time that is added to the upper bound of the second column 404 under various network load conditions. Specifically, the third column 406 represents network load A and the fourth column 408 represents network load B.
  • Various metrics may be used to measure a network load, such as how many packets are being transmitted within a specific time range.
  • Network load A represents a first range of network loads and network load B represents a second range of network loads. While FIG. 4 lists only two ranges of network loads, practical embodiments may have more, narrower ranges of network loads.
  • under network load A, a response to a message sent to remote system A can be expected to take 10 milliseconds longer than usual.
  • under network load B, a response to a message sent to remote system B can be expected to take 40 milliseconds longer than usual.
  • the fifth column 410 and the sixth column 412 indicate an amount of time that is added to the upper bound under various processor loads of the corresponding remote system. Specifically, the fifth column 410 represents an additional amount of time that is added to the upper bound under a processor load A and the sixth column 412 represents the amount of time that is added to the upper bound under a processor load B.
  • Various metrics can be used to monitor a processor load. While FIG. 4 lists only two ranges of processor loads, practical embodiments may have more, narrower ranges of processor loads.
  • under processor load A for remote system A, a response to a message sent to remote system A is expected to take about 5 milliseconds longer than usual.
  • under processor load B for remote system A, a response to a message sent to remote system A is expected to take about 8 milliseconds longer than usual.
  • Other conditions that can affect the expected amount of time in which a response should be received may be used in accordance with principles described herein. For example, different types of messages that request different types of processing or different types of data can affect the time it takes for a response.
  • a system embodying principles described herein may factor in multiple conditions to determine a final time range in which a response is expected. The upper bound in such a time range may be used as described above. Specifically, if the thread checks back in and a response has not yet been received, and the upper bound amount of time has not elapsed, then the system can wait for an additional period of time until the upper bound period of time expires.
  • FIG. 5 is a graph showing an illustrative upper bound and lower bound for an expected response time as a function of a condition associated with the responding entity.
  • the vertical axis 502 represents an amount of time in which a response to a message is received.
  • the horizontal axis 504 represents a condition, such as network load or processor load.
  • Line 506 represents an upper bound of time and line 508 represents a lower bound.
  • as the condition increases, both the upper bound 506 and the lower bound 508 increase. For example, if the condition represents network load, the expected amount of time increases as the network load increases.
  • the upper bound 506 and the lower bound 508 may be derived from historical data collected for actual response times in the past.
  • the upper bound may be within a
  • FIG. 6 is a diagram showing an illustrative computing system 600 that may be used to perform asynchronous communication.
  • the computing system 600 may be the computing system (e.g. 102 , FIG. 1 ) that sends a message to a different system.
  • the computing system 600 may also be the remote system (e.g. 110 , FIG. 1 ) that receives a message and sends a response.
  • the computing system 600 includes a processor 602 , an input device 614 , a storage device 612 , a video controller 608 , a system memory 604 , a display 610 , and a communication device 606 , all of which are interconnected by one or more buses 616 .
  • the storage device 612 may include a computer readable medium that can store data.
  • the storage device 612 may include volatile memory storage devices such as Random Access Memory (RAM) as well as non-volatile memory storage devices such as solid state memory components.
  • the computer readable medium may be a non-transitory tangible medium.
  • the communication device 606 may include a modem, network card, or any other device to enable the computing system 600 to communicate with other computing devices.
  • any computing device represents a plurality of interconnected (whether by intranet or Internet) computer systems, including without limitation, personal computers, mainframes, PDAs, smartphones and cell phones.
  • a computing system such as the computing system 600 typically includes at least hardware capable of executing machine readable instructions, as well as the software for executing acts (typically machine-readable instructions) that produce a desired result.
  • a computing system may include hybrids of hardware and software, as well as computer sub-systems.
  • hardware generally includes at least processor-capable platforms, such as hand-held processing devices (such as smart phones, tablet computers, personal digital assistants (PDAs), or personal computing devices (PCDs)), for example.
  • hardware may include any physical device that is capable of storing machine-readable instructions, such as memory or other data storage devices.
  • other forms of hardware include hardware sub-systems, including transfer devices such as modems, modem cards, ports, and port cards, for example.
  • software includes any machine code stored in any memory medium, such as RAM or ROM, and machine code stored on other devices (such as floppy disks, flash memory, or a CD ROM, for example).
  • software may include source or object code.
  • software encompasses any set of instructions capable of being executed on a computing device such as, for example, on a client machine or server.
  • combinations of software and hardware could also be used for providing enhanced functionality and performance for certain embodiments of the present disclosure.
  • software functions may be directly manufactured into an integrated circuit. Accordingly, it should be understood that combinations of hardware and software are also included within the definition of a computer system and are thus envisioned by the present disclosure as possible equivalent structures and equivalent methods.
  • computer readable media include, for example, passive data storage, such as a random access memory (RAM), as well as semi-permanent data storage, such as a solid state drive.
  • data structures are defined organizations of data that may enable an embodiment of the present disclosure.
  • a data structure may provide an organization of data, or an organization of executable code.
  • a network and/or one or more portions thereof may be designed to work on any specific architecture.
  • one or more portions of the network may be executed on a single computer, local area networks, client-server networks, wide area networks, internets, hand-held and other portable and wireless devices and networks.
  • a database may be any standard or proprietary database software, such as Oracle, Microsoft Access, SyBase, or DBase II, for example.
  • the database may have fields, records, data, and other database elements that may be associated through database specific software.
  • data may be mapped.
  • mapping is the process of associating one data entry with another data entry.
  • the data contained in the location of a character file can be mapped to a field in a second table.
  • the physical location of the database is not limiting, and the database may be distributed.
  • the database may exist remotely from the server, and run on a separate platform.
  • the database may be accessible across the Internet. In several exemplary embodiments, more than one database may be implemented.
  • a computer program, such as a plurality of instructions stored on a computer readable medium, the system memory 604, and/or any combination thereof, may be executed by a processor 602 to cause the processor 602 to carry out or implement, in whole or in part, the operation of the computing system 600 and/or one or more of the methods.
  • a processor 602 may execute the plurality of instructions in connection with a virtual computer system.
  • processing systems described herein may include non-transitory, tangible, machine readable media that include executable code that when run by one or more processors (e.g., processor 602 ) may cause the one or more processors to perform the processes of methods as described above.
  • Some common forms of machine readable media that may include the processes of the methods are, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
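
The FIG. 4 discussion above can be read as a small lookup-and-add calculation. The following Python sketch is illustrative only. It assumes that the network-load and processor-load adjustments are simply summed onto the base upper bound (the text says each is "added to the upper bound" but does not spell out how several conditions combine), and it fills in only the table cells quoted in the text. The names `BASE_UPPER_MS`, `NETWORK_EXTRA_MS`, `PROCESSOR_EXTRA_MS`, and `expected_upper_bound_ms` are hypothetical.

```python
# Base upper bounds in milliseconds per destination, as quoted for FIG. 4.
BASE_UPPER_MS = {"A": 20, "B": 10}

# Extra milliseconds keyed by (destination, load range). Only the cells
# quoted in the text are present; the remaining cells are not reproduced.
NETWORK_EXTRA_MS = {("A", "A"): 10, ("B", "B"): 40}
# The 5 ms entry is taken to be processor load A (an assumption from context).
PROCESSOR_EXTRA_MS = {("A", "A"): 5, ("A", "B"): 8}


def expected_upper_bound_ms(dest: str, network_load: str, processor_load: str) -> int:
    """Sum the base bound and both adjustments (assumed additive).

    Raises KeyError when a requested cell is not in the tables above.
    """
    return (
        BASE_UPPER_MS[dest]
        + NETWORK_EXTRA_MS[(dest, network_load)]
        + PROCESSOR_EXTRA_MS[(dest, processor_load)]
    )
```

Under these assumptions, a message to remote system A sent under network load A and processor load A would get an upper bound of 20 + 10 + 5 = 35 milliseconds.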

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)
  • Data Mining & Analysis (AREA)

Abstract

A method performed by a computing system includes: executing a thread, the thread comprising an instruction to send a message to a remote system; after sending the message, allowing continued execution of the thread; after a first period of time, checking for a response to the message; and, in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response. The predetermined amount of time is based on collected data associated with a set of conditions that correspond to a current set of conditions related to the remote system.
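As a rough illustration of the "collected data" mentioned in the abstract, the Python sketch below records past response times under a key that describes the conditions and reports a range once enough samples exist. The class name `ResponseTimeHistory`, the `min_samples` threshold, and the use of the observed minimum and maximum as the range are all assumptions made for this example; the source does not specify how the range is derived.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


class ResponseTimeHistory:
    """Hypothetical store of response times grouped by condition key."""

    def __init__(self, min_samples: int = 3) -> None:
        # Number of samples required before a range is reported (assumed).
        self.min_samples = min_samples
        self._samples: Dict[Hashable, List[float]] = defaultdict(list)

    def record(self, conditions: Hashable, seconds: float) -> None:
        """Store one measured response time under the given conditions."""
        self._samples[conditions].append(seconds)

    def expected_range(self, conditions: Hashable) -> Optional[Tuple[float, float]]:
        """Return (lower, upper) in seconds, or None if data is too sparse."""
        samples = self._samples.get(conditions, [])
        if len(samples) < self.min_samples:
            return None
        return (min(samples), max(samples))
```

The upper value of the returned range could then serve as the predetermined amount of time for a new message sent under matching conditions.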

Description

BACKGROUND
The present disclosure relates generally to distributed computing systems, and more particularly to methods and systems for improving the efficiency of asynchronous communication between systems.
Individual computing elements of distributed systems often utilize asynchronous communication methods when communicating with each other. Asynchronous communication can provide a number of benefits over synchronous communication. For example, asynchronous systems may perform other tasks after sending a message and before receiving the response. In other words, asynchronous systems do not have to be idle while waiting for a response to a message.
One challenge that arises with use of asynchronous systems is that it is difficult to detect failures. If a response is not received within a certain period of time, it is unknowable as to whether the response is simply delayed due to network traffic or whether there will be no response due to the responding device crashing or otherwise being unavailable. A typical technique for handling this situation is to use timeouts. Timeouts work by assuming that a response will not be received if it is not received during a specified amount of time. While use of timeouts can be helpful, it is desirable to find ways to improve the manner in which asynchronous systems communicate.
SUMMARY
A method performed by a computing system includes, executing a thread, the thread comprising an instruction to send a message to a remote system, after sending the message, allowing continued execution of the thread, after a first period of time, checking for a response to the message, and in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response. The predetermined amount of time is based on collected data associated with a set of conditions that correspond to a current set of conditions related to the remote system.
A method performed by a computing system includes, with the computing system, monitoring response times for a plurality of messages sent from the computing system to a remote system, the response times corresponding to an amount of time between sending that message and receiving a response to that message, associating with each of the plurality of messages, a set of conditions under which that message was sent, determining an expected time range for messages sent to the remote system, the expected time range being a function of the conditions, executing a thread, the thread comprising an instruction to send a first message to the remote system, after sending the first message, continuing execution of the thread, after a first period of time, checking for a response to the message, and in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response. The predetermined amount of time is based on the expected time range based on a current set of conditions.
A system includes a processor and a memory, the memory comprising machine readable instructions that when executed by the processor, cause the system to execute a thread, the thread comprising an instruction to send a message to a remote system, after sending the message, continuing execution of the thread, after a first period of time, checking for a response to the message, and in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response. The predetermined amount of time is based on collected data associated with a set of conditions that correspond to a current set of conditions related to the remote system.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram showing an illustrative computing system that communicates with a remote system, according to one example of principles described herein.
FIG. 2 is a flowchart showing an illustrative method for using dynamic asynchronous communication management, according to one example of principles described herein.
FIGS. 3A-3C are diagrams showing illustrative timing for reception of responses, according to one example of principles described herein.
FIG. 4 is a table showing an illustrative collection of data used to estimate response times, according to one example of principles described herein.
FIG. 5 is a graph showing an illustrative upper bound and lower bound for an expected response time as a function of a condition associated with the responding entity, according to one example of principles described herein.
FIG. 6 is a diagram showing an illustrative computing system that may be used to perform asynchronous communication, according to one example of principles described herein.
In the figures, elements having the same designations have the same or similar functions.
DETAILED DESCRIPTION
In the following description, specific details are set forth describing some embodiments consistent with the present disclosure. It will be apparent, however, to one skilled in the art that some embodiments may be practiced without some or all of these specific details. The specific embodiments disclosed herein are meant to be illustrative but not limiting. One skilled in the art may realize other elements that, although not specifically described here, are within the scope and the spirit of this disclosure. In addition, to avoid unnecessary repetition, one or more features shown and described in association with one embodiment may be incorporated into other embodiments unless specifically described otherwise or if the one or more features would make an embodiment non-functional.
As described above, it is desirable to find ways to improve the manner in which asynchronous systems communicate. Typically, in an asynchronous system, when a computing system in a distributed computing environment sends a message to another system, the thread that sends that message is not blocked. Thus, the thread continues to execute while it waits for a response to the message. At some later point in time, the thread checks to see if there has been an answer to the message. If not, then the application assumes that there will be no response because it can't wait forever for the response and there is no good mechanism to determine whether a response will eventually come. But, it may be that network traffic is heavy and so the response will take just a few seconds longer. Thus, an application that assumes failure prematurely and resends the message may cause errors due to sending the same message multiple times.
According to principles described herein, a method for asynchronous communication management includes a dynamic approach that gives applications more information regarding how long a response should take based on accumulated historical data. Specifically, when a thread being processed by a computing system sends a message to another system, that thread is not blocked, and thus is allowed to continue execution. At some point in time, the thread checks to see if a response to that message has been received. If so, then execution continues as normal. But, if not, instead of assuming that a response will not come, the thread utilizes the historical data to determine an expected response time. If the expected response time has not yet elapsed, then the thread can wait an additional amount of time in accordance with the expected response time.
The historical data takes into account a variety of factors. Such factors may include the type of message and the type of service to which the message is being sent. The factors may also include the current network conditions or load on server systems that process the message and send the response. The historical data may thus provide a time range in which the response to a message is expected to be received under similar conditions.
FIG. 1 is a diagram showing an illustrative computing system 102 that communicates with a remote system 110. According to the present example, the computing system 102 executes a thread 104 that sends a message 108 to the remote system 110. At some later point in time, the computing system 102 receives a response 112 from the remote system 110.
In one example, the computing system 102 may be a node within a distributed computing system. In such a case, the remote system 110 may also be a node within the distributed computing system. The computing system 102 and the remote system 110 may each include one or more processors and a memory such as the system that will be described in more detail below in the text accompanying FIG. 6. In one example, the computing system 102 is a client system that communicates with a remote service. In such case, the remote system 110 may be a server such as a web server. The principles described herein may apply to other situations in which one computing system communicates with another in an asynchronous manner.
A thread 104 is a unit of program instructions that is scheduled for execution on a processor. Many processors have multithread capability and can thus execute multiple threads by time division multiplexing. Additionally, as some threads become blocked, the processor can execute other threads until the blocked thread becomes unblocked.
In the present example, the thread 104 includes an instruction 106 for sending a message 108 to the remote system 110. The message 108 may be a request for services. For example, the message 108 may include various parameters and other information that is to be processed by the remote system 110. The remote system 110 then returns the requested information as a response 112 to the requesting computing system 102.
After the message 108 is sent, the thread 104 is not blocked, and thus is allowed to continue execution. While the processor may switch to other threads and back, the thread 104 can continue execution as normal because it is not blocked. At some point in time, the thread 104 calls into the communication infrastructure to determine if a response has been received. Conventionally, if a response has not yet been received, the thread 104 assumes failure and acts accordingly, which in some cases may involve resending the message 108. But, according to principles described herein, the computing system 102 has access to a set of collected data 120 related to past response times for similar responses under similar conditions. Thus, if the expected time range has not yet elapsed, the computing system 102 can wait 118 for an additional period of time before assuming failure.
In one example, the computing system 102 is responsible for collecting the data 120. This may be done by monitoring the response times for multiple messages sent to various remote systems. Additionally, the computing system 102 determines the conditions under which each message is sent, such as network load, processor load of the remote system, and type of response. After a statistically relevant number of response times have been collected for various conditions, the computing system 102 can form an expected time range that is a function of a specified set of conditions. Thus, when the computing system 102 sends a new message, it can check the current conditions and consult the collected data to determine an expected response time based on the current conditions. The computing system 102 can also update the collected data by measuring the response time for the new message.
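The data collection described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the class name `ResponseTimeHistory`, the `min_samples` threshold standing in for "a statistically relevant number", and the use of the observed minimum and maximum as the expected time range are all assumptions made for the example.

```python
from collections import defaultdict


class ResponseTimeHistory:
    """Hypothetical store of measured response times, keyed by the
    destination and the set of conditions under which a message was sent."""

    def __init__(self, min_samples=5):
        # Assumed stand-in for "a statistically relevant number" of samples.
        self.min_samples = min_samples
        self._samples = defaultdict(list)

    def record(self, destination, conditions, response_time):
        """Store one measured response time (send-to-response interval)."""
        self._samples[(destination, conditions)].append(response_time)

    def expected_range(self, destination, conditions):
        """Return (lower_bound, upper_bound) for these conditions, or None
        if too few samples have been collected so far."""
        samples = self._samples.get((destination, conditions), [])
        if len(samples) < self.min_samples:
            return None
        # Assumption: the range is simply the observed minimum and maximum.
        return (min(samples), max(samples))
```

Here `conditions` is expected to be a hashable value such as a tuple of labels, for example `("network_load_A", "processor_load_A")`. Calling `record` again after each new message corresponds to updating the collected data.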
FIG. 2 is a flowchart showing an illustrative method 200 for using dynamic asynchronous communication management. According to the present example, the method includes a step 202 for sending a message to a remote system. As described above, the thread that includes the send instruction is not blocked, and thus is allowed to continue executing. At step 204, it is determined whether a first period of time has elapsed. The first period of time may vary depending on the application. In one example, the first period of time is a standard timeout mechanism used in asynchronous systems. In one example, the first period of time represents a point at which the thread is not able to proceed without the response.
If the first period of time has not elapsed, then the thread continues unblocked and does not yet check for a response to the message. But, if the first period of time has elapsed, the thread calls into the communication infrastructure to see if a response to the sent message has been received at step 206. If the response has in fact arrived, then there is no issue. The thread can continue as normal at step 214. The thread may even complete if no further responses are needed for the thread and the thread finishes executing its instructions. But, if the response has not yet been received, then the method proceeds to step 208.
According to principles described herein, at step 208, if the response has not yet been received, then it is determined whether a second period of time has passed. The second period of time corresponds to an upper bound of a time range in which the response is expected to arrive. The expected response time is based on historical data collected under conditions that are similar to the current conditions. For example, historical data can be collected that indicates that, under a certain network load and a certain server load, this type of request typically receives a response within 3-4 seconds.
If the second period has not yet elapsed, which in this example is four seconds, then the system will wait as indicated in step 216. During this waiting period, the system may respond in different ways based on the application. For example, if the thread cannot proceed without the response, then the thread can be blocked during this waiting period. If, however, the thread has other instructions that can be performed without waiting on the response, then the thread may be allowed to continue. During this time, the processor may switch to other threads as needed.
If the second period has elapsed, it is determined whether the response has been received at step 210. If the response has not been received, then it can be assumed that the response will not come, at step 212. The application associated with the thread can then handle the situation accordingly. If, however, the response has been received, then the method 200 proceeds to step 214 where execution of the thread continues as normal.
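The flow of steps 204 through 216 can be sketched as below. This is a simplified sketch under stated assumptions rather than the disclosed implementation: the function name `await_response`, the `check_response` callable (standing in for the call into the communication infrastructure, returning `None` while no response has arrived), and the injectable `now` and `sleep` parameters are hypothetical. The sketch also blocks during both waiting periods, whereas the description allows the thread to continue other work.

```python
import time


def await_response(check_response, sent_at, first_period, upper_bound,
                   now=time.monotonic, sleep=time.sleep):
    """Return the response, or None if failure is assumed (step 212).

    All times are in seconds; sent_at is the clock reading at send time.
    """
    # Step 204: wait until the first period of time has elapsed.
    remaining = first_period - (now() - sent_at)
    if remaining > 0:
        sleep(remaining)

    # Step 206: check whether a response has been received.
    response = check_response()
    if response is not None:
        return response  # Step 214: continue as normal.

    # Steps 208 and 216: if the second period (the upper bound of the
    # expected time range) has not elapsed, wait for the additional time.
    elapsed = now() - sent_at
    if elapsed < upper_bound:
        sleep(upper_bound - elapsed)

    # Step 210: final check; a None result corresponds to step 212.
    return check_response()
```

With a first period of 2 seconds and an upper bound of 4 seconds, a response that arrives 3.5 seconds after sending is returned by the final check instead of being treated as a failure at the 2 second mark.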
FIGS. 3A-3C are diagrams showing illustrative timing for reception of responses. FIG. 3A illustrates the case where the response is received after the first period of time but before the upper bound of the expected time. According to the present example, the vertical axis represents time. Point 302 represents the point at which a message is sent to a remote system. Point 304 represents the end of the first period of time. As described above, the first period of time is the normal time at which the application checks to see if a response has been received, as done conventionally without the use of principles described herein. Point 306 represents the point in time at which a response to the sent message is received.
As described above, according to principles described herein, there is an expected range 308 at which a response should be received. To determine this range, the computing system determines the current conditions, such as network load, processor load (of the remote system to which a message is sent), and type of message. Other conditions can be considered as well. Using the current conditions, the system consults historical data that has been collected on similar responses under similar conditions. The historical data can provide an expected range based on such conditions.
In the present example, the first period of time elapses before the response is received. Thus, while a conventional system would assume failure at this point, the system that embodies principles described herein knows that the response may take longer based on current network conditions. Thus, the system waits for an additional period of time 310. The additional period of time brings the total time to the upper bound of the expected range 308. In this example, the response is received before the additional period of time 310 expires. Thus, because the system waited for this additional period of time, which is based on historical data, the system did not assume failure and the response was received as desired.
FIG. 3B is an example of a case where the response is received after the additional time expires. Such a situation may occur due to a spike in network traffic or a spike in processor load. In such case, the response still comes too late and the system has already assumed failure, even after waiting for the additional period of time. Because a system cannot know if a late response will ever arrive, the system has to assume at some point that a response will not come and continue accordingly.
FIG. 3C illustrates an example in which the response 306 is received before the first period of time 304 elapses. In such a case, the system does not have to wait for the additional period of time 310 because the response has already been received. Thus, the system can continue as normal.
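The three timing cases of FIGS. 3A-3C can be summarized in a short sketch. The function name `classify_arrival` and the returned labels are hypothetical, and the treatment of an arrival exactly on a boundary is an assumption, since the description does not address it. All times are measured from the point 302 at which the message is sent.

```python
def classify_arrival(arrival, first_period, upper_bound):
    """Map a response arrival time to the corresponding figure.

    arrival, first_period and upper_bound share the same time unit and
    are all measured from the moment the message was sent.
    """
    if arrival <= first_period:
        # FIG. 3C: the response is already there at the first check.
        return "3C"
    if arrival <= upper_bound:
        # FIG. 3A: the response arrives during the additional wait.
        return "3A"
    # FIG. 3B: the response arrives after failure was already assumed.
    return "3B"
```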
FIG. 4 is a table showing an illustrative collection of data used to estimate response times. The collected data may be organized and utilized in a variety of different ways. According to the present example, the first column 402 corresponds to specific destinations to which messages have been sent in the past and for which the data has been collected. The second column 404 indicates an upper bound of a time range in which a response from the corresponding system should be received under normal circumstances. For example, a response to a message sent to remote system A should not take more than about 20 milliseconds. Likewise, a response to a message sent to remote system B should not take more than 10 milliseconds.
The third column 406 and the fourth column 408 indicate an amount of time that is added to the upper bound of the second column 404 under various network load conditions. Specifically, the third column 406 represents network load A and the fourth column 408 represents network load B. Various metrics may be used to measure a network load, such as how many packets are being transmitted within a specific time range. Network load A represents a first range of network loads and network load B represents a second range of network loads. While FIG. 4 lists only two ranges of network loads, practical embodiments may have more, narrower ranges of network loads. In the present example, under network load A, a response to a message sent to remote system A can be expected to take 10 milliseconds longer than usual. Under network load B, a response to a message sent to remote system B can be expected to take 40 milliseconds longer than usual.
The fifth column 410 and the sixth column 412 indicate an amount of time that is added to the upper bound under various processor loads of the corresponding remote system. Specifically, the fifth column 410 represents an additional amount of time that is added to the upper bound under a processor load A and the sixth column 412 represents the amount of time that is added to the upper bound under a processor load B. Various metrics can be used to monitor a processor load. While FIG. 4 lists only two ranges of processor loads, practical embodiments may have more, narrower ranges of processor loads. In the present example, under processor load A for remote system A, a response to a message sent to remote system A is expected to take about 5 milliseconds longer than usual. Likewise, under processor load B for remote system A, a response to a message sent to remote system A is expected to take about 8 milliseconds longer than normal.
Other conditions that can affect the expected amount of time in which a response should be received may be used in accordance with principles described herein. For example, different types of messages that request different types of processing or different types of data can affect the time it takes for a response. A system embodying principles described herein may factor in multiple conditions to determine a final time range in which a response is expected. The upper bound in such a time range may be used as described above. Specifically, if the thread checks back in and a response has not yet been received, and the upper bound amount of time has not elapsed, then the system can wait for an additional period of time until the upper bound period of time expires.
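As a worked illustration of the table of FIG. 4, the sketch below computes an upper bound from the values stated in the text. It encodes only the cells the description mentions; other cells of the table are not given here and are deliberately left out. The dictionary and function names are hypothetical, and summing the network and processor adjustments when both apply is an assumption about how multiple conditions are combined.

```python
# Column 404: upper bound under normal circumstances, in milliseconds.
BASE_UPPER_BOUND_MS = {"remote_A": 20, "remote_B": 10}

# Columns 406/408: time added under a given network load (stated cells only).
NETWORK_ADJUST_MS = {
    ("remote_A", "network_load_A"): 10,
    ("remote_B", "network_load_B"): 40,
}

# Columns 410/412: time added under a given processor load (stated cells only).
PROCESSOR_ADJUST_MS = {
    ("remote_A", "processor_load_A"): 5,
    ("remote_A", "processor_load_B"): 8,
}


def upper_bound_ms(destination, network_load=None, processor_load=None):
    """Return the adjusted upper bound for a destination, in milliseconds.

    Raises KeyError for any table cell that the text does not state.
    """
    bound = BASE_UPPER_BOUND_MS[destination]
    if network_load is not None:
        bound += NETWORK_ADJUST_MS[(destination, network_load)]
    if processor_load is not None:
        # Assumption: adjustments for multiple conditions are summed.
        bound += PROCESSOR_ADJUST_MS[(destination, processor_load)]
    return bound
```

Under these assumptions, a message to remote system A under network load A and processor load B would have an upper bound of 20 + 10 + 8 = 38 milliseconds, and a message to remote system B under network load B would have an upper bound of 10 + 40 = 50 milliseconds.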
FIG. 5 is a graph showing an illustrative upper bound and lower bound for an expected response time as a function of a condition associated with the responding entity. According to the present example, the vertical axis 502 represents an amount of time in which a response to a message is received. The horizontal axis 504 represents a condition, such as network load or processor load. Line 506 represents an upper bound of time and line 508 represents a lower bound.
According to the present example, as the condition grows in intensity, both the upper bound 506 and the lower bound 508 increase. For example, in the case that the condition represents network load, as the network load increases, the expected amount of time increases. The upper bound 506 and the lower bound 508 may be derived from historical data collected for actual response times in the past. For example, the upper bound may be within a,
FIG. 6 is a diagram showing an illustrative computing system 600 that may be used to perform asynchronous communication. According to the present example, the computing system 600 may be the computing system (e.g. 102, FIG. 1) that sends a message to a different system. The computing system 600 may also be the remote system (e.g. 110, FIG. 1) that receives a message and sends a response.
According to the present example, the computing system 600 includes a processor 602, an input device 614, a storage device 612, a video controller 608, a system memory 604, a display 610, and a communication device 606, all of which are interconnected by one or more buses 616.
The storage device 612 may include a computer readable medium that can store data. The storage device 612 may include volatile memory storage devices such as Random Access Memory (RAM) as well as non-volatile memory storage devices such as solid state memory components. The computer readable medium may be a non-transitory tangible medium.
In some examples, the communication device 606 may include a modem, network card, or any other device to enable the computing system 600 to communicate with other computing devices. In some examples, any computing device represents a plurality of interconnected (whether by intranet or Internet) computer systems, including without limitation, personal computers, mainframes, PDAs, smartphones and cell phones.
A computing system such as the computing system 600 typically includes at least hardware capable of executing machine readable instructions, as well as the software for executing acts (typically machine-readable instructions) that produce a desired result. In some examples, a computing system may include hybrids of hardware and software, as well as computer sub-systems.
In some examples, hardware generally includes at least processor-capable platforms, such as hand-held processing devices (such as smart phones, tablet computers, personal digital assistants (PDAs), or personal computing devices (PCDs), for example). In some examples, hardware may include any physical device that is capable of storing machine-readable instructions, such as memory or other data storage devices. In some examples, other forms of hardware include hardware sub-systems, including transfer devices such as modems, modem cards, ports, and port cards, for example.
In some examples, software includes any machine code stored in any memory medium, such as RAM or ROM, and machine code stored on other devices (such as floppy disks, flash memory, or a CD ROM, for example). In some examples, software may include source or object code. In several exemplary embodiments, software encompasses any set of instructions capable of being executed on a computing device such as, for example, on a client machine or server.
In some examples, combinations of software and hardware could also be used for providing enhanced functionality and performance for certain embodiments of the present disclosure. In some examples, software functions may be directly manufactured into an integrated circuit. Accordingly, it should be understood that combinations of hardware and software are also included within the definition of a computer system and are thus envisioned by the present disclosure as possible equivalent structures and equivalent methods.
In some examples, computer readable mediums include, for example, passive data storage, such as a random access memory (RAM) as well as semi-permanent data storage such as a solid state drive. One or more exemplary embodiments of the present disclosure may be embodied in the RAM of a computing device to transform a standard computer into a new specific computing machine. In some examples, data structures are defined organizations of data that may enable an embodiment of the present disclosure. In an exemplary embodiment, a data structure may provide an organization of data, or an organization of executable code.
In some examples, a network and/or one or more portions thereof, may be designed to work on any specific architecture. In some examples, one or more portions of the network may be executed on a single computer, local area networks, client-server networks, wide area networks, internets, hand-held and other portable and wireless devices and networks.
In some examples, a database may be any standard or proprietary database software, such as Oracle, Microsoft Access, SyBase, or DBase II, for example. The database may have fields, records, data, and other database elements that may be associated through database specific software. In several exemplary embodiments, data may be mapped. In some examples, mapping is the process of associating one data entry with another data entry. In an exemplary embodiment, the data contained in the location of a character file can be mapped to a field in a second table. In some examples, the physical location of the database is not limiting, and the database may be distributed. In some examples, the database may exist remotely from the server, and run on a separate platform. In some examples, the database may be accessible across the Internet. In several exemplary embodiments, more than one database may be implemented.
In some examples, a computer program, such as a plurality of instructions stored on a computer readable medium, such as the computer readable medium, the system memory 604, and/or any combination thereof, may be executed by a processor 602 to cause the processor 602 to carry out or implement in whole or in part the operation of the computing system 600 and/or one or more of the methods. In some examples, such a processor 602 may execute the plurality of instructions in connection with a virtual computer system.
Some examples of processing systems described herein may include non-transitory, tangible, machine readable media that include executable code that when run by one or more processors (e.g., processor 602) may cause the one or more processors to perform the processes of methods as described above. Some common forms of machine readable media that may include the processes of methods are, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
Although illustrative embodiments have been shown and described, a wide range of modification, change and substitution is contemplated in the foregoing disclosure and in some instances, some features of the embodiments may be employed without a corresponding use of other features. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Thus, the scope of the invention should be limited only by the following claims, and it is appropriate that the claims be construed broadly and in a manner consistent with the scope of the embodiments disclosed herein.

Claims (18)

What is claimed is:
1. A method performed by a computing system, the method comprising:
executing a thread, the thread comprising an instruction to send a message to a remote system under a current set of conditions;
after sending the message, allowing continued execution of the thread;
after a first period of time, checking for a response to the message;
in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response;
for each of a plurality of messages sent to the remote system, measuring a response time corresponding to an amount of time between sending that message and receiving a response to that message; and
associating with each message, a set of conditions under which that message was sent;
wherein the predetermined amount of time is based on collected data associated with a set of conditions that correspond to the current set of conditions.
2. The method of claim 1, further comprising, based on the response time and the conditions associated with each message, determining a time range corresponding to an expected amount of time in which a response to that message should take to be received.
3. The method of claim 2, wherein an upper bound of the time range corresponds to the predetermined amount of time.
4. The method of claim 1, further comprising, updating the collected data based on a measured amount of time between sending the message to the remote system and receiving a response from the remote system.
5. The method of claim 1, wherein the conditions comprise at least one of: a network load of a network connecting the computing system to the remote system, a processing load of the remote system, and a type of the message.
6. The method of claim 1, wherein waiting for the additional period of time comprises:
allowing continued execution of the thread; and
checking, after the additional period of time, for a response to the message.
7. The method of claim 1, wherein waiting for the additional period of time comprises:
blocking the thread; and
checking, after the additional period of time, for a response to the message.
8. The method of claim 1, further comprising, in response to determining that after the additional period of time, a response has not been received, assuming the response will not be received.
9. A method performed by a computing system, the method comprising:
with the computing system, monitoring response times for a plurality of messages sent from the computing system to a remote system, the response times corresponding to an amount of time between sending that message and receiving a response to that message;
associating with each of the plurality of messages, a set of conditions under which that message was sent;
determining an expected time range for messages sent to the remote system, the expected time range being a function of the conditions;
executing a thread, the thread comprising an instruction to send a first message to the remote system under a current set of conditions;
after sending the first message, continuing execution of the thread;
after a first period of time, checking for a response to the message; and
in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response;
wherein the predetermined amount of time is based on the expected time range based on the current set of conditions.
10. The method of claim 9, wherein the predetermined amount of time comprises an upper bound of the expected time range.
11. The method of claim 1, further comprising, updating the expected time range based on a measured amount of time between sending the message to the remote system and receiving a response from the remote system.
12. The method of claim 1, wherein the conditions comprise at least one of:
a network load of a network connecting the computing system to the remote system, a processing load of the remote system, and a type of the message.
13. The method of claim 1, wherein waiting for the additional period of time comprises:
allowing continued execution of the thread; and
checking, after the additional period of time, for a response to the message.
14. The method of claim 1, wherein waiting for the additional period of time comprises:
blocking the thread; and
checking, after the additional period of time, for a response to the message.
15. The method of claim 1, further comprising, in response to determining that after the additional period of time, a response has not been received, assuming the response will not be received.
16. A system comprising:
a processor; and
a memory, the memory comprising machine readable instructions that when executed by the processor, cause the system to:
execute a thread, the thread comprising an instruction to send a message to a remote system under a current set of conditions;
after sending the message, continuing execution of the thread;
after a first period of time, checking for a response to the message; and
in response to determining that the response has not been received and that the first period of time is less than a predetermined amount of time, waiting for an additional period of time for the response;
for each of a plurality of messages sent to the remote system, measure a response time corresponding to an amount of time between sending that message and receiving a response to that message; and
associate with each message, a set of conditions under which that message was sent;
wherein the predetermined amount of time is based on collected data associated with a set of conditions that correspond to the current set of conditions.
17. The system of claim 16, wherein the collected data comprises historical response times for messages sent from the system to the remote system.
18. The system of claim 16, wherein the conditions comprise at least one of: a network load of a network connecting the computing system to the remote system, a processing load of the remote system, and a type of the message.
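The check-then-wait behavior recited in the claims above can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `ResponseTimeHistory`, `send_and_check`, `continue_work`, and the tuple used to represent a set of conditions are hypothetical, and taking the minimum and maximum of past samples as the expected time range is only one possible choice. The sketch keeps historical response times keyed by conditions, sends a message without blocking, lets the thread continue, and waits an additional period only if the time elapsed at the first check is still below the upper bound of the expected range.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait


class ResponseTimeHistory:
    """Collected response times, keyed by the conditions a message was sent under."""

    def __init__(self, default_upper_bound=1.0):
        self._samples = defaultdict(list)
        # Assumption: fallback upper bound (seconds) when no data exists yet.
        self._default_upper_bound = default_upper_bound

    def record(self, conditions, seconds):
        self._samples[conditions].append(seconds)

    def expected_range(self, conditions):
        """Return (lower, upper) expected response time for these conditions."""
        samples = self._samples.get(conditions)
        if not samples:
            return (0.0, self._default_upper_bound)
        return (min(samples), max(samples))


def send_and_check(executor, send, message, conditions, history,
                   continue_work, additional_period):
    """Send asynchronously; return the response, or None if it is assumed lost."""
    # The predetermined amount of time is the upper bound of the expected range.
    _, upper_bound = history.expected_range(conditions)
    sent_at = time.monotonic()
    future = executor.submit(send, message)  # asynchronous send

    def record_response_time(done_future):
        # Update the collected data with the measured response time.
        if done_future.exception() is None:
            history.record(conditions, time.monotonic() - sent_at)

    future.add_done_callback(record_response_time)

    continue_work()  # the thread keeps executing after the send
    first_period = time.monotonic() - sent_at

    # First check: wait longer only if a response is still plausibly on its way.
    if not future.done() and first_period < upper_bound:
        wait([future], timeout=additional_period)  # blocking variant of the wait

    if future.done():
        return future.result()
    return None  # assume the response will not be received
```

Here `wait` blocks the calling thread for the additional period, which corresponds to the blocking variant of the dependent claims. The non-blocking variant would instead call `continue_work` again and re-check `future.done()` afterwards.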
US14/619,389 2015-02-11 2015-02-11 Dynamic asynchronous communication management Active 2036-04-25 US10116543B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/619,389 US10116543B2 (en) 2015-02-11 2015-02-11 Dynamic asynchronous communication management
US16/173,402 US11271839B2 (en) 2015-02-11 2018-10-29 Dynamic asynchronous communication management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/619,389 US10116543B2 (en) 2015-02-11 2015-02-11 Dynamic asynchronous communication management

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/173,402 Continuation US11271839B2 (en) 2015-02-11 2018-10-29 Dynamic asynchronous communication management

Publications (2)

Publication Number Publication Date
US20160234090A1 (en) 2016-08-11
US10116543B2 (en) 2018-10-30

Family

ID=56566215

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/619,389 Active 2036-04-25 US10116543B2 (en) 2015-02-11 2015-02-11 Dynamic asynchronous communication management
US16/173,402 Active US11271839B2 (en) 2015-02-11 2018-10-29 Dynamic asynchronous communication management

Country Status (1)

Country Link
US (2) US10116543B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114584500B (en) * 2022-02-25 2024-03-22 网易(杭州)网络有限公司 Asynchronous communication testing method and device and electronic equipment

Citations (11)

Publication number Priority date Publication date Assignee Title
US5812768A (en) 1992-10-30 1998-09-22 Software Ag System for allocating adaptor to server by determining from embedded foreign protocol commands in client request if the adapter service matches the foreign protocol
US6026424A (en) * 1998-02-23 2000-02-15 Hewlett-Packard Company Method and apparatus for switching long duration tasks from synchronous to asynchronous execution and for reporting task results
US6496850B1 (en) 1999-08-31 2002-12-17 Accenture Llp Clean-up of orphaned server contexts
US20030061279A1 (en) 2001-05-15 2003-03-27 Scot Llewellyn Application serving apparatus and method
US20060064481A1 (en) 2004-09-17 2006-03-23 Anthony Baron Methods for service monitoring and control
US20080037880A1 (en) * 2006-08-11 2008-02-14 Lcj Enterprises Llc Scalable, progressive image compression and archiving system over a low bit rate internet protocol network
US7493394B2 (en) 2003-12-31 2009-02-17 Cisco Technology, Inc. Dynamic timeout in a client-server system
US20120297216A1 (en) * 2011-05-19 2012-11-22 International Business Machines Corporation Dynamically selecting active polling or timed waits
US20140214745A1 (en) 2013-01-28 2014-07-31 Rackspace Us, Inc. Methods and Systems of Predictive Monitoring of Objects in a Distributed Network System
US8881169B2 (en) * 2007-03-20 2014-11-04 Fujitsu Mobile Communications Limited Information processing apparatus for monitoring event delivery from plurality of monitoring threads
US9007889B2 (en) * 2009-12-09 2015-04-14 Kabushiki Kaisha Toshiba Communication device and communication system with failure detection capabilities

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US5913060A (en) * 1995-10-11 1999-06-15 Citrix Systems, Inc. Method for deadlock avoidance in a distributed process system using a synchronous procedure call
US8447859B2 (en) * 2007-12-28 2013-05-21 International Business Machines Corporation Adaptive business resiliency computer system for information technology environments
US9286185B2 (en) * 2012-11-16 2016-03-15 Empire Technology Development Llc Monitoring a performance of a computing device

Non-Patent Citations (8)

Title
"Libxmlrpc_client++", http://xmlrpc-c.sourceforge.net/doc/libxmlrpc_client++.html.
Caniou et al., "High Performance Gridrpc Middleware", 2006, Universite de Lyon, LIP, CNRS-ENS-Lyon-UCBL-INRIA, France; National Institute of Advanced Science and Technology, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan ; Electrical Engineering and Computer Science Department, University of Tennessee, Knoxville, TN, USA, http://icl.cs.utk.edu/projectsfiles/netsolve/pubs/gridrpc.pdf.
Fischer et al., "Impossibility of Distributed Consensus with One Faulty Process", Apr. 1985, Journal of the Association for Computing Machinery, http://cs-www.cs.yale.edu/homes/arvind/cs425/doc/fischer.pdf.
Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Oct. 1977, http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf.
Little et al., "The University Student Registration System: A Case Study in Building a High-Availability Distributed Application Using General Purpose Components", 1999, Proceeding Advances in Distributed Systems, Advanced Distributed Computing: From Algorithms to Systems pp. 453-471, http://dl.acm.org/citation.cfm?id=726895.
Nightingale et al., "Speculative Execution in a Distributed File System", Oct. 23-26, 2005, Department of Electrical Engineering and Computer Science University of Michigan, http://www.cs.berkeley.edu/~brewer/cs262/speculator-nightingale.pdf.
Panzieri et al., "Rajdoot: A Remote Procedure Call Mechanism Supporting Orphan Detection and Killing", http://csis.pace.edu/~marchese/CS865/Papers/panzieri_.pdf.
Prashant Shenoy, "CMPSCI 677 Operating Systems", Feb. 20, 2013, http://lass.cs.umass.edu/~shenoy/courses/spring13/lectures/notes/677_lec09.pdf.

Also Published As

Publication number Publication date
US20160234090A1 (en) 2016-08-11
US20190068471A1 (en) 2019-02-28
US11271839B2 (en) 2022-03-08

Legal Events

Date Code Title Description
AS Assignment

Owner name: RED HAT, INC., NORTH CAROLINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LITTLE, MARK CAMERON;REEL/FRAME:034939/0594

Effective date: 20150207

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4