US20220286373A1 - Scalable real time metrics management - Google Patents

Scalable real time metrics management

Info

Publication number
US20220286373A1
US20220286373A1 (application US17/700,037)
Authority
US
United States
Prior art keywords
aggregated results
metrics
network elements
rate
analyzing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/700,037
Inventor
Ranganathan Rajagopalan
Gaurav Rastogi
Praveen Yalagandula
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VMware LLC
Original Assignee
VMware LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by VMware LLC filed Critical VMware LLC
Priority to US17/700,037 (Critical)
Publication of US20220286373A1 (Critical)
Assigned to VMware LLC (change of name from VMWARE, INC.; see document for details)
Legal status: Pending

Classifications

    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • G06F16/2477 Temporal data queries (special types of queries, e.g. statistical or distributed queries)
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • H04L43/20 Arrangements for monitoring or testing data switching networks in which the monitoring system or the monitored elements are virtualised, abstracted or software-defined entities, e.g. SDN or NFV
    • H04L43/067 Generation of reports using time frame reporting

Definitions

  • Metrics are used by computer systems to quantify the measurement of system performance. Metrics are critical for analyzing systems' operations and providing feedback for improvements.
  • In modern computer systems, the quantity of metrics can be large. For example, suppose that a single cloud application collects 1000 metrics for analysis every 5 seconds, which means that 720,000 metrics are collected every hour. In a typical high scale environment such as an enterprise data center that supports thousands of applications, each executing on multiple servers, the rate can be on the order of billions of metrics per hour.
  • FIG. 1 is a functional diagram illustrating a programmed computer system for managing metrics in accordance with some embodiments.
  • FIG. 2 is a block diagram illustrating an embodiment of a data center that includes a scalable distributed metrics manager.
  • FIG. 3A is a block diagram illustrating an embodiment of a metrics pipeline in a scalable distributed metrics manager.
  • FIG. 3B is a diagram illustrating an embodiment of a metric data structure.
  • FIG. 3C is a diagram illustrating an embodiment of a metrics message.
  • FIG. 4 is a flowchart illustrating an embodiment of a process for managing metrics.
  • FIGS. 5A-5B are diagrams illustrating an embodiment of an approach for archiving the aggregated results.
  • FIGS. 6A-6B are diagrams illustrating another embodiment of an approach for archiving the aggregated results.
  • FIG. 7 is a flowchart illustrating an embodiment of a process for querying metrics data stored in a database.
  • In some embodiments, the metrics are managed and processed in a pipeline comprising multiple stages.
  • A plurality of performance metrics associated with a plurality of sources on a network is obtained.
  • The plurality of performance metrics is aggregated at a first rate to generate a plurality of first aggregated results, and at least some of the plurality of first aggregated results are maintained for a time in one or more memories.
  • The plurality of first aggregated results is aggregated at a second rate to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate. At least some of the plurality of second aggregated results are maintained in the one or more memories. Additional aggregation stages can be used.
  • The aggregated results can be persisted to a persistent storage. A minimal sketch of this staged rollup appears below.
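  • The patent does not prescribe an implementation for the staged rollup; the following Python sketch illustrates the idea. All names (Stage, aggregate, etc.) and the choice of aggregation functions are assumptions, and the 5-second/5-minute/1-hour rates are the example rates used later in this document.

    import statistics

    class Stage:
        """One pipeline stage: buffers inputs in memory and aggregates
        them at its own rate, feeding results to a slower next stage."""

        def __init__(self, period_s, next_stage=None):
            self.period_s = period_s    # how often this stage aggregates
            self.buffer = []            # in-memory inputs awaiting aggregation
            self.next_stage = next_stage

        def add(self, value):
            self.buffer.append(value)

        def aggregate(self):
            """Invoked every period_s seconds by a scheduler (not shown)."""
            if not self.buffer:
                return None
            result = {
                "avg": statistics.fmean(self.buffer),
                "min": min(self.buffer),
                "max": max(self.buffer),
                "count": len(self.buffer),
            }
            self.buffer.clear()         # inputs can now be dropped from memory
            if self.next_stage is not None:
                self.next_stage.add(result["avg"])   # roll up to the next stage
            return result

    # Example wiring: 5-second, 5-minute, and 1-hour stages.
    hourly = Stage(period_s=3600)
    five_min = Stage(period_s=300, next_stage=hourly)
    five_sec = Stage(period_s=5, next_stage=five_min)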
  • FIG. 1 is a functional diagram illustrating a programmed computer system for managing metrics in accordance with some embodiments.
  • Computer system 100, which includes various subsystems as described below, includes at least one microprocessor subsystem (also referred to as a processor or a central processing unit (CPU)) 102 .
  • processor 102 can be implemented by a single-chip processor or by multiple processors.
  • processor 102 is a general purpose digital processor that controls the operation of the computer system 100 . Using instructions retrieved from memory 110 , the processor 102 controls the reception and manipulation of input data, and the output and display of data on output devices (e.g., display 118 ).
  • processor 102 includes and/or is used to provide server functions described below with respect to server 202 , etc. of FIG. 2 .
  • Processor 102 is coupled bi-directionally with memory 110 , which can include a first primary storage, typically a random access memory (RAM), and a second primary storage area, typically a read-only memory (ROM).
  • primary storage can be used as a general storage area and as scratch-pad memory, and can also be used to store input data and processed data.
  • Primary storage can also store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on processor 102 .
  • primary storage typically includes basic operating instructions, program code, data, and objects used by the processor 102 to perform its functions (e.g., programmed instructions).
  • memory 110 can include any suitable computer-readable storage media, described below, depending on whether, for example, data access needs to be bi-directional or uni-directional.
  • processor 102 can also directly and very rapidly retrieve and store frequently needed data in a cache memory (not shown).
  • a removable mass storage device 112 provides additional data storage capacity for the computer system 100 , and is coupled either bi-directionally (read/write) or uni-directionally (read only) to processor 102 .
  • storage 112 can also include computer-readable media such as magnetic tape, flash memory, PC-CARDS, portable mass storage devices, holographic storage devices, and other storage devices.
  • a fixed mass storage 120 can also, for example, provide additional data storage capacity. The most common example of mass storage 120 is a hard disk drive.
  • Mass storages 112 , 120 generally store additional programming instructions, data, and the like that typically are not in active use by the processor 102 . It will be appreciated that the information retained within mass storages 112 and 120 can be incorporated, if needed, in standard fashion as part of memory 110 (e.g., RAM) as virtual memory.
  • the network interface 116 allows processor 102 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown.
  • the processor 102 can receive information (e.g., data objects or program instructions) from another network or output information to another network in the course of performing method/process steps.
  • Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network.
  • An interface card or similar device and appropriate software implemented by (e.g., executed/performed on) processor 102 can be used to connect the computer system 100 to an external network and transfer data according to standard protocols.
  • An auxiliary I/O device interface (not shown) can be used in conjunction with computer system 100 .
  • the auxiliary I/O device interface can include general and customized interfaces that allow the processor 102 to send and, more typically, receive data from other devices such as microphones, touch-sensitive displays, transducer card readers, tape readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.
  • FIG. 2 is a block diagram illustrating an embodiment of a data center that includes a scalable distributed metrics manager.
  • client devices such as 252 connect to a data center 250 via a network 254 .
  • a client device can be a laptop computer, a desktop computer, a tablet, a mobile device, a smart phone, a wearable networking device, or any other appropriate computing device.
  • a web browser and/or a standalone client application is installed at each client, enabling a user to use the client device to access certain applications hosted by data center 250 .
  • Network 254 can be the Internet, a private network, a hybrid network, or any other communications network.
  • a networking layer 255 comprising networking devices such as routers, switches, etc. forwards requests from client devices 252 to a distributed network service platform 204 .
  • distributed network service platform 204 includes a number of servers configured to provide a distributed network service.
  • A physical server (e.g., 202 , 204 , 206 , etc.) has hardware components and software components, and may be implemented using a device such as 100 .
  • The hardware (e.g., 208 ) of the server supports operating system software in which a number of virtual machines (VMs) (e.g., 218 , 219 , 220 , etc.) are configured to execute.
  • instances of applications are configured to execute within the VMs.
  • Examples of such applications include web applications such as shopping cart, user authentication, credit card authentication, email, file sharing, virtual desktops, voice/video streaming, online collaboration, and many others.
  • One or more service engines are instantiated on a physical device.
  • a service engine is implemented as software executing in a virtual machine.
  • the service engine is executed to provide distributed network services for applications executing on the same physical server as the service engine, and/or for applications executing on different physical servers.
  • the service engine is configured to enable appropriate service components that implement service logic. For example, a load balancer component is executed to provide load balancing logic to distribute traffic load amongst instances of applications executing on the local physical device as well as other physical devices; a firewall component is executed to provide firewall logic to instances of the applications on various devices; a metrics agent component is executed to gather metrics associated with traffic, performance, etc. associated with the instances of the applications, etc. Many other service components may be implemented and enabled as appropriate. When a specific service is desired, a corresponding service component is configured and invoked by the service engine to execute in a VM.
  • traffic received on a physical port of a server is sent to a virtual switch (e.g., 212 ).
  • the virtual switch is configured to use an API provided by the hypervisor to intercept incoming traffic designated for the application(s) in an inline mode, and send the traffic to an appropriate service engine.
  • In inline mode, packets are forwarded on without being replicated.
  • the virtual switch passes the traffic to a service engine in the distributed network service layer (e.g., the service engine on the same physical device), which transforms the packets if needed and redirects the packets to the appropriate application.
  • The service engine, based on factors such as configured rules and operating conditions, redirects the traffic to an appropriate application executing in a VM on a server. Details of the virtual switch and its operations are outside the scope of the present application.
  • Controller 290 is configured to control, monitor, program, and/or provision the distributed network services and virtual machines.
  • the controller includes a metrics manager 292 configured to collect performance metrics and perform analytical operations.
  • the controller can be implemented as software, hardware, firmware, or any combination thereof.
  • the controller is implemented on a system such as 100 .
  • the controller is implemented as a single entity logically, but multiple instances of the controller are installed and executed on multiple physical devices to provide high availability and increased capacity.
  • known techniques such as those used in distributed databases are applied to synchronize and maintain coherency of data among the controller instances.
  • one or more controllers 290 gather metrics data from various nodes operating in the data center.
  • a node refers to a computing element that is a source of metrics information. Examples of nodes include virtual machines, networking devices, service engines, or any other appropriate elements within the data center.
  • metrics relating to the performance of the application and/or the VM executing the application can be directly collected by the corresponding service engine.
  • a service engine sends a script to a client browser or client application. The script measures client responses and returns one or more collected metrics back to the service engine. In both cases, the service engine sends the collected metrics to controller 290 .
  • infrastructure metrics relating to the performance of other components of the service platform can be collected by the controller.
  • metrics relating to the networking devices, metrics relating to the performance of the service engines themselves, metrics relating to the host devices such as data storage as well as operating system performance, etc. can be collected by the controller.
  • Specific examples of the metrics include round trip time, latency, bandwidth, number of connections, etc.
  • FIG. 3A is a block diagram illustrating an embodiment of a metrics pipeline in a scalable distributed metrics manager.
  • Pipeline 300 implements the process for aggregating metrics and can be used to implement scalable metrics manager 292 .
  • a pipeline processes one or more specific types of metrics, and multiple pipelines similar to 300 can be configured to process different types of metrics.
  • pipeline 300 receives metrics from a variety of sources, such as service engines 214 , 224 , etc., via data streams. Metrics can also be received from other sources such as network devices, an operating system, a virtual switch, etc. (not shown).
  • The performance metrics are continuously collected at various sources (e.g., service engines, network devices, etc.) and sent to the first stage (e.g., stage 302 of FIG. 3A ) of the pipeline.
  • Each stage only needs to maintain a sufficient number of inputs (e.g., metrics or results from the previous stage) in memory to perform aggregation; thus the overall number of metrics to be stored and the total amount of memory required for real time analysis are reasonable and can be implemented for high scale environments such as enterprise data centers where large volumes of metrics are constantly generated. Note that although separate buffers are shown for the output of one stage and the input of the next stage, in some implementations only one set of buffers needs to be maintained. As will be described in greater detail below, each stage performs one or more aggregation functions to generate aggregated results.
  • Further, each of the pipeline stages is optionally connected to a persistent storage 310 , such as a database, a file system, or any other appropriate non-volatile storage system, in order to write the aggregated results to the storage and back up the metrics data more permanently.
  • MongoDB is used in some implementations. Further details of the pipeline's operations are explained in connection with FIG. 4 below.
  • FIG. 3B is a diagram illustrating an embodiment of a metric data structure.
  • metric 350 is a key-tuple data structure that includes the following fields: {MetricsObjectType, Entity, Node, ObjectID}.
  • the values in the fields can be alphanumeric strings, numerical values, or any other appropriate data formats.
  • MetricsObjectType specifies the type of metric being sent. Examples of MetricsObjectType include client metric, front end network metric, backend network metric, application metric, etc.
  • Entity specifies the particular element about which the metric is being reported, such as a particular server or application executing on a virtual machine.
  • Node specifies the particular element that is reporting the metric, such as a particular service engine.
  • An Entity may include multiple objects, and ObjectID specifies the particular object within the entity that is generating the metric, such as a particular Internet Protocol (IP) address, a particular port, a particular Universal Resource Identifier (URI), etc.
  • In the example shown, MetricsObjectType is set to vserver_l4_client (which corresponds to a type of metric related to a virtual server layer 4 client), Entity is set to vs-1 (which corresponds to a server with the identifier of vs-1), Node is set to se-1 (which corresponds to a service engine with the identifier of se-1), and ObjectID is set to port (which corresponds to the port object within the server).
  • When a metric is stored to the database, each field can be used as an index for lookups.
  • Metrics with different fields can be defined and used.
  • a metric also includes a timestamp field.
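  • For illustration, the key-tuple of FIG. 3B could be modeled as below. The four field names follow the text; the value field, the timestamp representation, and all Python names are assumptions made for this sketch.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class MetricKey:
        """Key-tuple per FIG. 3B; frozen so it can serve as a lookup index."""
        metrics_object_type: str   # e.g., "vserver_l4_client"
        entity: str                # element being reported on, e.g., "vs-1"
        node: str                  # element reporting the metric, e.g., "se-1"
        object_id: str             # object within the entity, e.g., "port"

    @dataclass
    class Metric:
        key: MetricKey
        timestamp: float           # the optional timestamp field mentioned above
        value: float               # assumed payload; not spelled out in the text

    example = Metric(
        key=MetricKey("vserver_l4_client", "vs-1", "se-1", "port"),
        timestamp=1426799005.0,
        value=42.0,
    )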
  • a source can report metrics to the metrics manager without requiring explicit registration with the metrics manager.
  • a source can report metrics to the metrics manager by sending one or more messages having predetermined formats.
  • FIG. 3C is a diagram illustrating an embodiment of a metrics message.
  • the message includes a header that specifies certain characteristics of the metrics being sent (e.g., number of metrics in the message, timestamp of when the message is sent, etc.), and multiple metrics in the message body.
  • the node batches multiple metrics in a single message and sends the message to the metrics manager.
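  • A message of this shape could be modeled as follows, reusing the Metric sketch above. The two header fields mirror those named in the text; everything else is assumed.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class MetricsMessage:
        """Batched metrics message per FIG. 3C: header plus metrics body."""
        metric_count: int          # header: number of metrics in the message
        sent_at: float             # header: timestamp of when the message is sent
        metrics: List["Metric"] = field(default_factory=list)   # message body

    msg = MetricsMessage(metric_count=1, sent_at=1426799005.0, metrics=[example])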
  • the metrics manager maintains multiple pipelines to process different types of metrics. Upon receiving a metrics message, the metrics manager parses the message to obtain metrics and places each metric in an appropriate pipeline for processing. In some embodiments, upon detecting that a metric includes a new instance of key-tuples as discussed above, the metrics manager establishes a new in-memory processing unit (e.g., a specific pipeline such as 300 that is configured with its own memory, thread, and/or process for handling metrics associated with the key-tuple), and future metrics messages having the same key-tuple will be processed by this in-memory processing unit. In some embodiments, the metrics manager can establish one or more pipelines that receive as inputs multiple key-tuple in order to generate certain desired results. The configuration of specific pipelines depends on implementation.
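  • Continuing the sketch above, dispatch of metrics to per-key-tuple pipelines might look like the following. Creating one Stage chain per new key mirrors the "new in-memory processing unit" described above, though the dedicated thread/process isolation mentioned in the text is omitted here.

    class MetricsManager:
        """Routes each metric to the pipeline for its key-tuple, creating
        a new in-memory pipeline the first time a key-tuple is seen."""

        def __init__(self):
            self.pipelines = {}    # MetricKey -> first Stage of its pipeline

        def handle_message(self, msg):
            for metric in msg.metrics:
                pipeline = self.pipelines.get(metric.key)
                if pipeline is None:
                    # New key-tuple: establish a dedicated processing unit.
                    hourly = Stage(period_s=3600)
                    five_min = Stage(period_s=300, next_stage=hourly)
                    pipeline = Stage(period_s=5, next_stage=five_min)
                    self.pipelines[metric.key] = pipeline
                pipeline.add(metric.value)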
  • FIG. 4 is a flowchart illustrating an embodiment of a process for managing metrics.
  • Process 400 can be implemented by scalable metrics manager 292 operating on a system such as 100 .
  • metrics associated with a plurality of sources are obtained. As discussed above, the metrics can be sent in messages to the metrics manager.
  • the obtained metrics are managed in a metrics pipeline as described above in connection with FIG. 3A .
  • the metrics associated with a plurality of sources are aggregated at a first rate to generate a plurality of first aggregated results.
  • the raw metrics are sampled at the first rate to generate the aggregated results.
  • A transform operation generates a corresponding aggregated result (also referred to as a derived metric) based on the inputs to the transform function.
  • Multiple transform operations can generate a vector of aggregated results. Aggregations can be performed across service engines, across multiple servers in a pool, across multiple entities, across multiple objects, etc.
  • a pipeline can be configured to transform any appropriate metrics into a new result. The specific aggregation functions in a pipeline can be configured by the programmer or administrator according to actual system needs.
  • the first rate at which aggregation takes place corresponds to the rate at which the aggregation function is performed on the collected data.
  • a constant rate is discussed extensively for purposes of example, but the rate can be a non-constant rate as well (e.g., aggregation happens when the number of metrics collected meets or exceeds a threshold or when some other triggering condition for aggregation is met).
  • Because the aggregation only uses metrics stored in memory, it does not require any database calls and is highly efficient. Further, because the aggregation is done periodically and in a batched fashion (e.g., all first stages of the pipelines perform aggregation every 5 seconds), timers do not need to be maintained per object or per metric. Thus, aggregation can be performed quickly and efficiently, as the sketch below illustrates.
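  • One way to realize this batched behavior, building on the MetricsManager sketch above, is a single shared timer that drives the first stage of every pipeline. This scheduling scheme is an assumption for illustration, not one spelled out in the source.

    import threading

    def run_batched_aggregation(manager, period_s=5.0):
        """One shared timer ticks all first stages together, so no
        per-object or per-metric timers are needed. Slower stages would
        be driven by similar, slower timers (omitted here)."""
        def tick():
            for stage in manager.pipelines.values():
                stage.aggregate()          # purely in-memory; no database calls
            threading.Timer(period_s, tick).start()   # reschedule itself
        tick()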
  • the received metrics are temporarily maintained in a memory such as a Random Access Memory (RAM).
  • the first aggregated results and/or the received metrics will be rolled up into the next stage periodically.
  • the first stage only needs to maintain a sufficient amount of input metrics in the memory until the first aggregation is performed, after which the metrics used in the aggregation can be removed from the memory in order to save space and make room for new aggregated results and/or metrics.
  • the first aggregated results and/or the obtained metrics are optionally output to a persistent storage such as a database.
  • the aggregation of the performance metrics and the maintenance of the first aggregated results are performed in a single process to reduce the overhead of context switches. It is permissible to implement the aggregation and the maintenance steps in separate processes.
  • the plurality of second aggregated results is aggregated at a third rate to generate a plurality of third aggregated results.
  • the aggregation results of the second stage (stage 304 ) are sent to the third stage (stage 306 ), to be aggregated at a rate of every hour.
  • the second stage generates an aggregated result every 5 minutes, then in one hour there will be 12 aggregated results.
  • These 12 aggregated results from the second stage are aggregated again at the third stage according to one or more aggregation functions (F 3 ) associated with the third stage.
  • stages are shown for purposes of illustration, other numbers of stages (e.g., two stages, four, or more stages) can be implemented in various embodiments.
  • In some embodiments, the metrics manager can invoke one or more corresponding analytical operations (such as anomaly detection) on the aggregated results.
  • For example, the Holt-Winters algorithm, used to detect outlier metrics and remove anomalies, can be performed at any of the stages; a sketch appears below.
  • As another example, an aggregated result (e.g., the total number of connections) can be compared with a threshold (e.g., a maximum number of 100) to detect abnormal conditions.
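  • The source names Holt-Winters but gives no algorithmic detail; below is a generic additive Holt-Winters sketch that flags points deviating from the forecast. The smoothing parameters and deviation band are illustrative only.

    def holt_winters_anomalies(series, season_len, alpha=0.5, beta=0.1,
                               gamma=0.1, band=3.0):
        """Flag indices whose deviation from an additive Holt-Winters
        forecast exceeds `band` times a running mean absolute deviation.
        Requires len(series) >= 3 * season_len."""
        # Initialize level, trend, and seasonal terms from the first seasons.
        level = sum(series[:season_len]) / season_len
        trend = (sum(series[season_len:2 * season_len]) -
                 sum(series[:season_len])) / season_len ** 2
        seasonal = [x - level for x in series[:season_len]]
        mad, anomalies = 0.0, []
        warmup = 3 * season_len              # let the deviation estimate settle
        for i in range(2 * season_len, len(series)):
            s = seasonal[i % season_len]
            error = series[i] - (level + trend + s)
            if i >= warmup and abs(error) > band * max(mad, 1e-12):
                anomalies.append(i)          # outlier metric detected
            mad = 0.9 * mad + 0.1 * abs(error)
            # Standard additive Holt-Winters component updates.
            last_level = level
            level = alpha * (series[i] - s) + (1 - alpha) * (level + trend)
            trend = beta * (level - last_level) + (1 - beta) * trend
            seasonal[i % season_len] = gamma * (series[i] - level) + (1 - gamma) * s
        return anomalies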
  • the analytical operation is implemented as an inline function within the metrics manager process. The inline function implements additional steps in the continuous processing of the metrics within the same process and software module.
  • the analytical operation does not require any input/output (I/O) operations such as database read/write, inter-process communication, system call, messaging, etc.
  • the analytical operations are highly efficient compared with existing analytics tools that typically require database access or file system access.
  • the analytical operations can be performed in real time (e.g., at substantially the same time as when the performance metrics are received, or at substantially the same time as when the aggregated results are generated).
  • the metrics manager generates events when metrics and/or aggregated results meet certain conditions.
  • Event detection and generation can be implemented as an inline function where certain conditions are tested on a per metric type, per entity, and per node basis. For example, if the network connections metrics of a server sent by a service engine indicate that the connection exceeds a threshold, then an event such as an alarm or log is triggered.
  • event triggering conditions include: a metric meeting or exceeding a high watermark level for the first time after the metric has stayed below a low threshold; a metric meeting or falling below a low watermark level after the metric has stayed above the threshold; a metric crossing a predefined threshold; a metric indicating that an anomaly has occurred, etc.
  • a set of rules is specified for these conditions, and a rules processing engine compares the values associated with metrics against the rules to detect whether any specified conditions are met.
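  • A minimal rule of the watermark kind described above might be implemented as follows; the rule shape, thresholds, and names are assumptions for illustration.

    def make_watermark_rule(high, low):
        """Hysteresis rule: fire when the metric first meets or exceeds
        `high` after having stayed at or below `low`."""
        state = {"armed": True}
        def check(value):
            if value <= low:
                state["armed"] = True      # re-arm once the metric drops low
            if state["armed"] and value >= high:
                state["armed"] = False
                return "ALARM"             # e.g., trigger an alarm or log entry
            return None
        return check

    rule = make_watermark_rule(high=100, low=80)
    events = [rule(v) for v in [50, 90, 120, 130, 70, 110]]
    # events == [None, None, 'ALARM', None, None, 'ALARM']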
  • metrics and aggregated results are recorded in a persistent storage such as a database for backup purposes.
  • a retention policy is specified by the administrator to determine the amount of time for which corresponding stored data remains in the database. When the retention period is over, any data that is outside the retention policy period is erased from the database to conserve space.
  • FIGS. 5A-5B are diagrams illustrating an embodiment of an approach for archiving the aggregated results.
  • aggregated results are written to the database at the time of aggregation, as shown in FIG. 5A .
  • the retention policy periods for the first stage, the second stage, and the third stage are 2 hours, 2 days, and 1 year, respectively.
  • the database records corresponding to the first aggregated results that occurred before the current two hour window are deleted from the database, as shown in FIG. 5B .
  • FIGS. 6A-6B are diagrams illustrating another embodiment of an approach for archiving the aggregated results.
  • aggregated results from different stages of a pipeline occupy separate tables.
  • multiple tables can be used to store the aggregated results.
  • These tables are referred to as time series based database tables since they each correspond to a different period of aggregated results.
  • Different time series based database tables can be subject to different retention policies.
  • two tables are used to store the first aggregated results, where each table is configured to store one hour's worth of first aggregated results from the first stage; one table is used to store the second aggregated results, where the table is configured to store one day's worth of second aggregated results from the second stage; and one table is used to store the third aggregated results, where the table is configured to store one year's worth of third aggregated results from the third stage.
  • the aggregated results are written to the database in a batch in append mode.
  • the first aggregated results can be written to the database every 30 minutes rather than every five seconds. Maintaining a greater amount of aggregated results in memory permits less frequent database writes, which is more efficient.
  • the rate at which the aggregated results are written can be configured based on tradeoffs of memory required to keep the aggregated results and efficiency of database writes.
  • the aggregated results are written to the database in tables according to the retention period of the corresponding retention policy.
  • table size does not need to exactly correspond to the amount of data generated during the retention period but can be on the same order of magnitude.
  • the retention period for the first stage is two hours.
  • Table 602 is initially filled with first aggregated results obtained during the first hour
  • table 604 is initially filled with first aggregated results obtained during the second hour.
  • some of the old aggregated results need to be removed to make room for new aggregated results.
  • table 602 (which holds the oldest hour of results) is deleted; table 604 now stores the aggregated data for the earlier of the two retained hours, and a re-created table 602 is used to store the aggregated data for the newest hour, as sketched below. Because aggregated results are stored in separate tables and deleted separately, holes in the database are avoided.
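  • The two-table rotation can be sketched with SQLite standing in for the actual store (the text mentions MongoDB as one option); the table names and schema here are hypothetical.

    import sqlite3

    def rotate_hourly_tables(db, hour):
        """Keep two one-hour tables: drop the one holding the oldest hour
        and recreate it, empty, for the hour now beginning. Dropping a
        whole table at once avoids leaving holes in the database."""
        table = f"agg_first_{hour % 2}"     # alternates between the two tables
        db.execute(f"DROP TABLE IF EXISTS {table}")
        db.execute(f"CREATE TABLE {table} (ts INTEGER, entity TEXT, value REAL)")
        db.commit()

    db = sqlite3.connect(":memory:")
    rotate_hourly_tables(db, hour=0)
    rotate_hourly_tables(db, hour=1)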
  • the metric manager provides a query application programming interface (API) that hides the details of the underlying time series based database table and gives the appearance of making query to and receiving results from a single table.
  • FIG. 7 is a flowchart illustrating an embodiment of a process for querying metrics data stored in a database.
  • Process 700 can be performed by the metrics manager in response to a database query, which can be initiated manually by a user via a user interface tool provided by a performance monitoring application, automatically by the performance monitoring application, etc.
  • the database query is analyzed to determine one or more corresponding time series based database tables associated with the database query. Specifically, the time window of the query is compared with the time windows of the time series based database tables.
  • It is determined whether the time window being queried spans only a single time series based database table. If so, the database query is performed normally without changes to the query, at 706. If, however, the time window being queried spans multiple time series based database tables, then the particular time series based database tables are determined and the process continues at 708.
  • the database query is converted into a union of multiple sub-queries across the determined time series based database tables.
  • filters from the database query are applied to the sub-queries such that the database's efficient filtering can be used optimally.
  • the efficiency of filtering is gained as filters are applied on a per table basis before the results are joined together.
  • the time complexity of filtering becomes K (the maximum number of rows in any table) instead of N (the number of combined rows across tables), where K < N.
  • the sub-queries are performed on the database.
  • the responses to the sub-queries are combined into a single response.
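  • Process 700 could be sketched as follows: given each table's covered time window, build one filtered sub-query per overlapping table and join them with UNION ALL. The schema, names, and SQL dialect are assumed; the ISO-8601 timestamp strings compare correctly as plain strings.

    def split_query(tables, t_start, t_end, filters=""):
        """tables maps table name -> (window_start, window_end) as ISO-8601
        strings. Returns a single query, or a UNION ALL of per-table
        sub-queries when the requested window spans multiple tables."""
        subqueries = []
        for name, (lo, hi) in tables.items():
            if hi < t_start or lo > t_end:
                continue                    # table entirely outside the window
            sub = (f"SELECT * FROM {name} WHERE ts BETWEEN "
                   f"'{max(lo, t_start)}' AND '{min(hi, t_end)}'")
            if filters:
                sub += f" AND {filters}"    # filters pushed down per table
            subqueries.append(sub)
        return " UNION ALL ".join(subqueries)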
  • FIG. 8 is a diagram illustrating an example of a query to a database comprising multiple time series based database tables.
  • a plurality of database tables is used to store aggregated metrics from various stages of the pipeline.
  • tables 802 and 804 are shown to store the first hour of the first aggregated results and the second hour of the first aggregated results, respectively.
  • table 802 stores metrics gathered between 21:00:00-21:59:55
  • table 804 stores metrics gathered between 22:00:00-22:59:55. Metrics in both tables are gathered in 5-second increments.
  • the database query is analyzed and it is determined that there are two time series based database tables ( 802 and 804 ) that correspond to the database query.
  • the database query is converted into a union of two sub-queries, and filters are applied to the sub-queries.
  • the sub-queries correspond to their respective database tables.
  • the sub-query that spans the time window of ‘2015-03-19T21:03:25’ to ‘2015-03-19T21:59:55’ is:
  • the sub-query that spans the time window of ‘2015-03-19T22:00:00’ to ‘2015-03-19T22:03:20’ is:
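  • For illustration only, sub-queries consistent with these time windows might read as follows; the table and column names are assumed, and these are not the patent's literal queries.

    # Hypothetical reconstruction of the two sub-queries and their union.
    sub_query_1 = ("SELECT * FROM table_802 WHERE ts BETWEEN "
                   "'2015-03-19T21:03:25' AND '2015-03-19T21:59:55'")
    sub_query_2 = ("SELECT * FROM table_804 WHERE ts BETWEEN "
                   "'2015-03-19T22:00:00' AND '2015-03-19T22:03:20'")
    combined = sub_query_1 + " UNION ALL " + sub_query_2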
  • Managing performance metrics has been disclosed.
  • the technique described above significantly reduces the amount of I/O operations and latency associated with processing the metrics, and allows for real time analytics.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Environmental & Geological Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Managing performance metrics includes: obtaining a plurality of performance metrics associated with a plurality of sources on a network; aggregating, at a first rate, the plurality of performance metrics associated with the plurality of sources to generate a plurality of first aggregated results; maintaining at least some of the plurality of first aggregated results in one or more memories; aggregating, at a second rate, the plurality of first aggregated results to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate; and maintaining at least some of the plurality of second aggregated results in the one or more memories.

Description

    CROSS REFERENCE TO OTHER APPLICATIONS
  • This application claims priority to U.S. Provisional Patent Application No. 62/137,625 entitled REAL TIME METRICS ENGINE filed Mar. 24, 2015 which is incorporated herein by reference in its entirety for all purposes.
  • BACKGROUND OF THE INVENTION
  • Metrics (also referred to as performance metrics) are used by computer systems to quantify the measurement of system performance. Metrics are critical for analyzing systems' operations and providing feedback for improvements.
  • In modern computer systems, the quantity of metrics can be large. For example, suppose that a single cloud application collects 1000 metrics for analysis every 5 seconds, which means that 720,000 metrics are collected every hour. In a typical high scale environment such as an enterprise data center that supports thousands of applications each executing on multiple servers, the rate can be on the order of billions of metrics per hour.
  • Currently, most performance monitoring tools save collected metrics to a database, then perform analysis offline. These tools tend to scale poorly because of the high number of input/output (I/O) operations (such as database reads and writes) required for storing and processing a large number of metrics. Further, these tools typically do not support real time analytics due to the latency and processing overhead in storing and processing metrics data in the database.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
  • FIG. 1 is a functional diagram illustrating a programmed computer system for managing metrics in accordance with some embodiments.
  • FIG. 2 is a block diagram illustrating an embodiment of a data center that includes a scalable distributed metrics manager.
  • FIG. 3A is a block diagram illustrating an embodiment of a metrics pipeline in a scalable distributed metrics manager.
  • FIG. 3B is a diagram illustrating an embodiment of a metric data structure.
  • FIG. 3C is a diagram illustrating an embodiment of a metrics message.
  • FIG. 4 is a flowchart illustrating an embodiment of a process for managing metrics.
  • FIGS. 5A-5B are diagrams illustrating an embodiment of an approach for archiving the aggregated results.
  • FIGS. 6A-6B are diagrams illustrating another embodiment of an approach for archiving the aggregated results.
  • FIG. 7 is a flowchart illustrating an embodiment of a process for querying metrics data stored in a database.
  • FIG. 8 is a diagram illustrating an example of a query to a database comprising multiple time series based database tables.
  • DETAILED DESCRIPTION
  • The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
  • A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
  • Managing metrics for high scale environments is disclosed. In some embodiments, the metrics are managed and processed in a pipeline comprising multiple stages. A plurality of performance metrics associated with a plurality of sources on a network is obtained. The plurality of performance metrics is aggregated at a first rate to generate a plurality of first aggregated results, and at least some of the plurality of first aggregated results are maintained for a time in one or more memories. The plurality of first aggregated results is aggregated at a second rate to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate. At least some of the plurality of second aggregated results are maintained in the one or more memories. Additional aggregation stages can be used. The aggregated results can be persisted to a persistent storage.
  • FIG. 1 is a functional diagram illustrating a programmed computer system for managing metrics in accordance with some embodiments. As will be apparent, other computer system architectures and configurations can be used to manage and process metrics. Computer system 100, which includes various subsystems as described below, includes at least one microprocessor subsystem (also referred to as a processor or a central processing unit (CPU)) 102. For example, processor 102 can be implemented by a single-chip processor or by multiple processors. In some embodiments, processor 102 is a general purpose digital processor that controls the operation of the computer system 100. Using instructions retrieved from memory 110, the processor 102 controls the reception and manipulation of input data, and the output and display of data on output devices (e.g., display 118). In some embodiments, processor 102 includes and/or is used to provide server functions described below with respect to server 202, etc. of FIG. 2.
  • Processor 102 is coupled bi-directionally with memory 110, which can include a first primary storage, typically a random access memory (RAM), and a second primary storage area, typically a read-only memory (ROM). As is well known in the art, primary storage can be used as a general storage area and as scratch-pad memory, and can also be used to store input data and processed data. Primary storage can also store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on processor 102. Also as is well known in the art, primary storage typically includes basic operating instructions, program code, data, and objects used by the processor 102 to perform its functions (e.g., programmed instructions). For example, memory 110 can include any suitable computer-readable storage media, described below, depending on whether, for example, data access needs to be bi-directional or uni-directional. For example, processor 102 can also directly and very rapidly retrieve and store frequently needed data in a cache memory (not shown).
  • A removable mass storage device 112 provides additional data storage capacity for the computer system 100, and is coupled either bi-directionally (read/write) or uni-directionally (read only) to processor 102. For example, storage 112 can also include computer-readable media such as magnetic tape, flash memory, PC-CARDS, portable mass storage devices, holographic storage devices, and other storage devices. A fixed mass storage 120 can also, for example, provide additional data storage capacity. The most common example of mass storage 120 is a hard disk drive. Mass storages 112, 120 generally store additional programming instructions, data, and the like that typically are not in active use by the processor 102. It will be appreciated that the information retained within mass storages 112 and 120 can be incorporated, if needed, in standard fashion as part of memory 110 (e.g., RAM) as virtual memory.
  • In addition to providing processor 102 access to storage subsystems, bus 114 can also be used to provide access to other subsystems and devices. As shown, these can include a display monitor 118, a network interface 116, a keyboard 104, and a pointing device 106, as well as an auxiliary input/output device interface, a sound card, speakers, and other subsystems as needed. For example, the pointing device 106 can be a mouse, stylus, track ball, or tablet, and is useful for interacting with a graphical user interface.
  • The network interface 116 allows processor 102 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown. For example, through the network interface 116, the processor 102 can receive information (e.g., data objects or program instructions) from another network or output information to another network in the course of performing method/process steps. Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network. An interface card or similar device and appropriate software implemented by (e.g., executed/performed on) processor 102 can be used to connect the computer system 100 to an external network and transfer data according to standard protocols. For example, various process embodiments disclosed herein can be executed on processor 102, or can be performed across a network such as the Internet, intranet networks, or local area networks, in conjunction with a remote processor that shares a portion of the processing. Additional mass storage devices (not shown) can also be connected to processor 102 through network interface 116.
  • An auxiliary I/O device interface (not shown) can be used in conjunction with computer system 100. The auxiliary I/O device interface can include general and customized interfaces that allow the processor 102 to send and, more typically, receive data from other devices such as microphones, touch-sensitive displays, transducer card readers, tape readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.
  • In addition, various embodiments disclosed herein further relate to computer storage products with a computer readable medium that includes program code for performing various computer-implemented operations. The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of computer-readable media include, but are not limited to, all the media mentioned above: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks; and specially configured hardware devices such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices. Examples of program code include both machine code, as produced, for example, by a compiler, or files containing higher level code (e.g., script) that can be executed using an interpreter.
  • The computer system shown in FIG. 1 is but an example of a computer system suitable for use with the various embodiments disclosed herein. Other computer systems suitable for such use can include additional or fewer subsystems. In addition, bus 114 is illustrative of any interconnection scheme serving to link the subsystems. Other computer architectures having different configurations of subsystems can also be utilized.
  • FIG. 2 is a block diagram illustrating an embodiment of a data center that includes a scalable distributed metrics manager. In this example, client devices such as 252 connect to a data center 250 via a network 254. A client device can be a laptop computer, a desktop computer, a tablet, a mobile device, a smart phone, a wearable networking device, or any other appropriate computing device. In some embodiments, a web browser and/or a standalone client application is installed at each client, enabling a user to use the client device to access certain applications hosted by data center 250. Network 254 can be the Internet, a private network, a hybrid network, or any other communications network.
  • In the example shown, a networking layer 255 comprising networking devices such as routers, switches, etc. forwards requests from client devices 252 to a distributed network service platform 204. In this example, distributed network service platform 204 includes a number of servers configured to provide a distributed network service. A physical server (e.g., 202, 204, 206, etc.) has hardware components and software components, and may be implemented using a device such as 100. In this example, hardware (e.g., 208) of the server supports operating system software in which a number of virtual machines (VMs) (e.g., 218, 219, 220, etc.) are configured to execute. A VM is a software implementation of a machine (e.g., a computer) that simulates the way a physical machine executes programs. The part of the server's operating system that manages the VMs is referred to as the hypervisor. The hypervisor interfaces between the physical hardware and the VMs, providing a layer of abstraction to the VMs. Through its management of the VMs' sharing of the physical hardware resources, the hypervisor makes it appear as though each VM were running on its own dedicated hardware. Examples of hypervisors include the VMware Workstation® and Oracle VM VirtualBox®. Although physical servers supporting VM architecture are shown and discussed extensively for purposes of example, physical servers supporting other architectures such as container-based architecture (e.g., Kubernetes®, Docker®, Mesos®), standard operating systems, etc., can also be used and techniques described herein are also applicable. In a container-based architecture, for example, the applications are executed in special containers rather than virtual machines.
  • In some embodiments, instances of applications are configured to execute within the VMs. Examples of such applications include web applications such as shopping cart, user authentication, credit card authentication, email, file sharing, virtual desktops, voice/video streaming, online collaboration, and many others.
  • One or more service engines (e.g., 214, 224, etc.) are instantiated on a physical device. In some embodiments, a service engine is implemented as software executing in a virtual machine. The service engine is executed to provide distributed network services for applications executing on the same physical server as the service engine, and/or for applications executing on different physical servers. In some embodiments, the service engine is configured to enable appropriate service components that implement service logic. For example, a load balancer component is executed to provide load balancing logic to distribute traffic load amongst instances of applications executing on the local physical device as well as other physical devices; a firewall component is executed to provide firewall logic to instances of the applications on various devices; a metrics agent component is executed to gather metrics associated with traffic, performance, etc. associated with the instances of the applications, etc. Many other service components may be implemented and enabled as appropriate. When a specific service is desired, a corresponding service component is configured and invoked by the service engine to execute in a VM.
  • In the example shown, traffic received on a physical port of a server (e.g., a communications interface such as Ethernet port 215) is sent to a virtual switch (e.g., 212). In some embodiments, the virtual switch is configured to use an API provided by the hypervisor to intercept incoming traffic designated for the application(s) in an inline mode, and send the traffic to an appropriate service engine. In inline mode, packets are forwarded on without being replicated. As shown, the virtual switch passes the traffic to a service engine in the distributed network service layer (e.g., the service engine on the same physical device), which transforms the packets if needed and redirects the packets to the appropriate application. The service engine, based on factors such as configured rules and operating conditions, redirects the traffic to an appropriate application executing in a VM on a server. Details of the virtual switch and its operations are outside the scope of the present application.
  • Controller 290 is configured to control, monitor, program, and/or provision the distributed network services and virtual machines. In particular, the controller includes a metrics manager 292 configured to collect performance metrics and perform analytical operations. The controller can be implemented as software, hardware, firmware, or any combination thereof. In some embodiments, the controller is implemented on a system such as 100. In some cases, the controller is implemented as a single entity logically, but multiple instances of the controller are installed and executed on multiple physical devices to provide high availability and increased capacity. In embodiments implementing multiple controllers, known techniques such as those used in distributed databases are applied to synchronize and maintain coherency of data among the controller instances.
  • Within data center 250, one or more controllers 290 gather metrics data from various nodes operating in the data center. As used herein, a node refers to a computing element that is a source of metrics information. Examples of nodes include virtual machines, networking devices, service engines, or any other appropriate elements within the data center.
  • Many different types of metrics can be collected by the controller. For example, since traffic (e.g., connection requests and responses, etc.) to and from an application will pass through a corresponding service engine, metrics relating to the performance of the application and/or the VM executing the application can be directly collected by the corresponding service engine. As another example, to collect metrics relating to client responses, a service engine sends a script to a client browser or client application. The script measures client responses and returns one or more collected metrics back to the service engine. In both cases, the service engine sends the collected metrics to controller 290. Additionally, infrastructure metrics relating to the performance of other components of the service platform (e.g., metrics relating to the networking devices, metrics relating to the performance of the service engines themselves, metrics relating to the host devices such as data storage as well as operating system performance, etc.) can be collected by the controller. Specific examples of the metrics include round trip time, latency, bandwidth, number of connections, etc.
  • The components and arrangement of distributed network service platform 204 described above are for purposes of illustration only. The technique described herein is applicable to network service platforms having different components and/or arrangements.
  • FIG. 3A is a block diagram illustrating an embodiment of a metrics pipeline in a scalable distributed metrics manager. Pipeline 300 implements the process for aggregating metrics and can be used to implement scalable metrics manager 292. A pipeline processes one or more specific types of metrics, and multiple pipelines similar to 300 can be configured to process different types of metrics. In this example, pipeline 300 receives metrics from a variety of sources, such as service engines 214, 224, etc., via data streams. Metrics can also be received from other sources such as network devices, an operating system, a virtual switch, etc. (not shown). The performance metrics are continuously collected at various sources (e.g., service engines, network devices, etc.) and sent to the first stage (e.g., stage 302 of FIG. 3A) of the pipeline. The rate at which a metric is generated is arbitrary and can vary for different sources. For example, one service engine can generate metrics at a rate of 1 metric/second, while another service engine can generate metrics at a rate of 2 metrics/second.
  • Pipeline 300 comprises multiple stages aggregating metrics at different rates. In particular, the first stage aggregates raw metrics, and each successive stage aggregates the outputs from the previous stage at a lower rate (or equivalently, a coarser granularity of time or lower frequency). In the example shown, three stages are used: stage 302 aggregates metrics from their sources every 5 seconds, stage 304 aggregates the results of stage 302 every 5 minutes, and stage 306 aggregates the results of stage 304 aggregated every hour. Different numbers of stages and/or aggregation rates can be used in other embodiments. The metrics pipeline is implemented in memory to allow for fast access and analytical operations. Each stage only needs to maintain a sufficient number of inputs (e.g., metrics or results from the previous stage) in memory to perform aggregation, thus the overall number of metrics to be stored and the total amount of memory required for real time analysis are reasonable and can be implemented for high scale environments such as enterprise data centers where large volumes of metrics are constantly generated. Note that although separate buffers are shown for the output of one stage and the input of the next stage, in some implementations only one set of buffers needs to be maintained. As will be described in greater detail below, each stage performs one or more aggregation functions to generate aggregated results. Further, each of the pipeline stages is optionally connected to a persistent storage 310, such as a database, a file system, or any other appropriate non-volatile storage system, in order to write the aggregated results to the storage and back up the metrics data more permanently. For example, MongoDB is used in some implementations. Further details of the pipeline's operations are explained in connection with FIG. 4 below.
• FIG. 3B is a diagram illustrating an embodiment of a metric data structure. In this example, metric 350 is a key-tuple data structure that includes the following fields: {MetricsObjectType, Entity, Node, ObjectID}. Depending on implementation, the values in the fields can be alphanumeric strings, numerical values, or any other appropriate data formats. MetricsObjectType specifies the type of metric being sent. Examples of MetricsObjectType include client metric, front end network metric, backend network metric, application metric, etc. Entity specifies the particular element about which the metric is being reported, such as a particular server or application executing on a virtual machine. Node specifies the particular element that is reporting the metric, such as a particular service engine. An Entity may include multiple objects, and ObjectID specifies the particular object within the entity that is generating the metric, such as a particular Internet Protocol (IP) address, a particular port, a particular Universal Resource Identifier (URI), etc. In the example shown, MetricsObjectType is set to vserver_l4_client (which corresponds to a type of metric related to virtual server layer 4 clients), Entity is set to vs-1 (which corresponds to a server with the identifier of vs-1), Node is set to se-1 (which corresponds to a service engine with the identifier of se-1), and ObjectID is set to port (which corresponds to the port object within the server). When a metric is stored to the database, each field can be used as an index for lookups. Metrics with different fields can be defined and used. For example, in one implementation, a metric also includes a timestamp field.
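• Rendered as Python for illustration, the key-tuple might look as follows. The field names follow the description above; the value field, and the treatment of the optional timestamp, are assumptions for the sketch:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Metric:
        metrics_object_type: str  # e.g., "vserver_l4_client"
        entity: str               # element reported on, e.g., "vs-1"
        node: str                 # element reporting, e.g., "se-1"
        object_id: str            # object within the entity, e.g., "port"
        value: float = 0.0        # assumed payload field
        timestamp: float = 0.0    # optional field used in some implementations

    m = Metric("vserver_l4_client", "vs-1", "se-1", "port", value=42.0)
    key = (m.metrics_object_type, m.entity, m.node, m.object_id)  # usable as a lookup index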
  • In some embodiments, a source can report metrics to the metrics manager without requiring explicit registration with the metrics manager. A source can report metrics to the metrics manager by sending one or more messages having predetermined formats. FIG. 3C is a diagram illustrating an embodiment of a metrics message. The message includes a header that specifies certain characteristics of the metrics being sent (e.g., number of metrics in the message, timestamp of when the message is sent, etc.), and multiple metrics in the message body. In this example, the node batches multiple metrics in a single message and sends the message to the metrics manager.
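• As an illustration only, such a batched message might be laid out as follows; the field names are assumptions, since the disclosure specifies only that the header carries characteristics such as the metric count and a timestamp:

    message = {
        "header": {
            "metric_count": 2,                 # number of metrics in the body
            "sent_at": "2015-03-19T21:03:25",  # when the message was sent
        },
        "metrics": [
            {"type": "vserver_l4_client", "entity": "vs-1", "node": "se-1",
             "object_id": "port", "value": 17},
            {"type": "vserver_l4_client", "entity": "vs-1", "node": "se-1",
             "object_id": "port", "value": 23},
        ],
    }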
• In some implementations, the metrics manager maintains multiple pipelines to process different types of metrics. Upon receiving a metrics message, the metrics manager parses the message to obtain metrics and places each metric in an appropriate pipeline for processing. In some embodiments, upon detecting that a metric includes a new instance of key-tuples as discussed above, the metrics manager establishes a new in-memory processing unit (e.g., a specific pipeline such as 300 that is configured with its own memory, thread, and/or process for handling metrics associated with the key-tuple), and future metrics messages having the same key-tuple will be processed by this in-memory processing unit. In some embodiments, the metrics manager can establish one or more pipelines that receive as inputs metrics with multiple key-tuples in order to generate certain desired results. The configuration of specific pipelines depends on implementation.
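• A sketch of this dispatch, assuming the Metric and Stage types from the sketches above (make_pipeline and dispatch are hypothetical helper names, not part of the disclosure):

    pipelines = {}  # key-tuple -> first stage of that metric's dedicated pipeline

    def make_pipeline():
        # Builds the 5-second/5-minute/1-hour stages of FIG. 3A.
        hourly   = Stage(3600, mean)
        five_min = Stage(300, mean, next_stage=hourly)
        return Stage(5, mean, next_stage=five_min)

    def dispatch(metric):
        key = (metric.metrics_object_type, metric.entity,
               metric.node, metric.object_id)
        if key not in pipelines:
            # First time this key-tuple is seen: establish a new
            # in-memory processing unit without explicit registration.
            pipelines[key] = make_pipeline()
        pipelines[key].push(metric.value)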
  • FIG. 4 is a flowchart illustrating an embodiment of a process for managing metrics. Process 400 can be implemented by scalable metrics manager 292 operating on a system such as 100.
  • At 401, metrics associated with a plurality of sources are obtained. As discussed above, the metrics can be sent in messages to the metrics manager.
  • The obtained metrics are managed in a metrics pipeline as described above in connection with FIG. 3A.
  • Specifically, at 402, the metrics associated with a plurality of sources are aggregated at a first rate to generate a plurality of first aggregated results.
• In some cases, aggregation includes applying one or more transform operations (also referred to as aggregation operations) that transform the received metrics to generate new metrics. For example, suppose that four instances of a particular application periodically generate a set of four metrics reporting the number of connections to each application instance. One or more aggregation functions (F1) can be performed to combine (e.g., add) the four metrics into a new aggregated result of the total number of connections, average the four metrics into a new aggregated result of the average number of connections, determine the minimum and/or maximum number of connections among the four metrics, compute the difference between the maximum number of connections and the minimum number of connections, etc. Many other aggregation/transform functions are possible for various metrics manager implementations. In some cases, the raw metrics are sampled at the first rate to generate the aggregated results. More commonly, a transform operation generates a corresponding aggregated result (also referred to as a derived metric) based on the inputs to the transform function. Multiple transform operations can generate a vector of aggregated results. Aggregations can be performed across service engines, across multiple servers in a pool, across multiple entities, across multiple objects, etc. A pipeline can be configured to transform any appropriate metrics into a new result. The specific aggregation functions in a pipeline can be configured by the programmer or administrator according to actual system needs.
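• Continuing the four-instance example, the aggregation functions (F1) described above can be expressed as follows (a sketch; the variable names and sample values are illustrative):

    connections = [12, 7, 30, 19]  # one metric per application instance

    total   = sum(connections)                     # 68: total number of connections
    average = sum(connections) / len(connections)  # 17.0: average number of connections
    low     = min(connections)                     # 7: minimum among the four metrics
    high    = max(connections)                     # 30: maximum among the four metrics
    spread  = high - low                           # 23: maximum minus minimum

    # Multiple transform operations yield a vector of aggregated results.
    vector = {"sum": total, "avg": average, "min": low, "max": high, "spread": spread}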
  • The first rate at which aggregation takes place corresponds to the rate at which the aggregation function is performed on the collected data. In the following examples, a constant rate is discussed extensively for purposes of example, but the rate can be a non-constant rate as well (e.g., aggregation happens when the number of metrics collected meets or exceeds a threshold or when some other triggering condition for aggregation is met). Because the aggregation only uses metrics stored in memory, it does not require any database calls and is highly efficient. Further, because the aggregation is done periodically and in a batched fashion (e.g., all first stages of the pipelines perform aggregation every 5 seconds), timers do not need to be maintained per object or per metric. Thus, aggregation can be performed quickly and efficiently.
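• A non-constant rate amounts to replacing the timer with a triggering condition. For example (a sketch assuming the Stage class from the earlier sketch; the threshold value is an arbitrary illustration):

    BATCH_THRESHOLD = 100  # hypothetical trigger: aggregate once enough inputs accumulate

    def maybe_aggregate(stage):
        # Aggregation fires on buffered count rather than on a fixed 5-second timer.
        if len(stage.buffer) >= BATCH_THRESHOLD:
            return stage.tick()
        return None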
  • The received metrics are temporarily maintained in a memory such as a Random Access Memory (RAM). The first aggregated results and/or the received metrics will be rolled up into the next stage periodically. The first stage only needs to maintain a sufficient amount of input metrics in the memory until the first aggregation is performed, after which the metrics used in the aggregation can be removed from the memory in order to save space and make room for new aggregated results and/or metrics. Before being deleted from the memory, the first aggregated results and/or the obtained metrics are optionally output to a persistent storage such as a database. In some embodiments, the aggregation of the performance metrics and the maintenance of the first aggregated results are performed in a single process to reduce the overhead of context switches. It is permissible to implement the aggregation and the maintenance steps in separate processes.
• As will be described in greater detail below, analytical operations, event detection and generation, as well as storage of the aggregated results and/or the metrics to a persistent storage, can be performed.
• At 404, the first aggregated results are aggregated at a second rate to generate a plurality of second aggregated results. This is also referred to as a roll-up operation. In this case, the second rate (which can also be constant or non-constant) is on average lower than the first rate, and the aggregation function performed in the second stage is not necessarily the same as the aggregation function performed in the first stage. Referring again to the example shown in FIG. 3A, aggregated results of the first stage (stage 302) are sent to the second stage (stage 304) to be aggregated at a rate of every 5 minutes. Suppose that the first stage generates an aggregated result every 5 seconds; then every 5 minutes there will be 60 first aggregated results. These 60 first aggregated results from the first stage are aggregated again at the second stage according to the one or more aggregation functions (F2) specified in the second stage to generate one or more second aggregated results.
  • Similar to the first aggregated results, the second aggregated results are maintained in memory for a time. A sufficient number of the second aggregated results is maintained in the memory for the third stage of aggregation to be performed. The second aggregated results are optionally output to a persistent data store. After the second aggregated results are aggregated in the third stage, those second aggregated results that are used by the third stage for aggregation can be deleted from memory to save space. One or more analytical operations can be performed on the second aggregated results. Event detection and generation can also be performed on the second aggregated results. These operations are preferably implemented as inline operations of the metrics manager.
• At 406, the plurality of second aggregated results is aggregated at a third rate to generate a plurality of third aggregated results. Referring again to the example shown in FIG. 3A, the aggregation results of the second stage (stage 304) are sent to the third stage (stage 306) to be aggregated at a rate of every hour. Suppose that the second stage generates an aggregated result every 5 minutes; then in one hour there will be 12 second aggregated results. These 12 aggregated results from the second stage are aggregated again at the third stage according to one or more aggregation functions (F3) associated with the third stage.
• Although three stages are shown for purposes of illustration, other numbers of stages (e.g., two, four, or more stages) can be implemented in various embodiments.
• In the above process, at each stage, the metrics manager can invoke one or more corresponding analytical operations (such as anomaly detection) on the aggregated results. For example, the Holt-Winters algorithm, which detects outlier metrics and removes anomalies, can be applied at any of the stages. As another example, an aggregated result (e.g., the total number of connections) is compared with a threshold (e.g., a maximum of 100) to detect whether the threshold has been exceeded. Many analytical operations are possible and can be configured by the programmer or administrator according to actual system needs. Preferably, the analytical operation is implemented as an inline function within the metrics manager process; that is, it implements additional steps in the continuous processing of the metrics within the same process and software module. Because the aggregated results are kept in memory rather than streamed to a database, and because the analytical operation is inline, the operation does not require any input/output (I/O) operations such as database reads/writes, inter-process communication, system calls, messaging, etc. Thus, the analytical operations are highly efficient compared with existing analytics tools that typically require database or file system access. The analytical operations can be performed in real time (e.g., at substantially the same time as when the performance metrics are received, or at substantially the same time as when the aggregated results are generated).
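• As a minimal illustration of such an inline operation (a full Holt-Winters implementation is omitted; this sketch shows only the threshold comparison described above, with assumed names):

    MAX_CONNECTIONS = 100  # threshold from the example above

    def inline_check(total_connections):
        # Runs inside the metrics manager process on in-memory results,
        # so no database access, system call, or messaging is involved.
        if total_connections > MAX_CONNECTIONS:
            return "threshold exceeded: %d > %d" % (total_connections, MAX_CONNECTIONS)
        return None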
  • In some embodiments, the metrics manager generates events when metrics and/or aggregated results meet certain conditions. Event detection and generation can be implemented as an inline function where certain conditions are tested on a per metric type, per entity, and per node basis. For example, if the network connections metrics of a server sent by a service engine indicate that the connection exceeds a threshold, then an event such as an alarm or log is triggered. Other examples of event triggering conditions include: a metric meeting or exceeding a high watermark level for the first time after the metric has stayed below a low threshold; a metric meeting or falling below a low watermark level after the metric has stayed above the threshold; a metric crossing a predefined threshold; a metric indicating that an anomaly has occurred, etc. Many conditions are possible, and in some embodiments, a set of rules is specified for these conditions, and a rules processing engine compares the values associated with metrics against the rules to detect whether any specified conditions are met.
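• For example, the high-watermark condition above might be tested as follows (a sketch; the disclosure does not prescribe rule shapes, only that conditions are evaluated per metric type, entity, and node):

    def crossed_high_watermark(history, high, low):
        # True the first time the metric meets the high watermark
        # after having stayed below the low threshold.
        if len(history) < 2:
            return False
        return history[-2] < low and history[-1] >= high

    assert crossed_high_watermark([10, 95], high=90, low=20)
    assert not crossed_high_watermark([85, 95], high=90, low=20)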
  • As discussed above, in some implementations metrics and aggregated results are recorded in a persistent storage such as a database for backup purposes. A retention policy is specified by the administrator to determine the amount of time for which corresponding stored data remains in the database. When the retention period is over, any data that is outside the retention policy period is erased from the database to conserve space.
• FIGS. 5A-5B are diagrams illustrating an embodiment of an approach for archiving the aggregated results. In this example, aggregated results are written to the database at the time of aggregation, as shown in FIG. 5A. The retention policy periods for the first stage, the second stage, and the third stage are 2 hours, 2 days, and 1 year, respectively. Thus, after 2 hours, the database records corresponding to the first aggregated results that occurred before the current two hour window are deleted from the database, as shown in FIG. 5B. As can be seen, because the data from different stages is interspersed, deleting records associated with a particular stage can leave “holes” in the database, which slow down queries of the aggregated results, degrade the database's write performance, and ultimately limit the rate of aggregation.
  • To overcome the problem illustrated in FIGS. 5A-5B, time series based database tables are used in some embodiments. FIGS. 6A-6B are diagrams illustrating another embodiment of an approach for archiving the aggregated results. In this example, aggregated results from different stages of a pipeline occupy separate tables. Within a stage, multiple tables can be used to store the aggregated results. These tables are referred to as time series based database tables since they each correspond to a different period of aggregated results. Different time series based database tables can be subject to different retention policies. In this example, two tables are used to store the first aggregated results, where each table is configured to store one hour's worth of first aggregated results from the first stage; one table is used to store the second aggregated results, where the table is configured to store one day's worth of second aggregated results from the second stage; and one table is used to store the third aggregated results, where the table is configured to store one year's worth of third aggregated results from the third stage.
  • The aggregated results are written to the database in a batch in append mode. For example, the first aggregated results can be written to the database every 30 minutes rather than every five seconds. Maintaining a greater amount of aggregated results in memory permits less frequent database writes, which is more efficient. Thus, the rate at which the aggregated results are written can be configured based on tradeoffs of memory required to keep the aggregated results and efficiency of database writes. Further, the aggregated results are written to the database in tables according to the retention period of the corresponding retention policy.
• Note that the table size does not need to correspond exactly to the amount of data generated during the retention period but can be on the same order of magnitude. Suppose the retention period for the first stage is two hours. Table 602 is initially filled with first aggregated results obtained during the first hour, and table 604 is initially filled with first aggregated results obtained during the second hour. At the end of two hours, some of the old aggregated results need to be removed to make room for new aggregated results. Thus, for data in the next two hour window, the entire contents of table 602 are deleted; table 604 now stores aggregated data for the first hour, and table 602 is used to store aggregated data for the second hour. Because aggregated results are stored in separate tables and deleted separately, holes in the database are avoided.
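• The rotation can be sketched as follows (table names follow FIGS. 6A-6B; the in-memory dict stands in for database tables purely for illustration):

    tables = {"table_602": [], "table_604": []}
    current = "table_602"  # table receiving the current hour's first aggregated results

    def rotate():
        """At each hour boundary, truncate the expired table and reuse it."""
        global current
        expired = "table_604" if current == "table_602" else "table_602"
        tables[expired].clear()  # the whole hour is dropped at once: no holes
        current = expired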
• Because the database uses time series based database tables to store aggregated results, a query will not necessarily be performed on a single table. Thus, in some embodiments, the metrics manager provides a query application programming interface (API) that hides the details of the underlying time series based database tables and gives the appearance of making queries to and receiving results from a single table.
  • FIG. 7 is a flowchart illustrating an embodiment of a process for querying metrics data stored in a database. Process 700 can be performed by the metrics manager in response to a database query, which can be initiated manually by a user via a user interface tool provided by a performance monitoring application, automatically by the performance monitoring application, etc.
  • At 702, the database query is analyzed to determine one or more corresponding time series based database tables associated with the database query. Specifically, the time window of the query is compared with the time windows of the time series based database tables.
  • At 704, it is determined whether the time window being queried spans only a single time series based database table. If so, the database query is performed normally without changes to the query, at 706. If, however, the time window being queried spans multiple time series based database tables, then the particular time series based database tables are determined and the process continues at 708.
  • At 708, the database query is converted into a union of multiple sub-queries across the determined time series based database tables.
• At 710, filters from the database query are applied to the sub-queries so that the database's efficient filtering can be used optimally. The efficiency is gained because filters are applied on a per table basis before the results are joined together. Thus, the cost of filtering is on the order of K (the maximum number of rows in any one table) instead of N (the number of combined rows across tables), where K<<N.
  • At 712, the sub-queries are performed on the database.
  • At 714, the responses to the sub-queries are combined into a single response.
  • This way, to the generator of the query (e.g., the performance monitoring application), it appears as if the query were performed on a single table.
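• Steps 708-714 can be sketched as follows (a simplified illustration: the table names are placeholders, and each per-table sub-query reuses the global time bounds since a table only holds rows from its own hour):

    def build_union_query(tables_in_window, start_ts, end_ts, entity_id):
        """Convert one logical query into a UNION ALL of per-table sub-queries,
        pushing the time and entity filters down into each sub-query."""
        subqueries = [
            "SELECT metric_timestamp, avg_cpu_usage FROM %s "
            "WHERE metric_timestamp >= '%s' AND metric_timestamp <= '%s' "
            "AND entity_id = '%s'" % (table, start_ts, end_ts, entity_id)
            for table in tables_in_window
        ]
        return " UNION ALL ".join(subqueries) + " ORDER BY metric_timestamp"

    sql = build_union_query(["se_stats_hour_a", "se_stats_hour_b"],
                            "2015-03-19T21:03:25", "2015-03-19T22:03:20", "se-1")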
• FIG. 8 is a diagram illustrating an example of a query to a database comprising multiple time series based database tables. In FIG. 8, a plurality of database tables is used to store aggregated metrics from various stages of the pipeline. In particular, tables 802 and 804 are shown to store the first hour of the first aggregated results and the second hour of the first aggregated results, respectively. As shown, table 802 stores metrics gathered between 21:00:00-21:59:55 and table 804 stores metrics gathered between 22:00:00-22:59:55. Metrics in both tables are gathered in 5-second increments.
  • Suppose the following database query is made to query the database:
    • SELECT se_stats_table.metric_timestamp AS se_stats_table_metric_timestamp, se_stats_table.avg_cpu_usage AS se_stats_table_avg_cpu_usage, entity_table.entity_id AS entity_id
    • FROM se_stats_table JOIN entity_table ON entity_table.entity_key=se_stats_table.entity_key WHERE se_stats_table.metric_timestamp>=‘2015-03-19T21:03:25’ AND se_stats_table.metric_timestamp<=‘2015-03-19T22:03:20’ AND se_stats_table.metric_period=‘5SECOND’ AND entity_table.entity_id=‘se-1’
  • Referring to FIG. 7, at 702, the database query is analyzed and it is determined that there are two time series based database tables (802 and 804) that correspond to the database query.
  • At 708 and 710, the database query is converted into a union of two sub-queries, and filters are applied to the sub-queries. In this example, the sub-queries correspond to their respective database tables. The sub-query that spans the time window of ‘2015-03-19T21:03:25’ to ‘2015-03-19T21:59:55’ is:
    • SELECT se_stats_table_1hour_396333.metric_timestamp AS se_stats_table_1hour_396333_metric_timestamp, se_stats_table_1hour_396333.avg_cpu_usage AS se_stats_table_1hour_396333_avg_cpu_usage, entity_table.entity_id AS entity_id
    • FROM se_stats_table_1hour_396333 JOIN entity_table ON entity_table.entity_key=se_stats_table_1hour_396333.entity_key
    • WHERE se_stats_table_1hour_396333.metric_timestamp>=‘2015-03-19T21:03:25’ AND se_stats_table_1hour_396333.metric_timestamp<=‘2015-03-19T21:59:55’ AND se_stats_table_1hour_396333.metric_period=‘5SECOND’ AND entity_table.entity_id=‘se-1’
• The sub-query that spans the time window of ‘2015-03-19T22:00:00’ to ‘2015-03-19T22:03:20’ is:
    • SELECT se_stats_table_1hour_396334.metric_timestamp AS se_stats_table_1hour_396334_metric_timestamp, se_stats_table_1hour_396334.avg_cpu_usage AS se_stats_table_1hour_396334_avg_cpu_usage, entity_table.entity_id AS entity_id
    • FROM se_stats_table_1hour_396334 JOIN entity_table ON entity_table.entity_key=se_stats_table_1hour_396334.entity_key
    • WHERE se_stats_table_1hour_396334.metric_timestamp>=‘2015-03-19T22:00:00’ AND se_stats_table_1hour_396334.metric_timestamp<=‘2015-03-19T22:03:20’ AND se_stats_table_1hour_396334.metric_period=‘5SECOND’ AND entity_table.entity_id=‘se-1’
  • The union of the sub-queries with filters is:
    • SELECT anon_1.se_stats_table_1hour_396333_metric_timestamp AS metric_timestamp, anon_1.se_stats_table_1hour_396333_avg_cpu_usage AS avg_cpu_usage, anon_1.entity_id AS entity_id
    • FROM (SELECT se_stats_table_1hour_396333.metric_timestamp AS se_stats_table_1hour_396333_metric_timestamp, se_stats_table_1hour_396333.avg_cpu_usage AS se_stats_table_1hour_396333_avg_cpu_usage, entity_table.entity_id AS entity_id FROM se_stats_table_1hour_396333 JOIN entity_table ON entity_table.entity_key=se_stats_table_1hour_396333.entity_key
    • WHERE se_stats_table_1hour_396333.metric_timestamp>=‘2015-03-19T21:03:25’ AND se_stats_table_1hour_396333.metric_timestamp<=‘2015-03-19T21:59:55’ AND se_stats_table_1hour_396333.metric_period=‘5SECOND’ AND entity_table.entity_id=‘se-1’ UNION ALL SELECT se_stats_table_1hour_396334.metric_timestamp AS se_stats_table_1hour_396334_metric_timestamp, se_stats_table_1hour_396334.avg_cpu_usage AS se_stats_table_1hour_396334_avg_cpu_usage, entity_table.entity_id AS entity_id FROM se_stats_table_1hour_396334 JOIN entity_table ON entity_table.entity_key=se_stats_table_1hour_396334.entity_key
    • WHERE se_stats_table_1hour_396334.metric_timestamp>=‘2015-03-19T22:00:00’ AND se_stats_table_1hour_396334.metric_timestamp<=‘2015-03-19T22:03:20’ AND se_stats_table_1hour_396334.metric_period=‘5SECOND’ AND entity_table.entity_id=‘se-1’) AS anon_1
    • ORDER BY anon_1.se_stats_table_1hour_396333_metric_timestamp LIMIT 720
  • The rearrangement of the query across multiple time series tables shown above does not compromise the performance of read operations to the database, and facilitates efficient write operations to the database by the metrics manager.
  • Managing performance metrics has been disclosed. By processing the metrics in a pipeline in memory, the technique described above significantly reduces the amount of I/O operations and latency associated with processing the metrics, and allows for real time analytics.
  • Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

Claims (19)

What is claimed is:
1-21. (canceled)
22. A method of analyzing metric data sets associated with a set of elements in a network, the method comprising:
aggregating, at a first rate, a plurality of metric data sets associated with the set of network elements to generate a plurality of first aggregated results;
aggregating, at a second rate, the plurality of first aggregated results to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate; and
analyzing the plurality of second aggregated results in order to monitor the set of network elements,
said first and second aggregation operations performed to reduce an amount of memory used to store metric data sets by producing aggregated results for said analyzing operation.
23. The method of claim 22 further comprising analyzing the plurality of first aggregated results in order to monitor performance of the set of network elements.
24. The method of claim 22, wherein the pluralities of the first and second aggregated results are stored in memory, and the analyzing comprises performing fast analytical operations on the plurality of the second aggregated results stored in memory in order to monitor the set of network elements.
25. The method of claim 24, wherein the analyzing further comprises performing fast event detection operations on the plurality of second aggregated results stored in memory in order to identify events associated with the set of network elements.
26. The method of claim 24 further comprising storing the pluralities of first and second aggregated results to one or more databases for subsequent queries.
27. The method of claim 22, wherein the analyzing comprises:
performing an analytical operation on the plurality of second aggregated results to monitor the set of network elements; and
performing an event detection operation on the plurality of second aggregated results to identify events associated with the set of network elements.
28. The method of claim 27, wherein the analyzing comprises:
performing an analytical operation on the plurality of first aggregated results to monitor the set of network elements; and
performing an event detection operation on the plurality of first aggregated results to identify events associated with the set of network elements.
29. The method of claim 22 further comprising collecting the plurality of metric data sets from a plurality of sources in the network that collect metric data at different rates.
30. The method of claim 22 further comprising storing the pluralities of the first and second aggregated results in memory;
aggregating, at a third rate, the plurality of second aggregated results to generate a plurality of third aggregated results, the third rate being a lower rate than the first and second rates; and
analyzing the plurality of third aggregated results in order to monitor the set of network elements.
31. A non-transitory computer readable medium storing a program for analyzing metric data sets associated with a set of elements in a network, the program executable by a processing unit, the program comprising sets of instructions for:
aggregating, at a first rate, a plurality of metric data sets associated with the set of network elements to generate a plurality of first aggregated results;
aggregating, at a second rate, the plurality of first aggregated results to generate a plurality of second aggregated results, the second rate being a lower rate than the first rate; and
analyzing the plurality of second aggregated results in order to monitor the set of network elements,
said first and second aggregation operations performed to reduce an amount of memory used to store metric data sets by producing aggregated results for said analyzing operation.
32. The non-transitory computer readable medium of claim 31, the program further comprising a set of instructions for analyzing the plurality of first aggregated results in order to monitor performance of the set of network elements.
33. The non-transitory computer readable medium of claim 31, wherein the pluralities of the first and second aggregated results are stored in memory, and the set of instructions for analyzing comprises a set of instructions for performing fast analytical operations on the plurality of the second aggregated results stored in memory in order to monitor the set of network elements.
34. The non-transitory computer readable medium of claim 33, wherein the set of instructions for analyzing further comprises a set of instructions for performing fast event detection operations on the plurality of second aggregated results stored in memory in order to identify events associated with the set of network elements.
35. The non-transitory computer readable medium of claim 33, the program further comprising a set of instructions for storing the pluralities of first and second aggregated results to one or more databases for subsequent queries.
36. The non-transitory computer readable medium of claim 31, wherein the set of instructions for analyzing comprises sets of instructions for:
performing an analytical operation on the plurality of second aggregated results to monitor the set of network elements; and
performing an event detection operation on the plurality of second aggregated results to identify events associated with the set of network elements.
37. The non-transitory computer readable medium of claim 36, wherein the set of instructions for analyzing comprises sets of instructions for:
performing an analytical operation on the plurality of first aggregated results to monitor the set of network elements; and
performing an event detection operation on the plurality of first aggregated results to identify events associated with the set of network elements.
38. The non-transitory computer readable medium of claim 31, the program further comprising a set of instructions for collecting the plurality of metric data sets from a plurality of sources in the network that collect metric data at different rates.
39. The non-transitory computer readable medium of claim 31, the program further comprising sets of instructions for:
storing the pluralities of the first and second aggregated results in memory;
aggregating, at a third rate, the plurality of second aggregated results to generate a plurality of third aggregated results, the third rate being a lower rate than the first and second rates; and
analyzing the plurality of third aggregated results in order to monitor the set of network elements.
US17/700,037 2015-03-24 2022-03-21 Scalable real time metrics management Pending US20220286373A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/700,037 US20220286373A1 (en) 2015-03-24 2022-03-21 Scalable real time metrics management

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201562137625P 2015-03-24 2015-03-24
US15/055,450 US11283697B1 (en) 2015-03-24 2016-02-26 Scalable real time metrics management
US17/700,037 US20220286373A1 (en) 2015-03-24 2022-03-21 Scalable real time metrics management

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/055,450 Continuation US11283697B1 (en) 2015-03-24 2016-02-26 Scalable real time metrics management

Publications (1)

Publication Number Publication Date
US20220286373A1 2022-09-08

Family

ID=80781862

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/055,450 Active 2039-06-25 US11283697B1 (en) 2015-03-24 2016-02-26 Scalable real time metrics management
US17/700,037 Pending US20220286373A1 (en) 2015-03-24 2022-03-21 Scalable real time metrics management

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/055,450 Active 2039-06-25 US11283697B1 (en) 2015-03-24 2016-02-26 Scalable real time metrics management

Country Status (1)

Country Link
US (2) US11283697B1 (en)


Family Cites Families (240)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5109486A (en) 1989-01-06 1992-04-28 Motorola, Inc. Distributed computer system with network and resource status monitoring
US6515968B1 (en) 1995-03-17 2003-02-04 Worldcom, Inc. Integrated interface for real time web based viewing of telecommunications network call traffic
US5781703A (en) 1996-09-06 1998-07-14 Candle Distributed Solutions, Inc. Intelligent remote agent for computer performance monitoring
US6714979B1 (en) 1997-09-26 2004-03-30 Worldcom, Inc. Data warehousing infrastructure for web based reporting tool
US6763376B1 (en) 1997-09-26 2004-07-13 Mci Communications Corporation Integrated customer interface system for communications network management
US6148335A (en) 1997-11-25 2000-11-14 International Business Machines Corporation Performance/capacity management framework over many servers
US6804714B1 (en) * 1999-04-16 2004-10-12 Oracle International Corporation Multidimensional repositories for problem discovery and capacity planning of database applications
US6973490B1 (en) 1999-06-23 2005-12-06 Savvis Communications Corp. Method and system for object-level web performance and analysis
US6449739B1 (en) 1999-09-01 2002-09-10 Mercury Interactive Corporation Post-deployment monitoring of server performance
US6792458B1 (en) 1999-10-04 2004-09-14 Urchin Software Corporation System and method for monitoring and analyzing internet traffic
US6901051B1 (en) 1999-11-15 2005-05-31 Fujitsu Limited Server-based network performance metrics generation system and method
EP1134941A1 (en) 2000-03-15 2001-09-19 TELEFONAKTIEBOLAGET LM ERICSSON (publ) Method and arrangement for flow control
US6976090B2 (en) 2000-04-20 2005-12-13 Actona Technologies Ltd. Differentiated content and application delivery via internet
US20030050932A1 (en) 2000-09-01 2003-03-13 Pace Charles P. System and method for transactional deployment of J2EE web components, enterprise java bean components, and application data over multi-tiered computer networks
ATE379807T1 (en) 2000-12-11 2007-12-15 Microsoft Corp METHOD AND SYSTEM FOR MANAGING MULTIPLE NETWORK EQUIPMENT
US20020078150A1 (en) 2000-12-18 2002-06-20 Nortel Networks Limited And Bell Canada Method of team member profile selection within a virtual team environment
US7197559B2 (en) 2001-05-09 2007-03-27 Mercury Interactive Corporation Transaction breakdown feature to facilitate analysis of end user performance of a server system
US20020198985A1 (en) 2001-05-09 2002-12-26 Noam Fraenkel Post-deployment monitoring and analysis of server performance
US7076695B2 (en) * 2001-07-20 2006-07-11 Opnet Technologies, Inc. System and methods for adaptive threshold determination for performance metrics
US6750897B1 (en) 2001-08-16 2004-06-15 Verizon Data Services Inc. Systems and methods for implementing internet video conferencing using standard phone calls
KR100385996B1 (en) 2001-09-05 2003-06-02 삼성전자주식회사 Method for allocating a plurality of IP addresses to a NIC(Network Interface Card) and apparatus therefor
US7231442B2 (en) 2002-04-03 2007-06-12 Tonic Software, Inc. Global network monitoring system
US7711751B2 (en) * 2002-06-13 2010-05-04 Netscout Systems, Inc. Real-time network performance monitoring system and related methods
US7877435B2 (en) 2002-06-20 2011-01-25 International Business Machines Corporation Method and system for transaction pipeline decomposition
EP1527395A4 (en) 2002-06-25 2006-03-01 Ibm Method and system for monitoring performance of application in a distributed environment
US6792460B2 (en) 2002-10-02 2004-09-14 Mercury Interactive Corporation System and methods for monitoring application server performance
US7246159B2 (en) 2002-11-01 2007-07-17 Fidelia Technology, Inc Distributed data gathering and storage for use in a fault and performance monitoring system
US20040103186A1 (en) 2002-11-21 2004-05-27 Fabio Casati Platform and method for monitoring and analyzing data
US7626985B2 (en) 2003-06-27 2009-12-01 Broadcom Corporation Datagram replication in internet protocol multicast switching in a network device
US7505953B2 (en) 2003-07-11 2009-03-17 Computer Associates Think, Inc. Performance monitoring of method calls and database statements in an application server
US7386316B2 (en) 2003-08-17 2008-06-10 Omnivision Technologies, Inc. Enhanced video streaming using dual network mode
US8776050B2 (en) 2003-08-20 2014-07-08 Oracle International Corporation Distributed virtual machine monitor for managing multiple virtual resources across multiple physical nodes
US8018852B2 (en) 2003-08-22 2011-09-13 Alcatel Lucent Equal-cost source-resolved routing system and method
US8588069B2 (en) 2003-08-29 2013-11-19 Ineoquest Technologies, Inc. System and method for analyzing the performance of multiple transportation streams of streaming media in packet-based networks
US20050060574A1 (en) 2003-09-13 2005-03-17 Finisar Corporation Network analysis graphical user interface
US20050108444A1 (en) 2003-11-19 2005-05-19 Flauaus Gary R. Method of detecting and monitoring fabric congestion
US7130812B1 (en) * 2003-11-26 2006-10-31 Centergistic Solutions, Inc. Method and system for managing real time data
US20050188221A1 (en) 2004-02-24 2005-08-25 Covelight Systems, Inc. Methods, systems and computer program products for monitoring a server application
US7701852B1 (en) 2004-08-26 2010-04-20 Sprint Communications Company L.P. Method for analyzing performance of a network through measuring and reporting delay in routing devices
ATE540361T1 (en) 2004-10-20 2012-01-15 Telecom Italia Spa METHOD AND SYSTEM FOR MONITORING THE PERFORMANCE OF A CLIENT SERVER ARCHITECTURE
US7743380B2 (en) 2005-01-21 2010-06-22 Hewlett-Packard Development Company, L.P. Monitoring clustered software applications
US20060209818A1 (en) 2005-03-18 2006-09-21 Purser Jimmy R Methods and devices for preventing ARP cache poisoning
US7990847B1 (en) 2005-04-15 2011-08-02 Cisco Technology, Inc. Method and system for managing servers in a server cluster
US7743128B2 (en) 2005-04-20 2010-06-22 Netqos, Inc. Method and system for visualizing network performance characteristics
US20060271677A1 (en) 2005-05-24 2006-11-30 Mercier Christina W Policy based data path management, asset management, and monitoring
US8429630B2 (en) 2005-09-15 2013-04-23 Ca, Inc. Globally distributed utility computing cloud
US8032896B1 (en) 2005-11-01 2011-10-04 Netapp, Inc. System and method for histogram based chatter suppression
US7558985B2 (en) * 2006-02-13 2009-07-07 Sun Microsystems, Inc. High-efficiency time-series archival system for telemetry signals
US9047648B1 (en) 2006-03-30 2015-06-02 At&T Mobility Ii Llc Measurement, collection, reporting and processing of health condition data
US7817549B1 (en) 2006-06-30 2010-10-19 Extreme Networks, Inc. Flexible flow-aging mechanism
US8627402B2 (en) 2006-09-19 2014-01-07 The Invention Science Fund I, Llc Evaluation systems and methods for coordinating software agents
US20080080517A1 (en) 2006-09-28 2008-04-03 At & T Corp. System and method for forwarding traffic data in an MPLS VPN
US20080101233A1 (en) 2006-10-25 2008-05-01 The Governors Of The University Of Alberta Method and apparatus for load balancing internet traffic
US8874725B1 (en) 2006-11-15 2014-10-28 Conviva Inc. Monitoring the performance of a content player
US8769120B2 (en) 2006-11-28 2014-07-01 Sap Ag Method and system to monitor parameters of a data flow path in a communication system
US7940766B2 (en) 2006-12-08 2011-05-10 Alcatel Lucent Multicasting unicast packet/multiple classification of a packet
WO2008111067A1 (en) 2007-03-12 2008-09-18 Joliper Ltd. Method of providing a service over a hybrid network and system thereof
US8964571B2 (en) 2007-07-06 2015-02-24 Alcatel Lucent Method and apparatus for simultaneous support of fast restoration and native multicast in IP networks
US7979895B2 (en) 2007-08-16 2011-07-12 International Business Machines Corporation System and method for partitioning a multi-level security namespace
US8131712B1 (en) 2007-10-15 2012-03-06 Google Inc. Regional indexes
US8001365B2 (en) * 2007-12-13 2011-08-16 Telefonaktiebolaget L M Ericsson (Publ) Exchange of processing metric information between nodes
US8261278B2 (en) * 2008-02-01 2012-09-04 Ca, Inc. Automatic baselining of resource consumption for transactions
US20100036903A1 (en) 2008-08-11 2010-02-11 Microsoft Corporation Distributed load balancer
WO2010031001A1 (en) 2008-09-12 2010-03-18 Network Foundation Technologies, Llc System for distributing content data over a computer network and method of arranging nodes for distribution of data over a computer network
CN101741709B (en) 2008-11-06 2012-08-22 华为技术有限公司 Method and system for establishing label switched path and network node
WO2010071888A2 (en) 2008-12-19 2010-06-24 Watchguard Technologies, Inc. Cluster architecture and configuration for network security devices
US8412493B2 (en) * 2008-12-22 2013-04-02 International Business Machines Corporation Multi-dimensional model generation for determining service performance
US7924739B2 (en) 2008-12-22 2011-04-12 At&T Intellectual Property I, L.P. Method and apparatus for one-way passive loss measurements using sampled flow statistics
US8433749B2 (en) 2009-04-15 2013-04-30 Accenture Global Services Limited Method and system for client-side scaling of web server farm architectures in a cloud data center
US8170491B2 (en) 2009-05-04 2012-05-01 Qualcomm Incorporated System and method for real-time performance and load statistics of a communications system
US20100287262A1 (en) 2009-05-08 2010-11-11 Uri Elzur Method and system for guaranteed end-to-end data flows in a local networking domain
US8037076B2 (en) 2009-05-11 2011-10-11 Red Hat, Inc. Federated indexing from hashed primary key slices
US9830192B1 (en) * 2014-11-10 2017-11-28 Turbonomic, Inc. Managing application performance in virtualization systems
WO2011063269A1 (en) 2009-11-20 2011-05-26 Alert Enterprise, Inc. Method and apparatus for risk visualization and remediation
WO2011071850A2 (en) 2009-12-07 2011-06-16 Coach Wei System and method for website performance optimization and internet traffic processing
US20110185082A1 (en) 2009-12-29 2011-07-28 Tervela, Inc. Systems and methods for network virtualization
US10289636B2 (en) 2010-02-08 2019-05-14 Here Global B.V. Virtual table generator for analyzing geographic databases
US8611251B2 (en) 2010-02-08 2013-12-17 Force10 Networks, Inc. Method and apparatus for the distribution of network traffic
US8630297B2 (en) 2010-02-08 2014-01-14 Force10 Networks, Inc. Method and apparatus for the distribution of network traffic
US8472438B2 (en) 2010-04-23 2013-06-25 Telefonaktiebolaget L M Ericsson (Publ) Efficient encapsulation of packets transmitted on a packet-pseudowire over a packet switched network
US8499093B2 (en) 2010-05-14 2013-07-30 Extreme Networks, Inc. Methods, systems, and computer readable media for stateless load balancing of network traffic flows
US8462774B2 (en) 2010-08-04 2013-06-11 Alcatel Lucent Virtual IP interfaces on multi-chassis link aggregates
EP2609502A4 (en) 2010-08-24 2017-03-29 Jay Moorthi Method and apparatus for clearing cloud compute demand
US8949410B2 (en) 2010-09-10 2015-02-03 Cisco Technology, Inc. Server load balancer scaling for virtual servers
US9092561B2 (en) 2010-10-20 2015-07-28 Microsoft Technology Licensing, Llc Model checking for distributed application validation
US8711703B2 (en) 2010-10-29 2014-04-29 Telefonaktiebolaget L M Ericsson (Publ) Load balancing in shortest-path-bridging networks
US8667138B2 (en) 2010-10-29 2014-03-04 Cisco Technology, Inc. Distributed hierarchical rendering and provisioning of cloud services
US8499066B1 (en) 2010-11-19 2013-07-30 Amazon Technologies, Inc. Predicting long-term computing resource usage
US8755283B2 (en) 2010-12-17 2014-06-17 Microsoft Corporation Synchronizing state among load balancer components
US20120155468A1 (en) 2010-12-21 2012-06-21 Microsoft Corporation Multi-path communications in a data center environment
JP5843459B2 (en) 2011-03-30 2016-01-13 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Information processing system, information processing apparatus, scaling method, program, and recording medium
US8806018B2 (en) 2011-04-01 2014-08-12 Carnegie Mellon University Dynamic capacity management of multiple parallel-connected computing resources
US10452836B2 (en) 2011-05-09 2019-10-22 Pure Storage, Inc. Retrieving a hypertext markup language file from a dispersed storage network memory
US8812727B1 (en) 2011-06-23 2014-08-19 Amazon Technologies, Inc. System and method for distributed load balancing with distributed direct server return
US8671407B2 (en) 2011-07-06 2014-03-11 Microsoft Corporation Offering network performance guarantees in multi-tenant datacenters
US8713378B2 (en) 2011-07-07 2014-04-29 Microsoft Corporation Health monitoring of applications in a guest partition
US8681802B2 (en) 2011-08-15 2014-03-25 Cisco Technology, Inc. Proxy FHRP for anycast routing service
US9495222B1 (en) * 2011-08-26 2016-11-15 Dell Software Inc. Systems and methods for performance indexing
US9252979B2 (en) 2011-10-01 2016-02-02 Oracle International Corporation Transparent configuration of virtual hosts supporting multiple time zones in an enterprise platform
US9329904B2 (en) 2011-10-04 2016-05-03 Tier 3, Inc. Predictive two-dimensional autoscaling
US20130091266A1 (en) * 2011-10-05 2013-04-11 Ajit Bhave System for organizing and fast searching of massive amounts of data
US8856797B1 (en) 2011-10-05 2014-10-07 Amazon Technologies, Inc. Reactive auto-scaling of capacity
US9148381B2 (en) 2011-10-21 2015-09-29 Qualcomm Incorporated Cloud computing enhanced gateway for communication networks
US9697316B1 (en) 2011-12-13 2017-07-04 Amazon Technologies, Inc. System and method for efficient data aggregation with sparse exponential histogram
US9088584B2 (en) 2011-12-16 2015-07-21 Cisco Technology, Inc. System and method for non-disruptive management of servers in a network environment
US9083710B1 (en) 2012-01-03 2015-07-14 Google Inc. Server load balancing using minimally disruptive hash tables
US20130179894A1 (en) 2012-01-09 2013-07-11 Microsoft Corporation Platform as a service job scheduling
US9170849B2 (en) 2012-01-09 2015-10-27 Microsoft Technology Licensing, Llc Migration of task to different pool of resources based on task retry count during task lease
US9372735B2 (en) 2012-01-09 2016-06-21 Microsoft Technology Licensing, Llc Auto-scaling of pool of virtual machines based on auto-scaling rules of user associated with the pool
US20130179289A1 (en) 2012-01-09 2013-07-11 Microsoft Corportaion Pricing of resources in virtual machine pools
US20150023352A1 (en) 2012-02-08 2015-01-22 Hangzhou H3C Technologies Co., Ltd. Implement equal cost multiple path of trill network
US9477936B2 (en) 2012-02-09 2016-10-25 Rockwell Automation Technologies, Inc. Cloud-based operator interface for industrial automation
US9246777B2 (en) 2012-02-14 2016-01-26 Hitachi, Ltd. Computer program and monitoring apparatus
US8819275B2 (en) 2012-02-28 2014-08-26 Comcast Cable Communications, Llc Load balancing and session persistence in packet networks
US9071541B2 (en) 2012-04-25 2015-06-30 Juniper Networks, Inc. Path weighted equal-cost multipath
US8918510B2 (en) 2012-04-27 2014-12-23 Hewlett-Packard Development Company, L. P. Evaluation of cloud computing services
US9329915B1 (en) 2012-05-08 2016-05-03 Amazon Technologies, Inc. System and method for testing in a production environment
US8804531B2 (en) 2012-05-21 2014-08-12 Cisco Technology, Inc. Methods and apparatus for load balancing across member ports for traffic egressing out of a port channel
US9729414B1 (en) 2012-05-21 2017-08-08 Thousandeyes, Inc. Monitoring service availability using distributed BGP routing feeds
US8989189B2 (en) 2012-06-07 2015-03-24 Cisco Technology, Inc. Scaling IPv4 in data center networks employing ECMP to reach hosts in a directly connected subnet
US8972602B2 (en) 2012-06-15 2015-03-03 Citrix Systems, Inc. Systems and methods for using ECMP routes for traffic distribution
US9071537B2 (en) 2012-06-15 2015-06-30 Citrix Systems, Inc. Systems and methods for propagating health of a cluster node
US9112787B2 (en) 2012-06-21 2015-08-18 Cisco Technology, Inc. First hop load balancing
US10404556B2 (en) * 2012-06-22 2019-09-03 Microsoft Technology Licensing, Llc Methods and computer program products for correlation analysis of network traffic in a network device
US9619297B2 (en) 2012-06-25 2017-04-11 Microsoft Technology Licensing, Llc Process migration in data center networks
US9262253B2 (en) 2012-06-28 2016-02-16 Microsoft Technology Licensing, Llc Middlebox reliability
US9390055B2 (en) 2012-07-17 2016-07-12 Coho Data, Inc. Systems, methods and devices for integrating end-host and network resources in distributed memory
US9083642B2 (en) 2012-07-27 2015-07-14 Dell Products L.P. Systems and methods for optimizing layer three routing in an information handling system
WO2014052099A2 (en) 2012-09-25 2014-04-03 A10 Networks, Inc. Load distribution in data networks
US9280437B2 (en) * 2012-11-20 2016-03-08 Bank Of America Corporation Dynamically scalable real-time system monitoring
US9253520B2 (en) 2012-12-14 2016-02-02 Biscotti Inc. Video capture, processing and distribution system
US9819729B2 (en) 2012-12-21 2017-11-14 Bmc Software, Inc. Application monitoring for cloud-based architectures
US9032078B2 (en) 2013-01-02 2015-05-12 International Business Machines Corporation Predictive scaling for clusters
US9332028B2 (en) 2013-01-25 2016-05-03 REMTCS Inc. System, method, and apparatus for providing network security
US9762471B2 (en) 2013-01-26 2017-09-12 F5 Networks, Inc. Methods and systems for estimating and analyzing flow activity and path performance data in cloud or distributed systems
US9256573B2 (en) 2013-02-14 2016-02-09 International Business Machines Corporation Dynamic thread status retrieval using inter-thread communication
US9378306B2 (en) 2013-03-12 2016-06-28 Business Objects Software Ltd. Binning visual definition for visual intelligence
US9124490B2 (en) * 2013-03-15 2015-09-01 Comcast Cable Communications, Llc Consolidated performance metric analysis
US9477500B2 (en) 2013-03-15 2016-10-25 Avi Networks Managing and controlling a distributed network service platform
US9596299B2 (en) 2013-04-06 2017-03-14 Citrix Systems, Inc. Systems and methods for dynamically expanding load balancing pool
US10038626B2 (en) 2013-04-16 2018-07-31 Amazon Technologies, Inc. Multipath routing in a distributed load balancer
US9459980B1 (en) 2013-04-17 2016-10-04 Amazon Technologies, Inc. Varying cluster sizes in a predictive test load while testing a productive system
US9491063B2 (en) 2013-05-15 2016-11-08 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for providing network services orchestration
US9495420B2 (en) 2013-05-22 2016-11-15 International Business Machines Corporation Distributed feature collection and correlation engine
US8995398B2 (en) 2013-06-04 2015-03-31 Dell Products L.P. System and method for efficient L3 mobility in a wired/wireless network
US9137165B2 (en) 2013-06-17 2015-09-15 Telefonaktiebolaget L M Ericsson (Publ) Methods of load balancing using primary and stand-by addresses and related load balancers and servers
US9509614B2 (en) 2013-06-20 2016-11-29 Cisco Technology, Inc. Hierarchical load balancing in a network environment
US9288193B1 (en) 2013-06-25 2016-03-15 Intuit Inc. Authenticating cloud services
CN103365695B (en) 2013-07-31 2017-04-26 广州市动景计算机科技有限公司 Method and device for increasing sub-resource loading speed
US10110684B1 (en) 2013-08-15 2018-10-23 Avi Networks Transparent network service migration across service devices
US9843520B1 (en) 2013-08-15 2017-12-12 Avi Networks Transparent network-services elastic scale-out
US9412075B2 (en) 2013-08-23 2016-08-09 Vmware, Inc. Automated scaling of multi-tier applications using reinforced learning
US9386086B2 (en) 2013-09-11 2016-07-05 Cisco Technology Inc. Dynamic scaling for multi-tiered distributed systems using payoff optimization of application classes
US20150078152A1 (en) 2013-09-13 2015-03-19 Microsoft Corporation Virtual network routing
US20150081883A1 (en) 2013-09-17 2015-03-19 Stackdriver, Inc. System and method of adaptively and dynamically modelling and monitoring applications and software architecture hosted by an iaas provider
US9998530B2 (en) 2013-10-15 2018-06-12 Nicira, Inc. Distributed global load-balancing system for software-defined data centers
US9876711B2 (en) 2013-11-05 2018-01-23 Cisco Technology, Inc. Source address translation in overlay networks
US9571516B1 (en) 2013-11-08 2017-02-14 Skyhigh Networks, Inc. Cloud service usage monitoring system
US9870310B1 (en) 2013-11-11 2018-01-16 Amazon Technologies, Inc. Data providers for annotations-based generic load generator
JP6248560B2 (en) 2013-11-13 2017-12-20 富士通株式会社 Management program, management method, and management apparatus
US10193771B2 (en) 2013-12-09 2019-01-29 Nicira, Inc. Detecting and handling elephant flows
US9608932B2 (en) 2013-12-10 2017-03-28 International Business Machines Corporation Software-defined networking single-source enterprise workload manager
US9300552B2 (en) 2013-12-16 2016-03-29 International Business Machines Corporation Scaling a cloud infrastructure
KR20150083713A (en) 2014-01-10 2015-07-20 삼성전자주식회사 Electronic device and method for managing resource
CN104796347A (en) 2014-01-20 2015-07-22 中兴通讯股份有限公司 Load balancing method, device and system
US9678800B2 (en) * 2014-01-30 2017-06-13 International Business Machines Corporation Optimum design method for configuration of servers in a data center environment
US9712404B2 (en) 2014-03-07 2017-07-18 Hitachi, Ltd. Performance evaluation method and information processing device
US9419889B2 (en) 2014-03-07 2016-08-16 Nicira, Inc. Method and system for discovering a path of network traffic
US10003550B1 (en) 2014-03-14 2018-06-19 Amazon Technologies Smart autoscaling of a cluster for processing a work queue in a distributed system
US9842039B2 (en) 2014-03-31 2017-12-12 Microsoft Technology Licensing, Llc Predictive load scaling for services
WO2015154093A2 (en) 2014-04-05 2015-10-08 Wearable Intelligence Systems and methods for digital workflow and communication
US10360196B2 (en) * 2014-04-15 2019-07-23 Splunk Inc. Grouping and managing event streams generated from captured network data
US10700950B2 (en) 2014-04-15 2020-06-30 Splunk Inc. Adjusting network data storage based on event stream statistics
US10523521B2 (en) 2014-04-15 2019-12-31 Splunk Inc. Managing ephemeral event streams generated from captured network data
US8977728B1 (en) 2014-05-16 2015-03-10 Iboss, Inc. Maintaining IP tables
US9619548B2 (en) * 2014-05-20 2017-04-11 Google Inc. Dimension widening aggregate data
US9692811B1 (en) 2014-05-23 2017-06-27 Amazon Technologies, Inc. Optimization of application parameters
US9674302B1 (en) 2014-06-13 2017-06-06 Amazon Technologies, Inc. Computing resource transition notification and pending state
US9712410B1 (en) 2014-06-25 2017-07-18 Amazon Technologies, Inc. Local metrics in a service provider environment
US9577927B2 (en) 2014-06-30 2017-02-21 Nicira, Inc. Encoding control plane information in transport protocol source port field and applications thereof in network virtualization
WO2016033193A1 (en) 2014-08-26 2016-03-03 Matthew Hayden Harper Distributed input/output architecture for network functions virtualization
KR102295966B1 (en) 2014-08-27 2021-09-01 삼성전자주식회사 Method of Fabricating Semiconductor Devices Using Nanowires
JP6438719B2 (en) 2014-09-24 2018-12-19 株式会社日立製作所 Communication system and communication program
US9935829B1 (en) 2014-09-24 2018-04-03 Amazon Technologies, Inc. Scalable packet processing service
US9935864B2 (en) 2014-09-30 2018-04-03 Splunk Inc. Service analyzer interface
US9825881B2 (en) 2014-09-30 2017-11-21 Sony Interactive Entertainment America Llc Methods and systems for portably deploying applications on one or more cloud systems
US10171371B2 (en) 2014-09-30 2019-01-01 International Business Machines Corporation Scalable metering for cloud service management based on cost-awareness
US9210056B1 (en) 2014-10-09 2015-12-08 Splunk Inc. Service monitoring interface
US10862778B2 (en) 2014-10-30 2020-12-08 Assia Spe, Llc Method and apparatus for providing performance and usage information for a wireless local area network
US9613120B1 (en) 2014-11-11 2017-04-04 Amazon Technologies, Inc. Replicated database startup for common database storage
US10355934B2 (en) 2014-12-03 2019-07-16 Amazon Technologies, Inc. Vertical scaling of computing instances
US10432734B2 (en) 2014-12-12 2019-10-01 Hewlett Packard Enterprise Development Lp Cloud service tuning
CN105763512B (en) 2014-12-17 2019-03-15 新华三技术有限公司 Communication method and device for SDN virtualization network
US9614782B2 (en) 2014-12-23 2017-04-04 Facebook, Inc. Continuous resource pool balancing
US10257156B2 (en) 2014-12-31 2019-04-09 F5 Networks, Inc. Overprovisioning floating IP addresses to provide stateful ECMP for traffic groups
EP3248361B1 (en) 2015-01-19 2019-07-03 Telefonaktiebolaget LM Ericsson (publ) Timers in stateless architecture
US10261851B2 (en) 2015-01-23 2019-04-16 Lightbend, Inc. Anomaly detection using circumstance-specific detectors
CA2975248A1 (en) 2015-01-30 2016-08-04 Nec Corporation Node system, server apparatus, scaling control method, and program
US9608880B1 (en) 2015-02-19 2017-03-28 Dell Products L.P. Systems and methods for real-time performance monitoring
US9467476B1 (en) 2015-03-13 2016-10-11 Varmour Networks, Inc. Context aware microsegmentation
US9825875B2 (en) 2015-03-31 2017-11-21 Alcatel Lucent Method and apparatus for provisioning resources using clustering
US10476797B2 (en) 2015-04-13 2019-11-12 Dell Products L.P. Systems and methods to route over a link aggregation group to a true next hop
US9848041B2 (en) 2015-05-01 2017-12-19 Amazon Technologies, Inc. Automatic scaling of resource instance groups within compute clusters
US10789542B2 (en) 2015-06-05 2020-09-29 Apple Inc. System and method for predicting changes in network quality
US9882830B2 (en) 2015-06-26 2018-01-30 Amazon Technologies, Inc. Architecture for metrics aggregation without service partitioning
US20170041386A1 (en) 2015-08-05 2017-02-09 International Business Machines Corporation Provisioning a target hosting environment
US10313211B1 (en) 2015-08-25 2019-06-04 Avi Networks Distributed network service risk monitoring and scoring
US10594562B1 (en) 2015-08-25 2020-03-17 Vmware, Inc. Intelligent autoscale of services
US9912610B2 (en) 2015-09-24 2018-03-06 Barefoot Networks, Inc. Data-plane stateful processing units in packet processing pipelines
US9531614B1 (en) 2015-10-30 2016-12-27 AppDynamics, Inc. Network aware distributed business transaction anomaly detection
US10419530B2 (en) 2015-11-02 2019-09-17 Telefonaktiebolaget Lm Ericsson (Publ) System and methods for intelligent service function placement and autoscale based on machine learning
US9967275B1 (en) 2015-12-17 2018-05-08 EMC IP Holding Company LLC Efficient detection of network anomalies
US9749888B1 (en) 2015-12-21 2017-08-29 Headspin, Inc. System for network characteristic assessment
US10212041B1 (en) 2016-03-04 2019-02-19 Avi Networks Traffic pattern detection and presentation in container-based cloud computing architecture
US10320681B2 (en) 2016-04-12 2019-06-11 Nicira, Inc. Virtual tunnel endpoints for congestion-aware load balancing
US10469251B2 (en) 2016-05-05 2019-11-05 Auburn University System and method for preemptive self-healing security
US9716617B1 (en) 2016-06-14 2017-07-25 ShieldX Networks, Inc. Dynamic, load-based, auto-scaling network security microservices architecture
US10331590B2 (en) 2016-06-30 2019-06-25 Intel Corporation Graphics processing unit (GPU) as a programmable packet transfer mechanism
US10091093B2 (en) 2016-06-30 2018-10-02 Futurewei Technologies, Inc. Multi-controller control traffic balancing in software defined networks
US10467293B2 (en) 2017-05-18 2019-11-05 Aetna Inc. Scalable distributed computing system for determining exact median and other quantiles in big data applications
US10972437B2 (en) 2016-08-08 2021-04-06 Talari Networks Incorporated Applications and integrated firewall design in an adaptive private network (APN)
US10089135B2 (en) 2016-08-09 2018-10-02 International Business Machines Corporation Expediting the provisioning of virtual machines based on cached repeated portions of a template
US10193749B2 (en) 2016-08-27 2019-01-29 Nicira, Inc. Managed forwarding element executing in public cloud data compute node without overlay network
US10657146B2 (en) 2016-09-26 2020-05-19 Splunk Inc. Techniques for generating structured metrics from ingested events
US20180088935A1 (en) 2016-09-27 2018-03-29 Ca, Inc. Microservices application configuration based on runtime environment
US20180136931A1 (en) 2016-11-14 2018-05-17 Ca, Inc. Affinity of microservice containers
US10372600B2 (en) * 2017-03-01 2019-08-06 Salesforce.Com, Inc. Systems and methods for automated web performance testing for cloud apps in use-case scenarios
US10630543B1 (en) 2017-03-17 2020-04-21 Amazon Technologies, Inc. Wireless mesh network implementation for IOT devices
US10673714B1 (en) 2017-03-29 2020-06-02 Juniper Networks, Inc. Network dashboard with multifaceted utilization visualizations
US10868742B2 (en) 2017-03-29 2020-12-15 Juniper Networks, Inc. Multi-cluster dashboard for distributed virtualization infrastructure element monitoring and policy control
US10848936B2 (en) 2017-04-12 2020-11-24 Aspen Networks, Inc. Predictive flow switching and application continuity in connected vehicle networks
US10873541B2 (en) 2017-04-17 2020-12-22 Microsoft Technology Licensing, Llc Systems and methods for proactively and reactively allocating resources in cloud-based networks
US11855850B2 (en) 2017-04-25 2023-12-26 Nutanix, Inc. Systems and methods for networked microservice modeling and visualization
ES2963965T3 (en) 2017-04-28 2024-04-03 Opanga Networks Inc Domain name tracking system and method for network management
US10623470B2 (en) 2017-06-14 2020-04-14 International Business Machines Corporation Optimizing internet data transfers using an intelligent router agent
US10523748B2 (en) 2017-12-22 2019-12-31 A10 Networks, Inc. Managing health status of network devices in a distributed global server load balancing system
US11102118B2 (en) 2018-03-22 2021-08-24 Futurewei Technologies, Inc. System and method for supporting ICN-within-IP networking
US10728121B1 (en) 2018-05-23 2020-07-28 Juniper Networks, Inc. Dashboard for graphic display of computer network topology
US11044180B2 (en) 2018-10-26 2021-06-22 Vmware, Inc. Collecting samples hierarchically in a datacenter
US10887196B2 (en) 2018-11-28 2021-01-05 Microsoft Technology Licensing, Llc Efficient metric calculation with recursive data processing

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7636708B2 (en) * 2000-11-10 2009-12-22 Microsoft Corporation Distributed data gathering and aggregation agent
US20150106325A1 (en) * 2012-01-13 2015-04-16 Amazon Technologies, Inc. Distributed storage of aggregated data
US20130346594A1 (en) * 2012-06-25 2013-12-26 International Business Machines Corporation Predictive Alert Threshold Determination Tool
US9626275B1 (en) * 2014-06-05 2017-04-18 Amazon Technologies, Inc. Dynamic rate adjustment for interaction monitoring
US20160125330A1 (en) * 2014-10-31 2016-05-05 AppDynamics, Inc. Rolling upgrade of metric collection and aggregation system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11736372B2 (en) 2018-10-26 2023-08-22 Vmware, Inc. Collecting samples hierarchically in a datacenter
US11909612B2 (en) 2019-05-30 2024-02-20 VMware LLC Partitioning health monitoring in a global server load balancing system
US20220237203A1 (en) * 2021-01-22 2022-07-28 Vmware, Inc. Method and system for efficiently propagating objects across a federated datacenter
US11811861B2 (en) 2021-05-17 2023-11-07 Vmware, Inc. Dynamically updating load balancing criteria
US11792155B2 (en) 2021-06-14 2023-10-17 Vmware, Inc. Method and apparatus for enhanced client persistence in multi-site GSLB deployments
US11799824B2 (en) 2021-06-14 2023-10-24 Vmware, Inc. Method and apparatus for enhanced client persistence in multi-site GSLB deployments
US12107821B2 (en) 2022-07-14 2024-10-01 VMware LLC Two tier DNS

Also Published As

Publication number Publication date
US11283697B1 (en) 2022-03-22

Similar Documents

Publication Title
US20220286373A1 (en) Scalable real time metrics management
US11863408B1 (en) Generating event streams including modified network data monitored by remote capture agents
US11314737B2 (en) Transforming event data using values obtained by querying a data source
US10374883B2 (en) Application-based configuration of network data capture by remote capture agents
US10365915B2 (en) Systems and methods of monitoring a network topology
US11537572B2 (en) Multidimensional partition of data to calculate aggregation at scale
US20200081880A1 (en) Real-time Transactionally Consistent Change Notifications
WO2019133763A1 (en) System and method of application discovery
US8589537B2 (en) Methods and computer program products for aggregating network application performance metrics by process pool
US10999168B1 (en) User defined custom metrics
US11275667B2 (en) Handling of workload surges in a software application
US9774662B2 (en) Managing transactional data for high use databases
US11297105B2 (en) Dynamically determining a trust level of an end-to-end link
US20180081894A1 (en) Method and apparatus for clearing data in cloud storage system
US8312138B2 (en) Methods and computer program products for identifying and monitoring related business application processes
US11232106B1 (en) Windowed query with event-based open time for analytics of streaming data
US11609886B2 (en) Mechanism for stream processing efficiency using probabilistic model to reduce data redundancy
Pape et al. RESTful correlation and consolidation of distributed logging data in cloud environments
US10949232B2 (en) Managing virtualized computing resources in a cloud computing environment
US11259169B2 (en) Highly scalable home subscriber server
US20240303358A1 (en) Method and system for reconfiguring a data protection module based on metadata
US20240303359A1 (en) Method and system for generating an automatic service request based on metadata
US12061708B1 (en) Remote tracking and identification of key access patterns for databases
Vallath et al. Optimize Distributed Workload

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCV Information on status: appeal procedure

Free format text: NOTICE OF APPEAL FILED

AS Assignment

Owner name: VMWARE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:VMWARE, INC.;REEL/FRAME:066692/0103

Effective date: 20231121

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED