WO2023107937A1 - Systems and methods for collecting and processing application telemetry - Google Patents

Systems and methods for collecting and processing application telemetry

Info

Publication number
WO2023107937A1
Authority
WO
WIPO (PCT)
Prior art keywords
telemetry data
telemetry
service level
insights
computer program
Prior art date
Application number
PCT/US2022/081007
Other languages
English (en)
Inventor
Raghu VUDATHU
Original Assignee
Jpmorgan Chase Bank, N.A.
Priority date
Filing date
Publication date
Application filed by Jpmorgan Chase Bank, N.A.
Publication of WO2023107937A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447 Performance evaluation by modeling

Definitions

  • Embodiments generally relate to systems and methods for collecting and processing application telemetry.
  • APM Application Performance Management
  • a method for collecting and processing application telemetry may include: (1) collecting, by a telemetry insights computer program, first telemetry data from computer applications, network appliances, and hardware devices in a distributed architecture; (2) generating, by the telemetry insights computer program and based on the first telemetry data, a central processing unit (CPU) operating service level, a memory operating service level, and a latency service level; (3) collecting, by the telemetry insights computer program, second telemetry data from the computer applications, the network appliances, and the hardware devices; (4) identifying, by the telemetry insights computer program, an anomaly by comparing the second telemetry data to the CPU operating service level, the memory operating service level, and the latency service level; (5) generating, by the telemetry insights computer program, an event for the anomaly and communicating the event to an event manager; and (6) executing, by the event manager, an automated proactive action in response to the anomaly.
  • the first telemetry data and the second telemetry data comprise logs, traces, metrics, and metadata.
  • the method may also include pre-processing, by the telemetry insights computer program, the first telemetry data and the second telemetry data, wherein the pre-processing comprises transforming the first telemetry data and the second telemetry data to a common format, cleansing the first telemetry data and the second telemetry data, and consolidating the first telemetry data and the second telemetry data.
  • the CPU operating service level, the memory operating service level, and the latency service level are generated periodically using the first telemetry data collected since a prior generation of the CPU operating service level, the memory operating service level, and the latency service level.
  • the method may also include generating, by the telemetry insights computer program, a predictive insight for the second telemetry data using a trained machine learning engine, wherein the predictive insight identifies a predicted failure associated with the anomaly.
  • the automated proactive action may comprise automated healing, automated scaling, or automated disabling.
  • the method may also include: periodically persisting, by the telemetry insights computer program, the second telemetry data; and generating, by the telemetry insights computer program, a long-term insight from the persisted second telemetry data using a trained machine learning engine.
  • a system may include: a plurality of computer applications, each comprising a computer application monitoring agent; a plurality of network applications, each comprising a network application monitoring agent; a plurality of hardware devices, each comprising a hardware device monitoring agent; a telemetry insights computer program executed by an electronic device that receives first telemetry data from the computer application monitoring agents, the network application monitoring agents, and the hardware device monitoring agents; generates, based on the first telemetry data, a central processing unit (CPU) operating service level, a memory operating service level, and a latency service level; collects second telemetry data from the computer application monitoring agents, the network application monitoring agents, and the hardware device monitoring agents; identifies an anomaly by comparing the second telemetry data to the CPU operating service level, the memory operating service level, and the latency service level; and generates an event for the anomaly; and an event manager computer program executed by the electronic device that receives the event and executes an automated proactive action in response to the anomaly.
  • the first telemetry data and the second telemetry data comprise logs, traces, metrics, and metadata.
  • the telemetry insights computer program further pre-processes the first telemetry data and the second telemetry data, wherein the pre-processing transforms the first telemetry data and the second telemetry data to a common format, cleanses the first telemetry data and the second telemetry data, and consolidates the first telemetry data and the second telemetry data.
  • the telemetry insights computer program periodically generates the CPU operating service level, the memory operating service level, and the latency service level using the first telemetry data collected since a prior generation of the CPU operating service level, the memory operating service level, and the latency service level.
  • the telemetry insights computer program generates a predictive insight for the second telemetry data using a trained machine learning engine, wherein the predictive insight identifies a predicted failure associated with the anomaly.
  • the automated proactive action comprises automated healing, automated scaling, or automated disabling.
  • the telemetry insights computer program periodically persists the second telemetry data and generates a long-term insight from the persisted second telemetry data using a trained machine learning engine.
  • a non-transitory computer readable storage medium including instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to perform steps comprising: collecting first telemetry data from computer applications, network appliances, and hardware devices in a distributed architecture, wherein the first telemetry data are collected from monitoring agents associated with the computer applications, the network appliances, and the hardware devices; generating, based on the first telemetry data, a central processing unit (CPU) operating service level, a memory operating service level, and a latency service level; collecting second telemetry data from the computer applications, the network appliances, and the hardware devices, wherein the second telemetry data are collected from the monitoring agents associated with the computer applications, the network appliances, and the hardware devices; identifying an anomaly by comparing the second telemetry data to the CPU operating service level, the memory operating service level, and the latency service level; generating an event for the anomaly and communicating the event to an event manager; and executing an automated proactive action in response to the anomaly.
  • the non-transitory computer readable storage medium may also include instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to pre-process the first telemetry data and the second telemetry data, wherein the pre-processing comprises transforming the first telemetry data and the second telemetry data to a common format, cleansing the first telemetry data and the second telemetry data, and consolidating the first telemetry data and the second telemetry data.
  • the CPU operating service level, the memory operating service level, and the latency service level are generated periodically using the first telemetry data collected since a prior generation of the CPU operating service level, the memory operating service level, and the latency service level.
  • the non-transitory computer readable storage medium may also include instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to generate a predictive insight for the second telemetry data using a trained machine learning engine, wherein the predictive insight identifies a predicted failure associated with the anomaly, wherein the automated proactive action comprises automated healing, automated scaling, or automated disabling.
  • the non-transitory computer readable storage medium may also include instructions stored thereon, which when read and executed by one or more computer processors, cause the one or more computer processors to periodically persist the second telemetry data and to generate a long-term insight from the persisted second telemetry data using a trained machine learning engine.
  • Figure 1 depicts a system for collecting and processing application telemetry according to an embodiment.
  • Figure 2 depicts a method for collecting and processing application telemetry according to an embodiment.
  • Figure 3 depicts an exemplary computing system for implementing aspects of the present disclosure.
  • Embodiments are directed to systems and methods for collecting and processing application telemetry.
  • Embodiments may reduce the mean time to detect (MTTD) for all distributed applications. For example, embodiments may identify the root cause of an issue during an incident and may provide data to help understand or identify the application, system, Application Programming Interface (API), and/or appliance that is causing the problem.
  • Embodiments may provide end-to-end visibility of a customer request (e.g., who is performing a certain action or using a business feature) across application boundaries (e.g., across product flow), including visibility into components that the application owner does not own (e.g., cloud services or components, third parties, supporting upstream systems, supporting downstream systems, etc.).
  • Embodiments may provide early detection and near real time prediction of a system degrading based on the telemetry data.
  • Embodiments may provide support for legacy applications. For example, embodiments may identify critical legacy applications that need to be instrumented irrespective of the status (maintain / divest), and instrument the applications.
  • Embodiments may provide a standard prescription for developers to consistently follow on how to instrument applications, including legacy and modern applications, during development - “developer to-do for telemetry.”
  • Embodiments may publish insights to product teams, architects, engineers, etc. on the feature usage, to drive business and/or technical decisions on building new features, enhancements and decommissioning existing services.
  • Embodiments may be proactive instead of reactive: they may detect trends and act in advance rather than reacting to an incident after the system has stopped working. Embodiments may generate alerts proactively, before the system degrades or when it first shows signs of degradation.
  • Embodiments may use open standards-based frameworks (e.g., OpenTelemetry, OpenTracing, OpenCensus, etc.) to prevent vendor lock-in.
  • Embodiments may use shift-left observability and telemetry during development. Developers may see the same exact data that they would see during an incident (e.g., logs, metrics, dashboards, etc.) and validate in lower environments during development.
  • Embodiments may provide the ability for development teams to set up thresholds to receive appropriate notification, including soft alerts (e.g., when the system performance is degrading) and hard alerts (e.g., when the system is no longer responding), etc. Alerts may be provided in real time.
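  • By way of illustration only, the soft and hard alerts described above might be sketched as follows. The names `AlertLevel` and `classify_alert`, and the use of a single latency threshold, are assumptions made for this example and do not appear in the disclosure.

```python
from enum import Enum
from typing import Optional


class AlertLevel(Enum):
    NONE = "none"
    SOFT = "soft"  # system performance is degrading
    HARD = "hard"  # system is no longer responding


def classify_alert(latency_ms: Optional[float], soft_threshold_ms: float) -> AlertLevel:
    """Classify one health probe (hypothetical helper).

    latency_ms is None when the probe received no response at all.
    soft_threshold_ms is the team-configured degradation threshold.
    """
    if latency_ms is None:
        return AlertLevel.HARD
    if latency_ms > soft_threshold_ms:
        return AlertLevel.SOFT
    return AlertLevel.NONE
```

  • In this sketch a development team would only choose `soft_threshold_ms`; a missing response always maps to a hard alert.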
  • Embodiments may provide consistent dashboards for every application, “out of the box”, without any additional work. Teams/engineers may build custom dashboards for special use cases or needs as is necessary and/or required.
  • Embodiments may instrument an application with agents, such as home grown, open-source, or commercial agents, or agentless frameworks with minimal coding and custom implementation.
  • Low-code or no-code programming techniques may be used.
  • Embodiments may provide the ability to trace and track all requests, from all customers, all the time, to identify and measure the performance of the systems (e.g., service level indicators (SLIs) and service level objectives (SLOs)).
  • Embodiments may provide the ability to tag every customer request with an immutable unique id (e.g., a trace-id) and send it to every system that the request traverses in order to fulfill the customer request.
  • the request may be provided to all the downstream systems starting with the system where the customer request originated.
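  • A minimal sketch of tagging a request with a unique id and forwarding it downstream is given below. The header name `x-trace-id` and the helper names are hypothetical; the disclosure does not prescribe a header or an id format.

```python
import uuid
from typing import Dict

TRACE_HEADER = "x-trace-id"  # hypothetical header name, not from the disclosure


def ensure_trace_id(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers that carries a trace id.

    An existing id is never overwritten, so the id assigned where the
    customer request originated stays the same for the whole flow.
    """
    out = dict(headers)
    if TRACE_HEADER not in out:
        out[TRACE_HEADER] = uuid.uuid4().hex
    return out


def downstream_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Build headers for a downstream call, forwarding the same trace id."""
    return {TRACE_HEADER: incoming[TRACE_HEADER]}
```

  • The originating system would call `ensure_trace_id` once; every later system would only forward the id with `downstream_headers`.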
  • Embodiments may provide the ability for every system involved in the customer flow to add metadata (e.g., application id, environment, API name, microservice name, application name, IP address, host name, etc.) so insights can be derived from the metadata.
  • Embodiments may provide the ability to capture consistent metrics (e.g., golden signals, such as latency, error rate, traffic, saturation) from all the systems involved in the customer flow.
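  • As a hedged illustration, the four golden signals named above could be derived from per-request records roughly as follows. The `Request` record, the averaging of latency, and the `in_flight / capacity` definition of saturation are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    latency_ms: float
    ok: bool  # False when the request ended in an error


def golden_signals(requests: List[Request], window_s: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Compute latency, error rate, traffic, and saturation for one window."""
    n = len(requests)
    return {
        "latency_ms_avg": sum(r.latency_ms for r in requests) / n if n else 0.0,
        "error_rate": sum(not r.ok for r in requests) / n if n else 0.0,
        "traffic_rps": n / window_s,            # requests per second
        "saturation": in_flight / capacity,     # assumed definition
    }
```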
  • Embodiments may provide observability based on customers, flows, customer journeys, etc., and not just on independent applications.
  • Embodiments may generate insights based on artificial intelligence and/or machine learning predictive analytics models that may constantly evaluate data in real-time or near real-time. Notifications may be predicted and provided during the degradation process instead of after the degradation process.
  • Embodiments may generate and auto-configure alerts (e.g., soft and hard) at various times before, during, and after the degradation process.
  • Embodiments may provide a bird's-eye view of the overall health of distributed systems with custom dashboards.
  • System 100 may include electronic device 110, which may be any suitable electronic device, including servers (e.g., physical and/or cloud-based), computers (e.g., workstations, desktops, laptops, notebooks, tablets, etc.), smart devices (e.g., smart phones, smart watches, etc.), Internet of Things (IoT) appliances, etc.
  • Electronic device 110 may execute telemetry insights computer program 115, which may collect telemetry from application monitoring agents 125 for computer programs or applications 120, network appliance monitoring agents 135 for network appliances 130, and hardware device monitoring agents 145 for hardware devices 140.
  • Application monitoring agents 125 may monitor application telemetry from one or more computer programs or applications 120.
  • Network appliance monitoring agents 135 may monitor network telemetry from network appliances 130, such as hubs, routers, switches, etc.
  • Hardware device monitoring agents 145 may monitor telemetry from hardware devices 140 within the network, such as central processing units, memory, storage, etc.
  • Telemetry insights computer program 115 may be provided in a data stream between (i) computer programs or applications 120, network appliances 130, and hardware devices 140 and (ii) event persistence computer program 150.
  • the application telemetry, network telemetry, and hardware telemetry may be received and processed by telemetry insights computer program 115.
  • Telemetry insights computer program 115 may determine an operating service level provided by computer programs or applications 120, network appliances 130, and hardware devices 140.
  • the operating service level may monitor, for example, CPU usage, memory usage, latency, etc.
  • the operating service level may be periodically determined based on real-time data flow (e.g., the operating service level may be calculated every five minutes, or some configurable time interval, based on telemetry received in the preceding five minutes).
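  • The periodic calculation described above might look like the following sketch, which recomputes the service level from telemetry received in the preceding window. Using the arithmetic mean as the service level, and the `Sample` tuple layout, are assumptions; the disclosure does not state how the level is computed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

WINDOW_S = 300.0  # five minutes; a configurable interval

# (timestamp_s, cpu_pct, memory_pct, latency_ms): assumed sample layout
Sample = Tuple[float, float, float, float]


@dataclass
class ServiceLevel:
    cpu_pct: float
    memory_pct: float
    latency_ms: float


def service_level(samples: List[Sample], now_s: float,
                  window_s: float = WINDOW_S) -> ServiceLevel:
    """Average the samples received in the window that ends at now_s."""
    recent = [s for s in samples if now_s - window_s <= s[0] <= now_s]
    if not recent:
        raise ValueError("no telemetry received in the window")
    return ServiceLevel(
        cpu_pct=mean(s[1] for s in recent),
        memory_pct=mean(s[2] for s in recent),
        latency_ms=mean(s[3] for s in recent),
    )
```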
  • Telemetry insights computer program 115 may be a single stream or multiple streams, or a single program or multiple programs.
  • Telemetry insights computer program 115 may transform, format, cleanse, and consolidate telemetry data as it is received. Telemetry insights computer program 115 may receive telemetry data in various formats and may enrich and normalize the data to a standard format so that machine learning (ML) models or programs can train on and learn from it efficiently.
  • telemetry insights computer program 115 may monitor incoming telemetry data for anomalies. Once an anomaly is identified, telemetry insights computer program 115 may generate and communicate an event to event manager 160. It may also output an event, information, etc. related to the processing of the telemetry to user interface 185 (e.g., an application, a browser, etc.) executed by user electronic device 180 using any suitable network (e.g., local area networks (LANs), wide area networks (WANs), the Internet, combinations, etc.). Although only one user electronic device is depicted in Figure 1, it should be understood that more than one user electronic device 180 may be provided.
  • User electronic device 180 may include, for example, computers (e.g., workstations, desktops, laptops, notebooks, tablets, etc.), smart devices (e.g., smart phones, smart watches, etc.), Internet of Things (IoT) appliances, etc.
  • Telemetry insights computer program 115 may also output the telemetry data to event persistence computer program 150, which may identify longer-term operating service levels from the telemetry (e.g., minutes, hourly, daily, weekly, monthly, yearly, etc.). Event persistence computer program 150 may also be executed by electronic device 110.
  • Event persistence computer program 150 may persist the telemetry at an application level, at a network level, and at a hardware level. To persist telemetry data, event persistence computer program 150 may use one or more persistent data stores 190 to store different types of telemetry data. For example, event persistence computer program 150 may use a time series persistent store to store metrics, a SQL or a NoSQL database to store logs, a graph database to store traces, and any other persistent store that may be suitable for the purpose.
  • Event persistence computer program 150 may output events to event manager 160, which may also be executed by electronic device 110. Events may occur in response to an anomaly. Event manager 160 may receive the event and may take an action, such as automated healing (e.g., transfer the application to a new or different cloud in response to an anticipated cloud failure), automated scaling (e.g., spin up a new platform in response to high CPU usage), reacting (e.g., disable a region or hardware in response to a failure), etc. Event manager 160 may also generate longer-term insights based on the event data received.
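  • The routing of each telemetry type to a suitable store might be sketched as below. The store names and the in-memory `InMemoryStores` class are hypothetical stand-ins for real time series, log, and graph databases.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical store names keyed by telemetry kind
STORE_FOR_KIND = {
    "metric": "time_series_store",
    "log": "log_database",
    "trace": "graph_database",
}


class InMemoryStores:
    """Stand-in for persistent data stores; keeps records in lists."""

    def __init__(self) -> None:
        self.data: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def persist(self, record: Dict[str, Any]) -> str:
        """Store the record by its kind and return the store name used."""
        store = STORE_FOR_KIND.get(record["kind"], "default_store")
        self.data[store].append(record)
        return store
```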
  • Event viewer 170 may provide user interface 185 with the ability to view an event at, for example, the application level, the network level, and/or the hardware level. Event viewer 170 may also present consolidated event data based on a combination of two or more of the application level, network level, and hardware level.
  • telemetry data such as logs, traces, metrics, metadata, etc. may be collected from all applications, network appliances, hosts, etc. in a distributed architecture.
  • monitoring agents such as computer program or application monitoring agents, network appliance monitoring agents, hardware device monitoring agents, etc. may collect the telemetry data and provide the telemetry data to a telemetry insights computer program.
  • the telemetry data may be collected in an open standard format. If any of the applications, network appliances, hosts, etc. do not publish telemetry data in the open standard format, a translator may be provided to translate the telemetry data into the open standard format.
  • the telemetry data may be streamed to the telemetry insights computer program.
  • the telemetry insights computer program may receive the telemetry data, and may pre-process the telemetry data. For example, the telemetry insights computer program may transform the telemetry data to a common format, may format the telemetry data, may cleanse the telemetry data, may enrich the data, and may consolidate the telemetry data.
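  • One possible shape for the pre-processing described above is sketched below: records are transformed to a common format, invalid records are cleansed, and exact duplicates are consolidated. The field names (`name`, `metric`, `value`, `source`) are assumptions chosen for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional


def to_common(raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Transform one raw record to the common format, or return None to drop it."""
    name = raw.get("name") or raw.get("metric")  # agents may use either key
    value = raw.get("value")
    if name is None or value is None:
        return None  # cleanse: incomplete record
    try:
        value = float(value)
    except (TypeError, ValueError):
        return None  # cleanse: non-numeric value
    return {
        "name": str(name).strip().lower(),
        "value": value,
        "source": raw.get("source", "unknown"),  # enrich with a default
    }


def preprocess(raws: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Transform, cleanse, and consolidate (drop exact duplicates)."""
    seen = set()
    out: List[Dict[str, Any]] = []
    for raw in raws:
        rec = to_common(raw)
        if rec is None:
            continue
        key = (rec["name"], rec["value"], rec["source"])
        if key in seen:
            continue
        seen.add(key)
        out.append(rec)
    return out
```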
  • the telemetry insights computer program may review the collected telemetry data and may generate an operating service level for CPU, memory, latency, etc.
  • the operating service level may be generated periodically, such as every five minutes, based on data received since the prior operating service level was generated.
  • the telemetry insights computer program may train one or more machine learning models using the data generated by data systems.
  • The telemetry insights computer program may use mathematical models of data to help a computer learn without explicit or direct instructions. This enables the program to learn and improve on its own, based on experience.
  • the trained machine learning models may review the collected telemetry data, which may be received in a standard format or in different formats from each program, application, device, appliance, or database, and constantly adjust the baseline SLOs, SLIs, performance indicators, resource metrics, etc. for each program, application, device, appliance, or database.
  • the telemetry insights computer program may continue to monitor telemetry data as it is received.
  • the telemetry insights computer program may compare the incoming telemetry data to the operating service level to identify an anomaly. If an anomaly is identified, such as the incoming telemetry data being outside the scope of the operating service level, in step 230, the telemetry insights computer program may generate an event and may communicate the event to an event manager.
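  • A minimal sketch of the comparison step follows. It treats a metric as anomalous when it exceeds its service level by more than a tolerance; the 20% tolerance, the `Event` record, and the metric names are assumptions, since the disclosure only says the data is "outside the scope" of the service level.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    source: str
    breaches: List[str]  # names of the metrics that exceeded their level


def detect_anomaly(source: str, sample: Dict[str, float],
                   service_level: Dict[str, float],
                   tolerance: float = 0.2) -> Optional[Event]:
    """Return an Event if any metric exceeds its level by more than tolerance."""
    breaches = [
        metric for metric, level in service_level.items()
        if metric in sample and sample[metric] > level * (1 + tolerance)
    ]
    return Event(source, breaches) if breaches else None
```

  • The returned `Event` is what would be communicated to the event manager; a `None` result means no event is generated.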
  • the telemetry insights computer program may also generate a predictive insight using, for example, a trained machine learning engine.
  • the predictive insight may identify a predicted failure associated with the anomaly.
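  • The disclosure relies on a trained machine learning engine for predictive insights. Purely as a simplified stand-in, the sketch below fits a least-squares line to recent equally spaced readings and estimates how many sampling steps remain before a limit is reached. The function name and the linear-trend technique are assumptions, not the disclosed engine.

```python
from typing import List, Optional


def steps_until_breach(values: List[float], limit: float) -> Optional[float]:
    """Estimate the number of steps after the last sample until limit is reached.

    Fits an ordinary least-squares line to equally spaced values.
    Returns None when there are too few samples or the trend is not rising.
    """
    n = len(values)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    x_hit = (limit - intercept) / slope  # index at which the line meets limit
    return max(0.0, x_hit - (n - 1))
```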
  • artificial intelligence may be used with the trained machine learning models to evolve the system.
  • Artificial intelligence may be used for predictive maintenance (e.g., sending alerts and reacting to alerts, auto healing systems, etc.) to establish the baselines for each program, API, hardware component, network appliance, device metric (e.g., CPU, memory, bandwidth, limits) etc.
  • the telemetry insights computer program may generate an alert and may notify the appropriate individuals, such as application developer engineers, site reliability engineering (SRE) engineers, production support engineers, etc. to act and avoid an undesired impact, such as a customer impact.
  • the event manager may receive the event and may take a proactive action, such as automated healing (e.g., transfer the application to a new or different cloud in response to an anticipated cloud failure), automated scaling (e.g., spin up a new platform in response to high CPU usage), reacting (e.g., disable a region or hardware in response to a failure), etc.
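  • The event manager's choice of action could be sketched as a simple dispatch table, as below. The event kinds, the action functions, and the fallback to a notification are hypothetical; real actions would call cloud or orchestration interfaces rather than return strings.

```python
from typing import Callable, Dict


def heal(target: str) -> str:
    # e.g., transfer the application to a different cloud
    return f"heal:{target}"


def scale(target: str) -> str:
    # e.g., spin up a new platform
    return f"scale:{target}"


def disable(target: str) -> str:
    # e.g., disable a region or hardware
    return f"disable:{target}"


# Hypothetical mapping from event kind to automated action
ACTIONS: Dict[str, Callable[[str], str]] = {
    "anticipated_failure": heal,
    "high_cpu": scale,
    "failure": disable,
}


def handle_event(kind: str, target: str) -> str:
    """Run the action registered for the event kind, else fall back to notifying."""
    action = ACTIONS.get(kind)
    return action(target) if action else f"notify:{target}"
```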
  • the telemetry insights computer program may persist data at application, network, and hardware levels.
  • the event manager may generate longer-term insights based on persisted data using the trained machine learning engine above, or a different trained machine learning algorithm.
  • the telemetry insights computer program may generate short term insights and notify users if and when the system is degrading.
  • the intent and purpose of the algorithms in step 245 is to generate long-term insights for data spanning more than an hour, day, week, month, year, etc.
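  • Before any long-term model is applied, persisted samples would typically be rolled up into hourly, daily, or longer buckets. The sketch below shows one such rollup using a plain average; it is an assumed preparatory step for illustration, not the trained algorithm the disclosure refers to.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def rollup(samples: List[Tuple[float, float]], bucket_s: int) -> Dict[int, float]:
    """Average (timestamp_s, value) samples into fixed-size time buckets.

    Keys of the result are bucket start times in seconds.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // bucket_s) * bucket_s].append(value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}
```

  • For example, `bucket_s=3600` gives hourly averages and `bucket_s=86400` gives daily averages.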
  • the persisted data is made available at application, network, hardware, and consolidated levels.
  • visualizations may be generated manually on demand or automatically based on the raw data in the persistent store.
  • the visualization may span across data centers, clouds, and servers, to give a transactional view of the customer request to the application developer engineers, system engineers, and production support engineers.
  • the visualizations may provide the ability to correlate data between logs, traces, and metrics using the metadata in order to detect or investigate an issue in a very short time.
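  • Correlating logs, traces, and metrics through shared metadata might be sketched as follows, here using a trace id as the joining key. The `trace_id` and `kind` field names are assumptions of this example.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def correlate(records: Iterable[Record]) -> Dict[str, Dict[str, List[Record]]]:
    """Group telemetry records by trace id, then by kind (log, trace, metric)."""
    by_trace: Dict[str, Dict[str, List[Record]]] = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_trace[rec["trace_id"]][rec["kind"]].append(rec)
    # Convert nested defaultdicts to plain dicts for the caller
    return {trace: dict(kinds) for trace, kinds in by_trace.items()}
```

  • A dashboard could then look up one trace id and display every log line, span, and metric recorded for that customer request.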
  • past events, notifications, etc. may be made available as necessary and/or desired.
  • Figure 3 depicts an exemplary computing system for implementing aspects of the present disclosure.
  • Figure 3 depicts exemplary computing device 300.
  • Computing device 300 may represent the system components described herein.
  • Computing device 300 may include processor 305 that may be coupled to memory 310.
  • Memory 310 may include volatile memory.
  • Processor 305 may execute computer-executable program code stored in memory 310, such as software programs 315.
  • Software programs 315 may include one or more of the logical steps disclosed herein as a programmatic instruction, which may be executed by processor 305.
  • Memory 310 may also include data repository 320, which may be nonvolatile memory for data persistence.
  • Processor 305 and memory 310 may be coupled by bus 330.
  • Bus 330 may also be coupled to one or more network interface connectors 340, such as wired network interface 342 or wireless network interface 344.
  • Computing device 300 may also have user interface components, such as a screen for displaying graphical user interfaces and receiving input from the user, a mouse, a keyboard and/or other input/output components (not shown).
  • Embodiments of the system or portions of the system may be in the form of a “processing machine,” such as a general-purpose computer, for example.
  • the term “processing machine” is to be understood to include at least one processor that uses at least one memory.
  • the at least one memory stores a set of instructions.
  • the instructions may be either permanently or temporarily stored in the memory or memories of the processing machine.
  • the processor executes the instructions that are stored in the memory or memories in order to process data.
  • the set of instructions may include various instructions that perform a particular task or tasks, such as those tasks described above. Such a set of instructions for performing a particular task may be characterized as a program, software program, or simply software.
  • the processing machine may be a specialized processor.
  • the processing machine may be a cloud-based processing machine, a physical processing machine, or combinations thereof.
  • the processing machine executes the instructions that are stored in the memory or memories to process data.
  • This processing of data may be in response to commands by a user or users of the processing machine, in response to previous processing, in response to a request by another processing machine and/or any other input, for example.
  • the processing machine used to implement embodiments may be a general-purpose computer.
  • the processing machine described above may also utilize any of a wide variety of other technologies including a special purpose computer, a computer system including, for example, a microcomputer, mini-computer or mainframe, a programmed microprocessor, a micro-controller, a peripheral integrated circuit element, a CSIC (Customer Specific Integrated Circuit) or ASIC (Application Specific Integrated Circuit) or other integrated circuit, a logic circuit, a digital signal processor, a programmable logic device such as an FPGA (Field-Programmable Gate Array), PLD (Programmable Logic Device), PLA (Programmable Logic Array), or PAL (Programmable Array Logic), or any other device or arrangement of devices that is capable of implementing the steps of the processes disclosed herein.
  • the processing machine used to implement embodiments may utilize a suitable operating system.
  • each of the processors and/or the memories of the processing machine may be located in geographically distinct locations and connected so as to communicate in any suitable manner.
  • each of the processor and/or the memory may be composed of different physical pieces of equipment. Accordingly, it is not necessary that the processor be one single piece of equipment in one location and that the memory be another single piece of equipment in another location. That is, it is contemplated that the processor may be two pieces of equipment in two different physical locations. The two distinct pieces of equipment may be connected in any suitable manner. Additionally, the memory may include two or more portions of memory in two or more physical locations.
  • processing is performed by various components and various memories.
  • processing performed by two distinct components as described above may, in accordance with a further embodiment, be performed by a single component.
  • processing performed by one distinct component as described above may be performed by two distinct components.
  • the memory storage performed by two distinct memory portions as described above may be performed by a single memory portion. Further, the memory storage performed by one distinct memory portion as described above may be performed by two memory portions.
  • various technologies may be used to provide communication between the various processors and/or memories, as well as to allow the processors and/or the memories to communicate with any other entity; i.e., so as to obtain further instructions or to access and use remote memory stores, for example.
  • Such technologies used to provide such communication might include a network, the Internet, Intranet, Extranet, a LAN, an Ethernet, wireless communication via cell tower or satellite, or any client server system that provides communication, for example.
  • Such communications technologies may use any suitable protocol such as TCP/IP, UDP, or OSI, for example.
  • a set of instructions may be used in the processing of embodiments.
  • the set of instructions may be in the form of a program or software.
  • the software may be in the form of system software or application software, for example.
  • the software might also be in the form of a collection of separate programs, a program module within a larger program, or a portion of a program module, for example.
  • the software used might also include modular programming in the form of object-oriented programming. The software tells the processing machine what to do with the data being processed.
  • the instructions or set of instructions used in the implementation and operation of embodiments may be in a suitable form such that the processing machine may read the instructions.
  • the instructions that form a program may be in the form of a suitable programming language, which is converted to machine language or object code to allow the processor or processors to read the instructions. That is, written lines of programming code or source code, in a particular programming language, are converted to machine language using a compiler, assembler or interpreter.
  • the machine language is binary coded machine instructions that are specific to a particular type of processing machine, i.e., to a particular type of computer, for example. The computer understands the machine language.
  • any suitable programming language may be used in accordance with the various embodiments.
  • the instructions and/or data used in the practice of embodiments may utilize any compression or encryption technique or algorithm, as may be desired.
  • An encryption module might be used to encrypt data.
  • files or other data may be decrypted using a suitable decryption module, for example.
  • the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory.
  • the set of instructions, i.e., the software, for example, that enables the computer operating system to perform the operations described above may be contained on any of a wide variety of media or medium, as desired.
  • the data that is processed by the set of instructions might also be contained on any of a wide variety of media or medium. That is, the particular medium, i.e., the memory in the processing machine, utilized to hold the set of instructions and/or the data used in embodiments may take on any of a variety of physical forms or transmissions, for example.
  • the medium may be in the form of a compact disc, a DVD, an integrated circuit, a hard disk, a floppy disk, an optical disc, a magnetic tape, a RAM, a ROM, a PROM, an EPROM, a wire, a cable, a fiber, a communications channel, a satellite transmission, a memory card, a SIM card, or other remote transmission, as well as any other medium or source of data that may be read by the processors.
  • the memory or memories used in the processing machine that implements embodiments may be in any of a wide variety of forms to allow the memory to hold instructions, data, or other information, as is desired.
  • the memory might be in the form of a database to hold data.
  • the database might use any desired arrangement of files such as a flat file arrangement or a relational database arrangement, for example.
  • a variety of “user interfaces” may be utilized to allow a user to interface with the processing machine or machines that are used to implement embodiments.
  • a user interface includes any hardware, software, or combination of hardware and software used by the processing machine that allows a user to interact with the processing machine.
  • a user interface may be in the form of a dialogue screen for example.
  • a user interface may also include any of a mouse, touch screen, keyboard, keypad, voice reader, voice recognizer, dialogue screen, menu box, list, checkbox, toggle switch, a pushbutton or any other device that allows a user to receive information regarding the operation of the processing machine as it processes a set of instructions and/or provides the processing machine with information.
  • the user interface is any device that provides communication between a user and a processing machine.
  • the information provided by the user to the processing machine through the user interface may be in the form of a command, a selection of data, or some other input, for example.
  • a user interface is utilized by the processing machine that performs a set of instructions such that the processing machine processes data for a user.
  • the user interface is typically used by the processing machine for interacting with a user either to convey information or receive information from the user.
  • the user interface might interact, i.e., convey and receive information, with another processing machine, rather than a human user. Accordingly, the other processing machine might be characterized as a user.
  • a user interface utilized in the system and method may interact partially with another processing machine or processing machines, while also interacting partially with a human user.
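
The event manager behaviour described earlier in this description (automated healing, automated scaling, and reacting) can be illustrated with a small dispatch table. The following is only a minimal sketch under stated assumptions: the `Event` type, the event kind strings, the handler functions, and the `handle_event` helper are all hypothetical names introduced here for illustration, and the source does not define how events are mapped to actions. The handlers return a description of the action instead of calling any real cloud API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """Hypothetical event emitted by the telemetry insights program."""
    kind: str    # assumed labels, e.g. "anticipated_cloud_failure", "high_cpu", "failure"
    target: str  # the application, platform, region, or hardware affected


def heal(event: Event) -> str:
    # Automated healing: move the application to a new or different cloud.
    return f"transfer {event.target} to a different cloud"


def scale(event: Event) -> str:
    # Automated scaling: spin up a new platform.
    return f"spin up a new platform for {event.target}"


def react(event: Event) -> str:
    # Reacting: disable the failed region or hardware.
    return f"disable {event.target}"


# Assumed mapping from event kind to action; the source does not specify one.
HANDLERS: Dict[str, Callable[[Event], str]] = {
    "anticipated_cloud_failure": heal,
    "high_cpu": scale,
    "failure": react,
}


def handle_event(event: Event, history: List[str]) -> str:
    """Choose an action for the event and record it in the history list."""
    handler = HANDLERS.get(event.kind)
    if handler is not None:
        action = handler(event)
    else:
        # Unknown event kinds fall back to a plain notification (an assumption).
        action = f"notify users about {event.kind} on {event.target}"
    history.append(action)
    return action
```

Keeping the mapping in a dictionary means a new action can be added without touching the dispatch logic, and the `history` list stands in for the record of past events and notifications that the description says may be made available.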

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method for collecting and processing application telemetry may include: collecting, by a telemetry insights computer program, first telemetry data from computer applications, network appliances, and hardware devices in a distributed architecture; generating, by the telemetry insights computer program and based on the first telemetry data, a CPU operating service level, a memory operating service level, and a latency service level; collecting, by the telemetry insights computer program, second telemetry data from the computer applications, the network appliances, and the hardware devices; identifying, by the telemetry insights computer program, an anomaly by comparing the second telemetry data to the CPU operating service level, the memory operating service level, and the latency service level; generating, by the telemetry insights computer program, an event for the anomaly and communicating the event to an event manager; and executing, by the event manager, an automated proactive action in response to the anomaly.
PCT/US2022/081007 2021-12-06 2022-12-06 Systems and methods for application telemetry collection and processing WO2023107937A1 (fr)
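
The flow summarized in the abstract (derive service levels from a first batch of telemetry, then compare a second batch against them to identify anomalies and generate events) might be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the description refers to a trained machine learning engine, whereas this sketch swaps in a simple statistical threshold (mean plus `k` population standard deviations). The metric names, the `service_levels` and `find_anomalies` functions, and the event dictionary layout are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Assumed metric keys standing in for the CPU, memory, and latency service levels.
METRICS = ("cpu", "memory", "latency")


def service_levels(first: List[Dict[str, float]], k: float = 3.0) -> Dict[str, float]:
    """Derive one upper bound per metric from the first telemetry data.

    Assumption: bound = mean + k * population standard deviation.
    """
    levels = {}
    for name in METRICS:
        values = [sample[name] for sample in first]
        levels[name] = mean(values) + k * pstdev(values)
    return levels


def find_anomalies(second: List[Dict[str, float]],
                   levels: Dict[str, float]) -> List[Dict[str, float]]:
    """Return one event per reading in the second telemetry data that exceeds its level."""
    events = []
    for sample in second:
        for name in METRICS:
            if sample[name] > levels[name]:
                events.append({"metric": name,
                               "value": sample[name],
                               "level": levels[name]})
    return events


# Usage: two baseline samples, then one new sample with a CPU spike.
first_batch = [
    {"cpu": 40.0, "memory": 50.0, "latency": 100.0},
    {"cpu": 60.0, "memory": 50.0, "latency": 120.0},
]
second_batch = [{"cpu": 95.0, "memory": 50.0, "latency": 130.0}]

levels = service_levels(first_batch)
events = find_anomalies(second_batch, levels)
```

In this usage example only the CPU reading exceeds its derived level, so a single event is produced; each such event would then be communicated to the event manager.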

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163265013P 2021-12-06 2021-12-06
US63/265,013 2021-12-06

Publications (1)

Publication Number Publication Date
WO2023107937A1 true WO2023107937A1 (fr) 2023-06-15

Family

ID=85018898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/081007 WO2023107937A1 (fr) 2021-12-06 2022-12-06 Systems and methods for application telemetry collection and processing

Country Status (1)

Country Link
WO (1) WO2023107937A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150269050A1 (en) * 2014-03-18 2015-09-24 Microsoft Corporation Unsupervised anomaly detection for arbitrary time series
US20210279632A1 (en) * 2020-03-04 2021-09-09 Cisco Technology, Inc. Using raw network telemetry traces to generate predictive insights using machine learning
US20210374027A1 (en) * 2018-05-02 2021-12-02 Visa International Service Association Self-learning alerting and anomaly detection

Similar Documents

Publication Publication Date Title
US11586972B2 (en) Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs
US10817803B2 (en) Data driven methods and systems for what if analysis
US20200067789A1 (en) Systems and methods for distributed systemic anticipatory industrial asset intelligence
US11086755B2 (en) System and method for implementing an application monitoring tool
US11968264B2 (en) Systems and methods for operation management and monitoring of bots
US20210097431A1 (en) Debugging and profiling of machine learning model training
US11449798B2 (en) Automated problem detection for machine learning models
CN107704387B (zh) Method, apparatus, electronic device, and computer-readable medium for system early warning
US11165799B2 (en) Anomaly detection and processing for seasonal data
US11108835B2 (en) Anomaly detection for streaming data
US11934290B2 (en) Interactive model performance monitoring
US20180316743A1 (en) Intelligent data transmission by network device agent
US11468365B2 (en) GPU code injection to summarize machine learning training data
US20160139961A1 (en) Event summary mode for tracing systems
US11599404B2 (en) Correlation-based multi-source problem diagnosis
US20220179764A1 (en) Multi-source data correlation extraction for anomaly detection
US11212162B2 (en) Bayesian-based event grouping
WO2023107937A1 (fr) Systems and methods for application telemetry collection and processing
US11809267B2 (en) Root cause analysis of computerized system anomalies based on causal graphs
US20180032393A1 (en) Self-healing server using analytics of log data
Sharma et al. Scalable microservice forensics and stability assessment using variational autoencoders
US11971809B2 (en) Systems and methods for testing components or scenarios with execution history
CN113112038B (zh) Intelligent monitoring and diagnostic analysis system, apparatus, electronic device, and storage medium
US20240012731A1 (en) Detecting exceptional activity during data stream generation
US10191764B2 (en) Agent-based end-to-end transaction analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22847059

Country of ref document: EP

Kind code of ref document: A1