JP2018513500A

JP2018513500A - System and method for handling events involving computing systems and networks using a fabric monitoring system

Publication number: JP2018513500A
Application number: JP2017555525A
Authority: JP
Inventors: アンダーソン，ロバート; ベルマン，イリア; ビリス，キース; ジョシ，アモール
Original assignee: Goldman Sachs & Co. LLC
Priority date: 2015-04-24
Filing date: 2016-04-21
Publication date: 2018-05-24
Anticipated expiration: 2036-04-21
Also published as: CN107683467B; EP3286656A4; CA2983306C; EP3286656B1; US10652103B2; JP6767387B2; WO2016172300A1; AU2016252639B2; EP3286656A1; HK1244900A1; CA2983306A1; ES2782207T3; CN107683467A; US20160315822A1; AU2016252639A1

Abstract

A method includes receiving (702), at a fabric monitoring system (110), information identifying occurrences of events in an enterprise system having multiple computing or networking systems (102a-102n). The events occur on or involve computing or networking devices (104, 106) in the computing or networking systems, and the events are identified using rules accessible by the fabric monitoring system. The method also includes processing (704, 708) the information in real time using the fabric monitoring system to identify the occurrences of the events and to assign the events to multiple situations. The events are assigned to the situations using one or more processing models accessible by the fabric monitoring system. The method further includes outputting (710) information identifying the situations.

Description

This disclosure relates generally to computing systems. More specifically, this disclosure relates to a system and method for handling events involving computing systems and networks using a fabric monitoring system.

Businesses, governments, and other organizations often have a very large number of computing and networking devices distributed over a wide geographic area. For example, a large multinational company may have multiple data centers, each with tens of thousands of computing and networking devices, as well as various offices around the world whose sizes range from a few computing or networking devices to thousands of them. Each computing or networking device represents a potential source of anomalies or other events that need to be tracked, investigated, and resolved as needed. However, as an organization grows along with its computing systems and networks, handling these events can consume more and more of the organization's time and resources.

This disclosure provides a system and method for handling events involving computing systems and networks using a fabric monitoring system.

In a first embodiment, a method includes receiving, at a fabric monitoring system, information identifying occurrences of events in an enterprise system having multiple computing or networking systems. The events occur on or involve computing or networking devices in the computing or networking systems, and the events are identified using rules accessible by the fabric monitoring system. The method also includes processing the information in real time using the fabric monitoring system to identify the occurrences of the events and to assign the events to multiple situations. The events are assigned to the situations using one or more processing models accessible by the fabric monitoring system. The method further includes outputting information identifying the situations.

In a second embodiment, a system includes a fabric monitoring system having multiple computing nodes and multiple communication links that couple the computing nodes. The fabric monitoring system is configured to receive information identifying occurrences of events in an enterprise system having multiple computing or networking systems. The events occur on or involve computing or networking devices in the computing or networking systems, and the events are identified using rules accessible by the fabric monitoring system. The fabric monitoring system is also configured to process the information in real time to identify the occurrences of the events and to assign the events to multiple situations. The events are assigned to the situations using one or more processing models accessible by the fabric monitoring system. The fabric monitoring system is further configured to output information identifying the situations.

In a third embodiment, a non-transitory computer readable medium contains computer readable program code that, when executed by computing nodes of a fabric monitoring system, causes the computing nodes to receive information identifying occurrences of events in an enterprise system having multiple computing or networking systems. The events occur on or involve computing or networking devices in the computing or networking systems, and the events are identified using rules accessible by the fabric monitoring system. The computer readable program code, when executed by the computing nodes of the fabric monitoring system, also causes the computing nodes to process the information in real time to identify the occurrences of the events and to assign the events to multiple situations. The events are assigned to the situations using one or more processing models accessible by the fabric monitoring system. The computer readable program code, when executed by the computing nodes of the fabric monitoring system, further causes the computing nodes to output information identifying the situations.

Other technical features will be readily apparent to one skilled in the art from the following figures, descriptions, and claims.

For a more complete understanding of this disclosure and its features, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates an example system for handling events involving computing systems and networks using a fabric monitoring system according to this disclosure;

FIG. 2 illustrates an example computing device associated with a system for handling events involving computing systems and networks using a fabric monitoring system according to this disclosure;

FIGS. 3 through 6 illustrate an example fabric monitoring system for handling events involving computing systems and networks, and related details, according to this disclosure; and

FIGS. 7 and 8 illustrate example process flows in a system for handling events involving computing systems and networks using a fabric monitoring system according to this disclosure.

FIGS. 1 through 8, discussed below, and the various embodiments used to describe the principles of the present invention in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the invention. Those skilled in the art will understand that the principles of the present invention may be implemented in any type of suitably arranged device or system.

FIG. 1 illustrates an example system 100 for handling events involving computing systems and networks using a fabric monitoring system according to this disclosure. As shown in FIG. 1, the system 100 includes or is associated with one or more computing systems or networks 102a-102n. Each computing system or network 102a-102n represents a collection of computing devices 104 and/or networking devices 106, and each can include any number of devices 104 and/or 106. As noted above, the computing systems or networks 102a-102n could range from systems or networks having only a handful of devices 104 and/or 106 to systems or networks having tens of thousands of devices 104 and/or 106 (or even more). Multiple computing systems or networks 102a-102n can be used within a single common geographic area or across multiple geographic areas, including areas separated by very large distances.

One or more devices in each of the computing systems or networks 102a-102n can communicate over at least one network 108. The network 108 represents any suitable network or combination of networks at one or more locations. The network 108 can include, for example, one or more local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), or regional or global networks. The collection of computing systems or networks 102a-102n and the associated network(s) 108 may be referred to in this patent document as an "enterprise system."

A fabric monitoring system 110 is implemented within the enterprise system, such as by using various ones of the computing devices 104 and networking devices 106 in the computing systems or networks 102a-102n. Fabric computing (also known as unified computing, unified fabric, data center fabric, or unified data center fabric) involves creating a computing fabric formed by computing nodes 112 that are interconnected using communication links 114. The exact layout of the computing nodes 112 and the network connectivity topology defined by the communication links 114 may vary from what is shown here as needed or desired. The fabric monitoring system 110 routinely includes a consolidated high-performance computing system containing loosely coupled storage, networking, and parallel processing functions linked by high-bandwidth interconnects (such as 10 Gigabit Ethernet or InfiniBand connections). In some embodiments, the interconnected nodes appear to operate as a single logical unit.

The basic components of the fabric monitoring system 110 are its nodes 112 and its links 114. The nodes 112 generally include hardware components such as processors, memory, and peripheral devices. The links 114 are functional connections between the nodes 112. The fabric monitoring system 110 can be distinguished from other architectures for several reasons. For example, the fabric monitoring system 110 can be deployed in multiple "stripes" and can provide support for cross-stripe communication and signaling, which improves the scalability and resiliency of the fabric monitoring system 110. In addition, the fabric monitoring system 110 can support multiple types of processing models (such as user-defined models and analytical models), which provides multiple mechanisms for identifying and classifying events associated with the computing systems or networks 102a-102n.

As described in more detail below, the fabric monitoring system 110 can be used advantageously in monitoring, diagnosing, and maintaining enterprise applications deployed in the computing systems or networks 102a-102n, as well as other aspects of the computing systems or networks 102a-102n. An enterprise application represents an application that is deployed on multiple devices 104 and/or 106 at one or more locations and that provides event-related information to the fabric monitoring system 110. Conventional monitoring systems often provide alerts for individual anomalies or system failures, but they typically fail to provide an integrated approach for properly categorizing and processing system and application events across a large enterprise system. The fabric monitoring system 110 can provide such an integrated approach for properly categorizing and processing system and application events, and it can be used in a variety of environments, including large enterprise systems.

Among other things, this allows the fabric monitoring system 110 to provide organization-level diagnostics and maintenance. For example, as described below, the fabric monitoring system 110 can be used to provide a complete situation management lifecycle for events, from the occurrence or onset of an event to its (possibly automated) resolution. The fabric monitoring system 110 can also process events based on analytics and machine learning instead of, or in addition to, static rules. Further, the fabric monitoring system 110 can provide a highly scalable platform for collecting infrastructure and application metrics, together with rapid incident resolution based on predictive analytics. This can allow the fabric monitoring system 110 to be used for more predictive functions related to event processing rather than merely reacting to events that have already occurred.

Events identified and processed by the fabric monitoring system 110 represent bits of information and can originate from any suitable source within the computing systems or networks 102a-102n. For example, an event can represent the current state, or a change in the current state, of a device, system, or network (or a portion thereof). Events can also be used to identify the occurrence of anomalies or defined conditions within the computing systems or networks 102a-102n. Examples of specific types of events include the current central processing unit (CPU) utilization of a computer executing an application, the identification of a fault on a computer executing an application, or a faulty connection identified by an application. As described below, rules used by the fabric monitoring system 110 help to identify events of interest in real time, and the events are then used to identify situations to be investigated or resolved (manually or in an automated manner).
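
As an illustration only, rule-based identification of events of interest could be sketched as follows. The `Event` and `Rule` types, the field names, and the 90% CPU threshold are hypothetical choices made for this sketch; the disclosure does not specify a rule format.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Event:
    source: str   # asset that reported the event (hypothetical field)
    kind: str     # e.g. "cpu_utilization" or "connection_fault"
    value: float  # numeric payload; 0.0 when not applicable


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Event], bool]


# Hypothetical rules; the threshold is an assumption, not from the disclosure.
RULES = (
    Rule("high_cpu", lambda e: e.kind == "cpu_utilization" and e.value >= 90.0),
    Rule("faulty_connection", lambda e: e.kind == "connection_fault"),
)


def identify(event: Event, rules: Sequence[Rule] = RULES) -> Optional[str]:
    """Return the name of the first rule the event satisfies, or None."""
    for rule in rules:
        if rule.matches(event):
            return rule.name
    return None
```

An event that matches no rule is simply not of interest under this sketch; a real deployment might instead route such events to an analytical model.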

Situations are derived from streams of events and can be identified using various processing models, which define how the fabric monitoring system 110 processes events to identify situations. For example, a processing model could indicate that a situation should be created for each event. As another example, a processing model could indicate that a situation should be created when a specified number or type of events associated with a single asset or group of assets occurs within a defined time period. Here, an asset generally represents any hardware, software, firmware, or combination thereof. Examples of assets include specific hardware (such as a switch or host computer), specific applications, or other virtual or physical computing platforms. A library of processing models and baseline policies can be created and stored within the fabric monitoring system 110, and these models and policies may be directly applicable to the domain of infrastructure or application event monitoring.
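
The second kind of processing model above (a specified number of events for one asset within a defined time period) can be sketched as a sliding window. This is a minimal sketch under stated assumptions: the class name `XOverYModel`, the use of numeric timestamps in seconds, and the choice to reset the window once a situation is created are all illustrative and not taken from the disclosure.

```python
from collections import defaultdict, deque


class XOverYModel:
    """Create a situation when `count` events for one asset fall within
    `window_seconds`. Assumes timestamps arrive in non-decreasing order."""

    def __init__(self, count: int, window_seconds: float) -> None:
        self.count = count
        self.window = window_seconds
        self._times = defaultdict(deque)  # asset -> recent event timestamps

    def observe(self, asset: str, timestamp: float) -> bool:
        """Record one event; return True if a situation should be created."""
        times = self._times[asset]
        times.append(timestamp)
        # Discard events that are older than the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.count:
            times.clear()  # start counting afresh after raising a situation
            return True
        return False
```

With `XOverYModel(3, 60.0)`, three events for the same switch within one minute yield a situation on the third event, while three events spaced 100 seconds apart never do.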

Each identified situation can be translated and communicated across the system for further action. For example, a situation can be given a ticket number and routed to a system maintenance or operational intelligence platform for corrective action, or a situation can be identified as relating to automated reporting and correction functions within an enterprise application.

In this way, an entire enterprise system can be monitored and maintained using the fabric monitoring system 110, with reporting and recording at the level of specific events. Event processing, including categorization, reporting, and corrective and/or predictive actions, can be based on analytics and machine learning techniques instead of, or in addition to, static rules and filters. As such, event monitoring across an enterprise system using the fabric monitoring system 110 presents a highly scalable, unified platform for collecting infrastructure and application metrics and provides rapid incident resolution based on predictive analytics.

The fabric monitoring system 110 can also operate to help ensure that event starvation is mitigated. Event starvation can occur when an excessive number of events is generated, such as due to a faulty application or device or due to an intentional denial of service (DoS) attack, distributed DoS (DDoS) attack, or other attack. The excessive number of events can overload a conventional system and cause it to stop providing events to downstream components (so the downstream components are "starved" of events). In some embodiments, the fabric monitoring system 110 addresses problems related to event starvation by enabling abstraction of components.

The fabric monitoring system 110 can further provide messaging and persistence, as well as the use of reference data, during event routing, situation detection, and event enrichment. For example, in some embodiments, as each event is processed through the fabric monitoring system 110, a detailed history of the processing of that event can be stored in persistent storage. The event history can then be queried and searched, such as by using a query or search function.
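
One way to picture a persisted, queryable processing history is a single append-only table. The sketch below uses Python's standard `sqlite3` module purely for illustration; the table name, column names, and stage labels are assumptions, and the disclosure does not name any particular storage technology.

```python
import sqlite3


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical event-history store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event_history ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " event_id TEXT NOT NULL,"
        " stage TEXT NOT NULL,"
        " detail TEXT NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, event_id: str, stage: str, detail: str) -> None:
    """Append one processing step (routing, enrichment, ...) for an event."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO event_history (event_id, stage, detail) VALUES (?, ?, ?)",
            (event_id, stage, detail),
        )


def history(conn: sqlite3.Connection, event_id: str) -> list:
    """Return the (stage, detail) steps recorded for an event, oldest first."""
    rows = conn.execute(
        "SELECT stage, detail FROM event_history WHERE event_id = ? ORDER BY seq",
        (event_id,),
    )
    return rows.fetchall()
```

Passing a file path instead of the default `":memory:"` keeps the history across restarts.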

In addition, protocols and functionality related to event subscriptions allow the fabric monitoring system 110 to support preemptive awareness of events and situations in the enterprise system and in enterprise applications within the enterprise system, which often depend on underlying low-level infrastructure components. For example, the fabric monitoring system 110 can support event subscriptions so that situations can be created that are derived from events occurring in separate or different areas of an organization's infrastructure.

In some embodiments, users can configure the policies and rules used to specify how events are categorized and escalated. Two example mechanisms for configuring event management policies are (i) predefined selections of standardized specifications and (ii) a domain-specific language (DSL) for describing specialized specifications. The DSL can, for example, allow events to be given the same name or other identifier or to be sent to a grouping model, which can be selected based on a schedule or on behavioral analysis.

The fabric monitoring system 110 further supports various processing models for event grouping and situation identification. Two example types of models are user-defined grouping models and discovered or analytical grouping models. Multiple processing models may be used or supported, and additional processing models may be created as needed or desired to define different grouping patterns. User-defined grouping models are defined by one or more users; examples include "One for One," "X over Y," and "Battery Failure." An analytical model is defined as a model that supports one or more analytical functions; examples include grouping by event similarity or grouping by event anomaly (such as uncategorized events, new or never-before-seen events, event volume irregularities, the absence of expected events, unregistered events, and others).
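
Grouping by event similarity could take many forms. The sketch below is one simple possibility, not the method of the disclosure: it compares event message text with the standard library's `difflib.SequenceMatcher` and places a message in the first group whose representative is at least `threshold` similar. The function name, the use of message text as the similarity signal, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def group_by_similarity(messages: Iterable[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedily group messages whose text resembles a group's first member."""
    groups: List[List[str]] = []
    for message in messages:
        for group in groups:
            # Compare against the group's representative (its first message).
            if SequenceMatcher(None, group[0], message).ratio() >= threshold:
                group.append(message)
                break
        else:
            # No existing group is similar enough: start a new one.
            groups.append([message])
    return groups
```

Under this sketch, "disk full on host-1" and "disk full on host-2" fall into one group, while "link down on port 7" starts its own.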

In some embodiments, event categorization can be stateless and can be distributed across however many nodes 112 are needed or available to handle the load. A messaging system within the fabric monitoring system 110 can be used to distribute events to the available processing nodes 112. The messaging system can implement or utilize a "group key" or other indicator to ensure that all events that are part of the same group are delivered to the same processing node 112. Groups can be defined in any suitable manner, such as by grouping events associated with a single asset or a collection of assets. The messaging system and the specific persistence mechanism may also be "pluggable," which facilitates lower-cost implementation of various mechanisms for quality assurance and for development of additional functionality within the fabric system. The state needed for model evaluation can be cached within a processing instance, and the messaging system can deliver events to the node 112 or other location where the information is cached. Continuity can be achieved, for example, by a drop copy of state changes to an off-machine persistence store.
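
The group-key idea can be illustrated with hash-based partitioning, which is one common way to make every event with the same key land on the same node. This is a sketch under assumptions: the function name `node_for`, the key format, and the use of SHA-256 are illustrative, and the disclosure does not say how the messaging system maps keys to nodes.

```python
import hashlib
from typing import Sequence


def node_for(group_key: str, nodes: Sequence[str]) -> str:
    """Map a group key to one processing node, deterministically.

    A cryptographic digest is used instead of Python's built-in hash(),
    because hash() of a str is salted per process and would not give the
    same answer on different machines.
    """
    if not nodes:
        raise ValueError("at least one processing node is required")
    digest = hashlib.sha256(group_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

Because the mapping depends on `len(nodes)`, adding or removing a node reassigns many keys; a scheme such as consistent hashing would reduce that churn, at the cost of more code.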

As noted above, the fabric monitoring system 110 can include built-in support for striped processing flows, which enables isolation of platforms and can help to mitigate the risks associated with event starvation. With striping, different nodes 112, or even different instances of the fabric monitoring system 110 itself, can be used to process events from different sources, such as events from different assets, different regions, or different deployments of hardware, software, or firmware. Other partitions that support striping can also be used, such as by dividing the enterprise system by business unit or by the type of business transacted using the computing systems or networks 102a-102n. One challenge with striping involves how to communicate events or situations in one stripe to other stripes that need to know about them. In some embodiments, this can be done by creating synthetic events in response to the creation of situations within one stripe. These synthetic events can then be distributed to other stripes to enable cross-stripe correlations of events or situations.
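
The synthetic-event mechanism for cross-stripe communication might look roughly like the following. The `Stripe` class, its `peers` and `inbox` attributes, and the dictionary layout of situations and synthetic events are hypothetical; in a real deployment the delivery would presumably go through the messaging system rather than a direct method call.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stripe:
    name: str
    # Other stripes that should learn about situations created here.
    peers: List["Stripe"] = field(default_factory=list, repr=False, compare=False)
    # Synthetic events received from other stripes, awaiting correlation.
    inbox: List[Dict[str, str]] = field(default_factory=list)

    def create_situation(self, summary: str) -> Dict[str, str]:
        """Create a situation locally and announce it to peer stripes."""
        situation = {"stripe": self.name, "summary": summary}
        synthetic = {"type": "synthetic", "origin": self.name, "summary": summary}
        for peer in self.peers:
            peer.receive(dict(synthetic))  # each peer gets its own copy
        return situation

    def receive(self, event: Dict[str, str]) -> None:
        self.inbox.append(event)
```

For example, if a stripe named "emea" lists a stripe named "apac" as a peer, creating a situation in "emea" places one synthetic event in the "apac" inbox and none in its own.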

Depending on the implementation, the fabric monitoring system 110 provides intelligent monitoring and notification of situations that require action, including notifications to system administrators, user groups, or subscribers. A situation can be a single event on the enterprise system or multiple events that have been correlated to provide deeper insight into an anomaly within the enterprise system. The fabric monitoring system 110 can also reduce operational and regulatory risk by delivering transparency and intelligent management of events in large-scale enterprise technology environments. The fabric monitoring system 110 further delivers workflows that let users specify how events are categorized, reported, and recorded (such as by priority, group, situation, or user-defined category) and how subsequent actions are assigned and executed. The fabric monitoring system 110 additionally allows event grouping policies to be subjected to a controlled test and promotion lifecycle, which reduces exposure to unwanted changes or unnecessary processing in a production environment. Finally, the fabric monitoring system 110 can support enforcement of a controlled lifecycle for policies and rules through a separation between users who can create rules and users who can promote those rules to production or use.

Further details regarding the fabric monitoring system 110 are provided below. Note that the fabric monitoring system 110 may include any number of nodes 112 and communication links 114 in any suitable arrangement. Although the fabric monitoring system 110 is illustrated as residing outside the computing systems or networks 102a-102n, it may also be formed or reside within one or more of the computing systems or networks 102a-102n.

Although FIG. 1 illustrates one example of a system 100 for handling events involving computing systems and networks using a fabric monitoring system 110, various changes may be made to FIG. 1. For example, the system 100 can include any number of computing systems or networks (each having any number of computing or networking devices), networks, and fabric monitoring systems. Also, systems and networks involving computers are highly configurable, and FIG. 1 does not limit this disclosure to any particular configuration of systems or networks.

FIG. 2 illustrates an example computing device 200 associated with a system for handling events involving computing systems and networks using a fabric monitoring system according to this disclosure. Specifically, FIG. 2 illustrates one example implementation of a computing node 112 within the fabric monitoring system 110 of FIG. 1.

As shown in FIG. 2, the computing device 200 includes a bus system 202, which supports communication between at least one processing device 204, at least one storage device 206, at least one communications unit 208, and at least one input/output (I/O) unit 210. The processing device 204 executes instructions that may be loaded into a memory 212. The processing device 204 can include any suitable number and type of processors or other devices in any suitable arrangement. Example types of processing devices 204 include microprocessors, microcontrollers, digital signal processors, field programmable gate arrays, application specific integrated circuits, and discrete circuitry.

The memory 212 and a persistent storage 214 are examples of storage devices 206, which represent any structure capable of storing and facilitating retrieval of information (such as data, program code, and/or other suitable information on a temporary or permanent basis). The memory 212 can represent a random access memory or any other suitable volatile or non-volatile storage device. The persistent storage 214 can include one or more components or devices that support longer-term storage of data, such as a read only memory, hard drive, flash memory, or optical disc.

The communications unit 208 supports communication with other systems or devices. For example, the communications unit 208 can include a network interface card or a wireless transceiver that facilitates communication with other nodes 112 over one or more communication links 114. The communications unit 208 can support communication through any suitable physical or wireless communication link.

The I/O unit 210 allows for the input and output of data. For example, the I/O unit 210 can provide a connection for the input and output of data to a local external memory, database, or peripheral device.

Although FIG. 2 illustrates one example of a computing device 200 associated with a system for handling events involving computing systems and networks using a fabric monitoring system, various changes may be made to FIG. 2. For example, computing devices are highly configurable, and FIG. 2 does not limit this disclosure to any particular configuration of computing device.

FIGS. 3 through 6 illustrate an example fabric monitoring system 110 for handling events involving computing systems and networks, together with related details, according to this disclosure. As shown in FIG. 3, the fabric monitoring system 110 operates in conjunction with a host 302, which can represent any of the computing devices 104 or networking devices 106 in FIG. 1. Here, the host 302 includes various hardware components, such as one or more processors 304, one or more hard disks 306, and one or more memories 308. The processors 304 can be used (among other things) to execute one or more enterprise applications or other applications. Of course, host devices can come in a wide variety of configurations, which may include other or additional hardware components. Note that although one host 302 is shown in FIG. 3, the fabric monitoring system 110 can be used with any number of hosts or other sources of events.

The host 302 includes an event agent 310 and an event application programming interface (API) 312. The event agent 310 collects events generated by the host 302 and provides the events to the fabric monitoring system 110 via the event API 312. The event agent 310 includes any suitable logic for collecting events, and the event API 312 includes any suitable interface for interfacing with the event agent 310. The event agent 310 can, for example, represent one or more applications executed by the processors 304.

The fabric monitoring system 110 includes a monitoring platform 314, which operates to collect events from the host 302 and other event sources. Among other things, the detected events can identify aspects of a computing or networking environment that are not operating as expected or that satisfy user-defined or other monitoring rules. In this example, the monitoring platform 314 includes an event server 312 and a telemetry module 316. The event server 314 collects events from the event agent 310 in the host 302 and from other event agents in other hosts or event sources. The telemetry module 316 analyzes detected events or other information to provide metrics for troubleshooting, capacity planning, or other functions. Information from the telemetry module 316 can, for example, contribute at least in part to the prevention of event starvation. The event server 314 includes any suitable logic for collecting events from event agents. In some embodiments, the event agent 310 and the event server 314 can represent information technology (IT) monitoring tools, such as those available from NAGIOS ENTERPRISES. The telemetry module 316 includes any suitable logic for identifying one or more metrics associated with incoming events.

The fabric monitoring system 110 further includes a core platform 320, which analyzes the events obtained by the monitoring platform 314 to identify situations that are occurring, have occurred, or may occur in one or more of the computing systems or networks 102a-102n. In this example, the core platform 320 supports a correlation function 322, which can be used to identify events that are related and that may therefore form part of one or more situations. The core platform 320 also supports an aggregation function 324, which can be used to group related events for further processing. The core platform 320 further supports an enrichment function 326, which can be used to provide additional information about an event or a group of events. In some instances, the information provided by the enrichment function 326 can be used by the aggregation function 324 to group related events. The core platform 320 also supports a suppression function 328, which can be used to suppress particular events so that those events are not used to create situations (such as events known to be of no interest). In addition, the core platform 320 supports one or more autonomic services 330, which can represent services that occur automatically in response to changing conditions. For example, the autonomic services 330 can support self-healing, self-configuring, self-optimizing, or self-protecting functions that modify the fabric monitoring system 110 or the computing systems or networks 102a-102n in response to detected situations.
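As a hedged illustration, the suppression, enrichment, correlation, and aggregation functions described above could be chained as in the sketch below. The dictionary-based event format, the `REFERENCE` lookup table, the `SUPPRESSED_KINDS` set, and the grouping key are assumptions made for this example and do not describe the actual core platform 320.

```python
from collections import defaultdict

# Hypothetical reference data: host -> application it supports.
REFERENCE = {"db-01": "payments", "db-02": "payments", "web-01": "portal"}
SUPPRESSED_KINDS = {"heartbeat"}   # events assumed to be of no interest


def suppress(events):
    """Drop events that should never contribute to a situation."""
    return [e for e in events if e["kind"] not in SUPPRESSED_KINDS]


def enrich(events):
    """Attach the owning application to each event."""
    return [{**e, "app": REFERENCE.get(e["host"], "unknown")} for e in events]


def correlate_and_aggregate(events):
    """Group the hosts of events that share an application and an event kind."""
    groups = defaultdict(list)
    for e in events:
        groups[(e["app"], e["kind"])].append(e["host"])
    return dict(groups)


raw = [
    {"host": "db-01", "kind": "high_latency"},
    {"host": "db-02", "kind": "high_latency"},
    {"host": "web-01", "kind": "heartbeat"},
]
situations = correlate_and_aggregate(enrich(suppress(raw)))
```

Here the enrichment step supplies the application name that the grouping step then uses as part of its key, mirroring the statement that enrichment information can feed aggregation.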

Although not shown, the fabric monitoring system 110 or the core platform 320 can support other functions. For example, one or more analysis functions can be used to analyze events in order to estimate the health of an application and its dependencies within the computing systems or networks 102a-102n. As another example, one or more reporting functions can be used to provide historical views of events, agent health, and data collected by the system. In this example, reports or other information can be provided to various destinations 332a-332c. The destinations here include an alert console 332a, which represents a device configured to present alerts or other information to users; a dependency graph 332b, which represents a graphical display depicting the dependencies of devices in a computing system or network; and a pulse indicator 332c, which presents an indication of the number of detected events or situations. Of course, information from the fabric monitoring system 110 may be presented to any other or additional destinations or used in any other suitable manner.

In this example, a policy manager 334 allows users to self-manage the monitoring rules used by the monitoring platform 314 and the core platform 320. For example, these rules can be used to identify events of interest, group related events, suppress events, and identify situations associated with events. Rules defined using the policy manager 334 can be stored in a repository 336, such as a database or other storage and retrieval device or system.

The fabric monitoring system 110 can also retrieve data from at least one reference data service 338. The reference data service 338 can be used to provide any suitable reference data used by the fabric monitoring system 110. For example, the reference data service 338 can be used to obtain information that assists in event classification and grouping and in situation identification. Each data service 338 includes any suitable structure for storing and facilitating retrieval of information.

Additional details of the fabric monitoring system 110 are shown in FIG. 4. As shown in FIG. 4, a user (such as an application technical owner) can configure one or more policies, such as by using a self-service portal supported by the policy manager 334. The policies can be stored in the repository 336. The policies are made available to the monitoring platform 314, which uses the policies (among other things) to obtain events from the host 302 and other event sources. Multiple hosts may be executing one or more common enterprise applications that are deployed across the enterprise system.

In this example, the monitoring platform 314 supports a configuration distribution function 402, which is used to provide rules and threshold information from received policies to the distributed event agents in the hosts and other event sources. The monitoring platform 314 also supports a state management function 404, which is a pre-processing component that sits between the distributed event agents and the core platform 320, tracks state transitions, and sends events based on those state transitions to the core platform 320. The monitoring platform 314 further supports a suppression function 406, which can be used to suppress particular events so that those events are not used to create situations. In addition, the monitoring platform 314 supports a "send trap" function, which can represent an agentless API used to send events directly from applications or other sources to the core platform 320.
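One way to picture the state management function described above is as a filter that forwards an event only when a monitored check changes state. The sketch below is an assumption-laden illustration: the `StateManager` class, the `(host, check)` key, the state names, and the choice to treat the first observation as a transition are not specified in this disclosure.

```python
class StateManager:
    """Forwards an event only when a check changes state (hypothetical)."""

    def __init__(self):
        self._last = {}   # (host, check) -> last reported state

    def observe(self, host, check, state):
        """Return a transition event, or None if the state is unchanged."""
        key = (host, check)
        previous = self._last.get(key)
        self._last[key] = state
        if previous == state:
            return None
        return {"host": host, "check": check, "from": previous, "to": state}


manager = StateManager()
readings = ["OK", "OK", "CRITICAL", "CRITICAL", "OK"]
forwarded = [
    event
    for state in readings
    if (event := manager.observe("host-1", "cpu", state)) is not None
]
```

Five raw readings yield three forwarded events in this sketch, which suggests how a pre-processing component of this kind can reduce the volume of events reaching the core platform.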

The monitoring platform 314 sends event criteria and monitoring information, such as baseline monitoring policies and application monitoring policies, to the event agent 310 and receives events from the event agent 310. The received events are identified by the event agent 310 using the event criteria and monitoring information. The monitoring platform 314 may also be able to communicate with, and receive events from, external monitoring modules and functions 410 and an enterprise scan function 412. The external monitoring modules and functions 410 can receive event criteria and monitoring information from the monitoring platform 314 and use that information to identify events, while the enterprise scan function 412 can operate without such information. As can be seen here, the monitoring platform 314 can receive events from a variety of sources as inputs. Because the event agents 310, the external monitoring modules and functions 410, and the enterprise scan function 412 can be distributed across the enterprise system, the monitoring platform 314 can receive events occurring in multiple locations and report events throughout the system to provide visibility into actual enterprise performance.

Once events are received at the monitoring platform 314, the events (or at least the events that are not suppressed) are forwarded to the core platform 320, where the events are evaluated according to the rules loaded from the policies. For example, the rules can be used to classify the events and to determine which types of processing models will be used to monitor the stream of events arriving at the core platform 320. At least one processing model is therefore selected and used to determine when a situation should be created. After classification, an event can be marked as suppressed, and a model that evaluates the event can either ignore the suppression indicator and process the suppressed event or use the suppression indicator to ignore the suppressed event. The correlation and aggregation functions 322 and 324 can be driven by the rules, and by the models that the rules specify, during event classification.
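A minimal sketch of rule-driven model selection follows, including one model that honors the suppression indicator and one that ignores it. The `RULES` mapping, both model functions, and the threshold value are hypothetical and serve only to illustrate the flow described above.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str
    host: str
    suppressed: bool = False   # suppression indicator set after classification


def count_model(events, threshold=3):
    """Create a situation once enough unsuppressed events arrive."""
    live = [e for e in events if not e.suppressed]   # honors the indicator
    return len(live) >= threshold


def any_model(events):
    """Create a situation on any event, ignoring the suppression indicator."""
    return len(events) > 0


# Hypothetical rules loaded from a policy: event kind -> processing model.
RULES = {"disk_full": count_model, "host_down": any_model}


def evaluate(events):
    """Classify events by kind and run the model that the rule names."""
    by_kind = {}
    for e in events:
        by_kind.setdefault(e.kind, []).append(e)
    return {kind: RULES[kind](group) for kind, group in by_kind.items() if kind in RULES}


stream = [
    Event("disk_full", "a"),
    Event("disk_full", "b", suppressed=True),
    Event("disk_full", "c"),
    Event("host_down", "d", suppressed=True),
]
result = evaluate(stream)
```

In this example the `disk_full` group falls short of its threshold because one of its three events is suppressed, while the `host_down` model raises a situation even though its only event carries the suppression indicator.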

One or more ticketing creation functions 414 are used here within the core platform 320. Identified situations can be distributed to the ticketing creation functions 414 based on the rules loaded from the policies, which indicate which ticketing creation function 414 is appropriate for which situation. Once events are processed within the core platform 320, the events or situations are made available for escalation to any number of additional destinations 416, such as terminals, processors, or users, for recording, analysis, corrective or preventive action, or other functions.

In some embodiments, the core platform 320 provides clustering of related events into service-impacting situations. In some instances, such clustering enables a reduction of 65% or more in monitoring noise by clustering or grouping analytically similar events, eliminating duplicate events, and identifying analytically unique events.

Situations, like events, can be further processed to create multiple situation models, such as discovered and/or user-defined models. Owing to the ticketing and event/situation recording functions of the fabric monitoring system 110, a transparent and complete audit trail of all events and situations can be provided. Moreover, the recording, categorization, and auditing of events and situations provide the ability to analyze and identify trends, outliers, false situations, and other data associated with the events and situations.

FIG. 5 illustrates additional details of how events can be processed in a particular embodiment of the core platform 320. As shown in FIG. 5, various event sources 502 provide events to the fabric monitoring system 110. The event sources 502 include applications, host servers, and user devices that can provide events to the fabric monitoring system 110, such as through the use of event agents 310. Events are reported through an event bus 504, which can represent a queue or other structure configured to receive events. The event bus 504 can be used, for example, in the monitoring platform 314 or the core platform 320.

An event processing system 504 includes an event registration module 508, a model evaluation module 510, and a situation enrichment module 512. The event registration module 508 can identify incoming events, assign unique identifiers to the events, and perform other operations related to the incoming events. The model evaluation module 510 processes the events to identify various situations associated with the events. The situation enrichment module 512 processes the identified situations and provides additional information about them.

The modules 508-512 draw data and information from an event policy store 514, an event/situation store 516, and a key process indicator (KPI) store 518. An audit trail and tracking module 520 and an event/situation viewer 522 or other user interface are also provided. The event policy store 514 represents a storage device in which various user-defined or other policies are stored, such as when the policies are received from the repository 336. The event/situation store 516 stores information about received events and identified situations. The KPI store 518 provides information about the measurements captured by the fabric monitoring system 110 and how those measurements are used. The audit trail and tracking module 520 tracks and stores information about events and situations, including information about the events and situations themselves and how the situations are resolved. The event/situation viewer 522 provides a user interface for interfacing with the fabric monitoring system 110 and for viewing the results obtained by the fabric monitoring system 110.

The event processing system 504 provides grouped and categorized events that define situations to a situation bus 524, which can represent a queue or other structure configured to output situations. The situations are output here to destinations 526, such as consoles, devices, and messaging services for user review and servers and processors for automated handling.

The use of a fabric-based monitoring architecture in the system 110 to support the complex event processing shown here transitions away from enterprise system failure alerting as seen in prior enterprise monitoring capabilities. Instead, the fabric monitoring system 110 enables event and situational awareness across the enterprise system. In the example embodiment shown here, event classification includes self-service definition of event processing through the use of a monitoring definition language (such as a DSL) and the separation or other categorization of streams of events into domains for isolation. The processing modules within the fabric monitoring system 110 define how events are processed to create situations and how individual events are handled. The models can be defined in any manner that best categorizes the events expected across the enterprise system. For example, the models can process events to create situations based on the frequency of the events, the type of the events, the location or local impact of the events, or the source of the events (such as external influences on the enterprise system, like hacking, unregistered use, unauthorized use, or multiple uses by the same user). Analytical models can also be used to cluster events into situations having the same root cause, the same geographic location, or the same date or time of occurrence.

In an example embodiment, the fabric monitoring system 110 can generate signals representing synthetic events for dependent assets based on pluggable reference data sources. For example, an event associated with a host going down can lead to the generation of a synthetic event for an application deployment. Moreover, in an example embodiment, the fabric monitoring system 110 provides full transparency of processing, showing how and why events are grouped or processed to create one or more situations.
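The generation of synthetic events for dependent assets can be sketched as a walk over a dependency table supplied by a reference data source. In the sketch below, the `DEPENDENTS` table, the event fields, and the decision to follow dependencies transitively are assumptions made for illustration; the reference data is passed as a parameter to suggest how a different source might be plugged in.

```python
# Hypothetical reference data: asset -> assets that depend on it.
DEPENDENTS = {
    "host-7": ["app-billing", "app-reports"],
    "app-billing": ["service-invoicing"],
}


def synthetic_events(event, dependents=DEPENDENTS):
    """Return one synthetic event per asset that depends on the source asset."""
    seen = set()
    stack = [event["asset"]]
    out = []
    while stack:
        asset = stack.pop()
        for child in dependents.get(asset, []):
            if child not in seen:          # guard against cycles and repeats
                seen.add(child)
                out.append({
                    "asset": child,
                    "kind": "dependency_impacted",
                    "cause": event["asset"],
                    "synthetic": True,
                })
                stack.append(child)
    return out


generated = synthetic_events({"asset": "host-7", "kind": "host_down"})
```

Each generated event records the originating asset in its `cause` field, which is one possible way to preserve the kind of processing transparency described above.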

The use of the fabric monitoring system 110 is highly resilient, and the fabric monitoring system 110 can be scalable in multiple dimensions. For example, the number of computing nodes 112 used in the fabric monitoring system 110 can be adjusted based on load, and the number of instances of the fabric monitoring system 110 (the number of stripes) can also be adjusted based on load. In some instances, the fabric monitoring system 110 can handle up to one thousand or more events per minute. As a specific example, the fabric monitoring system 110 can (on average) receive about 2.8 million events per day for a particular installation, process about 1.7 million of those events (the remainder being suppressed), and identify about 130,000 situations.
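The daily figures in the specific example above can be restated as rates and ratios. The short calculation below uses only the three approximate numbers given in the text; the derived values are simple arithmetic on those approximations and are not additional measurements.

```python
received_per_day = 2_800_000     # events received (approximate, from the text)
processed_per_day = 1_700_000    # events processed after suppression
situations_per_day = 130_000     # situations identified

minutes_per_day = 24 * 60

# Average arrival rate implied by the daily total.
received_per_minute = received_per_day / minutes_per_day

# Events removed by suppression, as a count and as a share of all events.
suppressed_per_day = received_per_day - processed_per_day
suppressed_share = suppressed_per_day / received_per_day

# Average number of processed events folded into each situation.
events_per_situation = processed_per_day / situations_per_day

print(f"{received_per_minute:.0f} events/minute received on average")
print(f"{suppressed_share:.1%} of received events suppressed")
print(f"{events_per_situation:.1f} processed events per situation")
```

On these figures, the average arrival rate is a little under two thousand events per minute, roughly four in ten received events are suppressed, and each situation summarizes about thirteen processed events.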

In some embodiments, the fabric monitoring system 110 can support a pluggable messaging architecture, such as through the use of any JAVA MESSAGE SERVICE (JMS)-compliant messaging. The fabric monitoring system 110 can also support event and service enrichment via one or more reference data sources, and embedded event correlation can occur via discovered and modeled analysis methods. The fabric monitoring system 110 can be readily pluggable into external automation frameworks, support event suppression and submission APIs, and support event policy definition via a self-defined DSL. The fabric monitoring system 110 can provide the ability to build custom situation models, the ability to track events and situations, and an agent-agnostic framework.

One example use of the monitoring definition language is shown in FIG. 6. The domain-specific language allows users to self-describe events and how the events are to be processed. This information can be provided to the policy manager 334 and stored in the repository 336 as policies. As shown in FIG. 6, users can define multiple event files 602, each of which defines one or more types of events. Users can also combine multiple event files 602 to produce a single processing model file 604, which can be used to identify the occurrence of a situation. This type of functionality can be used by any number of users to define the events of interest and how those events are grouped into situations.
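The relationship between event files and a processing model file can be illustrated with plain Python data structures. This sketch does not reproduce the monitoring definition language, whose syntax is not given here; the dictionary layout, the metric names, the thresholds, and the functions `build_model`, `matches`, and `is_situation` are all hypothetical stand-ins.

```python
# Two hypothetical "event files", each defining one or more event types.
disk_events = {"disk_full": {"metric": "disk_used_pct", "above": 90}}
cpu_events = {"cpu_hot": {"metric": "cpu_pct", "above": 95}}


def build_model(name, event_files, required):
    """Merge several event files into one processing model definition."""
    event_types = {}
    for event_file in event_files:
        event_types.update(event_file)
    missing = set(required) - set(event_types)
    if missing:
        raise ValueError(f"undefined event types: {sorted(missing)}")
    return {"name": name, "events": event_types, "requires": set(required)}


def matches(model, sample):
    """Return the event types in the model that a metric sample triggers."""
    return {
        kind
        for kind, rule in model["events"].items()
        if sample.get(rule["metric"], 0) > rule["above"]
    }


def is_situation(model, sample):
    """A situation occurs when every required event type is triggered."""
    return model["requires"] <= matches(model, sample)


model = build_model("host_overload", [disk_events, cpu_events], ["disk_full", "cpu_hot"])
```

For instance, `is_situation(model, {"disk_used_pct": 95, "cpu_pct": 99})` evaluates to true in this sketch, while a sample in which only the disk threshold is exceeded does not produce a situation.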

Use of the monitoring definition language allows teams of personnel to more easily manage the monitoring performed by the fabric monitoring system 110. Use of the monitoring definition language also provides improved transparency into how events are being processed and into the coverage and usage of the fabric monitoring system 110. In addition, use of the monitoring definition language can provide control over publishing and releasing changes to rules.

In some embodiments, the monitoring definition language can be used to define a package that includes definitions of events, how monitoring of those events occurs, and how situations are identified as a result of the monitoring. The following represents one example of a package that can be defined using the monitoring definition language.

Various functions within the fabric monitoring system 110 allow various benefits to be obtained. For example, the fabric monitoring system 110 can be integrated with incident management and automation platforms and can provide system development lifecycle (SDLC) support and control for monitoring policies. In addition, the fabric monitoring system 110 can be used to provide visibility into production and operational status across business units and to segregate event streams using multiple stripes. A stripe can be defined as a set of events associated with a region or business unit that is handled by a separate instance of the fabric monitoring system 110. A stripe can have its own instances of messaging, persistence, and processing, with separate service instances. The operation of one stripe can be independent of other stripes, and communication between stripes for cross-stripe correlation can occur through composite events.
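As a minimal sketch of the stripe concept, the following Python code routes each event to a stripe chosen by a region field and lets one stripe notify another only through a composite event. The class names, the `region` field, and the dictionary layout of events are assumptions made for illustration and are not taken from this disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Stripe:
    """Hypothetical stripe: holds its own events, isolated from other stripes."""
    key: str
    events: list = field(default_factory=list)

    def handle(self, event):
        self.events.append(event)


class StripeRouter:
    """Routes events to stripes by a configurable field (assumed: 'region')."""

    def __init__(self, stripe_field="region"):
        self.stripe_field = stripe_field
        self.stripes = {}

    def route(self, event):
        key = event[self.stripe_field]
        if key not in self.stripes:
            self.stripes[key] = Stripe(key)
        self.stripes[key].handle(event)
        return self.stripes[key]

    def send_composite(self, source_key, target_key, summary):
        """Cross-stripe communication happens only via a composite event."""
        composite = {
            "type": "composite",
            "source_stripe": source_key,
            self.stripe_field: target_key,
            "summary": summary,
        }
        return self.route(composite)


router = StripeRouter()
router.route({"type": "LinkDown", "region": "emea", "host": "sw1"})
router.route({"type": "DiskFull", "region": "apac", "host": "db1"})
router.send_composite("emea", "apac", "LinkDown observed on sw1")
```

Here each stripe never reads another stripe's event list directly, which mirrors the independence of stripes described above.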

Note that each of the platforms, functions, and modules described above can be implemented using any suitable hardware or combination of hardware and software/firmware instructions. In particular embodiments, each of the platforms, functions, and modules includes software instructions that are executed by one or more processing devices. Multiple processing devices can execute multiple instances of the platforms, functions, and modules, and the processing devices can be distributed across any number of nodes of a fabric computing system.

Although FIGS. 3 through 6 illustrate one example of a fabric monitoring system 110 for handling events involving computing systems and networks and related details, various changes may be made to FIGS. 3 through 6. For example, the functional divisions shown in FIGS. 3 through 6 are for illustration only. Various components in FIGS. 3 through 6 could be combined, further subdivided, rearranged, or omitted, and additional components could be added according to particular needs.

FIGS. 7 and 8 illustrate exemplary process flows and related details in a system for handling events involving computing systems and networks using a fabric monitoring system according to this disclosure. Specifically, FIG. 7 illustrates an exemplary process flow 700 for handling events and identifying situations, and FIG. 8 illustrates an exemplary process flow 800 for handling identified situations. Note that while FIGS. 7 and 8 are described with respect to the fabric monitoring system 110 of FIG. 1 having an implementation as shown in FIGS. 3 through 6, the process flows 700 and 800 can be used with any suitable fabric monitoring system and in any suitable system.

As shown in FIG. 7, at step 702, an event occurs within an enterprise system and is provided to the fabric monitoring system. This can include, for example, an event agent 310 identifying an event in a host 302 or other event source 502 and providing the event to the monitoring platform 314 or the event bus 504.

At step 704, the event is registered. This can include, for example, the monitoring platform 314 or the event registration module 508 of the event processing system 504 identifying the incoming event and performing various actions using the event. Event registration here occurs using various data. For example, event registration can be based on rules obtained from one or more fabric monitoring policies, such as self-service rules for matching events to domains of interest and for matching individual events to specific event types (such as predefined or derived types). Reference data can further provide rule queries or other event categorization to support event registration. During event registration, the event can be matched against patterns and values specified in the policies. After the event is matched using the rules, the event can be checked to see whether it matches any suppression criteria loaded from the policy system. If so, the event can be annotated as being within a suppression interval so that one or more processing models can take that into account. During event registration, the event can be assigned an asset name, an event name, a processing model type, and (if not pre-assigned) an event unique identifier (UID).
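The registration step described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the system's actual implementation: the `Rule` and `Suppression` structures, the raw event keys (`host`, `message`, `timestamp`), and the use of regular expressions for pattern matching are all hypothetical.

```python
import re
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical policy rule: maps a message pattern to an event type."""
    event_name: str
    model_type: str
    pattern: str  # regular expression applied to the raw message


@dataclass(frozen=True)
class Suppression:
    """Hypothetical suppression criterion: an asset and a time interval."""
    asset: str
    start: float
    end: float


def register_event(raw, rules, suppressions):
    """Match a raw event against rules, annotate suppression, assign identifiers.

    Returns the registered event, or None if no rule matches.
    """
    matched = next((r for r in rules if re.search(r.pattern, raw["message"])), None)
    if matched is None:
        return None
    event = dict(raw)
    event["asset_name"] = raw["host"]
    event["event_name"] = matched.event_name
    event["model_type"] = matched.model_type
    # Keep a pre-assigned UID; otherwise generate one.
    event.setdefault("uid", str(uuid.uuid4()))
    # Annotate rather than drop, so processing models can take it into account.
    event["suppressed"] = any(
        s.asset == raw["host"] and s.start <= raw["timestamp"] <= s.end
        for s in suppressions
    )
    return event
```

A suppressed event is annotated instead of discarded, matching the description that processing models can take the suppression interval into account.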

The event is dispatched at step 706 and evaluated at step 708. This can include, for example, the core platform 320 or the model evaluation module 510 of the event processing system 504 evaluating the event to identify whether any situation is indicated by the event. The core platform 320 or the model evaluation module 510 can receive various inputs and process event streams, such as multiple inputs for each asset name, to create situations. The inputs to the core platform 320 or the model evaluation module 510 can include fabric policy rules and other model information, model and situation state information, and enterprise reference data. The core platform 320 or the model evaluation module 510 processes the event as the latest in a stream of events that potentially form a situation. In some embodiments, the creation of a situation can itself define an event.
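One simple way to picture model evaluation is a per-asset threshold model: each incoming event is treated as the latest in that asset's stream, and a situation is created once enough events fall within a time window. The threshold rule, the class name `ThresholdModel`, and the event keys are assumptions for illustration; the disclosure does not limit processing models to this form. The sketch also assumes that events for an asset arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque


class ThresholdModel:
    """Hypothetical processing model: N events per asset within a time window."""

    def __init__(self, situation_name, threshold, window_seconds):
        self.situation_name = situation_name
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # asset name -> recent timestamps

    def evaluate(self, event):
        """Process one event; return a situation dict or None."""
        if event.get("suppressed"):
            return None  # the model takes the suppression annotation into account
        recent = self._recent[event["asset_name"]]
        recent.append(event["timestamp"])
        # Drop timestamps that have fallen out of the window.
        while recent and event["timestamp"] - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.threshold:
            count = len(recent)
            recent.clear()  # start fresh after a situation is created
            return {
                "situation": self.situation_name,
                "asset_name": event["asset_name"],
                "event_count": count,
            }
        return None
```

The model keeps its own state between calls, which corresponds loosely to the model and situation state information listed among the inputs above.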

At step 710, any identified situations are output. This can include, for example, the core platform 320 or the model evaluation module 510 of the event processing system 504 outputting the identified situations and any related information.

As shown in FIG. 8, once a situation is identified from a stream of events according to the applicable fabric policies, the situation is output and enters a situation bus distribution service at step 802. From the service bus 524, the situation can be dispatched to various devices or systems, such as various event/situation ticketing systems, depending on the situation. For example, if automated resolution of the situation is possible or permitted, the situation can be dispatched to an automation agent at step 804. The automation agent can represent an application or other logic that performs one or more functions to automatically resolve a given situation. If automated resolution of the situation is not possible or not permitted and a particular ticketing system is identified or associated with the situation, the situation can be dispatched to a ticketing and incident agent at step 806. The ticketing and incident agent can then generate a ticket or other notification according to the specifications of its ticketing and incident system. The ticketing and incident agent can return a reference identifier for the situation and an indication that the situation should be closed.

If no ticketing and incident agent is identified, the situation can be provided to a lightweight ticketing agent at step 808. The lightweight ticketing agent includes a ticket persistence database that supports situation storage at step 810, and it receives input from one or more execution services. The lightweight ticketing agent converts tickets into alerts, serves as a bridge for live intervention in situations, and generates emails, message notifications, or other notifications to relevant users or stakeholders. In this example, the lightweight ticketing agent can provide one or more messaging topics (such as alerts) to an alert caching service at step 812, and the alert caching service can notify one or more users of the alerts via at least one console at step 814. Using the console, a user can identify various alert actions to be performed for each alert, such as assigning or closing the alert. The alert actions are provided to one or more execution services at step 816, and the execution services can take steps to implement the selected alert actions. For example, an execution service can issue "event processing fabric" (EPF) actions to be carried out by the lightweight ticketing agent at step 818 and/or by another fabric computing core at step 820.
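The dispatch decision of process flow 800 can be summarized in a few lines. The sketch below is a simplified illustration: the situation keys `automation_allowed` and `ticketing_system` and the returned destination names are hypothetical and only encode the order of the checks at steps 804, 806, and 808.

```python
def choose_destination(situation):
    """Pick where a situation is dispatched, following the order of FIG. 8.

    The dictionary keys used here are illustrative assumptions.
    """
    if situation.get("automation_allowed"):
        return "automation_agent"  # step 804: automated resolution
    if situation.get("ticketing_system"):
        return "ticketing_and_incident_agent"  # step 806: known ticketing system
    return "lightweight_ticketing_agent"  # step 808: fallback


# Example: no automation permitted and no ticketing system associated.
destination = choose_destination({"situation": "HostDegraded"})
```

Placing the lightweight ticketing agent last makes it the fallback, so every identified situation reaches some destination.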

Although FIGS. 7 and 8 illustrate examples of process flows 700 and 800 and related details in a system for handling events involving computing systems and networks using a fabric monitoring system, various changes may be made to FIGS. 7 and 8. For example, various steps in each figure could overlap, occur in parallel, occur in a different order, or occur any number of times. In addition, the process flows shown here could vary depending on how events are identified and converted into situations and how situations are handled within a particular fabric monitoring system.

The use of the fabric monitoring system 110 described above to monitor, diagnose, and maintain the computing systems or networks 102a-102n provides a technical solution to a technical problem in the field of computer and network management. As described above, events handled by the fabric monitoring system 110 can relate to the current state or a change in the current state of a device, system, or network within the computing systems or networks 102a-102n and to the occurrence of anomalies or defined conditions. For large enterprise systems, the number of events can be very large, at times reaching several thousand per minute. This makes it very difficult or impossible for personnel to manually review and resolve events and to identify related events that may indicate a more serious security breach or other problem within the computing systems or networks 102a-102n.

The fabric monitoring system 110 supports the automated identification of events, as well as the automated classification of events and the identification of situations from related events. This makes it considerably easier to manage events, identify situations to be resolved, and possibly even resolve those situations automatically. Among other things, this can help to keep the computing systems or networks 102a-102n functioning more smoothly and to resolve issues that arise. In addition, as described above, this can be done in a customizable manner, such as by defining events, how monitoring of the events occurs, and how the events are used to identify situations. This provides great flexibility in the use of the fabric monitoring system 110. Other technical features are also provided above.

In some embodiments, various functions described in this patent document are implemented or supported by a computer program that is formed from computer readable program code and that is embodied in a computer readable medium. The phrase "computer readable program code" includes any type of computer code, including source code, object code, and executable code. The phrase "computer readable medium" includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A "non-transitory" computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.

It may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms "application" and "program" refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in suitable computer code (including source code, object code, or executable code). The term "communicate," as well as derivatives thereof, encompasses both direct and indirect communication. The terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation. The term "or" is inclusive, meaning and/or. The phrase "associated with," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like. The phrase "at least one of," when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed. For example, "at least one of: A, B, and C" includes any of the following combinations: A, B, C, A and B, A and C, B and C, and A and B and C.

The description in this patent document should not be read as implying that any particular element, step, or function is an essential or critical element that must be included in the claim scope. Also, none of the claims is intended to invoke 35 U.S.C. § 112(f) with respect to any of the appended claims or claim elements unless the exact words "means for" or "step for" are explicitly used in the particular claim, followed by a participle phrase identifying a function. Use of terms such as (but not limited to) "mechanism," "module," "device," "unit," "component," "element," "member," "apparatus," "machine," "system," "processor," "processing device," or "controller" within a claim is understood and intended to refer to structures known to those skilled in the relevant art, as further modified or enhanced by the features of the claims themselves, and is not intended to invoke 35 U.S.C. § 112(f).

While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.

Claims

A method comprising:
receiving, at a fabric monitoring system, information identifying an occurrence of an event in an enterprise system that includes a plurality of computing or networking systems, the event occurring on or involving a computing or networking device in the computing or networking systems, the event identified using rules accessible by the fabric monitoring system;
processing the information in real time using the fabric monitoring system to identify the occurrence of the event and to assign the event to a plurality of situations, the event assigned to the situations using one or more processing models accessible by the fabric monitoring system; and
outputting information identifying the situations.

The method of claim 1, wherein the fabric monitoring system is scalable in multiple dimensions, one dimension being associated with a number of computing nodes operating within the fabric monitoring system, another dimension being associated with a number of stripes, each stripe comprising a portion of the fabric monitoring system or a separate instance of the fabric monitoring system.

The method of claim 1, further comprising: storing information associated with the event and the situations, including information about the event and the situations and information about how the situations are resolved, and providing an audit trail of the event and the situations.

The method of claim 1, further comprising: obtaining the rules from one or more policies, wherein at least a portion of the one or more policies is defined by at least one user using a monitoring definition language.

The method of claim 1, wherein the one or more processing models define how the event is categorized and how the situations are identified, the one or more processing models comprising:
at least one user-defined model defined by at least one user; and
at least one analytical model that defines one or more analytical functions that operate using the information identifying the occurrence of the event.

The method of claim 1, wherein the information identifying the occurrence of the event is processed using a plurality of stripes, each stripe comprising a portion of the fabric monitoring system or a separate instance of the fabric monitoring system;
the method further comprising:
sending composite events between the stripes to support cross-stripe correlation of events or situations.

Different stripes
Different assets within the computing or networking system,
Different locations where the computing or networking system is deployed,
Different deployments of hardware, software, or firmware within the computing or networking system,
7. Process events associated with at least one of different business units using the computing or networking system and different types of businesses traded using the computing or networking system. Method.

The method of claim 1, wherein the event comprises at least one of:
a current state of the computing or networking device within the computing or networking systems;
a change in the current state of the computing or networking device within the computing or networking systems;
an anomaly in the computing or networking device within the computing or networking systems; and
an occurrence of a defined condition in the computing or networking systems.

The method of claim 1, wherein outputting the information identifying the situations comprises:
providing information identifying at least one of the situations to an automation agent that automatically resolves the at least one situation.

The method of claim 1, wherein outputting the information identifying the situations comprises:
providing information identifying at least one of the situations to a ticketing agent that generates at least one notification for personnel, the at least one notification identifying the at least one situation.

A system comprising:
a fabric monitoring system comprising a plurality of computing nodes and a plurality of communication links coupling the computing nodes, the fabric monitoring system configured to:
receive information identifying an occurrence of an event in an enterprise system that includes a plurality of computing or networking systems, the event occurring on or involving a computing or networking device in the computing or networking systems, the event identified using rules accessible by the fabric monitoring system;
process the information in real time to identify the occurrence of the event and to assign the event to a plurality of situations, the event assigned to the situations using one or more processing models accessible by the fabric monitoring system; and
output information identifying the situations.

The system of claim 11, wherein the fabric monitoring system is scalable in multiple dimensions, one dimension being associated with a number of computing nodes operating within the fabric monitoring system, another dimension being associated with a number of stripes, each stripe comprising a portion of the fabric monitoring system or a separate instance of the fabric monitoring system.

The system of claim 11, wherein the fabric monitoring system is further configured to store information associated with the event and the situations, including information about the event and the situations and information about how the situations are resolved, and to provide an audit trail of the event and the situations.

The system of claim 11, further comprising: a repository configured to store one or more policies including the rules, wherein at least a portion of the one or more policies is defined by at least one user using a monitoring definition language.

The system of claim 11, wherein the one or more processing models define how the event is categorized and how the situations are identified, the one or more processing models comprising:
at least one user-defined model defined by at least one user; and
at least one analytical model that defines one or more analytical functions that operate using the information identifying the occurrence of the event.

The system of claim 11, wherein:
the system includes a plurality of stripes, each stripe comprising a portion of the fabric monitoring system or a separate instance of the fabric monitoring system;
each stripe is configured to process at least a portion of the information identifying the occurrence of the event; and
each stripe is configured to generate composite events and send the composite events to other stripes to support cross-stripe correlation of events or situations.

The system of claim 16, wherein each stripe is configured to generate at least some of the composite events for identifying a situation by the stripe.

The system of claim 16, wherein different stripes are configured to process events associated with at least one of:
different assets within the computing or networking systems;
different locations where the computing or networking systems are deployed;
different deployments of hardware, software, or firmware within the computing or networking systems;
different business units that use the computing or networking systems; and different types of business transacted using the computing or networking systems.

The system of claim 11, wherein the event comprises at least one of:
a current state of the computing or networking device within the computing or networking systems;
a change in the current state of the computing or networking device within the computing or networking systems;
an anomaly in the computing or networking device within the computing or networking systems; and
an occurrence of a defined condition in the computing or networking systems.

The system of claim 11, wherein the fabric monitoring system is configured to output the information identifying the situations by providing information identifying at least one of the situations to an automation agent that automatically resolves the at least one situation.

The system of claim 11, wherein the fabric monitoring system is configured to output the information identifying the situations by providing information identifying at least one of the situations to a ticketing agent that generates at least one notification for personnel, the at least one notification identifying the at least one situation.

A non-transitory computer readable medium containing computer readable program code that, when executed by a computing node of a fabric monitoring system, causes the computing node to:
receive information identifying an occurrence of an event in an enterprise system that includes a plurality of computing or networking systems, the event occurring on or involving a computing or networking device in the computing or networking systems, the event identified using rules accessible by the fabric monitoring system;
process the information in real time to identify the occurrence of the event and to assign the event to a plurality of situations, the event assigned to the situations using one or more processing models accessible by the fabric monitoring system; and
output information identifying the situations.