JP6868416B2

JP6868416B2 - Failure response support system

Info

Publication number: JP6868416B2
Application number: JP2017029690A
Authority: JP
Inventors: 知優志田; 優伊藤
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2017-02-21
Filing date: 2017-02-21
Publication date: 2021-05-12
Anticipated expiration: 2037-02-21
Also published as: JP2018136656A

Description

The present invention relates to failure-handling techniques for information processing systems and the like. In particular, it relates to a technique that is effective when applied to a failure response support system that presents a person in charge with a method of dealing with a detected failure.

For example, operation management of an information processing system with various servers includes monitoring for failures with a monitoring system or the like and responding promptly when a failure is detected. A failure detected by the monitoring system may not be resolvable by an operator's first-line response, or it may be of a kind that requires escalation immediately. In either case, a person in charge who is not on site, such as a developer or engineer, is contacted and handles the failure. Depending on the system being operated, such contact may happen at night, and the off-site person in charge may then have to deal with the failure alone and remotely. If the person in charge judges that the customer must be informed, he or she must also contact the customer.

Japanese Unexamined Patent Publication No. 2005-202446 (Patent Document 1) is one example of a technique for automating and speeding up failure response. It describes detecting a failure in an environment containing business servers, network devices, and business applications, extracting the details of the failure from log information, and searching failure countermeasure information, which enables early countermeasures and shorter recovery time. It also describes that, when no matching countermeasure information exists, the log and resource information needed for investigation is edited and provided so that failure information can be extracted without specialized knowledge.

Japanese Unexamined Patent Publication No. 2009-276929 (Patent Document 2) describes the following. A monitored server is monitored periodically. When a failure is detected, the operation logs before and after the failure are collected, the error log that triggered the failure is searched for, and keywords such as error codes are extracted. The keywords are used to extract similar cases from a case DB and obtain failure avoidance information, from which a failure avoidance file is generated and applied to the monitored server.

Patent Document 1: Japanese Unexamined Patent Publication No. 2005-202446
Patent Document 2: Japanese Unexamined Patent Publication No. 2009-276929

As in these conventional techniques, it is already common practice to handle a detected failure as follows. Past failure response histories, failure countermeasure information, and the like are searched for matching or similar entries based on the content of the failure message. A response method is then obtained from the search results and presented to the person in charge.

Building on this, applying AI (Artificial Intelligence) and machine learning, which have come into wide use in recent years, is being considered as a way to extract response methods more accurately. The challenge is how to link a machine learning engine or machine learning service (hereinafter collectively called a "machine learning engine") with monitoring systems, including existing ones. For example, when a failure is detected, its content must be input to the machine learning engine automatically so that a result computed by the trained model can be obtained. A further challenge is deciding when and how the results of actual failure handling should be fed back to the machine learning engine as training data to update (retrain) the model.

Accordingly, an object of the present invention is to provide a failure response support system that cooperates efficiently with a machine learning engine and can present and recommend a highly accurate response method for a failure detected by a monitoring system or the like.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

A representative invention disclosed in the present application is briefly summarized as follows.

A failure response support system according to a representative embodiment of the present invention outputs, to a user, a recommended response to a failure that has occurred in a monitored target. The system includes a failure notification device, a failure response support device, and a machine learning device. The failure notification device transmits a failure notification containing a failure message related to the failure. The failure response support device acquires the response based on the failure message and transmits a response notification containing it to the user. The machine learning device acquires the response for the failure message using an analysis model built in advance by machine learning on accumulated information about past failure responses.

The failure notification device transmits the failure notification to both the user and the failure response support device. The failure response support device extracts the failure message from the received failure notification and requests the machine learning device to acquire the response for the extracted failure message. The machine learning device then acquires the response for that failure message using a first analysis model, a second analysis model, or both. The first analysis model classifies whether the user needs to contact the customer about the failure. The second analysis model extracts response candidates from past failure responses similar to the failure.

The effects obtained by a representative invention disclosed in the present application are briefly described as follows.

According to a representative embodiment of the present invention, it is possible to cooperate efficiently with a machine learning engine and to present and recommend a highly accurate response method for a failure detected by a monitoring system or the like.

FIG. 1 is a diagram showing an overview of a configuration example of a failure response support system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the content of a failure mail.
FIG. 3 is a diagram showing an example of the content of a response content mail according to the embodiment.
FIG. 4 is a diagram showing an overview of an example of the flow of processing for building an analysis model in advance according to the embodiment.
FIG. 5 is a diagram showing an overview of an example of creating a BoW corpus according to the embodiment.
FIG. 6 is a diagram showing an overview of an example of the flow of online analysis processing according to the embodiment.
FIGS. 7(a) to 7(c) are diagrams showing an overview of examples of feedback on the response content from the person in charge according to the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings. Throughout the drawings, the same parts are in principle given the same reference numerals, and repeated descriptions are omitted. A part described with a reference numeral in one drawing may also be referred to by that numeral when another drawing is explained, even though it is not shown again.

<System configuration>
FIG. 1 is a diagram showing an overview of a configuration example of a failure response support system according to an embodiment of the present invention. The failure response support system 1 of this embodiment is a server system that supports a person in charge 2, such as a developer, by recommending a response. It does so when the monitoring system 20 detects a failure in the monitored servers 10, which consist of, for example, multiple Web servers operated by a company or the like.

The failure response support system 1 consists of, for example, a failure mail server 30, a failure response support server 40, and a machine learning server 50. The failure mail server 30 is a server system that sends an e-mail notification when the monitoring system 20 detects a failure in a monitored server 10. The failure response support server 40 is a server system that recommends a response to the failure to the person in charge 2. The machine learning server 50 is a server system that builds, by machine learning on accumulated past responses, an analysis model for extracting and determining a response, and performs analysis with that model.

Each server consists of, for example, a physical server or a virtual server built on a cloud computing service, and is connected to a network (not shown) such as the Internet, a VPN (Virtual Private Network), or a LAN (Local Area Network). On each server, a CPU (Central Processing Unit) (not shown) executes an OS (Operating System), middleware such as a DBMS (DataBase Management System) or a Web server program, and the software running on them, all loaded into memory from a storage device such as an HDD (Hard Disk Drive). This realizes the various failure response support functions. Although FIG. 1 shows each server as an independent server system, some or all of these servers may be built on a single server system. Where possible, an information processing device such as a PC (Personal Computer) may be used instead of a server.

The failure mail server 30 is a mail server that receives a failure notification from the monitoring system 20, which monitors the monitored servers 10 for failures, and issues a failure mail 31, an e-mail reporting the failure. For example, it creates a failure mail 31 describing the failure, including the failure message, and sends it to the pre-registered e-mail address of the person in charge 2 selected according to the monitored server 10 or application in which the failure occurred, the nature of the failure, and so on. Normally, the person in charge 2 receives the failure mail 31 and uses its content to judge matters such as the nature and severity of the failure, how to respond, and whether the customer 3 needs to be contacted.

In this embodiment, e-mail is used as the concrete means of notification, but any other means of notification may be used in addition to or instead of it. For example, the widely used SMS (Short Message Service) may be used, or a message protocol specific to this embodiment may be implemented instead of notification by SMTP (Simple Mail Transfer Protocol).

In this embodiment, the failure mail server 30 also sends a failure mail 31 with the same content to the failure response support server 40 (that is, to an e-mail address at which the failure response support server 40 can receive mail). This is easy to achieve: for example, the failure mail server 30 sends failure mails 31 to a broadcast (mailing list) address, and an address that the failure response support server 40 can receive is added to that list. As long as the failure mail 31 can be sent, a mail server function is not essential; a mail client that sends the failure mail 31 through another mail server or e-mail service (not shown) may be used instead.
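As a minimal sketch of the mailing-list approach, assuming Python's standard `email` package and hypothetical addresses, the same failure mail can be addressed to both the person in charge and the failure response support server. In practice the list expansion would usually happen on the list server; here it is modeled as an explicit recipient list for illustration.

```python
from email.message import EmailMessage
from typing import List

# Hypothetical addresses; the patent does not specify any.
PERSON_IN_CHARGE = "oncall@example.com"
SUPPORT_SERVER = "failure-support@example.com"

def build_failure_mail(subject: str, body: str, recipients: List[str]) -> EmailMessage:
    """Build a single failure mail addressed to every recipient on the list."""
    msg = EmailMessage()
    msg["From"] = "monitor@example.com"  # hypothetical sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

# Adding the support server's address to the distribution list is the only
# change needed for it to receive the same failure mail as the person in charge.
mailing_list = [PERSON_IN_CHARGE, SUPPORT_SERVER]
mail = build_failure_mail("[ALERT] web01 failure", "host: web01\nmessage: disk full", mailing_list)
```

Sending would then be a call such as `smtplib.SMTP(host).send_message(mail)`, where `host` is deployment-specific and not given in the patent.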

The failure response support server 40 is a server system that receives the failure mail 31 from the failure mail server 30 and, based on its content, sends the person in charge 2 a response content mail 44, an e-mail describing the recommended response. The failure response support server 40 has, for example, a failure message extraction unit 41, a response content acquisition unit 42, and a response content reflection unit 43, each implemented as software.

The failure message extraction unit 41 extracts from the failure mail 31 received from the failure mail server 30 only the failure message portion needed to analyze the response. FIG. 2 is a diagram showing an example of the content of a typical failure mail 31. A failure mail 31 addressed to the person in charge 2 contains much information that is unnecessary or meaningless for failure analysis, such as boilerplate headers, footers, and greetings. In the example of FIG. 2, only the part inside the dashed frame is needed, namely the information identifying the monitored server 10 and the failure message; the rest is unnecessary for analysis. In this embodiment, the failure message extraction unit 41 extracts only these failure messages as the necessary part. This eliminates needless analysis on the machine learning server 50, which reduces the analysis load, and removes noise, which improves analysis accuracy.

The method of extracting the necessary part from the failure mail 31 is not particularly limited and can be chosen according to the format of the failure mail 31. For example, fixed regions such as boilerplate headers and footers can be specified by line number or character count, or identified by marker strings, tags, and the like, and then deleted uniformly. Conversely, the same techniques may be used to identify and extract the necessary part directly.
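Both extraction strategies described above can be sketched as below. The marker strings, line counts, and sample mail are hypothetical, since the actual layout depends on the failure mail format in use.

```python
def extract_by_markers(body: str, start: str, end: str) -> str:
    """Return the lines between a start marker line and the next end marker line."""
    lines = body.splitlines()
    try:
        i = lines.index(start)
        j = lines.index(end, i + 1)
    except ValueError:
        return ""  # a marker is missing, so nothing can be extracted reliably
    return "\n".join(lines[i + 1:j]).strip()

def strip_fixed_lines(body: str, header_lines: int, footer_lines: int) -> str:
    """Drop a fixed number of boilerplate lines from the top and the bottom."""
    lines = body.splitlines()
    stop = len(lines) - footer_lines
    return "\n".join(lines[header_lines:stop]).strip()

# Hypothetical failure mail with a greeting, a marked detail block, and a signature.
mail_body = (
    "Dear all,\n"
    "This is an automated notice.\n"
    "=== DETAIL ===\n"
    "host: web01\n"
    "message: disk usage exceeded 95%\n"
    "=== END ===\n"
    "Regards,\n"
    "Ops Team"
)
failure_message = extract_by_markers(mail_body, "=== DETAIL ===", "=== END ===")
trimmed = strip_fixed_lines(mail_body, header_lines=2, footer_lines=2)
```

Marker-based extraction tolerates headers of varying length, while fixed line counts are simpler but break as soon as the template changes.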

Returning to FIG. 1, the response content acquisition unit 42 takes the information extracted by the failure message extraction unit 41 (only the necessary part of the failure mail 31), analyzes it with the analysis model trained on the machine learning server 50, and obtains candidate responses to recommend. The analysis can be requested from the machine learning server 50 in whatever way suits its external interface, for example by calling an API (Application Programming Interface) or by sending an e-mail containing an execution instruction. The analysis on the machine learning server 50 may return multiple candidate responses. The response content acquisition unit 42 also formats the obtained candidates and sends them to the e-mail address of the person in charge 2 as the response content mail 44.

FIG. 3 is a diagram showing an example of the content of the response content mail 44 in this embodiment. In the example of FIG. 3, the recommended response appears in the region enclosed by "◆" symbols. As described later, the response contains two items: (A) whether customer contact is necessary, and (B) workaround (interim measure) candidates.

Item (A) includes a value (for example, a percentage) calculated by the machine learning server 50 that indicates how likely it is that the customer needs to be contacted. Item (B) includes a value calculated by the machine learning server 50 that indicates the degree of similarity to a similar past failure case, together with information on the response taken in that case. The person in charge 2 receives the response content mail 44 on a person-in-charge terminal 60, an information processing terminal such as a PC, tablet, or smartphone, and can use its content as reference information when handling the failure.
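A minimal sketch of how the response content acquisition unit might render items (A) and (B) into the body of the response content mail. The exact layout of FIG. 3 is not reproduced here; the `Workaround` record, the case identifiers, and the wording are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Workaround:
    similarity: float  # similarity to the past case, 0.0 to 1.0
    case_id: str       # hypothetical identifier of the past failure case
    action: str        # response that was taken in that past case

def format_response_mail(contact_probability: float, workarounds: List[Workaround]) -> str:
    """Render items (A) and (B) as plain text enclosed by '◆' lines (layout is hypothetical)."""
    lines = [
        "◆ Recommended response ◆",
        f"(A) Customer contact required: {contact_probability:.0%}",
        "(B) Workaround candidates:",
    ]
    # Most similar past case first.
    ranked = sorted(workarounds, key=lambda c: c.similarity, reverse=True)
    for rank, c in enumerate(ranked, start=1):
        lines.append(f"  {rank}. [{c.similarity:.0%} similar] {c.case_id}: {c.action}")
    lines.append("◆ End of recommendation ◆")
    return "\n".join(lines)

body = format_response_mail(
    0.82,
    [Workaround(0.65, "INC-0012", "restart the application service"),
     Workaround(0.91, "INC-0007", "delete temporary files to free disk space")],
)
```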

Returning to FIG. 1, the response content reflection unit 43 obtains feedback from the person in charge 2 (via the person-in-charge terminal 60) on the result of carrying out the response. It then has the machine learning server 50 retrain on this feedback as new training data to update the trained analysis model. To reduce the workload of the person in charge 2, he or she can report the result simply by sending a response reply mail 45 in reply to the response content mail 44 received earlier from the failure response support server 40. The feedback method using the response reply mail 45 is described later.

The machine learning server 50 is a server system that uses accumulated past responses as training (teacher) data to build, by machine learning, an analysis model 53 for extracting and determining a response, and performs analysis with it. It has, for example, a preprocessing unit 51 and an analysis unit 52, each implemented as software.

As preprocessing before analysis by the machine learning engine, the preprocessing unit 51 provides a cleansing function. It applies natural language processing to failure messages, which are text data describing the failure, and normalizes them into a form in which the failure content can be analyzed. For natural language processing such as morphological analysis, any known, generally available natural language processing engine or library can be used as appropriate. The cleansing function is described later.

The analysis unit 52 performs machine learning in advance on the past response history (with the failure messages cleansed by the preprocessing unit 51) to create the trained analysis model 53. For the failure message of a newly occurring failure (also cleansed by the preprocessing unit 51), it obtains candidate responses based on the analysis model 53 and returns them to the failure response support server 40. For the machine learning function, any known, generally available machine learning engine or machine learning service can be used as appropriate.

<Processing flow>
FIG. 4 is a diagram showing an overview of an example of the flow of processing for building the analysis model in advance in this embodiment. Here, the machine learning server 50 builds, by machine learning on accumulated past responses, the analysis model 53 for extracting and determining responses, or updates it to the latest version. This processing runs, for example, as a monthly batch job, or at any time on instruction from the person in charge 2.

First, the machine learning server 50 acquires new training data, that is, history information on past responses (S01). The acquisition method is not particularly limited. For example, history information recorded on the failure response support server 40 or elsewhere may be sent to the machine learning server 50 automatically, or a user may access the machine learning server 50 and upload it manually.

Next, the preprocessing unit 51 cleanses the failure messages contained in the training data. If the training data contains content other than failure messages, the failure message portion is extracted first. Each failure message is then morphologically analyzed and split into words tagged with their parts of speech (S02). All of the resulting words may be used, or only words of predetermined parts of speech. Restricting the words to predetermined parts of speech (for example, nouns) is usually expected to improve analysis accuracy.
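The part-of-speech restriction can be sketched as a filter over analyzer output. A real Japanese morphological analyzer such as MeCab would produce tags like 名詞 (noun); here the analyzer output is a hypothetical list of (surface form, part of speech) pairs with English tags, so the sketch runs without third-party libraries.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical analyzer output: (surface form, part of speech).
Token = Tuple[str, str]

def select_terms(tokens: List[Token], allowed_pos: Optional[Set[str]] = None) -> List[str]:
    """Return surface forms; if allowed_pos is given, keep only those parts of speech."""
    if allowed_pos is None:
        return [surface for surface, _ in tokens]
    return [surface for surface, pos in tokens if pos in allowed_pos]

tokens = [("disk", "noun"), ("usage", "noun"), ("has", "aux"),
          ("exceeded", "verb"), ("the", "det"), ("threshold", "noun")]
nouns_only = select_terms(tokens, {"noun"})
all_terms = select_terms(tokens)
```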

Next, a BoW (Bag of Words) corpus is created from the words of each failure message, normalizing the message into a numerical vector (S03). A day/night item is then added as a weighting element to improve analysis accuracy (S04).

FIG. 5 is a diagram showing an overview of an example of creating a BoW corpus in this embodiment. The upper left of FIG. 5 shows, as morphologically analyzed messages 411, an example of the result of morphologically analyzing three failure messages and splitting them into words.

First, a dictionary 412, shown at the upper right of FIG. 5, is created from the morphologically analyzed messages 411. The dictionary 412 holds, for example, every word appearing in the morphologically analyzed messages 411, with duplicates removed and a unique identification number assigned to each word (in the example of FIG. 5, 1 for the word 'あいう', 2 for the word 'かきく', and so on). Then, as shown in the middle of FIG. 5, a BoW corpus 413 (occurrence frequency vector) is created for each failure message. Its elements are the occurrence counts of all words registered in the dictionary 412, in dictionary order.
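The dictionary and the per-message count vectors can be sketched as follows. IDs are assigned in order of first appearance and start at 1 to mirror FIG. 5; the ordering rule is an assumption, since the text only requires the IDs to be unique. The sample messages reuse the placeholder words of FIG. 5.

```python
from typing import Dict, List

def build_dictionary(messages: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct word a unique ID (1, 2, ...) in order of first appearance."""
    dictionary: Dict[str, int] = {}
    for words in messages:
        for word in words:
            if word not in dictionary:
                dictionary[word] = len(dictionary) + 1
    return dictionary

def bow_vector(words: List[str], dictionary: Dict[str, int]) -> List[int]:
    """Count occurrences of every dictionary word, ordered by word ID."""
    vector = [0] * len(dictionary)
    for word in words:
        if word in dictionary:  # words unseen at training time are ignored
            vector[dictionary[word] - 1] += 1
    return vector

messages = [["あいう", "かきく", "あいう"], ["かきく", "さしす"], ["たちつ"]]
dictionary = build_dictionary(messages)
bows = [bow_vector(m, dictionary) for m in messages]
```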

As described above, the response recommended for a failure message contains two items: (A) whether customer contact is necessary, and (B) workaround candidates. For (A), experience shows that the criteria for deciding whether to contact the customer depend not only on the severity of the failure but can also change greatly with, for example, whether the failure message occurred during the day or at night (at night, customer contact is avoided as far as possible).

Therefore, in this embodiment, feature items that do not appear directly in the failure message are added to the BoW corpus 413 with a predetermined weight so that they can carry weight in the decision. For example, as shown in the lower part of FIG. 5, the value of the day/night item is appended to the end of each BoW corpus 413. In this example, whether the failure occurred during the day or at night is determined from, for example, the timestamp in the failure message, and the value (1 or 0) of the corresponding category is multiplied by a weight of 10 before being appended. This lets the day/night timing of a failure receive greater emphasis when deciding (A) whether customer contact is necessary.
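Step S04 can be sketched as appending weighted day/night columns to a BoW vector. Two assumptions are made here: the day/night item is represented as two one-hot columns, and "daytime" is 08:00 to 20:00. The patent defines neither. The weight of 10 follows the example of FIG. 5.

```python
from datetime import datetime
from typing import List

DAY_START_HOUR = 8   # hypothetical boundary; not defined in the patent
DAY_END_HOUR = 20    # hypothetical boundary; not defined in the patent
WEIGHT = 10          # weight used in the example of FIG. 5

def day_night_columns(ts: datetime, weight: int = WEIGHT) -> List[int]:
    """Return [day, night] as one-hot values (1 or 0) multiplied by the weight."""
    is_day = DAY_START_HOUR <= ts.hour < DAY_END_HOUR
    return [weight * int(is_day), weight * int(not is_day)]

def append_day_night(bow: List[int], ts: datetime) -> List[int]:
    """Append the weighted day/night items to the end of a BoW vector (S04)."""
    return bow + day_night_columns(ts)

# The timestamp would come from the failure message; this format is hypothetical.
ts = datetime.strptime("2017-02-21 23:15:00", "%Y-%m-%d %H:%M:%S")
night_vector = append_day_night([2, 1, 0, 0], ts)
```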

Returning to FIG. 4, TF-IDF (Term Frequency-Inverse Document Frequency) is next calculated for each BoW corpus 413 in order to evaluate features according to how often words occur (S05). TF-IDF is a widely used index of the characteristics of a document: for a given word, it is the product of a value that is high when the word occurs frequently in the target document and a value that is high when the word occurs rarely in other documents. A word with a high TF-IDF value can be said to characterize the target document well. Computing this value for every word in each BoW corpus 413 converts each BoW corpus 413 into a feature vector for the corresponding failure message. The day/night category item appended to the BoW corpus 413 in step S04 is excluded from the TF-IDF calculation.
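A minimal sketch of step S05, assuming the vectors produced in S03 and S04 (word counts followed by the appended day/night element). The source does not state which TF-IDF variant is used; this sketch uses one common form, TF = count / total words in the message and IDF = log(N / df), and copies the trailing `n_extra` elements through unchanged because they are excluded from the calculation. Libraries such as scikit-learn use smoothed variants that give different numbers.

```python
import math
from typing import List

def tfidf(vectors: List[List[float]], n_extra: int = 1) -> List[List[float]]:
    """Convert BoW count vectors (all the same length, at least one) to TF-IDF vectors.

    The last n_extra elements (e.g. the weighted day/night item) are not words,
    so they are left as they are.
    """
    n_docs = len(vectors)
    n_words = len(vectors[0]) - n_extra
    # Document frequency: in how many messages each word occurs at least once.
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_words)]
    result = []
    for v in vectors:
        total = sum(v[:n_words])
        row = []
        for j in range(n_words):
            tf = v[j] / total if total else 0.0
            idf = math.log(n_docs / df[j]) if df[j] else 0.0
            row.append(tf * idf)
        result.append(row + list(v[n_words:]))
    return result
```

A word that occurs in every message (df = N) gets an IDF of 0, so it contributes nothing to distinguishing one failure message from another.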

When data cleansing (normalization) and feature evaluation are complete, the analysis unit 52 performs machine learning using these data as training data and constructs the analysis model 53 (S06). The constructed analysis model 53 is then recorded and saved in a file, database, or the like (S07), and the process ends.

The two items in the recommended response content, (A) whether the customer needs to be contacted and (B) workaround candidates, use different analysis models 53 and different algorithms. The analysis model 53 for (A) decides whether contacting the customer is "unnecessary (0)" or "necessary (1)" for the target failure message, so a so-called classification algorithm is used. For (B), on the other hand, similar past failure cases are extracted and the responses taken in those cases are recommended, so any of various algorithms that analyze similarity can be used as appropriate. Many such algorithms have been proposed, for example methods that compute closeness in terms of a given geometric distance, methods that compute closeness of vector direction (a small angle between the vectors), and methods that compute the similarity of sets.
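As one example of the "closeness of vector direction" approach for (B), the sketch below ranks past cases by cosine similarity. It assumes each past case is stored as a pair of (feature vector, workaround text); that storage layout and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def workaround_candidates(
    query: Sequence[float],
    past_cases: List[Tuple[Sequence[float], str]],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Return up to top_k (similarity, workaround) pairs, most similar first."""
    scored = [(cosine_similarity(query, vec), text) for vec, text in past_cases]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Returning the similarity alongside each workaround lets it be shown to the person in charge as the "degree of similarity" value mentioned later for step S18.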

FIG. 6 outlines an example of the flow of online analysis processing in the present embodiment. When the monitoring system 20 detects a failure on the monitored server 10 and the failure mail server 30 sends the failure mail 31, the failure response support server 40 that receives the mail has the machine learning server 50 analyze the failure message, obtains the response content, and sends it as the response content mail 44 to the person in charge 2 (person-in-charge terminal 60). The process is therefore triggered by the failure response support server 40 receiving a failure message.

First, when the failure message extraction unit 41 of the failure response support server 40 receives the failure mail 31 (S11), it extracts only the failure message portion needed for analysis, as in the example of FIG. 2 (S12). It then calls the analysis process of the machine learning server 50 with the extracted failure message as a parameter (S13). As described above, the call can be made in various ways depending on the specification of the external interface provided by the machine learning server 50, such as calling an API or sending an e-mail containing an execution instruction.
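Step S12 depends entirely on the mail layout of FIG. 2, which is not reproduced here. The sketch below therefore assumes a hypothetical plain-text mail in which the monitoring tool places the failure message between two marker lines; `START_MARKER`, `END_MARKER` and `extract_failure_message` are illustrative names, not part of the described system. It uses only the standard-library `email` parser and does not handle base64 or quoted-printable bodies.

```python
from email import message_from_string
from typing import Optional

# Hypothetical marker lines; the real failure mail 31 layout (FIG. 2) may differ.
START_MARKER = "--- MESSAGE START ---"
END_MARKER = "--- MESSAGE END ---"

def extract_failure_message(raw_mail: str) -> str:
    """Return only the failure message portion of a plain-text failure mail."""
    msg = message_from_string(raw_mail)
    body: Optional[str] = None
    for part in msg.walk():  # for a single-part mail this yields msg itself
        if part.get_content_type() == "text/plain" and not part.is_multipart():
            body = part.get_payload()  # assumes a 7bit/8bit body, no transfer decoding
            break
    if body is None:
        raise ValueError("no text/plain body found")
    lines = [line.strip() for line in body.splitlines()]
    start = lines.index(START_MARKER) + 1  # raises ValueError if the marker is missing
    end = lines.index(END_MARKER, start)
    return "\n".join(lines[start:end]).strip()
```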

The machine learning server 50 processes the received failure message in the same way as the batch processing of FIG. 4: the preprocessing unit 51 performs morphological analysis to break the message down by part of speech (S14), creates a BoW corpus 413 (S15), appends the day/night category (S16), and calculates TF-IDF (S17), thereby cleansing and normalizing the failure message and evaluating its features. Since these steps are basically the same as steps S02 to S05 in FIG. 4, their description is not repeated.

In creating the BoW corpus 413 in step S03 of FIG. 4, as shown in FIG. 5, the dictionary 412 is built from the morphologically analyzed messages 411, which are the failure messages used as training data broken down by part of speech, and the BoW corpus 413 is created from the words registered in the dictionary 412. In other words, the dictionary 412 contains exactly the words that appear in the failure messages used for learning (the morphologically analyzed messages 411).

In creating the BoW corpus 413 in step S15 of FIG. 6, by contrast, the dictionary 412 is not created or updated; the existing dictionary 412 is used as is. The failure message being processed may therefore be a new one that was not part of the training data when the existing dictionary 412 was created. That is, the morphologically analyzed message 411 for that failure message may contain new words that are not registered in the existing dictionary 412. Because such words are not in the dictionary 412, they are ignored when the BoW corpus 413 is created in step S15, which may reduce the accuracy of the analysis.

For this reason, it is desirable, for example, to run the batch processing shown in FIG. 4 periodically for re-learning, so that the dictionary 412 and the BoW corpus 413 for the existing failure messages used for learning are kept up to date. Furthermore, when the morphologically analyzed message 411 for a new failure message contains words not registered in the dictionary 412, BoW corpus 413 entries for those words alone may be created for both the existing failure messages used for learning and the new failure message.
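The incremental option described above (adding columns only for the unseen words instead of rebuilding everything) could look like the sketch below. It assumes the vectors still hold word counts only, before the weighted day/night element is appended, and that dictionary IDs are contiguous from 1; the function name is hypothetical. When the dictionary was built from all words of the existing messages, the new columns for those messages are simply zero; counting them still matters if the dictionary was restricted to some parts of speech.

```python
from typing import Dict, List, Tuple

def extend_for_new_words(
    dictionary: Dict[str, int],
    existing_vectors: List[List[int]],
    existing_tokens: List[List[str]],
    new_tokens: List[str],
) -> Tuple[List[str], List[List[int]], List[int]]:
    """Register unseen words of a new message and widen all BoW vectors to match.

    Mutates `dictionary` in place. Returns (new words, widened existing vectors,
    BoW vector of the new message).
    """
    new_words: List[str] = []
    for word in new_tokens:
        if word not in dictionary and word not in new_words:
            new_words.append(word)
    for word in new_words:
        dictionary[word] = len(dictionary) + 1  # next free ID, IDs start at 1
    # One extra column per new word, counted in each existing message.
    widened = [
        vec + [tokens.count(word) for word in new_words]
        for vec, tokens in zip(existing_vectors, existing_tokens)
    ]
    new_vector = [0] * len(dictionary)
    for word in new_tokens:
        new_vector[dictionary[word] - 1] += 1
    return new_words, widened, new_vector
```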

Then, using the trained analysis model 53 constructed in step S06 of FIG. 4, the BoW corpus 413 for the new failure message is analyzed (S18), and the analysis result is returned to the failure response support server 40 (S19). For item (A), whether the customer needs to be contacted, the analysis classifies the target failure message as either unnecessary (0) or necessary (1); a value indicating the certainty or likelihood of the classification may be output with the result. For item (B), workaround candidates, similar failure cases are extracted from past failure messages, and the responses taken in those cases are retrieved and output as workarounds; a value indicating the degree of similarity may also be output.
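The source does not name the classification algorithm for (A). As one simple stand-in, the sketch below uses a k-nearest-neighbour vote and reports the share of votes for the winning label as the certainty value mentioned above. The data layout (feature vector, label) and the function name are assumptions; an odd `k` avoids most ties.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def classify_contact(
    query: Sequence[float],
    labeled: List[Tuple[Sequence[float], int]],
    k: int = 3,
) -> Tuple[int, float]:
    """k-nearest-neighbour vote: returns (label, share of the k votes it received).

    Labels: 0 = customer contact unnecessary, 1 = necessary.
    """
    # Sort past cases by Euclidean distance to the new message's feature vector.
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)
```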

The response content acquisition unit 42 of the failure response support server 40 obtains the analysis result returned by the machine learning server 50 (S20), formats the data into a response content mail 44 such as the example shown in FIG. 3 (S21), sends it to the person in charge 2 (person-in-charge terminal 60) (S22), and ends the process.
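Step S21 can be sketched with the standard-library `EmailMessage` class. FIG. 3 is not reproduced here, so the subject line and body layout below are hypothetical; only the two items (A) and (B) and the optional certainty and similarity values come from the description.

```python
from email.message import EmailMessage
from typing import List, Tuple

def build_response_mail(
    to_addr: str,
    failure_message: str,
    contact_needed: bool,
    contact_confidence: float,
    workarounds: List[Tuple[float, str]],
) -> EmailMessage:
    """Format an analysis result as a plain-text response content mail (hypothetical layout)."""
    lines = [
        "Failure message:",
        f"  {failure_message}",
        "",
        f"(A) Customer contact: {'necessary' if contact_needed else 'unnecessary'}"
        f" (confidence {contact_confidence:.2f})",
        "(B) Workaround candidates:",
    ]
    for rank, (similarity, text) in enumerate(workarounds, start=1):
        lines.append(f"  {rank}. {text} (similarity {similarity:.2f})")
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = "[Failure response] Recommended actions"
    msg.set_content("\n".join(lines))
    return msg
```

Sending the result (for example with `smtplib.SMTP.send_message`) is left out because it depends on the mail environment.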

Referring to the received response content mail 44, the person in charge 2 can handle the failure, that is, contact the customer 3 if necessary and carry out or direct a workaround. The response content in the response content mail 44, however, is only a recommendation from the failure response support server 40 and the machine learning server 50; it may not fit the actual situation and may be inappropriate. In such cases as well, the person in charge 2 can feed back to the failure response support server 40 the response actually taken, or the response that in hindsight should have been taken, so that it is used for re-learning. In the present embodiment, to reduce the workload of the person in charge 2, feedback can be given simply by sending a response reply mail 45 as a reply to the received response content mail 44.

FIG. 7 outlines examples of feedback on the response content by the person in charge 2 in the present embodiment. FIG. 7(a) shows an example of feedback by response reply mail 45 when the response content mail 44 from the failure response support server 40 makes a recommendation on (A) whether the customer needs to be contacted ("necessary" in the illustrated example). If the person in charge 2 actually contacted, or should have contacted, the customer (that is, if the recommended response was appropriate), an empty mail (shown as "" in the figure) is returned as the response reply mail 45 (marked ○ in the figure). Alternatively, no reply may be sent at all, and the failure response support server 40 may judge the recommendation appropriate if it receives no response reply mail 45 within a certain period.

If, on the other hand, the person in charge 2 did not actually contact the customer, or should not have (that is, if the recommended response was inappropriate), simple information indicating this ("×" in the figure) is returned as the response reply mail 45 (marked × in the figure). By analyzing these response reply mails 45, the failure response support server 40 can determine and accumulate whether each recommendation on (A) was appropriate or inappropriate, and reflect this in the analysis model 53 as training data for subsequent analyses.
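The (A) feedback rules above map naturally onto a small interpreter. This sketch assumes the reply body has already been stripped of any quoted original text, represents "no reply within the waiting period" as `None`, and rejects anything it does not recognize; the function names are hypothetical. `corrected_label` shows how the feedback can become a training label: keep the recommendation if it was appropriate, otherwise flip 0 and 1.

```python
from typing import Optional

def parse_contact_feedback(reply_body: Optional[str]) -> bool:
    """Interpret a response reply mail 45 for item (A); True means 'appropriate'.

    None -> no reply within the waiting period, treated as appropriate
    ""   -> empty reply (the ○ case in FIG. 7(a)), appropriate
    "×"  -> inappropriate
    """
    if reply_body is None:
        return True
    text = reply_body.strip()
    if text == "":
        return True
    if "×" in text:
        return False
    raise ValueError(f"unrecognized feedback: {text!r}")

def corrected_label(recommended: int, appropriate: bool) -> int:
    """Label to store as training data: keep the recommendation or flip 0 <-> 1."""
    return recommended if appropriate else 1 - recommended
```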

FIG. 7(b) shows an example of feedback by response reply mail 45 when the response content mail 44 from the failure response support server 40 recommends a (B) workaround candidate ("Since this is a temporary failure..." in the illustrated example). If the recommended workaround was appropriate, an empty mail is returned as the response reply mail 45 (marked ○ in the figure), or no reply is sent. If the recommended workaround was inappropriate, the person in charge 2 replies with a response reply mail 45 describing in text the workaround actually carried out, or that should have been carried out. When the response content mail 44 recommends several (B) workaround candidates, the person in charge 2 may instead reply with information indicating which candidate was actually chosen.

By analyzing these response reply mails 45, the failure response support server 40 can determine and accumulate whether each recommended (B) workaround candidate was appropriate or inappropriate, and reflect this in the analysis model 53 as training data for subsequent analyses.

As described above, the failure response support server 40, for example, periodically requests re-learning from the machine learning server 50, so that the feedback from the person in charge 2 is reflected in the analysis model 53 and the model stays up to date. With periodic re-learning, however, depending on timing there may be a long delay before feedback in which the person in charge 2 marked a recommendation as inappropriate is relearned and reflected in the analysis model 53. If similar failures then occur repeatedly, response content mails 44 recommending the same inappropriate response will keep being sent until the next periodic re-learning.

Therefore, in the present embodiment, the person in charge 2 can request immediate re-learning when sending the response reply mail 45. FIG. 7(c) shows an example in which the person in charge 2 gives feedback on the response content and also requests immediate re-learning. As illustrated, in addition to the workaround or other content being fed back, the response reply mail 45 contains a predetermined phrase requesting immediate re-learning (the wording "immediate re-learning" in the illustrated example). The phrase is not particularly limited, as long as the failure response support server 40 can recognize it.
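A sketch of interpreting a (B) reply mail, covering both FIG. 7(b) and FIG. 7(c). It assumes the request phrase appears on a line of its own and uses the hypothetical keyword `RELEARN_KEYWORD`; the source only requires that the phrase be recognizable. Any other non-empty text is taken as the corrected workaround. The "which candidate was chosen" variant is not handled here.

```python
from dataclasses import dataclass
from typing import Optional

RELEARN_KEYWORD = "immediate re-learning"  # hypothetical request phrase

@dataclass
class WorkaroundFeedback:
    appropriate: bool
    corrected_workaround: Optional[str]
    relearn_now: bool

def parse_workaround_feedback(reply_body: Optional[str]) -> WorkaroundFeedback:
    """Interpret a response reply mail 45 for item (B).

    None or an empty body means the recommended workaround was appropriate;
    any other text is the workaround that was (or should have been) used.
    """
    if reply_body is None:
        return WorkaroundFeedback(True, None, False)
    lines = [line.strip() for line in reply_body.splitlines()]
    relearn_now = any(line == RELEARN_KEYWORD for line in lines)
    # Everything except blank lines and the request phrase is feedback text.
    text = "\n".join(line for line in lines if line and line != RELEARN_KEYWORD)
    if not text:
        return WorkaroundFeedback(True, None, relearn_now)
    return WorkaroundFeedback(False, text, relearn_now)
```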

On receiving a request for immediate re-learning in a response reply mail 45, the response content reflection unit 43 of the failure response support server 40 can have the machine learning server 50 perform re-learning by the same processing as the batch processing shown in FIG. 4 and update the analysis model 53. Depending on the amount of training data, however, this processing may take a very long time.

Therefore, in the present embodiment, when a request for immediate re-learning is received, the response content reflection unit 43 first estimates the time required for re-learning and replies to the person in charge 2 with a reflection schedule mail 46 stating when the feedback will be reflected in the analysis model 53. In the example of FIG. 7 this is expressed as a scheduled date and time, but it may also be expressed as a relative or elapsed time, such as "in XX hours". Referring to the scheduled timing in the reflection schedule mail 46, the person in charge 2 can decide whether to actually proceed with immediate re-learning or to wait for re-learning in the next periodic batch run.

The method of estimating the time required for re-learning is not particularly limited; for example, a model that estimates the required time from various parameters can be constructed and used for the calculation. Parameters of such a model may include, as appropriate, the actual times taken by past machine learning runs, the server specifications of the machine learning server 50, its current processing load, the number of training data records to be analyzed, the number of parameters in the analysis model 53, and the type of algorithm used for analysis. The model for estimating the required time may itself be made a target of machine learning by the machine learning server 50 to improve its accuracy.
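As a deliberately simplified instance of such an estimation model, the sketch below fits a straight line of past run times against a single parameter, the number of training records, by ordinary least squares, and turns the prediction into the scheduled completion time for the reflection schedule mail 46. The other parameters listed above (server specifications, load, algorithm type) are left out; the function names and the history format are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

def fit_duration_model(history: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Ordinary least squares for: seconds = slope * n_records + intercept.

    history holds (number of training records, measured seconds) from past runs.
    """
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    slope = sxy / sxx if sxx else 0.0  # all runs had the same size: use the mean time
    intercept = mean_y - slope * mean_x
    return slope, intercept

def scheduled_completion(history: List[Tuple[int, float]], n_records: int, now: datetime) -> datetime:
    """Estimated time at which re-learning would finish if started now."""
    slope, intercept = fit_duration_model(history)
    seconds = max(0.0, slope * n_records + intercept)
    return now + timedelta(seconds=seconds)
```

With past runs of 1,000, 2,000 and 3,000 records taking 120, 220 and 320 seconds, a 5,000-record run started at 22:00 is estimated to finish at about 22:08:40.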

As described above, the failure response support system 1 according to an embodiment of the present invention can carry out the entire sequence without human intervention: when a failure is detected on the monitored server 10, the machine learning server 50 analyzes the failure message to determine the recommended response content, which is then sent to the person in charge 2 as the response content mail 44. This enables efficient cooperation with a machine learning engine and makes it possible to present and recommend highly accurate responses to failures detected by the monitoring system 20.

The invention made by the present inventors has been described above in detail based on an embodiment, but the present invention is of course not limited to that embodiment and can be modified in various ways without departing from its gist. For example, the embodiment has been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one that includes all of the described configurations. Part of the configuration of the embodiment may also be supplemented, removed, or replaced with another configuration.

For example, in the above embodiment, the failure mail 31 from the failure mail server 30 is sent to both the failure response support server 40 and the person in charge 2, and the failure response support server 40 determines whether the customer 3 needs to be contacted. Alternatively, in the initial response to a failure, the failure mail 31 may be sent only to the failure response support server 40, which then also determines whether the person in charge 2 needs to be contacted.

In this case, the analysis model 53 that the machine learning server 50 uses to decide whether to contact the person in charge 2 and the analysis model 53 it uses to decide whether to contact the customer 3 may be the same, but they are preferably different. This is because a common criterion is that the person in charge 2 should review even failures of somewhat low severity, whereas contact with the customer 3 should be limited to high-severity failures. To meet such a criterion, it is preferable to configure the analysis model 53 for contacting the person in charge 2 separately from the analysis model 53 for contacting the customer 3.

Each of the above configurations, functions, processing units, processing means, and the like may be implemented partly or wholly in hardware, for example by designing them as an integrated circuit. They may also be implemented in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be stored in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.

The present invention can be used in a failure response support system that presents a person in charge with a method of responding to a detected failure.

1 ... Failure response support system, 2 ... Person in charge, 3 ... Customer,
10 ... Monitored server,
20 ... Monitoring system,
30 ... Failure mail server, 31 ... Failure mail,
40 ... Failure response support server, 41 ... Failure message extraction unit, 42 ... Response content acquisition unit, 43 ... Response content reflection unit, 44 ... Response content email, 45 ... Response reply email, 46 ... Reflection schedule email,
50 ... Machine learning server, 51 ... Preprocessing unit, 52 ... Analysis unit, 53 ... Analysis model,
60 ... Person in charge terminal

Claims

A failure response support system that outputs response content recommended to a user for a failure that has occurred in a monitored target, the system comprising:
a failure notification device that transmits a failure notification including a failure message related to the failure;
a failure response support device that acquires the response content based on the failure message and transmits a response content notification including the response content to the user; and
a machine learning device that acquires the response content for the failure message by using an analysis model constructed in advance by machine learning based on accumulated information on past failure responses, wherein
the failure notification device transmits the failure notification to the user and to the failure response support device,
the failure response support device extracts the failure message from the received failure notification and requests the machine learning device to acquire the response content for the extracted failure message,
the machine learning device acquires the response content for the acquired failure message by means of a first analysis model that classifies whether the user needs to contact a customer about the failure and a second analysis model that extracts response content candidates from past failure responses similar to the failure, and
the failure response support device receives from the user, as a reply to the response content notification, a response reply notification describing feedback information on whether the decision on customer contact included in the response content notification was appropriate, and uses the feedback information as learning data for re-learning in the machine learning device.

The failure response support system according to claim 1, wherein
the machine learning device decomposes existing failure messages included in the accumulated information on past failure responses into words by part of speech, creates a dictionary of the words for some or all parts of speech, creates, for each existing failure message, a first occurrence frequency vector whose elements are the occurrence frequencies of the words included in the dictionary, calculates for the first occurrence frequency vector feature values related to the occurrence frequencies of the words to generate a first feature vector, and constructs the analysis model using the first feature vector as learning data for machine learning.

The failure response support system according to claim 2, wherein
the machine learning device decomposes the acquired failure message into words by part of speech, creates a second occurrence frequency vector whose elements are the occurrence frequencies of the words included in the dictionary, calculates for the second occurrence frequency vector feature values related to the occurrence frequencies of the words to generate a second feature vector, and acquires the response content by applying the analysis model to the second feature vector.

The failure response support system according to claim 3, wherein
the machine learning device adds, as an element, a weighted index for a predetermined item related to the first analysis model to the first occurrence frequency vector and to the second occurrence frequency vector.

The failure response support system according to claim 1, wherein
when the response reply notification received from the user includes an instruction to have the machine learning device immediately re-learn the feedback information included in the response reply notification, the failure response support device estimates the time required for the re-learning before the machine learning device performs it, and transmits to the user a reflection schedule notification including information on the scheduled completion time of the re-learning.

It is a failure response support system that outputs the response content recommended to the user for the failure that occurred in the monitored target.
A failure notification device that transmits a failure notification including a failure message related to the failure, and
A failure response support device that acquires the response content based on the failure message and transmits a response content notification including the response content to the user.
It has a machine learning device for acquiring the corresponding contents by using an analysis model constructed in advance by machine learning based on the accumulated information of the past trouble correspondence for the trouble message.
The failure notification device transmits the failure notification to the user and the failure response support device.
The failure response support device extracts the failure message from the received failure notification, and requests the machine learning device to acquire the corresponding content for the extracted failure message.
The machine learning device is a first analysis model that classifies whether or not the user needs to contact the customer about the failure with respect to the acquired failure message, and / or a past failure response similar to the failure. The corresponding content is acquired by the second analysis model that extracts the candidate of the corresponding content from the above .
The machine learning device decomposes existing failure messages included in the accumulated information of past failure correspondence into words and phrases for each part of speech, creates a dictionary consisting of the words and phrases for some or all part of speech, and creates a dictionary consisting of the words and phrases for each existing part of speech. Regarding the failure message, a first appearance frequency vector having the appearance frequency as an element is created for the word / phrase included in the dictionary, and a feature related to the appearance frequency of the word / phrase with respect to the first appearance frequency vector. The quantity is calculated to generate a first feature vector, and the analysis model is constructed using the first feature vector as training data in machine learning.
The machine learning device decomposes the acquired failure message into words and phrases for each part of speech, creates a second appearance frequency vector for the words and phrases included in the dictionary, and creates a second appearance frequency vector having the appearance frequency as an element. For the second appearance frequency vector, the feature amount related to the appearance frequency of the phrase is calculated to generate the second feature vector, and the analysis model is applied to the second feature vector to correspond to the second feature vector. Get the content,
The machine learning device adds, as an element of both the first appearance frequency vector and the second appearance frequency vector, a weighted index for a predetermined item relating to the first analysis model.
Failure response support system.
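The dictionary, frequency-vector, and model-application steps of the claim above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each message is assumed to be already split into (word, part-of-speech) pairs by a morphological analyzer; the "feature quantity related to appearance frequency" is assumed to be TF-IDF; the weighted index is a hypothetical per-record number scaled by a hypothetical `INDEX_WEIGHT`; and both analysis models are replaced by a cosine-similarity nearest-neighbour lookup. All function, class, and key names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical input format: each failure message is pre-split into
# (word, part_of_speech) pairs by a morphological analyzer (not shown here).
Token = tuple[str, str]

KEEP_POS = {"noun", "verb"}  # assumption: parts of speech kept in the dictionary
INDEX_WEIGHT = 2.0           # assumption: weight applied to the appended index element


def build_dictionary(messages: list[list[Token]]) -> list[str]:
    """Dictionary of the words belonging to the selected parts of speech."""
    return sorted({w for msg in messages for w, pos in msg if pos in KEEP_POS})


def frequency_vector(msg: list[Token], dictionary: list[str], index_value: float) -> list[float]:
    """Appearance frequency of each dictionary word, plus one weighted index element."""
    counts = Counter(w for w, pos in msg if pos in KEEP_POS)
    return [float(counts[w]) for w in dictionary] + [INDEX_WEIGHT * index_value]


def fit_idf(freq_vectors: list[list[float]]) -> list[float]:
    """Smoothed IDF for each word column; the appended index column is excluded."""
    n = len(freq_vectors)
    n_words = len(freq_vectors[0]) - 1
    df = [sum(1 for v in freq_vectors if v[j] > 0) for j in range(n_words)]
    return [math.log((1 + n) / (1 + d)) + 1.0 for d in df]


def feature_vector(freq: list[float], idf: list[float]) -> list[float]:
    """TF-IDF for the word columns; the weighted index element passes through unchanged."""
    return [f * w for f, w in zip(freq[:-1], idf)] + [freq[-1]]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NearestNeighbourModels:
    """Stand-in for both analysis models, based on similarity to past failure records."""

    def __init__(self, history: list[dict]):
        self.history = history
        self.dictionary = build_dictionary([h["tokens"] for h in history])
        # First appearance frequency vectors and first feature vectors.
        freqs = [frequency_vector(h["tokens"], self.dictionary, h["index"]) for h in history]
        self.idf = fit_idf(freqs)
        self.features = [feature_vector(f, self.idf) for f in freqs]

    def predict(self, tokens: list[Token], index_value: float, k: int = 2) -> dict:
        # Second appearance frequency vector and second feature vector.
        # Words outside the dictionary are ignored.
        freq = frequency_vector(tokens, self.dictionary, index_value)
        x = feature_vector(freq, self.idf)
        ranked = sorted(range(len(self.history)),
                        key=lambda i: cosine(x, self.features[i]), reverse=True)
        return {
            # "First model": contact-customer label of the most similar past failure.
            "contact_customer": self.history[ranked[0]]["contact_customer"],
            # "Second model": responses of the k most similar past failures.
            "candidates": [self.history[i]["response"] for i in ranked[:k]],
        }
```

Nearest-neighbour retrieval is used here only because it makes the two roles easy to see: the label of the closest past failure stands in for the first model's contact/no-contact classification, and the responses of the k closest failures stand in for the second model's candidates. Any trained classifier could take the first role.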

A failure response support system that outputs response content recommended to a user for a failure that has occurred in a monitored target, comprising:
A failure notification device that transmits a failure notification including a failure message relating to the failure;
A failure response support device that acquires the response content based on the failure message and transmits to the user a response content notification including the response content; and
A machine learning device that acquires the response content for the failure message using an analysis model constructed in advance by machine learning based on accumulated information on past failure responses, wherein:
The failure notification device transmits the failure notification to the user and the failure response support device.
The failure response support device extracts the failure message from the received failure notification and requests the machine learning device to acquire response content for the extracted failure message.
The machine learning device acquires the response content for the acquired failure message using a first analysis model that classifies whether the user needs to contact the customer about the failure, and/or a second analysis model that extracts candidate response content from past failure responses similar to the failure.
The failure response support device receives from the user, as a reply to the response content notification, a response reply notification describing feedback information on the response content included in that notification, and stores the feedback information as learning data for re-learning in the machine learning device.
When the response reply notification received from the user includes an instruction to have the machine learning device immediately re-learn the feedback information it contains, the failure response support device estimates the time required for the re-learning before the machine learning device performs it, and transmits to the user a reflection schedule notification including information on the scheduled completion time of the re-learning.
Failure response support system.
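The feedback and immediate re-learning branch of the claim above can be sketched in the same hedged way. The claim does not say how the re-learning time is estimated; the linear cost model below (a fixed setup time plus a per-record cost, with hypothetical constants `SETUP_SECONDS` and `SECONDS_PER_RECORD`) is purely an assumption, as are the reply keys `feedback` and `relearn_now` and the names `handle_reply` and `ReflectionScheduleNotification`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical cost model: fixed setup time plus a per-record training cost.
# Real values would have to be calibrated from past re-learning runs.
SETUP_SECONDS = 120.0
SECONDS_PER_RECORD = 0.5


@dataclass
class ReflectionScheduleNotification:
    """Hypothetical payload telling the user when the feedback will take effect."""
    estimated_duration: timedelta
    scheduled_completion: datetime


def estimate_relearn_duration(n_records: int) -> timedelta:
    """Estimate re-learning time from the size of the training set."""
    return timedelta(seconds=SETUP_SECONDS + SECONDS_PER_RECORD * n_records)


def handle_reply(reply: dict, training_data: list,
                 now: datetime) -> Optional[ReflectionScheduleNotification]:
    """Store the user's feedback as learning data. If immediate re-learning is
    requested, estimate its duration before it starts and build the notification."""
    training_data.extend(reply["feedback"])
    if not reply.get("relearn_now", False):
        return None  # no immediate re-learning requested, so nothing to schedule
    duration = estimate_relearn_duration(len(training_data))
    return ReflectionScheduleNotification(duration, now + duration)
```

For example, with 1,000 stored records and two new feedback entries, the sketch estimates 120 + 0.5 × 1,002 = 621 seconds, so a reply received at 02:00 yields a scheduled completion of 02:10:21.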