JP2021022264A

JP2021022264A - Text data analysis system, text data analysis method, and fault response recommend system

Info

Publication number: JP2021022264A
Application number: JP2019139483A
Authority: JP
Inventors: Eri Takigawa; Aiko Hosozutsumi; Kensuke Oitate; Cheng Zhang; Taro Ishikawa; Noriyuki Wakayama
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-07-30
Filing date: 2019-07-30
Publication date: 2021-02-18
Anticipated expiration: 2039-07-30
Also published as: JP7118037B2

Abstract

PROBLEM TO BE SOLVED: To supply a user with information that contributes to efficiently solving a problem, and to improve the efficiency of problem solving irrespective of experience.

SOLUTION: A text data analysis system 100 for processing the text data of a plurality of documents includes an analysis server 120 comprising: an item extraction unit for extracting, from the text data of the plurality of documents, "notification" and "confirmation" items defined in advance as text data having a specific meaning in a document; an item text classification unit for classifying the plurality of pieces of text data extracted as belonging to the same item into a plurality of categories; and a report analysis information creation unit for associating the categories into which the extracted text data was classified with the document from which the text data was extracted.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for analyzing text data.

In social activities, documents are created in which reports and explanations are given in writing. Today, many documents are accumulated in storage devices as text data. Analyzing the accumulated text data is expected to yield considerable insight.

Data mining applied to text strings is also called text mining. Text mining is a method of analyzing text data that extracts useful information by dividing data consisting of ordinary sentences into words and phrases and analyzing their frequency of occurrence, co-occurrence correlations, occurrence trends, time series, and so on. In recent years, analysis using artificial intelligence has also been carried out.

For example, Patent Document 1 discloses a system that analyzes text information without defining patterns.

Patent Document 1: JP-A-2018-18254

A typical example of a document held as text data is the report. A report is a document that describes the course and results of work, and reports are used daily in many companies and government offices.

There are various types of reports. For reports called, for example, "maintenance reports" or "failure reports", analyzing the text data is expected to provide workers who perform maintenance or failure response with information that contributes to efficient problem solving, and thereby to improve the efficiency and accuracy of their work when a problem such as a failure occurs, regardless of their level of experience.

However, because the text data of a report is created solely for the purpose of "reporting", it is difficult to repurpose it for other uses.

For example, a "maintenance report" or "failure report" must accurately report what the worker encountered at the failure site, so the work is often described in chronological order. It is therefore difficult to identify and extract "information that contributes to efficient problem solving" from text data in which the work is described chronologically and, by extension, difficult to improve the efficiency and accuracy of workers' work when a problem such as a failure occurs.

A preferred aspect of the present invention is a text data analysis system that processes text data of a plurality of documents. The system includes: an item extraction unit that extracts, from the text data of the plurality of documents, "notification" and "confirmation" items defined in advance as text data having a specific meaning in a document; an item classification unit that classifies a plurality of pieces of text data extracted as the same item into a plurality of categories; and a report analysis information creation unit that associates the category into which extracted text data has been classified with the document from which that text data was extracted.

Another preferred aspect of the present invention is a text data analysis method that processes text data of a plurality of documents. In this method, items defined in advance as text data having a specific meaning in a document are extracted from the text data of the plurality of documents; a plurality of pieces of text data extracted as the same item are classified into a plurality of categories; and the category into which extracted text data has been classified is associated with the document from which that text data was extracted.

Another preferred aspect of the present invention is a failure response recommendation system that processes text data of a plurality of documents containing failure-related content and proposes responses to failures. The system includes: an item extraction unit that extracts, from the text data of the plurality of documents, "passive information" and "active information" items defined in advance as text data having a specific meaning in a document; an item classification unit that classifies a plurality of pieces of text data extracted as the same item into a plurality of categories; and a report analysis information creation unit that associates the category into which extracted text data has been classified with the document from which that text data was extracted. The system outputs failure response recommendations based on the analysis results of the report analysis information creation unit.

Information that contributes to efficient problem solving can be given to the user, and the efficiency of problem solving can be improved regardless of the user's experience.

Block diagram of the configuration of the text data analysis system of the embodiment.
Overall flow diagram of the system of the embodiment.
Detailed flow diagram of process S2, the extraction and classification of causes.
Table showing an example data structure of the report DB 112 with cause category.
Detailed flow diagram of process S3, the extraction of other items.
Table showing an example data structure of the item-tagged report DB 113.
Detailed flow diagram of process S4, the grouping of each of the other items.
Table showing an example data structure of the report DB 115 with item tag classification.
Detailed flow diagram of process S5, the creation of the cause estimation model.
Table showing an example data structure of the report analysis information DB 116.
Detailed flow diagram of process S6, the operation of the cause estimation model.
Conceptual diagram showing an example of the cause estimation model 134 composed of a decision tree.
Flow diagram showing an example of another overall flow using the system of FIG. 1.
Table showing an example in which the data of the report analysis information DB 116 is arranged in list format.
Flow diagram showing an example of another overall flow using the system of FIG. 1.

Embodiments will be described in detail with reference to the drawings. However, the present invention should not be construed as limited to the description of the embodiments below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the idea or spirit of the present invention.

In the configurations of the invention described below, the same reference numerals are used across different drawings for identical parts or parts having similar functions, and duplicate descriptions may be omitted.

When there are multiple elements having the same or similar functions, they may be described with different subscripts added to the same reference numeral. However, when there is no need to distinguish between the elements, the subscripts may be omitted.

Notations such as "first", "second", and "third" in this specification are used to identify components and do not necessarily limit their number, order, or content. Numbers for identifying components are used per context, and a number used in one context does not necessarily indicate the same component in another context. Furthermore, a component identified by one number is not precluded from also serving the function of a component identified by another number.

The position, size, shape, range, and so on of each component shown in the drawings may not represent the actual position, size, shape, range, and so on, in order to facilitate understanding of the invention. The present invention is therefore not necessarily limited to the positions, sizes, shapes, ranges, and so on disclosed in the drawings.

The publications, patents, and patent applications cited in this specification form part of the description of this specification as they are.

Components expressed in the singular in this specification include the plural unless the context clearly indicates otherwise.

<1. System configuration>
FIG. 1 shows a block diagram of the configuration of the text data analysis system of the embodiment. The system of this embodiment basically consists of information processing devices, each comprising an input unit, an output unit, a control unit, and a storage unit. Examples of information processing devices are terminals such as tablets, and servers.

The text data analysis system 100 of this embodiment consists of a data storage server 110, an analysis server 120, an analysis model storage server 130, and a user terminal 140, each of which is an information processing device. These are connected to one another by a wired or wireless network and can use one another's software resources for information processing.

The data storage server 110 is a server that mainly manages data. In this example, it stores the report database (DB) 111, the report DB 112 with cause category, the item-tagged report DB 113, the item classification list DB 114, the report DB 115 with item tag classification, and the report analysis information DB 116. The report database (DB) 111 stores the text data of the reports described above.

The analysis server 120 is a server that mainly analyzes data, and it also manages the training of the analysis models. In this example, the model generation unit 121, the cause category extraction unit 122, the item extraction unit 123, the item text classification unit 124, the report analysis information creation unit 125, and the analysis unit 126 exist as functional blocks.

In this embodiment, functions such as calculation and control are realized by a processing device executing a program stored in the storage device of each information processing device, so that the prescribed processing is carried out in cooperation with other hardware. A program executed by a computer or the like, its function, or a means for realizing that function may be referred to as a "function", "means", "part", "unit", "module", or the like. FIG. 1 shows the functions of information processing devices such as servers as functional blocks, and omits the well-known input unit, output unit, control unit, and storage unit.

The analysis model storage server 130 stores the various analysis models used by the analysis server 120. In this example, these are the cause extraction model 131, the item extraction model 132, the item grouping unit 133, and the cause estimation model 134.

The user terminal 140 is a personal computer or mobile terminal operated directly by the user. The control unit 141, which is a function of the user terminal, sends instructions over the network to the data storage server 110, the analysis server 120, and the analysis model storage server 130, and can use their hardware, software, and data.
The components listed above are described in detail in the embodiments that follow.

The text data processed in this embodiment may be entered by the user from a keyboard, converted to text by speech recognition, or collected automatically by a chatbot or the like. In other words, the collection method does not matter.
The data in the various DBs described in this embodiment can basically be cross-referenced by document number. Multiple DBs can therefore be combined into one, and conversely a single DB can be divided into several.
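As an illustration of cross-referencing by document number, the following minimal Python sketch combines two DBs into one and splits the result again. The dictionary layout, the field names (`text`, `cause_category`), the sample rows, and both helper functions are assumptions made for this example and are not specified in the source.

```python
# Hypothetical in-memory stand-ins for two DBs, keyed by document number.
report_db = {
    "R001": {"text": "Customer reported alarm E12. Checked fuse. Replaced fuse."},
    "R002": {"text": "Pump stopped. Inspected bearing. Replaced bearing."},
}
cause_db = {
    "R001": {"cause_category": "electrical system failure"},
    "R002": {"cause_category": "mechanical system failure"},
}


def merge_by_document_number(*dbs):
    """Combine several DBs into one record per document number."""
    merged = {}
    for db in dbs:
        for doc_no, fields in db.items():
            merged.setdefault(doc_no, {}).update(fields)
    return merged


def split_fields(db, field_names):
    """Split one DB into two: the named fields, and everything else."""
    picked, rest = {}, {}
    for doc_no, fields in db.items():
        picked[doc_no] = {k: v for k, v in fields.items() if k in field_names}
        rest[doc_no] = {k: v for k, v in fields.items() if k not in field_names}
    return picked, rest


combined = merge_by_document_number(report_db, cause_db)
texts_only, causes_only = split_fields(combined, {"text"})
```

Because every row carries the document number as its key, the merge and the split are inverse operations in this sketch.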

The above configuration is not limited to that shown in FIG. 1. It may be implemented on a single computer, or any part of the input device, output device, processing device, and storage device may be implemented on another computer connected via a network. The servers need not be physical servers and may take the form of cloud computing.

In this embodiment, functions equivalent to those implemented in software can also be realized in hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). Such implementations are also included in the scope of the present invention.

<2. Overall flow>
FIG. 2 shows the overall flow of the system. Only an outline is given here; details are described in later sections.

In process S1, the user accesses the report database (DB) 111 from the user terminal 140 and selects the target report data (or set of report data). Hereinafter, the text data of a report may be referred to simply as a "report". The user also defines the extraction items to be extracted from the reports. The control unit 141 has, for example, a browser function, so the user can browse the data on the data storage server 110 and also download it.

For the target reports, it is desirable to select reports that correspond to a specific purpose or use, such as "maintenance reports for equipment A". As extraction items, the user can freely define items that appear in, for example, the "maintenance reports for equipment A".

In this embodiment, the following definitions are used. "Notification" is the information notified to the worker when a failure or the like occurs, in other words, the information about the failure that the worker received. "Confirmation (content)" is the information about the content of the event obtained as a result of a check carried out to collect the information needed to examine the cause of, and countermeasures for, the notified event. "Cause" is the information about the cause derived by the worker based on the result of that check. "Countermeasure" is the information about the countermeasure planned and carried out based on that cause.

In maintenance reports and failure reports, as noted above, the work is sometimes described chronologically or as free text. As a result, the "notification", "confirmation", "countermeasure", and "cause" items may be scattered through the text, the same item may appear several times, or a single passage may span several items, which makes extraction difficult.

However, it has been found for the first time that these four items are described in nearly all such maintenance reports and the like, and that analyzing them can dramatically improve the efficiency of workers' work. This embodiment realizes that.

Note that "notification" and "confirmation" are both information obtained about an event, but "notification" is information obtained passively from the standpoint of the party who wrote the report, whereas "confirmation" is information obtained actively. For example, a "notification" is a failure report from a customer, that is, the content of the problem, and is often the first event chronologically. "Confirmation" is the content of an event recognized in order to solve the problem, and is passive information. The action that an on-site worker or the like takes in order to recognize it is sometimes called an "instruction (content)", and this is active information. A "countermeasure" is an action that an on-site worker or the like takes to solve the notified problem, and this is active information. A "cause" is an event without which the problem would not have arisen, and this is passive information.

In this way, each piece of information can be broadly classified as either passive information or active information. The fact that a report contains both is consistent with the concept of failure response recommendation, namely "recommending a countermeasure (an active event) when a failure (a passive event) occurs". Therefore, classifying and organizing the relationship between the passive and active information contained in reports makes accurate recommendations possible when a failure or the like occurs.

This embodiment is described using the "notification", "confirmation", "countermeasure", and "cause" items explained above, but the items need not be limited to these, and further items may be added. For example, items such as "report", "work content", "instruction", and "other" may be defined to suit the use and nature of the target reports.

In process S2, the "cause" item is extracted from the target reports and the causes are classified. The classification is defined by the user. The "cause" item is often written as the conclusion of a report and is also the item the user most wants to know. Extraction and classification of the "cause" item can be performed using, for example, a DNN (Deep Neural Network). The category of the classified "cause" is added to the text data of the report as a tag (cause label).

In process S3, items other than "cause" are extracted from the target reports. In this embodiment, these are the "notification", "confirmation", and "countermeasure" items. The items can be extracted using, for example, a DNN. As a result of process S3, a set of text data extracted from the reports is formed for each item.

In process S4, the text data of each item extracted in process S3 is grouped so that similar pieces are placed together. Then an ID (group name) identifying the group of each item contained in a report is added to that report as a tag.
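The grouping idea in process S4 can be pictured with the following minimal Python sketch. The source does not state the similarity measure at this point, so the sketch swaps in character-level similarity from the standard library's `difflib.SequenceMatcher` with a greedy assignment. The function name, the threshold, the `G1`-style group IDs, and the sample texts are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def group_similar(texts, threshold=0.6):
    """Greedily assign each text to the first group whose representative is similar enough.

    Stand-in technique: character-level similarity, not the patent's own method.
    Returns one group ID per input text, in input order.
    """
    representatives = []  # list of (representative_text, group_id)
    assigned = []
    for text in texts:
        for rep, group_id in representatives:
            if SequenceMatcher(None, text.lower(), rep.lower()).ratio() >= threshold:
                assigned.append(group_id)
                break
        else:
            # No existing group is similar enough, so open a new one.
            group_id = f"G{len(representatives) + 1}"
            representatives.append((text, group_id))
            assigned.append(group_id)
    return assigned


countermeasure_texts = ["Replaced fuse F1", "Replaced fuse F2", "Tightened belt"]
group_ids = group_similar(countermeasure_texts)
```

In this sketch the two fuse replacements share a group and the belt adjustment gets its own, so each report could be tagged with the resulting group ID.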

In process S5, a cause estimation model is created with the group names as explanatory variables and the cause label as the objective variable.
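One way to picture the inputs to process S5 is the minimal Python sketch below, which assembles explanatory-variable rows (group names per item) and objective-variable values (cause labels). The record layout, the group IDs, and the `"none"` placeholder for a missing item are assumptions, and the sketch simplifies by allowing only one group per item per report.

```python
ITEMS = ("notification", "confirmation", "countermeasure")

# Hypothetical per-report records: group names from S4 and the cause label from S2.
records = [
    {
        "doc_no": "R001",
        "groups": {"notification": "N1", "confirmation": "C2", "countermeasure": "M1"},
        "cause": "electrical system failure",
    },
    {
        "doc_no": "R002",
        "groups": {"notification": "N3", "confirmation": "C1"},
        "cause": "mechanical system failure",
    },
]


def to_training_rows(records, missing="none"):
    """Build (X, y): X holds one group name per item, y holds the cause label."""
    X = [[rec["groups"].get(item, missing) for item in ITEMS] for rec in records]
    y = [rec["cause"] for rec in records]
    return X, y


X, y = to_training_rows(records)
```

Rows in this shape could then be encoded and passed to a learner of choice. The figure list mentions a cause estimation model composed of a decision tree, but the training step itself is outside this sketch.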

In process S6, the cause estimation model is put into operation.

Processes S1 to S6 above form a single sequence, but they can also be executed as independent processes. In that case, processes S1 to S6 can each be executed on a different system.

<3. Extraction and classification of causes (process S2)>
The processes following process S1 are described below. The following description covers only the target report data selected in process S1. The report DB 111 may also store data that was not selected, but such data is not discussed here.

FIG. 3 is a detailed flow diagram of process S2, the extraction and classification of causes. For explanatory purposes, the description distinguishes between the processing the user performs by operating the user terminal 140 and the analysis server 120, the processing performed by the analysis models of the analysis model storage server 130, and the text data stored in the data storage server 110 (the same applies below).

In process S201, the user examines the target reports selected in process S1 and decides on the categories of presumed causes. To do so, for example, the control unit 141 retrieves the relevant reports from the report DB 111 and displays them on the user terminal 140 so that the user can examine their contents. The definition, content, and types of the cause categories are arbitrary, and the user can set them freely to suit the target reports.

It is desirable for an experienced user to decide the definitions of the cause categories. Examples are "electrical system failure", "mechanical system failure", and "human error". The causes may of course be subdivided further. The determined cause categories are stored in the cause category extraction unit 122 of the analysis server 120.

In process S202, the user selects the reports to be used as teacher data. Any subset of the target reports selected in S1 may be chosen, and the selection may be random.
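A random selection of teacher reports, as permitted in process S202, might look like the following minimal Python sketch. The function name, the 20% fraction, and the fixed seed are assumptions chosen for the example; the source leaves the selection entirely to the user.

```python
import random


def pick_teacher_reports(doc_numbers, fraction=0.2, seed=0):
    """Randomly pick a subset of document numbers to label by hand.

    Returns (teacher, remaining). The fraction and seed are illustrative only.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    k = min(len(doc_numbers), max(1, round(len(doc_numbers) * fraction)))
    teacher = set(rng.sample(sorted(doc_numbers), k))
    remaining = [d for d in doc_numbers if d not in teacher]
    return sorted(teacher), remaining


doc_numbers = [f"R{i:03d}" for i in range(10)]
teacher, remaining = pick_teacher_reports(doc_numbers)
```

The remaining reports are the ones that would later receive a cause category from the trained model rather than from the user.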

In process S203, the user views the reports chosen as teacher data on the user terminal 140 and assigns a cause category to each report according to the definitions set in process S201. The reports with cause categories attached are stored in the report DB 112 with cause category on the data storage server 110.

FIG. 4 shows an example data structure of the report DB 112 with cause category. The text data 402 and the cause category 403 are associated with the document number 401 of each report. The data structure of the report DB 111 is that of FIG. 4 without the cause category 403.

Returning to FIG. 3, in process S204 the model generation unit 121 inputs pairs of text data 402 and cause category 403 to the cause extraction model 131 as teacher data 404.
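The row layout of FIG. 4 and the pairing used as teacher data 404 can be sketched in Python as follows. The class name, field names, and sample rows are assumptions; only the three columns (document number 401, text data 402, cause category 403) come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportRecord:
    document_number: str                  # column 401
    text: str                             # column 402
    cause_category: Optional[str] = None  # column 403; None models a row of report DB 111


def teacher_pairs(records):
    """Return (text, cause category) pairs for records that already carry a category."""
    return [(r.text, r.cause_category) for r in records if r.cause_category is not None]


records = [
    ReportRecord("R001", "Alarm E12 reported. Fuse F1 blown.", "electrical system failure"),
    ReportRecord("R002", "Pump stopped. Bearing worn.", "mechanical system failure"),
    ReportRecord("R003", "Valve left open after inspection."),  # not yet labeled
]
pairs = teacher_pairs(records)
```

Making the cause category optional lets one type represent both the plain report DB and the labeled DB in this sketch.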

In process S205, under the control of the model generation unit 121, the untrained cause extraction model 131 on the analysis model storage server 130 takes in the teacher data. The cause estimation model is not particularly limited, but is, for example, a Bayesian DNN.

In process S206, under the control of the model generation unit 121, the cause extraction model 131 performs supervised learning. A known method may be used for training.

In process S207, under the control of the cause category extraction unit 122, the text data of reports other than the teacher data is input to the trained cause extraction model 131. These reports have no cause category 403 attached, but the cause extraction model 131, properly trained in process S206, outputs a cause category 403 for each input of report text data 402.

In process S208, the cause category extraction unit 122 sends the classification results to the user terminal 140. In process S209, the user checks the contents as needed and decides the cause categories of the reports other than the teacher data.

As a result of the above processing, every report in the target data selected by the user in process S1 has been assigned a cause category, either by the user or by the cause category extraction unit 122. The data format is the same as that shown in FIG. 4. Under the control of the control unit 141, the user stores the reports with cause categories assigned in the report DB 112 with cause category on the data storage server.
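The way user labels, model output, and user review combine in processes S203 to S209 can be sketched as below. The keyword rule is a hypothetical stand-in for the trained cause extraction model 131, not the Bayesian DNN mentioned in the text, and all function names and sample data are assumptions.

```python
def keyword_predict(text):
    """Hypothetical stand-in for the trained cause extraction model 131."""
    lowered = text.lower()
    if "fuse" in lowered or "wiring" in lowered:
        return "electrical system failure"
    if "bearing" in lowered or "belt" in lowered:
        return "mechanical system failure"
    return "human error"


def assign_cause_categories(reports, teacher_labels, predict, user_overrides=None):
    """Give every report a cause category from the user or from the model."""
    user_overrides = user_overrides or {}
    assigned = {}
    for doc_no, text in reports.items():
        if doc_no in teacher_labels:      # labeled by the user in S203
            assigned[doc_no] = teacher_labels[doc_no]
        elif doc_no in user_overrides:    # decided by the user after review in S209
            assigned[doc_no] = user_overrides[doc_no]
        else:                             # model output from S207
            assigned[doc_no] = predict(text)
    return assigned


reports = {
    "R001": "Replaced blown fuse.",
    "R002": "Bearing worn; replaced.",
    "R003": "Wrong valve opened by operator.",
}
result = assign_cause_categories(
    reports,
    teacher_labels={"R001": "electrical system failure"},
    predict=keyword_predict,
    user_overrides={"R003": "human error"},
)
```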

<3. Extraction of other items (process S3)>
FIG. 5 is a detailed flow diagram of process S3, the extraction of other items.

In process S301, the user selects reports from the report database 111 to be used as teacher data. Any of the target data selected in S1 may be chosen. The reports may differ from those selected in process S202.

From the text data of the reports 3010 used as teacher data, which are stored on the data storage server 110, the chronologically described work content 3011 is sent to the user terminal.

In process S302, the user operates the user terminal 140 to tag each line of the chronologically described work content 3011 with an item. The items used for tagging follow the definitions made in process S1. In this example they are "notification", "confirmation", and "countermeasure". The unit of tagging need not be a single line, as long as the report text is divided according to a predetermined rule. It may be each timestamp, each paragraph, or each passage enclosed in a frame. It may also be the span from one punctuation mark to the next. The tagged reports may be stored on the data storage server 110 as the item-tagged report DB 113.
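The alternative tagging units listed for process S302 can be illustrated with the following minimal Python sketch, which divides report text by line, by timestamp, by paragraph, or by sentence-ending punctuation. The function name, the `HH:MM` timestamp format, and the punctuation set are assumptions; the source only requires that some predetermined rule be used.

```python
import re


def split_units(text, unit="line"):
    """Split report text into tagging units according to a fixed rule."""
    if unit == "line":
        parts = text.splitlines()
    elif unit == "timestamp":
        # Assumes each entry starts a line with a time such as 09:10.
        parts = re.split(r"^(?=\d{1,2}:\d{2})", text, flags=re.MULTILINE)
    elif unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif unit == "sentence":
        # Split after ., ! or ? followed by whitespace, or directly after a Japanese full stop.
        parts = re.split(r"(?<=[.!?])\s+|(?<=。)", text)
    else:
        raise ValueError(f"unknown unit: {unit}")
    return [p.strip() for p in parts if p.strip()]


work_log = "09:00 Alarm reported.\n  Unit A stopped.\n09:10 Fuse checked."
by_line = split_units(work_log, "line")
by_timestamp = split_units(work_log, "timestamp")
```

Splitting by timestamp keeps a continuation line together with the entry it belongs to, which splitting by line does not.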

In process S303, under the control of the model generation unit 121, the item-tagged reports are input as teacher data to the item extraction model 132 on the analysis model storage server 130.

FIG. 6 shows an example of a report tagged in process S302 and an example of the data structure of the item-tagged report DB 113. In this example, each line identified by a line number 601 is tagged with "notification", "confirmation", or "countermeasure". Each pair of line text data 602 and item tag 603 is input to the item extraction model 132 as teacher data 604.

In process S304, under the control of the model generation unit 121, the item extraction model 132 takes in the teacher data. If previously accumulated classification data 3012 exists, it can be taken in as teacher data in the same way.

In process S305, under the control of the model generation unit 121, the item extraction model undergoes supervised learning or retraining. The performance of the item extraction model 132 can be improved by repeating training as necessary. Training of the analysis model is the same as in process S206.

In process S306, the item extraction unit 123 inputs the text data 602 of reports other than the teacher data into the trained item extraction model 132, for example one line at a time. The item extraction model 132, having been appropriately trained in process S305, outputs an item tag 603 for each line of the report.

In process S307, the report lines with their output item tags are sent to the user terminal 140.

In process S308, the control unit 141 of the user terminal 140 lists the item-tagged report lines by item. In this example, the text of each report line is sorted into one of three lists: "notification", "confirmation", or "countermeasure".

The above processing produces the item classification list 3013, in which each line of text data from past reports is associated with a predetermined item. The item classification list may be stored in the data storage server 110 as the item classification list DB 114.
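The listing step of process S308 amounts to grouping tagged lines by their item tag. The following is a minimal sketch; the record layout `(doc_no, line_no, text, item_tag)` and the function name `build_item_list` are assumptions for illustration, not a layout given in the text.

```python
from collections import defaultdict


def build_item_list(tagged_lines):
    """Group tagged report lines by item tag.

    Each input record is a tuple (doc_no, line_no, text, item_tag).
    Returns a dict mapping each item tag to its list of (doc_no, line_no, text).
    """
    by_item = defaultdict(list)
    for doc_no, line_no, text, item_tag in tagged_lines:
        by_item[item_tag].append((doc_no, line_no, text))
    # Convert to a plain dict so that missing items are not created on lookup.
    return dict(by_item)
```

Keeping the document number and line number with each text makes it possible to trace a list entry back to the report it came from, which the later grouping and merging steps rely on.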

<4. Grouping of each of the other items (process S4)>
FIG. 7 is a detailed flowchart of the grouping process S4 for each of the other items. In process S401, the item grouping unit 133 of the analysis model storage server 130 classifies the texts associated with the same item in the item classification list DB 114 created in process S3, grouping similar sentences together.

Known techniques can be applied for text classification. For example, the text is converted into vectors and then grouped automatically by cluster analysis, without external criteria (that is, unsupervised).
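As one possible illustration of "convert to vectors, then cluster without supervision", the sketch below uses character-bigram count vectors, cosine similarity, and a simple single-pass leader clustering. These specific choices, the threshold value, and all function names are assumptions for illustration; the text only says that known techniques may be used, and a production system would more likely use an established clustering library.

```python
import math
from collections import Counter


def bigram_vector(text: str) -> Counter:
    """Represent text as a sparse vector of character-bigram counts."""
    s = "".join(text.lower().split())  # lowercase and drop whitespace
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_texts(texts, threshold=0.3):
    """Leader clustering: a text joins the first group whose first member
    (the leader) is at least `threshold` similar; otherwise it starts a group.
    Returns one group ID per input text."""
    leaders = []  # vector of the first member of each group
    labels = []
    for text in texts:
        vec = bigram_vector(text)
        for group_id, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(group_id)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels
```

Character bigrams are used here because they need no word tokenizer, which is convenient for Japanese text. They capture surface similarity only, so synonyms with no shared characters would not be grouped by this sketch.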

In process S402, the item grouping unit 133 assigns a group name (ID) to each group of data. The group name may be assigned automatically, or the user may operate the control unit 141, view the text on the user terminal 140, and assign a suitable name. In process S403, each report is tagged with the group names of its items, and the resulting reports are stored in the data storage server 110 as the report DB 115 with item tag classification.

Through the above processing, the contents of each item, which were written in free-form wording in each report, are categorized by similarity, absorbing variations in wording.

For example, even if "notification" entries describing a door failure are written in different ways, such as "the door does not open", "the door is broken", or "opening/closing abnormality", they can be treated as notifications of the same category.

FIG. 8 is a table showing an example of the data structure of the report DB 115 with item tag classification. Each report document number 401 is associated with the line number 601, the text data 602 of that line, the item tag 603 of that line, and the item classification 801 of the item.

<5. Creation of the cause estimation model (process S5)>
FIG. 9 is a detailed flowchart of the cause estimation model creation process S5.

In process S501, the report analysis information creation unit 125 merges the data of the report DB 115 with item tag classification and the data of the report DB 112 with cause category to generate report analysis information data. Specifically, for each document number in the data of the report DB 115 with item tag classification (FIG. 8), the cause category 403 from the report DB 112 with cause category (FIG. 4) is added, and the result is stored in the data storage server 110 as the report analysis information DB 116. The data in the report analysis information DB can be used as teacher data for the cause estimation model, and it has other uses as well, as described later.

FIG. 10 shows the data structure generated by process S501, that is, the data structure of the report analysis information DB 116. As described above, the cause category 403 has been added to the data of the report DB 115 with item tag classification. Pairs of item classification 801 and cause category 403 can be used as cause estimation model teacher data 1001.
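The merge in process S501 is essentially a join on the document number. A minimal sketch follows, assuming each row of the report DB 115 is a dictionary with the hypothetical keys `doc_no`, `line_no`, `text`, `item_tag`, and `item_class`, and that the report DB 112 is reduced to a mapping from document number to cause category.

```python
def build_report_analysis_info(tagged_rows, cause_by_doc):
    """Attach the cause category of each document to every one of its rows.

    tagged_rows: list of dicts, one per report line (hypothetical DB 115 layout).
    cause_by_doc: dict mapping doc_no -> cause category (hypothetical DB 112 view).
    Returns new dicts; the input rows are left unmodified.
    """
    merged = []
    for row in tagged_rows:
        # Documents without an assigned cause category get None.
        cause = cause_by_doc.get(row["doc_no"])
        merged.append({**row, "cause_category": cause})
    return merged
```

Rows whose cause category is `None` can still be kept in the merged data, since reports without a cause category are referred to again when the trained model is applied.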

In process S502, the control unit 141 of the user terminal instructs the model generation unit 121 to input the cause estimation model teacher data 1001 into the cause estimation model 134.

In process S503, the cause estimation model 134 takes in the item classification 801 of the cause estimation model teacher data 1001 as the explanatory variable and the cause category 403 as the objective variable.
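One way to picture the explanatory and objective variables is as one training pair per document. The sketch below assumes merged rows shaped as dictionaries with the hypothetical keys `doc_no`, `item_tag`, `item_class`, and `cause_category`; aggregating per document into a set of (item tag, item classification) pairs is an illustrative assumption, not a representation stated in the text.

```python
from collections import defaultdict


def to_training_pairs(analysis_rows):
    """Build (explanatory, objective) pairs, one per document.

    Explanatory variable: frozenset of (item_tag, item_class) found in the document.
    Objective variable: the cause category of the document.
    Documents without a cause category are skipped.
    """
    features = defaultdict(set)
    target = {}
    for row in analysis_rows:
        if row["cause_category"] is None:
            continue
        features[row["doc_no"]].add((row["item_tag"], row["item_class"]))
        target[row["doc_no"]] = row["cause_category"]
    # Sort by document number so the output order is deterministic.
    return [(frozenset(features[d]), target[d]) for d in sorted(features)]
```

A supervised learner such as a decision tree would typically need these sets encoded as fixed-length vectors, for example one binary column per (item tag, item classification) pair.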

In process S504, the cause estimation model 134 is trained under the control of the model generation unit 121.

In process S505, the cause estimation model 134, which has been provisionally trained on the teacher data, is used to estimate causes, using as learning data (training data) the reports in the report DB 115 with item tag classification that do not have a cause category 403 attached. That is, with the item classification 801 as input, the cause estimation model 134 is trained so that it outputs the correct cause category 403.

In process S506, the output cause category 403 is checked to evaluate the estimation result and confirm the accuracy of the cause estimation model 134. If the accuracy is insufficient, the cause estimation model 134 is retrained. If it is sufficient, the cause estimation model 134 is complete and enters actual operation.

The cause estimation model 134 may be selected from known supervised learning models such as decision trees, random forests, and support vector machines (SVMs). Known methods may also be used for training.

<6. Operation of the cause estimation model (process S6)>
FIG. 11 is a detailed flowchart of the operation process S6 of the cause estimation model.

Suppose that some failure occurs in process S601. In process S602, the content of the failure is input as the "notification" content, which acts as a trigger. The notification is input to the cause estimation model 134.

In process S603, if the cause estimation model 134 is a decision tree, the model, having received the "notification", outputs an instruction corresponding to the "confirmation" item at the branch.

In response to the output confirmation instruction, an on-site worker or other person checks the situation and responds in process S604. In process S605, the response is input to the cause estimation model 134 as the "confirmation" content.
In process S606, the cause estimation model 134 outputs the estimated cause.

FIG. 12 is an example of a cause estimation model 134 configured as a decision tree. This decision tree takes "notification", "confirmation", "cause", and "countermeasure" as inputs, and the action needed to carry out a "confirmation" can be defined as an "instruction". Suppose an event of notification category 1, "the motor does not operate", is input. The instruction "check the warning light", which corresponds to the branch in the cause estimation model 134 on whether the warning light is lit, is then issued to the site (process S603). As the example of FIG. 12 also shows, the "notification" and "confirmation" items are important because they often correspond to the branches of the decision tree, especially the early ones.

In response to the instruction, either "the warning light is lit (confirmation category 1)" or "the warning light is not lit (confirmation category 2)" is input to the cause estimation model 134 (process S604), and the estimation proceeds by executing the cause estimation model (process S605).

Once a cause has finally been estimated, the estimated cause is output (processes S606 to S607), the site is instructed to verify the facts based on the output (process S608), and a countermeasure is taken (process S609). For example, if a blown fuse is estimated as the cause (FIG. 12, cause category 1), the countermeasure of replacing the fuse is taken (FIG. 12, countermeasure category 1).
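The interaction in processes S602 to S606 can be sketched as a walk through a small hand-written tree. The nested-dictionary encoding and the function names are hypothetical. The text does not state which confirmation category leads to the blown-fuse cause, so the link from confirmation category 1 to cause category 1 below is an assumption, and the other branch is left as an unspecified placeholder.

```python
# Hypothetical encoding of part of the FIG. 12 example as nested dictionaries.
DECISION_TREE = {
    "notification category 1": {  # "the motor does not operate"
        "instruction": "check the warning light",
        "branches": {
            # Assumption: the text does not say which branch leads to the fuse.
            "confirmation category 1": {  # the warning light is lit
                "cause": "cause category 1: blown fuse",
                "countermeasure": "countermeasure category 1: replace the fuse",
            },
            "confirmation category 2": {  # the warning light is not lit
                "cause": None,  # placeholder: not specified in the text
                "countermeasure": None,
            },
        },
    },
}


def instruction_for(notification: str) -> str:
    """S603: return the check the site should perform for a notification."""
    return DECISION_TREE[notification]["instruction"]


def estimate_cause(notification: str, confirmation: str) -> dict:
    """S605 to S606: follow the branch selected by the confirmation result."""
    return DECISION_TREE[notification]["branches"][confirmation]
```

A learned decision tree would hold the same kind of structure, with the branch conditions chosen by the training algorithm rather than written by hand.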

FIG. 13 shows another example of an overall flow using the system of FIG. 1, for explaining the second embodiment.

In FIG. 1 of the first embodiment, the system uses the data obtained in processes S1 to S4 to create and operate the cause estimation model (processes S5 and S6). However, the data obtained in processes S1 to S4 may instead be processed by the report analysis information creation process S100 and presented directly to the user.

FIG. 14 shows the result of the report analysis information creation unit 125 performing the report analysis information creation process S100, which transforms the data of the report analysis information DB 116 shown in FIG. 10 into the form of a list 1400. The list shows the combinations of item categories that appear in the reports, so the user can see which confirmation and cause categories are related to each notification category.

Using the control unit 141, the user can display the list 1400, created from the data of the report analysis information DB 116, on the user terminal 140.
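A list of category combinations such as the list 1400 could be derived by counting, per report, which notification, confirmation, and cause categories occur together. The sketch below is one possible reading; the row keys (`doc_no`, `item_tag`, `item_class`, `cause_category`), the function name, and the use of occurrence counts are assumptions for illustration, since the exact layout of FIG. 14 is not described in the text.

```python
from collections import Counter, defaultdict
from itertools import product


def build_overview(analysis_rows):
    """Count (notification class, confirmation class, cause category) combinations.

    Each combination is counted once per document in which it occurs.
    """
    per_doc = defaultdict(
        lambda: {"notification": set(), "confirmation": set(), "cause": None}
    )
    for row in analysis_rows:
        doc = per_doc[row["doc_no"]]
        if row["item_tag"] in ("notification", "confirmation"):
            doc[row["item_tag"]].add(row["item_class"])
        doc["cause"] = row["cause_category"]

    counts = Counter()
    for doc in per_doc.values():
        # Pair every notification class with every confirmation class in the document.
        for n, c in product(sorted(doc["notification"]), sorted(doc["confirmation"])):
            counts[(n, c, doc["cause"])] += 1
    return counts
```

Sorting the resulting counts by notification category would give the user a view of which confirmations and causes have accompanied each kind of notification in past reports.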

FIG. 15 shows another example of an overall flow using the system of FIG. 1, for explaining the third embodiment.

In FIG. 1 of the first embodiment, the categories of the cause item are extracted by process S2, which is separate from process S3 for extracting the categories of the other items. However, the categories of the cause item may also be extracted in the same way as the other items.

In the item extraction process S3-2 of the third embodiment, the "cause" item is extracted and categorized in the same way as "notification", "confirmation", and "countermeasure". This has the advantage of simpler processing than in the first embodiment, but the disadvantage that an extraction method specific to "cause" cannot be used.

As described above, in the present embodiments, even reports without a fixed format or wording can have their contents formalized and organized by extracting and categorizing items, making the contents easier for the user to understand. The reports also become easier to use as teacher data for machine learning in artificial intelligence.

Text data analysis system 100, data storage server 110, analysis server 120, analysis model storage server 130, user terminal 140, report database 111, report DB 112 with cause category, item-tagged report DB 113, item classification list DB 114, report DB 115 with item tag classification, report analysis information DB 116, model generation unit 121, cause category extraction unit 122, item extraction unit 123, item text classification unit 124, report analysis information creation unit 125, analysis unit 126

Claims

A text data analysis system that processes text data of a plurality of documents, comprising:
an item extraction unit that extracts, from the text data of the plurality of documents, "notification" and "confirmation" items defined in advance as text data having a specific meaning in a document;
an item classification unit that classifies a plurality of text data extracted as the same item into a plurality of categories; and
a report analysis information creation unit that associates the categories into which the extracted text data is classified with the documents from which the text data was extracted.

The item classification unit
Multiple text data extracted as the same item are classified into multiple categories by grouping similar ones.
The text data analysis system according to claim 1.

The item extraction unit
As the above items, "notification" that describes passively obtained information and "confirmation" that describes actively obtained information are extracted.
The text data analysis system according to claim 1.

The item extraction unit
As the item, the "cause" of the event that is the content of the "notification" is extracted, the "notification" and the "confirmation" are extracted by the first algorithm, and the "cause" is extracted by the second algorithm.
The text data analysis system according to claim 3.

The item extraction unit
As the above item, "countermeasures" for eliminating the event that is the content of "notification" are extracted.
The text data analysis system according to claim 3.

The report analysis information creation unit
generates information indicating that the categories into which the text data extracted as the "notification", "confirmation", and "cause" items is classified are associated with the documents from which the text data was extracted.
The text data analysis system according to claim 4.

The report analysis information creation unit
generates teacher data for training an analysis model, using as explanatory variables the categories into which the text data extracted as the "notification" and "confirmation" items is classified, and as the objective variable the category into which the text data extracted as the "cause" item is classified.
The text data analysis system according to claim 4.

A text data analysis method that processes text data from multiple documents.
From the text data of the plurality of documents, items defined in advance as text data having a specific meaning in the document are extracted.
Multiple text data extracted as the same item is classified into multiple categories,
The category in which the extracted text data is classified is associated with the document in which the text data is extracted.
Text data analysis method.

Multiple text data extracted as the same item are classified into multiple categories by grouping similar ones.
The text data analysis method according to claim 8.

As the above items, "notification" that describes passively obtained information and "confirmation" that describes actively obtained information are extracted.
The text data analysis method according to claim 9.

The categories into which the text data extracted as the "notification" and "confirmation" items is classified are associated with the documents from which the text data was extracted.
The text data analysis method according to claim 10.

From the text data of the plurality of documents, text data having the meaning of "cause" of the event that is the content of "notification" in the document is extracted according to a predefined classification.
The category in which the text data extracted as the "cause" item is classified is associated with the document in which the text data is extracted.
The text data analysis method according to claim 11.

Information indicating that the category in which the text data extracted as an item is classified is associated with the extracted document is displayed to the user.
The text data analysis method according to claim 12.

An analysis model is trained using, as teacher data, information in which the categories into which the text data extracted as items is classified are associated with the documents from which the text data was extracted, and
new text data is analyzed using the trained analysis model.
The text data analysis method according to claim 12.

A failure response recommendation system that processes text data of a plurality of documents containing failure-related content and proposes a response to a failure, comprising:
an item extraction unit that extracts, from the text data of the plurality of documents, items of passive information and active information defined in advance as text data having a specific meaning in a document;
an item classification unit that classifies a plurality of text data extracted as the same item into a plurality of categories; and
a report analysis information creation unit that associates the categories into which the extracted text data is classified with the documents from which the text data was extracted,
wherein the system outputs a failure response recommendation based on the analysis results of the report analysis information creation unit.