JP6621385B2

JP6621385B2 - Text analysis system and text analysis method

Info

Publication number: JP6621385B2
Application number: JP2016147345A
Authority: JP
Inventors: 正裕本林; 健太郎堀; 秀明新井; 篤岩村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2016-07-27
Filing date: 2016-07-27
Publication date: 2019-12-18
Anticipated expiration: 2036-07-27
Also published as: JP2018018254A

Description

The present invention relates to a text analysis system.

Various kinds of text information are accumulated in society, and there is a need to analyze and make use of this information. For example, it is necessary to analyze the causal relationships between events that are recorded as text information.

The following prior art exists as background art in this technical field. Patent Document 1 (Japanese Unexamined Patent Application Publication No. 2008-203964) discloses a causal relationship analysis apparatus that operates on causal relationships between a plurality of mutually different events extracted from natural language text, that is, from documents written in natural language. In this apparatus, a cluster target selection unit selects, as clustering targets, a group of events that share some of their constituent words and have a common cause or result event. A causal relationship graph, whose data structure integrates only the causal relationships common to all the events to be clustered, is stored in a causal relationship storage unit. The structural complexity of the causal relationship graph is quantified as a clustering score, and an event cluster evaluation unit clusters the event group selected by the cluster target selection unit so that the clustering score is minimized.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2008-203964 (JP 2008-203964 A)

The causal relationship analysis apparatus described in Patent Document 1 extracts causal relationships between events on the basis of predefined patterns and causal relationships. However, sentences written by people contain an unlimited variety of patterns and causal relationships, so it is difficult to define all of them in advance. When a pattern or causal relationship has not been defined in advance, the text is difficult to analyze and the causal relationship is difficult to extract.

An object of the present invention is to analyze text information without defining patterns.

A representative example of the invention disclosed in the present application is as follows. That is, a text analysis system comprises a processor that executes a program and a storage device accessed by the processor. The text information analyzed by the text analysis system includes an item name defined for each of a plurality of columns, and a plurality of rows of data containing character or numeric data corresponding to the item names. The text analysis system comprises: an abstraction unit that, for each row, abstracts the data of a designated column using the data of the other columns; a language processing unit that extracts keywords from the abstracted data; an aggregation unit that aggregates rows with similar data according to the similarity of the keywords in the designated column; and a tabulation unit that calculates representative values of the data in the rows subject to statistical processing. The abstraction unit generates a materialization result in which the data of the designated column is replaced with the data of another column of the same row.

According to one aspect of the present invention, text information can be analyzed without defining patterns. Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiments.

FIG. 1 is a diagram showing the configuration of a text analysis system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the physical configuration of the server of this embodiment.
FIG. 3 is a flowchart of the text analysis process of this embodiment.
FIG. 4 is a flowchart showing details of the abstraction process.
FIG. 5 is a flowchart showing details of the data aggregation process.
FIG. 6(A) is a diagram showing an example of failure information A, and FIG. 6(B) is a diagram showing an example of failure information A1.
FIG. 7(A) is a diagram showing an example of failure information A2.
FIG. 8(A) is a diagram showing an example of the aggregation intermediate result A3', and FIG. 8(B) is a diagram showing an example of the aggregation result A3.
FIG. 9 is a diagram showing the data aggregation process.
FIG. 10(A) is a diagram showing an example of the tabulation result A4, and FIG. 10(B) is a diagram showing an example of the materialization result A5.
FIG. 11 is a flowchart showing details of the pattern extraction process.
FIG. 12 is a diagram showing an example of the maintenance operation work pattern display screen.
FIG. 13 is a diagram showing an example of the maintenance operation work pattern display screen.

<Example 1>
FIG. 1 is a diagram showing the configuration of a text analysis system according to an embodiment of the present invention.

The text analysis system of this embodiment is constituted by a server 101. The server 101 is connected to one or more computers 102 via a network 105. The physical configuration of the server 101 is described later with reference to FIG. 2.

The computer 102 is a computer used by a user 100, and has a processor (CPU) that executes programs, a storage device that stores programs and data, and a communication interface. An output device such as a display 103 and an input device 104 (keyboard, mouse, touch panel, etc.) are connected to the computer 102, which provides a user interface for the user 100.

The server 101 has a data abstraction unit 110, a natural language processing unit 111, a data aggregation unit 112, a data tabulation unit 113, and a pattern extraction unit 114. The processing executed by each unit is described with reference to FIG. 3.

The server 101 has a failure information database 120 and a maintenance operation work pattern database 121.

The failure information database 120 stores the source data (failure information) that the text analysis system of this embodiment analyzes as text information in order to extract the relationships between the items contained in it; its details are described with reference to FIG. 6(A). The maintenance operation work pattern database 121 stores the result of the analysis by the text analysis system of this embodiment (materialization result A5); its details are described with reference to FIG. 10(B).

FIG. 2 is a block diagram showing the physical configuration of the server 101 that constitutes the text analysis system of this embodiment.

The server 101 of this embodiment is constituted by a computer having a processor (CPU) 1, a memory 2, an auxiliary storage device 3, and a communication interface 4.

The processor 1 executes programs stored in the memory 2. The memory 2 includes a ROM, which is a nonvolatile storage device, and a RAM, which is a volatile storage device. The ROM stores invariant programs (for example, a BIOS). The RAM is a high-speed, volatile storage device such as a DRAM (Dynamic Random Access Memory), and temporarily stores the programs executed by the processor 1 and the data used while they run.

The auxiliary storage device 3 is a large-capacity, nonvolatile storage device such as a magnetic storage device (HDD) or a flash memory (SSD), and stores the programs executed by the processor 1 and the data used while they run. That is, a program is read from the auxiliary storage device 3, loaded into the memory 2, and executed by the processor 1.

The communication interface 4 is a network interface device that controls communication with other devices (such as the computer 102) according to a predetermined protocol.

The server 101 may have an input interface 5 and an output interface 8. The input interface 5 is an interface to which a keyboard 6, a mouse 7, and the like are connected, and which receives input from an administrator. The output interface 8 is an interface to which a display device 9, a printer, and the like are connected, and which outputs the state of the server 101 and the execution results of programs in a form the administrator can view.

The programs executed by the processor 1 are provided to the server 101 via removable media (CD-ROM, flash memory, etc.) or a network, and are stored in the nonvolatile auxiliary storage device 3, which is a non-transitory storage medium. For this reason, the server 101 preferably has an interface for reading data from removable media.

The server 101 is a computer system configured on a single physical computer, or on a plurality of logically or physically configured computers, and may run on a virtual machine built on a plurality of physical computer resources. The programs executed on the server 101 may also run as separate threads on the same computer.

In the server 101, all or part of the functional blocks implemented by programs may instead be constituted by a physical integrated circuit (for example, a Field-Programmable Gate Array).

FIG. 3 is a flowchart of the text analysis process of this embodiment.

First, the data abstraction unit 110 acquires failure information A (FIG. 6(A)) from the failure information database 120, executes the abstraction process on the designated columns C_0 to C_N, generates failure information A1 (FIG. 6(B)), and temporarily writes it to the storage device (S301). The abstraction process compares the data of the designated columns C_0 to C_N with the data of the other columns of the same row, and abstracts the data using the item name of the column whose data matched. Details of the abstraction process are described later with reference to FIG. 4. The designated columns are the columns whose natural-language text is analyzed by the text analysis process of this embodiment, and they hold the data containing the causal relationships to be analyzed.

Next, the natural language processing unit 111 executes natural language processing on the designated columns C_0 to C_N of failure information A1 (FIG. 6(B)), extracts keywords, generates failure information A2 (FIG. 7(A)), and temporarily writes it to the storage device (S302). Known techniques can be used for the natural language processing, which extracts words from the natural-language text contained in failure information A1. For example, morphological analysis is used to split the text into words, and words of predetermined parts of speech (for example, nouns and verbs) are extracted as keywords.
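
The keyword selection in S302 can be sketched as follows, assuming a morphological analyzer has already split the text into (word, part of speech) pairs. The tag names, the `extract_keywords` helper, and the sample tokens are hypothetical and are not taken from the embodiment.

```python
# Parts of speech kept as keywords (assumed tag names; a real analyzer defines its own).
KEYWORD_POS = {"noun", "verb"}

def extract_keywords(tagged_tokens):
    """Keep only the words whose part of speech is in KEYWORD_POS."""
    return [word for word, pos in tagged_tokens if pos in KEYWORD_POS]

# Hypothetical analyzer output for an abstracted phenomenon text.
tagged = [("device name", "noun"), ("in", "particle"),
          ("abnormality", "noun"), ("occur", "verb")]
keywords = extract_keywords(tagged)
```

The tokenization itself is left to whichever analyzer is used; only the part-of-speech filter is shown here.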

Thereafter, the data aggregation unit 112 executes the fuzzy aggregation process, in a predetermined order, on the data of the designated columns C_0 to C_N of failure information A2 (FIG. 7(A)), generates the aggregation result A3 (FIG. 8(B)), and temporarily writes it to the storage device (S303). The fuzzy aggregation process compares the similarity of the data in the designated columns C_0 to C_N and aggregates rows whose data are similar. Details of the fuzzy aggregation process are described later with reference to FIG. 5.

Next, the data tabulation unit 113 executes statistical processing on the designated target columns V_0 to V_M of the aggregation result A3 (FIG. 8(B)), generates the tabulation result A4 (FIG. 10(A)), and temporarily writes it to the storage device (S304). The target columns V_0 to V_M mainly hold numeric values. The statistical processing takes the numeric values of these target columns and calculates data that represents the row (a representative value). The statistical processing may, for example, calculate the mean, sum, minimum, maximum, variance, standard deviation, mode, or median, or may create a probability distribution (histogram). In this embodiment, the mean of the values contained in the work man-hours and stop time columns of the aggregation result A3 (FIG. 8(B)) is calculated.
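
As a minimal sketch of this tabulation step, the following computes a mean for each target column of one aggregated row. It assumes that the numeric values of aggregated rows were joined with the "##" delimiter; the column names, the sample values, and the `representative_value` helper are hypothetical.

```python
from statistics import mean

DELIMITER = "##"  # assumed to be the delimiter used when rows were aggregated

def representative_value(cell, stat=mean):
    """Split a delimiter-joined cell into numbers and reduce them with `stat`."""
    values = [float(v) for v in cell.split(DELIMITER) if v.strip()]
    return stat(values)

# Hypothetical aggregated row: two source rows for man-hours, three for stop time.
row = {"work man-hours": "2##4", "stop time": "30##60##90"}
summary = {column: representative_value(cell) for column, cell in row.items()}
```

Passing another function from the `statistics` module (for example `median`) as `stat` would give a different representative value without changing the rest of the sketch.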

Thereafter, the data abstraction unit 110 executes the materialization process on the designated columns C_0 to C_N of the tabulation result A4 (FIG. 10(A)), generates the materialization result A5 (FIG. 10(B)), and writes it to the maintenance operation work pattern database 121 (S305). For example, the item names contained in the designated columns C_0 to C_N are character strings that were abstracted by the abstraction process, so they are restored to their original values. Specifically, when the phenomenon column contains "device name", the device name data of that row contains the multiple (or one or more) device names that were aggregated by the aggregation process. The "device name" in the phenomenon column is therefore replaced with "aaa0ZZZ00##ccc1XXX02" from the device name column. Likewise, "failed part" in the cause column is replaced with "part001##part008" from the failed part column. Furthermore, when the phenomenon column contains "failed part'", the failed part data of that row contains the multiple (or one or more) device names that were aggregated by the aggregation process. For this reason, the metadata that records which row's data was matched when the string was abstracted into an item name is consulted, and "failed part'" in the phenomenon column is replaced with "part008" from the failed part column.
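
The materialization step can be sketched as a single substitution pass over the abstracted text. This is an illustration under assumptions, not the embodiment's implementation: the `materialize` function, the shape of `primed_values` (item name mapped to the value recorded in metadata), and the sample texts are hypothetical.

```python
import re

def materialize(text, other_columns, primed_values=None):
    """Replace item names in `text` with the values they stand for.

    other_columns: item name -> value held in the same (aggregated) row.
    primed_values: item name -> value recorded in metadata for names that
                   carry a trailing apostrophe (matched against another row).
    """
    primed_values = primed_values or {}
    # Longest names first so a longer item name wins over a shorter prefix.
    names = sorted(other_columns, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(n) for n in names) + ")(')?")

    def restore(match):
        name, primed = match.group(1), match.group(2)
        if primed:
            # Leave the marker untouched when no metadata is available.
            return primed_values.get(name, match.group(0))
        return other_columns[name]

    return pattern.sub(restore, text)

row = {"device name": "aaa0ZZZ00##ccc1XXX02", "failed part": "part001##part008"}
phenomenon = materialize("device name abnormality occurred", row)
primed = materialize("failed part' failure", row, {"failed part": "part008"})
```

Doing the replacement in one regular-expression pass avoids re-scanning text that has already been restored, so a restored value that happens to contain an item name is not replaced a second time.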

Finally, the pattern extraction unit 114 extracts maintenance operation work patterns from the materialization result A5, ranks them, and recommends them to the user 100 (S306). Details of the pattern extraction process are described later with reference to FIG. 11.

FIG. 4 is a flowchart showing details of the abstraction process S301.

First, the data abstraction unit 110 defines the number of rows of failure information A as R, the number of columns as M, and the designated columns as C_0, ..., C_n, ..., C_N (0 ≤ n ≤ N). The number of designated columns is therefore N+1 (S401). The parameter r that controls the row and the parameter n that controls the designated column are then initialized to 0 (S402).

The data abstraction unit 110 then determines whether the parameter n is N or less (S403). If the parameter n is greater than N (NO in S403), the data of all the designated columns has been abstracted, so the process ends. If the parameter n is N or less (YES in S403), it determines whether the parameter r is smaller than the number of rows R (S404).

If the parameter r is equal to or greater than the number of rows R (NO in S404), the abstraction of the designated column currently being processed is complete, so 1 is added to the parameter n (S412) and the process returns to step S403 to abstract the data of the next designated column. If the parameter r is smaller than the number of rows R (YES in S404), the data D_rCn at row r, column C_n of failure information A is acquired (S405). The parameter s that controls the row and the parameter m that controls the column are then initialized to 0 (S406).

The data abstraction unit 110 then determines whether the column parameter m is smaller than the number of columns M (S407). If the column parameter m is equal to or greater than the number of columns M (NO in S407), the comparison with the data of all columns has finished for the row currently being processed, so 1 is added to the parameter r (S411) and the process returns to step S404 to abstract the data of the next row.

If the column parameter m is smaller than the number of columns M (YES in S407), it is determined whether the row parameter s is smaller than the number of rows R (S408). If the row parameter s is equal to or greater than the number of rows R (NO in S408), the processing of that column is complete, so 1 is added to the parameter m (S414) and the process returns to step S407 to process the data of the next column.

If the row parameter s is smaller than the number of rows R (YES in S408), the data D_sm at row s, column m of failure information A is acquired (S409). Then, within the acquired D_rCn, any character string that matches D_sm is replaced under the following conditions. First, when the row parameter s is equal to the row parameter r, that is, when the matching character string D_sm is in another column of the same row, the character string in D_rCn that matches D_sm is replaced with the item name of column m (S410). For example, "aaa0ZZZ00" in the phenomenon column of the first row of failure information A (FIG. 6(A)) is identical to the device name data of that row, so it is replaced with the item name "device name".

When the row parameter s differs from the row parameter r, that is, when the matching character string D_sm is in another row, the character string in D_rCn that matches D_sm is replaced with the item name of column m followed by an apostrophe ('). For example, "part008" in the phenomenon column of the second row of failure information A (FIG. 6(A)) is identical to the failed part data of the third row, so it is replaced with "failed part'", the item name with an apostrophe appended.

In this way, the apostrophe (') records whether the string was replaced because it matched data in the same row or because it matched data in another row. At this time, it is preferable to also keep data indicating which row's data was matched when the string was abstracted into an item name.

Thereafter, 1 is added to the row parameter s (S413), and the process returns to step S408 to process the next row.

As described above, the abstraction process abstracts the data of a designated column by replacing a word in that data with the item name of a column that holds the same word in the same row, so data can be abstracted without setting up a dictionary in advance. Although the abstraction process of this embodiment can abstract data without dictionary data, dictionary data may also be used in combination, which can improve the accuracy of the abstraction.

Furthermore, the abstraction process may record metadata that makes the abstracted character strings distinguishable. For example, the start position of an abstracted character string (its character offset) may be recorded and consulted when the abstracted string is restored in the materialization process S305. Information identifying the item name that was substituted may also be associated with the abstracted character string.
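
The replacement rules of the abstraction process can be sketched for a single cell as follows. This is a simplified illustration under stated assumptions, not the flowchart itself: it checks the same row before the other rows (the flowchart iterates column by column), it skips the designated column when looking for matches, and the `abstract_cell` function, the provenance list, and the sample rows are hypothetical.

```python
def abstract_cell(rows, r, col):
    """Abstract rows[r][col] using the values of the other columns.

    Returns the abstracted text and a list of (item name, source row) pairs
    that records where each replacement came from.
    """
    text = rows[r][col]
    provenance = []
    # Same row first: a matching value is replaced by the plain item name.
    for name, value in rows[r].items():
        if name != col and value and value in text:
            text = text.replace(value, name)
            provenance.append((name, r))
    # Other rows next: a matching value is replaced by the item name plus "'".
    for s, other in enumerate(rows):
        if s == r:
            continue
        for name, value in other.items():
            if name != col and value and value in text:
                text = text.replace(value, name + "'")
                provenance.append((name + "'", s))
    return text, provenance

# Hypothetical failure information modeled on the examples in the text.
rows = [
    {"device name": "aaa0ZZZ00", "failed part": "part001",
     "phenomenon": "aaa0ZZZ00 abnormality occurred"},
    {"device name": "bbb0YYY01", "failed part": "part002",
     "phenomenon": "part008 device failure"},
    {"device name": "ccc1XXX02", "failed part": "part008",
     "phenomenon": "ccc1XXX02 device abnormality occurred"},
]
first = abstract_cell(rows, 0, "phenomenon")
second = abstract_cell(rows, 1, "phenomenon")
```

The provenance list plays the role of the metadata mentioned above: it keeps the row index that a primed item name was matched against, which is what a later materialization step would need.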

FIG. 5 is a flowchart showing details of the data aggregation process S303.

First, the data aggregation process S303 defines the number of rows of failure information A2 as R, the number of columns as M, the designated columns as C_0, ..., C_n, ..., C_N (0 ≤ n ≤ N), and the similarity threshold as T. It also prepares an empty list L (S501). The parameter r that controls the row and the parameter n that controls the designated column are then initialized to 0 (S502).

The data aggregation process S303 then determines whether the parameter n is N or less (S503). If the parameter n is greater than N (NO in S503), the data of all the designated columns has been aggregated, so the process ends. If the parameter n is N or less (YES in S503), it determines whether the parameter r is smaller than the number of rows R (S504).

If the parameter r is equal to or greater than the number of rows R (NO in S504), the aggregation of the designated column currently being processed is complete, so the data aggregation process S303 merges each set of rows contained in the temporarily created list L into a single row. At this time, the values of the columns other than the designated column C_n are joined with a delimiter given by a designated character string (for example, ##) (S512). Identical values may either be joined with the delimiter as they are, or only one of them may be kept and the others deleted.

Then, 1 is added to the parameter n (S513), and the process returns to step S503 to aggregate the data of the next designated column. If the parameter r is smaller than the number of rows R (YES in S504), the data D_rCn at row r, column C_n of failure information A2 is acquired (S505). The row parameter k is then initialized to r+1 (S506). Because the data aggregation process determines whether pairs of rows can be aggregated, combinations that have already been judged need not be judged again, so it suffices to compare row r, the row currently being processed, only with the rows that follow it.

The data aggregation process S303 then determines whether the row parameter k is smaller than the number of rows R (S507). If the row parameter k is equal to or greater than the number of rows R (NO in S507), the comparison between the data of the row currently being processed and the data of the other rows has finished, so 1 is added to the parameter r (S511) and the process returns to step S504 to aggregate the data of the next row.

If the row parameter k is smaller than the number of rows R (YES in S507), the data D_kCn at row k, column C_n of failure information A2 is acquired (S508). The similarity between the acquired D_rCn and D_kCn is then calculated, and if the calculated similarity is greater than the threshold T, the pair {r, k} is added to the list L (S509).

The similarity can be calculated using the well-known cosine similarity. FIG. 9(A) shows an example in which the similarity is calculated for the phenomenon column of failure information A2 (FIG. 7(A)). The similarity between "device name, abnormality, occurrence" in the first row of failure information A2 and "device, failure, present" in the second row is 0.00; the similarity between "device, failure, present" in the second row and "device name, device, abnormality, occurrence" in the third row is 0.33; and the similarity between "device name, device, abnormality, occurrence" in the third row and "device name, abnormality, occurrence" in the first row is 0.75. For example, if the threshold T is set to 0.7, the first and third rows are judged to be similar with respect to the phenomenon column.
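
A minimal sketch of cosine similarity over keyword lists is given below, using plain term counts as the vector components. The weighting behind the values in FIG. 9(A) is not stated in the text, so this sketch is not expected to reproduce 0.33 or 0.75; with term counts it still yields 0 for rows that share no keyword and a value above 0.7 for the first and third rows. The `cosine_similarity` function is an illustration, not the embodiment's implementation.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(keywords_a, keywords_b):
    """Cosine of the angle between two term-count vectors."""
    va, vb = Counter(keywords_a), Counter(keywords_b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

row1 = ["device name", "abnormality", "occurrence"]
row2 = ["device", "failure", "present"]
row3 = ["device name", "device", "abnormality", "occurrence"]

T = 0.7  # threshold from the example in the text
similar_1_3 = cosine_similarity(row1, row3) > T
similar_1_2 = cosine_similarity(row1, row2) > T
```

Other weightings, such as TF-IDF, could be substituted for the raw counts without changing the thresholding logic.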

A specific example of data aggregation using the similarity is shown in FIG. 9(B). The phenomenon column of the first row of failure information A2 (FIG. 7(A)), whose failure ID is F0, contains "device name, abnormality, occurrence", which is similar to "device name, device, abnormality, occurrence" in the phenomenon column of the third row (failure ID F2), so the data of failure ID F0 and the data of failure ID F2 are merged. In contrast, as shown in FIG. 9(C), "device name, abnormality, occurrence" in the phenomenon column of the first row (failure ID F0) is not similar to "device, failure, present" in the phenomenon column of the second row (failure ID F1), so the data of failure ID F0 and the data of failure ID F1 are not merged.
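
The pairwise comparison and merge for one designated column can be sketched as follows. Several points are assumptions made for illustration: pairs in the list L are treated transitively (rows linked through a chain of similar pairs end up in one group), the merged row keeps the designated-column keywords of its first member, similarity is the term-count cosine, and the `aggregate` function and sample rows are hypothetical.

```python
from collections import Counter
from math import sqrt

DELIMITER = "##"

def cosine(a, b):
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def aggregate(rows, col, threshold, dedupe=False):
    n = len(rows)
    # List L: every pair (r, k) with k > r whose similarity exceeds the threshold.
    pairs = [(r, k) for r in range(n) for k in range(r + 1, n)
             if cosine(rows[r][col], rows[k][col]) > threshold]
    # group[i] is the label of the group that row i belongs to.
    group = list(range(n))
    for r, k in pairs:
        old, new = group[k], group[r]
        group = [new if g == old else g for g in group]
    # Collect the values of the non-designated columns per group.
    merged = {}
    for i, row in enumerate(rows):
        target = merged.setdefault(group[i], {col: row[col]})
        for c, v in row.items():
            if c != col:
                target.setdefault(c, []).append(v)
    # Join the collected values with the delimiter, optionally dropping duplicates.
    result = []
    for target in merged.values():
        out = {}
        for c, v in target.items():
            out[c] = v if c == col else DELIMITER.join(dict.fromkeys(v) if dedupe else v)
        result.append(out)
    return result

rows = [
    {"failure ID": "F0", "phenomenon": ["device name", "abnormality", "occurrence"]},
    {"failure ID": "F1", "phenomenon": ["device", "failure", "present"]},
    {"failure ID": "F2", "phenomenon": ["device name", "device", "abnormality", "occurrence"]},
]
result = aggregate(rows, "phenomenon", threshold=0.7)
```

Starting the inner index at r + 1 mirrors the point made in the flowchart description: each pair of rows is compared only once.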

Next, the process of aggregating the data of the plurality of designated columns of failure information A2 to generate the aggregation result A3 is described with reference to FIG. 8.

First, the data of the phenomenon column, which is the first designated column C_0, is aggregated. As described above, the first row (failure ID F0) and the third row (failure ID F2) of failure information A2 have similar data in the phenomenon column, so the data of failure ID F0 and the data of failure ID F2 are merged. At this time, the values of the columns other than the phenomenon column are joined with ##, the delimiter given by the designated character string. FIG. 8(A) shows the intermediate result A3' obtained by aggregating the data of the phenomenon column.

Next, the data in the cause column, which is the designated column C_1, is aggregated. In the cause column there is no data whose similarity exceeds the threshold T, so no rows are aggregated on the basis of the cause column.

Next, the data in the work content column, which is the designated column C_2, is aggregated. There are two rows whose work content is "failed part, replacement" and two rows whose work content is "progress, observation". Therefore, as shown in FIG. 8(B), the rows aggregated by the designated column C_0 are classified into two groups. The columns from the failure ID column through the cause column are left as they are. The similarity threshold T may be changed for each designated column.
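
The integration of rows on one designated column, with the remaining columns joined by the "##" delimiter, can be sketched as below. The function name `merge_rows`, the choice to keep the first row's value for the designated column, and the man-hour values are assumptions made for illustration; the figures themselves are not reproduced here.

```python
DELIMITER = "##"  # delimiter string named in the description


def merge_rows(rows, key_column):
    """Integrate rows judged similar on key_column into a single row.

    Assumption: the key column keeps the value of the first row, and every
    other column is joined with DELIMITER in row order.
    """
    merged = {}
    for column in rows[0]:
        if column == key_column:
            merged[column] = rows[0][column]
        else:
            merged[column] = DELIMITER.join(str(row[column]) for row in rows)
    return merged


# Hypothetical rows; only the phenomenon keywords come from the description.
row_f0 = {
    "failure ID": "F0",
    "phenomenon": "device name, abnormality, occurrence",
    "work man-hours": 2,
}
row_f2 = {
    "failure ID": "F2",
    "phenomenon": "device name, device, abnormality, occurrence",
    "work man-hours": 4,
}

merged = merge_rows([row_f0, row_f2], key_column="phenomenon")
print(merged["failure ID"])      # F0##F2
print(merged["work man-hours"])  # 2##4
```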

Returning to FIG. 5, the description continues. The row parameter k is incremented by 1 (510), and the process returns to step S507 to process the next column.

As described above, the data aggregation process collects data having the same tendency and shapes it into an easily understood form. Because data is aggregated even when it does not match exactly, data can be aggregated even for items whose input candidates are not restricted, for example items that an operator enters as free-form natural-language text.

Next, with reference to FIGS. 6 to 10, the changes in the data from the failure information A stored in the failure information database 120 until the materialization processing result A5 stored in the maintenance operation work pattern database 121 is generated will be described. In the following figures, italics indicate data replaced by the abstraction processing S301, and underlining indicates data changed by the processing step concerned.

FIG. 6(A) is a diagram showing an example of the failure information A stored in the failure information database 120. The failure information A includes the following data: failure ID, date and time of occurrence, device name, device classification, failed part, phenomenon, cause, work content, work man-hours, and stop time. The failure information A is entered by, for example, the operator who performed the work on the failure.

The failure ID is identification information for uniquely identifying data stored in the failure information database 120. The date and time of occurrence is the date and time when the failure occurred. The device name is the name of the device in which the failure occurred. The device classification is the classification of the device in which the failure occurred. The failed part is the model number or name of the part that failed. The phenomenon is the phenomenon of the failure, and the cause is the cause of the failure. The work content is the content of the work performed on the failure. The work man-hours are the man-hours spent on that work. The stop time is the length of time the device was stopped because of the failure.

The phenomenon, cause, and work content are defined as designated columns, and the work man-hours and stop time are defined as target columns. In addition to the items shown, the date and time of occurrence, the location of occurrence (region), the responder, and the like may be defined as designated columns subject to text analysis.

The failure information A1, an example of which is shown in FIG. 6(B), is generated from the failure information A by the abstraction processing S301. In the data of the designated columns (phenomenon, cause, work content), any data that is identical to the data of another item is replaced with the name of that item.
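
A minimal sketch of this replacement is shown below, assuming plain substring matching on the raw text of each row. The function name `abstract_row`, the longest-value-first ordering, and the sample row are hypothetical and are not taken from FIG. 6.

```python
def abstract_row(row, designated_columns):
    """Replace values of other columns found in designated columns
    with the item name (column name) of the column they come from."""
    others = [
        (name, value)
        for name, value in row.items()
        if name not in designated_columns and isinstance(value, str) and value
    ]
    # Assumption: longer values are replaced first so that a short value
    # contained in a longer one does not break the longer match.
    others.sort(key=lambda item: len(item[1]), reverse=True)

    result = dict(row)
    for column in designated_columns:
        text = row[column]
        for name, value in others:
            text = text.replace(value, name)
        result[column] = text
    return result


# Hypothetical failure record.
record = {
    "failure ID": "F0",
    "device name": "SRV-01",
    "failed part": "HDD",
    "phenomenon": "An abnormality occurred in SRV-01",
    "work content": "Replaced the HDD",
}
abstracted = abstract_row(record, ["phenomenon", "work content"])
print(abstracted["phenomenon"])    # An abnormality occurred in device name
print(abstracted["work content"])  # Replaced the failed part
```

Because the replacement vocabulary is taken from the same row, no dictionary of device or part names has to be prepared beforehand.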

The failure information A2, an example of which is shown in FIG. 7(A), is generated from the failure information A1 by the natural language processing S302. The data (natural-language text) in the designated columns (phenomenon, cause, work content) is broken down into words by natural language processing.

The aggregation processing result A3, an example of which is shown in FIG. 8(B), is generated from the failure information A2 by the data aggregation processing S303. When the data in a designated column is similar, the rows concerned are aggregated. Specifically, rows in which at least one of the phenomenon, cause, and work content is the same are grouped together, and the relationships between the items are analyzed. FIG. 8(A) shows the intermediate result A3' from which the aggregation processing result A3 is derived.

The totaling result A4, an example of which is shown in FIG. 10(A), is generated from the aggregation processing result A3 by the totaling processing S304. The data in the target columns (work man-hours, stop time) is statistically processed and updated with the average value.
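
The totaling step can be sketched as below, assuming that the target-column values of an aggregated row are held as a "##"-joined string, as in the intermediate result, and that the arithmetic mean is the representative value. The function names and sample values are hypothetical.

```python
from statistics import mean

DELIMITER = "##"  # delimiter used when rows were integrated


def representative_value(cell, delimiter=DELIMITER):
    """Average of the numeric values joined in one aggregated cell."""
    values = [float(part) for part in str(cell).split(delimiter)]
    return mean(values)


def total_row(row, target_columns):
    """Return a copy of the row with each target column replaced by its mean."""
    result = dict(row)
    for column in target_columns:
        result[column] = representative_value(row[column])
    return result


# Hypothetical aggregated row.
aggregated = {
    "failure ID": "F0##F2",
    "work man-hours": "2##4",
    "stop time": "1.5##2.5",
}
totaled = total_row(aggregated, ["work man-hours", "stop time"])
print(totaled["work man-hours"])  # 3.0
print(totaled["stop time"])       # 2.0
```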

The materialization processing result A5, an example of which is shown in FIG. 10(B), is generated from the totaling result A4 by the materialization processing S305 and is stored in the maintenance operation work pattern database 121. The data replaced in the abstraction processing S301 is restored to the original data (device name, part name, etc.).
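
The reverse replacement can be sketched in the same style. The sketch assumes that each item name appearing in a designated column is simply substituted by the value stored under that item in the same row; `materialize_row` and the sample row are hypothetical.

```python
def materialize_row(row, designated_columns):
    """Replace item names found in designated columns with the row's own
    value for that item, undoing the abstraction step."""
    item_names = [name for name in row if name not in designated_columns]
    # Assumption: longer item names are substituted first.
    item_names.sort(key=len, reverse=True)

    result = dict(row)
    for column in designated_columns:
        text = row[column]
        for name in item_names:
            text = text.replace(name, str(row[name]))
        result[column] = text
    return result


# Hypothetical totaled row whose designated columns still hold item names.
pattern = {
    "device name": "SRV-01",
    "failed part": "HDD",
    "phenomenon": "device name abnormality occurrence",
    "work content": "failed part replacement",
}
materialized = materialize_row(pattern, ["phenomenon", "work content"])
print(materialized["phenomenon"])    # SRV-01 abnormality occurrence
print(materialized["work content"])  # HDD replacement
```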

FIG. 11 is a flowchart showing details of the pattern extraction processing S306.

First, the user 100 enters into the computer 102 a designation of the type of index of interest. The pattern extraction unit 114 acquires the index designated by the user 100 (S1101). At this time, in addition to the type of index, the user 100 may designate the number of data items to be output and the sort order of the data (ascending or descending).

Then, the pattern extraction unit 114 extracts a predetermined number of maintenance operation work patterns from the maintenance operation work pattern database 121 in ascending or descending order of the value of the designated index (S1102).
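
Steps S1101 and S1102 amount to a sort followed by a cut-off, which can be sketched as below. The function name `extract_patterns`, its parameters, and the sample patterns are assumptions for illustration only.

```python
def extract_patterns(patterns, index, count=3, descending=True):
    """Return the first `count` patterns ordered by the designated index."""
    ordered = sorted(patterns, key=lambda p: p[index], reverse=descending)
    return ordered[:count]


# Hypothetical contents of the pattern database.
patterns = [
    {"work content": "HDD replacement", "work man-hours": 3.0, "stop time": 2.0},
    {"work content": "fan replacement", "work man-hours": 8.0, "stop time": 0.5},
    {"work content": "observation", "work man-hours": 5.0, "stop time": 4.0},
    {"work content": "restart", "work man-hours": 1.0, "stop time": 1.0},
]

top = extract_patterns(patterns, index="work man-hours", count=3)
print([p["work content"] for p in top])
# ['fan replacement', 'observation', 'HDD replacement']
```

Passing `index="stop time"` instead orders the same patterns by stop time, corresponding to the two display screens described for FIGS. 12 and 13.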

Finally, the pattern extraction unit 114 generates data for visualizing the maintenance operation work patterns extracted from the maintenance operation work pattern database 121 in a table format, a graph format, or the like, and transmits the data to the computer 102 (S1103). The computer 102 displays the data transmitted from the server 101 (pattern extraction unit 114) on the display 103.

FIGS. 12 and 13 are diagrams showing examples of the maintenance operation work pattern display screen.

FIG. 12 shows an example of the maintenance operation work pattern display screen when the user 100 designates work man-hours as the type of index. As shown, the upper part of the screen displays three maintenance operation work patterns in descending order of work man-hours. The lower part of the screen displays a pie chart and a bar chart representing the work man-hour values. For example, the user 100 may consider when to replace the HDD on the basis of the data in the first row. The server 101 may also recommend the work content (HDD replacement).

FIG. 13 shows an example of the maintenance operation work pattern display screen when the user 100 designates stop time as the type of index. As shown, the upper part of the screen displays three maintenance operation work patterns in descending order of stop time. The lower part of the screen displays a pie chart and a bar chart representing the stop time values.

The text analysis method according to the present invention has been described above using failure information as an example, but it can also be applied to the analysis of other text information.

As described above, the text analysis system of the embodiment of the present invention includes: a data abstraction unit 110 that, for each row in a table, abstracts the data of a designated column using the data of other columns; a natural language processing unit 111 that extracts keywords from the abstracted data; a data aggregation unit 112 that aggregates rows with similar data according to the similarity of the keywords in the designated column; and a data totaling unit 113 that calculates a representative value of the data of the target rows for statistical processing. The data abstraction unit 110 generates the materialization processing result A5 in which the data of the designated column is replaced with the data of another column of the row, so relationships between events can be extracted without defining them in advance. In addition, although a tendency is hard to see when data having the same tendency is scattered, such data can be gathered together.

In addition, the system includes an extraction unit that, for the index entered by the user, extracts a predetermined number of data items from the materialization processing result A5 in ascending or descending order of the representative value and presents them to the user, so tendencies with a large impact can be extracted.

In addition, the data abstraction unit 110 compares the data of each designated column with the data of the other columns of the row and replaces a word in the data of each designated column with the item name of the column that contains the same word in the same row, so variations in proper nouns and in wording can be absorbed.

In addition, the data abstraction unit 110 materializes the data by replacing an item name contained in the data of a designated column with the data of that item in the row, so the abstracted data can be materialized without any prior definition.

In addition, the data aggregation unit 112 compares the data of each designated column with the data of the other rows in that designated column and, when the similarity of the data is greater than a predetermined threshold, groups the rows with similar data. Data can therefore be aggregated even when the wording of the natural-language text under analysis varies or when the input candidates for an item are not restricted.

The present invention is not limited to the embodiment described above and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiment has been described in detail in order to explain the present invention clearly, and the present invention is not necessarily limited to one having all of the configurations described. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment. The configuration of another embodiment may also be added to the configuration of a given embodiment. For part of the configuration of each embodiment, other configurations may be added, deleted, or substituted.

Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or wholly in hardware, for example by designing them as an integrated circuit, or may be realized in software by a processor interpreting and executing programs that implement the respective functions.

Information such as the programs, tables, and files that implement each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation, and not all of the control lines and information lines required in an implementation are necessarily shown. In practice, almost all of the components may be considered to be interconnected.

101 Server
102 Computer
110 Data abstraction unit
111 Natural language processing unit
112 Data aggregation unit
113 Data totaling unit
114 Pattern extraction unit
120 Failure information database
121 Maintenance operation work pattern database

Claims

A text analysis system,
A processor that executes a program; and a storage device that is accessed by the processor;
The text information analyzed by the text analysis system includes an item name defined for each of a plurality of columns, and a plurality of rows of data including character or numerical data corresponding to the item names,
The text analysis system includes:
For each row, an abstraction unit that abstracts the data of the designated column using data of other columns,
A language processing unit for extracting keywords from the abstracted data;
An aggregation unit that aggregates rows with similar data according to the similarity of keywords in the specified column;
A totaling unit that calculates the representative value of the data of the target row for statistical processing,
The abstraction unit generates a materialization process result by replacing the data in the designated column with data in another column of the row.

The text analysis system according to claim 1,
A text analysis system comprising: an extracting unit that extracts a predetermined number of data from the materialization processing result in ascending or descending order of the representative value for an index input by a user and presents it to the user.

The text analysis system according to claim 1,
The abstraction unit is:
Compare the data in each specified column with the data in other columns in the row,
A text analysis system that abstracts data in a specified column by replacing words in the data in each specified column with item names of columns having the same word in the same row.

The text analysis system according to claim 3,
A text analysis system in which the abstraction unit materializes the data by replacing the item name included in the data of the designated column with the data of that item in the row.

The text analysis system according to claim 1,
The aggregation unit is
Compare the data in each specified column with the data in other rows of the specified column,
A text analysis system that aggregates rows with similar data when the similarity of the data is greater than a predetermined threshold.

A text analysis method executed by a computer,
The computer includes a processor that executes a program, and a storage device that is accessed by the processor.
The text information analyzed by the computer includes a plurality of item names and a plurality of rows of data including character or numerical data corresponding to the item names,
The text analysis method includes:
An abstraction step in which the processor, for each row, abstracts the data of the designated column using the data of the other columns;
A language processing step in which the processor extracts keywords from the abstracted data;
An aggregation step in which the processor aggregates rows with similar data according to the similarity of keywords in the designated column;
A totaling step in which the processor calculates a representative value of the data of the target rows for statistical processing; and
A materialization step of generating a materialization processing result obtained by replacing the data in the designated column with data in another column of the row.

The text analysis method according to claim 6, comprising:
An extraction step in which the processor extracts a predetermined number of data items from the materialization processing result in ascending or descending order of the representative value for the index input by the user and presents them to the user.

The text analysis method according to claim 6, comprising:
In the abstraction step,
The processor compares the data in each designated column with the data in the other columns of the row,
The text analysis method, wherein the processor abstracts the data of the designated column by replacing words in the data of the designated column with item names of columns having the same word in the same row.

The text analysis method according to claim 8, comprising:
In the abstraction step, the processor materializes the data by replacing the item name included in the data of the designated column with the data of that item in the row.

The text analysis method according to claim 6, comprising:
In the aggregation step,
The processor compares the data of each designated column with the data of other rows of the designated column,
When the similarity of the data is greater than a predetermined threshold, the processor aggregates rows having similar data.