JP2011039938A

JP2011039938A - Document analysis program and document analysis system

Info

Publication number: JP2011039938A
Application number: JP2009188648A
Authority: JP
Inventors: Takeshi Mizunashi; 豪水梨; Masato Obe; 正人小部; Shoichi Tateno; 昌一舘野
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2009-08-17
Filing date: 2009-08-17
Publication date: 2011-02-24
Anticipated expiration: 2029-08-17
Also published as: JP5446577B2

Abstract

PROBLEM TO BE SOLVED: To provide a means for comparing different graphs built from different sets of documents.

SOLUTION: In selection processing, a graph selection unit 202 causes a display device 102 to display a screen on which a user selects a plurality of graph data to be compared. When the user selects graph data P and Q, the selected graph data P and Q are acquired from a graph data storage unit 212. In comparison item selection processing, the graph selection unit 202 causes the display device to display the kinds of comparison items so that the user can select the comparison items to be highlighted in the graph data P and Q being compared. In comparison display processing, a comparison display control unit 204 displays the graph data P while emphasizing its differences from the graph data Q, based on the graph data P and Q and the comparison items chosen in the selection processing.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a document analysis program and a document analysis system.

Free-form text in questionnaires and complaint documents contains a great deal of useful information, so it is desirable to analyze such documents and feed the results back into product development. However, because of the sheer volume, manual analysis is impractical in terms of both time and cost, and attempts have been made to perform the analysis by computer.

Patent Document 1 discloses creating a semantic network from the co-occurrence frequencies of words in a plurality of documents. Patent Document 2 discloses creating graphs from documents, assigning an importance to each node and a relevance to each link of each graph, and comparing these between graphs. Patent Document 3 discloses creating a graph based on the appearance frequencies of words in a plurality of documents and also creating subsets.

Patent Document 1: JP 2001-243223 A
Patent Document 2: JP 2003-330966 A
Patent Document 3: JP 2009-128949 A

An object of the present invention is to provide a means for comparing different graphs built from different sets of documents.

The invention according to claim 1 is a document analysis program for causing a processing device to execute: a graph data selection procedure for selecting a plurality of graph data, each of which includes a plurality of node data having character string information of a word appearing in documents and appearance frequency information for that word, and a plurality of link data having information that links a plurality of the node data; and a graph comparison display procedure for comparing and displaying the selected plurality of graph data.

The invention according to claim 2 is the document analysis program according to claim 1, further comprising: a document input procedure in which the plurality of documents are input; a word extraction procedure for extracting, from the words contained in the plurality of documents input in the document input procedure, a plurality of first words that are words with a high appearance frequency, and for extracting a plurality of second words that are words with a high appearance frequency in the subset of documents containing a first specific word, which is one of the first words; and a graph data storage procedure for storing link data having information that links the first specific word of the subset with the second words.

The invention according to claim 3 is the document analysis program according to claim 1 or 2, wherein the graph data selection procedure further selects a comparison item to be used in the comparison, and the graph comparison display procedure performs the comparison display based on the selected comparison item.

The invention according to claim 4 is the document analysis program according to any one of claims 1 to 3, wherein, in the graph comparison display procedure, for one graph data among the selected plurality of graph data, nodes containing the character string information are displayed based on the node data, and arrows connecting the nodes are displayed based on the link data.

The invention according to claim 5 is the document analysis program according to claim 4, wherein, in the graph comparison display procedure, when character string information contained in a node of the one graph data matches character string information in another graph data among the plurality of graph data, the node or the character string information concerned is displayed with emphasis.

The invention according to claim 6 is the document analysis program according to claim 4 or 5, wherein, in the graph comparison display procedure, for the appearance frequency of a word whose character string information appears in both the one graph data and another graph data among the plurality of graph data, the display is emphasized to indicate that the appearance frequency ratio in the other graph data is higher or that it is lower.

The invention according to claim 7 is the document analysis program according to any one of claims 4 to 6, wherein, in the graph comparison display procedure, for the appearance frequency of a word whose character string information appears in both the one graph data and another graph data among the plurality of graph data, the display is emphasized to indicate that the appearance frequency rank in the one graph data is higher or that it is lower.

The invention according to claim 8 is the document analysis program according to any one of claims 4 to 7, wherein, in the graph comparison display procedure, with regard to combinations of character string information paired by the linking information contained in a single link data, when a combination of character string information in another graph data among the plurality of graph data matches only one member of the combination in every combination of character string information in the one graph data, the node for the other member of that combination, or the character string information of that node, is displayed with emphasis.

The invention according to claim 9 is the document analysis program according to any one of claims 4 to 8, wherein, in the graph comparison display procedure, the arrow is displayed with emphasis when, for a combination of two different pieces of character string information contained in the one graph data and the identical combination contained in the other graph data, either only one of the two combinations is paired by the linking information contained in the link data, or both are so paired but the order of the pair differs between them.

The invention according to claim 10 is the document analysis program according to any one of claims 4 to 9, wherein the comparison result is displayed by superimposing another graph data among the plurality of graph data on the one graph data.

The invention according to claim 11 is the document analysis program according to claim 10, wherein, when the superimposed display is performed, the nodes and the arrows are displayed dynamically by any of appearing, disappearing, changing color, and scaling.

The invention according to claim 12 is the document analysis program according to any one of claims 4 to 11, wherein the one graph data among the plurality of graph data contains the other graph data with which it is compared.

The invention according to claim 13 is the document analysis program according to any one of claims 5 to 12, wherein the emphasis is displayed by changing any of color, line thickness, and line type, or by displaying with a shadow, semi-transparency, or bold type.

The invention according to claim 14 is a document analysis system comprising: a graph selection unit that selects a plurality of graph data from among graph data each including a plurality of node data having character string information of a word appearing in documents and appearance frequency information for that word, and a plurality of link data that are information linking a plurality of the node data; and a comparison display control unit that compares and displays the selected plurality of graph data.

The invention according to claim 15 is the document analysis system according to claim 14, further comprising: a document input unit to which the plurality of documents are input; a word extraction unit that extracts, from the words contained in the plurality of documents input through the document input unit, a plurality of first words that are words with a high appearance frequency, and extracts a plurality of second words that are words with a high appearance frequency in the subset of documents containing a first specific word, which is one of the first words; and a graph data saving unit that saves, in the graph data storage unit, information linking the first specific word of the subset with the second words.

According to the inventions of claims 1 and 14, a plurality of different graph data built from different sets of documents can be compared and displayed.

According to the inventions of claims 2 and 15, everything from the input of a plurality of documents to the comparison display of graphs can be carried out as a single series of processes.

According to the invention of claim 3, comparison display can be performed for whichever comparison items are needed.

According to the invention of claim 4, comparison display can be performed on the basis of the nodes and arrows displayed for one of the graph data.

According to the invention of claim 5, nodes having matching character string information across a plurality of graph data can be compared and displayed.

According to the invention of claim 6, differences in the appearance frequency ratio of the same word appearing in both one graph data and another graph data can be compared and displayed.

According to the invention of claim 7, differences in the appearance frequency rank of the same word appearing in both one graph data and another graph data can be compared and displayed.

According to the invention of claim 8, words whose co-occurrence partners differ between a plurality of graph data can be compared and displayed.

According to the invention of claim 9, differences in the relationship between a word and its co-occurrence partner across a plurality of graph data can be compared and displayed.

According to the inventions of claims 10, 11, and 13, differences can be displayed so that they stand out more clearly.

According to the invention of claim 12, one graph data and a graph data that forms a subset of it can be displayed visually.

FIG. 1 is a diagram showing a document analysis system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the functional blocks of the document analysis system of FIG. 1.
FIG. 3 is a flowchart showing the graph data creation process.
FIG. 4 is a diagram showing the data structure of the word appearance data.
FIG. 5 is a diagram showing the data structure of the node data.
FIG. 6 is a diagram showing the data structure of the link data.
FIG. 7 is a flowchart showing the graph data comparison display process.
FIG. 8 is a diagram showing part of the graph data selection screen.
FIG. 9 is a diagram showing part of the comparison item selection screen.
FIG. 10 is a graph of graph data P alone.
FIG. 11 is a graph in which the words shared by graph data P and Q are highlighted.
FIG. 12 is a graph in which differences in appearance frequency ratio between graph data P and Q are highlighted.
FIG. 13 is a graph in which the appearance frequency ranks in graph data P and Q are highlighted.
FIG. 14 is a graph in which differences in co-occurrence partners between graph data P and Q are highlighted.
FIG. 15 is a graph in which the co-occurrence relationships in graph data P and Q are highlighted.
FIG. 16 is a graph in which graph data Q is highlighted when graph data P is taken as the set of all graph data.

FIG. 1 shows a document analysis system 100 according to an embodiment of the present invention. The document analysis system 100 includes a computer main body 200 made up of a CPU (central processing unit), RAM (Random Access Memory), a magnetic disk device, and the like; a display device 102 that displays screens under the control of the computer main body 200; and an input device 104 for entering information into the computer main body 200. The display device 102 may be a liquid crystal display, a CRT (Cathode Ray Tube), or any other display device, and the input device 104 includes a key input device, a pointing device such as a mouse, and an image input device such as a scanner.

FIG. 2 shows a functional block diagram of the document analysis system 100. The computer main body 200 operates by executing a program stored on its internal magnetic disk device, and the functional blocks 202 to 210 in FIG. 2 are implemented by that program. As shown in FIG. 2, the computer main body 200 includes: a document input unit 206 that accepts document data entered from the input device; a word extraction unit 210 that analyzes the documents entered through the document input unit 206 and extracts the words they contain together with their appearance frequencies; a graph data creation and storage unit 218 that creates the node data 254 and link data 256 described later for the words extracted by the word extraction unit 210 and saves them in a graph data storage unit 212; a graph selection unit 202 that causes the display device 102 to display a screen for selecting the graph data to be compared and the comparison items, and acquires from the graph data storage unit 212 the graph data selected through the input device 104; and a comparison display control unit 204 that obtains the graph data and the comparison items selected through the graph selection unit 202 and causes the display device 102 to display the comparison.

FIG. 3 shows a flowchart of the graph data creation process. The process begins with the document input process of step S101, in which documents are input. Documents are input as text information supplied to the document input unit 206 via the input device 104; besides a key input device or scanner serving as the input device 104, documents may also be input from a computer connected over a network. In this embodiment, the input documents are assumed to be the results of a questionnaire about mobile phones.

Next, in the word extraction process of step S102, the word extraction unit 210 extracts words. This process creates word appearance data 252 with the data structure shown in FIG. 4. As shown in FIG. 4, the word appearance data 252 consists of a message number assigned to the document, a word ID that identifies the word, and attribute 1, attribute 2, and attribute 3, which record attributes of the message such as the age, sex, and region of its author.
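The record layout of FIG. 4 can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `WordOccurrence` and `extract_occurrences` are hypothetical, the text is assumed to be already split into words by whitespace (real Japanese text would need morphological analysis), and emitting one record per distinct word per message is a guess, since the source does not say how repeated words are recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordOccurrence:
    message_no: int  # message number assigned to the document
    word_id: int     # identifier of the word
    attr1: str       # message attributes, e.g. age, sex, region of the author
    attr2: str
    attr3: str


def extract_occurrences(messages, word_ids):
    """Return one record per distinct word per message (an assumption)."""
    records = []
    for message_no, text, attrs in messages:
        for word in sorted(set(text.split())):
            # Assign the next free ID the first time a word is seen.
            word_id = word_ids.setdefault(word, len(word_ids) + 1)
            records.append(WordOccurrence(message_no, word_id, *attrs))
    return records


word_ids = {}
records = extract_occurrences(
    [(1, "phone mail phone", ("20s", "F", "Tokyo")),
     (2, "phone", ("30s", "M", "Osaka"))],
    word_ids,
)
```

Keeping the attributes on every record makes it straightforward to filter occurrences by region or age group before building a graph, which is how graph data for different respondent groups could be separated.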

In the graph data creation and storage process of step S103, the graph data creation and storage unit 218 identifies words with a high appearance frequency from the word appearance data 252 for the set of target documents (the "whole set"). It then takes the set of documents containing an identified high-frequency word (a "subset") and identifies other high-frequency words within it, that is, words that co-occur with high frequency, and repeats this step. For each word identified, it creates node data 254 as shown in FIG. 5. As the data structure in FIG. 5 shows, the node data 254 consists of a node ID, a word ID, the word, and the number of elements.

In addition, as shown by the data structure of the link data 256 in FIG. 6, the graph data creation and storage unit 218 creates link data 256 by assigning a link ID to each pair in which the source node ID is the node ID of an identified high-frequency word and the target node ID is the node ID of another high-frequency word identified in the subset containing that word, that is, its co-occurrence partner. The node data 254 and link data 256 thus created are saved as graph data in the graph data storage unit 212 by the graph data creation and storage unit 218.
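The creation of node data and link data described above might look like the following sketch. It is not the patented implementation: `top_words` and `build_graph` are hypothetical names, frequency is taken to be the number of documents containing a word, only one level of subsets is explored, nodes are shared per word, and the word ID field is omitted for brevity. All of these are assumptions.

```python
from collections import Counter


def top_words(docs, k, exclude=()):
    """The k words contained in the most documents, ties broken alphabetically."""
    counts = Counter(w for doc in docs for w in set(doc) if w not in exclude)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def build_graph(docs, k=2):
    nodes = {}  # word -> node record (node ID, word, element count)
    links = []  # link records (link ID, source node ID, target node ID)

    def node_id(word, count):
        if word not in nodes:
            nodes[word] = {"node_id": len(nodes) + 1, "word": word, "count": count}
        return nodes[word]["node_id"]

    # High-frequency words in the whole set.
    first_words = top_words(docs, k)
    for word, count in first_words:
        node_id(word, count)
    # For each of them, high-frequency words in the subset containing it.
    for word, _ in first_words:
        subset = [doc for doc in docs if word in doc]
        for partner, count in top_words(subset, k, exclude={word}):
            links.append({
                "link_id": len(links) + 1,
                "source": nodes[word]["node_id"],
                "target": node_id(partner, count),
            })
    return nodes, links


docs = [["mobile", "handy"], ["mobile", "handy", "mail"], ["mobile", "manners"], ["mail"]]
nodes, links = build_graph(docs)
```

Because links are directed from the word that defines the subset to its partner, the same pair of words can be linked in both directions, which is the situation the arrow-direction comparison later relies on.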

Next, the graph data comparison display process S200 is described. FIG. 7 shows a flowchart of this process. The graph data comparison display process S200 comprises a selection process (step S210) and a comparison display process (step S220); the selection process of step S210 in turn comprises the graph selection process of step S212 and the comparison item selection process of step S214.

First, in the graph selection process of step S212, the graph selection unit 202 causes the display device 102 to display a screen on which the user selects the plurality of graph data to be compared. FIG. 8 shows part of that selection screen. When the user selects graph data, the selected graph data are acquired from the graph data storage unit 212. In this embodiment two graph data, P and Q, are assumed to have been selected, but three or more may be compared. Here, graph data P and graph data Q each represent a set of results from a questionnaire about mobile phones conducted in a different region.

Next, in the comparison item selection process of step S214, the graph selection unit 202 causes the display device 102 to display the kinds of comparison items so that the user can select which comparison item is to be highlighted in the graph data being compared. FIG. 9 shows part of that selection screen. The kinds of comparison items include same words, differences in the appearance frequency ratio of the same word, appearance frequency ranks of the same word, differences in co-occurrence partners, and differences in co-occurrence relationships. Here, "co-occurrence combination" has been selected. Then, in the comparison display process of step S220, the comparison result is displayed based on the graph data and the comparison items chosen in the selection process of step S210.

FIG. 10 shows the graph of graph data P displayed on its own. The graph shows the words "phone", "mobile phone", "PHS", "necessary", and "mail", which have a high appearance frequency in the documents of the whole set A of graph data P, and the words "convenient", "manners", and "phone", which have a high appearance frequency in the subset of documents containing the word "mobile phone".

FIG. 11 shows a comparison display in which the graph of graph data P is displayed while its differences from graph data Q are emphasized, for the case where "same word" has been selected as the comparison item. As the figure shows, among the words contained in the nodes of graph data P, those that also appear in the nodes of graph data Q, namely "phone", "mobile phone", "PHS", and "convenient", are highlighted by thickening the outline of their nodes.
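The "same word" comparison amounts to intersecting the node words of the two graphs. A minimal sketch follows; `shared_words` is a hypothetical name, the word list for P follows FIG. 10, and the contents of Q beyond the four shared words are invented for illustration.

```python
def shared_words(words_p, words_q):
    """Words of graph P that also appear in graph Q, in P's display order."""
    in_q = set(words_q)
    return [word for word in words_p if word in in_q]


words_p = ["phone", "mobile phone", "PHS", "necessary", "mail", "convenient", "manners"]
words_q = ["mobile phone", "phone", "PHS", "convenient", "charge"]  # hypothetical Q
highlighted = shared_words(words_p, words_q)
```

With these inputs, `highlighted` holds "phone", "mobile phone", "PHS", and "convenient", the four nodes that would be drawn with a thick outline.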

FIG. 12 shows a comparison display that emphasizes differences in the appearance frequency ratio of the same word. A word in a node outlined with a bold line has a higher appearance frequency ratio in graph data Q than in graph data P, and a word in a node outlined with a dotted line has a lower appearance frequency ratio in graph data Q than in graph data P. In other words, the word "phone" appears more frequently in graph data Q than in graph data P, whereas the words "mobile phone", "convenient", and "PHS" appear less frequently in graph data Q than in graph data P.
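One way to derive the bold and dotted outlines is sketched below. The source does not define how the ratio is computed; here it is assumed to be a word's count divided by the number of documents in its set. The function name `ratio_marks` and all counts are hypothetical.

```python
def ratio_marks(freq_p, total_p, freq_q, total_q):
    """Outline style for each shared word, from Q's ratio relative to P's."""
    marks = {}
    for word, count_p in freq_p.items():
        if word not in freq_q:
            continue  # only words present in both graphs are compared
        ratio_p = count_p / total_p
        ratio_q = freq_q[word] / total_q
        if ratio_q > ratio_p:
            marks[word] = "bold"    # relatively more frequent in Q
        elif ratio_q < ratio_p:
            marks[word] = "dotted"  # relatively less frequent in Q
    return marks


marks = ratio_marks(
    {"phone": 30, "mobile phone": 50, "mail": 10}, 100,
    {"phone": 40, "mobile phone": 20}, 80,
)
```

Comparing ratios rather than raw counts keeps the result meaningful when the two document sets differ in size, as in this example with 100 and 80 documents.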

FIG. 13 shows a comparison display that emphasizes differences in the appearance frequency rank of the same word. In this graph, the word "PHS" ranks higher in appearance frequency than the word "phone" in graph data Q, while the opposite holds in graph data P; this is indicated by adding shadows to the "PHS" and "phone" nodes and varying the color of the shadows. For example, in graph data P the word "phone" ranks third and the word "PHS" ranks fifth, whereas in graph data Q the word "phone" ranks fifth and the word "PHS" ranks third.
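Detecting such rank reversals can be sketched as a search for pairs of shared words whose relative order differs between the two graphs. `rank_inversions` is a hypothetical name; the ranks for "phone" and "PHS" follow the example in the text, and the rank of "mobile phone" is invented.

```python
from itertools import combinations


def rank_inversions(rank_p, rank_q):
    """Pairs of shared words whose relative rank order differs between P and Q."""
    shared = [word for word in rank_p if word in rank_q]
    flipped = []
    for a, b in combinations(shared, 2):
        # A negative product means a is above b in one graph and below it in the other.
        if (rank_p[a] - rank_p[b]) * (rank_q[a] - rank_q[b]) < 0:
            flipped.append((a, b))
    return flipped


rank_p = {"mobile phone": 1, "phone": 3, "PHS": 5}
rank_q = {"mobile phone": 1, "phone": 5, "PHS": 3}
flipped = rank_inversions(rank_p, rank_q)
```

Only the pair "phone" and "PHS" is reported here, since "mobile phone" stays above both words in each graph.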

FIG. 14 shows a case where the co-occurrence partners differ. In this graph, the word "charge" exists in graph data Q as a co-occurrence partner of the word "mobile phone" but does not exist in graph data P; this is indicated by drawing both the arrow and the outline of the node with dotted lines.
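A simplified reading of this comparison is: find links in Q whose source word is present in P but whose pairing does not occur in P. The sketch below follows that reading only and does not reproduce the full wording of claim 8; `new_partners` is a hypothetical name and the link lists are illustrative.

```python
def new_partners(links_p, links_q):
    """Links of Q whose source word is in P but whose pairing is absent from P."""
    words_p = {word for link in links_p for word in link}
    pairs_p = {frozenset(link) for link in links_p}  # direction is ignored here
    return [
        (source, target)
        for source, target in links_q
        if source in words_p and frozenset((source, target)) not in pairs_p
    ]


links_p = [("mobile phone", "convenient"), ("mobile phone", "manners"), ("mobile phone", "phone")]
links_q = [("mobile phone", "convenient"), ("mobile phone", "charge")]
extra = new_partners(links_p, links_q)
```

The result contains the link from "mobile phone" to "charge", which corresponds to the arrow and node drawn with dotted lines.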

FIG. 15 shows a case where the same pair of words co-occurs in both graphs but the direction of the arrow, that is, the relationship between source node and target node, differs. In this graph, graph data P has only an arrow from the word "mobile phone" to the word "convenient", whereas graph data Q has both that arrow and an arrow from the word "convenient" to the word "mobile phone"; the head of the arrow from "convenient" to "mobile phone" is therefore enlarged for emphasis. Highlighting can likewise be applied when one graph data has no arrow at all or when the arrows point in opposite directions.
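Direction differences of this kind can be found by looking at arrows that exist in only one graph while the unordered word pair exists in both. This is a sketch under that interpretation; `direction_changes` is a hypothetical name.

```python
def direction_changes(links_p, links_q):
    """Arrows present in only one graph, for word pairs linked in both graphs."""
    p, q = set(links_p), set(links_q)
    # Unordered pairs that are linked, in some direction, in both graphs.
    common_pairs = {frozenset(l) for l in p} & {frozenset(l) for l in q}
    return sorted(link for link in p ^ q if frozenset(link) in common_pairs)


links_p = [("mobile phone", "convenient")]
links_q = [("mobile phone", "convenient"), ("convenient", "mobile phone")]
changed = direction_changes(links_p, links_q)
```

Here `changed` holds only the arrow from "convenient" to "mobile phone". The case where one graph has no arrow at all between a pair is deliberately excluded by the `common_pairs` filter and would need a separate check.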

FIG. 16 shows the case where the set of all graph data is the comparison target. The figure shows the nodes of all graph data V (set G), which includes graph data Q to U; the nodes and arrows of graph data Q, which is compared against all of this graph data, are drawn with bold lines, and the typeface inside those nodes is changed from that of the other nodes.

In the embodiment above, highlighting is limited to bold lines, dotted lines, shadows, and bold type because of the constraints of drawings. On screen, however, highlighting can also use color changes, semi-transparent display, and dynamic changes such as nodes and arrows appearing, disappearing, changing color, and scaling. A button for turning comparison display on or off may also be placed on the screen, with highlighting applied when the command for comparison display is issued, so that the difference made by the comparison display is easier to recognize.

The embodiment described above uses the text of mobile phone questionnaire responses as an example, but the object of analysis is not limited to this. It may be any other set of texts, or data on words that is already in a form that can be displayed as a graph.

In the embodiment described above, the system consists of a single device. However, the components may instead reside on devices connected via a network and together constitute one system.

In the embodiment described above, the program is stored on a magnetic disk device, but it may also be provided stored on another storage medium such as a CD-ROM.

DESCRIPTION OF SYMBOLS: 100 document analysis system, 102 display device, 104 input device, 200 computer main body, 202 graph selection unit, 204 comparison display control unit, 206 document input unit, 210 word extraction unit, 212 graph data storage unit, 218 graph data creation and storage unit, 252 word appearance data, 254 node data, 256 link data.

Claims

A document analysis program for causing a processing device to execute:
a graph data selection procedure for selecting a plurality of graph data, each including a plurality of node data having character string information of words appearing in a document and appearance frequency information for the words, and a plurality of link data that is information linking a plurality of the node data; and
a graph comparison display procedure for comparing and displaying the selected plurality of graph data.

The document analysis program according to claim 1, further causing the processing device to execute:
a document input procedure for inputting a plurality of documents;
a word extraction procedure for extracting, from the words included in the plurality of documents input in the document input procedure, a plurality of first words that are words having a high appearance frequency, and for extracting a plurality of second words that are words having a high appearance frequency in a subset of the documents that includes a first specific word, the first specific word being one of the first words; and
a graph data storage procedure for storing link data having information that links the first specific word and the second words of the subset.

The document analysis program according to claim 1, wherein a comparison item to be used in the comparison is further selected in the graph data selection procedure, and
the comparison display is performed in the graph comparison display procedure based on the selected comparison item.

The document analysis program according to any one of claims 1 to 3, wherein, in the graph comparison display procedure, for one graph data of the selected plurality of graph data, nodes containing the character string information are displayed based on the node data, and arrows connecting the nodes are displayed based on the link data.

The document analysis program according to claim 4, wherein, in the graph comparison display procedure, when the character string information included in a node of the one graph data matches character string information in other graph data among the plurality of graph data, the node or the character string information is highlighted.

The document analysis program according to claim 4, wherein, in the graph comparison display procedure, for the appearance frequency of a word relating to the same character string information that appears in both the one graph data and other graph data among the plurality of graph data, highlighting is applied so as to indicate that the appearance frequency ratio in the other graph data is high or low.

The document analysis program according to any one of claims 4 to 6, wherein, in the graph comparison display procedure, for the appearance frequency of a word relating to the same character string information that appears in both the one graph data and other graph data among the plurality of graph data, highlighting is applied so as to indicate that the rank of the appearance frequency in the one graph data is high or low.

The document analysis program according to any one of claims 4 to 7, wherein, in the graph comparison display procedure, when a combination of character string information joined by the information linking the node data in one link data of other graph data among the plurality of graph data matches any such combination in the one graph data in only one of its two pieces of character string information, the node relating to the other piece of character string information of the combination, or the character string information of that node, is highlighted.

The document analysis program according to any one of claims 4 to 8, wherein, in the graph comparison display procedure, for a combination of two different pieces of character string information included in the one graph data and the identical combination of character string information included in the other graph data, the arrow is highlighted when only one of the two combinations is joined by the information linking the node data included in the link data, or when both combinations are joined by such information but the order of the combination differs between them.

The document analysis program according to any one of the above, wherein the comparison result is displayed by superimposing other graph data of the plurality of graph data on the one graph data.

The document analysis program according to claim 10, wherein, when the superimposed display is performed, the nodes and the arrows are displayed dynamically through any of appearance, disappearance, color change, and enlargement or reduction.

The document analysis program according to any one of claims 4 to 11, wherein the one graph data among the plurality of graph data includes other graph data to be compared.

The document analysis program according to any one of the above, wherein the highlighting is performed by changing any of color, line thickness, and line type, or by displaying with a shadow, semi-transparently, or in bold type.

A document analysis system comprising:
a graph selection unit that selects a plurality of graph data, each including a plurality of node data having character string information of words appearing in a document and appearance frequency information for the words, and a plurality of link data that is information linking a plurality of the node data; and
a comparison display control unit that compares and displays the selected plurality of graph data.

The document analysis system according to claim 14, further comprising:
a document input unit that inputs a plurality of documents;
a word extraction unit that extracts, from the words included in the plurality of documents input by the document input unit, a plurality of first words that are words having a high appearance frequency, and that extracts a plurality of second words that are words having a high appearance frequency in a subset of the documents that includes a first specific word, the first specific word being one of the first words; and
a graph data storage unit that stores information that links the first specific word and the second words of the subset.