JP2009128949A

JP2009128949A - Graphic display device and program

Info

Publication number: JP2009128949A
Application number: JP2007299854A
Authority: JP
Inventors: Shoichi Tateno; 昌一舘野
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2007-11-19
Filing date: 2007-11-19
Publication date: 2009-06-11
Anticipated expiration: 2027-11-19
Also published as: JP5309537B2

Abstract

PROBLEM TO BE SOLVED: To suggest the overall tendency of a set of text units under analysis by displaying, as a graph, a plurality of representative words in the set and the relations among those words.

SOLUTION: A word appearance data storage part 150 holds word appearance data indicating which words appear in which messages. A frequency calculating part 170 calculates, for each target word, how many messages in the target message set contain that word. A word specifying part 180 specifies the L words ranked first to L-th in frequency, based on the frequencies calculated by the frequency calculating part 170. A subset deriving part 160 derives subsets, specifies the corresponding words, and generates node data and link data. A graph display part 190 acquires the node data and the link data from the subset deriving part 160, determines the network structure of the graph, and displays the graph.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a technique for displaying words contained in documents as a graph.

Large volumes of documents, such as free-form questionnaire answers and complaint documents (e-mails), are analyzed in order to extract the facts latent in them. For example, it is desirable to analyze the messages in a free-answer field, extract requests, thanks, satisfaction, demands, dissatisfaction, and the like, and feed them back into product development. Such extraction work is usually done by skilled workers over a long time, which poses problems of cost and immediacy.

It is desirable to support such work with computer-based language processing.

Patent Document 1 is related to the present invention. Patent Document 1 discloses representing as nodes those feature words whose frequency in the search-result documents is high relative to their frequency in the entire set of documents searched, and displaying a graph in which links are drawn between feature-word nodes based on the degree of co-occurrence of the feature words. However, that technique presupposes a search and does not support document analysis in general situations.

It should be understood that the background art and problems described above explain only part of the background of the present invention, and that the present invention is not limited to them.
Patent Document 1: JP-A-10-74210

The present invention has been made in view of the above circumstances, and its object is to provide a graph display technique that displays words as a graph in order to support analysis of an entire set of target documents by analyzing the words contained in it.

To achieve the above object, the present invention adopts the configurations set out in the claims. Before the invention is described in detail, the claims are explained here in a supplementary manner.

That is, according to one aspect of the present invention, to achieve the above object, a graph display device that displays, as a graph, words appearing in a sentence-unit set to be analyzed, the set comprising a plurality of sentence units, is provided with: storage means for storing, in association with each sentence unit, the words appearing in that sentence unit; frequency calculation means for calculating, with reference to the storage means, the frequency within a sentence-unit set of the words appearing in the sentence units of that set; word specifying means for specifying, with reference to the frequencies calculated by the frequency calculation means, up to L (L being an integer of 2 or more) words having the highest frequencies in the sentence-unit set; subset derivation means for deriving, for each word specified by the word specifying means, a subset of the sentence-unit set consisting of the sentence units that contain the word; and display means for displaying the words corresponding to the subsets, together with links from the word corresponding to a derivation-source sentence-unit set to the word corresponding to the subset derived from it. The subset derivation means derives subsets repeatedly, starting from the root sentence-unit set.

The graph display device can typically be configured as a stand-alone computer system, a group of computer systems connected to a network (a client-server system), or the like, but is not limited thereto.

A sentence unit is a block of text consisting of one or more sentences, and is sometimes called a message below. A sentence unit may also be a part of a sentence, such as a clause. A word is typically a noun, but is not limited thereto. Limiting the words to a specific part of speech, for example nouns, makes a simple display possible.

In this configuration, subsets are derived successively from a sentence-unit set with attention to the word frequencies local to that set, and the direction of derivation and the words corresponding to the derived subsets are displayed as a graph. A plurality of representative words in the entire sentence-unit set under analysis and the relations among those words are thereby shown as a graph, suggesting the tendency of the set as a whole.

The subsets may be derived by depth-first search or by breadth-first search. Subsets may also be derived in descending order of frequency in the derivation-source sentence-unit set.

The storage means may store one or more attributes for each sentence unit, and the sentence-unit set to be analyzed may be narrowed down according to the attributes. The attributes are typically gender and age (age range); the display may be switched according to the attribute value, or displays for different attribute values may be shown side by side, to suggest the characteristics associated with each attribute value.
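The narrowing by attribute described above can be sketched as follows. This is a minimal illustration only; the `Message` record, its field names, and the `narrow_by_age` helper are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record for one sentence unit (message) and its attributes."""
    number: int        # message number
    words: frozenset   # distinct words appearing in the message
    gender: str        # attribute: gender
    age: int           # attribute: age


def narrow_by_age(messages, low, high):
    """Keep only the messages whose age attribute lies in [low, high]."""
    return [m for m in messages if low <= m.age <= high]


messages = [
    Message(1, frozenset({"skin", "color"}), "F", 27),
    Message(2, frozenset({"price"}), "F", 45),
    Message(3, frozenset({"skin", "finish"}), "M", 29),
]

# Messages of respondents aged 25 to 29 become the whole set to be analyzed.
young = narrow_by_age(messages, 25, 29)
```

The narrowed list can then be used as the root set for frequency calculation, so that one graph is produced per attribute value.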

The word specifying means may prohibit the same word from being specified a predetermined number of times or more. For example, the same word may be allowed to be specified only once.

According to another aspect of the present invention, a graph display device that displays, as a graph, words appearing in a sentence-unit set to be analyzed, the set comprising a plurality of sentence units, is provided with: storage means for storing, in association with each sentence unit, the words appearing in that sentence unit; frequency calculation means for calculating, with reference to the storage means, the frequency with which a first word and a second word forming a pair co-occur in the sentence units of the set to be analyzed; word specifying means for specifying, for each pair whose frequency calculated by the frequency calculation means exceeds a threshold, the first word and the second word of the pair, except where the frequency of the pair does not rank within the top L (L being an integer of 2 or more) among the pairs sharing the same first word; and display means for displaying a graph consisting of the first and second words specified by the word specifying means and the links connecting them.

In this configuration, words can be displayed as a graph based on their relatedness, with attention to the co-occurrence frequencies of words across the entire target sentence-unit set. Screening the co-occurrence frequencies against a threshold prevents the graph display from becoming cluttered.
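The co-occurrence screening of this aspect might be sketched as below. The sketch rests on assumptions not stated in the specification: pairs are treated as ordered (each unordered pair is counted in both directions), a pair is counted once per message, and ties in rank are broken alphabetically. The function name `cooccurrence_edges` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_edges(messages, threshold, L):
    """Return (first, second, freq) edges kept by the threshold and top-L rules."""
    freq = Counter()
    for words in messages:
        # Count each ordered pair of distinct words once per message.
        for first, second in permutations(sorted(set(words)), 2):
            freq[(first, second)] += 1

    # Group the pairs by their first word.
    by_first = defaultdict(list)
    for (first, second), n in freq.items():
        by_first[first].append((n, second))

    edges = []
    for first, candidates in by_first.items():
        # Rank the pairs sharing this first word (ties broken alphabetically).
        candidates.sort(key=lambda c: (-c[0], c[1]))
        for n, second in candidates[:L]:
            if n > threshold:  # keep only pairs whose frequency exceeds the threshold
                edges.append((first, second, n))
    return sorted(edges)


messages = [
    {"skin", "color", "fit"},
    {"skin", "color"},
    {"skin", "fit"},
    {"price", "quality"},
]
edges = cooccurrence_edges(messages, threshold=1, L=1)
```

With this toy data the pair ("price", "quality") co-occurs only once and is dropped by the threshold, while each remaining first word keeps only its single strongest partner.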

The present invention can be realized not only as a device or a system but also as a method. Part of such an invention can of course be implemented as software. Software products used to cause a computer to execute such software are naturally also included in the technical scope of the present invention.

The above and other aspects of the present invention are set out in the claims and are described in detail below by way of embodiments.

According to the present invention, subsets are derived successively from a sentence-unit set with attention to the word frequencies local to that set, and the direction of derivation and the words corresponding to the derived subsets are displayed as a graph. A plurality of representative words in the entire sentence-unit set under analysis and the relations among those words are thereby shown as a graph, suggesting the tendency of the set as a whole.

Embodiments of the present invention are described below.

FIG. 1 shows, as a whole, a graph display system 110 of Embodiment 1, which illustrates the basic configuration of the present invention. In this example the system is realized as software on a computer 200. The software can be installed on the computer 200 by well-known methods, using a recording medium 201 or a communication line. The figure shows a stand-alone configuration, but the system may instead be composed of a server device and a client device connected by a network. Each functional block of the graph display system 110 of Embodiment 1 is typically realized by software and the hardware resources of the computer 200 working together.

The graph display system 110 is used, for example, to analyze questionnaire answers: it presents the tendency of the free-form messages contained in the answers (sentence units consisting of one or more sentences) by displaying a graph of the words contained in the messages. Fixed-format answers (multiple-choice options or fixed words) may be used together with them.

In this embodiment, as shown in FIG. 2, the words to be displayed and the relations (links) between the words are specified. Specifically, the following processing is performed.

(1) The frequencies of the words contained in the message set are obtained.
(2) L subsets are created in descending order of frequency. L is sometimes called the width below.
(3) Process (2) is performed on each of the subsets. This is repeated M times. M is sometimes called the depth below.
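Steps (1) to (3) can be sketched as a short recursive routine. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: messages are modeled as sets of words, ties in frequency are broken alphabetically, and words already chosen on the path from the root are excluded from re-selection (otherwise the word that defines a subset would trivially rank first inside it). The names `top_words` and `derive` are hypothetical.

```python
from collections import Counter


def top_words(messages, L, exclude=()):
    """Step (1): count messages per word; return up to L top-ranked (word, count)."""
    counts = Counter(
        w for words in messages for w in set(words) if w not in exclude
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:L]


def derive(messages, L, M, parent=None, path=(), links=None):
    """Steps (2) and (3): derive subsets with width L and depth M.

    Returns a list of (parent word, word, message count); parent is None at the root.
    """
    if links is None:
        links = []
    if M == 0 or not messages:
        return links
    for word, n in top_words(messages, L, exclude=path):
        # The subset consists of the messages that contain the selected word.
        subset = [words for words in messages if word in words]
        links.append((parent, word, n))
        derive(subset, L, M - 1, parent=word, path=path + (word,), links=links)
    return links


messages = [
    {"skin", "color"},
    {"skin", "fit"},
    {"skin", "color", "fit"},
    {"price"},
]
links = derive(messages, L=2, M=2)
```

Each returned triple corresponds to one link from the word of the derivation-source set to the word of the derived subset.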

In this way, subsets are derived with width L and depth M, and the words used to create the derived subsets are selected as the words to be displayed. The relation from a derivation-source set to a derived set is represented as a link from the word specified for the derivation-source set to the word specified for the derived set, and a graph consisting of the specified words and links is generated and displayed.

In FIG. 1, the graph display system 110 includes a word appearance data storage unit 150, a frequency calculation unit 170, a word specifying unit 180, a subset derivation unit 160, a graph display unit 190, and the like.

The word appearance data storage unit 150 holds word appearance data having, for example, the data structure shown in FIG. 4. In this example, each word appearance record includes a message number, a word ID, and the values of a plurality of attributes assigned to the message (questionnaire answer), but is not limited thereto. The attributes are gender, age (age range), area of residence, and the like. When one message contains N distinct words, N word appearance records are prepared. The words for which records are prepared may be limited by part of speech, for example to nouns.
A specific example of word appearance data, although not limited to this, is shown in FIG. 5; it comprises the message number, the dependency relations whose head (receiving part) is a predicate, the message (sentence), and the attribute values of the message (the "fact" column). For example, the record in the top row has message number 10419, the predicate (head) 「合う」 ("to suit"), and the noun part (dependent) 「肌」 ("skin"). Because the dependents can be displayed separately according to the particles 「は」 (wa), 「が」 (ga), 「を」 (o), 「に」 (ni), 「で」 (de), and so on, the overall meaning can be grasped easily just by looking at the dependency relations.

The frequency calculation unit 170 calculates how many messages in the target message set contain each target word. Typically it counts, for each word ID, the number of word appearance records having that word ID, but any method may be used as long as it produces a histogram of the number of messages per word.
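As a rough sketch of this calculation, assume the word appearance data is reduced to (message number, word ID) pairs; the attribute columns are omitted, and the word IDs and the function name `message_frequency` are hypothetical.

```python
from collections import Counter

# One record per distinct word per message: (message number, word ID).
records = [
    (10419, "w1"), (10419, "w2"),
    (10420, "w1"),
    (10421, "w3"), (10421, "w1"),
]


def message_frequency(records, target_messages):
    """Count, for each word ID, the messages in the target set that contain it."""
    # Building a set guards against duplicate records for the same message and word.
    seen = {(msg, wid) for msg, wid in records if msg in target_messages}
    return Counter(wid for _, wid in seen)


# Histogram of the number of messages per word over the whole message set.
freq = message_frequency(records, {10419, 10420, 10421})
```

Passing a smaller set of message numbers as `target_messages` gives the frequencies local to a derived subset.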

The word specifying unit 180 specifies, based on the frequencies calculated by the frequency calculation unit 170, the L words ranked first to L-th in frequency.

The subset derivation unit 160 derives subsets by the procedure described with reference to FIG. 2. In doing so, it designates the derivation-source set to the frequency calculation unit 170 and obtains the specified words from the word specifying unit 180. The subset derivation unit 160 obtains node data such as that shown in FIG. 6(a) from the specified words, and obtains link data such as that shown in FIG. 6(b) based on the derivation source and the derivation destination of each subset. The node data includes a node ID, a word ID, the word (character string), the number of messages (number of elements) in the subset corresponding to the node, and the like, but is not limited thereto. The link data includes a link ID, the node ID of the source node (parent, derivation source), the node ID of the target node (child, derivation destination), and the like, but is not limited thereto.
The graph display unit 190 acquires the node data and the link data from the subset derivation unit 160, determines the network structure of the graph, and displays the graph. The network structure can be determined by any method; for example, although not limited to these, "KeyGraph" (http://www2.kke.co.jp/keygraph) or "GraphViz" (http://i.loverruby.net/ja/rhg/cd/graphviz.html) can be used. A display example of the graph display unit 190 is shown in FIG. 9, in which the words appearing across the entire target message set are connected by links. The size, thickness, color, and the like of the words (nodes) and links may be varied according to the message frequency.
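One way to hand node data and link data to a layout tool is to serialize them in the Graphviz DOT language. The sketch below assumes tuples shaped like the fields listed above; the `to_dot` helper, the `n` prefix on node names, and the rule of scaling `fontsize` with the message count are illustrative choices, not part of the specification.

```python
def to_dot(nodes, links):
    """Serialize node data and link data as a Graphviz DOT digraph.

    nodes: (node ID, word ID, word, number of messages in the subset)
    links: (link ID, source node ID, target node ID)
    """
    lines = ["digraph words {"]
    for node_id, _word_id, word, count in nodes:
        # Larger subsets get a larger label (an illustrative scaling rule).
        lines.append(f'  n{node_id} [label="{word}", fontsize={10 + count}];')
    for _link_id, source, target in links:
        lines.append(f"  n{source} -> n{target};")
    lines.append("}")
    return "\n".join(lines)


nodes = [(1, "w1", "skin", 3), (2, "w2", "color", 2)]
links = [(1, 1, 2)]
dot = to_dot(nodes, links)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command, which then decides the placement of the nodes.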

FIG. 3 shows an example of the operation of this embodiment. The processing is as follows.

[Step S10]: The entire set of messages is set as the target of frequency calculation.
[Step S11]: The word frequencies are calculated.
[Step S12]: Up to L words with the highest frequencies are selected. For example, when subsets are searched depth-first (backtracking), the subsets are searched in the order 1 to N as shown in FIG. 6(a) (an example with L = 2), selecting the single top-ranked word at each step. When subsets are searched breadth-first, the subsets are searched in the order 1 to N as shown in FIG. 6(b) (an example with L = 2).
[Step S13]: Node data and link data are generated and stored. They may be stored in the word appearance data storage unit 150 or in any other storage means.
[Step S14]: The subset corresponding to the selected word is set as the target of frequency calculation.
[Step S15]: It is determined whether the end condition is satisfied. If not, the process returns to step S11; if so, the process proceeds to step S16. The end condition is, for example, that words have been specified down to the designated depth, that the upper-limit number of words has been specified, or that no set containing at least the lower-limit number of messages remains, but is not limited thereto.
[Step S16]: The graph display unit 190 generates graph data from the node data and the link data by a predetermined graph generation method and displays the graph.
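Steps S10 to S15 can be sketched as a breadth-first loop over a queue of message sets. This is a hedged illustration: the function `build_graph`, its parameters, and the tuple layouts are hypothetical; ties are broken alphabetically; and each word is allowed to be specified only once, which is one of the options the specification mentions rather than a requirement. Step S16 (layout and display) is left to a separate tool.

```python
from collections import Counter, deque


def build_graph(messages, L, max_depth, min_messages=1, max_words=None):
    """Breadth-first sketch of steps S10 to S15.

    Returns (nodes, links): nodes are (node ID, word, message count) and
    links are (source node ID, target node ID); node ID 0 stands for the root set.
    """
    nodes, links = [], []
    used = set()  # assumption: the same word is specified at most once
    queue = deque([(0, messages, 0)])  # S10: start from the entire message set
    while queue:  # S15: stop when no set remains to be processed
        parent, current, depth = queue.popleft()
        if depth >= max_depth or len(current) < min_messages:
            continue  # end conditions: depth reached, or set too small
        # S11: number of messages containing each not-yet-used word
        counts = Counter(
            w for words in current for w in set(words) if w not in used
        )
        # S12: up to L words with the highest frequencies
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:L]
        for word, n in ranked:
            if max_words is not None and len(nodes) >= max_words:
                return nodes, links  # end condition: upper limit of words
            used.add(word)
            node_id = len(nodes) + 1
            nodes.append((node_id, word, n))  # S13: node data
            links.append((parent, node_id))   # S13: link data
            subset = [words for words in current if word in words]
            queue.append((node_id, subset, depth + 1))  # S14: next target set
    return nodes, links


messages = [
    {"skin", "color"},
    {"skin", "fit"},
    {"skin", "color", "fit"},
    {"price"},
]
nodes, links = build_graph(messages, L=2, max_depth=2)
```

Replacing `queue.popleft()` with `queue.pop()` would turn the traversal into a depth-first one, although the order in which sibling subsets are visited would then be reversed.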

FIG. 8 shows the message frequency for each word in the entire set of messages as a bar graph (FIG. 8(a)) and also as a word graph (with M set to 1). Because the hierarchical processing shown in FIG. 2 is not performed in this example, the graph merely shows which words appear frequently.

FIG. 9 shows a display example of a graph generated, using the graph display system 110 of this embodiment, from a message set prepared from the answers to a free-form cosmetics questionnaire ("What you like about the foundation you currently use"). By following the links of the graph, one can grasp what is being said. For example, each cluster of words is treated as a subgraph and a meaning is extracted from each. In the example of FIG. 9, one can see that the meanings include (1) a shade that suits oneself, (2) price and quality, (3) finish, and (4) spreadability.

FIGS. 10 and 11 are graphs stratified by age range. FIG. 10 is a graph generated by taking, on the basis of the age attribute value, the messages of respondents aged 25 to 29 as the whole set, and FIG. 11 is a graph generated by taking the messages of respondents aged 40 to 49 as the whole set. From FIG. 10 it can be seen that, regarding feel in use, respondents mention the product being additive-free and their skin being sensitive. From FIG. 11 it can be seen that, regarding coverage and sheer finish, respondents mention spots and also the UV effect.

In this embodiment, the arrangement of the words distributed over the graph makes it easy to grasp what matters are being talked about.

Next, a graph display system 100 of Embodiment 2, a concrete application of the present invention, is described.

FIG. 12 shows, as a whole, the graph display system 100 of Embodiment 2, a concrete embodiment of the present invention. In this example the graph display system 100 is realized as software on the computer 200. The software can be installed on the computer 200 by well-known methods, using the recording medium 201 or a communication line. The figure shows a stand-alone configuration, but the system may instead be composed of a server device and a client device connected by a network. Each functional block of the graph display system 100 of Embodiment 2 is typically realized by software and the hardware resources of the computer 200 working together. In FIG. 12, parts corresponding to those in FIG. 1 are given corresponding reference numerals.

In FIG. 12, the graph display system 100 includes a document input unit 10, a morphological analysis unit 11, a syntax analysis unit 12, a syntax analysis result data storage unit 13, a dependency relation extraction unit 14, a dependency relation set storage unit 15, a subset derivation unit 16, a frequency calculation unit 17, a word specifying unit 18, a graph display unit 19, and the like. In this example a set of dependency relations is extracted from document data consisting of a group of sentences and stored in the dependency relation set storage unit 15, but a dependency relation set may instead be obtained from outside and stored in the dependency relation set storage unit 15.

In this embodiment, the dependency relation data extracted from the document data (called basic semantic chunks, as described below, in which dependency relations are grouped by a shared head such as a predicate clause) can also be searched by dependent word, head word, or other words with attention to the dependency relations. Typically, the dependent words and head words are displayed side by side to help the user grasp the sentences.

The document input unit 10 inputs document data (also called a corpus), such as a group of questionnaire answers (free-form answers, i.e. messages) or a group of e-mails. The document data may be preprocessed as appropriate for the subsequent morphological analysis and other processing. The document input unit 10 may be anything that can input document data, and is composed of, for example, a file system, an external storage device, a communication line, or an I/O device. The document input unit 10 may be a system that receives messages such as questionnaire answers or e-mails, or may be a character recognition device, a speech recognition device, or the like. An example of document data is shown in FIG. 14; in this example, sentences obtained from the results of a cosmetics questionnaire are managed with a sentence number assigned to each.

The morphological analysis unit 11 decomposes sentences into morphemes by any well-known morphological analysis method, with reference to a morphological analysis dictionary. Morphological analysis is performed, for example, as shown in FIG. 15.

The syntax analysis unit 12 parses the morphological analysis results based on syntax rules, using any well-known parsing method. That is, as shown in FIG. 13, data for a group of sentences (a corpus) is input by the document input unit 10 (S110). The morphological analysis unit 11 takes the data of one sentence as the processing target and performs morphological analysis, and the syntax analysis unit 12 performs syntax analysis based on the morphological analysis result (S111 to S113). The syntax analysis result is registered in the syntax analysis result data storage unit 13, and the above processing is repeated for all sentences (S114, S115). A syntax analysis result is, for example, as shown in FIG. 16; represented as a tree for ease of understanding, it appears as shown in FIG. 17.

The dependency relation extraction unit 14 applies dependency relation extraction rules to the syntax analysis result data, extracts a set of dependency relations, and records it in the dependency relation set storage unit 15. The dependency relation extraction rules are, for example, as shown in FIG. 19, where "*" stands for any number of subtrees (elements forming part of a sentence's parse tree). This example extracts adverbial (ren'yo, predicate-modifying) dependency relations, but the same applies to adnominal (rentai, noun-modifying) dependency relations. Using the extraction rules, dependency relations can be extracted as indicated, for example, by the arrows in FIG. 20. That example shows adverbial dependency relations.

As shown in FIG. 18, the dependency relation extraction unit 14 inputs the extraction rules (S120), extracts the matching dependency relations from the syntax analysis result data (S121), and stores them in the dependency relation set storage unit 15 (S122).

The extracted dependency relation data is expressed in a form in which relations sharing the same head, such as a predicate clause, are grouped together. A group has zero or more dependent parts. Such data is hereinafter also called a basic semantic chunk. A basic semantic chunk is represented, for example, as a data structure in the fact format of the programming language Prolog; FIG. 21 shows an example. This data structure consists of two kinds of data: the basic semantic chunk and the chunk predicate. In the example of FIG. 21, "1" is the sentence number; 「紹介，する，た」 is the morpheme sequence of the predicate clause with each morpheme in its base (dictionary) form; "23, 31" is the byte offset indicating where it appears; 「太郎，は」 with "5, 11", 「花子，を」 with "17, 23", and 「次郎，に」 with "11, 17" are the morpheme sequences of the dependent parts and the byte offsets indicating where each appears; "3" is the number of dependency relations; and 「紹介＿例文」 ("introduction_example sentence") is the name of the corpus.
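For readers less familiar with Prolog facts, the content of such a chunk can be mirrored in a small record type. The sketch below is only an analogy: the class and field names (`Phrase`, `Chunk`, `span`, and so on) are hypothetical, FIG. 21 itself is not reproduced here, and the values are the ones quoted in the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Phrase:
    morphemes: Tuple[str, ...]  # morpheme sequence in base form
    span: Tuple[int, int]       # byte offsets of the phrase in the sentence


@dataclass
class Chunk:
    """Hypothetical mirror of one basic semantic chunk."""
    sentence_no: int            # sentence number
    predicate: Phrase           # shared head (predicate clause)
    dependents: List[Phrase]    # zero or more dependent parts
    corpus: str                 # name of the corpus


chunk = Chunk(
    sentence_no=1,
    predicate=Phrase(("紹介", "する", "た"), (23, 31)),
    dependents=[
        Phrase(("太郎", "は"), (5, 11)),
        Phrase(("花子", "を"), (17, 23)),
        Phrase(("次郎", "に"), (11, 17)),
    ],
    corpus="紹介＿例文",
)

# Index the dependents by their trailing particle (は, を, に, ...).
by_particle = {p.morphemes[-1]: p.morphemes[0] for p in chunk.dependents}
```

Indexing by particle in this way corresponds to displaying the dependents separately for each particle, so that who introduced whom to whom can be read off directly.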

The dependency relation set storage unit 15 stores the dependency relation set (the set of basic semantic chunks). The dependency relation set storage unit 15 corresponds to the word appearance data storage unit 150 of Embodiment 1, and the dependency relation set is, for example, the same as that shown in FIG. 5.

The frequency calculation unit 17, the word specifying unit 18, the subset derivation unit 16, and the graph display unit 19 correspond to the frequency calculation unit 170, the word specifying unit 180, the subset derivation unit 160, and the graph display unit 190 of Embodiment 1, respectively.

That is, the frequency calculation unit 17 calculates, for each target word, how many messages in the target message set contain that word. Typically, it counts for each word the number of messages that include the word.

Based on the frequencies calculated by the frequency calculation unit 17, the word specifying unit 18 specifies the L words ranked first through L-th by frequency.
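The two roles just described, counting the messages that contain each word and then taking the L most frequent words, can be sketched as follows. This is an illustrative Python sketch, not the implementation of the embodiment; the function names, the representation of word appearance data as a mapping from message ID to a set of words, and the alphabetical tie-break are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def message_frequencies(messages: Dict[int, Set[str]],
                        target_ids: Iterable[int]) -> Counter:
    """For each word, count the messages in the target set that contain it."""
    freq: Counter = Counter()
    for mid in target_ids:
        # Each message holds a set of words, so a word is counted
        # at most once per message.
        freq.update(messages[mid])
    return freq


def top_words(freq: Counter, L: int) -> List[Tuple[str, int]]:
    """Return the words ranked 1st to L-th by frequency.

    Ties are broken alphabetically (an assumption; the text does not say).
    """
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:L]


messages = {
    1: {"phone", "child"},
    2: {"phone", "price"},
    3: {"phone", "child", "school"},
}
freq = message_frequencies(messages, messages.keys())
print(top_words(freq, 2))
```

Passing the target message IDs separately lets the same function count frequencies within any subset, not only the whole set.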

The subset derivation unit 16 derives subsets by the procedure described earlier with reference to FIG. 2. In doing so, it designates the derivation-source set to the frequency calculation unit 17 and obtains the specified words from the word specifying unit 18. From the specified words, the subset derivation unit 16 obtains node data such as that shown in FIG. 6(a), and from the derivation source and derivation destination of each subset it obtains link data such as that shown in FIG. 6(b). The node data includes, but is not limited to, a node ID, a word ID, the word (character string), and the number of messages (number of elements) in the subset corresponding to the node. The link data includes, but is not limited to, a link ID, the node ID of the source node (parent, derivation source), and the node ID of the target node (child, derivation destination).

The graph display unit 19 obtains the node data and link data from the subset derivation unit 16, determines the network structure of the graph, and displays the graph. The network structure can be determined by any method. Display examples are described below. The size, thickness, color, and so on of the words (nodes) and links may be varied according to message frequency.
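The repeated subset derivation and the resulting node and link data can be sketched as follows. This is a hedged illustration in Python, not the procedure of FIG. 2 itself, which is not reproduced in this section. It assumes breadth-first expansion limited by a fan-out and a depth, ignores the fan-in limit, skips words already used on the path from the root (otherwise a parent's word would always rank first in its own subset), and omits the word ID from the node data. The function and key names are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, List, Set, Tuple


def derive_graph(messages: Dict[int, Set[str]], fan_out: int,
                 max_depth: int) -> Tuple[List[dict], List[dict]]:
    """Derive subsets from the root set and return (node data, link data)."""
    # Node 0 stands for the whole message set (the root).
    nodes = [{"node_id": 0, "word": None, "count": len(messages)}]
    links: List[dict] = []
    pending = deque([(0, frozenset(messages), frozenset(), 0)])
    while pending:
        parent_id, subset, used, depth = pending.popleft()
        if depth == max_depth:
            continue
        # Frequency of each not-yet-used word within the source subset.
        freq: Counter = Counter()
        for mid in subset:
            freq.update(messages[mid] - used)
        ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:fan_out]
        for word, count in ranked:
            child_id = len(nodes)
            # Destination subset: messages of the source subset containing the word.
            child = frozenset(m for m in subset if word in messages[m])
            nodes.append({"node_id": child_id, "word": word, "count": count})
            links.append({"link_id": len(links),
                          "source": parent_id, "target": child_id})
            pending.append((child_id, child, used | {word}, depth + 1))
    return nodes, links


messages = {
    1: {"phone", "child"},
    2: {"phone", "price"},
    3: {"phone", "child", "school"},
}
nodes, links = derive_graph(messages, fan_out=2, max_depth=2)
for link in links:
    print(nodes[link["source"]]["word"], "->", nodes[link["target"]]["word"])
```

Without a fan-in limit, depth-first and breadth-first expansion would produce the same set of nodes and links in a different order; the order starts to matter once the number of times a word may be specified is capped.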

FIG. 22 shows an example of a user interface for setting the graph display parameters in this embodiment. In this example, the subset derivation mode is selected from depth-first and breadth-first with radio buttons R1 and R2, the fan-out (width L) and the depth M are entered in input forms F1 and F2, and the fan-in is specified (the number of links entering a node, that is, the number of times the same word may be specified to select a subset). As display options, showing or hiding the whole-set node and the frequency display can be specified with radio buttons R3 and R4. Default values may be provided for the parameters. Once the graph display parameters are set, operating the graph display button B starts the graph display processing, which is typically executed as shown in FIG. 3.

FIG. 23 shows the graph obtained with fan-out 10, depth 2, fan-in 1, breadth-first, whole-set node shown, and frequency shown (the number of messages in the target node). Because this example is breadth-first, the high-frequency words are displayed, which makes the topics easy to grasp. However, the relationships among the topics (high-frequency words) are somewhat harder to grasp.

FIG. 24 shows the graph obtained with fan-out 10, depth 4, fan-in 1, depth-first, whole-set node shown, and frequency shown. In this example, high-frequency words (topics) can be linked one after another through their co-occurrence relationships, which makes the relationships among topics easy to grasp. However, when the depth is set small, even a high-frequency word may not be displayed (when searching from the left, a word at the right end of an upper level).

FIG. 25 shows the graph obtained with fan-out 10, depth 3, fan-in 2, breadth-first, whole-set node shown, and frequency hidden. In this example, the larger fan-in makes the relationships among topics (words) easier to grasp, but the graph display becomes cluttered.

FIG. 26 shows the graph obtained with fan-out 10, depth 3, fan-in 3, breadth-first, whole-set node shown, and frequency hidden. In this example, the larger fan-in makes the relationships among topics (words) easier to grasp, but the graph display becomes even more cluttered.

FIG. 27 shows the graph obtained with fan-out 10, depth 5, fan-in 1, breadth-first, whole-set node shown, and frequency hidden. In this example, the greater depth allows the relationships among topics to be followed over a wider range. However, the added nodes can also act as noise when trying to grasp the topics. Important topics may be emphasized by displaying the frequency (the frequency of messages in the target node, not the frequency of the word in the whole message set) or by changing the color according to the frequency.

FIG. 28 shows the graph obtained with fan-out 10, depth 3, fan-in 1, breadth-first, whole-set node shown, and frequency hidden. In this example, a broad range of topics can be displayed.

In the display examples above, the user may designate a region of interest, as enclosed by the chain line in FIG. 9 and elsewhere, and the region may be displayed distinguished by a color attribute or the like.

Next, the graph display system 120 of Embodiment 3 of the present invention will be described.

The graph display system 120 of this embodiment selects and displays words by "edge frequency priority" in addition to depth-first and breadth-first. Here, the "edge frequency" is the number X of messages that contain another word W within the set of messages that contain a given word V, taken from the whole target message set. This is written V→W(X). For example, mobile phone→child(100) means that, among the messages containing the word "mobile phone", 100 also contain the word "child". The word at the tail of the arrow is called the source, and the word at its head is called the target.
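The definition of the edge frequency V→W(X) can be written directly as a count. The sketch below is illustrative Python; the function name and the representation of messages as a mapping from message ID to a set of words are assumptions. Under this reading X is the number of messages containing both words, so V→W and W→V give the same count and the arrow only records which word is treated as the source.

```python
from typing import Dict, Set


def edge_frequency(messages: Dict[int, Set[str]], v: str, w: str) -> int:
    """Number of messages containing w among the messages containing v."""
    return sum(1 for words in messages.values() if v in words and w in words)


messages = {
    1: {"mobile phone", "child"},
    2: {"mobile phone", "price"},
    3: {"mobile phone", "child", "school"},
    4: {"child", "school"},
}
x = edge_frequency(messages, "mobile phone", "child")
print(f"mobile phone -> child ({x})")
```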

FIG. 29 shows the graph display system 120 of Embodiment 3 as a whole; in this figure, parts corresponding to those in FIG. 1 are given corresponding reference numerals. In this example, the system is realized as software on the computer 200. The software can be installed on the computer 200 by well-known methods, using the recording medium 201 or a communication line. The figure shows a stand-alone configuration, but the system may instead consist of a server device and a client device connected via a network. Each functional block of the graph display system 110 of Embodiment 1 is typically realized by the software and the hardware resources of the computer 200 working together.

In FIG. 29, the graph display system 120 includes a word appearance data storage unit 150, an edge frequency calculation unit 171, a word pair specifying unit 181, a graph display unit 190, and so on. Although not shown in FIG. 29, the graph display system 120 also has the configuration shown in FIG. 1. It may of course consist only of the parts needed for edge-frequency-priority display. The edge frequency calculation unit 171 refers to the word appearance data storage unit 150 and calculates the edge frequency V→W(X) for two words V and W. The word pair specifying unit 181 sorts the word pairs (V, W) by edge frequency V→W(X), specifies the word pairs whose frequency is greater than a set frequency, and generates node data and link data. Alternatively, a predetermined number of top-ranked word pairs may be specified. The graph display unit 190 displays a graph based on the node data and link data received from the word pair specifying unit 181.

FIG. 30 illustrates an example of the operation of the graph display system 120; the details are as follows.

[Step S20]: Set the whole set of messages as the target.
[Step S21]: Calculate the edge frequencies V→W(X).
[Step S22]: Sort the word pairs (V, W) by edge frequency V→W(X).
[Step S23]: Specify the word pairs (V, W) whose edge frequency exceeds a threshold. The selection may instead use other conditions, such as the number of word pairs.
[Step S24]: Generate node data and link data from the specified word pairs (V, W).
[Step S25]: Display a graph containing the nodes and links.
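Steps S20 to S24 can be sketched end to end as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of the embodiment: the function name and dictionary keys are hypothetical, every ordered pair (V, W) is counted so both directions of a co-occurring pair appear, ties are broken alphabetically, and the actual drawing of step S25 is left out, so the function only returns the node and link data.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple


def edge_priority_graph(messages: Dict[int, Set[str]],
                        threshold: int) -> Tuple[List[dict], List[dict]]:
    """Build node and link data from word pairs whose edge frequency exceeds threshold."""
    # S20/S21: over the whole message set, count co-occurrences of each
    # ordered word pair (V, W); this equals the edge frequency V -> W (X).
    edge_freq: Counter = Counter()
    for words in messages.values():
        edge_freq.update(permutations(sorted(words), 2))

    # S22: sort the word pairs by edge frequency, highest first.
    ranked = sorted(edge_freq.items(), key=lambda kv: (-kv[1], kv[0]))

    # S23: keep the pairs whose edge frequency exceeds the threshold.
    selected = [(pair, x) for pair, x in ranked if x > threshold]

    # S24: generate node data and link data from the selected pairs.
    node_ids: Dict[str, int] = {}
    links: List[dict] = []
    for (v, w), x in selected:
        for word in (v, w):
            node_ids.setdefault(word, len(node_ids))
        links.append({"link_id": len(links), "source": node_ids[v],
                      "target": node_ids[w], "frequency": x})
    nodes = [{"node_id": i, "word": word} for word, i in node_ids.items()]
    return nodes, links


messages = {
    1: {"phone", "child"},
    2: {"phone", "child", "price"},
    3: {"phone", "price"},
    4: {"child"},
}
nodes, links = edge_priority_graph(messages, threshold=1)
for link in links:
    print(nodes[link["source"]]["word"], "->",
          nodes[link["target"]]["word"], f"({link['frequency']})")
```

Raising the threshold prunes low-frequency pairs and any word that only appears in such pairs, which is the pruning effect the edge frequency threshold is meant to provide.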

FIG. 31 shows an example of a user interface for setting the graph display parameters. In this example, the word search mode (subset derivation mode) includes edge frequency priority in addition to breadth-first and depth-first and is specified with radio button R4, and the edge frequency threshold for display is specified in input form F4.

FIG. 32 shows the graph obtained with edge frequency priority, fan-in 3, edge frequency threshold 10, frequency hidden, and whole-set node shown. In this example too, setting the edge frequency threshold appropriately prunes the graph, so that the topics and the relationships among them can be displayed without clutter. Pruning may also be performed by limiting the number of words to be displayed.

The scope of the present invention is determined by the claims and is not limited to the specific configurations, problems, and effects of the embodiments. The present invention is not limited to the embodiments described above, and various modifications can be made without departing from its spirit. For example, clicking a word node in the graph display may cause the dependency relations or messages containing the corresponding word to be displayed as shown in FIG. 5.

FIG. 1 is a block diagram illustrating the configuration of Embodiment 1 of the present invention.
FIG. 2 is a diagram schematically illustrating the operation of Embodiment 1.
FIG. 3 is a flowchart illustrating an operation example of Embodiment 1.
FIG. 4 is a diagram illustrating the word appearance data of Embodiment 1.
FIG. 5 is a diagram illustrating the dependency-relation word appearance data of Embodiment 1.
FIG. 6 is a diagram illustrating the node data and link data of Embodiment 1.
FIG. 7 is a diagram illustrating an example of deriving subsets depth-first and breadth-first in Embodiment 1.
FIG. 8 is a diagram illustrating a conventional display mode.
FIG. 9 is a diagram illustrating a display example of Embodiment 1.
FIG. 10 is a diagram illustrating a display example of Embodiment 1 stratified by attribute value.
FIG. 11 is a diagram illustrating another display example of Embodiment 1 stratified by attribute value.
FIG. 12 is a block diagram showing the overall configuration of Embodiment 2 of the present invention.
FIG. 13 is a flowchart illustrating an example of the syntax analysis result acquisition operation of Embodiment 2.
FIG. 14 is a diagram illustrating an example of the document data of Embodiment 2.
FIG. 15 is a diagram illustrating an example of the morphological analysis of Embodiment 2.
FIG. 16 is a diagram illustrating an example of the syntax analysis result of Embodiment 2.
FIG. 17 is a diagram illustrating the tree-structure representation of the syntax analysis result of Embodiment 2.
FIG. 18 is a flowchart illustrating an operation example of the dependency relation extraction of Embodiment 2.
FIG. 19 is a diagram illustrating an example of the dependency relation extraction rules of Embodiment 2.
FIG. 20 is a diagram illustrating an example of the dependency relation extraction result of Embodiment 2.
FIG. 21 is a diagram illustrating an example of the dependency relation data structure of Embodiment 2.
FIG. 22 is a diagram illustrating an example of a user interface for setting the graph display parameters of Embodiment 2.
FIG. 23 is a diagram illustrating a graph display example of Embodiment 2.
FIG. 24 is a diagram illustrating another graph display example of Embodiment 2.
FIG. 25 is a diagram illustrating another graph display example of Embodiment 2.
FIG. 26 is a diagram illustrating another graph display example of Embodiment 2.
FIG. 27 is a diagram illustrating another graph display example of Embodiment 2.
FIG. 28 is a diagram illustrating another graph display example of Embodiment 2.
FIG. 29 is a block diagram showing the overall configuration of Embodiment 3 of the present invention.
FIG. 30 is a flowchart illustrating an operation example of Embodiment 3.
FIG. 31 is a diagram illustrating an example of a user interface for setting the graph display parameters of Embodiment 3.
FIG. 32 is a diagram illustrating a graph display example of Embodiment 3.

Explanation of symbols

110 Graph display system
150 Word appearance data storage unit
160 Subset derivation unit
170 Frequency calculation unit
180 Word specifying unit
190 Graph display unit

Claims

In a graph display device that graphically displays words that appear in a set of sentence units to be analyzed including a plurality of sentence units,
Storage means for storing words appearing in the sentence unit in association with the sentence unit;
Frequency calculation means for calculating the frequency in the sentence unit set with respect to words appearing in the sentence unit in the sentence unit set with reference to the storage means;
Referring to the frequency calculated by the frequency calculating means, a word specifying means for specifying the uppermost L (an integer greater than or equal to 2) words in the sentence unit set;
For each word specified by the word specifying means, a subset derivation means for deriving a subset of sentence units including the word from the sentence unit set;
Display means for displaying the words corresponding to the subsets, together with a link from the word corresponding to the derivation-source sentence unit set to the word corresponding to the derivation-destination subset;
The graph display device, wherein the subset derivation means repeatedly derives subsets starting from the root sentence unit set.

The graph display device according to claim 1, wherein the subset derivation unit derives the subset by a depth-first search.

The graph display device according to claim 1, wherein the subset derivation means derives a subset by breadth-first search.

In a graph display device that graphically displays words that appear in a set of sentence units to be analyzed including a plurality of sentence units,
Storage means for storing words appearing in the sentence unit in association with the sentence unit;
A frequency calculating means for calculating the frequency with which the first word and the second word that make a pair in the sentence unit in the sentence unit set to be analyzed co-occur with reference to the storage means;
Word specifying means for specifying the first word and the second word of each pair whose frequency calculated by the frequency calculating means exceeds a threshold, except for a pair whose frequency is not within the top L (L being an integer of 2 or more) among the pairs having the same first word;
A graph display device comprising: a display unit configured to display a graph including the first word and the second word specified by the word specifying unit and a link connecting them.

The graph display device according to claim 1, wherein the subset deriving unit derives a subset in descending order of frequency of corresponding words in the subset.

The graph display device according to claim 1, wherein the storage unit stores one or a plurality of attributes for each sentence unit, and narrows down a sentence unit set to be analyzed according to the attribute.

The graph display device according to claim 1, wherein the sentence unit includes one or more sentences.

The graph display device according to claim 1, wherein the word is a noun.

The graph display device according to claim 1, wherein the word specifying unit prohibits the same word from being specified more than a predetermined number of times.

The graph display device according to claim 9, wherein the word specifying unit prohibits the same word from being specified a plurality of times.

In a graph display program for displaying words that appear in a set of sentence units to be analyzed including a plurality of sentence units,
Storage means for storing words appearing in the sentence unit in association with the sentence unit;
Frequency calculation means for calculating the frequency in the sentence unit set with respect to words appearing in the sentence unit in the sentence unit set with reference to the storage means;
Referring to the frequency calculated by the frequency calculation means, a word specifying means for specifying the uppermost L (an integer of 2 or more) words in the sentence unit set;
For each word specified by the word specifying means, a subset derivation means for deriving a subset consisting of sentence units including the word from the sentence unit set,
As display means for displaying the words corresponding to the subsets, together with a link from the word corresponding to the derivation-source sentence unit set to the word corresponding to the derivation-destination subset,
A graph display program characterized by causing a computer to function as the above means, and further causing the subset derivation means to repeatedly derive subsets starting from the root sentence unit set.