JP2023062700A

JP2023062700A - Document analysis support system and method

Info

Publication number: JP2023062700A
Application number: JP2022168903A
Authority: JP
Inventors: 政明星野; Masaaki Hoshino; 圭亮木村; Keisuke Kimura; 隆彦末吉; Takahiko Sueyoshi; 靖宮島; Yasushi Miyajima
Original assignee: Koozyt Inc
Current assignee: Koozyt Inc
Priority date: 2021-10-21
Filing date: 2022-10-21
Publication date: 2023-05-08

Abstract

PROBLEM TO BE SOLVED: To provide a document analysis support system, method, and program that help a user analyze multiple documents comprehensively and accurately.
SOLUTION: A document analysis support system extracts semantic structures from each of multiple documents and creates an index of the semantic structures. With reference to the index, the system accepts from a user a designation of a semantic structure condition based on the viewpoint the user wants to explore (or designates the condition automatically), displays to the user the documents that meet the designated condition, accepts from the user a selection of one or more documents and a designation of classification tags for those documents, and associates the designated classification tags with the selected documents. The system then searches, at least among the documents with no classification tag, for documents similar to at least one document associated with the designated classification tag; if a similar document exists, the system displays it to the user and accepts document selection and classification tag designation for it.
SELECTED DRAWING: Figure 7

Description

The present invention relates generally to supporting document analysis.

Generally, a document contains text (a character string). As a technology for supporting document analysis, the technology disclosed in Patent Document 1, for example, is known. In academia, Non-Patent Document 1 is also known.

Patent Document 1: Japanese Patent Application Laid-Open No. 2015-88022

Non-Patent Document 1: Wataru Sunayama, Masahiko Yachida, 2002, "Observatory System for Extracting Important Sentences Based on Viewpoints and Its Implementation in a Search Engine" (original title: 観点に基づいて重要文を抽出する展望台システムとそのサーチエンジンへの実装), https://www.jstage.jst.go.jp/article/tjsai/17/1/17_1_14/_pdf

The technology disclosed in Patent Document 1 performs morphological analysis and dependency extraction on each document that matches the search conditions and aggregates the documents based on the results, thereby aiming to reduce the time required for document analysis (paragraph 0010).

The technology disclosed in Non-Patent Document 1 presents the user with words likely to represent the viewpoint the user expects of the document, and searches for important sentences starting from those words.

However, with the technology disclosed in Patent Document 1, analysis accuracy is low if the search conditions, morphological analysis, or dependency extraction are not appropriate. Moreover, the technology disclosed in Patent Document 1 merely aggregates internally and displays the documents that match the search conditions; it cannot comprehensively analyze the plurality of documents that make up the search range.

In addition, because the method of Non-Patent Document 1 focuses only on words, the viewpoint is unclear and the detection accuracy for important sentences is low.

A document analysis support system includes an indexing unit and a user support unit. The indexing unit extracts semantic structures from each of a plurality of documents and creates an index of semantic structures based on the number of occurrences of each extracted semantic structure. The user support unit does the following:
(A) Accepts from the user, based on the index, a designation of a semantic structure condition reflecting the viewpoint the user wants to explore (or designates the semantic structure condition automatically).
(B) If the plurality of documents includes a document that matches the semantic structure condition designated in (A), displays at least a part or a summary of that document to the user.
(C) Accepts from the user a selection of one or more of the displayed documents and a designation of a classification tag, which is a tag for the one or more documents.
(D) Associates the classification tag designated in (C) with each of the one or more documents selected in (C).
(E) Searches, at least among the documents with which no classification tag is associated, for a document similar to at least one document with which the designated classification tag is associated; if such a similar document exists, displays at least a part or a summary of it to the user and performs (C).

According to the present invention, it is possible to help a user analyze a plurality of documents comprehensively and accurately.

FIG. 1 shows a configuration example of the entire system according to the first embodiment.
FIG. 2 shows a logical configuration example of the document analysis support system according to the first embodiment.
FIG. 3 shows a configuration example of the document DB according to the first embodiment.
FIG. 4 shows a configuration example of the semantic structure index according to the first embodiment.
FIG. 5 shows an example of the flow of indexing processing.
FIG. 6 shows a part of an example of the flow of classification support processing.
FIG. 7 shows the rest of the example of the flow of classification support processing.
FIG. 8 shows an example of the UI displayed in S602 of FIG. 6.
FIG. 9 shows an example of the UI displayed in S603 of FIG. 6.
FIG. 10 shows an example of the UI displayed in S603 of FIG. 6.
FIG. 11 shows an example of the UI displayed in S702 of FIG. 7.
FIG. 12 shows an example of the UI displayed in S703 of FIG. 7.
FIG. 13 shows an example of the UI displayed in S703 of FIG. 7.
FIG. 14 shows an example of classification tag input support.
FIG. 15 shows another example of classification tag input support.
FIG. 16 shows an example of the UI of a semantic structure list (menu).
FIG. 17 shows an example of the UI of a list (menu) of classification tag types.
FIG. 18 shows a configuration example of a part of the document DB according to the second embodiment.
FIG. 19 shows an example of a ranking list of subjects linked to a predicate (by the particle "ga").
FIG. 20 shows an example of a ranking list, in descending order of frequency, of predicate-argument structures containing a subject.
FIG. 21 shows an example of the UI displayed in S602 of FIG. 6 in the second embodiment.
FIG. 22 shows an example of the UI displayed in S603 of FIG. 6 in the second embodiment.
FIG. 23 shows an example of the UI displayed in S602 of FIG. 6 in the second embodiment.
FIG. 24 shows an example of the UI displayed in S603 of FIG. 6 in the second embodiment.

In the following description, the term "interface device" may refer to a set of one or more interface devices. The one or more interface devices may be at least one of the following:
- One or more I/O (Input/Output) interface devices. An I/O interface device is an interface device for at least one of an I/O device and a remote display computer. The I/O interface device for the display computer may be a communication interface device. The at least one I/O device may be a user interface device, for example, either an input device such as a keyboard and a pointing device, or an output device such as a display device.
- One or more communication interface devices. The one or more communication interface devices may be one or more communication interface devices of the same type (e.g., one or more NICs (Network Interface Cards)) or two or more communication interface devices of different types (e.g., an NIC and an HBA (Host Bus Adapter)).

Also, in the following description, "memory" is one or more memory devices and may typically be a main storage device. At least one memory device in the memory may be a volatile memory device or a non-volatile memory device.

Also, in the following description, a "persistent storage device" is one or more persistent storage devices. A persistent storage device is typically a non-volatile storage device (e.g., an auxiliary storage device), specifically, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

Also, in the following description, a "storage device" may be at least the memory out of the memory and the persistent storage device.

Also, in the following description, a "processor" is one or more processor devices. At least one processor device is typically a microprocessor device such as a CPU (Central Processing Unit), but may be another type of processor device such as a GPU (Graphics Processing Unit). At least one processor device may be single-core or multi-core. At least one processor device may be a processor core. At least one processor device may be a processor device in the broad sense, such as a hardware circuit (e.g., an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)) that performs part or all of the processing.

Also, in the following description, functions may be described using the expression "kkk unit". A function may be realized by a processor executing one or more computer programs, or by one or more hardware circuits (e.g., an FPGA or an ASIC). When a function is realized by a processor executing a program, the prescribed processing is performed while using a storage device and/or an interface device as appropriate, so the function may be regarded as at least a part of the processor. Processing described with a function as the subject may be regarded as processing performed by a processor or by an apparatus having that processor. A program may be installed from a program source. The program source may be, for example, a program distribution computer or a computer-readable recording medium (e.g., a non-transitory recording medium). The description of each function is an example; multiple functions may be combined into one function, and one function may be divided into multiple functions.

Several embodiments of the present invention will be described below with reference to the drawings.
[First embodiment]

FIG. 1 shows a configuration example of the entire system according to the first embodiment.

The document analysis support system 13 and the user system 11 of a user who uses the document analysis support system 13 communicate via a communication network 160 (for example, the Internet). A "user" may be an organization such as a company, a member of an organization (e.g., an employee), or a general consumer.

The user system 11 is a physical computer system (for example, a personal computer or a smartphone) and includes, for example, an input device 113 (e.g., a keyboard or a mouse), an output device 114 (e.g., a display device), an interface device 111, a storage device 112, and a processor 115. The input device 113 and the output device 114 may be an integrated device. The input device 113 and the output device 114 are connected to the interface device 111, and communication with the document analysis support system 13 takes place through the interface device 111. The processor 115 is connected to the interface device 111 and the storage device 112. Instead of such a physical computer system, the user system 11 may be a virtual computer system (for example, a virtual machine on a server).

The document analysis support system 13 includes an interface device 131, a storage device 132, and a processor 133 connected to them. Communication with the user system 11 takes place via the interface device 131. The storage device 132 stores computer programs executed by the processor 133 and data referenced or updated by the processor 133. The processor 133 executes the computer programs stored in the storage device 132. In this embodiment, the document analysis support system 13 is a physical computer system composed of one or more physical computers; instead, it may be a virtual computer system (for example, a cloud computing service) based on a physical computer system (for example, a cloud infrastructure).

FIG. 2 shows a logical configuration example of the document analysis support system 13.

The storage device 132 stores a document DB (database) 200, a semantic structure index 201, and a trained language model 203. The indexing unit 211, the user support unit 212, and the model learning unit 213 are realized by the processor 133 executing computer programs. The trained language model 203 is a model trained by the model learning unit 213 (typically a machine learning model such as a neural network).

FIG. 3 shows a configuration example of the document DB 200.

The document DB 200 is an example of information containing a plurality of documents. The document DB 200 has an entry for each document. Each entry contains information such as a document ID 1200, a respondent attribute 1201, a document 1202, a semantic structure 1203, a classification tag 1204, and a document vector 1205. The following description takes one document (the "document of interest") as an example.

The document ID 1200 represents the ID of the document of interest. The respondent attribute 1201 represents an attribute (for example, name, age, or sex) of the respondent who entered the document of interest as an answer. The document 1202 represents the document itself (text, i.e., a character string).

The semantic structure 1203 represents the set of semantic structures (one or more semantic structures) extracted from the document. The "semantic structure" referred to in this embodiment is described in detail later.

The classification tag 1204 represents one or more classification tags associated with the document. A "classification tag" is a tag attached to a document and used to classify it. If no classification tag is associated with the document of interest, the classification tag 1204 contains no classification tag (shown as "-" in FIG. 3). If multiple tags are associated with the document, the classification tag 1204 holds them in the following structure, separated by commas.
(Example) With two tags: ['Ingenuity of application utilization', 'Utilization of video content']
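The entry layout described for FIG. 3 can be pictured as a small record type. The following is a minimal sketch, not the patent's implementation: the class name, the field names, and the `tags_display` helper are all hypothetical, chosen only to mirror reference numerals 1200 to 1205 and the "-" notation for untagged documents.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentEntry:
    """One entry of the document DB 200 (all names here are illustrative)."""
    document_id: str                                          # document ID 1200
    respondent_attributes: dict                               # respondent attribute 1201
    text: str                                                 # document 1202
    semantic_structures: list = field(default_factory=list)   # semantic structure 1203
    classification_tags: list = field(default_factory=list)   # classification tag 1204
    document_vector: list = field(default_factory=list)       # document vector 1205

    def tags_display(self) -> str:
        """Render tags as described for FIG. 3: "-" if untagged, else a comma-separated list."""
        if not self.classification_tags:
            return "-"
        return "[" + ", ".join(f"'{t}'" for t in self.classification_tags) + "]"


entry = DocumentEntry("D001", {"age": 30}, "example text")
untagged = entry.tags_display()
entry.classification_tags += [
    "Ingenuity of application utilization",
    "Utilization of video content",
]
tagged = entry.tags_display()
```

Keeping the tags as a list rather than a single string makes the later bulk-association step a simple append per document.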

The document vector 1205 is a value obtained by quantifying the document and represents an N-dimensional vector of the document of interest. The document vector 1205 is used in the search for similar documents.

FIG. 4 shows a configuration example of the semantic structure index 201.

The semantic structure index 201 has an entry for each semantic structure. Each entry contains information such as a semantic structure 1300, a frequency count 1301, and a document ID 1302. The following description takes one semantic structure (the "semantic structure of interest") as an example.

The semantic structure 1300 represents the semantic structure of interest, extracted from one or more documents. The frequency count 1301 represents the number of times the semantic structure of interest was extracted. The document ID 1302 represents the document ID of each of the one or more documents that have the semantic structure of interest.

An example of the processing performed in this embodiment is described below.

FIG. 5 shows an example of the flow of indexing processing.

The indexing unit 211 reads all documents 1202 from the document DB 200 (S500). For each document 1202, the indexing unit 211 performs morphological analysis (S501), extracts semantic structures from the morphemes obtained in S501, and adds the extracted semantic structures to the semantic structure feature 1203 corresponding to that document (S502).

For each semantic structure, the indexing unit 211 aggregates the frequency count (the number of occurrences of the semantic structure) and the document IDs (the IDs of the documents from which the semantic structure was extracted), creates the semantic structure index 201 from the aggregation result, and stores the created semantic structure index 201 in the storage device 132 (S503).
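The aggregation in S503 can be sketched as follows. This is a simplified illustration under stated assumptions: `extract_semantic_structures` is a hypothetical stand-in that splits on whitespace and pairs adjacent tokens, whereas the embodiment uses morphological analysis (S501) followed by semantic structure extraction (S502). The sketch also counts a structure once per document, which is only one possible reading of the frequency count.

```python
from collections import defaultdict


def extract_semantic_structures(text):
    """Hypothetical stand-in for S501-S502: pair adjacent whitespace-separated tokens."""
    tokens = text.split()
    return {f"{a}:{b}" for a, b in zip(tokens, tokens[1:])}


def build_semantic_structure_index(documents):
    """S503 sketch: per semantic structure, collect a frequency count and document IDs."""
    doc_ids = defaultdict(list)
    for doc_id, text in documents.items():
        # sorted() only makes the iteration order deterministic
        for structure in sorted(extract_semantic_structures(text)):
            doc_ids[structure].append(doc_id)
    return {s: {"frequency": len(ids), "document_ids": ids} for s, ids in doc_ids.items()}


documents = {
    "D1": "want consideration",
    "D2": "want consideration now",
    "D3": "network slow",
}
index = build_semantic_structure_index(documents)
```

Each index entry then carries the same three pieces of information as an entry of the semantic structure index 201: the structure itself (the key), its frequency count, and the IDs of the documents that contain it.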

In this embodiment, the text serving as a "document" includes at least one sentence or passage. A "passage" refers to one or more consecutive sentences.

In this embodiment, a "semantic structure" is a combination of semantic elements in a sentence. A "semantic element" is an element that constitutes meaning, and the "element" here may be any unit, such as a morpheme, a word, or a phrase. Specifically, for example, a "semantic structure" may be a predicate-argument structure (e.g., a combination of a subject and a predicate, or of a predicate and an object), a combination of a noun (e.g., a content word or substantive word) and a verb (e.g., a function word), or a co-occurrence relationship of nouns within the same sentence (or passage). For example, if "schedule" and "scheduler," and "meeting" and "scheduler," co-occur in the same sentence (or passage), the topic can be inferred to be the setting of meetings and appointments in scheduler software.
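Of the semantic structure types listed above, noun co-occurrence is the easiest to illustrate. The sketch below assumes the nouns of one sentence (or passage) have already been extracted by some earlier analysis step; the function name is hypothetical.

```python
from itertools import combinations


def noun_cooccurrences(nouns_in_sentence):
    """Return the unordered pairs of distinct nouns that co-occur in one sentence or passage."""
    unique = sorted(set(nouns_in_sentence))
    return list(combinations(unique, 2))


pairs = noun_cooccurrences(["schedule", "scheduler", "meeting"])
```

Here the pairs ("schedule", "scheduler") and ("meeting", "scheduler") both appear, matching the scheduler-software example in the text.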

FIGS. 6 and 7 show an example of the flow of classification support processing.

The user support unit 212 reads the semantic structure index 201 (S600) and displays to the user a semantic structure list that is at least a part of the index 201 (for example, a list of the top semantic structures in descending order of frequency count 1301). "Displaying XXX to the user" means outputting information for displaying XXX (for example, providing a UI (User Interface) of XXX to the user); in this embodiment, as a result, XXX is displayed on the output device 114 of the user system 11 of the user.

The displayed semantic structures may be ordered by a rule other than descending order of frequency count 1301. For example, the user support unit 212 may accept from the user input of one or more desired semantic elements (for example, one or more words), and the semantic structure list may arrange one or more semantic structures in descending order of their degree of match with sentences containing those semantic elements.
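The two orderings of the semantic structure list can be sketched as below. The text does not define how the degree of match is computed, so `rank_by_elements` uses an assumed rule (structures containing more of the user's elements come first); both function names and the "a:b" structure notation are illustrative.

```python
def rank_by_frequency(index, limit=10):
    """Order semantic structures by descending frequency count (ties broken alphabetically)."""
    ranked = sorted(index, key=lambda s: (-index[s]["frequency"], s))
    return ranked[:limit]


def rank_by_elements(index, elements):
    """Assumed relevance rule: structures containing more of the user's elements rank higher."""
    def score(structure):
        parts = structure.split(":")
        return sum(1 for e in elements if e in parts)
    return sorted(index, key=lambda s: (-score(s), -index[s]["frequency"], s))


index = {
    "want:consideration": {"frequency": 5},
    "slow:network": {"frequency": 8},
    "want:support": {"frequency": 2},
}
by_frequency = rank_by_frequency(index, limit=2)
by_elements = rank_by_elements(index, ["want", "support"])
```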

The user support unit 212 accepts from the user a selection of the desired semantic structure from the displayed semantic structure list (S601). The user support unit 212 searches the document DB 200, based on the document IDs 1302 in the semantic structure index 201, for documents having the designated semantic structure (documents 1202 whose semantic structure feature 1203 includes the designated semantic structure), and displays the documents found to the user (S602). What is displayed here may be the whole document, a part of it, or a summary.

The user support unit 212 accepts from the user a selection of one or more of the documents displayed in S602 and a designation of a classification tag for those documents, and associates the designated classification tag with each of the selected documents (S603). The classification tag is thereby associated with the selected documents in a single operation. In this embodiment, associating a designated classification tag with a document means adding the designated classification tag to the classification tag 1204 corresponding to that document.
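Steps S602 and S603 amount to an index lookup followed by a bulk update. The following is a minimal sketch assuming dictionary-based stand-ins for the semantic structure index 201 and the document DB 200; the function names and the dictionary layout are hypothetical.

```python
def find_documents(index, document_db, structure):
    """S602 sketch: use the index's document IDs to fetch documents having the structure."""
    entry = index.get(structure, {"document_ids": []})
    return {doc_id: document_db[doc_id] for doc_id in entry["document_ids"]}


def associate_tag(document_db, selected_ids, tag):
    """S603 sketch: add the tag to the tag list of every selected document at once."""
    for doc_id in selected_ids:
        tags = document_db[doc_id]["classification_tags"]
        if tag not in tags:  # avoid storing the same tag twice
            tags.append(tag)


index = {"want:consideration": {"frequency": 2, "document_ids": ["D1", "D2"]}}
document_db = {
    "D1": {"text": "sample answer 1", "classification_tags": []},
    "D2": {"text": "sample answer 2", "classification_tags": []},
    "D3": {"text": "sample answer 3", "classification_tags": []},
}
shown = find_documents(index, document_db, "want:consideration")
associate_tag(document_db, ["D1", "D2"], "Home network environment problem")
```

Because the index already stores document IDs per semantic structure, the lookup does not need to scan the text of every document.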

The user support unit 212 determines whether to end the classification support processing (S604).

For example, if an instruction to end the classification support processing is received from the user, S604 results in YES and the classification support processing ends.

On the other hand, if, for example, an instruction to continue the classification support processing (e.g., an instruction to display existing classification tags) is received from the user, S604 results in NO, and the user support unit 212 performs S700 in FIG. 7. That is, the user support unit 212 displays to the user one or more classification tags found in the classification tag 1204 column of the document DB 200 (S700). The classification tags displayed here may be the most recently associated classification tag, classification tags arranged in descending order of the number of associated documents, or classification tags that match a classification tag condition specified by the user (or classification tags arranged in descending order of their degree of match with that condition).

The user support unit 212 accepts from the user a designation of one of the classification tags displayed in S700 (S701).

The user support unit 212 identifies in the document DB 200 the documents associated with the classification tag designated in S701 (documents 1202 whose classification tag 1204 contains the designated classification tag), searches the document DB 200 for documents similar to each identified document, and displays the similar documents found to the user (S702). What is displayed here may be the whole similar document, a part of it, or a summary. The similar document search may use a known method. For example, the user support unit 212 finds, as similar documents, documents 1202 whose document vector 1205 satisfies a predetermined condition relative to the document vector 1205 of a document associated with the classification tag designated in S701.
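The similar document search in S702 can be illustrated with document vectors. The text leaves the "predetermined condition" open, so the sketch below assumes one common choice: cosine similarity at or above a threshold. It also assumes that only documents with no classification tag are searched; the function names, the threshold value, and the dictionary layout are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_similar_documents(document_db, tag, threshold=0.9):
    """S702 sketch: untagged documents whose vector is close to any document carrying `tag`."""
    seeds = [d for d in document_db.values() if tag in d["classification_tags"]]
    similar = []
    for doc_id, doc in document_db.items():
        if doc["classification_tags"]:
            continue  # assumption: search only documents without any classification tag
        if any(cosine_similarity(doc["vector"], s["vector"]) >= threshold for s in seeds):
            similar.append(doc_id)
    return similar


document_db = {
    "D1": {"classification_tags": ["Home network environment problem"], "vector": [1.0, 0.0]},
    "D2": {"classification_tags": [], "vector": [0.9, 0.1]},
    "D3": {"classification_tags": [], "vector": [0.0, 1.0]},
}
similar_ids = find_similar_documents(document_db, "Home network environment problem")
```

In this toy data, D2 points in nearly the same direction as the tagged document D1 and is returned, while D3 is orthogonal to it and is not.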

The user support unit 212 accepts from the user a selection of one or more of the similar documents displayed in S702 and a designation of a classification tag for those similar documents, and associates the designated classification tag with each of the selected similar documents (S703). The classification tag is thereby associated with the selected similar documents in a single operation.

The user support unit 212 determines whether to end the classification support processing (S704). For example, if an instruction to end the classification support processing is received from the user, S704 results in YES and the classification support processing ends. On the other hand, if, for example, an instruction to continue the classification support processing (e.g., an instruction to display existing classification tags) is received from the user, S704 results in NO and the processing returns to S700.

FIGS. 8 to 13 show examples of the UIs displayed to the user in the classification support processing. The UI referred to here is typically a GUI (Graphical User Interface), but may be a UI other than a GUI.

FIG. 8 shows an example of the UI displayed in S602 of FIG. 6.

Before describing the UI 400 in the state illustrated in FIG. 8, the configuration of the UI 400 is described. The UI 400 is displayed during the part of the processing illustrated in FIG. 6. The UI 400 has, for example, UI components 401 to 408, which are, for example, as follows.
- The display field 401 shows the semantic structure designated in S601 from the semantic structure list.
- The button 402 is operated to display the semantic structure list based on the semantic structure index 201. When the button 402 is pressed, a menu screen such as the one shown in FIG. 16 is displayed and S601 is executed. As a result, the selected semantic structure is displayed in the display field 401.
- The button 403 is operated to display a document list 410, which is a list of the documents containing the semantic structure shown in the display field 401.
- The button 404 is operated to switch the document display to sentence units (so that only a single sentence of each document is displayed).
- The pull-down menu 405 is operated to display a menu (list) of classification tag types. For example, a list such as the one in FIG. 17 is displayed.
- The text box 406 receives the classification tag desired by the user (specifically, a character string representing the classification tag). If none of the existing classification tags applies, the user selects "New" and enters a new classification tag directly into the text box 406 from a keyboard or the like.
- The button 407 is operated to associate the classification tag entered in the text box 406.
- The button 408 is operated for S604, which determines whether work in the UI 400 ends.

For example, suppose that operating the button 402 causes S600 of FIG. 6 to be performed (a semantic structure list such as that shown in FIG. 16 is displayed) and that, in S601 of FIG. 6, the user designates the semantic structure "want: consideration". In this case, the user support unit 212 searches the document DB 200 (see FIG. 3) for documents 1202 corresponding to semantic structure features 1203 that include the semantic structure "want: consideration" and, in S602 of FIG. 6, displays the document list 410 of the found documents 1202 as shown in FIG. 8. For each found document, the document list 410 has a check box column, a column for at least a part (or a summary) of the document, a column for the existing classification tags associated with the document (each classification tag in the classification tag 1204 corresponding to the document), and a column for additional classification tags to be additionally associated with the document. The existing classification tag column and the additional classification tag column may be a single common column. In that case, the common column displays the existing classification tags, and new classification tags are added to it.

In S603 of FIG. 6, suppose that, as shown in FIG. 9, the user selects several documents (puts a check mark in the check box corresponding to each desired document), selects the classification tag type "New" from the pull-down menu 405, enters the classification tag "home network environment problem" in the text box 406, and finally operates the button 407. In response to this operation, the user support unit 212 displays "home network environment problem" in the additional tag column of each document selected by the user, as shown in FIG. 10, and adds the classification tag "home network environment problem" to the classification tag 1204 of each such document.
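The search in S602 and the tag association in S603 can be sketched as follows. This is only a minimal illustration under stated assumptions: the `DocumentEntry` class, the in-memory list standing in for the document DB 200, the function names, and the sample sentences are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentEntry:
    """Hypothetical stand-in for one entry of the document DB 200."""
    doc_id: int
    text: str
    semantic_structures: set            # plays the role of semantic structure feature 1203
    classification_tags: set = field(default_factory=set)  # plays the role of classification tag 1204


def find_documents(doc_db, semantic_structure):
    """S602 (sketch): return the entries whose features contain the structure."""
    return [d for d in doc_db if semantic_structure in d.semantic_structures]


def associate_tag(selected, tag):
    """S603 (sketch): add the tag to every document the user checked."""
    for d in selected:
        d.classification_tags.add(tag)


# Made-up sample data for illustration only.
doc_db = [
    DocumentEntry(1, "I want consideration for slow home Wi-Fi.", {"want: consideration"}),
    DocumentEntry(2, "The meeting room is too small.", {"small: room"}),
    DocumentEntry(3, "I want consideration for childcare.", {"want: consideration"}),
]

hits = find_documents(doc_db, "want: consideration")
# Assume the user checked only the first hit before pressing the button 407.
associate_tag([hits[0]], "home network environment problem")
```

In this sketch the tag is stored per document, so a later similar-document search can start from the documents that carry it.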

After that, in the processing illustrated in FIG. 7, designating the work end button 408 causes the UI 700 illustrated in FIGS. 11 to 13 to be displayed in place of the UI 400.

FIG. 11 shows an example of the UI displayed in S702 of FIG. 7.

Before describing the UI 700 in the state illustrated in FIG. 11, the configuration of the UI 700 will be described. The UI 700 has, for example, UI components 701 to 708, which are as follows.
• The pull-down menu 701 is operated to display a menu (list) of existing classification tags (classification tags in the classification tag 1204 column).
• The button 702 is operated to display a similar document list 710, which is a list of documents similar to the documents associated with the classification tag displayed in the pull-down menu 701. That is, when the button 702 is pressed, for example, the similar document search in S702 is executed.
• The button 703 is operated to display a list of similar document search options and to designate (select) an option.
• The button 704 is operated to switch the display of similar documents to sentence units (only one sentence of each similar document is displayed).
• The pull-down menu 705 is operated to display a menu of classification tag types.
• A classification tag desired by the user (specifically, a character string representing the classification tag) is entered in the text box 706.
• The button 707 is operated to associate the classification tag entered in the text box 706 (or the classification tag displayed in the pull-down menu 701).
• The button 708 is operated for S704, in which it is determined whether the work on the UI 700 has ended.

For example, suppose that, in S700 and S701 of FIG. 7, the user selects from the pull-down menu 701, as the existing classification tag, the classification tag "home network environment problem" that was assigned immediately before, and operates the button 702. In response to this operation, the user support unit 212 executes S702 of FIG. 7. That is, for each document 1202 corresponding to a classification tag 1204 that includes the classification tag "home network environment problem", the user support unit 212 searches the document DB 200 for documents 1202 similar to that document 1202 and, as shown in FIG. 11, displays the similar document list 710, which is a list of the similar documents found. For each similar document found, the similar document list 710 has a check box column, a column for at least a part (or a summary) of the similar document, a column for the existing classification tags associated with the similar document (each classification tag in the classification tag 1204 corresponding to the similar document), and a column for additional classification tags to be additionally associated with the similar document. The existing classification tag column and the additional classification tag column may be a single common column. In that case, the common column displays the existing classification tags, and new classification tags are added to it.

In S702 of FIG. 7, suppose that, as shown in FIG. 12, the user selects several documents (puts a check mark in the check box corresponding to each desired document), selects the classification tag type "Same" from the pull-down menu 705, and operates the button 707. The classification tag type "Same" means the same classification tag as the classification tag "home network environment problem" displayed in the pull-down menu 701. In response to this operation, the user support unit 212 displays "home network environment problem" in the additional tag column of each similar document selected by the user (each similar document whose check box is checked), as shown in FIG. 13, and adds the classification tag "home network environment problem" to the classification tag 1204 of each such similar document.

この後、ユーザは、更なる類似文書検索を行う場合、プルダウンメニュー７０１から所望の既存の分類タグを選択し、ボタン７０２を操作する。つまり、再度、図７のＳ７００及びＳ７０１が行われる。 After that, the user selects a desired existing classification tag from the pull-down menu 701 and operates a button 702 to search for further similar documents. That is, S700 and S701 of FIG. 7 are performed again.

以上の説明を、例えば下記のように総括することができる。下記の総括は、上述の補足説明や変形例の説明を含んでよい。 The above description can be summarized, for example, as follows. The following summary may include explanations of supplements and variations of the above.

The indexing unit 211 extracts semantic structures from each of the plurality of documents 1202 and creates an index 201 of the extracted semantic structures. The user support unit 212 performs the following (A) to (E).
(A) Accept, from the user, a selection of a semantic structure in the index 201.
(B) If the plurality of documents 1202 includes a document that matches the semantic structure selected in (A), display at least a part or a summary of that document 1202 to the user.
(C) Accept, from the user, a selection of one or more of the displayed documents 1202 and a designation of a classification tag, which is a tag for the one or more documents.
(D) Associate the classification tag designated in (C) with each of the one or more documents selected in (C).
(E) From at least those of the plurality of documents 1202 with which no classification tag is associated, search for documents similar to at least one document with which the designated classification tag is associated; if such similar documents exist, display at least a part or a summary of each to the user, and perform (C).
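Step (E) restricts the candidate pool to documents that have no classification tag yet and compares them with the documents that already carry the designated tag. The sketch below illustrates only that selection logic. The token-overlap (Jaccard) score is a deliberately simple stand-in chosen for this example, not the similarity measure of the embodiment, and the function names, threshold, and sample texts are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for a real similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def find_similar_untagged(docs, tags, target_tag, threshold=0.3):
    """Step (E) (sketch).

    docs: dict mapping document ID -> text
    tags: dict mapping document ID -> set of classification tags
    Returns the IDs of untagged documents similar to at least one seed document.
    """
    seeds = [i for i in docs if target_tag in tags.get(i, set())]
    untagged = [i for i in docs if not tags.get(i)]
    return [
        u for u in untagged
        if any(jaccard(docs[u], docs[s]) >= threshold for s in seeds)
    ]


# Made-up sample data for illustration only.
docs = {
    1: "home wifi is slow",
    2: "wifi at home is slow and unstable",
    3: "cafeteria menu is boring",
}
tags = {1: {"home network environment problem"}}

candidates = find_similar_untagged(docs, tags, "home network environment problem")
```

The returned candidates would then be shown to the user, who decides in (C) which of them actually receive the tag.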

これにより、複数の文書１２０２を網羅的に精度良くユーザが分析することを支援することができる。具体的には、例えば、次の通りである。 Accordingly, it is possible to assist the user in comprehensively and accurately analyzing the plurality of documents 1202 . Specifically, for example, it is as follows.

That is, if every semantic structure extracted from all the documents 1202 were accurate, and if the user could check every semantic structure extracted from each sentence of each document and select the desired ones, an exhaustive analysis of the plurality of documents 1202 could be achieved. However, semantic structures cannot always be extracted accurately from documents, for example because of notation variations. The meaning of a document could conceivably be expressed by combining multiple semantic structures extracted from it, but the number of combination patterns is large, which makes this difficult to realize. Moreover, if the user had to check every semantic structure extracted from each sentence of each document, the user would in effect be reading all the documents, and the user's analysis burden would not be reduced.

According to this embodiment, documents having a representative semantic structure selected by the user are searched for, and a classification tag desired by the user is associated with the documents that the user selects from those found. After that, the following are repeated as necessary: accepting the user's selection of a desired classification tag from among the existing classification tags already assigned by the user, searching for documents similar to the documents having the selected existing classification tag, and associating a classification tag desired by the user with the similar documents that the user selects from those found. In other words, the embodiment supports exhaustive analysis of the documents, using as seeds the documents that the user selected from those found by the document search keyed on a semantic structure.

Selection of a semantic structure is one example of designating a semantic structure condition based on semantic structures. A "semantic structure condition" may be a single semantic structure used as a condition, or a combination of two or more semantic structures (for example, a combination of a semantic structure that documents to be included have and a semantic structure that documents to be excluded have).
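A semantic structure condition that combines structures to include with structures to exclude can be sketched as a simple set test. The function name, the requirement that all "include" structures be present, and the sample structure "want: refund" are assumptions made for this illustration; the specification does not fix how the combination is evaluated.

```python
def matches_condition(doc_structures, include, exclude=frozenset()):
    """Return True if a document satisfies the semantic structure condition.

    Assumption for this sketch: every structure in `include` must be present
    and no structure in `exclude` may be present.
    """
    doc_structures = set(doc_structures)
    return set(include) <= doc_structures and not (set(exclude) & doc_structures)


# A single semantic structure used as the condition.
single = matches_condition({"want: consideration"}, {"want: consideration"})

# A combination: include one structure, exclude another.
combined = matches_condition(
    {"want: consideration", "want: refund"},
    include={"want: consideration"},
    exclude={"want: refund"},
)
```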

また、テキストを含んだあらゆる文書の分析支援に、本発明を適用し得る。例えば、文書としては、アンケートとしての文書、オンラインミーティングやラジオ番組等の音声データがテキスト化された文書、ＳＮＳ（Social Networking Service）等のサイトでのコメントやユーザ間のメッセージ群（ユーザ間でやりとりされたメッセージ）としての文書等が採用されてよい。 In addition, the present invention can be applied to support analysis of any document containing text. For example, the documents include documents as questionnaires, documents in which voice data of online meetings and radio programs are converted into text, comments on sites such as SNS (Social Networking Service), and messages between users (exchanges between users). A document or the like may be employed as a message that has been sent.

Further, according to this embodiment, the following are repeated: (a) displaying semantic structures, documents, and classification tags to the user, (b) accepting the user's selection from what is displayed in (a), and (c) performing (a) as a result of processing that uses the semantic structure, document, or classification tag selected by the user in (b). That is, according to this embodiment, documents are searched for using the semantic structure selected by the user as a key, the user adds a semantic element to the found documents by associating classification tags with them, and the user's selection of a classification tag alternates repeatedly with the search for and display of documents similar to those associated with that classification tag. In other words, the system according to this embodiment is a system for so-called human augmentation, not a system for automating document classification.

In the UI presented to the user, the following may be displayed for each document 1202 to be displayed. This assists the user in selecting documents.
• Of the document 1202, a character string containing at least the semantic elements (for example, morphemes or words) that constitute the semantic structure extracted from the document 1202.
• The classification tag, if a classification tag is associated with the document.

Each time (D) is performed, the user support unit 212 may determine whether a stop condition for the similar document search is satisfied, and may perform (E) when the result of the determination is false. The stop condition may be either of the following. Such a determination may be, for example, the determination in S604 of FIG. 6 or S704 of FIG. 7. Providing such a stop condition allows the processing to end at an appropriate time even when the number of documents 1202 is enormous.
• The proportion of documents with no associated classification tag among the plurality of documents 1202 is less than a certain proportion. In other words, this condition corresponds to the classification tags being regarded as sufficiently associated.
• The number of executions of (E) has reached a predetermined number.
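The two stop conditions can be expressed as a single predicate, as in the sketch below. The function name, parameter names, and default values (10% and 20 runs) are hypothetical; the specification says only "a certain proportion" and "a predetermined number".

```python
def should_stop(total_docs, untagged_docs, e_runs,
                min_untagged_ratio=0.1, max_e_runs=20):
    """Return True when the similar-document search should stop.

    Stops when the share of untagged documents falls below
    `min_untagged_ratio`, or when step (E) has already run
    `max_e_runs` times. Both thresholds are illustrative defaults.
    """
    if total_docs == 0:
        return True  # nothing left to analyze
    untagged_ratio = untagged_docs / total_docs
    return untagged_ratio < min_untagged_ratio or e_runs >= max_e_runs
```

For example, with 100 documents of which 5 are untagged, the first condition holds and (E) is skipped; with 50 untagged documents, (E) keeps running until the run count reaches the limit.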

In the similar document search, when the user operates, for example, the button 703 of the UI 700 illustrated in FIG. 11, the user support unit 212 may present, for example, the following options to the user as similar document search options. Option (a) is the similar document search described above (a similar document search using the document vector 1205).
(a) Cosine similarity search
(b) Classification model training and search (a similar document search in which already tagged documents are used as training data to train a classification model that predicts document tags, and the classification model is then used to search the untagged documents for matching documents)
(c) Feature option (selection of the method for computing the feature vectors used in classification), using a known method such as FastText, BERT, or LDA
(d) Revolutional mode (the classification model is updated whenever a classification tag is newly associated, which raises the accuracy of tag prediction for unknown documents)
(e) Classification tag input support
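Option (a) can be illustrated with a small cosine similarity ranking over document vectors. This is a sketch only: the two-dimensional vectors are made-up stand-ins for the document vector 1205, and the function names and the `top_k` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_documents(seed_vector, vectors, top_k=2):
    """Rank documents by cosine similarity to the seed and return the top IDs."""
    scored = sorted(
        ((cosine(seed_vector, vec), doc_id) for doc_id, vec in vectors.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:top_k]]


# Made-up vectors for illustration only.
vectors = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
top = similar_documents([1.0, 0.0], vectors, top_k=2)
```

In practice the vectors would come from whichever feature method is selected under option (c); only the ranking step is shown here.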

For example, in a similar document search according to option (e), or in a document search keyed on the semantic structure selected by the user, the user support unit 212 may use the trained language model 203 to identify at least a part or a summary of a classification tag to be associated with a document and display it to the user. The input to the trained language model 203 may include at least a part or a summary of the document. The output from the trained language model 203 may include at least a part or a summary of the classification tag.

このような分類タグ入力支援として、例えば、下記のいずれかが採用されてよい。 As such classification tag input support, for example, any of the following may be adopted.

例えば、図１４に例示の分類タグ入力支援によれば、代表的文書に分類タグを付けた後に根拠を自動的に可視化することで再確認の効率を上げる。 For example, according to the classification tag input support illustrated in FIG. 14, the efficiency of reconfirmation is improved by automatically visualizing the basis after the classification tag is attached to the representative document.

Further, for example, the classification tag input support illustrated in FIG. 15 shows which words are appropriate to include in a classification tag.

[Second embodiment]

The features of the second embodiment are as follows.
(1) The semantic structure can be made more detailed than in the first embodiment.
(2) Notation variations can be absorbed.
(3) Document retrieval starting from a predicate, instead of or in addition to a predicate term, is also possible.

以下、第２の実施形態を説明する。その際、第１の実施形態との相違点を主に説明し、第１の実施形態との共通点については説明を省略又は簡略する。 A second embodiment will be described below. At that time, differences from the first embodiment will be mainly described, and descriptions of common points with the first embodiment will be omitted or simplified.

＜「（１）第１の実施形態よりも意味構造をより詳細にすることができる。」について＞ <Regarding “(1) The semantic structure can be made more detailed than in the first embodiment.”>

図１８は、第２の実施形態に係る文書ＤＢ２００の一部の構成例を示す。 FIG. 18 shows a configuration example of part of the document DB 200 according to the second embodiment.

文書１２０２は、文書全体でもよいが、本実施形態では、文書のうちの一文を表す。このため、図示しないが、文書ＩＤ１２００は、文書のＩＤと、文書における当該一文のＩＤとの組合せであってよい。すなわち、本明細書において、「文書」は、狭義には、文の集合を含んだ要素でよく、広義には、文の集合の一部（例えば、個々の文）でもよい。 Document 1202 may be an entire document, but in this embodiment represents a sentence of the document. Therefore, although not shown, the document ID 1200 may be a combination of the ID of the document and the ID of the sentence in the document. That is, in this specification, a "document" may be an element including a set of sentences in a narrow sense, or may be a part of a set of sentences (for example, individual sentences) in a broad sense.

The semantic structure feature 1203 differs from the configuration illustrated in FIG. 3. Specifically, the semantic structure feature 1203 has information such as predicate 11, predicate_hyouki 12, negation 13, pred_last 14, arg_subj 15, arg_obj 16, arg_when 17, arg_where 18, and arg_other 19. One sentence is taken as an example (the "target sentence" in the description of FIG. 18).

predicate 11 represents the predicate of the target sentence in its base form. predicate_hyouki 12 represents the predicate of the target sentence as written (its surface form).

negation 13 is a flag indicating whether the target sentence is a negative sentence. pred_last 14 is a flag indicating whether the sentence ends with a predicate.

Information 15 to 19 is an example of information for representing the semantic structure accurately. arg_subj 15 represents the subject when the target sentence has a subject that depends on the predicate. arg_obj 16 represents the object when the target sentence has an object that depends on the predicate. arg_when 17 represents the temporal element when the target sentence has a temporal case element that depends on the predicate. arg_where 18 represents the locative element when the target sentence has a locative case element that depends on the predicate. arg_other 19 represents the modifier when the target sentence has some other modifier (for example, a clause expressing a reason or a condition) that depends on the predicate.
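The fields of the semantic structure feature 1203 described above can be pictured as a record type. The sketch below reuses the field names from the description; the class name, the Python types, the `filled_arguments` helper, and the example values are assumptions made for illustration and do not reproduce FIG. 18.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemanticStructureFeature:
    """Hypothetical record mirroring the fields 11 to 19 described above."""
    predicate: str                    # predicate in base form (11)
    predicate_hyouki: str             # predicate as written (12)
    negation: bool                    # True if the sentence is negative (13)
    pred_last: bool                   # True if the sentence ends with a predicate (14)
    arg_subj: Optional[str] = None    # subject depending on the predicate (15)
    arg_obj: Optional[str] = None     # object depending on the predicate (16)
    arg_when: Optional[str] = None    # temporal element (17)
    arg_where: Optional[str] = None   # locative element (18)
    arg_other: Optional[str] = None   # other modifier, e.g. reason or condition (19)

    def filled_arguments(self):
        """Return only the argument slots that are present in the sentence."""
        names = ("arg_subj", "arg_obj", "arg_when", "arg_where", "arg_other")
        return {n: getattr(self, n) for n in names if getattr(self, n) is not None}


# Illustrative values for a sentence whose subject is "業務負担が".
example = SemanticStructureFeature(
    predicate="偏る",
    predicate_hyouki="偏りすぎる",
    negation=False,
    pred_last=True,
    arg_subj="業務負担が",
)
```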

As the example in FIG. 18 shows, there is a dependency relationship between a predicate and the target word that depends on it (the predicate term). Analyzing this dependency relationship allows the elements of a semantic structure to be phrases or clauses, instead of or in addition to words. Using such semantic structures makes it possible to search for documents centered on predicates and the target words that depend on them.

図１９は、述語に係る主語（助詞「が」で係る）のランキングリストの例を示す。 FIG. 19 shows an example of a ranking list of subjects related to predicates (related to the particle "ga").

このリストは、例えば、図５のＳ５０３においてインデキシング部２１１により作成される。このリストによれば、述語に係る対象語が主語であるが、主語以外の対象語についても、同様の構成のリストが作成される。 This list is created by the indexing unit 211 in S503 of FIG. 5, for example. According to this list, the target word related to the predicate is the subject, but a list with a similar structure is created for target words other than the subject.

インデキシング部２１１は、文書ＤＢ２００のarg_subj１５のカラムを参照し、述語に係る主語毎の頻度（数）を集計し、集計結果を表すランキングリスト（例えば、主語の頻度降順のリスト）を作成する。そして、インデキシング部２１１は、主語毎に、当該主語をarg_subj１５として有する全てのエントリを参照し、当該主語と当該主語が係る述語（predicate１１及び／又はpredicate_hyouki１２）とを含んだ述語項構造毎に、当該述語項構造の頻度（数）を集計する。図２０は、主語が「業務負担が」である場合の集計結果（述語項構造の頻度降順のランキングリスト）を示す。インデキシング部２１１は、各述語項構造とその頻度を、意味構造１３００及び頻度数１３０１として意味構造インデックス２０１に登録し、且つ、その意味構造を持つ文書のＩＤを文書ＩＤ１３０２として意味構造インデックス２０１に登録する（図４参照）。 The indexing unit 211 refers to the arg_subj15 column of the document DB 200, counts the frequency (number) of each subject related to the predicate, and creates a ranking list (for example, a list in descending order of subject frequency) representing the counted result. Then, for each subject, the indexing unit 211 refers to all entries having the subject as arg_subj15, and for each predicate-argument structure containing the subject and the predicate (predicate11 and/or predicate_hyouki12) related to the subject, Aggregate the frequency (number) of predicate-argument structures. FIG. 20 shows the counting result (ranking list in descending order of frequency of the predicate-argument structure) when the subject is "work burden". The indexing unit 211 registers each predicate-argument structure and its frequency in the semantic structure index 201 as a semantic structure 1300 and a frequency count 1301, and registers the ID of the document having that semantic structure in the semantic structure index 201 as a document ID 1302. (see Figure 4).
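The two counting passes described above (subjects first, then predicate-argument structures per subject) can be sketched with a counter and a dictionary. The entry layout, the variable and function names, and the sample entries are hypothetical; in particular the dictionary only imitates the roles of the semantic structure 1300, the frequency count 1301, and the document ID 1302.

```python
from collections import Counter

# Made-up entries standing in for rows of the document DB 200.
entries = [
    {"doc_id": 1, "arg_subj": "業務負担が", "predicate": "偏る"},
    {"doc_id": 2, "arg_subj": "業務負担が", "predicate": "偏る"},
    {"doc_id": 3, "arg_subj": "業務負担が", "predicate": "増える"},
    {"doc_id": 4, "arg_subj": "時間が", "predicate": "足りない"},
]

# Pass 1: frequency of each subject, in descending order.
subject_ranking = Counter(e["arg_subj"] for e in entries).most_common()

# Pass 2: frequency and document IDs per predicate-argument structure.
semantic_index = {}
for e in entries:
    key = (e["arg_subj"], e["predicate"])
    record = semantic_index.setdefault(key, {"frequency": 0, "doc_ids": []})
    record["frequency"] += 1
    record["doc_ids"].append(e["doc_id"])


def ranking_for_subject(index, subject):
    """Predicates for one subject, sorted by descending frequency."""
    rows = [(pred, rec["frequency"]) for (subj, pred), rec in index.items() if subj == subject]
    return sorted(rows, key=lambda row: row[1], reverse=True)


workload_ranking = ranking_for_subject(semantic_index, "業務負担が")
```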

図６のＳ６００において、ユーザ支援部２１２が、意味構造インデックス２０１を参照し、図６のＳ６０１において、ユーザ支援部２１２が、図２１に例示のＵＩ２１００を表示してよい。但し、この段階では、ＵＩ２１００のうち、述語項構造の検索結果がブランクでよい。ＵＩ２１００は、選択ツール２１０１及び２１０２と、検索実行ボタン２１０３と、タグ付けボタン２１０４とを有する（要素２１０１～２１０４は、ＧＵＩ部品の一例）。 In S600 of FIG. 6, the user support unit 212 may refer to the semantic structure index 201, and in S601 of FIG. 6, the user support unit 212 may display the UI 2100 illustrated in FIG. However, at this stage, the search result of the predicate-argument structure in the UI 2100 may be blank. The UI 2100 has selection tools 2101 and 2102, a search execution button 2103, and a tagging button 2104 (elements 2101 to 2104 are examples of GUI components).

Using the selection tool 2101, the user can select a desired predicate term from among the predicate terms (for example, subjects) that depend on predicates in the document DB 200 (the selection tool 2102 is described later). When the search execution button 2103 is pressed, the user support unit 212 searches the semantic structure index 201 for predicate-argument structures (semantic structures) that include the predicate term selected with the selection tool 2101, and searches the document DB 200 for documents having the found predicate-argument structures. In S602 of FIG. 6, the user support unit 212 displays a list of the found predicate-argument structures, documents, and the like (the search results) on the same UI 2100.

その後、タグ付けボタン２１０４が押された場合、ユーザ支援部２１２が、例えば、図６のＳ６０３において、分類タグの付与のための図２２に例示のＵＩ２２００を表示する。 After that, when the tagging button 2104 is pressed, the user support unit 212 displays the UI 2200 illustrated in FIG. 22 for assigning classification tags in S603 of FIG. 6, for example.

図２１に例示の検索結果によれば、一部のＩＴスキルが高い社員にＩＴ系の業務の負担が偏っているという傾向がわかる。このため、ユーザは、分類タグとして、「社員への業務負担偏り」を、この組合せを持つ文書に付与することが考えられる。 According to the search results illustrated in FIG. 21, it can be seen that there is a tendency that the burden of IT-related work is disproportionately placed on some employees with high IT skills. For this reason, it is conceivable that the user may add a classification tag of "disproportionate workload to employees" to documents having this combination.

＜「（２）表記ゆれを吸収することができる。」について＞ <Regarding “(2) It is possible to absorb notation variations”>

According to the search results illustrated in FIG. 21, opinion sentences concerning "imbalance in employees' workload" appear here and there. The user would like to associate the same classification tag "imbalance in employees' workload" with multiple opinion sentences in a single operation, but the predicate "偏る" (to be biased), on which "業務負担が" (workload, as subject) depends, appears in various notational variants.

そこで、ユーザ支援部２１２が、見つかった述語項構造又はそれを持つ意見文に対し、ルールベース又は機械学習ベースの処理を施すことで、類似した述語項構造を特定する。 Therefore, the user support unit 212 identifies a similar predicate-argument structure by applying rule-based or machine-learning-based processing to the found predicate-argument structure or an opinion sentence having the predicate-argument structure.

For example, the rule-based processing is predicate normalization. Specifically, for example, the user support unit 212 removes auxiliary verbs from predicates such as "偏りすぎる" (too biased) and "偏ってしまう" (ends up biased), also removing any superfluous symbols or characters, to obtain the base form "偏る" (to be biased). The user support unit 212 replaces the predicate in each predicate-argument structure with its base form and identifies two or more similar predicate-argument structures. In doing so, the user support unit 212 excludes negative forms such as "偏っていない" (not biased), specifically, sentences corresponding to entries in which negation 13 is on.
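The rule-based normalization can be sketched as follows. A real system would presumably use a morphological analyzer to strip auxiliary verbs; this sketch replaces that step with a small hand-written lookup table, which is an assumption made purely to keep the example self-contained. The table, function names, and sample entries are hypothetical.

```python
# Hand-written surface-form -> base-form table (stand-in for morphological analysis).
BASE_FORMS = {
    "偏りすぎる": "偏る",
    "偏ってしまう": "偏る",
    "偏っていない": "偏る",
}


def normalize_predicate(surface: str) -> str:
    """Strip stray punctuation and map the surface form to its base form."""
    cleaned = surface.strip("。、！？ ")
    return BASE_FORMS.get(cleaned, cleaned)


def group_similar(entries):
    """Group document IDs by (subject, normalized predicate), skipping negated entries."""
    groups = {}
    for e in entries:
        if e["negation"]:
            continue  # negative forms are excluded, as described above
        key = (e["arg_subj"], normalize_predicate(e["predicate_hyouki"]))
        groups.setdefault(key, []).append(e["doc_id"])
    return groups


# Made-up entries for illustration only.
entries = [
    {"doc_id": 1, "arg_subj": "業務負担が", "predicate_hyouki": "偏りすぎる", "negation": False},
    {"doc_id": 2, "arg_subj": "業務負担が", "predicate_hyouki": "偏ってしまう。", "negation": False},
    {"doc_id": 3, "arg_subj": "業務負担が", "predicate_hyouki": "偏っていない", "negation": True},
]

groups = group_similar(entries)
```

Entries 1 and 2 end up in the same group despite their different surface forms, while the negated entry 3 is left out.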

また、例えば、機械学習ベースの処理は、Word2Vec、FastText又はBERT等の言語の分散表現で述語の類似度を算出することである。ユーザ支援部２１２が、距離の近い（類似度の差の一定値以下）の二つ以上の述語項構造を特定してよい（文全体で類似度が推定されてもよい）。 Also, for example, machine learning-based processing is to calculate similarity of predicates in distributed representations of languages such as Word2Vec, FastText, or BERT. The user support unit 212 may identify two or more predicate-argument structures that are close to each other (the similarity difference is equal to or less than a certain value) (similarity may be estimated for the entire sentence).

The two or more predicate-argument structures identified as described above (similar predicate-argument structures) are listed side by side in the UI 2200 illustrated in FIG. 22. As a result, even if the notation of the predicate varies among the found predicate-argument structures, similar predicate-argument structures are identified accurately and listed together, which makes it easy for the user to select the opinion sentences to which a desired classification tag should be given. The user enters the classification tag "imbalance in employees' workload" in the tag input field 2201, selects the opinion sentences to which the classification tag is to be given (puts check marks), and presses the tagging execution button 2202. The user support unit 212 associates the entered classification tag "imbalance in employees' workload" with all the selected opinion sentences and includes the associated classification tag in the classification tag 1204 (see FIG. 3) of the entry of each opinion sentence (document) to which it was given.

＜「（３）述語項に代えて又は加えて述語を起点とした文書検索も可能である。」について＞ <Regarding “(3) Document retrieval starting from predicates instead of or in addition to predicate terms”>

An example of document retrieval that starts from a predicate, instead of or in addition to a predicate term such as the subject "業務負担が" (workload), will be described with reference to FIGS. 23 and 24.

As illustrated in FIG. 23, the selection tool 2102 of the UI 2100 can be used to select a predicate desired by the user (for example, "足りない" (insufficient)) from among the predicates in the document DB 200. When the search execution button 2103 is pressed, the user support unit 212 searches the semantic structure index 201 for predicate-argument structures (semantic structures) that include the predicate selected with the selection tool 2102, and searches the document DB 200 for documents having the found predicate-argument structures. In S602 of FIG. 6, the user support unit 212 displays a list of the found predicate-argument structures, documents, and the like (the search results) on the same UI 2100.
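Searching the index from either side, by predicate term (selection tool 2101) or by predicate (selection tool 2102), can be sketched with one lookup function. The index layout keyed by (predicate term, predicate), the function name, and the sample entries are hypothetical and chosen only for this illustration.

```python
# Made-up index: (predicate term, predicate) -> document IDs.
semantic_index = {
    ("時間が", "足りない"): [11, 12],
    ("人手が", "足りない"): [13],
    ("業務負担が", "偏る"): [14],
}


def search_index(index, arg=None, predicate=None):
    """Return the structures matching the given predicate term and/or predicate.

    Passing only `arg` mimics a search via the selection tool 2101,
    passing only `predicate` mimics a search via the selection tool 2102,
    and passing both narrows the search to one structure.
    """
    hits = {}
    for (a, p), doc_ids in index.items():
        if (arg is None or a == arg) and (predicate is None or p == predicate):
            hits[(a, p)] = doc_ids
    return hits


by_predicate = search_index(semantic_index, predicate="足りない")
by_term = search_index(semantic_index, arg="業務負担が")
```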

After that, when the tagging button 2104 is pressed, the user support unit 212 displays, for example in S603 of FIG. 6, the UI 2400 illustrated in FIG. 24 for assigning classification tags. Specifically, even if the notation of a predicate argument (for example, the subject) varies among the found predicate-argument structures, similar predicate-argument structures are identified accurately and listed on the UI 2400. For example, the user enters the classification tag "insufficient training time" in the tag input field 2401, selects the opinion sentences to which the classification tag is to be assigned (by checking them), and presses the tagging execution button 2402. The user support unit 212 associates the entered classification tag "insufficient training time" with all of the selected opinion sentences, and includes the associated classification tag in the classification tag 1204 (see FIG. 3) of the entry of each opinion sentence (document) to which the tag was assigned.

The above is the description of the second embodiment. In the second embodiment, the semantic structure is a predicate-argument structure containing a predicate and a predicate argument. The relationship between the second embodiment and the first embodiment may be, for example, as follows.
・The semantic structure condition specified in (A) is a condition on one or both of the predicate and the predicate argument that constitute a predicate-argument structure (for example, "want: consideration" in the first embodiment (see, for example, FIG. 8) is an example of a condition on both the predicate and the predicate argument).
・In (B), a document that matches the semantic structure condition specified in (A) is a document having a predicate-argument structure that includes a predicate or predicate argument matching the condition specified in (A).
・The user support unit 212 performs rule-based or machine-learning-based processing on the predicate argument or the predicate in a predicate-argument structure that includes the predicate or predicate argument matching the condition specified in (A), and thereby identifies similar predicate-argument structures. In (C), the displayed documents are documents having similar predicate-argument structures. A "similar predicate-argument structure" may be a predicate-argument structure for which the similarity between predicate-argument structures is less than a certain value.
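As a rough illustration of the rule-based option only, the sketch below groups predicate-argument structures whose predicates differ in notation. The variant table, the function names, and the sample structures are hypothetical; the source does not disclose the actual rules, and a machine-learning-based comparison would replace the table lookup.

```python
# Hypothetical hand-written table mapping surface variants of a predicate
# to one canonical form. The actual rules are not given in the source.
PREDICATE_VARIANTS = {
    "is biased": "be uneven",
    "is uneven": "be uneven",
    "is lopsided": "be uneven",
    "is heavy": "be heavy",
}


def normalize(predicate: str) -> str:
    """Reduce a predicate to its canonical form, ignoring case and outer spaces."""
    key = predicate.strip().lower()
    return PREDICATE_VARIANTS.get(key, key)


def similar_structures(structures, argument, seed_predicate):
    """Return structures with the given argument whose predicate normalizes
    to the same canonical form as `seed_predicate`."""
    target = normalize(seed_predicate)
    return [(p, a) for p, a in structures if a == argument and normalize(p) == target]


structures = [
    ("is biased", "workload"),
    ("Is uneven", "workload"),
    ("is heavy", "workload"),
    ("is lopsided", "salary"),
]

found = similar_structures(structures, "workload", "is lopsided")
print(found)
```

Here the argument is held fixed and the predicate is normalized; the case described for FIG. 24, where the predicate is fixed and the argument varies, would apply the same idea to the argument instead.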

Although several embodiments of the present invention have been described above, they are examples for explaining the present invention and are not intended to limit the scope of the present invention to these embodiments alone. The present invention can also be implemented in various other forms.

For example, in the above description, the user support unit 212 performs "(A) accept from the user a selection of a semantic structure on the index 201." In this (A), however, the user support unit 212 may specify the semantic structure condition automatically, without accepting it from the user. For example, the user support unit 212 can accumulate the user's designation history (the history of designated semantic structure conditions), derive from that history the semantic structure conditions that appear to be important to the user, turn them into rules, and automatically specify semantic structure conditions according to those rules. Specifically, for example, when the accumulated history shows that the user has mainly specified negative semantic structures, the user support unit 212 may, based on that history, automatically specify negative semantic structures (for example, [decrease: sales], [resign: employee], [increase: dissatisfaction]).
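One way to picture this history-based automatic designation is the sketch below. The polarity lexicon, the 60% share threshold, and the function names are assumptions chosen for illustration; the source says only that a rule is derived from the designation history and then applied.

```python
from collections import Counter

# Hypothetical polarity lexicon keyed by (predicate, argument) structure.
POLARITY = {
    ("decrease", "sales"): "negative",
    ("resign", "employee"): "negative",
    ("increase", "dissatisfaction"): "negative",
    ("increase", "sales"): "positive",
    ("improve", "morale"): "positive",
}


def dominant_polarity(history, min_share=0.6):
    """Return the polarity the user designated most often, or None if no
    polarity reaches `min_share` of the history (threshold is an assumption)."""
    counts = Counter(POLARITY[s] for s in history if s in POLARITY)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    return label if n / sum(counts.values()) >= min_share else None


def auto_specify(history, index_structures, min_share=0.6):
    """Automatically designate the index structures matching the derived rule."""
    label = dominant_polarity(history, min_share)
    if label is None:
        return []
    return [s for s in index_structures if POLARITY.get(s) == label]


history = [
    ("decrease", "sales"),
    ("resign", "employee"),
    ("decrease", "sales"),
    ("improve", "morale"),
]
auto = auto_specify(history, list(POLARITY))
print(auto)
```

Returning an empty list when no polarity dominates leaves room to fall back to manual designation by the user, which matches the source's framing of automatic designation as an alternative rather than a requirement.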

13... document analysis support system

Claims

an indexing unit that extracts a semantic structure from each of a plurality of documents and creates an index of the extracted semantic structure;
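a user support unit that performs the following (A) to (E):
(A) accepting from the user a designation of a semantic structure condition based on a viewpoint that the user wants to know with reference to the index, or automatically designating the semantic structure condition;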
(B) if among the plurality of documents, there is a document that matches the semantic structure condition specified in (A), at least a part or a summary of the document is displayed to the user;
(C) accepting from the user a selection of one or more documents from the displayed documents and designation of a classification tag, which is a tag for the one or more documents;
(D) associate each of the one or more documents selected in (C) with a classification tag specified in (C);
(E) searching, from among at least those of the plurality of documents that are not associated with any classification tag, for documents similar to at least one document associated with the specified classification tag, and if there is such a similar document, displaying at least a part or a summary of the similar document to the user and performing (C);
document analysis support system.

For each document to be displayed,
a character string including at least the semantic elements constituting the semantic structure extracted from the document is displayed; and
if a classification tag is associated with the document, the classification tag is also displayed.
The document analysis support system according to claim 1.

The user support unit,
each time (D) is performed, determines whether a condition for stopping the similar-document search is satisfied, and
if the result of the determination is false, performs (E),
wherein the stop condition is any of the following:
・the ratio of documents with no associated classification tag to the plurality of documents is less than a certain ratio;
・the number of executions of (E) has reached a predetermined number.
The document analysis support system according to claim 1.

The user support unit identifies, using a trained language model, at least a part or a summary of the classification tag to be associated with a document;
the user support unit displays the predicted classification tag to the user;
input to the trained language model includes at least a part or a summary of a document;
output from the trained language model includes at least a part or a summary of a classification tag;
The document analysis support system according to claim 1.

The semantic structure is a predicate-argument structure containing a predicate and a predicate argument;
the semantic structure condition specified in (A) is a condition on one or both of the predicate and the predicate argument that constitute the predicate-argument structure;
in (B), the document that matches the semantic structure condition specified in (A) is a document having a predicate-argument structure that includes a predicate or predicate argument matching the condition specified in (A).
A document analysis support system according to any one of claims 1 to 4.

The user support unit performs rule-based or machine-learning-based processing on the predicate argument or the predicate in a predicate-argument structure that includes the predicate or predicate argument matching the condition specified in (A), and thereby identifies similar predicate-argument structures;
in (C), the displayed documents are documents having similar predicate-argument structures.
The document analysis support system according to claim 5.

(1) a computer extracts a semantic structure from each of a plurality of documents and creates an index of the extracted semantic structure;
(2) the computer receives from the user a designation of a semantic structure condition based on the viewpoint that the user wants to know based on the index, or automatically designates the semantic structure condition;
(3) if there is a document among the plurality of documents that matches the semantic structure condition specified in (2), the computer displays the document to the user;
(4) the computer receives from the user a selection of one or more documents from the displayed documents and designation of a classification tag, which is a tag for the one or more documents;
(5) the computer associates each of the one or more documents selected in (4) with the classification tag specified in (4);
(6) the computer searches, from among at least those of the plurality of documents that are not associated with any classification tag, for documents similar to at least one document associated with the specified classification tag, and if there is such a similar document, displays at least a part or a summary of the similar document to the user and performs (4);
Document analysis support method.

(1) extracting a semantic structure from each of a plurality of documents and creating an index of the extracted semantic structure;
(2) receiving from the user a specification of a semantic structure condition based on a viewpoint that the user wants to know based on the index, or automatically specifying the semantic structure condition;
(3) if there is a document among the plurality of documents that matches the semantic structure condition specified in (2), the document is displayed to the user;
(4) accepting from the user a selection of one or more documents from the displayed documents and designation of a classification tag, which is a tag for the one or more documents;
(5) associate the classification tag specified in (4) with each of the one or more documents selected in (4);
(6) searching, from among at least those of the plurality of documents that are not associated with any classification tag, for documents similar to at least one document associated with the specified classification tag, and if there is such a similar document, displaying at least a part or a summary of the similar document to the user and performing (4);
A computer program that causes a computer to execute (1) to (6) above.