JP2009032180A

JP2009032180A - Text mining apparatus and text mining method

Info

Publication number: JP2009032180A
Application number: JP2007197624A
Authority: JP
Inventors: Sada Mizunuma; 貞水沼
Original assignee: Hitachi Software Engineering Co Ltd
Current assignee: Hitachi Software Engineering Co Ltd
Priority date: 2007-07-30
Filing date: 2007-07-30
Publication date: 2009-02-12

Abstract

PROBLEM TO BE SOLVED: To provide a method for efficiently collecting documents by combining text mining retrieval with document citation information.
SOLUTION: The system comprises: a condition input processing part that targets a bio-database (document database) for text mining, provides an input screen prompting the user to enter the keywords needed for document retrieval, and accepts the keywords entered on that screen; a text mining processing part that performs text mining retrieval on the document database based on the conditions accepted by the condition input processing part; a document citation relation analysis processing part that analyzes the citation relations of the documents extracted by the text mining retrieval; and a result output processing part that displays the names of the retrieved documents together with their citation relation information as an overview. The system also includes a means for keeping the database (document database) needed for evaluation up to date.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to text mining technology, and more particularly to a technique for searching, from published bibliographic information, for a desired set of academic papers in the life science field on the basis of both text mining and the citation relationships between documents.

In the life science field, experimental results are stored and published in what are called biodatabases, which hold many kinds of information, such as nucleotide sequence data, protein amino acid sequence data, protein tertiary structure data, and biomolecular interaction data, much of it as text documents.

These data are accompanied by annotation information and bibliographic information, and text mining is increasingly being applied to such text-format data to extract the information needed from the large volume of published data.

However, the volume of publicly available data is enormous, and much of it differs in characteristics and properties, so a single system search rarely yields satisfactory results. Users therefore repeat the screening several times at their own discretion until they have obtained the data they need in the quantity they need. Document collection in particular mixes keyword searches with the task of following the citation information of a document found in those searches to gather related documents, so it is difficult afterwards to reconstruct the conditions and methods by which each document in the final collection was obtained. This is especially likely when documents are collected through many different search services.

Tracing the citation information of documents to find related documents is a method used in fields where many documents are published in a short time: because published documents are also cited by other documents within a short time, analyzing the documents that have received a significant number of citations identifies the documents that should be collected. The life science field, which encompasses disciplines such as medicine and biology, is one in which many documents are published in a short period, so collecting documents by means of their citation information is effective there.

Text mining is a technique for efficiently extracting useful knowledge and information from a large text database by analyzing it from various viewpoints with a computer. It combines several component technologies, such as natural language processing and information visualization. Text mining is expected to make it possible to select texts that contain the desired information; to analyze the relationships between texts and between the items described in them, thereby obtaining information that cannot be obtained by reading individual texts; and to extract keywords that characterize a given set of texts. Text mining has attracted attention for applications such as customer requirement analysis, and because large document databases such as MEDLINE are freely available, research on text mining in the medical and pharmaceutical field is becoming active.

Patent Document 1 below is a document relating to a text mining method.

Patent Document 1: JP 2001-318948 A

Conventionally, as described above, a text mining search that extracts features directly from the text of the data to be collected and a process that finds other documents by tracing the citation relationships of a given document had to be performed separately and repeatedly. As a result, the conditions, methods, and order by which each finally collected document was obtained had to be managed individually.

An object of the present invention is to reduce this processing burden.

According to one aspect of the present invention, there is provided a text mining apparatus comprising: a text mining condition input processing unit that provides a client terminal with a user interface prompting input of keywords for a document search and accepts conditions for performing a text mining search based on the entered keywords; a text mining processing unit that performs text mining directly on a document database based on the accepted conditions; a document citation relation analysis processing unit that analyzes the citation relation information of the documents extracted by the text mining, based on a document citation relation table that is created by extracting only document citation information and in which direct citation relationships are organized as pairs of documents; and a display control unit that, based on the text mining processing and the document citation relation analysis processing, performs per-document display control to display, separately for each document, the number of references (documents it cites), the number of citing documents (documents that cite it), and the number of documents extracted by the text mining for each keyword.

Preferably, when a document is specified, the display control unit performs hierarchical display control to display, in hierarchical form, the citation relationships of the documents that cite that document. Preferably, the display control unit also displays the hierarchical display and the per-document display either on the same screen or by switching between them.

According to another aspect of the present invention, there is provided a text mining method comprising: a step of providing a client terminal with a user interface prompting input of keywords for a document search and accepting conditions for performing a text mining search based on the entered keywords; a step of performing text mining directly on a document database based on the accepted conditions; a step of creating a document citation relation table by extracting only document citation information and using it to analyze the citation relation information of the documents extracted by the text mining; and a step of displaying, separately for each document and based on the text mining processing and the document citation relation analysis processing, the number of references, the number of citing documents, and the number of documents extracted by the text mining for each keyword.

The apparatus may also include a means for keeping the biodatabase (document database) needed for evaluation up to date.

The present invention may also be embodied as a program that causes a computer to execute the above steps, or as a computer-readable recording medium on which the program is recorded.

According to the present invention, when documents on a specific topic are collected, useful documents can be found not only by text mining document search but also by identifying documents with many citations, using a document citation relation table created by extracting only document citation information. This saves time and effort in the document collection process. The document citation relation table also makes it easy to display the citation relationships as an overview for analysis.

A text mining technique according to an embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a functional block diagram showing an example configuration of a system including a text mining apparatus according to an embodiment of the present invention. The system of this embodiment includes a client computer 3 (hereinafter simply "client"), through which the user enters keywords and views evaluation results; a server computer 1 (hereinafter simply "server") that performs text mining; a document database 5 that holds document information; and a document citation relation index table 6 through which document citation relationships can be looked up. These are connected to one another via a network 2.

The server 1 includes: a condition input processing unit 9, serving as text mining condition accepting means, that provides an interface prompting the user to enter the keywords needed for document collection and accepts the conditions for performing a text mining search based on those keywords; a text mining processing unit 8, serving as means for text mining the document database 5 according to the accepted conditions; the document citation relation index table 6, which stores the citation relations of the documents extracted by text mining; a document citation relation analysis processing unit 7, serving as means for analyzing document citation relations; a result output processing unit 10, serving as means for evaluating these results in combination and displaying them; and a document database update processing unit 4, serving as means for keeping the document database needed for evaluation up to date. The condition input processing unit 9 receives input from the client 3, passes it to the text mining processing unit 8, and provides a GUI to the client 3. The result output processing unit 10 includes a display and a display control unit that controls what is shown on the display. In practice, these processing units are typically implemented by a processor such as a CPU loading a program stored in memory into RAM and executing it.

When the document database update processing unit 4 updates the document database 5, the document citation relation index table 6 is synchronized with it: the citation information contained in the documents is extracted, and the index table 6 is updated at the same time.
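The synchronization step can be illustrated with a minimal in-memory sketch. It assumes, purely for illustration, that references appear in the body text as a hypothetical marker such as `[ref:12]`; the class name `DocumentStore` and the marker format are not from the original. The point is that storing a document and rebuilding its citation rows happen in one operation, so the index never lags behind the database.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical in-text reference marker, e.g. "[ref:12]"; real records would
# carry a parsed reference list instead.
REF_PATTERN = re.compile(r"\[ref:(\d+)\]")


@dataclass
class DocumentStore:
    """In-memory stand-in for document database 5 and index table 6."""
    bodies: dict[int, str] = field(default_factory=dict)
    # Rows of the citation index: (citing document ID, cited document ID).
    citation_pairs: set[tuple[int, int]] = field(default_factory=set)

    def upsert(self, doc_id: int, body: str) -> None:
        """Store or replace a document and resynchronize its citation rows."""
        self.bodies[doc_id] = body
        # Drop the rows previously derived from this document...
        self.citation_pairs = {p for p in self.citation_pairs if p[0] != doc_id}
        # ...and rebuild them from the new body in the same update.
        for cited in REF_PATTERN.findall(body):
            self.citation_pairs.add((doc_id, int(cited)))
```

Re-running `upsert` for a revised document replaces its old citation rows rather than accumulating stale ones.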

FIG. 2 shows an example configuration of a table stored in the document database 5. In the example of FIG. 2, each record stores a document ID 21 together with the title 22, body text 23, and journal name 24 of that document. Although this example stores the title, body text, and journal name as document information, the abstract, authors, publication year, and so on may also be stored, or only the abstract may be stored in place of the body text.
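A record of the FIG. 2 table might be modeled as follows. This is a sketch only: the field names are illustrative, and the `mining_text` rule (title plus body, falling back to the abstract when no body is stored) is an assumption about how the optional columns would feed the text mining step.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocumentRecord:
    """One row of the document table in FIG. 2 (fields 21-24), plus the
    optional columns the text says may be added."""
    doc_id: int                      # document ID 21
    title: str                       # title 22
    journal: str                     # journal name 24
    body: Optional[str] = None       # body text 23; may be omitted
    abstract: Optional[str] = None   # optional extra column
    authors: tuple[str, ...] = ()    # optional extra column
    year: Optional[int] = None       # optional extra column

    def mining_text(self) -> str:
        """Text handed to text mining: the title plus the body text,
        falling back to the abstract when only the abstract is stored."""
        main = self.body if self.body is not None else (self.abstract or "")
        return f"{self.title}\n{main}"
```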

FIG. 3 shows an example configuration of the document citation relation index table according to this embodiment, created by extracting only the document citation information from the body text in the document database 5. In the example of FIG. 3, each entry pairs a document ID 31 with a cited document ID 32, which is the ID of a document cited in document 31. The figure shows, for example, that Document 1 is cited in Documents 11 to 14, and that Document 12 is cited in Documents 21 to 24. The table in FIG. 3 is created in advance and contains only pairs of documents in a direct citation relationship. To analyze information about the documents that cite a given document, the ID of that document is searched for in the cited document ID field 32.
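The pair table lends itself to a simple index structure. The sketch below (class and method names are hypothetical) stores each row as `(document ID 31, cited document ID 32)`. Instead of scanning field 32 at query time, as the text describes, it builds the reverse map once; the answers are the same, and lookups become constant-time.

```python
from __future__ import annotations

from collections import defaultdict


class CitationIndex:
    """Sketch of the index table in FIG. 3: one row per direct citation,
    stored as (document ID 31, cited document ID 32)."""

    def __init__(self, pairs: list[tuple[int, int]]) -> None:
        self.references: dict[int, set[int]] = defaultdict(set)  # doc -> docs it cites
        self.cited_by: dict[int, set[int]] = defaultdict(set)    # doc -> docs citing it
        for citing, cited in pairs:
            self.references[citing].add(cited)
            # Building this reverse map once is equivalent to searching the
            # cited-ID field 32 for a given document ID at query time.
            self.cited_by[cited].add(citing)

    def reference_count(self, doc_id: int) -> int:
        """Number of documents that `doc_id` cites directly."""
        return len(self.references.get(doc_id, ()))

    def citation_count(self, doc_id: int) -> int:
        """Number of documents that cite `doc_id` directly."""
        return len(self.cited_by.get(doc_id, ()))
```

With rows for the FIG. 3 example (Documents 11 to 14 citing Document 1, Documents 21 to 24 citing Document 12), `citation_count(12)` is 4.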

FIG. 4 shows an example of a user interface that prompts the user to enter the keywords needed for document collection. In the example of FIG. 4, the keyword input screen 41 consists of a group of keyword input fields 42 for the keywords needed to verify an experimental hypothesis and a button 43 for sending the entered information to the server. To make it easy for the user to enter the keywords needed to verify the hypothesis, the screen prompts for keywords by category, such as field, experimental technique, and phenomenon. Other categories, such as the institution that performed the experiment, the journal name, and the author name, may also be added as input candidates. An interface may also be provided for giving preferential treatment (higher weighting) to keywords entered in these categories according to their prominence, for example a well-known journal or a prominent researcher. After the keywords are entered in the input fields 42, pressing the send button 43 sends them to the server 1.
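The text does not say how the higher weighting would be applied. One plausible reading, sketched here under that assumption, is a per-keyword multiplier on occurrence counts. `KeywordQuery` and `weighted_score` are hypothetical names, and plain substring counting is a simplification (it also matches, for example, "kinases" for "kinase").

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeywordQuery:
    """Keywords entered per category on screen 41 (field, technique, phenomenon, ...)."""
    by_category: dict[str, list[str]] = field(default_factory=dict)
    # Optional per-keyword weights, e.g. for a well-known journal or author.
    weights: dict[str, float] = field(default_factory=dict)

    def keywords(self) -> list[str]:
        """All entered keywords, in category order, without duplicates."""
        seen: list[str] = []
        for words in self.by_category.values():
            for w in words:
                if w not in seen:
                    seen.append(w)
        return seen

    def weighted_score(self, text: str) -> float:
        """Hypothetical scoring rule: case-insensitive occurrence count of
        each keyword, multiplied by its weight (default 1.0)."""
        lowered = text.lower()
        return sum(lowered.count(k.lower()) * self.weights.get(k, 1.0)
                   for k in self.keywords())
```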

FIG. 5 shows an example of a screen that displays text mining results together with citation information. As shown in FIG. 5, the result display screen 51 characterizes each document by a two-dimensional matrix X1 whose rows are the documents 53 extracted by text mining based on the entered keywords and listed on the screen, and whose columns are the group 52 of entered first keywords (second keywords related to the first keywords may also be included). In the display example of FIG. 5, the occurrence frequency of each first and second keyword is shown as a number for each document. To the left of matrix X1, information on each document is displayed: a field 54 showing the document ID that identifies the document and the journal in which it was published, and a field 55 showing the results of the citation relation analysis. In field 55, the reference count is the number of other documents that a document cites, as listed at the end of the document or elsewhere; for example, the field shows that Document 1 cites three documents. In addition, the number of places in Document 1 where each cited document is cited may be made viewable at the next level of the hierarchy.

The citation count is the number of other documents that directly cite a document. For example, 35 other documents cite Document 1, from which it can be inferred that Document 1 is a fairly important document.
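The rows of screen 51 combine one row of matrix X1 with the two counts in field 55. A minimal sketch, assuming the mined texts and the `(citing ID, cited ID)` rows of the index are already available in memory; `ResultRow` and `build_result_rows` are illustrative names, and keyword frequency is approximated by case-insensitive substring counting.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResultRow:
    doc_id: int
    keyword_counts: list[int]   # one row of matrix X1
    reference_count: int        # documents this one cites (field 55)
    citation_count: int         # documents that directly cite this one (field 55)


def build_result_rows(texts: dict[int, str], keywords: list[str],
                      pairs: list[tuple[int, int]]) -> list[ResultRow]:
    """Assemble the per-document rows of screen 51 from mined texts and
    (citing ID, cited ID) pairs of the citation index."""
    rows = []
    for doc_id, text in texts.items():
        lowered = text.lower()
        counts = [lowered.count(k.lower()) for k in keywords]
        # Count distinct partners so a duplicated pair is not counted twice.
        refs = {cited for citing, cited in pairs if citing == doc_id}
        citers = {citing for citing, cited in pairs if cited == doc_id}
        rows.append(ResultRow(doc_id, counts, len(refs), len(citers)))
    return rows
```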

Based on the document information displayed on screen 51 and its characteristics, the screen provides a button (View) 56 for performing a more detailed document search; a field 57 in which the user can place a check mark against each document ID judged necessary from the search results; and a button 59 that calls up a screen showing, as an overview within the citation structure, the documents judged necessary while tracing citation relationships. The display can be scrolled in the X and Y directions with the scroll buttons 50a and 50b.

Because the two-dimensional matrix X1 displays, for each keyword, the count obtained by text mining (the keyword's occurrence frequency), the user can judge, for a keyword of interest, that the text mining result is likely to be more accurate when that count is higher. The part of the document in which each keyword appears may also be analyzed and the result displayed. For example, knowing whether a keyword appears mostly in the description of prior work or frequently in important parts such as the experimental results makes the accuracy estimate more reliable.
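One way to realize the per-section analysis is sketched below. It assumes each document has already been split into named sections (the names "background" and "results" are hypothetical) and reports what fraction of a keyword's occurrences fall in the sections the user considers important.

```python
from __future__ import annotations


def keyword_counts_by_section(sections: dict[str, str],
                              keyword: str) -> dict[str, int]:
    """Occurrences of one keyword in each named section of a document
    (case-insensitive substring count)."""
    k = keyword.lower()
    return {name: text.lower().count(k) for name, text in sections.items()}


def share_in_sections(counts: dict[str, int], important: set[str]) -> float:
    """Fraction of all occurrences that fall in the sections the user treats
    as important (e.g. results); 0.0 when the keyword does not occur."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for name, c in counts.items() if name in important) / total
```

A keyword whose occurrences sit mostly in a results section would then get a higher share than one that appears only in the background discussion.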

Although not shown in the figure, the total count across, for example, all four keywords may also be displayed for each document, giving an indication of the accuracy of the extraction result for a group of keywords. Here, the totals for Documents 1 to 4 are assumed to be 100, 90, 70, and 50, respectively (these totals appear when the view is scrolled right with the scroll button 50a).

In the display example of FIG. 5, the text mining search results are listed in descending order of how many of the user's keywords each document contains (here, the total). To find the documents to be collected by analyzing those that have received a significant number of citations, the list may also be sortable by the citation count in field 55. For example, an area for setting a citation count threshold may be provided (it may also be provided on the screen of FIG. 4), together with a function that displays only documents with a citation count of 30 or more (here, Documents 1 and 3).
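The ordering and threshold options described above reduce to a filter followed by a stable descending sort. In this sketch, `Row` and `order_rows` are hypothetical names, and the two sortable columns are the keyword total and the citation count.

```python
from __future__ import annotations

from typing import NamedTuple


class Row(NamedTuple):
    doc_id: int
    keyword_total: int   # total keyword hits shown after scrolling right
    citation_count: int  # documents directly citing this one


def order_rows(rows: list[Row], by: str = "keyword_total",
               min_citations: int = 0) -> list[Row]:
    """Keep rows whose citation count reaches the threshold, then sort them
    in descending order of the chosen column (ties keep the original order)."""
    if by not in ("keyword_total", "citation_count"):
        raise ValueError(f"unsupported sort key: {by}")
    kept = [r for r in rows if r.citation_count >= min_citations]
    return sorted(kept, key=lambda r: getattr(r, by), reverse=True)
```

Calling `order_rows(rows, min_citations=30)` corresponds to the "30 or more" example, and `by="citation_count"` to sorting on field 55.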

FIGS. 6 to 8 show display screen interfaces illustrating the process of collecting necessary documents while evaluating them, by tracing citation relationships starting from the documents returned by a text mining search of the document database 5 with the keywords needed for document collection. The processing flow and the screen interfaces are described below with reference to these figures.

FIG. 6 is an example screen displaying the results of a text mining search with specific keywords. On the display screen 51 of FIG. 6, the user wants to view information on the documents that cite Document 1 (document ID "1"), which, as a result of the text mining search, contains the most keywords entered by the user. The screen shows the user checking the check field 57 of Document 1 to select that row and then about to click, with the pointer 61, the button 56 (View) for a more detailed document search. The cited document ID display area 58 and the selected-document relation display button 59 are described later.

FIG. 7 is the screen displayed after the screen of FIG. 6. It shows an example in which the documents citing document ID "1", selected in FIG. 6, are extracted and displayed together with the characteristics of each document. The document ID of the cited document is shown in field 71, and the list 72 of documents citing that document (ID 1) on screen 51 contains 35 entries (only four are shown in the figure; all can be viewed by scrolling). Among these 35 documents, the user finds that Documents 11 and 12 have a citation count significant enough for collection (for example, 15 or more is regarded as significant). The screen shows the user selecting Documents 11 and 12 by checking them in check field 57 and then about to click, with the pointer 73, the selected-document relation display button 59, which displays a screen giving an overview of how documents have been collected so far by tracing citation relationships. Documents may also be checked on the basis of the count for a specific keyword. Only documents that directly cite a document are included in its citation count.
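The drill-down of FIG. 7 can be sketched as a single function over the `(citing ID, cited ID)` rows: list the documents that directly cite the selected one, attach each one's own citation count, and flag those at or above the example threshold of 15. The function name and return shape are illustrative assumptions.

```python
from __future__ import annotations


def citing_documents(pairs: list[tuple[int, int]], doc_id: int,
                     significant: int = 15) -> list[tuple[int, int, bool]]:
    """For the selected document, return (citing ID, its citation count,
    significant?) for every document that directly cites it, most-cited first.

    `pairs` are (citing ID, cited ID) rows of the citation index; `significant`
    defaults to the example threshold of 15 from the text."""
    counts: dict[int, int] = {}
    for citing, cited in set(pairs):          # ignore duplicated rows
        counts[cited] = counts.get(cited, 0) + 1
    citers = sorted({c for c, d in pairs if d == doc_id})
    rows = [(c, counts.get(c, 0), counts.get(c, 0) >= significant) for c in citers]
    return sorted(rows, key=lambda r: r[1], reverse=True)
```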

Checking Documents 11 and 12 in FIG. 7 serves the following purpose. In the task of successively listing and examining the documents that cite a particular document, as in FIGS. 5, 6, and 7, checking the documents that are candidates for collection allows the user later to see through which citation relationships each selected document was reached.
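Recording where each checked document was found is enough to reconstruct its trail later. The sketch below assumes a hypothetical `CollectionSession` that stores, for every checked document, the document under whose citing list it appeared; the first discovery wins, and the trail walk stops if it would revisit a document.

```python
from __future__ import annotations


class CollectionSession:
    """Records which documents the user checked and through which citation
    step each was reached, so the trail can be reconstructed later."""

    def __init__(self) -> None:
        self.reached_from: dict[int, int | None] = {}  # doc -> document it was listed under
        self.checked: list[int] = []

    def check(self, doc_id: int, via: int | None = None) -> None:
        """Check `doc_id`; `via` is the document whose citing list showed it
        (None for documents picked from the initial text mining result)."""
        self.reached_from.setdefault(doc_id, via)
        if doc_id not in self.checked:
            self.checked.append(doc_id)

    def trail(self, doc_id: int) -> list[int]:
        """Citation trail from the initial search result down to `doc_id`."""
        path = [doc_id]
        while (parent := self.reached_from.get(path[-1])) is not None and parent not in path:
            path.append(parent)
        return path[::-1]
```

For example, checking Document 1, then 11 and 12 via 1, then 22 via 12 gives `trail(22) == [1, 12, 22]`.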

FIG. 8 shows a citation hierarchy display screen 81, which gives an overview of how documents have been collected by tracing citation relationships, displayed together with a display screen 86 that shows detailed information about a specific document when that document is designated on screen 81. The citation hierarchy display screen 81 consists of boxes 82 representing documents, each showing the document ID, journal name, and citation count, and arrows 83 representing citation relationships.

In this example screen, the document box 82 shows "Document 1", which was checked on the first list of text mining results (FIG. 6), and all citation relationships starting from Document 1 are displayed in FIG. 8. The boxes of the checked documents are displayed in a different color so that the user can see which documents were selected. In the figure, the boxes of Document 1, Documents 11 and 12, and Document 22 are shown in a different color.

In this way, screen 81 displays the root document, a group 84 of document boxes for the documents that cite it, and a group 85 of document boxes for the documents that in turn cite those. The arrows indicating citation relationships can be generated from the document citation relation index table shown in FIG. 3. Because this table is created in advance, the display of FIG. 8 can be produced quickly, and switching between documents or between views is also fast. For Document 12, for example, the documents that directly cite it are linked to it by arrows, as shown in area 85. By following the arrows on screen 81, the user can get an overview of the direct citation relationships between documents.
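A text-mode approximation of screen 81 can be produced by a depth-first walk over the reverse citation map. In this sketch (the function name and depth limit are assumptions), each level of indentation corresponds to one citation step away from the root, and a `*` stands in for the highlight color of checked documents.

```python
from __future__ import annotations

from collections import defaultdict


def render_citation_tree(pairs: list[tuple[int, int]], root: int,
                         checked: set[int], max_depth: int = 2) -> list[str]:
    """Text rendering of screen 81: the root document, the documents citing
    it (group 84), the documents citing those (group 85), and so on.
    Checked documents are marked with '*' in place of a highlight colour."""
    cited_by: dict[int, list[int]] = defaultdict(list)
    for citing, cited in sorted(set(pairs)):
        cited_by[cited].append(citing)

    lines: list[str] = []

    def visit(doc: int, depth: int, on_path: set[int]) -> None:
        mark = "*" if doc in checked else " "
        lines.append(f"{'    ' * depth}{mark}Doc {doc}")
        if depth == max_depth:
            return
        for child in cited_by.get(doc, []):
            if child not in on_path:      # guard against citation cycles
                visit(child, depth + 1, on_path | {child})

    visit(root, 0, {root})
    return lines
```

`print("\n".join(render_citation_tree(pairs, 1, {1, 11, 12, 22})))` would list Document 1, then 11 and 12 one level down, then the documents citing 12 beneath it.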

Display screen 86 is displayed, for example, when Document 12 is selected and confirmed on display screen 81 with a pointer or the like. Display screen 86 shows the same kind of display as FIG. 7, but for Document 12. In this display, Documents 21 to 24 (see FIG. 3), which cite Document 12, are shown with arrows pointing to Document 12. Thus, in FIG. 8, the user can view the overview screen 81 of citation relationships and the detailed document information display 86, which shows the citation relationships of a particular document and the keyword search results, either simultaneously or by switching between them.

FIG. 9 is a flowchart showing the flow of processing in the text mining apparatus according to this embodiment.

As shown in FIG. 9, when the user enters keywords for document collection at the client (step 91), the server receives the keywords sent from the client and executes text mining on the document database using them as mining conditions (step 92).

Text mining produces a list of document data characterized by the input keywords, and the citation information contained in each document (the documents it cites and the documents that cite it) is analyzed (step 93). The results are then displayed (step 94): a two-dimensional matrix of the input keywords (and keywords related to them) against the documents extracted by text mining characterizes each document, and the display also includes, for each document, the number of documents it cites and the number of documents citing it. The information on this screen can serve as a reference when evaluating the documents. If the user then wants to look further into the citing documents, clicking the detail display button for them on the screen causes the citation relationships of the specified document to be analyzed, and the result, including that document's number of references and number of citing documents, is displayed. When the user collects the necessary documents by following citations in this way, step 95 is repeated.
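A minimal sketch of the step 94 display, assuming the mining step yields per-document keyword hit counts and the citation index is a list of (citing, cited) pairs. Each row of the resulting matrix holds one mined document's keyword counts followed by the number of documents it cites and the number of documents citing it. All data and names here are hypothetical; the patent does not define the matrix cell values, so raw hit counts are an assumption.

```python
from collections import Counter

# Hypothetical inputs: keyword hit counts per mined document, and
# (citing, cited) pairs taken from a citation index table.
KEYWORD_HITS = {
    12: {"kinase": 1, "yeast": 1},
    21: {"kinase": 1},
    23: {"kinase": 1},
}
CITATION_PAIRS = [(21, 12), (22, 12), (23, 12), (24, 12), (12, 5), (23, 21)]

def build_matrix(keyword_hits, keywords, pairs):
    """One row per mined document: keyword counts, #references, #citing documents."""
    cites = Counter(citing for citing, _ in pairs)     # documents each one cites
    cited_by = Counter(cited for _, cited in pairs)    # documents citing each one
    rows = []
    for doc in sorted(keyword_hits):
        counts = [keyword_hits[doc].get(k, 0) for k in keywords]
        rows.append((doc, counts, cites[doc], cited_by[doc]))
    return rows

keywords = ["kinase", "yeast"]
print("doc", *keywords, "cites", "cited_by", sep="\t")
for doc, counts, n_cites, n_cited in build_matrix(KEYWORD_HITS, keywords, CITATION_PAIRS):
    print(doc, *counts, n_cites, n_cited, sep="\t")
```

Note that the citation counts come from the whole index table, not only from the mined documents, so document 12 shows four citing documents even though only documents 21 and 23 were mined.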

When the user then indicates a wish to see an overview of the citation relationships among the documents consulted so far, a screen is displayed that presents those documents and their citation relationships as a bird's-eye view (step 96). The result screen showing each document's number of references and number of citing documents and the bird's-eye view screen can be displayed simultaneously or alternately by switching.
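One way to build the step 96 overview, sketched under stated assumptions: keep a hypothetical browsing history of document IDs, keep only the citation pairs whose two ends were both visited, and render the result as Graphviz DOT text with arrows pointing from the citing document to the cited one (the direction used in FIG. 8). Using Graphviz is a swapped-in rendering choice for illustration; the patent does not say how the bird's-eye screen is drawn, and all names and data are hypothetical.

```python
# Hypothetical browsing history and (citing, cited) citation pairs.
VISITED = [12, 21, 23, 31]
CITATION_PAIRS = [(21, 12), (22, 12), (23, 12), (24, 12), (31, 21), (31, 40)]

def overview_edges(visited, pairs):
    """Keep only citation arrows whose both ends were visited by the user."""
    seen = set(visited)
    return sorted((a, b) for a, b in pairs if a in seen and b in seen)

def to_dot(visited, edges):
    """Render the overview as Graphviz DOT text (arrow points citing -> cited)."""
    lines = ["digraph overview {"]
    lines += [f'  "{doc}";' for doc in visited]
    lines += [f'  "{a}" -> "{b}";' for a, b in edges]
    lines.append("}")
    return "\n".join(lines)

edges = overview_edges(VISITED, CITATION_PAIRS)
print(to_dot(VISITED, edges))
```

Restricting the graph to visited documents keeps the overview limited to what the user has actually examined, rather than the full citation network.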

As described above, when documents on a specific theme are collected, the text mining apparatus according to this embodiment combines document retrieval by text mining with the identification of useful documents by their number of citing documents, so that useful documents can be collected efficiently. In addition, because the citation relationships can be analyzed from a bird's-eye view, the time and effort spent on the document collection process can be reduced.

The present invention is applicable to text mining apparatuses.

FIG. 1 is a conceptual diagram showing an example configuration of a text mining apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example configuration of the document database according to this embodiment.
FIG. 3 is a diagram showing an example configuration of the document citation index information according to this embodiment.
FIG. 4 is a diagram showing an example of an interface that prompts the user to enter the keywords needed for document collection in the text mining apparatus.
FIG. 5 is a diagram showing an example of a screen that evaluates and displays the text mining results together with the citation information of each document.
FIG. 6 is a diagram showing the flow of screens, and the screen interface, for collecting necessary documents while evaluating them by following citation relationships.
FIG. 7 is a display example showing information on the documents that cite the document selected in FIG. 6.
FIG. 8 is a diagram giving a bird's-eye view of the citation relationships of a document selected from those displayed in FIG. 7, together with a display example showing information on a document selected from that view.
FIG. 9 is a flowchart showing the flow of processing in the text mining apparatus according to this embodiment.

Claims

1. A text mining apparatus comprising:
a text mining condition input processing unit that provides a user interface prompting a client terminal to input keywords for a document search, and that accepts conditions for performing a text mining search based on the input keywords;
a text mining processing unit that performs text mining directly on a document database based on the accepted conditions;
a document citation relationship analysis processing unit that analyzes the citation relationship information of the documents extracted as the text mining result, based on a citation relationship table created by extracting only the citation information and arranging direct citation relationships as pairs of documents; and
a display control unit that, based on the text mining processing and the document citation relationship analysis processing, performs display control so that the number of references, the number of citing documents, and the number of documents extracted by text mining for each keyword are displayed separately for each document.

2. The text mining apparatus according to claim 1, wherein, when a document is specified, the display control unit performs hierarchical display control to display hierarchically the citation relationships of the citing documents that cite that document.

3. The text mining apparatus according to claim 2, wherein the display control unit performs control to display the hierarchical display and the per-document display on the same screen or by switching between them.

4. A text mining method comprising:
providing a user interface prompting a client terminal to input keywords for a document search, and accepting conditions for performing a text mining search based on the input keywords;
performing text mining directly on a document database based on the accepted conditions;
creating a document citation relationship table by extracting only the citation information from the citation information of the documents extracted as the text mining result; and
displaying, based on the text mining processing and the document citation relationship analysis processing, the number of references, the number of citing documents, and the number of documents extracted by text mining for each keyword, separately for each document.