JP2021114167A

JP2021114167A - Document management/viewing system and annotation text display method thereof

Info

Publication number: JP2021114167A
Application number: JP2020006803A
Authority: JP
Inventors: 祐乃福島; Yuno Fukushima; 駿介川端; Shunsuke Kawabata
Original assignee: Toppan Forms Co Ltd
Current assignee: Toppan Edge Inc
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2021-08-05

Abstract

To provide a document management/viewing system that can display appropriate annotation text for a term when the term included in a document is selected and that can reduce labor required for the creation of the annotation text. SOLUTION: A document management/viewing system 10 includes: a first storage unit 11 that stores a document; and a control unit 13 that performs a process of displaying the document stored in the first storage unit 11 to a user, and for a term selected by the user in the document displayed to the user, performs a process of displaying annotation text corresponding to the term to the user. As the annotation text, text generated by artificial intelligence is used. SELECTED DRAWING: Figure 1

Description

The present invention relates to a document management/viewing system, and more particularly to the display of annotation text that supports understanding of the terms contained in documents managed by the document management/viewing system.

A document management/viewing system accumulates and manages various documents and presents a document that a user has searched for or specified, by displaying or outputting it, so that the user can view it. The user searches for a needed document and displays the retrieved document on, for example, a computer display screen. Documents managed by such a system include contracts and terms and conditions, which are difficult to understand without familiarity with contract terminology and the law. Documents managed by such a system also include manuals and collections of regulations, which are so voluminous that their content is hard to grasp quickly. Properly understanding a document managed by the system requires knowing the meanings and definitions of the terms it contains, but in practice this is difficult, and the viewer has to run a search keyed on a term every time they encounter a term they want to know more about or do not understand. For example, consider counter-service work carried out with a document management/viewing system. A document is searched for in response to a customer inquiry and its content is conveyed to the customer. If a question then arises about a term in that document, the term must be used as a key to search within the document or to search for other documents containing the term, and work efficiency drops by the time the additional search takes. Proper understanding of a document therefore requires that annotation text explaining the terms it contains can be presented to the viewer in a timely manner.

One known way to present annotation text for the terms in a document to the viewer in a timely manner is to structure the document itself as a hypertext document and to embed in it, for each term, a link to a file containing the annotation text for that term. With this method, clicking a term with a mouse or the like while the document is shown on a computer display causes, for example, a browser to start and display the annotation text for the term. Patent Document 1 discloses a document management/viewing system in which, when a term in a displayed document is selected by a mouse operation or the like, the selected term is used as a key to search other documents in the system or to search the Internet, and the search results are displayed.

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-79366

The method described in Patent Document 1 runs a search keyed on a term that is contained in the document and selected by the user, but it does not guarantee that the search results will help in understanding the term. On the other hand, when the document is a hypertext document and a link to an annotation file is embedded for each term, annotation text must be written and filed in advance for every term, which takes considerable labor. In particular, when the document management/viewing system stores a large number of documents, writing the annotation text takes so much working time that doing it by hand becomes impractical.

An object of the present invention is to provide a document management/viewing system, and an annotation text display method for it, that can display appropriate annotation text for a term when a term contained in a document is selected and that can reduce the labor required to create the annotation text.

To achieve the above object, the document management/viewing system of the present invention is
a document management/viewing system that accumulates and manages documents and presents them to a user, comprising:
a first storage unit that stores the documents; and
a control unit that executes a process of displaying a document stored in the first storage unit to the user,
wherein, for a term selected by the user in the document displayed to the user, annotation text corresponding to the term is displayed to the user, and
the annotation text is generated by artificial intelligence.

To achieve the above object, the annotation text display method of the present invention is
an annotation text display method in a document management/viewing system that accumulates and manages documents and presents them to a user, comprising:
a step of displaying a document stored in the document management/viewing system to the user; and
a step of displaying to the user, for a term selected by the user in the document displayed to the user, annotation text corresponding to the term,
wherein the annotation text is generated by artificial intelligence.

In the present invention configured as described above, when the user selects a term in a displayed document, annotation text generated by artificial intelligence is displayed for the selected term. Using artificial intelligence makes it possible to display accurate annotation text for the terms in a document to the user without human intervention, and reduces the labor required to create the annotation text.

In the present invention, annotation text for each term may be stored in the document management/viewing system in advance, and the annotation text to display to the user may be selected from the stored annotation text based on the term the user selected. Storing annotation text in advance allows the annotation text for a term to be displayed immediately when the user selects that term in a document.

When annotation text for each term is stored in advance, the process of generating it is preferably executed in the document management/viewing system. Generating and storing annotation text within the system makes it possible to generate annotation text automatically when a new document is added to the system. The annotation text can be generated by, for example, a process of extracting terms from a document to be stored in the system; a process of searching for and acquiring related documents for each extracted term, at least from among the documents stored in the system; and a process of generating a summary of the related documents using artificial intelligence and storing it as annotation text in association with the corresponding term.

Related documents may be acquired not only from the documents stored in the document management/viewing system but also by searching an external server connected via a network such as the Internet. Acquiring related documents from an external server makes it possible to generate annotation text even when no suitable related document can be obtained from the stored documents. However, related documents acquired from an external server are likely to include search noise, so related documents acquired from the document management/viewing system are preferably given priority when the summary is generated. When more than a certain number of related documents are acquired, it is preferable to calculate each related document's similarity to the term and to generate the summary from a predetermined number, two or more, of related documents whose similarity exceeds a threshold. Selecting related documents by similarity yields annotation text with more appropriate content, and aggregating information scattered across multiple related documents yields, in one pass, annotation text containing still more appropriate information.

According to the present invention, appropriate annotation text for a term can be displayed when a term contained in a document is selected, and the labor required to create the annotation text can be reduced.

FIG. 1 is a block diagram showing the configuration of a document management/viewing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the annotation text generation unit.
FIG. 3 is a flowchart showing the process of generating annotation text.
FIG. 4 is a diagram showing the selection of a term in a document and the display of annotation text.

Next, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a document management/viewing system according to an embodiment of the present invention. The document management/viewing system 10 is implemented on, for example, a server computer. It accumulates and manages documents produced in an organization such as a business entity, and presents a document that a user has searched for or specified, by displaying or outputting it, so that the user can view it.

One or more terminals 50 used by users connect to the document management/viewing system 10 via a network 30 such as a local area network (LAN) or a virtual private network (VPN). Each terminal 50 is, for example, a personal computer with a graphical user interface (GUI), and a document that the document management/viewing system 10 displays to a user is shown on the display device of the terminal 50. The document management/viewing system 10 includes a first storage unit 11 that stores the documents managed by the system, a second storage unit 12 that stores annotation text corresponding to the terms contained in the documents stored in the first storage unit 11, an annotation text generation unit 20 that generates annotation text, and a control unit 13. The annotation text stored in the second storage unit 12 is, for each term contained in the documents stored in the first storage unit 11, a passage that explains the term and supports understanding of it, and it is generated using artificial intelligence (AI). A term that appears in several documents stored in the first storage unit 11 does not need duplicate annotation text in the second storage unit 12; a single annotation text is stored regardless of how many documents contain the term. In the second storage unit 12, the annotation text for each term is stored in association with that term, so the annotation text for a term can be obtained by searching the second storage unit 12 by the term.

The control unit 13 is the part of the document management/viewing system 10 that interfaces with the user's terminal 50 via the network 30. Based on a request from the user, it searches the documents in the first storage unit 11 and displays the retrieved document on the display screen of the user's terminal 50. In addition, when the user selects, on the screen of the terminal 50, a term contained in the displayed document, the terminal 50 notifies the control unit 13 of the selected term, and the control unit 13 displays the annotation text corresponding to the notified term on the screen of the terminal 50. To do so, the control unit 13, for example, searches the second storage unit 12 based on the notified term to select the annotation text to display and sends the selected annotation text to the terminal 50. These functions of the control unit 13 can be implemented by a computer program executed in the document management/viewing system 10 running on the server computer.
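The lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class `AnnotationStore`, the function `handle_term_selected`, and the fallback message for a term with no stored annotation text are all hypothetical names and behavior chosen for the example; the source does not say what is shown when no annotation text exists.

```python
from typing import Dict, Optional


class AnnotationStore:
    """Stands in for the second storage unit 12: one annotation text per term."""

    def __init__(self) -> None:
        self._by_term: Dict[str, str] = {}

    def has(self, term: str) -> bool:
        return term in self._by_term

    def put(self, term: str, annotation: str) -> None:
        # A term shared by many documents is stored once; later duplicates are ignored.
        self._by_term.setdefault(term, annotation)

    def get(self, term: str) -> Optional[str]:
        return self._by_term.get(term)


def handle_term_selected(store: AnnotationStore, term: str) -> str:
    """Stands in for the control unit 13: return the text to send back to the terminal."""
    annotation = store.get(term.strip())
    # Assumption: a placeholder message is returned when no annotation text is stored.
    return annotation if annotation is not None else "(no annotation available)"
```

Keying the store on the term alone, rather than on a document and term pair, reflects the statement that one annotation text suffices however many documents contain the term.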

The terminal 50 displays on its screen the documents and annotation text sent from the document management/viewing system 10 and, when the user selects a term contained in a displayed document, sends that term to the document management/viewing system 10. These functions of the terminal 50 are provided by ordinary browser software or by a dedicated application. The function of the control unit 13 of the document management/viewing system 10 can also be implemented in the terminal 50.

Next, the generation of annotation text in this embodiment will be described. As noted above, annotation text is generated using artificial intelligence for the terms contained in a document already stored in the first storage unit 11 or in a document about to be stored there. The generation procedure is the same in either case, so the following describes generating annotation text for the terms contained in a document about to be stored. FIG. 2 shows the configuration of the annotation text generation unit 20. In the following description, the document to be stored in the first storage unit 11 is called document A. The process of generating annotation text from the terms contained in document A and the process of storing document A in the first storage unit 11 can be carried out in parallel.

The annotation text generation unit 20 includes a term extraction unit 21 that extracts terms from document A, which is to be stored in the first storage unit 11; a related document acquisition unit 22 that searches for and acquires related documents for the terms extracted by the term extraction unit 21, at least from among the documents stored in the first storage unit 11; and a summary generation unit 23 that uses artificial intelligence to generate a summary of the related documents. The summary generation unit 23 stores the generated summary in the second storage unit 12 as annotation text, in association with the corresponding term. Here, a related document is a document that may contain an explanation of or commentary on the target term. In many cases a related document also contains the target term itself.

Next, the annotation text generation process will be described with reference to FIG. 3. First, in step 101, the term extraction unit 21 extracts, from document A to be stored in the first storage unit 11, the terms for which annotation text should be generated. When document A is a Japanese document, these terms are expected to be mainly nouns (including proper nouns), nouns with several modifiers attached, and phrases in which several nouns are joined by particles. To extract proper nouns, proper-noun extraction software such as "Apache (registered trademark) OpenNLP" is applied to document A, and the proper nouns are extracted as terms. In parallel, an algorithm such as TF-IDF is applied to document A to extract important words and phrases, which are also treated as terms. TF-IDF is a method of evaluating the importance of a word from its term frequency and its inverse document frequency. Names of laws are also terms that should be extracted from a document, but some law names, such as the "Act on the Use of Numbers for Identifying Specific Individuals in Administrative Procedures" (行政手続における特定の個人を識別するための番号の利用等に関する法律) and the "Act on Special Measures for Preserving the Rights and Interests of Victims of Specified Extraordinary Disasters" (特定非常災害の被害者の権利利益の保全等を図るための特別措置に関する法律), contain so many particles that ordinary morphological analysis cannot extract them. Term extraction therefore preferably also uses proper-noun extraction by collocation extraction, or a lookup in a published database of laws and regulations, so that a law name is extracted as a term when the document may contain one. Because annotation text need not be generated twice for the same term, the term extraction unit 21 preferably searches the second storage unit 12 as needed and does not extract again any term for which annotation text has already been generated.
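The TF-IDF part of step 101 can be illustrated with a short sketch. It assumes the documents are already tokenized (Japanese text would first need morphological analysis, which is outside this example), uses a smoothed IDF formula chosen for the example, and represents the terms already held in the second storage unit 12 as a plain set. The function names `tf_idf_scores` and `extract_terms` and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tf_idf_scores(doc_tokens: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Score each distinct token of one document by term frequency times IDF."""
    if not doc_tokens:
        return {}
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores: Dict[str, float] = {}
    for token, count in counts.items():
        tf = count / len(doc_tokens)
        # Document frequency: how many corpus documents contain the token.
        df = sum(1 for other in corpus if token in other)
        # Smoothed IDF (an assumption for this sketch; other variants exist).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[token] = tf * idf
    return scores


def extract_terms(doc_tokens: List[str], corpus: List[List[str]],
                  known_terms: Set[str], top_k: int = 3) -> List[str]:
    """Return the top_k highest-scoring tokens that have no annotation text yet."""
    scores = tf_idf_scores(doc_tokens, corpus)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked if t not in known_terms][:top_k]
```

Filtering against `known_terms` mirrors the preference that terms with existing annotation text are not extracted again.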

Next, in step 102, the related document acquisition unit 22 uses each term extracted in step 101 as a search key to search for and acquire related documents from among the documents stored in the first storage unit 11. The search may return many documents containing the term; in that case, the documents suited to generating the summary that will serve as annotation text are taken as the related documents. Since a related document should ideally explain the target term, the related documents can be narrowed down by examining the relationships of referring and being referred to with respect to the target term. Alternatively, a distributed representation of the term can be obtained using software such as "FastText", the similarity of each candidate document to the term can be calculated using a measure such as cosine similarity, and the related documents can be selected based on that similarity. "Doc2Vec" software may also be used to select related documents based on similarity to the term. When narrowing by similarity, documents with high similarity are, as a rule, taken as the related documents. Selecting related documents by similarity yields annotation text with more appropriate content. When more than a certain number of candidate documents are acquired, a predetermined number, two or more, of documents whose similarity exceeds a threshold can be taken as the related documents, and the summary can be generated from them. In that case, information scattered across multiple related documents is aggregated, so annotation text containing still more appropriate information is obtained in one pass. The threshold and the predetermined number are set in advance according to how the similarity is calculated and how many related documents the summary is to be generated from.
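The threshold-and-count selection in step 102 can be sketched as below. Note the substitution: instead of FastText or Doc2Vec embeddings, this example uses plain bag-of-words count vectors so that it runs with the standard library alone; only the cosine similarity and the selection rule follow the description. The function names and the default values of `threshold` and `max_docs` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_related(term_tokens: List[str], candidates: List[List[str]],
                   threshold: float = 0.2, max_docs: int = 2) -> List[int]:
    """Return indices of up to max_docs candidates whose similarity exceeds threshold."""
    term_vec = Counter(term_tokens)
    scored = [(cosine_similarity(term_vec, Counter(doc)), i)
              for i, doc in enumerate(candidates)]
    kept = [(s, i) for s, i in scored if s > threshold]
    # Highest similarity first; ties are broken by original position.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in kept[:max_docs]]
```

With real embeddings, only the construction of the two vectors would change; the thresholding and the cap on the number of documents stay the same.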

In the description above, the related document acquisition unit 22 acquires related documents from among the documents stored in the first storage unit 11, but the search range can be widened. As indicated by the broken line in FIG. 2, when an external server 61 is connected to an external network 60 such as the Internet, the related document acquisition unit 22 may acquire related documents from the documents stored on the external server 61. This makes it possible to generate annotation text even when no suitable related document can be obtained from the documents stored in the first storage unit 11.

Next, in step 103, the summary generation unit 23 uses artificial intelligence to generate a summary from the related documents acquired by the related document acquisition unit 22. The summary is generated using, for example, "LexRank" software. When there are related documents acquired from the first storage unit 11 and related documents acquired from the external server 61, the summary is generated with priority given to those from the first storage unit 11, because related documents acquired from the external server 61 are likely to include search noise and some may be inaccurate. Then, in step 104, the summary generation unit 23 stores the generated summary in the second storage unit 12 as annotation text, in association with the corresponding term.
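Step 103 can be illustrated with a simplified extractive summarizer. This is not LexRank itself: it ranks sentences by their summed cosine similarity to the other sentences (a degree-centrality shortcut), whereas LexRank runs a PageRank-style iteration over the sentence similarity graph. The priority rule is also an assumption: here, sentences from the external server are used only when the first storage unit yields none, which is one possible reading of "priority". All function names are hypothetical, and whitespace tokenization is used only to keep the example short.

```python
import math
from collections import Counter
from typing import List


def _cos(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(tokenized: List[List[str]], n: int = 1) -> List[int]:
    """Pick the n most central sentences, returned in their original order."""
    vecs = [Counter(tokens) for tokens in tokenized]
    centrality = [
        sum(_cos(v, w) for j, w in enumerate(vecs) if j != i)
        for i, v in enumerate(vecs)
    ]
    top = sorted(range(len(vecs)), key=lambda i: (-centrality[i], i))[:n]
    return sorted(top)


def build_annotation(internal: List[str], external: List[str], n: int = 1) -> str:
    """Summarize internal sentences; fall back to external ones only if none exist."""
    source = internal if internal else external
    picked = rank_sentences([s.lower().split() for s in source], n)
    return " ".join(source[i] for i in picked)
```

The returned string would then be stored as the annotation text for the term, as in step 104.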

This concludes the description of annotation text generation in this embodiment. The "Apache (registered trademark) OpenNLP", "FastText", "Doc2Vec", and "LexRank" software are all open source software and can be implemented easily. Software implementing the TF-IDF method and software for calculating cosine similarity and the like are also available as open source software. In this embodiment, the annotation text generation unit 20 can be implemented by software executed on the document management/viewing system 10 running on the server computer.

Next, the selection of a term in a document and the display of annotation text on the user's terminal 50 in this embodiment will be described with reference to FIG. 4. FIG. 4 shows a state in which one of the documents stored in the first storage unit 11 of the document management/viewing system 10 is displayed on the display screen of the terminal 50. The document is displayed in a document display window 70 on the display screen. If the document is, for example, a hypertext document or a PDF (Portable Document Format) document with embedded character codes, the user can select a character string in the displayed document using a mouse or the like. If the display screen is a touch panel, the user can select a character string by placing a finger on its first character and dragging across its extent. In this embodiment, the character string selected on the display screen is treated as the term the user selected in the displayed document. In FIG. 4, the boxed character string 71, "personal information protection policy", is the selected term. The selected character string 71 is preferably highlighted, for example by changing its display color or the background color behind it. When a character string is selected in a browser or a PDF document, it is normally highlighted. Furthermore, if the same character string also appears elsewhere in the displayed document, those occurrences may be highlighted as well.

Then, with the term selected, the user calls up a context menu, for example by right-clicking the mouse, and selects "Display annotation text" in the context menu. The selected term is then transmitted to the control unit 13, the annotation text corresponding to the term is sent from the control unit 13 to the terminal 50, and a balloon 72 appears on the display screen of the terminal 50 in association with the selected character string 71, as shown in FIG. 4. The annotation text for the selected term (here, "personal information protection policy") is displayed inside the balloon 72. Besides a balloon, the annotation text may be displayed in a pop-up window, in a new tab or a separate page, in a separate column, or in a new or separate window. Adding "Display annotation text" to the context menu, transmitting the data of the selected term to the control unit 13 when "Display annotation text" is selected, and displaying the annotation text sent from the control unit 13 in some form can be realized by a dedicated application; when a general-purpose browser is used, it can also be realized through settings in the browser or settings of the operating system (OS) of the terminal 50.
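The exchange between the terminal 50 and the control unit 13 can be sketched as a simple lookup. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class names `SecondStorageUnit` and `ControlUnit`, their methods, the shape of the response, and the sample annotation text are all hypothetical.

```python
class SecondStorageUnit:
    """Hypothetical stand-in for the second storage unit 12 (term -> annotation text)."""

    def __init__(self):
        self._annotations = {}

    def store(self, term, annotation_text):
        self._annotations[term] = annotation_text

    def find(self, term):
        # Returns None when no annotation text is stored for the term.
        return self._annotations.get(term)


class ControlUnit:
    """Hypothetical stand-in for the control unit 13."""

    def __init__(self, storage):
        self._storage = storage

    def annotation_for(self, selected_text):
        """Look up the annotation text for the string selected on the terminal."""
        term = selected_text.strip()
        annotation_text = self._storage.find(term)
        if annotation_text is None:
            return {"term": term, "found": False, "annotation": ""}
        return {"term": term, "found": True, "annotation": annotation_text}


storage = SecondStorageUnit()
# Sample annotation text invented for this example.
storage.store(
    "personal information protection policy",
    "A statement of how an organization collects, uses, and protects personal data.",
)
control = ControlUnit(storage)

# The terminal sends the selected string; the reply would be shown in the balloon.
reply = control.annotation_for("  personal information protection policy ")
missing = control.annotation_for("unregistered term")
```

The terminal only needs the `annotation` field to fill the balloon 72; the `found` flag is an assumption that lets the terminal decide what to show when no annotation text exists for the selection.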

Furthermore, in the documents stored in the first storage unit 11, JavaScript (registered trademark) or the like can be used to embed, for each term in a document, annotation text information for that term, so that when the document is displayed and the user places the mouse cursor or the like over a term in the document, the annotation text that is stored in the second storage unit 12 and corresponds to that annotation text information is displayed. In this case, because the annotation text for each term is stored in the second storage unit 12 rather than in the document itself, the data size of the document can be kept small, and the system can respond flexibly when the annotation text for a term needs to be updated.
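One way to picture this arrangement is a document that carries only a short identifier per term, with the annotation text kept in a separate store. The sketch below is an assumption-laden illustration, not the method of the embodiment: the function `embed_annotation_ids`, the `data-annotation-id` attribute name, the ID format, and the sample texts are all hypothetical, and the hover handling that a script would perform in the browser is omitted.

```python
import html
import re


def embed_annotation_ids(text, term_ids):
    """Wrap each known term in a span that carries only its annotation ID.

    term_ids maps non-empty term strings to annotation IDs. Longer terms are
    tried first so that a term nested inside a longer one is not matched.
    """
    if not term_ids:
        return html.escape(text)
    ordered = sorted(term_ids, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered))
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[pos:match.start()]))
        term = match.group(0)
        parts.append(
            '<span class="term" data-annotation-id="{}">{}</span>'.format(
                html.escape(term_ids[term], quote=True), html.escape(term)
            )
        )
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


# Annotation texts live outside the document, keyed by ID (stand-in for unit 12).
annotations = {"a-001": "Rules on how personal data is handled."}

page = embed_annotation_ids(
    "See the privacy policy & terms.", {"privacy policy": "a-001"}
)

# Updating the annotation text does not require touching the stored document.
annotations["a-001"] = "Updated explanation of the privacy policy."
```

Because the page holds only `a-001`, replacing the text in `annotations` changes what a reader sees on hover without regenerating the document.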

To display the annotation text, it is also possible to embed in the document a link to the annotation text stored in the second storage unit 12. In this case, when the user clicks a term in the document displayed on the terminal 50, for example with a mouse, the annotation text corresponding to that term is displayed on the display screen of the terminal 50. Because the term can be selected simply by clicking it, there is no need to specify on the screen the range of the character string that constitutes the term. However, embedding links to annotation text in the document has a drawback. For example, when a document containing the law name "Act on Special Measures for Preserving the Rights and Interests of Victims of Specified Emergency Disasters" (特定非常災害の被害者の権利利益の保全等を図るための特別措置に関する法律) is displayed, it is difficult to tell whether the annotation text the user wants is for "specified emergency disaster" itself or for the law as a whole.
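The ambiguity can be illustrated with a small sketch. It assumes, purely for illustration, that a link is attached to the longest registered term covering the clicked position; the function `linked_term_at`, that linking rule, and the sample sentence are hypothetical and are not taken from the embodiment. The law name uses the English rendering given in the text.

```python
def linked_term_at(text, terms, click_index):
    """Return the longest registered term whose occurrence covers click_index."""
    for term in sorted(terms, key=len, reverse=True):
        start = text.find(term)
        while start != -1:
            if start <= click_index < start + len(term):
                return term
            start = text.find(term, start + 1)
    return None


law = (
    "Act on Special Measures for Preserving the Rights and Interests "
    "of Victims of Specified Emergency Disasters"
)
inner = "Specified Emergency Disasters"
terms = {law, inner}
text = "This matter is governed by the " + law + "."

# Clicking inside the inner phrase still resolves to the whole law name.
click = text.index(inner) + 3
clicked_term = linked_term_at(text, terms, click)

# With range selection, the user states exactly which string is meant.
start = text.index(inner)
selected = text[start:start + len(inner)]
```

Under this linking rule a click can never reach the annotation text for the inner phrase, whereas selecting a range lets the user ask for either term.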

According to the document management/viewing system 10 of the present embodiment, annotation text is generated using artificial intelligence, so appropriate annotation text for a term can be created with little effort. In particular, automatically generating annotation text in parallel with the accumulation of documents in the first storage unit 11 minimizes the amount of work. According to the present embodiment, a user viewing a document with the document management/viewing system 10 can immediately display on the screen annotation text that explains a term in the document. The document management/viewing system 10 can therefore help the user understand the document accurately. For example, when the document management/viewing system 10 of the present embodiment is used for customer service desk work, the time spent searching for the meaning of terms contained in a document can be reduced and customer inquiries can be answered quickly, which greatly improves work efficiency.

10 Document management/viewing system
11 First storage unit
12 Second storage unit
13 Control unit
20 Annotation text generation unit
21 Term extraction unit
22 Related document acquisition unit
23 Summary text generation unit
30 Network
50 Terminal
60 External network
61 External server

Claims

A document management/viewing system that accumulates and manages documents and presents them to a user, comprising:
a first storage unit that stores the documents; and
a control unit that executes a process of displaying a document stored in the first storage unit to the user,
wherein, for a term selected by the user in the document displayed to the user, annotation text corresponding to the term is displayed to the user, and
the annotation text is generated by artificial intelligence.

The document management/viewing system according to claim 1, further comprising a second storage unit that stores the annotation text for each term in advance,
wherein the control unit selects the annotation text from the second storage unit based on the term selected by the user and executes a process of displaying the selected annotation text to the user.

The document management/viewing system according to claim 2, further comprising an annotation text generation unit that generates the annotation text for each term,
wherein the annotation text generation unit comprises:
a term extraction unit that extracts terms from the documents stored in the first storage unit;
a related document acquisition unit that searches for and acquires related documents related to a term extracted by the term extraction unit from at least the documents stored in the first storage unit; and
a summary text generation unit that generates summary text of the related documents using artificial intelligence and stores the summary text, as the annotation text, in the second storage unit in association with the corresponding term.

The document management/viewing system according to claim 3, wherein the related document acquisition unit also searches for and acquires the related documents from an external server connected via a network, and
the summary text generation unit generates the summary text by giving priority to the related documents acquired from the first storage unit over the related documents acquired from the external server.

The document management/viewing system according to claim 3, wherein the related document acquisition unit calculates a degree of similarity between each acquired related document and the term, and
the summary text generation unit generates the summary text from a plurality of related documents whose degree of similarity exceeds a threshold value.

An annotation text display method in a document management/viewing system that accumulates and manages documents and presents them to a user, the method comprising:
a step of displaying a document stored in the document management/viewing system to the user; and
a step of displaying to the user, for a term selected by the user in the document displayed to the user, annotation text corresponding to the term,
wherein the annotation text is generated by artificial intelligence.

The annotation text display method according to claim 6, wherein the annotation text for each term is accumulated in the document management/viewing system in advance, and the annotation text to be displayed to the user is selected from the accumulated annotation text based on the term selected by the user.

The annotation text display method according to claim 7, further comprising an annotation text generation step of generating the annotation text for each term,
wherein the annotation text generation step comprises:
a term extraction step of extracting terms from the documents stored in the document management/viewing system;
a related document acquisition step of searching for and acquiring related documents related to a term extracted in the term extraction step from at least the documents stored in the document management/viewing system; and
a summary text generation step of generating summary text of the related documents using artificial intelligence and accumulating the summary text, as the annotation text, in the document management/viewing system in association with the corresponding term.

The annotation text display method according to claim 8, wherein, in the related document acquisition step, the related documents are also searched for and acquired from an external server connected via a network, and
in the summary text generation step, the summary text is generated by giving priority to the related documents acquired from the document management/viewing system over the related documents acquired from the external server.

The annotation text display method according to claim 8, wherein, in the related document acquisition step, a degree of similarity between each acquired related document and the term is calculated, and
in the summary text generation step, the summary text is generated from a plurality of related documents whose degree of similarity exceeds a threshold value.