JP2014016873A

JP2014016873A - Document data management device, and program

Info

Publication number: JP2014016873A
Application number: JP2012154640A
Authority: JP
Inventors: Kazuhiko Oikawa; 和彦及川; Masamichi Suizu; 正道水津
Original assignee: Fujitsu Marketing Ltd
Current assignee: Fujitsu Marketing Ltd
Priority date: 2012-07-10
Filing date: 2012-07-10
Publication date: 2014-01-30

Abstract

PROBLEM TO BE SOLVED: To provide a document data management device that keeps document data secure while still allowing it to be browsed. SOLUTION: A document data management device according to the present invention collects in advance, from a server device such as a document file server, the document data that may be the target of a browsing request. It divides each document into its document components, assigns a hash value to each document component, and holds the components. In response to a browsing request from a document browsing client (an information terminal), it extracts only the document components that match the request and transmits them to the document browsing client.

Description

The present invention relates to a document data management device and a program therefor, and more particularly to a technique for managing document data in structured form and allowing the document data to be browsed over a network.

In recent years, structured document data, typified by XML (Extensible Markup Language), has been widely used as a storage format for document data because it is easy to handle, well suited to networks, and highly reusable. In structured document data, text is structured by marking up the document with tags. Paragraph and appearance settings can then be applied to each element of the text structure. Furthermore, by defining attributes in the tag information, each element can be given a meaning that states what kind of element it is.

Document data created within a company consists mainly of reports, estimates, invoices, and similar documents. In many cases the format of such documents is predetermined. For such document data, defining the document format in XML in advance and then creating documents according to it makes it possible to produce the text in structured form.

For example, Office Open XML (Standard ECMA-376 Office Open XML File Formats), known as one of the XML-based document formats, lets custom-defined tags carry attribute information, which gives meaning to the components of a document.

Creating document data as XML data allows any application to handle it as XML data. Because XML data is structured text composed of plain text, it has the advantage of being highly versatile and easy to reuse. Several ways of using XML data that exploit its nature as structured text are also known.

In JP 2004-126804 A, XML data is decomposed into components based on its hierarchical structure and tag information, and the components are mapped to and stored in database tables, so that a large amount of XML data can be managed centrally and handled efficiently. Furthermore, JP 2002-297601 A discloses a technique that exploits the fact that XML data is structured text to centrally manage documents with different text structures while preserving the identity of each document structure.

Patent Document 1: JP 2004-126804 A
Patent Document 2: JP 2002-297601 A

Non-Patent Document 1: Standard ECMA-376 Office Open XML File Formats [retrieved May 9, 2012], <URL: http://www.ecma-international.org/publications/standards/Ecma-376.htm>

In recent years, with the spread of network devices, there are many situations in which, for example, a salesperson outside the office connects to the corporate network and refers to document data on a file server in that network. In such cases, if the document data has been created as XML data, its nature as structured text makes it possible to extract and retrieve only the required parts of the required documents from among a large number of documents.

For example, the methods disclosed in Patent Documents 1 and 2 can store XML data while preserving its text structure. This makes it possible to use the structural information of the text to efficiently extract and refer to only the document data a user needs from a large amount of XML data.

In practice, however, access is generally restricted per document for security reasons. Specifically, to ensure the information security of document data, access restrictions are set per document so that even users in the same company cannot refer to documents created by users in other departments. Therefore, even when document data is stored as XML data, it may be difficult for users to freely extract and use the information they need.

The usual remedy is to set reference permissions for each document and to specify, for each user, whether reference is allowed, so that users can refer to the documents they need. This approach requires permissions to be set for every document, which makes operation cumbersome, and it is also prone to permission-setting errors.

The present invention solves these problems. Its object is to store the document data held on a corporate network in a form that can easily be referred to, and to allow that document data to be browsed while ensuring its information security, without complicated settings.

To solve the above problems, the document data management device of the present invention has means for collecting in advance, from a server device such as a document file server, the document data that may be the target of a browsing request; dividing the document data into the document components of each document; assigning a hash value to each document component and holding it; and, in response to a browsing request from a document browsing client (an information terminal), extracting only the document components that match the request and transmitting them to the document browsing client. The device is outlined as follows.

(1) The document data browsing device includes: a document collection setting storage unit that stores document collection settings defining from which document file servers, and how, the document data that may be referred to by a document browsing client is collected; a document collection unit that collects document data from the document file servers defined in the document collection settings; a document dividing unit that divides the document data collected by the document collection unit into document components and records them in a document data holding unit; a document search reception unit that accepts document search conditions from the document browsing client; a document search result generation unit that searches the document data holding unit for document components according to the search conditions accepted by the document search reception unit, attaches a document name and a hash value to the retrieved components, and holds them in the document data holding unit, thereby generating search-result document components; and a document search result providing unit that transmits the document name and the hash value to the document browsing client as the document search result.

Because the document data referred to by the document browsing client is held in the document data management device, the client can browse document data without accessing the document file server directly. Furthermore, because the document data is recorded divided along its document structure, each divided document component is meaningless when referred to on its own, which ensures reference security.

When the document dividing unit holds the divided document components, it may record each one with a hash value attached. Meaningful document data can then be restored by combining the divided data using the hash value as a key.
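A minimal sketch of restoring documents with the hash value as a key, assuming each stored record carries a hash value, a sequence number, and the component text. The record layout, the `sequence` field, and the name `restore_documents` are hypothetical; the text only states that a hash value is attached to each divided component.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentRecord:
    # Hypothetical record layout: the sequence number used to restore
    # the original order of components is an assumption.
    hash_value: str
    sequence: int
    content: str


def restore_documents(records):
    """Group components by hash value and rejoin each group in order."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.hash_value].append(rec)
    return {
        h: "".join(r.content for r in sorted(recs, key=lambda r: r.sequence))
        for h, recs in groups.items()
    }


stored = [
    ComponentRecord("h2", 0, "<amount>5000</amount>"),
    ComponentRecord("h1", 1, "<body>Visit report</body>"),
    ComponentRecord("h1", 0, "<title>Report</title>"),
]
print(restore_documents(stored)["h1"])  # <title>Report</title><body>Visit report</body>
```

Records with different hash values never mix, so a component read in isolation carries no information about which other components belong with it.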

Furthermore, this document data management device can extract the document components recorded in the document data holding unit according to a search condition entered at the document browsing client. The extracted document components are given a document name and a hash value and recorded again in the document data holding unit.

The attached document name and hash value are then transmitted from the document search result providing unit to the document browsing client, which can use them to browse the document.

Here, the document name is an arbitrary but unique name that the document search result generation unit assigns to each search result, for example according to a rule such as “search result” + serial number. The hash value is a value unique to each search result, generated by an algorithm such as MD5 (Message Digest Algorithm 5) from the data obtained by combining the retrieved document components.
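The naming and hashing rule above can be sketched as follows. This assumes the components are concatenated in order and encoded as UTF-8 before hashing, and that the serial number is a simple in-process counter; `name_and_hash` is a hypothetical helper, not the patent's interface.

```python
import hashlib
import itertools

# Hypothetical in-process serial counter for the "search result" + serial rule.
_serial = itertools.count(1)


def name_and_hash(components):
    """Return a unique document name and an MD5 hash for one search result.

    Assumption: the hash is taken over the components concatenated in
    order and encoded as UTF-8.
    """
    name = f"search result{next(_serial)}"
    digest = hashlib.md5("".join(components).encode("utf-8")).hexdigest()
    return name, digest


name, digest = name_and_hash(["<title>Report</title>", "<body>Visit report</body>"])
print(name, digest)
```

The text only says "MD5 or the like". MD5 is no longer considered collision-resistant, so a deployment that relies on the hash as an access key might prefer SHA-256 or a keyed HMAC.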

(2) The document data browsing device further includes: a document browsing reception unit that accepts document browsing conditions from the document browsing client; a document generation unit that searches the document data holding unit for document components according to the document browsing conditions and generates document data by combining the retrieved components; a hash value verification unit that verifies whether the hash value of the document components making up the generated document data equals the hash value given in the browsing conditions; and a document providing unit that transmits the generated document data to the document browsing client.

This allows document data to be generated from the divided document components when browsing conditions are entered at the document browsing client. Moreover, because the hash value is verified, the document data corresponding to a hash value can be browsed when the correct hash value is entered, whereas the document data generated from the search result cannot be browsed when no hash value or an incorrect one is entered. This secures the document data.
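A minimal sketch of the document generation and hash verification steps, assuming that a missing or wrong hash means the combined document is withheld, and that the hash is MD5 over the components concatenated in order. The function name `generate_document` is hypothetical.

```python
import hashlib
import hmac


def generate_document(components, supplied_hash):
    """Combine components and release the document only if the hash matches.

    The hash is recomputed over the combined components (MD5, as in the
    text) and compared with the value supplied by the client.
    """
    document = "".join(components)
    expected = hashlib.md5(document.encode("utf-8")).hexdigest()
    if supplied_hash and hmac.compare_digest(
        expected.encode("utf-8"), supplied_hash.encode("utf-8")
    ):
        return document
    return None  # missing or wrong hash: the combined document is withheld


parts = ["<title>Report</title>", "<body>Visit report</body>"]
good = hashlib.md5("".join(parts).encode("utf-8")).hexdigest()
print(generate_document(parts, good) is not None)  # True
print(generate_document(parts, "0" * 32))          # None
```

`hmac.compare_digest` performs a constant-time comparison, which avoids leaking how many leading characters of a guessed hash were correct.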

(3) In addition, when the document data collected from a document file server is not structured document data, the document collection unit of the document browsing device may convert it into structured document data when collecting it, and the document dividing unit may divide the document data into document components according to the tag information of the structured document data.

Thus, when the document data is structured text such as XML data, it is divided into document components based on the XML tag information. When it is not structured text, it is first converted into structured text based on its page information and other information it contains, and then divided into document components.
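The division by tag information, and the conversion of non-structured data using page information, can be sketched with the standard `xml.etree.ElementTree` module. This assumes each direct child of the root element is one document component whose tag serves as the item name, and that a non-structured document arrives as a list of plain-text pages. Both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_by_tags(xml_text):
    """Divide a structured document into (item name, text) components.

    Assumption: each direct child of the root is one document component.
    """
    root = ET.fromstring(xml_text)
    return [(child.tag, "".join(child.itertext()).strip()) for child in root]


def pages_to_xml(pages):
    """Wrap plain-text pages in XML so they can be split like structured data."""
    root = ET.Element("document")
    for number, text in enumerate(pages, start=1):
        page = ET.SubElement(root, "page", number=str(number))
        page.text = text
    return ET.tostring(root, encoding="unicode")


sample = """<estimate>
  <customer>ABC Corp.</customer>
  <item>Server maintenance</item>
  <amount>5000</amount>
</estimate>"""
print(split_by_tags(sample))
# [('customer', 'ABC Corp.'), ('item', 'Server maintenance'), ('amount', '5000')]
```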

(4) The document browsing reception unit may accept, as a browsing condition, the document name attached by the document search result generation unit; (5) it may also accept, as a browsing condition, an item name or similar attribute of the document components stored in advance in the document data holding unit.

Accordingly, when a user browses document data that combines specific document components selected by conditions such as a document name or item name, the document data cannot be browsed unless the correct hash value is supplied. Even when the hash value is unknown, however, document data can still be provided in units of document components, so only part of the document data is provided while document security is maintained.
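Browsing by item name without a hash value can be sketched as a plain filter, assuming components are held as (item name, content) pairs. The helper `browse_by_item` is hypothetical; the point is that the client receives individual components and never the combined original document.

```python
def browse_by_item(components, item_name):
    """Return only the components whose item name matches.

    Hypothetical sketch: no hash value is needed, and only
    component-level data is returned.
    """
    return [content for name, content in components if name == item_name]


stored = [("customer", "ABC Corp."), ("amount", "5000"),
          ("customer", "XYZ Ltd."), ("amount", "12000")]
print(browse_by_item(stored, "amount"))  # ['5000', '12000']
```

A list of amounts is useful on its own, but it cannot be tied back to a customer without the hash value that links the components of each document.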

According to the present invention, the document data management device divides document data into document components and stores them, so that a single document component is meaningless on its own. This secures the document data as a whole.

Moreover, even though a single document component is meaningless, the required document data can be referred to by combining document components into document data using the hash value as a key. When the correct hash value is entered, the correct document data can be referred to, and even when no hash value or an incorrect one is entered, some document data is still displayed. This secures the document data while preserving its browsability for the reader.

Furthermore, because data can be referred to in units of document components, and document components can be combined and displayed according to conditions, users can refer to the document components they need while the security of the original document data itself is maintained.

FIG. 1 is an overall configuration diagram of a system in which the document data management device is used.
FIG. 2 is a processing flowchart of document data collection.
FIG. 3 is a diagram showing an example of document data collection settings.
FIG. 4 is a processing flowchart of document data division.
FIG. 5 is a processing flowchart of the document data search process.
FIG. 6 is a processing flowchart of the document data browsing process.
FIG. 7 is a processing flowchart of the document data browsing process.
FIG. 8 is a diagram showing an example of document data.
FIG. 9 is a diagram showing an example of document data in structured document form.
FIG. 10 is a diagram showing an example of dividing structured document data.
FIG. 11 is a diagram showing an example of dividing structured document data.
FIG. 12 is a diagram showing an example of how document components are stored.
FIG. 13 is a diagram showing an example of how the document components of a document data search result are stored.
FIG. 14 is a diagram showing an example of a document search condition input screen on the document browsing client.
FIG. 15 is a diagram showing an example of a document search result display screen on the document browsing client.
FIG. 16 is a diagram showing an example of a document content display screen on the document browsing client.
FIG. 17 is a diagram showing an example of a document content display screen on the document browsing client.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is an overall configuration diagram of a system in which the document data management device is used. As shown in FIG. 1, the document data management device 1 is connected via a network to a document file server 3 and a document browsing client 2. It collects the document data stored on the document file server 3 and provides document data to the document browsing client 2. The document data management device 1 may be connected to the document file server 3 and the document browsing client 2 over an internal network such as an intranet or over an external network such as the Internet.

The document file server 3 is a server device that stores document data, such as an in-house file server, an FTP (File Transfer Protocol) server, or a WWW (World Wide Web) server. The document browsing client 2 is an information terminal, for example a mobile information terminal such as a mobile phone or smartphone, or a document browsing client application running on a PC (Personal Computer).

The document data management device 1 comprises a document search reception unit 101, a document search result generation unit 102, a document search result providing unit 103, a document providing unit 104, a hash value verification unit 105, a document generation unit 106, a document browsing reception unit 107, a document dividing unit 108, a document collection unit 109, a document data holding unit 110, and a document collection setting storage unit 120.

Each unit of the document data management device 1 is described below.

The document collection setting storage unit 120 stores document collection settings that specify from which of the multiple document file servers 3, and how, document data is collected. The document collection settings consist of the setting items shown in FIG. 3; their contents are described later.

The document collection unit 109 refers to the document collection settings stored in the document collection setting storage unit 120 and collects document data from the document file servers 3 according to those settings.

The document dividing unit 108 divides the document data that the document collection unit 109 has collected from the document file servers 3 into document components and records them in the document data holding unit 110.

The document data holding unit 110 holds the document components divided by the document dividing unit 108, the document components retrieved by the document search result generation unit 102, and so on. The contents of the document components held in the document data holding unit 110 are shown in FIGS. 12 and 13 and are described later.

The document search reception unit 101 accepts connections from the document browsing client 2, receives document search conditions from it, and passes the received search conditions to the document search result generation unit 102.

The document search result generation unit 102 searches the document data holding unit 110 for document components based on the received document search conditions. The same hash value is assigned to every document component in a search result, and combining those components yields one document. A hash value and a document name are attached to the search-result document components, which are recorded in the document data holding unit 110.
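The search result generation step can be sketched as follows, assuming that a component matches when a keyword occurs in its content and that the shared hash is MD5 over the matched contents concatenated in order. The `ResultComponent` layout and `build_search_result` are hypothetical illustrations, not the record format of FIG. 13.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ResultComponent:
    # Hypothetical layout of one search-result component record.
    document_name: str
    hash_value: str
    item_name: str
    content: str


def build_search_result(components, keyword, serial):
    """Select matching components and give them one shared name and hash.

    Assumptions: keyword containment is the search condition, and the
    hash is MD5 over the matched contents concatenated in order.
    """
    hits = [(item, text) for item, text in components if keyword in text]
    digest = hashlib.md5("".join(t for _, t in hits).encode("utf-8")).hexdigest()
    name = f"search result{serial}"
    return [ResultComponent(name, digest, item, text) for item, text in hits]


stored = [("customer", "ABC Corp."), ("item", "ABC server upgrade"),
          ("amount", "5000")]
result = build_search_result(stored, "ABC", serial=1)
print([(r.item_name, r.hash_value == result[0].hash_value) for r in result])
# [('customer', True), ('item', True)]
```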

The document search result providing unit 103 provides the document name and hash value generated by the document search result generation unit 102 to the document browsing client 2, which displays them.

The document browsing reception unit 107 accepts connections from the document browsing client 2, receives document browsing requests, and passes the content of each received browsing request to the document generation unit 106.

The document generation unit 106 extracts document components from the document data holding unit 110 according to the content of the browsing request passed from the document browsing reception unit 107, and creates document data by combining the extracted components.

The hash value verification unit 105 verifies whether the hash value of the document data created by the document generation unit 106 is correct. If it is, the document providing unit 104 provides the generated document to the document browsing client 2, which displays it.

Next, specific examples of the processing executed by the document data management device 1 are described.

[Collection and Division of Document Data]
FIG. 2 is a processing flowchart of the document collection unit 109, and FIG. 4 is a processing flowchart of the document dividing unit 108. The document collection unit 109 collects document data from the document file servers according to the document collection settings stored in the document collection setting storage unit 120. The document dividing unit 108 then decomposes the collected document data into document components, which are stored in the document data holding unit 110. Document data acquisition is performed, for example, at fixed time intervals according to a predetermined collection cycle setting.

FIG. 3 shows an example of the document collection settings stored in the document collection setting storage unit 120. The document collection setting information in FIG. 3 has the setting items “setting name”, “server”, “server type”, “data acquisition method”, and “server connection setting”. The meaning of each item is explained below.

“Setting name” is the name of the document collection setting and identifies what kind of documents are collected. “Server” indicates the address of the server on which the documents are stored. “Server type” indicates what kind of server it is: “FILE” denotes a file server, “WEB” a WWW (World Wide Web) server, and “FTP” an FTP (File Transfer Protocol) server. “Data acquisition method” indicates the protocol used to acquire data: “SMB” means the data must be acquired over the SMB protocol (as implemented by Samba), “HTTP” over HTTP (HyperText Transfer Protocol), and “FTP” over FTP. “Server connection setting” holds the authentication conditions (connection ID and password) and connection settings (connection string) needed to connect to the server.
For example, for a server whose protocol requires ID and password authentication, the setting is written as “ID: XXXXX, PWD: YYYYY”; authentication is then performed with XXXXX as the ID and YYYYY as the password, and a connection to the server is established.
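One way to represent a FIG. 3 setting and parse its connection string is sketched below. The field names mirror the setting items above, but their types, the class name `CollectionSetting`, and the `key: value, key: value` parsing rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectionSetting:
    # Field names mirror the FIG. 3 setting items; the types are assumptions.
    setting_name: str
    server: str
    server_type: str          # "FILE", "WEB" or "FTP"
    acquisition_method: str   # "SMB", "HTTP" or "FTP"
    connection: str           # e.g. "ID: XXXXX, PWD: YYYYY"


def parse_connection(connection):
    """Turn 'ID: XXXXX, PWD: YYYYY' into {'ID': 'XXXXX', 'PWD': 'YYYYY'}."""
    result = {}
    for part in connection.split(","):
        if ":" in part:  # skip empty or malformed fragments
            key, value = part.split(":", 1)
            result[key.strip()] = value.strip()
    return result


setting = CollectionSetting("Data collection settings from file server",
                            "groupFileServer", "FILE", "SMB",
                            "ID: XXXXX, PWD: YYYYY")
print(parse_connection(setting.connection))  # {'ID': 'XXXXX', 'PWD': 'YYYYY'}
```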

The collection and conversion of document data are now described in detail with reference to the processing flowcharts of FIGS. 2 and 4.

FIG. 2 shows the document data collection process flow. In step S101 of FIG. 2, the document collection unit 109 first obtains a list of document collection settings, such as those shown in FIG. 3, from the document collection setting storage unit 120. Next, in step S102, the processing of steps S103 to S105 is repeated once for each of the obtained document collection settings. For example, since three document collection settings are defined in the example of FIG. 3, steps S103 to S105 are repeated three times.
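The loop over steps S101 to S105 can be sketched with pluggable stand-ins for each step. `collect_all`, the dictionary keys, and the stub fetchers are all hypothetical; real fetchers would open SMB, HTTP, or FTP connections using the server connection settings.

```python
def collect_all(settings, fetchers, divide, store):
    """Steps S101 to S105 as a loop (hypothetical helper names).

    settings: list of dicts read from the setting storage (S101)
    fetchers: maps an acquisition method such as "SMB" or "FTP" to a
              function(setting) -> list of document texts (S103, S104)
    divide:   splits one document into components (S105)
    store:    records the components in the data holding unit (S105)
    """
    for setting in settings:                  # S102: once per setting
        fetch = fetchers[setting["method"]]   # choose protocol
        for document in fetch(setting):       # S103 connect + S104 acquire
            store(divide(document))           # S105 divide and store


held = []
settings = [{"server": "groupFileServer", "method": "SMB"},
            {"server": "ftpHost", "method": "FTP"}]  # "ftpHost" is made up
fetchers = {"SMB": lambda s: ["a|b"], "FTP": lambda s: ["c|d|e"]}
collect_all(settings, fetchers, lambda doc: doc.split("|"), held.extend)
print(held)  # ['a', 'b', 'c', 'd', 'e']
```

Keying the fetchers on the “data acquisition method” keeps the loop itself protocol-agnostic, so adding a new protocol only means registering another function.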

In the next step, S103, a connection is made to the server specified in the “Server” field of the document collection setting, based on the information set in the “Server type”, “Data acquisition method”, and “Server connection setting” fields. For example, in FIG. 3, the document collection setting whose “Setting name” is “Data collection setting from file server” defines “Server”, “Server type”, “Data acquisition method”, and “Server connection setting” as “groupFileServer”, “FILE”, “SMB”, and “ID: XXXXX, PWD: YYYYY”, respectively. With these settings, the device connects to the file server groupFileServer over SMB, using XXXXX as the ID and YYYYY as the password.

Once the connection has been made, in step S104 the document data stored on the connected server is acquired, using the method set in the “Data acquisition method” field of the document collection setting. For example, if “FTP” is set, the document data is acquired over FTP. All document data stored on the server may be acquired, or acquisition rules may be defined separately so that only the document data matching those rules is acquired.

After the document data has been acquired from the server, in step S105 the acquired document data is divided, and the divided document data is stored in the document data holding unit 110. This division and storage is performed on all of the document data acquired in step S104. Details are described later with reference to the document data division process flow of FIG. 4.

After all of the acquired document data has been processed, step S106 checks whether every document collection setting has been processed. If not (No in step S106), the process returns to step S102 and the next document collection setting is processed. If every document collection setting has been processed (Yes in step S106), the document data collection and conversion process ends.
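The loop of steps S101 to S106 amounts to iterating over the settings and dispatching each one to a handler chosen by its “Data acquisition method”. The following is a sketch under stated assumptions: the `Fetcher` signature, the dictionary keys `server`, `method`, and `connection`, and the stand-in `fake_fetch` are hypothetical, and no real network access is performed.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical handler type: (server, connection settings) -> [(document name, raw bytes)]
Fetcher = Callable[[str, Dict[str, str]], List[Tuple[str, bytes]]]


def collect_documents(
    settings: Iterable[dict],
    fetchers: Dict[str, Fetcher],
    divide_and_store: Callable[[str, bytes], None],
) -> int:
    """Loop over the collection settings as in steps S101-S106 of FIG. 2."""
    processed = 0
    for setting in settings:  # S102: one pass per collection setting
        fetch = fetchers[setting["method"]]  # S103: choose the protocol handler
        documents = fetch(setting["server"], setting["connection"])  # S104
        for name, data in documents:  # S105: divide and store every document
            divide_and_store(name, data)
            processed += 1
    return processed  # S106: every setting has been handled


# Usage with a stand-in fetcher (no real network access).
def fake_fetch(server: str, connection: Dict[str, str]) -> List[Tuple[str, bytes]]:
    return [(f"{server}/a.xml", b"<DOCUMENT/>")]


stored: List[str] = []
count = collect_documents(
    [
        {"server": "groupFileServer", "method": "SMB", "connection": {"ID": "XXXXX"}},
        {"server": "www.example.com", "method": "HTTP", "connection": {}},
    ],
    {"SMB": fake_fetch, "HTTP": fake_fetch},
    lambda name, data: stored.append(name),
)
print(count, stored)
```

Dispatching through a dictionary keeps the loop independent of the protocols: Python's standard `ftplib` and `urllib.request` could back FTP and HTTP handlers, while SMB would need a third-party client.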

Next, the document data division and storage processing is described in detail with reference to the process flow of FIG. 4, which details the document division processing of step S105 described above.

Document division is performed by the document division unit 108. First, in step S201, the document data to be divided is acquired from the document collection unit 109 as structured document data. The document data to be divided is, for example, text data such as that shown in FIG. 8; if the text data is stored as structured document data, it is acquired as structured document data such as that shown in FIG. 9. If the document data is not stored in a structured format, it is converted into structured document data, for example in XML format. The conversion uses property information of the document data, such as file information and page information, together with the meaning of the data the document contains. For example, if the document data is tabular, the columns of the table can be converted into document components; even document data consisting only of prose can be converted into structured text data based on its paragraphs and headings. If the file is already in a structured format, no conversion is performed.
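For the tabular case, a minimal sketch of the conversion turns each table row into a record element and each column into a child tag. The element names `DOCUMENT` and `RECORD`, and the assumption that column headers are valid XML tag names, are illustrative choices, not the patent's format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def table_to_xml(csv_text: str, data_name: str) -> ET.Element:
    """Convert CSV rows into <DATA NAME=...><RECORD><column>value</column>...</RECORD></DATA>."""
    root = ET.Element("DOCUMENT")
    data = ET.SubElement(root, "DATA", NAME=data_name)  # keyword becomes an XML attribute
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(data, "RECORD")
        for column, value in row.items():
            ET.SubElement(record, column).text = value  # each column becomes a tag
    return root


csv_text = "ProductName,Price\nIceMachine,120000\nFreezer,80000\n"
document = table_to_xml(csv_text, "product details")
print(ET.tostring(document, encoding="unicode"))
```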

Once the document data has been acquired, in step S202 the structured document data is divided into document components. Structured document data generally consists of tags and attributes. For example, in the data line <DATA NAME="product details">, DATA is the tag name and NAME is the attribute name. The attribute specifies the type of the data; here, the DATA tag has “product details” as its NAME attribute.

One way to divide the document data is by tag attribute information. FIG. 10 shows an example in which structured text data is divided based on tag attribute information; here, the data is divided in units of the NAME attribute of the DATA tags. Each part obtained in this way is then further divided by its internal tags. FIG. 11 shows an example of this second division: the hierarchical structure of each attribute-based part is divided by its internal tags down to the smallest unit.
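The two-stage division can be sketched with Python's standard `xml.etree.ElementTree`: first split on the NAME attribute of each DATA element, then walk the internal tags down to leaf elements. The sample document and the `(attribute, tag, value)` tuple layout are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def divide(xml_text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (attribute, tag, value) for every leaf element under each <DATA NAME=...>."""
    root = ET.fromstring(xml_text)
    for data in root.iter("DATA"):  # first split: the NAME attribute of each DATA tag
        attribute = data.get("NAME", "")
        for elem in data.iter():  # second split: walk the internal tags
            if elem is not data and len(elem) == 0:  # a leaf is the smallest unit
                yield attribute, elem.tag, (elem.text or "").strip()


xml_text = """<DOCUMENT>
  <DATA NAME="customer"><CustomerName>ABC Foods</CustomerName></DATA>
  <DATA NAME="product details">
    <ITEM><ProductName>ice machine</ProductName><Quantity>2</Quantity></ITEM>
  </DATA>
</DOCUMENT>"""

components = list(divide(xml_text))
for component in components:
    print(component)
```

Intermediate elements such as `ITEM` are traversed but not emitted, which matches the idea of dividing down to the smallest unit.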

The smallest units obtained by dividing the structured document data along its hierarchy of internal tags are the document components. After the division into document components, in step S203 the document components are classified by attribute, namely by the tag attribute that was used to divide the document data in step S202. Next, in step S204, the same hash value is associated with all document components divided from the same document data. The hash value is, for example, generated from the document data by a known algorithm such as MD5, and serves as an identifier that is, in practice, unique to that document.
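Steps S203 and S204 can be sketched as grouping the components by attribute and stamping each with one digest of the source document, computed with `hashlib.md5` as the text suggests. The dictionary layout of each component is a hypothetical choice.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_and_hash(
    document_bytes: bytes, components: Iterable[Tuple[str, str, str]]
) -> Tuple[str, Dict[str, List[dict]]]:
    """Group components by attribute (S203) and stamp one document hash on each (S204)."""
    doc_hash = hashlib.md5(document_bytes).hexdigest()  # same hash for the whole document
    groups: Dict[str, List[dict]] = defaultdict(list)
    for attribute, tag, value in components:
        groups[attribute].append({"tag": tag, "value": value, "hash": doc_hash})
    return doc_hash, dict(groups)


doc_hash, groups = classify_and_hash(
    b"<DOCUMENT/>",
    [
        ("customer", "CustomerName", "ABC Foods"),
        ("product details", "ProductName", "ice machine"),
        ("product details", "Quantity", "2"),
    ],
)
print(doc_hash)
print({attribute: len(items) for attribute, items in groups.items()})
```

MD5 is adequate as an identifier but is no longer collision-resistant; where the hash also acts as an access key, a SHA-256 digest or a random token from `secrets.token_hex` would be a more conservative choice.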

Once the document data has been divided into document components and the hash value has been set, in step S205 the document components and their hash value are recorded in the document data holding unit 110. FIG. 12 shows an example in which the document data in the structured text format of FIG. 9 has been divided and recorded in the document data holding unit 110. In FIG. 12, “Document name” is the name of the source document data; “Attribute” is the attribute name used for division in step S202 and classification in step S203; “Tag” is the tag name of the document component divided down to the smallest unit in step S202; “Value” is the value of that document component; and “Hash value” is the hash value assigned in step S204.
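A table with the five columns of FIG. 12 is one natural way to realize the document data holding unit. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table and column names, the document name `order_001`, and the placeholder hash `hash-A` are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory table mirroring the columns of FIG. 12."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE components ("
        " document_name TEXT, attribute TEXT, tag TEXT, value TEXT, hash TEXT)"
    )
    return conn


def store_components(
    conn: sqlite3.Connection,
    document_name: str,
    doc_hash: str,
    components: Iterable[Tuple[str, str, str]],
) -> None:
    """S205: record each (attribute, tag, value) with the document name and hash."""
    conn.executemany(
        "INSERT INTO components VALUES (?, ?, ?, ?, ?)",
        [(document_name, a, t, v, doc_hash) for a, t, v in components],
    )
    conn.commit()


conn = open_store()
store_components(conn, "order_001", "hash-A", [
    ("customer", "CustomerName", "ABC Foods"),
    ("product details", "ProductName", "ice machine"),
])
for row in conn.execute("SELECT * FROM components ORDER BY rowid"):
    print(row)
```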

This concludes the details of document data collection and division. As described above, the document components divided from a document remain associated through their hash value, so the original document data can be restored by combining the document components that share that hash value, using it as the key.
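Restoration by hash value can be sketched as filtering the stored rows on the hash and regrouping them under one DATA element per attribute. This is a simplification under an assumption: because FIG. 12 records only the leaf tag, intermediate elements such as an `ITEM` wrapper are not rebuilt here; reproducing the exact hierarchy would require storing each component's path as well.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple


def restore(rows: Iterable[Tuple[str, str, str, str]], doc_hash: str) -> ET.Element:
    """Rebuild a document from (attribute, tag, value, hash) rows that share doc_hash."""
    root = ET.Element("DOCUMENT")
    sections: Dict[str, ET.Element] = {}
    for attribute, tag, value, row_hash in rows:
        if row_hash != doc_hash:  # the hash is the key that ties components together
            continue
        if attribute not in sections:  # one DATA element per attribute
            sections[attribute] = ET.SubElement(root, "DATA", NAME=attribute)
        ET.SubElement(sections[attribute], tag).text = value
    return root


rows = [
    ("customer", "CustomerName", "ABC Foods", "hash-A"),
    ("product details", "ProductName", "ice machine", "hash-A"),
    ("product details", "ProductName", "freezer", "hash-B"),
]
print(ET.tostring(restore(rows, "hash-A"), encoding="unicode"))
```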

Next, the search and browsing of document components from the document browsing client 2 are described in detail.

[Document data search processing]
Document components are searched by sending a search keyword from the document browsing client 2 to the document search receiving unit 101. FIG. 14 shows an example of the search condition input screen displayed on the document browsing client 2. When a keyword string is entered in the search keyword field of FIG. 14 and the search button is pressed, the document browsing client 2 sends the keyword to the document search receiving unit 101 as the search condition.

FIG. 5 is a flowchart of the document search process. In step S301, the document search receiving unit 101 acquires the search condition sent from the document browsing client 2. In step S302, the document search result generation unit 102 searches the document data holding unit 110 for document components based on that search condition and generates a combination of document components. In step S303, a hash value is generated from the combination of document components, and the same hash value is assigned to each retrieved document component. In step S304, a document name is assigned to the retrieved document components and their hash value, and they are recorded in the document data holding unit 110. The document name may be generated from the search condition, formed by combining a fixed string with a sequence number, or chosen freely.
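Steps S303 and S304 can be sketched as hashing a canonical form of the retrieved combination and naming it with a fixed string plus a sequence number, one of the naming options the text mentions. The canonicalization (sorted, tab-separated lines), the use of MD5, and the module-level counter are assumptions made for illustration.

```python
import hashlib
import itertools
from typing import List, Tuple

_sequence = itertools.count(1)  # hypothetical source of sequence numbers


def make_search_result(
    components: List[Tuple[str, str, str]],
) -> Tuple[str, str, List[Tuple[str, str, str, str, str]]]:
    """S303-S304 sketch: hash the combination, stamp it on every component, name it."""
    canonical = "\n".join(f"{a}\t{t}\t{v}" for a, t, v in sorted(components))
    combo_hash = hashlib.md5(canonical.encode("utf-8")).hexdigest()  # S303
    name = f"Search result {next(_sequence):02d}"  # S304: fixed string + sequence number
    records = [(name, a, t, v, combo_hash) for a, t, v in components]
    return name, combo_hash, records
```

Sorting before hashing makes the hash depend only on which components were retrieved, not on the order in which the search returned them.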

For example, suppose document components are recorded in the document data holding unit 110 as shown in FIG. 12 and “product name = ice machine” is entered as the search keyword on the document browsing client 2. The document search receiving unit 101 acquires “product name = ice machine” as the search condition, and the document search result generation unit 102 searches the recorded document components for the attribute groups containing a component whose “Tag” is “product name” and whose “Value” is “ice machine”, generating a combination of document components such as that shown in FIG. 13. To mark these document components as one combination produced by the same search condition, the same hash value is set on each of them and the document name “Search result 01” is assigned; the document search result generation unit 102 then records the combination of FIG. 13 in the document data holding unit 110.
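A “tag = value” keyword can be parsed and matched against stored rows as follows. This sketch assumes that an attribute group means all rows sharing the same document name and attribute, and that rows are `(document name, attribute, tag, value, hash)` tuples; both are interpretive assumptions, and the sample rows are illustrative.

```python
from typing import List, Tuple

Row = Tuple[str, str, str, str, str]  # (document name, attribute, tag, value, hash)


def parse_condition(keyword: str) -> Tuple[str, str]:
    """Split a 'tag = value' keyword such as 'product name = ice machine'."""
    tag, sep, value = keyword.partition("=")
    if not sep:
        raise ValueError("expected a keyword of the form 'tag = value'")
    return tag.strip(), value.strip()


def search_groups(rows: List[Row], keyword: str) -> List[Row]:
    """Return every row whose (document, attribute) group contains a matching tag/value."""
    tag, value = parse_condition(keyword)
    hits = {(doc, attr) for doc, attr, t, v, _ in rows if t == tag and v == value}
    return [row for row in rows if (row[0], row[1]) in hits]


rows: List[Row] = [
    ("order_001", "customer", "customer name", "ABC Foods", "hash-A"),
    ("order_001", "product details", "product name", "ice machine", "hash-A"),
    ("order_001", "product details", "quantity", "2", "hash-A"),
    ("order_002", "product details", "product name", "freezer", "hash-B"),
]
for row in search_groups(rows, "product name = ice machine"):
    print(row)
```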

Once the document components of the search result have been recorded in the document data holding unit 110, in step S305 the document search result providing unit 103 sends the document name and hash value to the document browsing client 2 as the search result. FIG. 15 shows an example of a screen on which the document browsing client 2 displays the document name and hash value of a search result.

As noted above, the search may use a tag name and its value, free words, or any other method used in ordinary text search. In addition, combinations of document components and their hash values may be generated in advance and recorded in the document data holding unit 110 as search results. This allows frequently referenced combinations of document components to be prepared ahead of time, so that the document data can be viewed simply by entering a search condition and a hash value.

Next, the process of browsing the document components recorded in the document data holding unit 110 by the document data search process is described in detail.

[Document data browsing processing]
The document components recorded in the document data holding unit 110 are browsed from the document browsing client 2 in one of two ways: by continuing from the search result display screen of FIG. 15 to browse the retrieved combination of document components, or by entering a search keyword and a hash value on the search condition input screen of FIG. 14.

FIG. 6 shows the process flow for browsing a combination of document components directly from a document search result, and FIG. 7 shows the process flow for browsing a combination of document components from a search keyword and a hash value.

First, the process of browsing a combination of document components directly from a document search result is described with reference to FIG. 6. When the search result screen of FIG. 15 is displayed on the document browsing client 2 and the browse button is pressed, the document browsing client 2 sends the document name and hash value to the document browsing receiving unit 107. In step S401, the document browsing receiving unit 107 receives the document name and hash value. In step S402, the document generation unit 106 searches the document data holding unit 110 for document components based on the document name acquired by the document browsing receiving unit 107; that is, it retrieves the document components whose “Document name” in the document data holding unit 110 matches the acquired document name. In step S403, document data is generated by combining the retrieved document components. Next, in step S404, it is verified whether the hash value acquired by the document browsing receiving unit 107 is among the hash values assigned to the retrieved document components.

If, in step S405, the hash value verified in step S404 is found (Yes in S405), the process proceeds to step S406, and the document providing unit 104 returns to the document browsing client 2 only the document data made up of the document components that correspond to the verified hash value, taken from the document data generated in step S403. If the hash value is not found (No in S405), the process proceeds to step S407, and the document providing unit 104 returns all of the document data generated in step S403 to the document browsing client 2.
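The decision of steps S402 to S407 can be sketched over stored rows. Note the deliberate design in the text: an unverified hash does not produce an error but falls back to returning everything generated for that document name. The row layout and the function name are hypothetical, and the combination step S403 is reduced to returning the matching rows.

```python
from typing import List, Tuple

Row = Tuple[str, str, str, str, str]  # (document name, attribute, tag, value, hash)


def browse_by_name(rows: List[Row], document_name: str, requested_hash: str) -> List[Row]:
    """Sketch of FIG. 6 (S402-S407)."""
    found = [r for r in rows if r[0] == document_name]  # S402: match on document name
    known_hashes = {r[4] for r in found}  # hashes on the retrieved components
    if requested_hash in known_hashes:  # S404/S405: verify the received hash
        return [r for r in found if r[4] == requested_hash]  # S406: matching subset only
    return found  # S407: everything generated, without signalling an error
```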

In this way, even when document data is displayed from a search result, verifying the hash value to control what is displayed makes it possible to restrict the display of document data generated by a document search, for example after an unauthorized screen transition or when the search result should not be viewable by other users, and thus to keep the document data secure. If the hash value generated from the search result is sent correctly, the document data of the search result corresponding to that hash value is displayed; if the hash value is incorrect, some document data corresponding to the sent search result is still displayed. Because the person who performed the search always sees some document data, a person attempting to view document data without authorization can be led to mistake the incorrect document data for the correct one, which improves the concealment of the document data.

Next, the process of browsing document data by entering a search keyword and a hash value on the document search condition input screen is described in detail with reference to FIG. 7. When the document search condition input screen of FIG. 14 is displayed on the document browsing client 2, a search keyword and hash value are entered, and the browse button is pressed, the document browsing client 2 sends the search keyword and hash value to the document browsing receiving unit 107.

In step S501, the document browsing receiving unit 107 receives the search keyword and hash value as browsing conditions. In step S502, the document generation unit 106 searches the document data holding unit 110 for document components based on the search keyword acquired by the document browsing receiving unit 107. For example, when a document name is given as the search keyword, it retrieves the document components whose “Document name” item in the document data holding unit 110 matches that document name. In step S503, the retrieved document components are combined to generate document data. Next, in step S504, it is checked whether the document browsing client 2 sent a hash value to the document browsing receiving unit 107. If no hash value was sent (No in S504), the process proceeds to step S508, and the document providing unit 104 returns all of the document data generated in step S503 to the document browsing client 2. If a hash value was sent (Yes in S504), the process proceeds to step S505 to verify whether it is correct.

In step S505, it is verified whether the hash value acquired by the document browsing receiving unit 107 is among the hash values assigned to the document components retrieved in step S502. If, in step S506, the hash value verified in step S505 is found (Yes in S506), the process proceeds to step S507, and the document providing unit 104 returns to the document browsing client 2 only the document data made up of the document components that correspond to the verified hash value, taken from the document data generated in step S503. If the hash value is not found (No in S506), the process proceeds to step S508, and the document providing unit 104 returns all of the document data generated in step S503 to the document browsing client 2.
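The FIG. 7 flow differs from FIG. 6 in two ways: the keyword may match any stored item, and the hash value is optional. The sketch below assumes the keyword arrives as a mapping from column names to required values; the column names, the `Optional` hash parameter, and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, str, str, str]
COLUMNS = ("document_name", "attribute", "tag", "value", "hash")


def browse_by_keyword(
    rows: List[Row], keyword: Dict[str, str], requested_hash: Optional[str] = None
) -> List[Row]:
    """Sketch of FIG. 7 (S502-S508). `keyword` maps column names to required values."""
    index = {name: i for i, name in enumerate(COLUMNS)}
    found = [
        row for row in rows
        if all(row[index[col]] == val for col, val in keyword.items())
    ]  # S502/S503: components matching the keyword
    if requested_hash is None:  # S504: no hash was sent
        return found  # S508: return everything that was generated
    matching = [row for row in found if row[4] == requested_hash]  # S505: verify
    return matching if matching else found  # S506: S507 if verified, else S508
```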

By allowing document data to be displayed from a search keyword and a hash value in this way, document data generated by past document searches, or document data created in advance and recorded in the document data holding unit 110, can be displayed at any time. Furthermore, because the hash value is verified before display, the intended document data cannot be viewed without the correct hash value, even by someone who knows a search condition such as the document name, which keeps the document data secure. If the hash value is wrong or missing, the request is not rejected with an error; instead, some document data is displayed. As a result, a person attempting to view document data without authorization is led to believe that the target document data has been displayed, which discourages further unauthorized attempts.

Each document component divided and recorded in the document data holding unit 110 may itself be usable as document data, but it is in combination that the components form document data whose value lies in the whole. With the method of the present invention, a person who knows the hash value can combine document components to generate document data of a specific value, while the document data remains hidden from anyone who does not know the hash value. At the same time, even without the hash value, new document data can be generated and viewed by a search keyword, so the divided document components can still be used effectively.

FIG. 16 shows an example of generated document data displayed on the document browsing client 2. Because the generated document data is displayed as structured text data, as shown in FIG. 16, it can be reprocessed into any format. For example, by defining a view for the structured text data, the generated document can be displayed as a table, as shown in FIG. 17.
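As a rough stand-in for such a view definition, the sketch below renders each record element of a generated document as a table row and its child tags as columns. The record tag name `ITEM`, the pipe-separated output, and the sample data are assumptions; a production view might instead use a stylesheet.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_table(xml_text: str, record_tag: str) -> str:
    """Render each <record_tag> element as a row and its child tags as columns."""
    root = ET.fromstring(xml_text)
    records: List[Dict[str, str]] = [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter(record_tag)
    ]
    columns: List[str] = []
    for record in records:  # keep columns in first-seen order
        for column in record:
            if column not in columns:
                columns.append(column)
    lines = [" | ".join(columns)]
    for record in records:
        lines.append(" | ".join(record.get(column, "") for column in columns))
    return "\n".join(lines)


xml_text = """<DOCUMENT><DATA NAME="product details">
<ITEM><ProductName>ice machine</ProductName><Quantity>2</Quantity></ITEM>
<ITEM><ProductName>freezer</ProductName><Quantity>1</Quantity></ITEM>
</DATA></DOCUMENT>"""
print(to_table(xml_text, "ITEM"))
```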

The document data collection and division, document data search, and document data browsing processes performed by the document data management device have been described in detail above. The device can be implemented by a computer and a software program, and the program can be recorded on a computer-readable recording medium or provided over a network.

Description of Symbols

1 Document data management device
2 Document browsing client
3 Document file server
101 Document search receiving unit
102 Document search result generation unit
103 Document search result providing unit
104 Document providing unit
105 Hash value verification unit
106 Document generation unit
107 Document browsing receiving unit
108 Document division unit
109 Document collection unit
110 Document data holding unit
120 Document collection setting storage unit

Claims

A document data management device comprising:
a document collection setting storage unit that stores document collection settings defining how document data is collected from a server device connected to a network;
a document collection unit that collects document data stored on the server defined in the document collection settings;
a document data dividing unit that divides the document data collected by the document collection unit into document components;
a document data holding unit that holds the document components divided by the document data dividing unit;
a document search receiving unit that receives, from an information terminal connected to the network, a search condition for searching the document components held in the document data holding unit;
a document search result generation unit that searches the document data holding unit for document components based on the search condition received by the document search receiving unit, adds a document name and a hash value to the document components of the search result, and stores them in the document data holding unit; and
a document search result providing unit that provides the information terminal with the document name and hash value added by the document search result generation unit.

The document data management device according to claim 1, wherein
the document collection unit, when collected document data is not in a structured document data format, converts it into a structured document data format; and
the document data dividing unit divides the document data into document components based on tag information of the structured document data.

The document data management device according to claim 1 or 2, further comprising:
a document browsing receiving unit that receives, from the information terminal, browsing conditions including at least a hash value;
a document generation unit that searches the document data holding unit based on the browsing conditions received by the document browsing receiving unit and generates document data by combining the retrieved document components;
a hash value verification unit that verifies whether the hash value included in the browsing conditions is among the hash values of the document components retrieved by the document generation unit; and
a document providing unit that, when the hash value verification unit determines that the hash value is included, provides the information terminal with the document data corresponding to that hash value out of the document data generated by the document generation unit.

The document data management device according to claim 3, wherein
the document browsing receiving unit receives, as a browsing condition, the document name added by the document search result generation unit.

The document data management device according to claim 3 or 4, wherein
the document browsing receiving unit receives, as a browsing condition, an item name of the document components recorded in the document data holding unit.

A document data providing program for causing a computer to function as each processing unit provided in the document data management device according to any one of claims 1 to 5.