JP2021163363A - Information processing apparatus, data cooperation system, method, and program

Info

Publication number: JP2021163363A
Application number: JP2020066621A
Authority: JP
Inventors: 正光木村; Masamitsu Kimura
Original assignee: SKY COM KK
Current assignee: SKY COM KK
Priority date: 2020-04-02
Filing date: 2020-04-02
Publication date: 2021-10-11
Anticipated expiration: 2040-04-02
Also published as: JP6818923B1

Abstract

To provide an information processing apparatus, a data cooperation system, a method, and a program that can print a document and display it on a screen in a predetermined layout without being affected by the user environment, and that allow the data required by a user to be acquired easily.

SOLUTION: An extraction rule storage unit 121 stores an area in a PDF file and an item name indicating the meaning of data in association with each other. A data extraction unit 111 extracts, from an input PDF file, the data included in the area associated with the item name stored in the extraction rule storage unit 121. A cooperative PDF file creation unit 112 creates a cooperative PDF file by adding, to the PDF file, cooperative data that associates the item name with the extracted data.

SELECTED DRAWING: Figure 1

Description

本発明は、情報処理装置、データ連携システム、方法およびプログラムに関する。 The present invention relates to an information processing device, a data linkage system, a method and a program.

PDF (Portable Document Format) files are widely used for business documents because they are unaffected by the environment of the terminal the user operates and can be printed or displayed page by page without the layout breaking. A PDF file is generally generated by creating a document file with an application program such as a spreadsheet or word processor and then converting that document file into PDF format.

A PDF file generated in this way has a slightly different internal data structure depending on how the document file was created, so a keyword search may not return the results the user intends. To address this problem, a technique has been disclosed in which a search file containing character information extracted from a PDF file is associated with the PDF file and registered in a database as a single form file, and, in response to a viewing request for the registered form file, a search is executed on the lines determined to be search targets from an arbitrary rectangular area selected by the user and the search conditions (see Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2014-119939

しかしながら、上述した特許文献１に記載した技術では、検索用ファイルとＰＤＦファイルとを対応付けて一つの帳票ファイルとしてデータベースに格納するため、ファイル管理が煩雑になるという問題があった。 However, in the technique described in Patent Document 1 described above, since the search file and the PDF file are associated and stored in the database as one form file, there is a problem that the file management becomes complicated.

In addition, even for PDF files such as invoices, where the document format is largely fixed and the data the user needs is known in advance, a search file must be generated so that arbitrary wording can be searched, and the search file and the PDF file must be associated with each other and stored in the database. This requires a great deal of preparatory work and a file management system that includes the database, so the necessary data cannot be acquired from the PDF file by simple processing.

The present invention has been made in view of the above, and its object is to provide an information processing device, a data linkage system, a method, and a program with which a document can be printed and displayed on a screen in a predetermined layout without being affected by the user environment, and with which the data required by the user can be acquired easily.

To solve the above problems, the present invention includes extraction rule storage means that stores an area in a PDF file in association with an item name indicating the meaning of data. The invention is characterized by extracting, from an input PDF file, the data included in the area associated with the item name stored in the extraction rule storage means, and by generating a linked PDF file in which linked data associating the item name with the extracted data has been added to the PDF file.

According to the present invention configured as described above, a document can be printed and displayed on a screen in a predetermined layout without being affected by the user environment, and the data required by the user can be acquired easily.

A block diagram showing the configuration of the data linkage system 10 and of the information processing device 100 according to this embodiment.
An explanatory diagram showing an example of the extraction rule definition file stored in the extraction rule storage unit 121.
A block diagram showing the configurations of the information processing device 200 and the information processing device 300 included in the data linkage system 10.
A flowchart showing the linked PDF file generation procedure executed by the information processing device 100.
An explanatory diagram showing an example of the data structure of a linked PDF file.
A flowchart showing the data linkage procedure executed by the information processing device 200.
An explanatory diagram showing an example of the data structure of a linked PDF file.
A flowchart showing the procedure, executed by the information processing device 100, for displaying a linked PDF file and linked data.
A flowchart showing the extraction rule generation procedure executed by the information processing device 300.
An explanatory diagram showing an example of specifying the area for each item name in a PDF file displayed on the screen.
A flowchart showing the linked PDF file generation procedure executed by the information processing device 100.
An explanatory diagram showing an example of the data structure of a linked PDF file.

添付図面を参照し本開示での１または複数の実施例を説明する。以下の説明は、本開示の実施の形態の例示であり、本開示はこれらの実施例に限定されるものではない。 One or more embodiments in the present disclosure will be described with reference to the accompanying drawings. The following description is an example of embodiments of the present disclosure, and the present disclosure is not limited to these examples.

FIG. 1 is a block diagram showing the configuration of the data linkage system 10 and of the information processing device 100 according to this embodiment. The data linkage system 10 includes an information processing device 100, an information processing device 200, an information processing device 300, and an information processing device 400 on which another system operates. As shown in FIG. 1, the information processing devices 100, 200, 300, and 400 are communicably connected to one another via a network N. The network N is any communication network, such as the Internet, an intranet, a LAN (Local Area Network), a VPN (Virtual Private Network), or a mobile communication network, or a combination thereof, and part or all of it may be wired or wireless.

情報処理装置１００は、連携データを追記したＰＤＦファイル（以下、連携ＰＤＦファイルという）を生成するコンピュータであり、パーソナルコンピュータ（以下、ＰＣという）、タブレット端末、スマートフォン、サーバ等である。情報処理装置１００は、情報処理装置１００は、図１に示すように、制御部１１０、記憶部１２０、操作表示部１３０、通信部１４０等を備える。 The information processing device 100 is a computer that generates a PDF file (hereinafter referred to as a linked PDF file) to which linked data is added, and is a personal computer (hereinafter referred to as a PC), a tablet terminal, a smartphone, a server, or the like. As shown in FIG. 1, the information processing device 100 includes a control unit 110, a storage unit 120, an operation display unit 130, a communication unit 140, and the like.

制御部１１０は、記憶部１２０に記憶する種々のプログラムおよび制御情報を展開して実行することにより、情報処理装置１００全体の動作を制御する。制御部１１０は、データ抽出部１１１、連携ＰＤＦファイル生成部１１２、タイムスタンプ付与部１１３、電子署名付与部１１４、ＱＲコード生成部１１５として機能する。 The control unit 110 controls the operation of the entire information processing device 100 by developing and executing various programs and control information stored in the storage unit 120. The control unit 110 functions as a data extraction unit 111, a linked PDF file generation unit 112, a time stamp addition unit 113, an electronic signature addition unit 114, and a QR code generation unit 115.

The storage unit 120 includes the extraction rule storage unit 121 and also stores programs for controlling each unit, application programs, various control information, intermediate files, and the like (not shown).

抽出ルール記憶部１２１は、ＰＤＦファイルからデータを抽出するためのルールを記述した抽出ルール定義ファイルを記憶する。より具体的には、抽出ルール定義ファイルは、位置情報と、項目名と、フォーマットとを対応付けて記憶する。図２は、抽出ルール記憶部１２１に記憶する抽出ルール定義ファイルの一例を示す説明図である。図２に示す例では、抽出ルール定義ファイルをＪＳＯＮ形式で記述しており、抽出するデータごとに、位置情報２１と、項目名２２と、フォーマット２３との組合せで記述する。抽出ルール定義ファイルは、ＪＳＯＮ形式に限らず、抽出ルールを定義できれば、どのような形式であってもよい。また、抽出ルール定義ファイルは、ＰＤＦファイルの種別に応じ複数の抽出ルール定義ファイルを抽出ルール記憶部１２１に記憶してもよい。 The extraction rule storage unit 121 stores an extraction rule definition file that describes rules for extracting data from the PDF file. More specifically, the extraction rule definition file stores the position information, the item name, and the format in association with each other. FIG. 2 is an explanatory diagram showing an example of an extraction rule definition file stored in the extraction rule storage unit 121. In the example shown in FIG. 2, the extraction rule definition file is described in JSON format, and each data to be extracted is described by a combination of position information 21, item name 22, and format 23. The extraction rule definition file is not limited to the JSON format, and may be in any format as long as the extraction rule can be defined. Further, as the extraction rule definition file, a plurality of extraction rule definition files may be stored in the extraction rule storage unit 121 according to the type of the PDF file.

Here, the position information indicates the area in which the data to be extracted is located within the image obtained by printing or displaying one page of the PDF file, which is managed page by page. In the example extraction rule definition file shown in FIG. 2, the area on one page of the PDF file is described as the coordinates of a rectangle. The position information may be written as X-axis and Y-axis coordinates with a predetermined position of the PDF file (for example, its upper-left corner) as the origin, or as ratios along the long and short sides of the PDF file (for example, a position 15% along the short side and 20% along the long side from the origin). The position information need not be limited to the four points of a rectangle; it may describe a polygon, a circle, an ellipse, or the like.
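As a minimal sketch of the ratio-based form of position information, the helper below converts a rectangle given as fractions of the page sides into absolute coordinates measured from a top-left origin. The function name `ratio_rect_to_points`, the tuple layout, and the portrait A4 page size in points are assumptions made for illustration; none of them come from the patent.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


def ratio_rect_to_points(ratio_rect: Rect, page_width: float, page_height: float) -> Rect:
    """Convert a rectangle expressed as fractions (0.0 to 1.0) of the page
    width and height into absolute page coordinates."""
    for r in ratio_rect:
        if not 0.0 <= r <= 1.0:
            raise ValueError("ratios must be between 0.0 and 1.0")
    rx0, ry0, rx1, ry1 = ratio_rect
    return (rx0 * page_width, ry0 * page_height, rx1 * page_width, ry1 * page_height)


# Portrait A4 in PDF points (assumed): the short side is the width, the long side the height.
A4_WIDTH, A4_HEIGHT = 595.0, 842.0

# A region starting 15% along the short side and 20% along the long side from the origin.
rect = ratio_rect_to_points((0.15, 0.20, 0.50, 0.25), A4_WIDTH, A4_HEIGHT)
```

Storing ratios rather than absolute values keeps a rule usable when the same form is produced at a different page size, at the cost of one conversion step before extraction.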

The item name is information indicating the meaning of the data (characters, numbers, and so on) extracted from the PDF file. For example, if the PDF file from which data is extracted is an "invoice", item names might include "company name", "amount", "invoice No.", "date", and "consumption tax". The format indicates the form of the data to be extracted from the area given by the position information: from the data that lies within that area, or that was extracted because it partially overlaps the area, the data matching the format may be extracted.
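The following sketch shows what a JSON extraction rule definition combining position information, item name, and format might look like, and how it could be loaded and checked. The contents of FIG. 2 are not reproduced in the text, so the key names (`position`, `item_name`, `format`), the coordinate values, and the use of regular expressions for the format are all assumptions.

```python
import json

# Hypothetical rule file: one entry per item to extract.
RULES_JSON = """
[
  {"position": {"page": 1, "x0": 50, "y0": 40, "x1": 300, "y1": 70},
   "item_name": "company_name", "format": ".+"},
  {"position": {"page": 1, "x0": 350, "y0": 200, "x1": 550, "y1": 230},
   "item_name": "amount", "format": "[0-9,]+"}
]
"""

REQUIRED_KEYS = {"position", "item_name", "format"}


def load_rules(text: str) -> list:
    """Parse the rule definitions and check that each rule carries the
    three associated fields described in the text."""
    rules = json.loads(text)
    for rule in rules:
        missing = REQUIRED_KEYS - rule.keys()
        if missing:
            raise ValueError(f"rule is missing keys: {sorted(missing)}")
    return rules


rules = load_rules(RULES_JSON)
```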

The operation display unit 130 accepts operations and the like from the user and displays the results of those operations. The operation display unit 130 displays document files created by application programs, images of PDF files, and the like. In the case of a PC, for example, the operation display unit 130 is a display device such as a liquid crystal display (LCD) or an organic EL (electroluminescence) display together with a keyboard, a mouse, and the like; in the case of a tablet terminal or a smartphone, it is a touch panel or the like formed by superimposing a touch sensor on a liquid crystal display. The communication unit 140 is communicably connected to other devices via the network N and exchanges data with them.

次に、制御部１１０で機能する、データ抽出部１１１、連携ＰＤＦファイル生成部１１２、タイムスタンプ付与部１１３、電子署名付与部１１４、ＱＲコード生成部１１５について説明する。 Next, the data extraction unit 111, the linked PDF file generation unit 112, the time stamp addition unit 113, the electronic signature addition unit 114, and the QR code generation unit 115 that function in the control unit 110 will be described.

The data extraction unit 111 extracts the data for each item name (hereinafter, linked data) from the target PDF file based on the position information, item name, and format described in the extraction rule definition file stored in the extraction rule storage unit 121. The data extraction unit 111 converts the extracted linked data into XML (Extensible Markup Language) format, CSV format, or another format.
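A minimal sketch of the XML conversion step, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`linked_data`, `item`, `name`) and the sample values are assumptions; the patent does not specify an XML schema.

```python
import xml.etree.ElementTree as ET


def linked_data_to_xml(linked_data: dict) -> str:
    """Serialize item-name/value pairs as a small XML document."""
    root = ET.Element("linked_data")
    for item_name, value in linked_data.items():
        item = ET.SubElement(root, "item", {"name": item_name})
        item.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = linked_data_to_xml({"company_name": "Example Corp.", "amount": "12,345"})
```

Keeping the item name next to each value is what lets a receiving system interpret the data without knowing the page layout.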

The linked PDF file generation unit 112 generates a linked PDF file, that is, a PDF file in which the linked data has been added to an invisible area (or a visible area) of the PDF file. The linked PDF file generation unit 112 also generates a linked PDF file in which the QR code (registered trademark) generated by the QR code generation unit 115, described later, has been added to the PDF file.

The time stamp addition unit 113 adds to the linked PDF file a time stamp that is issued when the hash value of the linked PDF file is sent to a time-stamping authority.

The electronic signature addition unit 114 generates an electronic signature using the hash value of the linked PDF file and adds it to the linked PDF file. Instead of an electronic signature, the electronic signature addition unit 114 may add an electronic certificate issued by a certificate authority. Furthermore, instead of adding to the linked PDF file an electronic signature (or electronic certificate) indicating that the file was created by an individual (a natural person), an e-seal indicating that it was created by a corporation or organization may be added.

ＱＲコード生成部１１５は、データ抽出部１１１が抽出した連携データを示すＱＲコードを生成する。 The QR code generation unit 115 generates a QR code indicating the cooperation data extracted by the data extraction unit 111.

次に、情報処理装置１００とともに、データ連携システム１０を構成する情報処理装置２００および情報処理装置３００について説明する。図３は、データ連携システム１０が備える情報処理装置２００および情報処理装置３００の構成を示すブロック図である。情報処理装置２００は、連携ＰＤＦファイルから連携データを取得し他のシステムに送信するコンピュータであり、サーバやＰＣ、タブレット端末、スマートフォン等である。情報処理装置２００は、制御部２１０、記憶部２２０、操作表示部２３０、通信部２４０等を備える。操作表示部２３０、通信部２４０の機能は、上述した操作表示部１３０、通信部１４０と同様であるため、操作表示部１３０、通信部１４０の説明を参照し、ここでの説明は省略する。 Next, the information processing device 200 and the information processing device 300 constituting the data linkage system 10 will be described together with the information processing device 100. FIG. 3 is a block diagram showing the configurations of the information processing device 200 and the information processing device 300 included in the data linkage system 10. The information processing device 200 is a computer that acquires cooperation data from a cooperation PDF file and transmits it to another system, and is a server, a PC, a tablet terminal, a smartphone, or the like. The information processing device 200 includes a control unit 210, a storage unit 220, an operation display unit 230, a communication unit 240, and the like. Since the functions of the operation display unit 230 and the communication unit 240 are the same as those of the operation display unit 130 and the communication unit 140 described above, the description of the operation display unit 130 and the communication unit 140 will be referred to, and the description thereof will be omitted here.

The storage unit 220 includes a linked PDF file storage unit 221 and a linked data storage unit 222, and also stores programs for controlling each unit, application programs, various control information, intermediate files, and the like (not shown). The linked PDF file storage unit 221 stores the linked PDF files from which linked data is to be extracted. It receives and stores linked PDF files from one or more information processing devices 100 connected via the network N. The linked data storage unit 222 stores the linked data extracted from the linked PDF files.

制御部２１０は、データ連携部２１１、タイムスタンプ検証部２１２、電子署名検証部２１３として機能する。各部について説明する。 The control unit 210 functions as a data linkage unit 211, a time stamp verification unit 212, and an electronic signature verification unit 213. Each part will be described.

The data linkage unit 211 acquires, from a linked PDF file stored in the linked PDF file storage unit 221, the linked data stored in the invisible area (or visible area) of that file. The data linkage unit 211 stores the acquired linked data in the linked data storage unit 222.

タイムスタンプ検証部２１２は、連携ＰＤＦファイルに付与されたタイムスタンプを検証する。より具体的には、タイムスタンプ検証部２１２は、連携ＰＤＦファイルのハッシュ値と、タイムスタンプに含まれるハッシュ値を比較し検証する。 The time stamp verification unit 212 verifies the time stamp given to the linked PDF file. More specifically, the time stamp verification unit 212 compares and verifies the hash value of the linked PDF file with the hash value included in the time stamp.

電子署名検証部２１３は、連携ＰＤＦファイルに付与された電子署名を検証する。より具体的には、電子署名検証部２１３は、連携ＰＤＦファイルのハッシュ値と、電子署名を公開鍵で復号したハッシュ値を比較し検証する。 The electronic signature verification unit 213 verifies the electronic signature given to the linked PDF file. More specifically, the electronic signature verification unit 213 compares and verifies the hash value of the linked PDF file with the hash value obtained by decrypting the electronic signature with the public key.

情報処理装置３００は、情報処理装置１００において連携ＰＤＦファイルから連携データを抽出する際の抽出ルールを生成するコンピュータであり、サーバやＰＣ等である。情報処理装置３００は、制御部３１０、記憶部３２０、操作表示部３３０、通信部３４０等を備える。操作表示部３３０、通信部３４０の機能は、上述した操作表示部１３０、通信部１４０と同様であるため、操作表示部１３０、通信部１４０の説明を参照し、ここでの説明は省略する。 The information processing device 300 is a computer that generates an extraction rule for extracting linked data from the linked PDF file in the information processing device 100, and is a server, a PC, or the like. The information processing device 300 includes a control unit 310, a storage unit 320, an operation display unit 330, a communication unit 340, and the like. Since the functions of the operation display unit 330 and the communication unit 340 are the same as those of the operation display unit 130 and the communication unit 140 described above, the description of the operation display unit 130 and the communication unit 140 will be referred to, and the description thereof will be omitted here.

The storage unit 320 includes a teacher data storage unit 321 and an extraction rule storage unit 322, and also stores programs for controlling each unit, application programs, various control information, intermediate files, and the like (not shown). The teacher data storage unit 321 stores the PDF files used when generating extraction rules. The PDF files stored in the teacher data storage unit 321 may be collected from a core system or the like operated by the user, or collected by a crawler from many information processing devices connected via the network N. The extraction rule storage unit 322 stores the extraction rule definition file.

ルール生成部３１１は、読み込んだＰＤＦファイルから抽出ルール定義ファイルを生成し、抽出ルール記憶部３２２に格納する。なお、抽出ルール定義ファイルは、ＰＤＦファイルの種別ごとに生成し、抽出ルール記憶部３２２に格納してもよい。 The rule generation unit 311 generates an extraction rule definition file from the read PDF file and stores it in the extraction rule storage unit 322. The extraction rule definition file may be generated for each type of PDF file and stored in the extraction rule storage unit 322.

上述したように構成されたデータ連携システム１０の情報処理装置１００、情報処理装置２００、情報処理装置３００で実行する処理について説明する。図４は、情報処理装置１００が実行する連携ＰＤＦファイル生成処理手順を示すフローチャートである。 The processing executed by the information processing device 100, the information processing device 200, and the information processing device 300 of the data linkage system 10 configured as described above will be described. FIG. 4 is a flowchart showing a linked PDF file generation processing procedure executed by the information processing apparatus 100.

情報処理装置１００は、アプリケーションプログラム上でユーザによる操作を受付け、受付けた操作に応じた処理を実行する（ステップＳ４０１）。操作表示部１３０は、アプリケーションプログラムの終了指示を受付けたか否かを判断する（ステップＳ４０２）。アプリケーションプログラムの終了指示を受付けたと判断した場合（ステップＳ４０２：Ｙｅｓ）、アプリケーションプログラムを終了する。 The information processing device 100 receives an operation by the user on the application program and executes a process according to the received operation (step S401). The operation display unit 130 determines whether or not the end instruction of the application program has been received (step S402). When it is determined that the application program termination instruction has been accepted (step S402: Yes), the application program is terminated.

アプリケーションプログラムの終了指示を受付けていないと判断した場合（ステップＳ４０２：Ｎｏ）、操作表示部１３０は、ＰＤＦファイル生成指示を受付けたか否かを判断する（ステップＳ４０３）。ＰＤＦファイル生成指示を受付けていないと判断した場合（ステップＳ４０３：Ｎｏ）、ステップＳ４０１に戻り、アプリケーションプログラム上の操作および処理を実行する。 When it is determined that the end instruction of the application program has not been accepted (step S402: No), the operation display unit 130 determines whether or not the PDF file generation instruction has been accepted (step S403). If it is determined that the PDF file generation instruction has not been accepted (step S403: No), the process returns to step S401 to execute operations and processes on the application program.

If it is determined that a PDF file generation instruction has been received (step S403: Yes), a PDF file is generated from the document created in the application program (step S404). The data extraction unit 111 extracts linked data from the generated PDF file (step S405). More specifically, following the extraction rule definition file stored in the extraction rule storage unit 121, the data extraction unit 111 extracts, from the data included in the area of the PDF file indicated by the position information, the data that conforms to the format. The data included in the area may include data that only partly overlaps the area. The data extraction unit 111 generates linked data that associates the extracted data with the item name. The linked data is in XML format, CSV format, or another format.
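The extraction in step S405 can be sketched as follows, assuming the text of the page has already been obtained as spans with bounding boxes (how a PDF library would supply them is outside this sketch). The `TextSpan` class, the rule keys, the regular-expression interpretation of the format, and the "first match wins" policy are assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class TextSpan:
    """A piece of page text with its bounding box (top-left origin)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(span: TextSpan, pos: dict) -> bool:
    """True if the span's box intersects the rule area; partial overlap counts."""
    return (span.x0 < pos["x1"] and span.x1 > pos["x0"]
            and span.y0 < pos["y1"] and span.y1 > pos["y0"])


def extract_linked_data(spans, rules) -> dict:
    """For each rule, take the first span in the area whose text matches the format."""
    linked = {}
    for rule in rules:
        pattern = re.compile(rule["format"])
        for span in spans:
            if not overlaps(span, rule["position"]):
                continue
            match = pattern.search(span.text)
            if match:
                linked[rule["item_name"]] = match.group(0)
                break
    return linked


spans = [
    TextSpan("Example Corp.", 60, 45, 200, 60),
    TextSpan("Total: 12,345 yen", 360, 205, 540, 225),
    TextSpan("2020-04-02", 400, 40, 500, 55),
]
rules = [
    {"item_name": "amount", "format": r"[0-9][0-9,]*",
     "position": {"x0": 350, "y0": 200, "x1": 550, "y1": 230}},
    {"item_name": "date", "format": r"\d{4}-\d{2}-\d{2}",
     "position": {"x0": 380, "y0": 30, "x1": 560, "y1": 60}},
]
linked = extract_linked_data(spans, rules)
```

Here the format does two jobs: it rejects spans in the area that are not the wanted data, and it trims surrounding text such as "Total:" from a span that is.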

連携ＰＤＦファイル生成部１１２は、ＰＤＦファイルに連携データを埋め込んだ連携ＰＤＦファイルを生成する（ステップＳ４０６）。より具体的には、連携ＰＤＦファイル生成部１１２は、ＰＤＦファイルから抽出した連携データをＰＤＦファイルの不可視領域に埋め込む。 The linked PDF file generation unit 112 generates a linked PDF file in which linked data is embedded in the PDF file (step S406). More specifically, the linked PDF file generation unit 112 embeds the linked data extracted from the PDF file in the invisible area of the PDF file.

FIG. 5 is an explanatory diagram showing an example of the data structure of a linked PDF file. As shown in the data structure example of FIG. 5(a), a linked PDF file stores page content in the visible area and linked data in the invisible area. When the linked PDF file is displayed on a screen, the content described in the page content stored in the visible area is displayed, as shown in the screen display example. The page content is not limited to one page and may span multiple pages. The linked data, such as that shown in the linked data example, is stored inside the PDF file, but because it is stored in the invisible area it is not displayed on the screen. The extraction rule definition file may include page numbers, and the data described on each page may be extracted.

The time stamp addition unit 113 adds a time stamp to the linked PDF file (step S407). More specifically, the time stamp addition unit 113 calculates the hash value of the linked PDF file, sends the calculated hash value to a time-stamping authority, and adds the time stamp received from the time-stamping authority to the linked PDF file.
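The hash-then-request flow of step S407 might be sketched as below. SHA-256 is an assumed choice of hash function, and `fake_tsa_issue` is a purely local stand-in for the time-stamping authority: a real authority (for example one following the RFC 3161 Time-Stamp Protocol) would return a signed token rather than a plain dictionary.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hash value of the linked PDF file's bytes."""
    return hashlib.sha256(data).hexdigest()


def fake_tsa_issue(digest_hex: str) -> dict:
    """Hypothetical stand-in for a time-stamping authority: binds the
    submitted hash value to the current time. Not signed, illustration only."""
    return {"hash": digest_hex, "time": datetime.now(timezone.utc).isoformat()}


pdf_bytes = b"%PDF-1.7 example linked PDF content"
token = fake_tsa_issue(sha256_hex(pdf_bytes))
```

Only the hash value leaves the device in this flow, so the authority never sees the document itself.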

The electronic signature addition unit 114 adds an electronic signature to the linked PDF file (step S408). More specifically, the electronic signature addition unit 114 calculates the hash value of the linked PDF file, generates an electronic signature by encrypting the calculated hash value with a private key, and adds the generated electronic signature to the linked PDF file. FIG. 5(b) is an example in which a time stamp and an electronic signature have been added to the linked PDF file. The linked PDF file may be given both a time stamp and an electronic signature, only one of them, or neither.
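To illustrate "encrypting the hash value with a private key" and the matching check with the public key, here is a toy textbook-RSA sketch. The key (built from the primes 61 and 53) is far too small to be secure, the digest is reduced modulo the key so it fits, and no padding is applied; real signatures use large keys and a padding scheme. All names and values are assumptions for illustration.

```python
import hashlib

# Toy RSA key from p=61, q=53: N = p*q, E public exponent, D private exponent.
# Illustration only; never use keys of this size.
N, E, D = 3233, 17, 2753


def digest_int(data: bytes) -> int:
    """SHA-256 digest reduced modulo N so that it fits the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signer side: transform the hash value with the private exponent."""
    return pow(digest_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Verifier side: recover the hash with the public exponent and
    compare it with a freshly computed hash of the data."""
    return pow(signature, E, N) == digest_int(data)


pdf_bytes = b"%PDF-1.7 example linked PDF"
signature = sign(pdf_bytes)
is_valid = verify(pdf_bytes, signature)
```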

In this way, linked data is extracted from a PDF file, whose layout does not break when printed or displayed, based on the area for each item name described in the extraction rule definition file, and the extracted data is stored inside the PDF file. The data contained in the PDF file can thus be passed easily to another system via the PDF file, together with the item names indicating the meaning of the data, so that other systems can also make use of that data easily. In addition, because the linked data is embedded in the PDF file, it can be sent and received as a single file, which makes it easy to handle, and the basis for the data can be checked easily by displaying or printing the PDF file with a PDF viewer.

Adding a time stamp to the linked PDF file makes it possible to confirm that the linked PDF file already existed at the time recorded in the time stamp and has not been tampered with. Adding an electronic signature to the linked PDF file and verifying that signature at the recipient makes it possible to confirm that the linked PDF file has not been tampered with and that the user who generated it is a legitimate user rather than an impersonator.

次に、情報処理装置２００で実行するデータ連携処理について説明する。図６は、情報処理装置２００が実行するデータ連携処理手順を示すフローチャートである。 Next, the data linkage process executed by the information processing apparatus 200 will be described. FIG. 6 is a flowchart showing a data linkage processing procedure executed by the information processing apparatus 200.

The data linkage unit 211 reads a linked PDF file from the linked PDF file storage unit 221 (step S601). Instead of reading the linked PDF file from the linked PDF file storage unit 221, the data linkage unit 211 may receive and read a linked PDF file transmitted from another information processing device via the network N.

The time stamp verification unit 212 verifies the time stamp attached to the linked PDF file (step S602). More specifically, the time stamp verification unit 212 calculates a hash value from the linked PDF file and compares it with the hash value included in the time stamp. If the two match, it determines that the linked PDF file has not been tampered with since the time included in the time stamp. If the time stamp cannot be verified, that is, if the hash values do not match, the unit writes information identifying the linked PDF file to a log file together with a note that the time stamp could not be verified, and the process proceeds to step S606.
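The hash comparison in step S602 can be sketched as below. This is a simplified illustration under assumptions: SHA-256 as the hash algorithm, a plain dictionary standing in for the time stamp, and an in-memory list standing in for the log file are all hypothetical choices. A real time stamp token would also carry the issuing authority's signature, which this sketch does not check.

```python
import hashlib
import hmac


def verify_time_stamp(pdf_bytes: bytes, stamp: dict) -> bool:
    """Recompute the file hash and compare it with the hash in the stamp."""
    actual = hashlib.sha256(pdf_bytes).hexdigest()  # SHA-256 is an assumption
    return hmac.compare_digest(actual, stamp["hash"])


def check_or_log(file_id: str, pdf_bytes: bytes, stamp: dict, log: list) -> bool:
    """Return True when the stamp verifies; otherwise record the failure."""
    if verify_time_stamp(pdf_bytes, stamp):
        return True
    log.append(f"{file_id}: time stamp could not be verified")
    return False


original = b"%PDF-1.7 example content"
stamp = {"hash": hashlib.sha256(original).hexdigest(), "time": "2020-04-02T00:00:00Z"}
log = []
ok_untouched = check_or_log("a.pdf", original, stamp, log)
ok_modified = check_or_log("a.pdf", original + b" edited", stamp, log)
```

Here the unmodified bytes verify, while the edited bytes produce a log entry and would send the flow on to step S606 without extracting data.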

The electronic signature verification unit 213 verifies the electronic signature attached to the linked PDF file (step S603). More specifically, the electronic signature verification unit 213 compares the hash value of the linked PDF file with the hash value obtained by decrypting the electronic signature with the public key, and determines whether they match. If the electronic signature cannot be verified, that is, if the hash values do not match, the unit writes information identifying the linked PDF file to a log file together with a note that the electronic signature could not be verified, and the process proceeds to step S606.

The data linkage unit 211 acquires the linked data included in the linked PDF file (step S604) and stores the acquired linked data in the linked data storage unit 222 (step S605). The data linkage unit 211 then determines whether to end the reading of linked PDF files (step S606). More specifically, it determines whether the linked PDF file storage unit 221 still holds a linked PDF file that is an extraction target and from which data has not yet been extracted. Alternatively, whether to end linked data extraction may be decided by other criteria, such as whether a linked PDF file was received by a predetermined time.

If it is determined that the reading of linked PDF files should not end (step S606: No), the process returns to step S601 and linked data extraction continues. If it is determined that the reading should end (step S606: Yes), the linked data stored in the linked data storage unit 222 is transmitted to another system (step S607). The linked data is sent via the network N to the information processing device 400 on which the other system (for example, an accounting system or a business management system) runs, and the other system executes processing using the received linked data.
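The control flow of FIG. 6 (steps S601 to S607) can be sketched as a single loop. The function and parameter names here are hypothetical, the verification and extraction steps are passed in as callables, and dictionaries stand in for linked PDF files; it is a sketch of the flow, not of any real PDF handling.

```python
def run_data_linkage(pending, verify_time_stamp, verify_signature,
                     read_linked_data, send):
    """Process each pending file, then send everything that was stored."""
    stored = []  # stands in for the linked data storage unit 222
    log = []     # stands in for the log file
    for file_id, pdf in pending:                 # S601; loop ends at S606
        if not verify_time_stamp(pdf):           # S602
            log.append(f"{file_id}: time stamp could not be verified")
            continue
        if not verify_signature(pdf):            # S603
            log.append(f"{file_id}: electronic signature could not be verified")
            continue
        stored.append(read_linked_data(pdf))     # S604 and S605
    send(stored)                                 # S607
    return stored, log


files = [
    ("a.pdf", {"ts_ok": True, "sig_ok": True, "data": {"amount": "12,000"}}),
    ("b.pdf", {"ts_ok": False, "sig_ok": True, "data": {"amount": "9,999"}}),
]
sent = []
stored, log = run_data_linkage(
    files,
    verify_time_stamp=lambda p: p["ts_ok"],
    verify_signature=lambda p: p["sig_ok"],
    read_linked_data=lambda p: p["data"],
    send=sent.extend,
)
```

A file that fails either check is logged and skipped, matching the jump to step S606 in the description, so only verified data reaches the other system.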

In this way, by adopting the linked PDF file data structure shown in FIG. 5, linked data can be acquired easily from the linked PDF file. Because a linked PDF file is a single file in PDF format, it can be sent and received easily between computers, and the data can be managed in one file without relying on a mechanism such as a database.

In addition, by attaching a time stamp or an electronic signature to the linked PDF file and verifying it when the linked data is extracted, it can be confirmed that the data is free of tampering and impersonation. If no time stamp or electronic signature is attached to the linked PDF file, the corresponding verification is skipped and the linked data is extracted.

Another embodiment will now be described. In step S406 of the linked PDF file generation process of FIG. 4 described above, the linked PDF file generation unit 112 embedded the linked data extracted from the PDF file in an invisible area of the PDF file, but the linked data may instead be embedded in a visible area. The data structure and display processing used when linked data is embedded in a visible area are described below.

FIG. 7 is an explanatory diagram showing an example of the data structure of a linked PDF file. In the linked PDF file shown in FIG. 7, the page content is stored in visible area 1 and the linked data is stored in visible area 2. The page content stored in visible area 1 is displayed on the screen in the same way as in FIG. 5. When linked data is stored in visible area 2, a mark 71 indicating that the data exists is displayed on the PDF file on the screen.

Next, the display processing of a linked PDF file in the information processing device 100 when linked data is embedded in visible area 2 will be described. FIG. 8 is a flowchart showing the linked PDF file display processing procedure executed by the information processing device 100.

The operation display unit 130 receives an instruction specifying the linked PDF file to be displayed (step S801) and displays the page content of that linked PDF file on the screen (step S802). At that time, as in the screen display example shown in FIG. 7, it displays the page content and also the mark 71, which indicates that linked data is stored in visible area 2. The mark 71 may be a pin as shown in FIG. 7, a circle, or any other figure, as long as it indicates that linked data exists.

The operation display unit 130 determines whether an instruction to display the linked data has been received (step S803). In the example shown in FIG. 7, it determines whether the mark 71, which indicates that linked data is stored, has been selected. If it determines that a display instruction has been received (step S803: Yes), the operation display unit 130 reads the linked data stored in visible area 2 of the linked PDF file and displays it on the screen (step S804). The linked data may be displayed as plain text as is. Alternatively, if it is in XML format, an XSLT (Extensible Stylesheet Language Transformations) style sheet may, for example, be stored in the invisible area and used to transform the XML data for display. The linked data may also be displayed in a separate window, a balloon, or the like. If it determines that no display instruction has been received (step S803: No), the process ends.

In this way, by storing the linked data in a visible area separate from the page content and displaying it when a display instruction is received, the user can easily check which of the data contained in the PDF file was extracted and stored as linked data.

Next, the process of generating rules for extracting linked data from PDF files will be described. FIG. 9 is a flowchart showing the extraction rule generation processing procedure executed by the information processing device 300.

The rule generation unit 311 reads, from the teacher data storage unit 321, a PDF file to be referenced for generating rules (step S901). The operation display unit 330 displays the read PDF file on the screen (step S902) and accepts the input of an item name (step S903). The item name may be entered from a keyboard or the like, or preset item names may be shown in a pull-down menu or the like so that one of them can be selected.

The operation display unit 330 accepts the designation of the area on the PDF file that corresponds to the item name (step S904). FIG. 10 is an explanatory diagram showing an example of designating an area for each item name in a PDF file displayed on the screen. It shows the PDF file 11 displayed on the screen with area 12 designated for the item name "company name", area 13 for the item name "invoice No.", area 14 for the item name "date", area 15 for the item name "amount", area 16 for the item name "subject", and area 17 for the item name "item".

The rule generation unit 311 calculates, from the designated area, the position information of that area within the PDF file (step S905), and acquires the data contained in the designated area (step S906). For example, it acquires "ＡＡＡ株式会社様" (a company name followed by the honorific suffix "様") as the data for the item name "company name". The rule generation unit 311 stores the position information of the area in an intermediate file in association with the item name. It then determines whether all PDF files stored in the teacher data storage unit 321 have been read (step S907). If not all of them have been read (step S907: No), the process returns to step S901 and the next PDF file is read.

If it is determined that all PDF files stored in the teacher data storage unit 321 have been read (step S907: Yes), the rule generation unit 311 calculates the average position information for each item from the per-item-name area position information stored in the intermediate file (step S908). The rule generation unit 311 also generates a format for each item from the per-item-name data stored in the intermediate file (step S909). For example, if it determines that the suffix "様" is common to the data for the item name "company name", it sets the format to "ＸＸＸＸ様" so that "ＸＸＸＸ" is extracted as the linked data.
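Steps S908 and S909 can be illustrated with the following sketch. It assumes that each area is an `(x0, y0, x1, y1)` tuple and that a format is derived from nothing more than the longest suffix shared by all samples; the description does not specify how the common part is detected, so this rule and all function names are assumptions.

```python
from os.path import commonprefix
from statistics import mean


def average_area(areas):
    """Average each coordinate of (x0, y0, x1, y1) boxes (step S908)."""
    return tuple(mean(coord) for coord in zip(*areas))


def common_suffix(values):
    """Longest suffix shared by every sample, e.g. a trailing honorific."""
    reversed_values = [v[::-1] for v in values]
    return commonprefix(reversed_values)[::-1]


def apply_format(value, suffix):
    """Strip the shared suffix so that only the variable part remains."""
    if suffix and value.endswith(suffix):
        return value[: -len(suffix)]
    return value


areas = [(10.0, 20.0, 110.0, 40.0), (14.0, 22.0, 114.0, 44.0)]
samples = ["AAA株式会社様", "BBB商事様"]

rule_area = average_area(areas)
suffix = common_suffix(samples)
extracted = apply_format("CCC工業様", suffix)
```

With these inputs the shared suffix is "様", so a later value such as "CCC工業様" yields "CCC工業" as the linked data for the company name item.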

The rule generation unit 311 stores, in the extraction rule storage unit 322, an extraction rule definition file that associates the average position information and the format with each item name (step S910). The extraction rule definition file generated by this process and stored in the extraction rule storage unit 322 is transmitted to the information processing device 100 at a predetermined timing or at the request of the information processing device 100, and is stored in the extraction rule storage unit 121 of the information processing device 100.

In this way, by referring to a large number of PDF files, calculating the average position information of the areas designated for each item name, and generating a format for each item name, data appropriate to the meaning of each item name can be extracted when linked data is extracted from a PDF file.

As another embodiment, instead of calculating the average of the area position information for each item name, the position information of the maximum area that contains all of the designated areas for that item name may be calculated. Also, the position information of a plurality of areas, not just one, may be calculated for each item name.
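The maximum-area alternative amounts to taking the smallest rectangle that encloses every designated area. A minimal sketch, again assuming hypothetical `(x0, y0, x1, y1)` tuples with `x0 <= x1` and `y0 <= y1`:

```python
def enclosing_area(areas):
    """Smallest rectangle containing every (x0, y0, x1, y1) box."""
    x0s, y0s, x1s, y1s = zip(*areas)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


designated = [(10, 20, 110, 40), (14, 18, 120, 44)]
max_area = enclosing_area(designated)
```

Compared with averaging, this choice never cuts off data that appeared in any training sample, at the cost of a larger area that may pick up neighboring text.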

Instead of calculating, from the teacher data described above, the average or maximum of the area position information for each item name, the extraction rule generation process may generate a trained model that has learned the data corresponding to each item name from the areas designated for each item name in a large number of PDF files. This allows the linked data contained in a PDF file to be extracted using AI (Artificial Intelligence). When a trained model is generated, it is stored in the extraction rule storage unit 121 of the information processing device 100, and when a PDF file is input, the trained model is used to output the data for each item name.

As another embodiment, a process is described in which, instead of or in addition to the process described with reference to FIG. 4 of generating a linked PDF file by embedding linked data in the PDF file, a QR code representing the linked data is generated and added to the PDF file to generate the linked PDF file.

FIG. 11 is a flowchart showing the linked PDF file generation processing procedure executed by the information processing device 100. Steps S1101 to S1105, S1108, and S1109 in the flowchart of FIG. 11 are the same as steps S401 to S405, S407, and S408 of FIG. 4 described above, so their description is omitted here.

The QR code generation unit 115 generates a QR code representing the linked data that the data extraction unit 111 extracted from the PDF file in step S1105 (step S1106). The linked PDF file generation unit 112 generates a linked PDF file in which the QR code is embedded in the PDF file (step S1107). More specifically, the linked PDF file generation unit 112 embeds the QR code, generated from the linked data extracted from the PDF file, in the visible area of the PDF file.

FIG. 12 is an explanatory diagram showing an example of the data structure of a linked PDF file. As shown in the data structure example of FIG. 12, the linked PDF file stores, in the visible area, the page content and a QR code in which the linked data is described. When the linked PDF file is displayed on the screen, the content described in the page content and the QR code 21 are displayed, as shown in the screen display example.

In this way, by converting the linked data into a QR code and embedding it in the PDF file, the linked data can be acquired by reading the QR code that is displayed (or recognized) as an image, and can be passed easily to other systems. Even when the PDF file has been printed on paper, the linked data can be acquired by reading the QR code. The QR code can also be read with a smartphone or the like, so the content of the linked data can be checked easily.

The hardware configuration of the information processing device 100, the information processing device 200, and the information processing device 300 according to the embodiments described above is that of an ordinary computer. It includes one or more processors such as a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), or a GPU (Graphics Processing Unit), together with a ROM (Read Only Memory), a RAM (Random Access Memory), an external storage device such as an HDD (Hard Disk Drive), flash memory, or an SSD (Solid State Drive), a communication control device, and the like. The configurations and functions described above are realized by the CPU reading and running programs stored in the ROM, RAM, HDD, or the like. The control unit may instead be an electronic circuit such as an ASIC (Application Specific Integrated Circuit) or a PLD (Programmable Logic Device).

The programs that run on the information processing device 100, the information processing device 200, and the information processing device 300 may be stored on a computer connected to a network such as the Internet and provided by download via the network, or may be recorded as files in an installable or executable format on a computer-readable recording medium such as a CD-ROM, DVD, USB memory, or SD card and provided in that form. The programs that realize the functions and processes described above may also be provided as an API (Application Programming Interface), as SaaS (Software as a Service), or through cloud computing.

In the embodiments described above, the information processing device 100, the information processing device 200, and the information processing device 300 were described as separate devices, but their functions may be combined into a single device, or the functions of any two of them may be combined.

The present invention is not limited to the embodiments exactly as described above, and need not be physically configured as illustrated. All or part of the components described in the embodiments may be functionally or physically divided, integrated, replaced, modified, or deleted in arbitrary units according to various loads, usage conditions, and the like.

10 ... Data linkage system, 100 ... Information processing device, 110 ... Control unit, 111 ... Data extraction unit, 112 ... Linked PDF file generation unit, 113 ... Time stamping unit, 114 ... Electronic signature unit, 115 ... QR code generation unit, 120 ... Storage unit, 121 ... Extraction rule storage unit, 130 ... Operation display unit, 140 ... Communication unit, 200 ... Information processing device, 210 ... Control unit, 211 ... Data linkage unit, 212 ... Time stamp verification unit, 213 ... Electronic signature verification unit, 220 ... Storage unit, 221 ... Linked PDF file storage unit, 222 ... Linked data storage unit, 230 ... Operation display unit, 240 ... Communication unit, 300 ... Information processing device, 310 ... Control unit, 311 ... Rule generation unit, 320 ... Storage unit, 321 ... Teacher data storage unit, 322 ... Extraction rule storage unit, 330 ... Operation display unit, 340 ... Communication unit, 400 ... Information processing device (other system)

Claims

An information processing device comprising:
an extraction rule storage means for storing an area in a PDF file and an item name indicating the meaning of data in association with each other;
a data extraction means for extracting, from an input PDF file, the data included in the area associated with the item name stored in the extraction rule storage means; and
a linked PDF file generating means for generating a linked PDF file by adding, to the PDF file, linked data in which the item name and the data extracted by the data extraction means are associated with each other.

The information processing device according to claim 1, wherein the extraction rule storage means further stores a format for extracting the data in association with the item name, and
the data extraction means extracts the data by using the format stored in the extraction rule storage means.

The information processing device according to claim 1 or 2, further comprising a QR code generating means for generating a QR code representing the linked data,
wherein the linked PDF file generating means adds the QR code generated by the QR code generating means to the linked PDF file.

The information processing device according to claim 1 or 2, wherein the linked PDF file generating means adds the linked data to an invisible region of the PDF file.

The information processing device according to claim 1 or 2, further comprising:
a display instruction receiving means for receiving an instruction to display the linked data; and
a linked data display means for displaying the linked data,
wherein the linked PDF file generating means adds the linked data to a visible area of the PDF file, and the linked data display means reads the linked data from the linked PDF file and displays it when the display instruction receiving means receives the instruction to display the linked data.

The information processing apparatus according to any one of claims 1 to 5, further comprising a time stamping means for adding a time stamp to the linked PDF file.

The information processing apparatus according to any one of claims 1 to 6, further comprising an electronic signature giving means for giving an electronic signature to the linked PDF file.

The information processing device according to any one of claims 1 to 7, further comprising:
a data linkage means for acquiring the linked data from the linked PDF file; and a transmission means for transmitting the linked data acquired by the data linkage means to another system.

The information processing device according to any one of claims 1 to 8, further comprising:
a PDF file display means for displaying a PDF file; an area receiving means for receiving, in the PDF file displayed by the PDF file display means, the area for each item name; and
an extraction rule storing means for storing, in the extraction rule storage means, the area for each item name received by the area receiving means in association with the item name.

An information processing device comprising:
a PDF file acquisition means for acquiring a PDF file;
a data extraction means for extracting an item name and data in the PDF file acquired by the PDF file acquisition means, using a learning model trained on teacher data that includes, for one or more pages, item names indicating the meaning of data and the areas in which the data corresponding to those item names is described; and
a linked PDF file generating means for generating a linked PDF file by adding, to the PDF file, linked data in which the item name and the data are associated with each other.

A data linkage system comprising an information processing device and a server device connected via a network, wherein
the information processing device comprises:
an extraction rule storage means for storing an area in a PDF file and an item name indicating the meaning of data in association with each other;
a data extraction means for extracting, from an input PDF file, the data included in the area associated with the item name stored in the extraction rule storage means;
a linked PDF file generating means for generating a linked PDF file by adding, to the PDF file, linked data in which the item name and the data extracted by the data extraction means are associated with each other; and
a transmission means for transmitting the linked PDF file generated by the linked PDF file generating means to the server device, and
the server device comprises:
a receiving means for receiving the linked PDF file from the information processing device;
a data linkage means for acquiring the linked data from the linked PDF file received by the receiving means; and
a transmission means for transmitting the linked data acquired by the data linkage means to another system connected via a network.

A method executed by a computer that includes an extraction rule storage means for storing an area in a PDF file and an item name indicating the meaning of data in association with each other, the method comprising:
a data extraction step of extracting, from an input PDF file, the data included in the area associated with the item name stored in the extraction rule storage means; and
a linked PDF file generation step of generating a linked PDF file by adding, to the PDF file, linked data in which the item name and the data extracted in the data extraction step are associated with each other.

A method executed by a computer, the method comprising:
a PDF file display step of displaying a PDF file;
an area reception step of receiving, in the PDF file displayed in the PDF file display step, the area for each item name; and
an extraction rule storing step of storing, in an extraction rule storage means, the area for each item name received in the area reception step in association with the item name.

A program for causing a computer to execute the method according to claim 12 or 13.