JP6550721B2

JP6550721B2 - Information processing apparatus, document management system, processing method thereof, and program

Info

Publication number: JP6550721B2
Application number: JP2014222121A
Authority: JP
Inventors: 光雄久保田
Original assignee: Canon Marketing Japan Inc
Current assignee: Canon Marketing Japan Inc
Priority date: 2014-10-31
Filing date: 2014-10-31
Publication date: 2019-07-31
Anticipated expiration: 2034-10-31
Also published as: JP2016091131A

Description

The present invention relates to an information processing apparatus for file conversion, a document management system, a processing method thereof, and a program.

Conventionally, there have been electronic form systems that, in place of paper forms, receive data from a host (higher-level system), convert it into electronic forms (form data), and store and manage them.
Such a system converts the data into its own proprietary file format so that the electronic forms can be browsed and searched within the system (see, for example, Patent Document 1).

Patent Document 1: JP 2012-123574 A

However, in recent years it has become desirable for electronic form systems to import not only data from mainframes and similar hosts but also files, such as PDF files, generated by general-purpose computers.

In this case, simply importing a PDF file is easy. However, it has been difficult to browse and search it in the same way as the proprietary-format electronic forms stored in the electronic form system.

This is because electronic form systems have traditionally ingested form data (host forms) on the premise that the data is output to a high-speed line printer. They therefore assume that the data takes the form of page information, form information, and data information (text and characters). Page information determines the line and page layout. Form information determines ruled lines and images. Data information determines the data (text and characters) line by line and character by character. In contrast, intermediate-format files such as PDF files have no concept of a "line" and allow free layout. Even if such files can be stored, it has been difficult to handle them in the same way as electronic forms.

Accordingly, an object of the present invention is to provide a mechanism that allows document files such as PDF files to be easily handled by a system that uses a different file format.

To achieve this object, the present invention provides an information processing apparatus that stores a document file of a second format. The second-format file is composed of text information, form information, and page information, and is generated from a multi-page document file of a first format that contains text data, text attribute information, and ruled line information. The apparatus comprises the following means:
- arrangement position acquisition means for acquiring the arrangement positions of text extracted from each page of the first-format document file;
- determination means for determining page-common line information based on the text arrangement positions acquired across the plurality of pages, the line information being an output area for each line of text data in a common form specified by the ruled line information; and
- registration means for registering a second-format document file that uses three inputs: form information common to the plurality of pages, extracted from the ruled line information of the first-format document file; text information extracted from the text data and text attribute information of the first-format document file; and the line information.

According to the present invention, document files such as PDF files can be easily handled by a system that uses a different file format.

FIG. 1 is a schematic configuration diagram of an electronic form management system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the hardware configuration of the form server in FIG. 1.
FIG. 3 is a diagram showing a configuration example of a form file.
FIG. 4 is a diagram showing the result of converting a PDF file into a form file.
FIG. 5 is a diagram showing an example of a PDF form definition setting screen.
FIG. 6 is a flowchart showing the overall conversion process.
FIG. 7 is a diagram showing an example of PDF configuration information.
FIG. 8 is a diagram showing an example of data information.
FIG. 9 is a flowchart of the line information aggregation process.
FIG. 10 is a conceptual diagram outlining lines obtained by aggregating Y coordinates.
FIG. 11 is a conceptual diagram showing an example of a shared page definition.
FIG. 12 is a diagram outlining print lines.
FIG. 13 is a schematic diagram of the process of assigning data to print data objects.
FIG. 14 is a flowchart of the common page definition process.
FIG. 15 is a flowchart of the process of creating proprietary form data (a form result file).

Embodiments of the present invention will now be described in detail with reference to the drawings.
FIG. 1 is a diagram showing the schematic configuration of an electronic form system according to an embodiment of the present invention.

In FIG. 1, the form server 100 acts as an electronic form management apparatus. It stores and manages the data (hereinafter "form data" or a "form file") of multiple types of electronic forms (hereinafter simply "forms"). It also provides services such as browsing and searching form files to the client PCs 101 and 102 via the network 103. The form server 100 converts host data output from the host 104 into electronic forms and stores them. It also accepts PDF files and converts them into form data.

The client PCs 101 and 102 are information processing apparatuses, such as personal computers, used by users. They communicate with the form server 100 via the network 103. Through this connection they can send form file search requests to the form server 100 and browse the search results.

The network 103 is, for example, a LAN (Local Area Network) or the Internet. The devices connected to the network 103 are not limited to those illustrated; various devices may be connected depending on the application and purpose.
In other words, FIG. 1 shows a form system in which information processing apparatuses that display form data and a server that manages form data are communicably connected.

FIG. 2 is a block diagram showing an example of the hardware configuration of the form server 100 in FIG. 1.

In FIG. 2, the CPU (central processing unit) 201 centrally controls the devices connected to the system bus 204. It implements various functions by executing programs. The RAM 202 is the main memory of the CPU 201 and serves as a work area, a temporary save area, and the like.

The ROM 203 or the external memory 211 stores the operating system (OS), which is the control program of the CPU 201. It also stores a program 212 that implements the various functions of the form server 100. The CPU 201 loads these programs into the RAM 202 and executes them as needed.

The form files 213 and index files 214 used when the program 212 runs are stored in the external memory 211. The DB (database) 215 stores form data storage information, such as the associations among form IDs, form names, and storage directories.

The input controller 205 controls operation input from the input unit 209, which consists of, for example, a keyboard and a mouse. The display controller 206 controls the display on the display unit 210, which is, for example, a CRT or a liquid crystal display.

The external memory controller (MC) 207 controls access to the external memory 211. The external memory 211 stores a boot program, various applications, user files, and edit files. It also stores the tables and parameters used to implement the various functions of the server or each client PC. The external memory 211 is, for example, a hard disk (HD), a flexible disk (FD), or a magnetic tape drive.
The communication I/F controller 208 controls communication with external devices such as client PCs via the network 103.

The client PCs 101 and 102 have the same hardware configuration as the form server 100, so their description is omitted. The client PCs do not hold the form files 213 in advance. Instead, they receive from the form server 100 only the parts needed for display. Each client PC holds a client module for displaying the form files 213, which is a different program from the program 212 in the form server 100. On the client PCs, users can display form files 213 and specify search conditions for any form file 213.
When the form server 100 receives a PDF file, the CPU 201 analyzes the PDF file and generates a form file 213.

FIG. 3 is a diagram showing a configuration example of the form file 213.
The form file 213 (first document data) consists of four files:
- a form information file 3301, which defines the layout of ruled lines, figures, and the like in the form (the fourth file, storing form data to be combined with text data);
- a text information file 3302, which holds the characters contained in the form (the first file, storing text data);
- a character information file 3303, which defines the line pitch, character fonts, and the like in the form (the second file, storing attribute information of the text data); and
- a page information file 3304, which defines the arrangement of the form text data in the form (the third file, storing the drawing positions of the text data).

When the client PC 101 displays a form (form image), the form server 100 first generates display form data from the four information files shown. The client PC then downloads that form data and temporarily saves it to its hard disk. Finally, it reads the data into work memory and displays it on the screen as a form.

A form file 213 consists of one or more files and contains one or more pages. The form server 100 groups form files 213 of the same type, and each form file 213 is called a generation within its group. For example, forms of the same type with different dates, such as daily business reports, are managed as generations.

Next, FIG. 4 is used to give a schematic description of converting a PDF file into a form file 213. The PDF file is second document data that contains text data, attribute information of the text data, and ruled line information in a single file. The form file 213 is first document data consisting of the first, second, third, and fourth files.

When an input PDF file is registered in the electronic form system, the PDF file is analyzed to generate four files: a form information file, a text information file, a page information file, and a character information file.

FIG. 4 shows an example PDF file being converted into a form file. Its first line is 「あいうえお」, its second line is 「かきくけこ」, and its third line is 「さしすせそ」, with a ruled line (underline) beneath each string.

After conversion, the text information file stores the following: the coordinate origin, the number of lines, the ID of the page information file to use, the ID of the form information file to use, and the text data of each line.

The character information file stores the information for each character. This includes the font, height, width, pitch, weight (thickness), italics, and SKIP (the number of dots by which the first character is offset from the origin).

The page information file stores the information for each page. This includes the paper size, the number of line feeds, the number of records, and the page information file ID. It also includes the horizontal and vertical print start positions and the text start and end positions. Positions are expressed in dots.
The form information file stores the form information file ID, the coordinate origin, and an image file containing the ruled lines.
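As a rough illustration of the four files and the fields listed above, the following Python sketch models a converted form file with dataclasses. All class and attribute names are hypothetical and only mirror the fields in the description; the actual on-disk layout of these files is not disclosed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TextInfo:
    """Text information file: origin, referenced IDs, and text of each line."""
    origin: Tuple[int, int]
    page_info_id: str
    form_info_id: str
    lines: List[str] = field(default_factory=list)

    @property
    def line_count(self) -> int:
        # In this sketch the number of lines is derived from the stored lines.
        return len(self.lines)


@dataclass
class CharacterInfo:
    """Character information file: attributes of each character."""
    font: str
    height: int
    width: int
    pitch: int
    weight: int   # "thickness" in the description
    italic: bool
    skip: int     # dots the first character is shifted from the origin


@dataclass
class PageInfo:
    """Page information file; positions are in dots."""
    paper_size: str
    line_feed_count: int
    record_count: int
    page_info_id: str
    print_start_x: int
    print_start_y: int
    text_start: int
    text_end: int


@dataclass
class FormInfo:
    """Form information file: ID, origin, and the ruled-line image."""
    form_info_id: str
    origin: Tuple[int, int]
    image_file: str


@dataclass
class FormFile:
    text: TextInfo
    characters: List[CharacterInfo]
    page: PageInfo
    form: FormInfo

    def references_are_consistent(self) -> bool:
        # The text information names the page and form files it uses.
        return (self.text.page_info_id == self.page.page_info_id
                and self.text.form_info_id == self.form.form_info_id)
```

For the FIG. 4 example, `TextInfo.lines` would hold the three strings 「あいうえお」, 「かきくけこ」 and 「さしすせそ」, and the form information image would contain the three underlines.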

Next, the creation of a PDF form definition, which is a prerequisite for PDF registration, will be described with reference to FIG. 5.

The administrator accesses the form server 100 from a client PC, and the setting screen of FIG. 5 is displayed on that client PC. On this screen, the administrator configures detection of PDF files stored in a predetermined folder of the form server 100.

The form type is set to PDF, and the host transfer file name specifies the file name of the stored PDF file. This allows the system to obtain the required PDF file and convert it into a form file.

In addition, a host printer name used for converting the input form data is specified. These settings are saved in a storage unit such as a database. The PDF file is then obtained and converted based on them.

Next, the overall conversion process will be described with reference to the flowchart of FIG. 6. Each step is executed by the CPU 201 of the form server 100.
In step S01, the form server 100 receives a PDF file and stores it in the SPOOL folder.

In step S02, the form server 100, which monitors the SPOOL folder, acquires the PDF file stored there. This step is an example of an import process. The process imports second document data that contains text data, attribute information of the text data, and ruled line information in a single file.

In step S03, it is determined whether the PDF file matches a PDF form definition set in FIG. 5. If a matching PDF file exists, the process proceeds to step S05. If not, the process proceeds to step S04, where an error is recorded.

In step S05, it is determined whether the type of the PDF form definition is "PDF". If it is, the process proceeds to step S07. Otherwise, the process proceeds to step S06, where an error is recorded.
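Steps S03 to S06 amount to two checks on each spooled file before analysis. The sketch below makes an assumption about matching: a definition matches when its host transfer file name equals the PDF file name exactly. The description does not say how matching is actually performed. `PdfFormDefinition` and `check_spooled_file` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PdfFormDefinition:
    form_type: str                 # form type set on the FIG. 5 screen, e.g. "PDF"
    host_transfer_file_name: str   # file name of the stored PDF file
    host_printer_name: str


def check_spooled_file(file_name: str,
                       definitions: List[PdfFormDefinition],
                       errors: List[str]) -> Optional[PdfFormDefinition]:
    """Return the matching definition, or None after recording an error."""
    # S03: look for a PDF form definition matching this file (exact name match assumed).
    matched = next((d for d in definitions
                    if d.host_transfer_file_name == file_name), None)
    if matched is None:
        errors.append(f"S04: no PDF form definition matches {file_name}")
        return None
    # S05: the matching definition must be of type "PDF".
    if matched.form_type != "PDF":
        errors.append(f"S06: definition for {file_name} is not of type PDF")
        return None
    # Both checks passed; analysis (S07) can start.
    return matched
```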

In step S07, analysis of the PDF file begins. The analysis extracts ruled lines, identifies which pages share the same form, and generates the source information for the PDF configuration information. It also determines the coordinates of each character from the character strings in the PDF file and generates the source information for the data information.

In step S08, intermediate files are generated from the analysis results of the PDF file and saved in a temporary storage folder. The intermediate files are the data information (SVG format), the form information (SVG format), and the PDF configuration information (text format). SVG is an XML-based format.

One data information file is created for each page of the PDF file. It stores the page's text data and the drawing position of each character. An example of the data information is 801 in FIG. 8. Based on 801, the characters on each Y coordinate (each line) are rearranged in ascending order of X coordinate. This is an arrangement process that orders the characters of each line by the X coordinate of each character's position.
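The rearrangement of data information into lines can be sketched as follows. The sketch assumes each character arrives as a (character, X, Y) triple. It groups characters whose Y coordinates are exactly equal; real PDF text may need a tolerance when grouping, which the description does not discuss. `Glyph` and `build_line_list` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Glyph(NamedTuple):
    char: str
    x: float
    y: float


def build_line_list(glyphs: List[Glyph]) -> Dict[float, str]:
    """Group characters by Y coordinate, then order each line by X."""
    rows: Dict[float, List[Glyph]] = defaultdict(list)
    for glyph in glyphs:
        rows[glyph.y].append(glyph)
    # Lines in ascending Y order; characters in each line in ascending X order.
    return {
        y: "".join(g.char for g in sorted(row, key=lambda g: g.x))
        for y, row in sorted(rows.items())
    }
```

For example, characters of 「あいう」 received out of order on the same Y coordinate come back as one correctly ordered line string.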

One piece of form information is created for each distinct combination of ruled lines, images, and so on found across the pages of the PDF file. If all pages use the same form (layout), there is only one piece of form information. Ruled lines are obtained in one of two ways: from information in the PDF file that has ruled line attributes, or by using a known ruled line extraction technique. Form information is then generated from the extracted ruled line information. This is a ruled line extraction process that extracts ruled line information from the second document data.
Whether two pages use different forms is determined from hash values of the information extracted from each page, such as ruled lines and images.
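One way to realize the hash comparison is to canonicalize each page's ruled lines and hash them, so that pages with identical ruling share one piece of form information. This sketch makes two assumptions. It hashes only line segments given as (x1, y1, x2, y2) tuples, although the description also mentions images. It also uses SHA-256, although the description does not name a hash function.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Segment = Tuple[float, float, float, float]  # x1, y1, x2, y2 of one ruled line


def form_hash(segments: List[Segment]) -> str:
    """Hash a page's ruled lines independently of their drawing order."""
    canonical = json.dumps(sorted(segments))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assign_forms(pages: List[List[Segment]]) -> Tuple[List[int], int]:
    """Return the form number used by each page and the number of distinct forms."""
    form_ids: Dict[str, int] = {}
    page_to_form: List[int] = []
    for segments in pages:
        digest = form_hash(segments)
        if digest not in form_ids:
            form_ids[digest] = len(form_ids) + 1  # first time this ruling is seen
        page_to_form.append(form_ids[digest])
    return page_to_form, len(form_ids)
```

Sorting the segments before hashing keeps pages whose ruled lines were drawn in a different order from being treated as different forms.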

The PDF configuration information is a single file created for each PDF file. It defines how many pages the PDF file contains and which page information and form information each page uses. FIG. 7 shows an example of the PDF configuration information.

In step S09, line information is aggregated from the intermediate files; this is described later with reference to the flowchart of FIG. 9. This is a line identification process that identifies lines common to all pages from the arrangement positions of the text data on each page of the second document data.
In step S10, a common page definition is created from the line information aggregation result; this is described later with reference to the flowchart of FIG. 14.
In step S11, proprietary form data is created using the common page definition; this is described later with reference to the flowchart of FIG. 15.
In step S12, the process ends.

Next, the line information aggregation process will be described with reference to the flowchart of FIG. 9.
In step S13, the PDF configuration information is read. In step S14, it is held in internal memory.

In step S15, the form information on the header page of the PDF configuration information is checked. A file list is created for each form and held in internal memory. An example of the file list is 901, which associates each piece of form information with its data information.
In step S16, the created data file list is obtained, and processing begins form by form.
In step S17, the data information is read. In step S18, the read data is held in internal memory as a line list (802).
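The file list of step S15 (901) is essentially a mapping from each form to the data information files of the pages that use it. The sketch assumes the PDF configuration information has already been parsed into (page number, data file, form ID) triples. The actual format in FIG. 7 is not reproduced here, and the names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_file_list(pages: List[Tuple[int, str, str]]) -> Dict[str, List[str]]:
    """Map each form ID to its data files, in page order."""
    file_list: Dict[str, List[str]] = {}
    for _page_no, data_file, form_id in sorted(pages):
        file_list.setdefault(form_id, []).append(data_file)
    return file_list
```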

In step S19, one line of data is read from the line list. In step S20, the Y coordinate of that line is added, as the vertical (Y coordinate) end position, to the line information aggregation result held in the electronic form system's internal memory. If the same Y coordinate already exists, it is not added again. The Y coordinate minus the line height (character size) is added as the vertical (Y coordinate) start position. This is a line position determination process that determines the position of each line from the Y coordinate of the text data's arrangement position.
FIG. 10 is a conceptual diagram outlining the lines obtained by aggregating Y coordinates.
In step S21, the above is repeated for every entry in the line list, merging the Y coordinate data into the line aggregation result.
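Steps S19 to S21 can be sketched as below for one page's line list. The sketch reads step S20 as follows: each new Y coordinate becomes a vertical end position, and its start position is that Y minus the line height. A Y coordinate that is already present is skipped. When the same Y recurs with a different height, this sketch keeps the first height, which is an assumption. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def aggregate_rows(line_list: List[Tuple[float, float]],
                   result: Dict[float, float]) -> Dict[float, float]:
    """Merge one page's lines into `result`, which maps Y end -> Y start.

    line_list holds (y, line_height) for each line on the page.
    """
    for y, line_height in line_list:
        if y in result:
            continue  # same Y coordinate already aggregated: do not add again
        result[y] = y - line_height  # start position = end position - line height
    return result
```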

In step S22, once the repetition in step S21 has finished, the line information aggregation result for one piece of data information is complete; an example is 902. This is a characteristic step of the line identification process, which identifies lines common to all pages from the arrangement positions of the text data on each page of the second document data. The common lines are identified from the line positions (Y coordinates) on each page.

In step S23, the above is repeated for every data file. In step S24, a line information aggregation result covering all the data information for each form information file is created; an example is 903.
In step S25, the line information aggregation result (903) created for each form is sorted in ascending order of Y coordinate. An example of the sorted result is 904.
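Merging the per-page results for one form and sorting them (steps S23 to S25) might look like this. It assumes each per-page result maps the Y end position to the Y start position, as in the previous steps. The function name is illustrative.

```python
from typing import Dict, List, Tuple


def merge_and_sort(page_results: List[Dict[float, float]]) -> List[Tuple[float, float]]:
    """Combine per-page rows for one form; return (y_start, y_end) sorted by y_end."""
    merged: Dict[float, float] = {}
    for page in page_results:
        for y_end, y_start in page.items():
            merged.setdefault(y_end, y_start)  # keep the first start position seen
    return [(merged[y_end], y_end) for y_end in sorted(merged)]
```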

In step S26, the differences between consecutive sorted Y coordinates are computed; these differences become the line feed widths. The line feed width information is added to the line information aggregation result and held in internal memory. An example of the result with line feed width information added is 905. The value is "0" when the print line can be generated using the information of the previous line. Y1 to Y4 in 905 are the line information for the lines that contain text.
In step S27, the line information aggregation result created for each form is held in internal memory.
In step S28, the above is repeated for every form. This produces line information aggregation results for all forms and a list of them (the line information aggregation result list).
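The line feed width computation of step S26 can be sketched as a run-length encoding of the gaps between consecutive sorted Y coordinates. The sketch rests on an interpretation of the "0" entries in 905 and of steps S36 and S37. Under that reading, the first row of a run carries the number of lines in the run, and each later row that continues the same pitch carries a count of 0. The function name is hypothetical.

```python
from typing import List, Tuple


def line_feed_runs(sorted_ys: List[float]) -> List[Tuple[float, float, int]]:
    """Return (y, line_feed_width, run_count) for each row.

    line_feed_width is the gap to the previous row (0 for the first row).
    run_count is the number of consecutive rows sharing that width, stored
    on the first row of the run; later rows of the same run get 0.
    """
    widths = [0.0] + [b - a for a, b in zip(sorted_ys, sorted_ys[1:])]
    counts = [0] * len(widths)
    i = 0
    while i < len(widths):
        j = i
        # Extend the run while the next row has the same line feed width.
        while j + 1 < len(widths) and widths[j + 1] == widths[i]:
            j += 1
        counts[i] = j - i + 1
        i = j + 1
    return list(zip(sorted_ys, widths, counts))
```

For rows at Y = 100, 130, 160, 190, and 250, this yields one run of three rows with a 30-dot pitch starting at 130, followed by a single 60-dot gap.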

Next, the common page definition process will be described with reference to the flowchart of FIG. 14.

In step S29, creation of the common page definition begins, using the line information (Y coordinates) created for all pages. Processing starts from the line information aggregation result created for each form. For example, the aggregation result for form 1 (such as 905) is obtained from the line information aggregation result list. Repeating this for each form produces, per form, a common page definition shared by all of its pages.
In step S30, an object that stores the common page definition information is created in internal memory.
In step S31, a list of line information (Y coordinates) is obtained from the line information aggregation result.
In step S32, the information for the first line is obtained from that list.

In step S33, the print line (line information) value needed for the common page definition is set. The number of lines is obtained from the line information aggregation result. The horizontal (X coordinate) and vertical (Y coordinate) start positions of the first line are taken from the first line's information and stored as common page definition information. Because a print line is line information, its horizontal (X coordinate) start position is registered as "0".
FIG. 12 is a conceptual diagram outlining print lines.

In step S34, the field (text area) value needed for the common page definition is set, using the print line position. The horizontal (X coordinate) and vertical (Y coordinate) start and end positions of the first line are taken from the first line's information and stored as common page definition information. In the electronic form system, text can be drawn only within the area 48 dots inside the edges of the paper (the effective print area). The horizontal (X coordinate) start position is therefore registered as "48" and the end position as "paper size - 48" (in dots). The vertical (Y coordinate) start and end positions are registered using the values held in the first line's information.
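Steps S33 and S34 fix the print line and the initial field from the first line. The sketch assumes the rows are (y_start, y_end) pairs sorted by Y, as in 904 and 905, and that the paper width is given in dots. The 48-dot margin comes from the description. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

MARGIN_DOTS = 48  # text is drawable only 48 dots inside the paper edges


@dataclass
class PrintLine:
    line_count: int
    start_x: int      # always 0, since a print line is line information
    start_y: float


@dataclass
class TextField:
    x_start: int
    x_end: int
    y_start: float
    y_end: float


def initial_definition(rows: List[Tuple[float, float]],
                       paper_width_dots: int) -> Tuple[PrintLine, TextField]:
    """Build the print line (S33) and the first-line field (S34)."""
    first_start, first_end = rows[0]
    print_line = PrintLine(line_count=len(rows), start_x=0, start_y=first_start)
    text_field = TextField(x_start=MARGIN_DOTS,
                           x_end=paper_width_dots - MARGIN_DOTS,
                           y_start=first_start,
                           y_end=first_end)
    return print_line, text_field
```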

The field values may also be registered using the line information aggregation result 905, since the Y coordinates can be determined from it. The field values are stored in the page information file.
In step S35, the line feed width information is obtained from the first line's information. It serves as the line feed width (line difference information) value needed for the common page definition.
In step S36, it is determined whether the number of lines set for the line feed width is "0".

In step S37, if the number is not "0" (that is, a line feed width exists), the line feed width information is stored in the common page definition. This information consists of the line feed width value and the number of lines to which it applies. If the number is "0", the next line information is obtained.
In step S38, it is determined whether the line feed width information belongs to the first line.

In step S39, if it does not belong to the first line, the common page definition is updated so that the height of the field stored in it is extended by the line feed width value. If it does belong to the first line, the field has already been created from the first-line values, so nothing further is done and the next line information is processed. By repeating this for every entry of line feed width information, the line feed width information held in the line information aggregation result is turned into a common page definition.
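Steps S35 to S39 can be sketched as a single loop over the line feed width entries. This is a minimal sketch under stated assumptions, not the patented implementation: each entry of the aggregation result is taken to be a hypothetical `(width, line_count)` pair in dots, with the first entry belonging to the first line, and the field is extended by the entry's width exactly as the text says (the description does not state whether the width is multiplied by the line count). `LineFeed`, `CommonPageDefinition`, and `add_line_feeds` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class LineFeed:
    width: int       # line feed width (Y difference to the next line), in dots
    line_count: int  # number of lines this width applies to


@dataclass
class CommonPageDefinition:
    field_y_start: int  # set from the first line (S34)
    field_y_end: int    # set from the first line (S34), extended in S39
    line_feeds: List[LineFeed] = field(default_factory=list)


def add_line_feeds(defn: CommonPageDefinition,
                   entries: Sequence[Tuple[int, int]]) -> CommonPageDefinition:
    """Apply S35-S39 to (width, line_count) entries, first line first."""
    for index, (width, line_count) in enumerate(entries):
        if line_count == 0:      # S36: no lines set -> next line information
            continue
        defn.line_feeds.append(LineFeed(width, line_count))  # S37
        if index == 0:           # S38: field already built from the first line
            continue
        defn.field_y_end += width  # S39: extend the field by the line feed width
    return defn
```

Entries with a zero line count are skipped without being stored, mirroring the "0" branch of step S36.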

In step S40, once the line information aggregation results for all lines and all forms have been converted into common page definitions, creation of the common page definitions ends. FIG. 11 is a conceptual diagram of the common page definition; as it shows, line information common to all pages of the same form can be identified.
In step S41, a list is created that links each form to the common page definition created for it.

Next, the process of creating proprietary form data (a form result file) using the common page definition is described with reference to the flowchart of FIG. 15.

In step S42, a text data conversion converter for creating text information and character information is created in internal memory. This converter analyzes the data information created as intermediate data and converts it into text information and character information using the common page definition.
In step S43, the data files are acquired in page order and read as a line list.

In step S44, the read data is held in internal memory. The information acquired for the line information aggregation result has already been discarded from memory, so it is read again here. Reusing the data already read would be faster, but depending on the number of pages it could put pressure on memory, so the data is read only when it is needed.
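The read-on-demand policy of step S44 maps naturally onto a generator. In this sketch, `load_lines` is a hypothetical callable standing in for reading one page's data file, and the tab-separated `<y>\t<text>` row layout is an assumption made only for illustration; the actual data file format is not described.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (Y coordinate in dots, text of one data line)


def iter_page_rows(
    page_count: int,
    load_lines: Callable[[int], Iterable[str]],
) -> Iterator[Tuple[int, List[Row]]]:
    """Yield (page number, line list) one page at a time, in page order.

    load_lines(page_no) is called only when that page is requested, so
    only the current page's rows need to be resident in memory.
    """
    for page_no in range(1, page_count + 1):
        rows: List[Row] = []
        for raw in load_lines(page_no):
            # Hypothetical layout: "<y>\t<text>" per data line.
            y, _, text = raw.rstrip("\n").partition("\t")
            rows.append((int(y), text))
        yield page_no, rows
```

Because nothing is loaded until the caller advances the generator, this trades the speed of reusing already-read data for a bounded memory footprint, which is the trade-off the description makes.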

In step S45, the common page definition to be used for the data file (the common page definition generated in FIG. 14) is identified and acquired from the PDF configuration information and the per-form common page definition storage information. The subsequent steps are performed by the text data conversion converter, which is passed the line list of the read data, the page number, and the common page definition.
In step S46, a text object (text information) in the proprietary format is created in memory.
In step S47, a map pattern (character information) in the proprietary format is created in memory.
In step S48, the print line and line feed width information is acquired from the common page definition.
In step S49, as many print line data objects as the number of lines set in the print line are created in internal memory.
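The lookup in step S45 and the slot creation in step S49 can be sketched as below. The sketch assumes the PDF configuration information is available as a hypothetical mapping from page number to form identifier, and that the per-form list built in step S41 is a mapping from form identifier to common page definition. `CommonPageDef`, `definition_for_page`, and `new_print_lines` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CommonPageDef:
    first_y: int     # Y coordinate of the first print line, in dots
    line_count: int  # number of lines set in the print line


def definition_for_page(page_no: int, page_forms: Dict[int, str],
                        definitions_by_form: Dict[str, CommonPageDef]) -> CommonPageDef:
    """S45: page number -> form identifier -> common page definition."""
    try:
        form_id = page_forms[page_no]
    except KeyError:
        raise KeyError(f"page {page_no} has no form in the configuration") from None
    return definitions_by_form[form_id]


def new_print_lines(definition: CommonPageDef) -> List[Optional[str]]:
    """S49: one empty print line data slot per line set in the print line."""
    return [None] * definition.line_count
```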

In step S50, the print line data objects serve to hold the per-line text of the data line list; the text information of the proprietary form data (form result file) is created from them.
In step S51, one line of data is acquired from the data line list.
In step S52, it is determined whether the Y coordinate of the print line in the common page definition matches the Y coordinate of that line of data.

In step S53, if they match, the line of text is assigned to the print line data object; if not, NULL is assigned to indicate that there is no data.

In step S54, the Y coordinate of the print line in the common page definition is shifted by the line feed width, and the processing is repeated until the data line list has been assigned to all print line data objects.
FIG. 13 is a schematic diagram of the process of assigning data to the print line data objects.

In step S55, once every line of the data line list has been assigned, any remaining print line data objects are assigned NULL.
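Steps S51 to S55 can be sketched as follows. The sketch assumes the line feed width information is a list of hypothetical `(width, line_count)` pairs, each width applying to that many consecutive print lines, and that a data line matches a print line only when the Y coordinates are exactly equal, as step S52 states. A data line whose Y coordinate falls on no print line is simply left unassigned here; `None` plays the role of NULL.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[int, str]  # (Y coordinate in dots, text of one data line)


def print_line_ys(first_y: int, line_feeds: Sequence[Tuple[int, int]]) -> List[int]:
    """Y of every print line: start at first_y, shift by the line feed width (S54)."""
    ys: List[int] = []
    y = first_y
    for width, line_count in line_feeds:
        for _ in range(line_count):
            ys.append(y)
            y += width
    return ys


def assign_rows(ys: Sequence[int], rows: Sequence[Row]) -> List[Optional[str]]:
    """Assign each data line to the print line with the same Y (S52/S53).

    Print lines with no matching data line, including those left over after
    the data line list is exhausted, get None (NULL) as in S53/S55.
    """
    text_by_y: Dict[int, str] = {y: text for y, text in rows}
    return [text_by_y.get(y) for y in ys]
```

For example, with a first print line at Y = 120 and entries `[(30, 2), (40, 2)]`, the print lines fall at 120, 150, 180, and 220, and data lines at 120, 180, and 220 carrying "A", "C", and "D" yield `["A", None, "C", "D"]`.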

In step S56, to make the data assigned to the print line data objects easy to convert into proprietary form data, a storage list is created for building a one-page text list.

In step S57, because this list stores a per-line text list, a text object and a map pattern of the proprietary form data are created for each text unit (each character) of the line assigned to the print line data. A map pattern holds, for each character, information such as the number of bytes, height, width, font, pitch (spacing to the next character), italic, and bold. The font, italic, bold, and similar attributes are taken from the character attributes held in the PDF file.
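A map pattern entry as described in step S57 can be modelled as a small record per character. In this sketch the byte count comes from the Shift_JIS encoding, which is an assumption since the description does not name the encoding, and font, size, pitch, italic, and bold are passed in uniformly for the whole line, whereas the description takes them per character from the PDF attributes. `MapPatternEntry` and `map_patterns_for_line` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MapPatternEntry:
    """Per-character information held by a map pattern (step S57)."""
    char: str
    byte_count: int  # bytes the character occupies in the assumed encoding
    height: int      # dots
    width: int       # dots
    font: str
    pitch: int       # spacing to the next character, in dots
    italic: bool
    bold: bool


def map_patterns_for_line(
    text: Optional[str],
    font: str,
    height: int,
    width: int,
    pitch: int,
    italic: bool = False,
    bold: bool = False,
    encoding: str = "shift_jis",  # assumption: encoding not specified
) -> List[MapPatternEntry]:
    """Build one map pattern entry per character of a print line."""
    if text is None:  # print line assigned NULL: nothing to map
        return []
    return [
        MapPatternEntry(ch, len(ch.encode(encoding)), height, width,
                        font, pitch, italic, bold)
        for ch in text
    ]
```

A character that cannot be represented in the chosen encoding raises `UnicodeEncodeError`, so a real converter would need a policy for such characters.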

In step S58, the created one-line text object is added to the text list, and this is repeated for as many lines as were assigned to the print line data, producing a one-page text list. In other words, this is the text data identification process, which identifies the text data contained in each identified line.
In step S59, the one-page text list is converted into one page of text information so that it can be converted in a batch into the text information of the proprietary form data.
In step S60, the map patterns in memory are likewise converted into a one-page map pattern.

In step S61, the one page of text information is written as a text information file (REP file), producing a text information file that stores that page's information.

In step S62, the one-page map pattern is written as a character information file (MAP file), producing a character information file that stores that page's information. Repeating the processing up to this point produces the text information files and character information files for all pages.

In step S63, the created common page definition information is written out as page information files (PAG files), one for each common page definition created. Here the field values are stored, defining the area of each line on each page. These field values are used for text search in the proprietary file format.

In the present embodiment, the X coordinate of the field's end position is "paper size - 48" (dots), as described for step S34, but it can also be defined as the coordinate at which the text of each line ends. In that case no unnecessary area is defined, and the resulting proprietary-format file is more appropriate.

In step S64, the form information created as intermediate data (the form information generated in S08) is converted into the layout of a form information file (FRM file) and written to that file.

Steps S61 to S64 constitute the file generation process, which generates: a first file containing the text data identified in the text data identification process; a second file in which attribute information obtained from the second document data is assigned to each piece of text data; a third file storing the positions, determined according to the lines identified in the line identification process, at which the text data is drawn; and a fourth file storing form data that contains the ruled line information extracted by the ruled line extraction process.
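The four-file split of steps S61 to S64 can be illustrated by assembling the outputs in memory before writing them. The `page0001.REP`-style file names and the JSON serialization are purely hypothetical placeholders; the actual REP, MAP, PAG, and FRM layouts are proprietary and are not described in this document.

```python
import json
from typing import Any, Dict, List, Optional, Sequence, Tuple


def build_output_files(
    pages: Sequence[Tuple[List[Optional[str]], List[Any]]],
    page_definitions: Sequence[Dict[str, Any]],
    forms: Dict[str, Dict[str, Any]],
) -> Dict[str, str]:
    """Return {file name: contents} for the REP, MAP, PAG, and FRM outputs.

    pages holds (one-page text information, one-page map pattern) per page.
    """
    files: Dict[str, str] = {}
    for page_no, (text_lines, map_pattern) in enumerate(pages, start=1):
        # S61: one text information file per page
        files[f"page{page_no:04d}.REP"] = json.dumps(text_lines, ensure_ascii=False)
        # S62: one character information file per page
        files[f"page{page_no:04d}.MAP"] = json.dumps(map_pattern, ensure_ascii=False)
    # S63: one page information file per common page definition
    for def_no, definition in enumerate(page_definitions, start=1):
        files[f"def{def_no:04d}.PAG"] = json.dumps(definition)
    # S64: form information (ruled lines and images) per form
    for form_id, form in forms.items():
        files[f"{form_id}.FRM"] = json.dumps(form)
    return files
```

Writing the result to disk is then a loop over `files.items()`; keeping generation separate from I/O makes the split easy to inspect.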

In step S65, form identification information is attached to the proprietary form data (form result file) created using the common page definition, completing registration in the electronic form system. That is, this is the registration process, which registers the first, second, third, and fourth files generated by the file generation process as a first document.
When registration is complete, the PDF file is deleted. The registered proprietary form data is then used to view the form in the browser of a client terminal.

This concludes the description of the present embodiment. In this embodiment, when the form file 213 is displayed, ACTIVE-X is downloaded from the form server 100 to the client PC, and the form file 213 is displayed by the browser on the client PC. Various display controls are likewise implemented with ACTIVE-X, but the invention is not limited to this. For example, the system may be built as a client-server system without a browser; in that case, a dedicated program capable of displaying and searching the form file 213 can be installed on the client PC in advance.

As described above, according to the present embodiment, document data such as a PDF file, which contains text data, text attribute information, and ruled line information in a single file, can be converted into files that the electronic form system can handle easily.

In particular, because PDF files can be handled as electronic forms in the electronic form system, users can carry out their work more efficiently.

The structure and content of the various data described above are not limited to these examples; they may take various structures and contents depending on the application and purpose.

Although one embodiment has been described above, the present invention can be embodied as, for example, a system, an apparatus, a method, a program, or a recording medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The program according to the present invention is a program that causes a computer to execute the processing methods of the flowcharts shown in FIGS. 6, 9, 14, and 15, and the storage medium according to the present invention stores a program that causes a computer to execute the processing methods of FIGS. 6, 9, 14, and 15. The program may also be provided as a separate program for the processing method of each device in FIGS. 6, 9, 14, and 15.

As described above, the object of the present invention can also be achieved by supplying a recording medium on which a program implementing the functions of the above embodiment is recorded to a system or apparatus, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program stored on the recording medium.

In this case, the program read from the recording medium itself implements the novel functions of the present invention, and the recording medium storing the program constitutes the present invention.

Examples of recording media for supplying the program include a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, non-volatile memory card, ROM, EEPROM, silicon disk, and solid state drive.

This also covers the case where the functions of the above embodiment are implemented not only by the computer executing the program it has read, but also by an OS (operating system) or the like running on the computer performing part or all of the actual processing based on the program's instructions.

It also covers the case where the program read from the recording medium is written to memory on a function expansion board inserted into the computer or on a function expansion unit connected to the computer, after which a CPU or the like on that board or unit performs part or all of the actual processing based on the program code's instructions, and that processing implements the functions of the above embodiment.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. It can also be applied where the invention is achieved by supplying a program to a system or apparatus; in that case, reading a recording medium storing the program for achieving the present invention into the system or apparatus allows that system or apparatus to obtain the effects of the present invention.

Furthermore, the system or apparatus can obtain the effects of the present invention by downloading the program for achieving the present invention from a server, database, or the like on a network using a communication program and reading it.
All configurations that combine the embodiments described above and their modifications are also included in the present invention.

100 Form server
101 Client PC
102 Client PC
103 Network
201 CPU
202 RAM
203 ROM
211 External memory

Claims

1. An information processing apparatus that stores a document file of a second format, composed of text information, form information, and page information, generated using a document file of a first format that includes text data, text attribute information, and ruled line information, the apparatus comprising:
layout position acquisition means for acquiring, for a document file of the first format having a plurality of pages, the layout positions of text extracted from each page of the document file;
determination means for determining, based on the text layout positions on the plurality of pages acquired by the layout position acquisition means, line information common to the pages, which is the output area for each line of text data in the common form specified by the ruled line information; and
registration means for registering a document file of the second format using form information common to the plurality of pages, extracted from the ruled line information of the document file of the first format, text data extracted from the text data and text attribute information of the document file of the first format, and the line information.

2. The information processing apparatus according to claim 1, wherein the text information of the document file of the second format consists of a text file used when outputting the document file of the second format and an attribute information file for the text.

3. The information processing apparatus according to claim 1, wherein the document file of the first format is a document file using a plurality of forms,
the determination means determines line information common to the pages for each of the forms, and
the apparatus further comprises management means for managing the line information and the forms in association with each other.

4. The information processing apparatus according to any one of claims 1 to 3, further comprising generation means for generating, in order to generate a document file of the second format using the acquired document file of the first format: a configuration file that indicates the configuration of the document file of the first format and includes the form identification information of each page; a text-related file that includes the text of each page and the layout positions of the text; and a form file that includes the form information used in the document file of the first format.

5. A document management system that stores a document file of a second format, composed of text information, form information, and page information, generated using a document file of a first format that includes text data, text attribute information, and ruled line information, the system comprising:
layout position acquisition means for acquiring, for a document file of the first format having a plurality of pages, the layout positions of text extracted from each page of the document file;
determination means for determining, based on the text layout positions on the plurality of pages acquired by the layout position acquisition means, line information common to the pages, which is the output area for each line of text data in the common form specified by the ruled line information; and
registration means for registering a document file of the second format using form information common to the plurality of pages, extracted from the ruled line information of the document file of the first format, text data extracted from the text data and text attribute information of the document file of the first format, and the line information.

6. A processing method of an information processing apparatus that stores a document file of a second format, composed of text information, form information, and page information, generated using a document file of a first format that includes text data, text attribute information, and ruled line information,
the method comprising the following steps performed by the information processing apparatus:
a layout position acquisition step of acquiring, for a document file of the first format having a plurality of pages, the layout positions of text extracted from each page of the document file;
a determination step of determining, based on the text layout positions on the plurality of pages acquired in the layout position acquisition step, line information common to the pages, which is the output area for each line of text data in the common form specified by the ruled line information; and
a registration step of registering a document file of the second format using form information common to the plurality of pages, extracted from the ruled line information of the document file of the first format, text data extracted from the text data and text attribute information of the document file of the first format, and the line information.

7. A processing method of a document management system that stores a document file of a second format, composed of text information, form information, and page information, generated using a document file of a first format that includes text data, text attribute information, and ruled line information,
the method comprising the following steps performed by the document management system:
a layout position acquisition step of acquiring, for a document file of the first format having a plurality of pages, the layout positions of text extracted from each page of the document file;
a determination step of determining, based on the text layout positions on the plurality of pages acquired in the layout position acquisition step, line information common to the pages, which is the output area for each line of text data in the common form specified by the ruled line information; and
a registration step of registering a document file of the second format using form information common to the plurality of pages, extracted from the ruled line information of the document file of the first format, text data extracted from the text data and text attribute information of the document file of the first format, and the line information.

8. A program for an information processing apparatus that stores a document file of a second format, composed of text information, form information, and page information, generated using a document file of a first format that includes text data, text attribute information, and ruled line information,
the program causing the information processing apparatus to function as:
layout position acquisition means for acquiring, for a document file of the first format having a plurality of pages, the layout positions of text extracted from each page of the document file;
determination means for determining, based on the text layout positions on the plurality of pages acquired by the layout position acquisition means, line information common to the pages, which is the output area for each line of text data in the common form specified by the ruled line information; and
registration means for registering a document file of the second format using form information common to the plurality of pages, extracted from the ruled line information of the document file of the first format, text data extracted from the text data and text attribute information of the document file of the first format, and the line information.

9. A program for a document management system that stores a document file of a second format, composed of text information, form information, and page information, generated using a document file of a first format that includes text data, text attribute information, and ruled line information,
the program causing the document management system to function as:
layout position acquisition means for acquiring, for a document file of the first format having a plurality of pages, the layout positions of text extracted from each page of the document file;
determination means for determining, based on the text layout positions on the plurality of pages acquired by the layout position acquisition means, line information common to the pages, which is the output area for each line of text data in the common form specified by the ruled line information; and
registration means for registering a document file of the second format using form information common to the plurality of pages, extracted from the ruled line information of the document file of the first format, text data extracted from the text data and text attribute information of the document file of the first format, and the line information.