JP2006202026A - Information processor and control method

Info

Publication number: JP2006202026A
Application number: JP2005012789A
Authority: JP
Inventors: Naohiro Yamaguchi; 直宏山口
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-01-20
Filing date: 2005-01-20
Publication date: 2006-08-03

Abstract

PROBLEM TO BE SOLVED: To provide an information processor and a control method that reduce the size of an electronic binder composed of a plurality of electronic documents in which duplicated font data are embedded, given that font data is larger than text data and an electronic document (PDF, etc.) with embedded fonts therefore becomes extremely large.

SOLUTION: The information processor is provided with an archive means that, when archiving a plurality of electronic documents 301 in which fonts can be embedded, automatically detects font data 306 common to the plurality of electronic documents and then consolidates the duplicated font data 306 to generate an electronic document archive file.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an information processing apparatus, a method therefor, and a computer-readable memory for constructing a compressed electronic archive composed of a plurality of types of electronic documents.

Companies are introducing document management systems to promote the reuse of documents. Early products captured paper documents as images with a scanner and registered and stored them. More recently, as electronic documents created on personal computers have become common, these too can be registered and stored.

It has also recently become possible to extract arbitrary pages from a plurality of registered electronic documents, compress and archive them, and bind them like a binder so that they behave as a single electronic document. This is referred to here as an electronic binder. A bound electronic binder can be opened to edit the files inside it or to change which files it contains.

The electronic binder can contain data in various formats as its material, and Adobe's PDF (Portable Document Format) can be incorporated as one such electronic document format.

Using a PDF file as print data makes it possible to print with almost the same layout as in the application that created it.

A PDF file holds text data and font information, but because different OS environments provide different fonts and character encoding schemes, it may not be possible to reproduce the same display or printed output as in the creating application.

To solve this, each PDF creator often embeds part of the font set corresponding to the text data, as a subset, in the PDF files contained in the electronic binder.

When fonts are embedded in a PDF file, the embedded fonts are used for display and printing instead of the fonts provided by the OS, so the same display and printed output as the original document can be obtained even under a different OS environment.

This makes it possible to display PDF files created in different languages and to prevent special characters that exist only in a particular character set from being displayed as other characters.

The electronic binder holds the PDF and other data in a compressed, archived state in order to save storage capacity and increase the confidentiality of its contents.

JP 2001-351089 A; JP 2002-091957 A

However, conventional electronic binders have the following problems.

When an electronic binder contains a plurality of PDF files with embedded fonts, the electronic binder as a whole contains a plurality of embedded font subsets.

Because font data is far larger than text data, a PDF with embedded fonts is far larger than a PDF that holds only ordinary text data and font information.

Furthermore, when the text in a plurality of PDFs overlaps, the embedded font subsets of those PDFs also overlap.

As a result, the PDF electronic documents bound in the electronic binder are compressed inefficiently.

The present invention has been made to solve the above problem. When electronic documents in which fonts can be embedded are archived, font data duplicated across the electronic documents is automatically detected, and the duplicated font data is then consolidated before archiving.

For example, the information processing apparatus of the present invention is an information processing apparatus that creates an electronic document archive file from a plurality of electronic document files in which font data can be embedded, and is characterized by having archive means that, when archiving the plurality of electronic document files, extracts font data common to the plurality of electronic document files, consolidates the common font data, and creates the electronic document archive file.

According to the present invention, when a group of electronic documents (PDF, etc.) with embedded font data is bound into an electronic binder, the documents are not archived as they are. Instead, the font data they use in common is consolidated before archiving, so an electronic binder composed of electronic documents containing font data becomes smaller.

(Example 1)
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a conceptual diagram of the electronic binder of the present embodiment and the electronic documents bound in it.

An electronic binder 102 is created by archiving a group 101 of electronic documents (PDF documents in this embodiment) with embedded font data. Various compression methods are available. The compression in this embodiment is assumed to use a lossless compression algorithm, with no particular algorithm specified.

In FIG. 1, the file header 103 holds information about the electronic binder itself, such as its creation date and time.

The compression information table 104 holds information about the compressed electronic documents 105 contained in the electronic binder 102. The table holds the address of each compressed electronic document 105, and a compressed electronic document 105 is accessed by referring to this address.

To make a compressed electronic document 105 bound in the electronic binder 102 available for processing by an application, the compressed electronic document 105 is extracted from the electronic binder 102. It is then decompressed with the decompression algorithm corresponding to the compression format used at compression time, producing the electronic document 106.
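
The layout described above (file header, compression information table holding addresses, compressed documents) can be sketched roughly as follows. This is only an illustrative sketch: the byte layout, the JSON header, the use of `zlib` as the lossless algorithm, and the names `build_binder`, `read_header`, and `extract_document` are all assumptions made for the example and are not taken from the patent.

```python
import json
import struct
import zlib


def build_binder(documents, created):
    """Pack documents as: 4-byte header length, JSON header, compressed blobs.

    The JSON header stands in for file header 103 and compression
    information table 104 (hypothetical layout, not from the patent).
    """
    table = {}
    blobs = []
    offset = 0
    for name, data in documents.items():
        blob = zlib.compress(data)  # zlib is an assumed lossless algorithm
        table[name] = {"address": offset, "size": len(blob)}
        blobs.append(blob)
        offset += len(blob)
    header = json.dumps({"created": created, "table": table}).encode("utf-8")
    return struct.pack(">I", len(header)) + header + b"".join(blobs)


def read_header(binder):
    """Return the decoded header without decompressing any document."""
    (header_len,) = struct.unpack(">I", binder[:4])
    return json.loads(binder[4:4 + header_len].decode("utf-8"))


def extract_document(binder, name):
    """Look up the address in the table, then decompress that document only."""
    (header_len,) = struct.unpack(">I", binder[:4])
    entry = read_header(binder)["table"][name]
    start = 4 + header_len + entry["address"]
    return zlib.decompress(binder[start:start + entry["size"]])
```

Because each table entry records an address and a size, one document can be extracted and decompressed without touching the others, which is the access pattern the text describes.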

FIG. 2 is a flowchart showing the process by which, when a group of PDF electronic documents with embedded font data is compressed and archived, common font data and differential font data are extracted from each PDF electronic document and the font data is consolidated before archiving.

In step 201, the electronic document number i is initialized to 0, and the determination starts from the first electronic document.

In step 202, the font data embedded in PDF file i is searched to check whether any of it has already been registered in the database by step 203 or step 204. If a registered font is embedded in PDF file i, the process proceeds to step 203. If none of the embedded fonts was embedded in any PDF file up to i-1, the process proceeds to step 204.

In step 203, of the font data embedded in electronic document i, the font information for font data not yet registered as common font data is registered in the database. Any other font data not registered as common font data is extracted as differential font data of PDF file i.

The font data of PDF file i is replaced with the font data information in the database to create a font information file for PDF file i. The font data is then deleted from PDF file i.

In step 204, all the font data of PDF file i is new, so it is extracted as differential font data and registered in the database as new font data. The font data of PDF file i is replaced with the font data information in the database to create a font information file for PDF file i. The font data is then deleted from PDF file i.

In step 205, i is incremented.

In step 206, whether all electronic documents have been examined is determined by comparing the electronic document number i currently being processed with the total number of files, filenum. If i is smaller than filenum, the process returns to step 202 to examine the next electronic document. When i is equal to or greater than filenum and all electronic documents have been examined, the process proceeds to step 207.

In step 207, all the electronic documents and font information files are archived to create the electronic binder.
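
The loop of steps 201 to 206 can be sketched as follows. This is a minimal sketch under several assumptions that do not come from the patent: each document is modeled as a `(name, body, fonts)` tuple, font data is matched by a SHA-256 digest of its bytes (exact matches only, whereas the text also allows "replaceable" font data), and the function name `consolidate_fonts` is hypothetical.

```python
import hashlib


def consolidate_fonts(documents):
    """documents: list of (name, body_bytes, {font_label: font_bytes}) tuples."""
    database = {}   # digest -> font data registered so far
    owner = {}      # digest -> index of the document that first registered it
    common = set()  # digests found in more than one document
    stripped = []   # documents with embedded font data replaced by references

    i = 0                                    # step 201
    filenum = len(documents)
    while i < filenum:                       # step 206: continue while i < filenum
        name, body, fonts = documents[i]
        font_info = {}                       # font information file for document i
        differential = []                    # font data first seen in document i
        for label, data in fonts.items():    # step 202: check each embedded font
            digest = hashlib.sha256(data).hexdigest()
            if digest in database:
                if owner[digest] != i:       # step 203: registered by an earlier document
                    common.add(digest)
            else:                            # steps 203/204: new font data
                database[digest] = data
                owner[digest] = i
                differential.append(digest)
            font_info[label] = digest        # reference replaces the embedded data
        stripped.append({"name": name, "body": body,
                         "font_info": font_info, "differential": differential})
        i += 1                               # step 205
    return database, common, stripped        # inputs to the archiving of step 207
```

The returned `stripped` entries carry only references to font data, so a step 207 style archiver would store each entry of `database` once, regardless of how many documents refer to it.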

FIG. 3 is a conceptual diagram showing the process, according to the method of the present invention, of separating font data from other information in the PDF electronic documents when a group of PDF electronic documents is archived, and archiving the font data collectively.

The electronic document 301 is a PDF electronic document. It contains data 302 other than fonts, such as text, operators, and images, as well as font data 303 and 304. Font data 303 is font data shared with other PDF electronic documents. Font data 304 is differential font data specific to each PDF electronic document and not contained in the other PDF electronic documents.

The data 302 other than fonts is extracted from each PDF electronic document and collected as document data 305.

The common font data 303 and the differential font data 304 are also extracted and collected as font data 306.

The document data, common font data, and differential font data extracted from these PDF electronic documents are archived to create the electronic binder 307. At archiving time, information about each electronic document is recorded in the compression information table of the electronic binder 307.
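
The data flow of FIG. 3 can be illustrated with a short sketch that separates non-font data from font data, stores each distinct piece of font data once, and writes everything into one archive. The ZIP container (whose central directory plays a role loosely similar to the compression information table), the entry names, the digest-based matching, and the function name `split_and_archive` are assumptions chosen for illustration, not details from the patent.

```python
import hashlib
import io
import json
import zipfile


def split_and_archive(documents):
    """documents: name -> (non_font_data, {font_label: font_bytes})."""
    font_data = {}  # consolidated font data (cf. 306): digest -> bytes
    font_info = {}  # per-document references into font_data
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as binder:
        for name, (body, fonts) in documents.items():
            # Non-font data goes in as document data (cf. 305).
            binder.writestr(f"documents/{name}", body)
            refs = {}
            for label, data in fonts.items():
                digest = hashlib.sha256(data).hexdigest()
                font_data.setdefault(digest, data)  # keep one copy per digest
                refs[label] = digest
            font_info[name] = refs
        for digest, data in font_data.items():
            binder.writestr(f"fonts/{digest}", data)
        binder.writestr("font_info.json", json.dumps(font_info, sort_keys=True))
    return buffer.getvalue()
```

With two documents that embed the same font bytes, the archive contains a single `fonts/` entry for them, and `font_info.json` records which document refers to which entry.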

FIG. 4 shows an example of the process of extracting common font data and differential font data from the font data contained in PDF electronic documents.

Assume that the PDF electronic document 401 contains font data as shown in the figure. Because the PDF electronic document 401 is first in the processing order, all the font data in it is registered as new font data. The circled number assigned to each character represents its registration number.

Next, the font data contained in the PDF electronic document 402 is matched against the registered font data. Where font data identical to, or replaceable by, previously registered font data exists, that font data is registered as common font data. The filled circled number assigned to each character indicates the registration number of the common font data. Font data that does not match the common font data is registered as new font data and is also registered as differential font data.

After all the target PDF electronic documents have been processed, the common font data, the differential font data of each document, and the font information of each document are combined into the PDF integrated font information 403 of the electronic binder.
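
One way to picture the integrated font information, and how a document's font set might be rebuilt from it, is sketched below. The sketch assumes glyphs are identified by a `(font name, character)` key and that equal keys mean interchangeable glyph data. It also assumes that a glyph found again in a later document is moved from the first document's differential data into the common data, which is one reading of FIG. 4 rather than something the text states. The names `build_integrated_font_info` and `restore_fonts` are hypothetical.

```python
def build_integrated_font_info(documents):
    """documents: name -> {(font_name, char): glyph_bytes}, in processing order."""
    first_owner = {}  # glyph key -> name of the document that registered it
    common = {}       # glyph data shared by two or more documents
    differential = {name: {} for name in documents}
    font_info = {}    # per-document list of the glyph keys it uses
    for name, glyphs in documents.items():
        font_info[name] = sorted(glyphs)
        for key, data in glyphs.items():
            if key in first_owner:
                # Matches previously registered data: treat it as common
                # and drop it from the first document's differential data.
                common[key] = differential[first_owner[key]].pop(key, data)
            else:
                first_owner[key] = name
                differential[name][key] = data
    return {"common": common, "differential": differential, "font_info": font_info}


def restore_fonts(info, name):
    """Rebuild one document's glyph set from common plus its own differential data."""
    pool = dict(info["common"])
    pool.update(info["differential"][name])
    return {key: pool[key] for key in info["font_info"][name]}
```

For example, with document "401" holding glyphs A and B and document "402" holding B and C, B ends up as common data, A and C as differential data, and `restore_fonts` returns each document's original glyph set.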

An information processing apparatus applicable to the document management system of the embodiment described above is shown in FIG. 5, which is a block diagram of its configuration.

In FIG. 5, the CPU 502 controls the entire information processing apparatus 501 via the main bus 507 and, via the input I/F (interface) 505, controls an input device 511 connected externally to the information processing apparatus 501 (for example, an image scanner, a storage device, another information processing apparatus connected via a network line, or a facsimile connected via a telephone line). Via the output I/F 506, it controls an output device 512 connected externally to the information processing apparatus 501 (for example, a printer, a monitor, another information processing apparatus connected via a network line, or a facsimile connected via a telephone line). In accordance with instructions input from an input unit (for example, a keyboard 513, a pointing device 514, or a pen 515) via the KBD I/F (keyboard interface) 508, the CPU 502 executes a series of processes such as image input, image processing, color conversion processing, and image output control. Further, via the video I/F (interface) 509, it controls a display unit 510 that displays image data input from the input device 511 and image data created using the keyboard 513, the pointing device 514, or the pen 515.

The ROM 503 stores the various control programs with which the CPU 502 performs its various controls. The OS and other control programs, including the control program for realizing the present invention, are loaded into the RAM 504 and executed by the CPU 502. The RAM 504 also functions as the various work areas and temporary save areas used in executing the control programs. In addition, a VRAM (not shown) is provided that temporarily holds image data input from the input device 511 and image data created using the keyboard 513, the pointing device 514, or the pen 515.

The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine).

Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a storage medium on which the program code of software implementing the functions of the above-described embodiment is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the storage medium.

In this case, the program code read from the storage medium itself implements the functions of the above-described embodiment, and the storage medium storing the program code constitutes the present invention.

As the storage medium for supplying the program code, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, or a ROM can be used.

Needless to say, the invention covers not only the case where the functions of the above-described embodiment are implemented by the computer executing the program code it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are implemented by that processing.

Needless to say, it also covers the case where the program code read from the storage medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are implemented by that processing.

When the present invention is applied to the above storage medium, the storage medium stores program code corresponding to the flowcharts shown in FIGS. 2, 3, and 4 described above.

As described above, the present invention can provide an information processing apparatus, a method therefor, and a computer-readable memory that, when a group of PDF electronic documents with embedded font data is bound into an electronic binder, do not archive the documents as they are but consolidate the font data they use in common before archiving.

This reduces the size of an electronic binder composed of PDF electronic documents containing font data.

It is a conceptual diagram of the document management system of this embodiment.
It is a flowchart showing the process of consolidating font data and then archiving.
It is a figure showing the process of extracting font data from PDF electronic documents and the process of archiving the font data.
It is a figure showing an example of the internal structure of an electronic binder, the compression and archiving process, and the extraction and decompression process.
It is a block diagram showing the configuration of an information processing apparatus applicable to the document management system of this embodiment.

Claims

An information processing apparatus for creating an electronic document archive file from a plurality of electronic document files in which font data can be embedded,
the apparatus comprising archive means for, when archiving the plurality of electronic document files, extracting font data common to the plurality of electronic document files and consolidating the common font data to create the electronic document archive file.

The information processing apparatus according to claim 1, wherein the electronic document file in which the font data can be embedded is an electronic document file in a PDF format.

The information processing apparatus according to claim 1, further comprising: information acquisition means for acquiring electronic document file information stored in the electronic document archive file when a read request for that information is issued to the electronic document archive file; and
restoration means for restoring the original electronic document file from the acquired electronic document file information according to a predetermined restoration algorithm.

The information processing apparatus, wherein the information acquisition means acquires the information regarding the electronic document files stored in the electronic document archive file from the electronic document archive file without executing the restoration means.

The information processing apparatus according to claim 1, wherein the archive unit compresses the electronic document file with a predetermined compression algorithm.

A control method for an information processing apparatus that creates an electronic document archive file from a plurality of electronic document files in which font data can be embedded,
the method comprising an archiving step of, when archiving the plurality of electronic document files, extracting font data common to the plurality of electronic document files and consolidating the common font data to create the electronic document archive file.

A computer program for creating an electronic document archive file from a plurality of electronic document files in which font data can be embedded,
the computer program comprising program code for causing a computer to execute an archiving step of, when archiving the plurality of electronic document files, extracting font data common to the plurality of electronic document files and consolidating the common font data to create the electronic document archive file.