JP2009111984A

JP2009111984A - Information processing apparatus and method, computer program and computer-readable recording medium

Info

Publication number: JP2009111984A
Application number: JP2008237188A
Authority: JP
Inventors: Kenichi Okihara; 健一沖原
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2007-10-11
Filing date: 2008-09-16
Publication date: 2009-05-21
Anticipated expiration: 2028-09-16
Also published as: JP5173690B2

Abstract

PROBLEM TO BE SOLVED: To provide an information embedding technique that embeds, in electronic document data, watermark information that manipulates the arrangement of constituent images, and that allows the embedded watermark information to be extracted from the printed document.
SOLUTION: The present invention relates to a watermark information embedding apparatus that generates a document image from input electronic document data A, modifies the electronic document data A based on the document image, and embeds information in the electronic document data. The apparatus includes: a document image generation unit 102 that generates a document image from the electronic document data A; a document analysis unit 103 that detects layout information of each constituent image in the generated document image; a normalization information calculation unit 104 that calculates, based on the detected layout information, normalization information for normalizing the arrangement of each constituent image; a changing unit 105 that modifies the electronic document data A based on the calculated normalization information; and an embedding unit 107 that embeds information in the modified electronic document data A.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a technique for embedding information in a document.

A technique for embedding information in a document and extracting the embedded information (generally called watermark information) is useful for improving document security.

For example, one technique embeds watermark information in a document image by adjusting the positions of character images and outputs the document image as a printed document. The printed document is then read with a scanner or similar device to obtain a document image, from which the watermark information is extracted (Patent Document 1). This technique is robust to copying, because copying rarely changes the positions of character images.

Another technique embeds watermark information by changing the layout information of electronic document data containing text written in a page description language, and later extracts the watermark information from that electronic document data (Patent Document 2). Page-description electronic document data is written in a language for instructing a printer to produce output. It allows characters, graphics, and the like to be printed at the optimum character and image quality for each printer, and it is used by typical laser printers. Because the technique of Patent Document 1 embeds watermark information in a document image, the document cannot be printed at the optimum character or image quality. Embedding directly in page-description electronic document data is therefore considered necessary.

Patent Document 1: JP 2005-253004 A
Patent Document 2: JP 2000-99501 A

However, the technique of Patent Document 1 embeds watermark information in a document image, so outputting the result as electronic document data requires converting the document image back into electronic document data. This conversion involves block selection, optical character recognition (hereinafter, OCR), and similar processing, and the character spacing may change because of OCR errors or because of hinting, which improves character quality when outline fonts are rendered. As a result, watermark information embedded in the document image sometimes cannot be extracted correctly from the electronic data obtained by converting the image after embedding. In addition, the conversion requires a large amount of processing.

On the other hand, because the technique of Patent Document 2 embeds watermark information in electronic document data, when that data is output as a printed document it is often impossible to extract, from the document image, the same watermark information that was embedded in the electronic document data.

Accordingly, an object of the present invention is to provide an information embedding technique that embeds, in electronic document data, watermark information that manipulates the arrangement of constituent images, and that allows the embedded watermark information to be extracted from the printed document.

To solve the above problems, an information processing apparatus according to the present invention embeds information in input electronic document data and includes: image generation means for generating a document image from the electronic document data; detection means for detecting layout information of each constituent image in the generated document image; calculation means for calculating, based on the detected layout information, normalization information for normalizing the arrangement of each constituent image; and embedding means for changing the electronic document data based on the calculated normalization information and embedding information in the changed electronic document data.

An information processing method according to the present invention embeds information in input electronic document data and includes: an image generation step of generating a document image from the electronic document data; a detection step of detecting layout information of each constituent image in the generated document image; a calculation step of calculating, based on the detected layout information, normalization information for normalizing the arrangement of each constituent image; and an embedding step of changing the electronic document data based on the calculated normalization information and embedding information in the changed electronic document data.

According to the present invention, it is possible to provide an information embedding technique that embeds, in electronic document data, watermark information that manipulates the arrangement of constituent images, and that allows the embedded watermark information to be extracted from the printed document.

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

<First Embodiment>
FIG. 1 is a conceptual block diagram of a watermark information embedding apparatus 100 according to the first and second embodiments of the present invention.

The watermark information embedding apparatus (information processing apparatus) 100 generates a document image from input electronic document data A, changes the electronic document data A based on the document image, and embeds watermark information B in the electronic document data A. Here, a document image is image data containing characters (for example, bitmap data), and electronic document data is text-based data other than such images. The watermark information embedding apparatus 100 includes an electronic document data input unit 101 that receives page-description electronic document data A (hereinafter, page description data A). It also includes a document image generation unit 102 that generates a document image from the page description data A received by the electronic document data input unit 101, and a document analysis unit 103 that detects layout information on the circumscribed rectangle of each character image in the document image generated by the document image generation unit 102. The document image generation unit 102 generates the document image at a resolution at which the document analysis unit 103 can detect the layout information. The apparatus further includes a normalization information calculation unit 104 that, based on the layout information detected by the document analysis unit 103, calculates normalization information for normalizing the spacing between a circumscribed rectangle of interest (hereinafter, the target rectangle) and the circumscribed rectangles adjacent to it.

A changing unit 105 changes the electronic document data received by the electronic document data input unit 101 based on the normalization information. The apparatus also includes a watermark information input unit 106 that receives watermark information B, and an embedding unit 107 that embeds the watermark information B by adjusting the normalized spacings in the electronic document data according to the watermark information B received from the watermark information input unit 106. Finally, an output unit 108 outputs the result as an embedded document printout C in which the watermark information B is embedded.

FIG. 2 is a flowchart showing the operation of the watermark information embedding apparatus 100 according to the first and second embodiments, and FIG. 3 is a diagram showing an example of a pair of character spacings.

First, in step S201, page description data A is input to the electronic document data input unit 101. Page description data A is data for instructing a printer to render a page or for displaying it on a monitor; examples include PS (PostScript), XPS (XML Paper Specification), and PDF (Portable Document Format). Although page description data is used as the example here, the present invention can also be applied to other electronic document data, such as a text file.

In step S202, the document image generation unit 102 generates a document image from the page description data A. As long as the normalization information can still be calculated without error, the document image may be generated at a resolution lower than the prescribed resolution. This speeds up processing and reduces the memory needed to store the image. For example, the generation resolution create_dpi can be set as follows. First, document images are generated from several samples of page description data, and the minimum character spacing space_min (character spacing is described below) is obtained. Then create_dpi is set to the lowest selectable resolution that is greater than or equal to the prescribed resolution order_dpi divided by space_min. For example, if order_dpi is 600 and space_min is 4, create_dpi is 600 / 4 = 150 dpi. With this method, even the smallest character spacing is not lost through the reduction, so the normalization information can be calculated without error.
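A minimal sketch of the create_dpi rule in step S202. It assumes that space_min is measured in pixels at order_dpi and that the renderer offers only a fixed list of resolutions; the function name choose_create_dpi and the resolution list are hypothetical.

```python
def choose_create_dpi(order_dpi, space_min, supported_dpis):
    """Return the lowest supported resolution that is >= order_dpi / space_min.

    Assumption: space_min is the smallest character spacing, in pixels,
    observed when rendering at order_dpi.
    """
    if space_min <= 0:
        raise ValueError("space_min must be a positive pixel count")
    threshold = order_dpi / space_min
    candidates = [dpi for dpi in supported_dpis if dpi >= threshold]
    # Fall back to the prescribed resolution if no lower option qualifies.
    return min(candidates) if candidates else order_dpi


resolutions = [72, 100, 150, 200, 300, 600]  # hypothetical renderer options
print(choose_create_dpi(600, 4, resolutions))  # 150, matching the worked example
```

At 150 dpi, a 4-pixel gap measured at 600 dpi shrinks to exactly 1 pixel, the smallest gap that can still be detected.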

In step S203, circumscribed rectangles (character regions) are detected in the generated document image. As shown in FIG. 3, a circumscribed rectangle is a rectangle that circumscribes a character. A circumscribed rectangle originally denotes a region on which character recognition is performed; in watermarking, it denotes a character region that is the target of the embedding operation.

To detect the circumscribed rectangles, the pixel values of the document image are first projected onto the vertical coordinate axis, and blank portions (portions containing no black character pixels) are searched for to identify text lines and split the image into lines. Each line is then projected onto the horizontal coordinate axis, and blank portions are searched for to split the line into characters. Each character can thus be cut out by its circumscribed rectangle, which yields the circumscribed rectangles.
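The projection-based segmentation above can be sketched as follows, assuming a binarized image stored as a list of rows with 1 for black pixels and horizontally written text. The function names are hypothetical, and tightening each box vertically to the rows its character actually uses is an added detail not stated in the text.

```python
def blank_separated_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero profile entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def detect_boxes(image):
    """Split a binary image (1 = black) into lines, then characters.

    Returns one list per text line of boxes (x0, y0, x1, y1) with exclusive x1, y1.
    """
    width = len(image[0])
    row_profile = [sum(row) for row in image]  # projection onto the vertical axis
    lines = []
    for y0, y1 in blank_separated_runs(row_profile):
        # Projection of this line onto the horizontal axis.
        col_profile = [sum(image[y][x] for y in range(y0, y1)) for x in range(width)]
        boxes = []
        for x0, x1 in blank_separated_runs(col_profile):
            # Tighten the box vertically to the rows this character actually uses.
            rows = [y for y in range(y0, y1) if any(image[y][x0:x1])]
            boxes.append((x0, rows[0], x1, rows[-1] + 1))
        lines.append(boxes)
    return lines


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(detect_boxes(page))  # [[(1, 1, 3, 3), (5, 1, 7, 3)]]
```

A character drawn with horizontally separated strokes would be split into several boxes by this simple scheme; the text does not say how such cases are handled.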

Once the circumscribed rectangles have been detected in this way, the character spacings of the document image are calculated from them in step S204. A character spacing is the gap between the target rectangle and an adjacent circumscribed rectangle, as indicated, for example, by P or S in FIG. 3. Then, in step S205, normalization information for normalizing the character spacings is calculated from the character spacings obtained in step S204. Step S205 is described in detail later.
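Given the boxes of one line, the character spacings of step S204 are the blank columns between horizontally neighboring boxes. A minimal sketch, assuming boxes of the form (x0, y0, x1, y1) with an exclusive right edge x1 (a representation chosen for illustration; the function name is hypothetical):

```python
def character_spacings(boxes):
    """Return the gap, in pixels, between each pair of horizontally adjacent boxes."""
    ordered = sorted(boxes, key=lambda box: box[0])  # left to right
    return [right[0] - left[2] for left, right in zip(ordered, ordered[1:])]


line = [(1, 1, 3, 3), (5, 1, 7, 3), (10, 0, 12, 3)]
print(character_spacings(line))  # [2, 3]
```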

In step S206, the page description data is changed based on the normalization information calculated in step S205. Step S206 is described in detail later. Next, in step S207, the watermark information B to be embedded is input from the watermark information input unit 106 and is embedded by adjusting the character spacings in the changed page description data. Step S207 is described in detail later.

Finally, in step S208, the embedded document printout C in which the watermark information B is embedded is output.

[Normalization Information Calculation Unit 104 (Step S205)]
FIG. 4 is a flowchart showing the operation of the normalization information calculation unit 104 according to the first embodiment. The normalization information calculation unit 104 calculates information for adjusting the character spacings between the target rectangle and its two neighbors: the circumscribed rectangle immediately before it (hereinafter, the preceding rectangle) and the circumscribed rectangle immediately after it (hereinafter, the following rectangle). As shown in FIG. 3, the character spacing between the target rectangle and the preceding rectangle is denoted P, and that between the target rectangle and the following rectangle is denoted S.

First, in step S205a, the pair of character spacings P and S before and after the target rectangle is selected. For example, when a line contains 30 circumscribed rectangles (which does not necessarily mean 30 characters), the rectangles at both ends of the line are excluded and each even-numbered rectangle is taken as a target rectangle.
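The selection rule of step S205a can be sketched as below, assuming a line's spacings are given as a list in which spacing k lies between box k and box k + 1 (0-based). Boxes are counted from 1, the first and last boxes are skipped, and every even-numbered box becomes a target; the function name is hypothetical.

```python
def target_pairs(spacings):
    """Return (target_index, P, S) for each target box on one line.

    spacings[k] is the gap between box k and box k + 1 (0-based), so a line with
    n boxes has n - 1 spacings. Targets are the even-numbered boxes counted from 1,
    excluding the last box; the first box is odd-numbered and therefore never a target.
    """
    n_boxes = len(spacings) + 1
    pairs = []
    for position in range(2, n_boxes, 2):  # 1-based positions 2, 4, ... below n_boxes
        i = position - 1                   # 0-based index of the target box
        pairs.append((i, spacings[i - 1], spacings[i]))  # (index, P before, S after)
    return pairs


print(len(target_pairs([0] * 29)))  # 30 boxes -> 14 targets
```

With 30 boxes this yields 14 targets, which agrees with the 14-bit capacity (30 / 2 - 1) given for step S207.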

In the page description data A received from the electronic document data input unit 101, let A0 and B0 be the distances to the next character that correspond to the character spacings P and S above; these are the values in the page description data through which the character spacing can be manipulated. A0 and B0 appear as numerical values in page description data 601, as shown in FIG. 6, for example. FIG. 14 shows the correspondence between the characters and A0 and B0. As shown in FIG. 14, A0 is, for example, the distance from the drawing start position of "デ" to the drawing start position of "ジ", which is 44 pixels. Depending on the type of page description data, the stored value may be the character spacing itself rather than the distance to the next character; the present invention applies to any value through which the character spacing can be manipulated.

Next, in step S205b, it is determined whether P and S are equal. If they are equal, the process proceeds to step S205c; otherwise, it proceeds to step S205d.

In step S205c, the normalization information values X and Y, corresponding respectively to the page description values A0 and B0, are set to 0.

In step S205d, the average of the character spacings P and S, Z = (P + S) / 2, is calculated. If the character spacings in page description data A do not use fractional values, Z is rounded to an integer (for example, rounded down or up).

In step S205e, X and Y are calculated from P, S, and Z as X = Z - P and Y = Z - S.
Finally, in step S205f, it is determined whether the target rectangle is the last circumscribed rectangle in the document image. If it is, step S205 ends; otherwise, the process returns to step S205a.
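Steps S205b to S205e reduce to a small function per target rectangle. This sketch assumes integer page-description values and rounds Z down, which is one of the options the text allows; rounding up would work the same way. Because X + Y = 2Z - P - S, the two corrections cancel exactly whenever Z is not rounded, so the shift applied before the target is undone after it. The function name is hypothetical.

```python
import math


def normalization_info(p, s, integer_only=True):
    """Return the normalization information (X, Y) for one target rectangle.

    p: spacing to the preceding rectangle, s: spacing to the following rectangle.
    """
    if p == s:                  # S205b/S205c: already balanced
        return 0, 0
    z = (p + s) / 2             # S205d: average spacing Z
    if integer_only:
        z = math.floor(z)       # assumption: round down when fractions are not allowed
    return z - p, z - s         # S205e: X = Z - P, Y = Z - S


print(normalization_info(6, 2))                      # (-2, 2)
print(normalization_info(5, 2))                      # (-2, 1) after rounding Z = 3.5 down
print(normalization_info(5, 2, integer_only=False))  # (-1.5, 1.5)
```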

[Changing Unit 105 (Step S206)]
FIG. 5 is a flowchart showing the operation of the changing unit 105 according to the first embodiment. The changing unit 105 changes the distances to the next character based on the normalization information calculated above. As a result, the character spacings in the page description data are normalized.

First, in step S206a, the distances to the next character A0 and B0, which are the character information corresponding to the pair of adjacent character spacings P and S, are selected.

Next, in step S206b, the changed distances to the next character A1 and B1 are calculated as A1 = A0 + α × X and B1 = B0 + α × Y, and the data is updated accordingly. Here, α is a coefficient that maps the document image onto the page description data; it may be a preset value or a newly input value.

Finally, in step S206c, it is determined whether the character information corresponding to P and S is the last. If it is, step S206 ends; otherwise, the process returns to step S206a.
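Step S206 can be sketched by storing, for each character k, the distance from its drawing start to that of character k + 1, so that A0 and B0 for a target at index i are entries i - 1 and i. The list layout, the dictionary of corrections, the single constant α, and the function name are assumptions for illustration; real page description formats store these values in format-specific ways.

```python
def normalize_advances(advances, corrections, alpha):
    """Apply A1 = A0 + alpha * X and B1 = B0 + alpha * Y for every target.

    advances[k]: distance from the drawing start of character k to character k + 1.
    corrections: {target_index: (X, Y)} in document-image pixels.
    alpha: factor converting document-image pixels to page-description units.
    """
    updated = list(advances)
    for i, (x, y) in corrections.items():
        updated[i - 1] += alpha * x  # A1: distance into the target character
        updated[i] += alpha * y      # B1: distance out of the target character
    return updated


# Illustrative values: target at index 1 with X = -2, Y = 2, and alpha = 4.0.
print(normalize_advances([44, 40, 44, 44], {1: (-2, 2)}, 4.0))  # [36.0, 48.0, 44, 44]
```

Because targets are every other character, the entries touched for different targets never overlap.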

Because the character spacings are normalized in this way, embedding is possible even when character spacing information cannot be obtained from the page description data.

[Embedding Unit 107 (Step S207)]
FIG. 15 is a flowchart showing the operation of the embedding unit 107 according to the first embodiment. The embedding unit 107 manipulates the relative size of each pair of adjacent character spacings in the page description data through the distances to the next character, thereby inserting the watermark information B into the page description data A.

First, in step S207a, the distances to the next character A1 and B1 are selected. In step S207b, the watermark bit to be embedded is selected from the bit string of the watermark information B received from the watermark information input unit 106. In this embodiment, the watermark information B is a bit string of 0s and 1s.

Next, in step S207c, it is determined whether the watermark bit selected in step S207b is 1. If it is, then in step S207d the watermark information is embedded so that the character spacing corresponding to A1 becomes larger than the character spacing corresponding to B1. That is, the distances to the next character after embedding, A2 and B2, are calculated as A2 = A1 + γ and B2 = B1 - γ. Here, γ is an arbitrary positive value.

If, on the other hand, it is determined in step S207c that the watermark bit is not 1 (that is, it is 0), the process proceeds to step S207e. Here, the watermark information is embedded so that the character spacing corresponding to A1 becomes smaller than the character spacing corresponding to B1; that is, A2 and B2 are calculated as A2 = A1 - γ and B2 = B1 + γ. In steps S207d and S207e, the distances to the next character are increased or decreased by the arbitrary positive value γ. This has the same effect as moving the target rectangle within the document image.

By adjusting the relative size of each pair of adjacent character spacings in the page description data in this way, one bit of watermark information (0 or 1) is inserted per pair of character spacings. For example, when a line contains 30 characters, 14 bits (= 30 / 2 - 1) of watermark information B can be embedded.

In this embodiment, the watermark information B is embedded in the page description data by adjusting the positions of even-numbered characters. However, since it is only necessary that no two adjacent characters both be adjusted, the watermark information B may instead be embedded by adjusting the positions of odd-numbered characters.

Also, in this embodiment, the watermark information B is embedded at every other character; that is, one non-target character is placed between target characters. If the watermark information B were embedded in consecutive characters, each character spacing would be shared by neighboring target characters, and only a single distance to the next character could ever be controlled. Errors would then gradually accumulate and significantly affect the layout (arrangement information). However, the present invention is not limited to the described procedure, provided that information is embedded using character spacing.

Finally, in step S207f, it is determined whether the character information in the page description data is the last. If it is, step S207 ends; otherwise, the process returns to step S207a.
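The embedding rule of steps S207c to S207e can be sketched on a list of distances to the next character, where entry k is the distance from the drawing start of character k to that of character k + 1 (an assumed layout), so that for a target at index i, entry i - 1 plays the role of A1 and entry i that of B1. The function name, target indices, bits, and γ in the example are illustrative only.

```python
def embed_bits(advances, target_indices, bits, gamma):
    """Embed one bit per target by shifting it by gamma within its pair of spacings.

    bit 1: A2 = A1 + gamma, B2 = B1 - gamma (spacing before the target grows)
    bit 0: A2 = A1 - gamma, B2 = B1 + gamma (spacing before the target shrinks)
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    embedded = list(advances)
    for i, bit in zip(target_indices, bits):
        delta = gamma if bit == 1 else -gamma
        embedded[i - 1] += delta  # A2
        embedded[i] -= delta      # B2
    return embedded


print(embed_bits([44, 44, 44, 44, 44], [1, 3], [1, 0], 3))  # [47, 41, 41, 47, 44]
```

Because A2 + B2 = A1 + B1, the character after each target keeps its position, which is why the text likens the operation to moving only the target rectangle.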

FIG. 6 is a diagram showing how the page description data changes in this embodiment. First, electronic document data is input to the electronic document data input unit 101; page description data 601 is an example of such input data. In page description data 601, the portions indicating the distance to the next character, which is the value through which the character spacings P and S can be manipulated, are the distance data 601a and 601b. The following description focuses on the distances to the next character A0 and B0, chosen arbitrarily from the distance data 601a and 601b.

Next, the changing unit 105 changes the electronic document data based on the normalization information for normalizing the character spacings in the document image. Page description data 602 is an example of the changed electronic document data. In page description data 602, the portions corresponding to the distance data 601a and 601b are the distance data 602a and 602b, and the values corresponding to A0 and B0 are A1 and B1; that is, the distances to the next character have been changed from A0 to A1 and from B0 to B1.

Next, the embedding unit 107 embeds the watermark information B in the electronic document data changed by the changing unit 105. Page description data 603 is an example of the electronic document data after the watermark information has been embedded. In page description data 603, the portions corresponding to the distance data 602a and 602b are 603a and 603b, and the values corresponding to A1 and B1 are A2 and B2; that is, the distances to the next character have been changed from A1 to A2 and from B1 to B2.

FIG. 7 is a block diagram of a watermark information extraction apparatus (information processing apparatus) 700 according to the first embodiment. The embedded document printout C is input to an image input unit 701. A detection unit 702 analyzes the positional relationships between characters in the document image input to the image input unit 701. A watermark information extraction unit 703 extracts and outputs watermark information D embedded in the positional relationships between characters (the relative sizes of character spacings).

FIG. 8 is a flowchart showing the operation of the watermark information extraction unit 703. First, in step S801, a document image in which watermark information is embedded is input: the document image of the embedded document printout C is read by the image input unit 701, such as a scanner.

In step S802, the detection unit 702 detects circumscribed rectangles in the document image. In step S803, the detection unit 702 calculates the character spacings from the circumscribed rectangles, using the same procedure as in the watermark embedding process. Finally, in step S804, the watermark information extraction unit 703 extracts the watermark information. One bit (0 or 1) is extracted from each pair P1, S1 of character spacings before and after a rectangle of interest, according to which of the two is larger.
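Steps S802 to S804 can be sketched for one line of text as shown below. The following are all hypothetical assumptions:

- Rectangles are (x, y, width, height) tuples in pixels.
- Pairs are formed from consecutive, non-overlapping spacings.
- A bit of 1 means P1 > S1. This matches the embedding rule of step S207k, where bit 1 makes the spacing on the A (that is, P) side the larger one.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) of a circumscribed rectangle


def character_gaps(rects: List[Rect]) -> List[int]:
    """S803: horizontal gaps between consecutive rectangles on one line."""
    ordered = sorted(rects, key=lambda r: r[0])
    return [nxt[0] - (cur[0] + cur[2]) for cur, nxt in zip(ordered, ordered[1:])]


def extract_bits(rects: List[Rect]) -> List[int]:
    """S804: one bit per non-overlapping (P1, S1) pair; 1 if P1 > S1, else 0."""
    gaps = character_gaps(rects)
    bits = []
    for i in range(0, len(gaps) - 1, 2):
        p1, s1 = gaps[i], gaps[i + 1]
        bits.append(1 if p1 > s1 else 0)  # ties read as 0 in this sketch
    return bits


line = [(0, 0, 10, 10), (14, 0, 10, 10), (26, 0, 10, 10),
        (38, 0, 10, 10), (52, 0, 10, 10)]
print(extract_bits(line))  # [1, 0]
```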

According to this embodiment, the character spacing information of the document image is first reflected in the electronic document data, and the watermark information is then embedded in the electronic document data. Watermark information can therefore be embedded by manipulating character spacing in the electronic document data and extracted from the printed document. An alternative would be to embed on a document image, apply processing such as block selection or OCR, and convert the processed image back into electronic document data. Compared with that, this approach is less prone to character spacing errors and requires less processing.

In this embodiment, the watermark information is embedded after the electronic document data has been changed using the normalization information. Alternatively, the watermark information may be embedded directly, based on the character spacing relationships in the document image, without normalization.

In this embodiment, each process from the calculation of normalization information to the embedding of watermark information is performed on the entire document. These processes may instead be performed line by line, which reduces the memory area needed to store the processing results.

This embodiment has described embedding watermark information by adjusting character spacing. The present invention is also applicable to embedding watermark information by adjusting line spacing. In that case, the component image to be processed is the circumscribed rectangle enclosing one line of characters, instead of the circumscribed rectangle of each character.

The present invention is applicable not only to adjusting character spacing and line spacing but also to adjusting the spacing between graphics and characters or between graphics. It is also applicable to schemes that embed watermark information by changing the size of figures or characters or by adjusting their positions, not only by adjusting spacing.

[Modification of First Embodiment]
In the first embodiment described above, the normalization information calculation unit 104 calculates the normalization information by selecting character spacing pairs P and S sequentially from the document image. However, as shown in FIG. 17, the character areas recognized in the page description data 1701 (1701a, 1701b) may differ from the character areas recognized in the document image 1702 (1702a, 1702b). This can happen, for example, when block selection mistakenly recognizes part of a figure as characters, as at 1702c. Normalization then uses character spacings that do not exist in the page description data, and as a result the watermark information cannot be embedded.

Therefore, normalization information is calculated only for document-image lines whose coordinates match those of lines containing character codes in the page description data. This increases the probability that watermark information can be embedded in each line of the page description data.

FIG. 16 is a flowchart of the operation of the normalization information calculation unit 104 according to the modification of the first embodiment. Steps S206j to S206m are the same as steps S205b to S205e, respectively, and their description is omitted.

First, in step S206g, a line of the document image is selected. For example, lines are selected in order, starting from the line nearest the upper right of the document image.

In step S206h, it is determined whether the page description data contains a line whose coordinates match the starting coordinates of the line selected in step S206g.

- **Document image:** the starting coordinates of a line are, for example, those of its first character, such as 1702d in FIG. 17 (horizontal = 741, vertical = 5585).
- **Page description data:** the starting coordinates of a line are, for example, given by the first numbers of a line containing character codes, such as 1701d in FIG. 17 (horizontal = 729, vertical = 5584).

Coordinates are in pixels for both the document image and the page description data. Coordinates of portions that contain no character code, such as the picture portion 1701c in FIG. 17, are excluded, so that watermark information is embedded only in character portions. Some types of page description data assign coordinates only to character portions. In that case, all coordinates in the page description data may be used for the determination.

When determining whether coordinates match, some error between the page description data coordinates and the document image coordinates is expected, depending on how the document image was generated. To allow for this error, two lines are treated as the same line if, for example, their coordinate difference is 20 or less. In FIG. 17, the coordinates of the document image 1702 (horizontal = 741, vertical = 5585) and of the page description data 1701 (horizontal = 729, vertical = 5584) differ by 12 and 1, respectively. They are therefore judged to match, and the line is found to exist.
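The tolerant comparison of step S206h might look like the hypothetical helper below. It assumes that the tolerance of 20 pixels applies to the horizontal and vertical differences separately. The source gives only "a coordinate difference of 20 or less" as an example, so the exact rule is an assumption.

```python
from typing import Iterable, Optional, Tuple

Point = Tuple[int, int]  # (horizontal, vertical) line-start coordinate in pixels


def find_matching_line(image_start: Point, pdl_starts: Iterable[Point],
                       tol: int = 20) -> Optional[Point]:
    """Return the first page-description line start within tol pixels on both axes, else None."""
    ix, iy = image_start
    for px, py in pdl_starts:
        if abs(ix - px) <= tol and abs(iy - py) <= tol:
            return (px, py)
    return None


# Values from FIG. 17: differences of 12 and 1 pixels count as the same line.
print(find_matching_line((741, 5585), [(729, 5584)]))  # (729, 5584)
```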

For example, the coordinate data of every line of the page description data can be acquired when the page description data is read in step S201 and held in memory in advance. Alternatively, the coordinate data may be searched sequentially during step S206h.

If a matching line exists in step S206h, character spacing pairs P and S are selected in order from the beginning of the line in step S206i, and steps S206j to S206m are performed as applicable. In step S206n, it is determined whether the end of the line has been reached. Processing continues until the end of the line and then proceeds to step S206o, where it is determined whether the end of the document has been reached. If not, the process returns to step S206g; if so, step S206 ends. If no matching line exists in step S206h, the process proceeds directly to step S206o.
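The line loop of FIG. 16 might look like the sketch below. It is illustrative only, and the following are assumptions:

- the input layout (one line start plus a list of spacings per document-image line);
- the non-overlapping pairing of spacings;
- a per-axis tolerance of 20 pixels.

The per-pair work of steps S206j to S206m is represented simply by collecting the pairs.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]              # (horizontal, vertical) start of a line, pixels
ImageLine = Tuple[Point, List[int]]  # (line start, character spacings on that line)


def pairs_to_normalize(image_lines: Sequence[ImageLine],
                       pdl_starts: Sequence[Point],
                       tol: int = 20) -> List[Tuple[int, int, int]]:
    """Return (line index, P, S) for every pair on lines that also exist in the PDL."""
    selected = []
    for idx, (start, gaps) in enumerate(image_lines):        # S206g: next line
        exists = any(abs(start[0] - px) <= tol and abs(start[1] - py) <= tol
                     for px, py in pdl_starts)                # S206h
        if not exists:
            continue                                          # go straight to S206o
        for i in range(0, len(gaps) - 1, 2):                  # S206i .. S206n
            selected.append((idx, gaps[i], gaps[i + 1]))      # input to S206j .. S206m
    return selected                                           # S206o: end of document


lines = [((741, 5585), [4, 2, 2, 4]), ((1200, 3000), [5, 5])]
print(pairs_to_normalize(lines, [(729, 5584)]))  # [(0, 4, 2), (0, 2, 4)]
```

The second line, for instance a misrecognized figure region, has no counterpart in the page description data. It is skipped, so its spacings are never used for normalization.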

<Second Embodiment>
In the first embodiment described above, the change unit 105 changes the character spacing unconditionally according to the normalization information from the document image, regardless of which characters appear in the page description data. Consequently, when the page description data contains punctuation marks, the spacing before and after those marks is also changed, which can make the document look unnatural.

FIG. 9 shows examples of character spacing pairs around punctuation marks. Unlike the first embodiment, this embodiment does not change any character spacing pair that includes the spacing immediately after a punctuation mark, and does not embed watermark information in such pairs.

For example, 901 and 902 are pairs of character spacings P, S, and the corresponding pairs of distances A0, B0 to the next character, that include the character spacing S immediately after the rectangle of interest 904. Watermark information is not embedded in these pairs. Likewise, when watermark information is extracted, the character spacing pairs before and after circumscribed rectangles that are not embedding targets are excluded.

The watermark information embedding apparatus according to the second embodiment has the same configuration as in the first embodiment. It differs in the processing of the change unit 105 (step S206) and the embedding unit 107 (step S207), as follows.

[Change unit 105 (step S206)]
FIG. 10 is a flowchart of the operation of the change unit 105 according to the second embodiment. First, in step S206d, the distances A0 and B0 to the next character are selected. These are the character information corresponding to a pair P, S of adjacent preceding and following character spacings.

Next, in step S206e, it is determined whether A0 or B0 is a distance to the next character that corresponds to the character spacing immediately after a punctuation mark. The determination is based on the punctuation position information in the page description data A, which is represented, for example, by character codes. If neither A0 nor B0 is such a distance, the process proceeds to step S206f; otherwise, it proceeds to step S206g.

In step S206f, the distances A1 and B1 to the next character after the change are calculated as A1 = A0 + α × X and B1 = B0 + α × Y, and the data is changed accordingly.

In step S206g, by contrast, A1 = A0 and B1 = B0; that is, the data is left unchanged.

Finally, in step S206h, it is determined whether the character information corresponding to P and S is the last. If it is, step S206 ends; otherwise, the process returns to step S206d.
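The change loop of FIG. 10 can be sketched as shown below. The sketch rests on the following hypothetical assumptions:

- `chars[k]` is the character whose distance to the next character is `dists[k]`. A distance "immediately after a punctuation mark" is therefore one whose owning character is punctuation.
- `pairs` lists the index pairs for (A0, B0).
- `corrections` holds the (X, Y) values.
- `PUNCTUATION` is an illustrative stand-in for the punctuation position information.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for the punctuation position information (character codes).
PUNCTUATION = set("、。，．,.")


def change_distances(chars: Sequence[str], dists: Sequence[float],
                     pairs: Sequence[Tuple[int, int]],
                     corrections: Sequence[Tuple[float, float]],
                     alpha: float) -> List[float]:
    """Apply A1 = A0 + αX, B1 = B0 + αY, skipping pairs next to punctuation."""
    out = list(dists)
    for (i, j), (x, y) in zip(pairs, corrections):            # S206d: next (A0, B0)
        if chars[i] in PUNCTUATION or chars[j] in PUNCTUATION:  # S206e
            continue                                            # S206g: A1 = A0, B1 = B0
        out[i] = dists[i] + alpha * x                           # S206f: A1 = A0 + αX
        out[j] = dists[j] + alpha * y                           #        B1 = B0 + αY
    return out                                                  # S206h: after the last pair


print(change_distances(["a", "b", "。", "c"], [100, 140, 100, 100],
                       [(0, 1), (2, 3)], [(2, -2), (3, -3)], alpha=10))
# [120, 120, 100, 100]
```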

[Embedding unit 107 (step S207)]
FIG. 11 is a flowchart of the operation of the embedding unit 107 according to the second embodiment. First, in step S207g, the distances A1 and B1 to the next character are selected.

In step S207h, it is determined, based on the punctuation position information in the page description data A, whether A1 or B1 is a distance to the next character that corresponds to the character spacing immediately after a punctuation mark. If neither is, the process proceeds to step S207i; otherwise, it proceeds to step S207m.

In step S207i, the watermark bit to be embedded is selected. Next, in step S207j, it is determined whether the selected watermark bit is 1. If it is, the process proceeds to step S207k; otherwise, it proceeds to step S207l.

In step S207k, the distances to the next character after embedding are calculated as A2 = A1 + γ and B2 = B1 − γ. This makes the character spacing corresponding to A1 larger than the one corresponding to B1, and thereby embeds the watermark information. In step S207l, they are instead calculated as A2 = A1 − γ and B2 = B1 + γ. This makes the character spacing corresponding to A1 smaller than the one corresponding to B1.

In step S207m, by contrast, A2 = A1 and B2 = B1; that is, no watermark information is embedded.

Finally, in step S207n, it is determined whether the last character information in the page description data has been reached. If so, step S207 ends; otherwise, the process returns to step S207g.
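The embedding loop of FIG. 11 can be sketched under the same hypothetical data layout as the change loop: `chars[k]` owns `dists[k]`, and `pairs` holds index pairs (A1, B1). The punctuation set and the choice to stop once the bits run out are assumptions; the source does not say what happens when all bits have been embedded.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for the punctuation position information (character codes).
PUNCTUATION = set("、。，．,.")


def embed_bits(chars: Sequence[str], dists: Sequence[float],
               pairs: Sequence[Tuple[int, int]], bits: Sequence[int],
               gamma: float) -> List[float]:
    """Embed one bit per eligible (A1, B1) pair by shifting the distances by ±γ."""
    out = list(dists)
    bit_iter = iter(bits)
    for i, j in pairs:                                          # S207g: next (A1, B1)
        if chars[i] in PUNCTUATION or chars[j] in PUNCTUATION:  # S207h
            continue                                            # S207m: unchanged
        bit = next(bit_iter, None)                              # S207i
        if bit is None:
            break                                               # assumption: no bits left
        if bit == 1:                                            # S207j -> S207k
            out[i], out[j] = dists[i] + gamma, dists[j] - gamma
        else:                                                   # S207j -> S207l
            out[i], out[j] = dists[i] - gamma, dists[j] + gamma
    return out                                                  # S207n: done


print(embed_bits(["a", "b", "。", "c", "d", "e"], [120] * 6,
                 [(0, 1), (2, 3), (4, 5)], [1, 0], gamma=5))
# [125, 115, 120, 120, 115, 125]
```

A skipped pair consumes no bit. The extractor must therefore skip exactly the same pairs, which is why step S1203 detects punctuation before any bits are read.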

FIG. 12 is a flowchart of the operation of the watermark information extraction unit 703 according to the second embodiment. The watermark information extraction apparatus 700 is the same as in the first embodiment, except that processing for punctuation marks is added to the extraction.

First, in step S1201, a document image in which watermark information is embedded is read through the image input unit 701, such as a scanner.

In step S1202, circumscribed rectangles are detected in the document image. In step S1203, punctuation marks are detected based on properties such as the size of the circumscribed rectangles. Steps S1202 and S1203 are performed by the detection unit 702.

In step S1204, the detection unit 702 calculates the character spacings, excluding the character spacing pairs that include the spacing immediately after a punctuation mark, as described above. The calculation method is the same as in the processing of the embedding unit 107.
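Steps S1203 to S1205 can be sketched as below. The size test for punctuation is a hypothetical heuristic: a rectangle counts as punctuation when both its width and its height are under half the median rectangle height on the line. The source says only that punctuation is detected "based on the size of the circumscribed rectangle". The pairing and the bit convention follow the same assumptions as the first-embodiment extraction sketch.

```python
from statistics import median
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def is_punctuation(rect: Rect, ref_height: float, ratio: float = 0.5) -> bool:
    """Hypothetical S1203 heuristic: small in both dimensions relative to the line."""
    _, _, w, h = rect
    return w < ratio * ref_height and h < ratio * ref_height


def extract_bits_skipping_punctuation(rects: List[Rect], ratio: float = 0.5) -> List[int]:
    ordered = sorted(rects, key=lambda r: r[0])
    ref = median(r[3] for r in ordered)
    # gaps[k] is the spacing immediately after ordered[k] (S1204).
    gaps = [b[0] - (a[0] + a[2]) for a, b in zip(ordered, ordered[1:])]
    punct = [is_punctuation(r, ref, ratio) for r in ordered]
    bits = []
    for i in range(0, len(gaps) - 1, 2):
        if punct[i] or punct[i + 1]:   # pair contains a spacing right after punctuation
            continue
        bits.append(1 if gaps[i] > gaps[i + 1] else 0)  # S1205
    return bits


line = [(0, 0, 10, 10), (14, 0, 10, 10), (26, 0, 10, 10), (38, 6, 3, 3), (45, 0, 10, 10)]
print(extract_bits_skipping_punctuation(line))  # [1]
```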

In step S1205, the watermark information extraction unit 703 extracts the watermark information. One bit (0 or 1) is extracted from each pair P1, S1 of adjacent preceding and following character spacings, according to which of the two is larger.

In the first and second embodiments described above, watermark information is embedded by adjusting which of two adjacent character spacings is larger. Another scheme compares each character spacing with a threshold to determine whether watermark information is inserted. The present invention can be applied to that scheme simply by adding, for each spacing, the corresponding character spacing information from the document image.
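A threshold scheme needs only each spacing's own image information, as the hypothetical sketch below shows. All of the following are illustrative assumptions, not values or rules from the source:

- The threshold, the margin `delta`, and the pixel-to-PDL scale `alpha` are arbitrary example parameters.
- A spacing above the threshold is read as bit 1.

```python
from typing import List


def target_distance(dist: float, image_gap: float, bit: int,
                    threshold: float, delta: float, alpha: float) -> float:
    """Shift one page-description distance so its spacing lands delta above or below threshold."""
    target = threshold + delta if bit == 1 else threshold - delta
    return dist + alpha * (target - image_gap)


def read_bits(gaps: List[float], threshold: float) -> List[int]:
    """Extraction side: each measured spacing alone yields one bit."""
    return [1 if g > threshold else 0 for g in gaps]


print(target_distance(100, 12, 1, threshold=10, delta=2, alpha=10))  # 100
print(target_distance(100, 12, 0, threshold=10, delta=2, alpha=10))  # 60
print(read_bits([12, 8], threshold=10))                              # [1, 0]
```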

<Third Embodiment>
In this embodiment, the various processes according to the first embodiment are executed by a computer. FIG. 13 is a basic configuration diagram of a computer according to the third embodiment. For example, to run all functions on a computer, each functional component is expressed as a computer program and read into the computer. The computer can then implement all the functions of the first embodiment.

The CPU 1301 controls the entire computer using computer programs and data stored in the RAM 1302 and the ROM 1303, and performs each process described in the first and second embodiments.

The RAM 1302 has a storage area for temporarily holding two kinds of content: computer programs and data read from the external storage device 1308, and programs and data downloaded from another computer system 1314 via an I/F (interface) 1315. The RAM 1302 also provides the work area that the CPU 1301 needs to perform its various processes.

The ROM 1303 stores the computer's function programs, setting data, and the like. The display control device 1304 performs control processing for displaying images, characters, and the like on the display 1305. The display 1305 is a display device such as a CRT or a liquid crystal screen, and displays images, characters, and the like.

The operation input device 1306 consists of devices, such as a keyboard and a mouse, through which various instructions can be input to the CPU 1301. The I/O 1307 notifies the CPU 1301 of the instructions input through the operation input device 1306.

The external storage device 1308 functions as a large-capacity information storage device, such as a hard disk. It stores the OS (operating system), computer programs that cause the CPU 1301 to execute the processing of each embodiment described above, input and output document images, and the like. Information is written to and read from the external storage device 1308 via the I/O 1309.

The printer 1310 outputs documents and images. Output data is sent from the RAM 1302 or the external storage device 1308 via the I/O 1311. Examples of the printer include an inkjet printer, a laser beam printer, a thermal transfer printer, and a dot impact printer.

The scanner 1312 reads documents and images. Input data is sent to the RAM 1302 or the external storage device 1308 via the I/O 1313.

A bus 1316 connects the CPU 1301, ROM 1303, RAM 1302, I/O 1311, I/O 1309, display control device 1304, I/F 1315, I/O 1307, and I/O 1313 to form a network.

In this embodiment, the computer performs all processing other than that performed by the scanner and printer. Alternatively, the scanner or printer may perform the computer's processing using a dedicated internal hardware circuit.

The first to third embodiments are merely examples of how the present invention may be embodied, and the technical scope of the present invention should not be construed as limited by them. The present invention can be implemented in various forms without departing from its technical idea or main features.

<Other Embodiments>
This embodiment may be applied to a system composed of multiple devices, for example a host computer, an interface device, a reader, and a printer. It may also be applied to an apparatus consisting of a single device, for example a copier, a multifunction peripheral, or a facsimile machine.

The above-described embodiments can also be achieved as follows. A system or apparatus is supplied with a computer-readable recording medium (or storage medium) that records the program code of software implementing these functions. A computer (or CPU or MPU) of the system or apparatus then reads and executes the program code stored on the recording medium. In this case, the program code read from the recording medium itself implements the functions of the above-described embodiments, and the recording medium storing the program code constitutes this embodiment.

The present invention also covers the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and that processing implements the functions of the above-described embodiments.

It further covers the following case. The program code read from the recording medium is written to memory in a function expansion card inserted into the computer, or in a function expansion unit connected to the computer. A CPU or the like in that card or unit then performs part or all of the actual processing based on the instructions of the program code, and that processing implements the functions of the above-described embodiments.

When this embodiment is applied to such a recording medium, the recording medium stores program code corresponding to the flowcharts and functional configurations described above.

FIG. 1 is a functional block diagram of the watermark information embedding apparatus 100 according to the first and second embodiments.
FIG. 2 is a flowchart of the operation of the watermark information embedding apparatus 100 according to the first and second embodiments.
FIG. 3 is a diagram showing examples of character spacing pairs.
FIG. 4 is a flowchart of the operation of the normalization information calculation unit 104 according to the first embodiment.
FIG. 5 is a flowchart of the operation of the change unit 105 according to the first embodiment.
FIG. 6 is a diagram showing an example of how the page description data changes in the first embodiment.
FIG. 7 is a functional block diagram of the watermark information extraction apparatus 700 according to the first and second embodiments.
FIG. 8 is a flowchart of the operation of the watermark information extraction unit 703 according to the first embodiment.
FIG. 9 is a diagram showing examples of character spacing pairs around punctuation marks.
FIG. 10 is a flowchart of the operation of the change unit 105 according to the second embodiment.
FIG. 11 is a flowchart of the operation of the embedding unit 107 according to the second embodiment.
FIG. 12 is a flowchart of the operation of the watermark information extraction unit 703 according to the second embodiment.
FIG. 13 is a basic configuration diagram of a computer according to the third embodiment.
FIG. 14 is a diagram showing the relationship between characters in the page description data and the distance to the next character in the first embodiment.
FIG. 15 is a flowchart of the operation of the embedding unit 107 according to the first embodiment.
FIG. 16 is a flowchart of the operation of the normalization information calculation unit 104 according to the modification of the first embodiment.
FIG. 17 is a diagram showing the coordinates of each line in the page description data and in the document image.

Claims

An information processing apparatus for embedding information in input electronic document data,
Image generating means for generating a document image from the electronic document data;
Detecting means for detecting layout information of each component image in the generated document image;
Calculation means for calculating normalization information for normalizing the arrangement of each component image based on the detected layout information;
An embedding means for changing the electronic document data based on the calculated normalization information and embedding information in the changed electronic document data;
An information processing apparatus comprising:

The information processing apparatus according to claim 1, wherein the image generation unit generates the document image at a resolution at which the detection unit can detect the layout information.

The layout information includes information on the position and size of each component image,
The information processing apparatus according to claim 1, wherein the calculation unit calculates an interval between the component images based on the layout information.

Each of the component images is a circumscribed rectangle of each character image,
The information processing apparatus according to claim 3, wherein the detection unit calculates an interval between circumscribed rectangles of the character images based on the layout information.

The information processing apparatus according to claim 4, wherein the calculation means calculates, based on the character spacings calculated by the detection means, an average value of the character spacings between a rectangle of interest, which is the circumscribed rectangle being processed, and the circumscribed rectangles adjacent to it, and calculates the normalization information required for each of those character spacings to become the average value.

The information processing apparatus according to claim 5, wherein the embedding unit changes information corresponding to at least one of a position or a size of the circumscribed rectangle when changing the electronic document data.

The information processing apparatus according to claim 1, further comprising means for changing the electronic document data so as to correspond to the normalization information calculated on the document image.

The information processing apparatus according to claim 1, wherein the embedding means embeds the information by adjusting the character spacing of each character in the electronic document data based on predetermined information.

An information processing method for embedding information in input electronic document data,
An image generation step of generating a document image from the electronic document data;
A detection step of detecting layout information of each component image in the generated document image;
A calculation step of calculating normalization information for normalizing each component image based on the detected layout information;
An embedding step for changing the electronic document data based on the calculated normalization information and embedding information in the changed electronic document data;
An information processing method characterized by comprising:

A computer program that, when read and executed by a computer, causes the computer to function as the information processing apparatus according to claim 1.

A computer-readable recording medium storing the computer program according to claim 10.