JP2007174181A - Document processing apparatus, document processing method, and document processing program

Info

Publication number: JP2007174181A
Application number: JP2005367921A
Authority: JP
Inventors: Kagenori Nagao; 景則長尾
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-12-21
Filing date: 2005-12-21
Publication date: 2007-07-05

Abstract

PROBLEM TO BE SOLVED: To provide a document processing apparatus, a document processing method, and a document processing program capable of recording additional information on a paper document with high invisibility, high inseparability from the content, and high robustness against repeated copying.

SOLUTION: A parameter value sequence extraction section 401 extracts a parameter value sequence relating to document style from a character string on a document received from a first document input section 10 as the information embedding target. A code sequence group generation section 402 generates a reference code sequence and an information embedding code sequence. An information coding section 403 represents the embedding information received from an embedding information input section 30 as a phase difference between the reference code sequence and the information embedding code sequence, and uses the sum of the two code sequences as a coded information sequence. A correction parameter value sequence generation section 404 then combines the coded information sequence with the parameter value sequence to produce a corrected parameter value sequence, and a document correction section 405 corrects the document data according to the corrected parameter value sequence, thereby embedding the information in the document.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a document processing apparatus, a document processing method, and a document processing program, and in particular to ones suitable for use in an image forming apparatus such as a copier, a printer, a facsimile machine, or a multifunction machine that combines these functions.

In recent years, storing documents as electronic data has become common practice in offices, and image forming apparatuses such as copiers and printers that turn electronic data into paper documents have also become widespread. As a result, leakage of confidential information from inside companies, such as customer lists and information on products under development, has become a serious problem. Electronic documents (electronic data) in particular can easily be duplicated as exact copies of the original and can very easily be taken outside over a network, so they were once a major document management problem for companies.

However, thanks to recent advances in encryption, authentication, and access restriction technologies, taking electronic documents outside has become relatively difficult when appropriate measures are in place. In contrast, once an electronic document has been printed on paper, there is no means of preventing it from being copied or taken off the premises. Consequently, measures against leakage of confidential information in paper documents currently lag noticeably behind those for electronic documents.

One countermeasure against leakage of confidential information in paper documents is to add tracking information to the paper document in advance. With this technique, information such as who printed the document, which device printed it, and the date and time of output is recorded on the paper document when an electronic document is printed. Even if a paper document carrying confidential information leaks outside the company, the source of the leak can then be identified from the tracking information recorded on it.

Such tracking information should be highly invisible and highly inseparable from the content. If, for example, tracking information is recorded by a highly visible method such as a barcode, it is obvious where on the paper document the tracking information is located, so the risk that it will be deciphered or tampered with increases. Moreover, because a barcode is separable from the content, identification of the leak source can easily be defeated by cutting out or blacking out the barcode.

Tracking information should also be highly resistant to repeated copying. This is because leaked confidential information may circulate not only as the original but also as copies of it, and further as second-generation and third-generation copies.

Among conventional methods of recording additional information such as tracking information on a paper document, one highly invisible technique embeds information by shifting character positions so as to change the length of the blank space before and after a character (see, for example, Patent Document 1).

In this prior art, let P and S be the lengths of the blank space before and after the character in which an information bit is to be embedded. Information is embedded by shifting the character position so that P > S if the bit is "1" and P < S if the bit is "0". Additional information is embedded in the document by performing this operation on as many characters as there are bits in the information to be embedded. Because shifting characters next to punctuation marks is easily noticed, refinements such as not using characters around punctuation marks for embedding have also been made.
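
The prior-art rule above (P > S encodes 1, P < S encodes 0) can be sketched roughly as follows. This is only an illustration: the function names and the `margin` parameter are hypothetical and do not come from Patent Document 1, and the sketch assumes that only the character moves, so the total blank length P + S is preserved.

```python
def embed_bit(p, s, bit, margin=1.0):
    """Shift one character inside its slot so that P > S encodes 1 and P < S encodes 0.

    p, s   : blank lengths before and after the character (same unit).
    margin : hypothetical difference between the two blanks after shifting.
    The sum p + s is preserved because only the character position moves.
    """
    total = p + s
    if total <= margin:
        raise ValueError("slot too narrow to carry a bit")
    half = (total - margin) / 2.0
    # The longer blank gets half + margin, the shorter blank gets half.
    return (half + margin, half) if bit == 1 else (half, half + margin)


def read_bit(p, s):
    """Recover the bit from the measured blank lengths."""
    return 1 if p > s else 0
```

A small `margin` keeps the change inconspicuous but leaves little tolerance when copying distorts the blank lengths, which is the trade-off the description goes on to criticise.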

A technique that embeds information by changing the shape of character patterns, such as their size or font type, is also known (see, for example, Patent Document 2).

In this prior art, characters that appear frequently in ordinary documents, such as 「の」 and 「は」, are used as the characters in which information is embedded. Information is embedded in these target characters in order from the beginning of the document. Only when the information bit is "1" is the embedded information expressed, by replacing the character with one in a different font or by widening it by about 10%; when the bit is "0", nothing is done. To restore the embedded information, the document is compared with the original document before embedding: a character pattern that differs is interpreted as 1, and one that is identical is interpreted as 0.

Patent Document 1: Japanese Patent Laid-Open No. 2003-230001 (see in particular pages 5 to 7)
Patent Document 2: Japanese Patent Laid-Open No. 2001-251490 (see in particular pages 4 to 6)

However, the information embedding methods described in Patent Documents 1 and 2 cannot adequately perform their intended information tracking function on paper documents that have been copied many times. This is because repeated copying degrades image quality: characters become thicker or thinner, and noise and roughness appear on them.

For example, in the method of Patent Document 1, which embeds information in the amount of character position shift, image quality degradation caused by repeated copying changes the blank lengths between characters, so the embedded information sometimes cannot be restored correctly. Avoiding this requires a larger shift, so that the ordering of the blank lengths before and after a character is maintained even if the blank lengths change somewhat. A larger shift, however, makes the change to the document visually obvious and impairs the invisibility of the embedded information.

Furthermore, in a kerned document, that is, a document whose character spacing has been adjusted to suit neighboring characters so that the text looks natural, the character spacing is not constant. Changing the blank lengths around a character from an original P > S relationship to P < S may therefore require a large shift of the character position, which again impairs the invisibility of the embedded information.

Similarly, the method of Patent Document 2 has low resistance to image quality degradation caused by repeated copying. To distinguish differences in character pattern shape, such as size or font type, even in a paper document with degraded image quality, the character size must be changed substantially or a font with a markedly different shape must be used. Changing the shape of the character pattern that much, however, makes the change to the document visually obvious and impairs the invisibility of the embedded information. There is also the problem that information cannot be embedded in a document in which target characters such as 「の」 and 「は」 appear infrequently.

The present invention has been made in view of the above problems. Its object is to provide a document processing apparatus, a document processing method, and a document processing program that make it possible to record additional information on a paper document with high invisibility, high inseparability from the content, and high robustness against repeated copying.

To achieve the above object, the present invention, when embedding additional information in document data consisting of character strings, extracts from the input document data a parameter value sequence relating to the document style of the character strings; generates a reference code sequence and an information embedding code sequence; generates an encoded information sequence in which the input additional information is represented by the phase difference between the reference code sequence and the information embedding code sequence; combines the encoded information sequence with the parameter value sequence to generate a corrected parameter value sequence; and corrects the document data according to the corrected parameter value sequence.

Embedding additional information in a document in this way allows additional information such as tracking information to be embedded with high invisibility not only in the digital data of the document but also in a text document printed on paper. It also allows the additional information to be embedded directly in parameter values of the document content itself, namely the document style parameters obtained from the character strings on the document. Because the additional information is embedded in the character image itself, the embedded information is preserved even after repeated copying, as long as the characters themselves do not disappear. In addition, because the information is embedded as the phase difference between the reference code sequence and the information embedding code sequence, the phase difference between the two embedded sequences, and hence the embedded information, remains unchanged even if a character segmentation error occurs in the character segmentation process performed when the embedded information is restored.

According to the present invention, additional information can be recorded on a paper document with high invisibility and high inseparability from the content. Because the embedded information is preserved even after repeated copying as long as the characters themselves do not disappear, the recording is also highly robust against repeated copying. Furthermore, because the phase difference between the embedded reference code sequence and the information embedding code sequence is unchanged by a character segmentation error, the embedded information can be restored reliably even when such an error occurs.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram outlining the configuration of a document processing apparatus according to an embodiment of the present invention. As shown in FIG. 1, the document processing apparatus 1 according to this embodiment includes a first document input unit 10, a second document input unit 20, an embedded information input unit 30, a processing control unit 40, a document output unit 50, and an information output unit 60.

The first document input unit 10 inputs the document data in which additional information is to be embedded to the processing control unit 40. Examples of the document data input from the first document input unit 10 to the processing control unit 40 include electronic document data in a page description language such as PostScript or in a word processor output format, and raster data of a document captured by a scanner or digital camera.

The first document input unit 10 can be implemented by, for example, a mass storage device holding document data such as a hard disk drive or a DVD (Digital Versatile Disc)-RAM/±RW/±R drive, a data transfer device that exchanges data over a network or the like, a scanner and its control device, or a memory reader, together with its control device, that retrieves images stored in the memory (card) of a digital camera.

The second document input unit 20 inputs raster data or the like of a document in which additional information has been embedded to the processing control unit 40. Examples of document raster data include image data of a paper document captured by a scanner or digital camera.

The embedded information input unit 30 inputs to the processing control unit 40 the additional information to be embedded in the document data input from the first document input unit 10. Any digital data can be used as the additional information input from the embedded information input unit 30 to the processing control unit 40, including numbers, characters, URLs (Uniform Resource Locators), and multimedia data such as audio and images.

The embedded information input unit 30 can be implemented by, for example, a keyboard for entering characters and the like, a mass storage device for entering multimedia data and the like such as a hard disk drive (HDD) or a DVD (Digital Versatile Disc)-RAM/±RW/±R drive, or a data transfer device that exchanges information over a network or the like.

The processing control unit 40 has a CPU (Central Processing Unit) 41, an I/O circuit 42, a ROM 43, a RAM 44, and an HDD (hard disk drive) 45, which are connected via a bus line 46 so that they can communicate with one another.

In the processing control unit 40, the CPU 41 controls all processing of the processing control unit 40, including arithmetic processing. The I/O circuit 42 manages input and output with the peripheral devices: the first document input unit 10, the second document input unit 20, the embedded information input unit 30, the document output unit 50, and the information output unit 60. The ROM 43 stores the programs for the various processes executed under the control of the CPU 41. The RAM 44 is a primary storage device used while those processes run. The HDD 45 stores document data processed under the control of the CPU 41.

The document output unit 50 outputs document data and the like processed by the processing control unit 40 in a predetermined format. Its output is, for example, a document printed on paper, electronic document data in a page description language such as PostScript or in a word processor output format, or a file in a document raster data format. The document output unit 50 is therefore implemented by, for example, a printer and its control device, a read/write device for magnetic disks, memory cards, or the like and its control device, or a data transfer device that exchanges data over a network or the like.

The information output unit 60 outputs the additional information restored by the processing control unit 40. It consists of, for example, a CRT (Cathode Ray Tube) and its control device, a printer and its control device, a read/write device for magnetic disks, memory cards, or the like and its control device, or a data transfer device that exchanges data over a network or the like.

(Information embedding process)
Next, the information embedding process, which is the characteristic feature of the present invention and is executed in the document processing apparatus 1 configured as above, is described. This process is executed by an information embedding processing unit 400 built within the processing control unit 40.

FIG. 2 is a block diagram showing an example of the functional configuration of the information embedding processing unit 400. As shown in FIG. 2, the information embedding processing unit 400 in this example includes a parameter value sequence extraction unit 401, a code sequence group generation unit 402, an information encoding unit 403, a corrected parameter value sequence generation unit 404, and a document correction unit 405.

The parameter value sequence extraction unit 401 analyzes the electronic document data or document raster data input from the first document input unit 10 and obtains a sequence of parameter values relating to document style, such as character spacing, character position, word spacing, line spacing, character height, character width, and character slant, from the character strings on the document.

The code sequence group generation unit 402 generates the code sequences used to encode the additional information input from the embedded information input unit 30 into encoded information. It generates a code sequence group consisting of a single reference code sequence and one or more information embedding code sequences.

The information encoding unit 403 generates an encoded information sequence by encoding the additional information input from the embedded information input unit 30 with the code sequence group generated by the code sequence group generation unit 402. The information encoding unit 403 represents the additional information as the phase difference between the reference code sequence and the information embedding code sequence: it phase-shifts the information embedding code sequence relative to the reference code sequence by the phase difference corresponding to the additional information, and then adds the phase-shifted information embedding code sequence to the reference code sequence to form the encoded information sequence.
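
The encoding step just described can be sketched as below. The sketch assumes that the "phase shift" is a cyclic shift measured in symbols and that the value to embed is used directly as the shift amount; neither detail is fixed by the passage above, and the function names are hypothetical.

```python
def cyclic_shift(seq, shift):
    """Rotate seq to the right by `shift` symbols (the last symbols wrap to the front)."""
    n = len(seq)
    shift %= n
    return list(seq[n - shift:]) + list(seq[:n - shift])


def encode(reference, embedding, value):
    """Return reference + (embedding shifted by `value`), symbol by symbol.

    reference, embedding : code sequences of equal length L.
    value                : integer to embed, assumed to satisfy 0 <= value < L.
    """
    if len(reference) != len(embedding):
        raise ValueError("code sequences must have the same length")
    if not 0 <= value < len(reference):
        raise ValueError("value must be representable as a shift within the sequence")
    shifted = cyclic_shift(embedding, value)
    return [r + e for r, e in zip(reference, shifted)]
```

Under these assumptions, a sequence of length L can carry any of L distinct shift values with a single information embedding code sequence.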

The corrected parameter value sequence generation unit 404 modifies the parameter value sequence of the input document extracted by the parameter value sequence extraction unit 401. It modifies each symbol value of the parameter value sequence according to the corresponding symbol value of the encoded information sequence obtained by the information encoding unit 403 (hereinafter, the modified parameter value sequence is referred to as the "corrected parameter value sequence").
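
The passage states only that each parameter value is modified according to the corresponding symbol of the encoded information sequence. One simple possibility, shown here purely as an assumption, is an additive update scaled by a small gain; the name `corrected_parameters` and the `gain` parameter are hypothetical.

```python
def corrected_parameters(params, encoded, gain=0.1):
    """Add a scaled copy of the encoded information sequence to the parameter values.

    params  : extracted parameter values, e.g. character spacings.
    encoded : encoded information sequence of the same length.
    gain    : hypothetical embedding strength; smaller values are less visible.
    """
    if len(params) != len(encoded):
        raise ValueError("sequences must have the same length")
    return [a + gain * c for a, c in zip(params, encoded)]
```

The gain would trade invisibility against robustness: a larger value survives copying better but changes the layout more noticeably.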

The document correction unit 405 corrects the style of the document input from the first document input unit 10 according to each symbol value of the corrected parameter value sequence generated by the corrected parameter value sequence generation unit 404. The correction is performed so that the document style parameter values obtained from the character strings of the corrected document, such as character spacing, character position, word spacing, line spacing, character height, character width, and character slant, become equal to the respective symbol values of the corrected parameter value sequence.

The components of the information embedding processing unit 400 described above, namely the parameter value sequence extraction unit 401, the code sequence group generation unit 402, the information encoding unit 403, the corrected parameter value sequence generation unit 404, and the document correction unit 405, can be implemented using computer equipment such as a PC (personal computer) that provides information storage, image processing, arithmetic processing, and similar functions by executing a predetermined program.

In that case, the predetermined program needed to implement the parameter value sequence extraction unit 401, the code sequence group generation unit 402, the information encoding unit 403, the corrected parameter value sequence generation unit 404, and the document correction unit 405 may be installed on the PC in advance. Instead of being installed in advance, however, the program may be provided on a computer-readable storage medium or distributed via wired or wireless communication means.

In other words, the document processing apparatus configured as described above can also be implemented by a document processing program that causes a computer connected to the first document input unit 10, the second document input unit 20, the embedded information input unit 30, the document output unit 50, and the information output unit 60 to function as the document processing apparatus.

(Processing procedure of the document processing method)
Next, the procedure of the information embedding process executed in the information embedding processing unit 400 configured as above (including the case where it is executed by the document processing program) is described.

FIG. 3 is a flowchart showing an example of the procedure of the information embedding process in the information embedding processing unit 400. The process is executed under the control of the CPU 41. The following example describes the case where the additional information to be embedded is n-bit binary data and character spacing is used as the document style parameter in which the additional information is embedded.

First, when the electronic document data in which additional information is to be embedded is input from the first document input unit 10 to the parameter value sequence extraction unit 401 (step S11), the parameter value sequence extraction unit 401 analyzes the input electronic document data or document raster data, obtains the spacing between the characters of the character strings on the document, and takes these spacings as the sequence of document style parameter values (step S12).

Specifically, as shown in FIG. 4, the sequence obtained by arranging the spacings between the characters of the character strings on the document input from the first document input unit 10 is taken as the parameter value sequence a(k) (k = 0, 1, 2, ..., L-1) of the document. Character spacings at any position in the document may be used, but in this example the first L character spacings from the beginning of the document are used, where L ≥ 2ⁿ. If a single line yields fewer than L character spacings, the character spacing parameters obtained from the following lines are concatenated to obtain a parameter value sequence of length L.
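
The construction of a(k) described above can be sketched as follows. The sketch assumes that each character is available as a horizontal extent `(left, right)` and that each line is a list of such extents in reading order; this data layout and the function names are assumptions made for illustration.

```python
def gaps_in_line(boxes):
    """Spacings between consecutive characters of one line.

    boxes: list of (left, right) horizontal extents, ordered left to right.
    """
    return [nxt[0] - cur[1] for cur, nxt in zip(boxes, boxes[1:])]


def parameter_sequence(lines, length):
    """Concatenate the spacings of successive lines until `length` values are collected."""
    sequence = []
    for boxes in lines:
        sequence.extend(gaps_in_line(boxes))
        if len(sequence) >= length:
            return sequence[:length]
    raise ValueError("document has fewer character spacings than the requested length")
```

For n-bit additional information, `length` would be chosen as some L with L ≥ 2ⁿ, as stated in the text.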

If the input document data is electronic document data in a page description language such as PostScript or in a word processor output format, the character spacing can be obtained directly by analyzing the electronic document data. The analysis is not limited to any particular method; a method suited to the format of the electronic document data can be used.

When the input document data is raster data of a document captured by a scanner or a digital camera, the character segmentation commonly performed as preprocessing for character recognition is applied, and the distance between adjacent segmented character rectangles is taken as the character spacing. The segmentation is likewise not limited to any specific method; any commonly used technique can be applied.
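A minimal sketch of how the spacing series a(k) might be assembled from segmented character rectangles, including the concatenation across lines described above. The (left, right) box format and the name `spacing_sequence` are assumptions for illustration; the text does not prescribe a data format or a segmentation method.

```python
from typing import List, Tuple

# Hypothetical box format: (left, right) x-extent of one character rectangle.
# Boxes within a line are assumed to be sorted in reading order.
Box = Tuple[float, float]


def spacing_sequence(lines: List[List[Box]], length: int) -> List[float]:
    """Collect inter-character gaps line by line until `length` gaps exist."""
    gaps: List[float] = []
    for boxes in lines:
        # Gap = left edge of the next box minus right edge of the current box.
        for (_, right), (left, _) in zip(boxes, boxes[1:]):
            gaps.append(left - right)
            if len(gaps) == length:
                return gaps
    raise ValueError("document has fewer than `length` character gaps")


# Two short lines; the second line supplies the gap the first line lacks.
lines = [[(0, 8), (10, 18), (21, 29)], [(0, 8), (11, 19)]]
a = spacing_sequence(lines, 3)
print(a)
```

Gaps are never measured across a line break in this sketch; only gaps inside a line are concatenated.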

In this example, character spacing is used as the document appearance parameter for embedding information, but the parameter is not limited to this. Character position, word spacing, line spacing, character height, character width, character slant, or a combination of several of these can also be used.

Next, the information encoding unit 403 encodes the n-bit binary data input from the embedded information input unit 30 using the code sequence group generated by the code sequence group generation unit 402, and obtains the encoded information sequence (step S13). Below, the n-bit binary data input from the embedded information input unit 30 is denoted by b.

The information encoding process executed in step S13, that is, the processing by which the information encoding unit 403 encodes the embedded information to obtain the encoded information sequence, is described in detail below.

The code sequences generated by the code sequence group generation unit 402 are random-like sequences whose symbol values have a small mean. Both the reference code sequence and the information embedding code sequence have a sharp autocorrelation characteristic as shown in FIG. 5. In addition, the reference code sequence and the information embedding code sequence are chosen so that their cross-correlation is small.

A known example of code sequences with these properties is the Gold code sequence. However, the code sequences generated by the code sequence group generation unit 402 are not limited to Gold code sequences; any sequences with the above properties may be used.

The following describes an example in which binary sequences with a period of length L are used as the code sequences generated by the code sequence group generation unit 402, where the sequence length L is at least 2ⁿ. The reference code sequence is denoted by ref(k) and the information embedding code sequence by pn(k), and each symbol takes the value +1 or −1.

The reference code sequence ref(k) and the information embedding code sequence pn(k) are expressed by the following equations (1) and (2), respectively.
ref(k) ∈ {+1, −1} ... (1)
(k = 0, 1, 2, ..., L−1, and L ≥ 2ⁿ)
pn(k) ∈ {+1, −1} ... (2)
(k = 0, 1, 2, ..., L−1, and L ≥ 2ⁿ)
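The properties required of ref(k) and pn(k) can be illustrated with a short sketch. It uses seeded pseudo-random ±1 sequences as a stand-in, not the Gold code sequences named above; `make_code`, the seeds, and L = 512 are assumptions for illustration only.

```python
import random
from typing import List, Sequence


def make_code(length: int, seed: int) -> List[int]:
    """Return a reproducible pseudo-random sequence of +1/-1 symbols."""
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else -1 for _ in range(length)]


def circular_correlation(x: Sequence[int], y: Sequence[int], shift: int) -> int:
    """Correlate x(k) with y(k + shift), treating y as periodic."""
    n = len(x)
    return sum(x[k] * y[(k + shift) % n] for k in range(n))


L = 512                      # assumed sequence length
ref = make_code(L, seed=1)   # stand-in for the reference code sequence ref(k)
pn = make_code(L, seed=2)    # stand-in for the embedding code sequence pn(k)

# Autocorrelation peak at shift 0, largest off-peak value, largest cross-correlation.
peak = circular_correlation(ref, ref, 0)
off_peak = max(abs(circular_correlation(ref, ref, s)) for s in range(1, L))
cross = max(abs(circular_correlation(ref, pn, s)) for s in range(L))
print(peak, off_peak, cross)
```

The peak equals L by construction, while the off-peak and cross-correlation values should be much smaller. A purpose-built family such as Gold codes gives guaranteed bounds on those values, which random sequences do not.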

Then, as shown in FIG. 6, the n-bit embedded information b is encoded using the phase difference between these two code sequences ref(k) and pn(k). The encoded information sequence c(k) that encodes the n-bit embedded information b is expressed by the following equation (3).
c(k) = ref(k) + pn(k + b) ... (3)
(k = 0, 1, 2, ..., L−1, and L ≥ 2ⁿ)
In equation (3), the code sequence pn(k) is a periodic code sequence of period L, that is, pn(k + L) = pn(k).
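Equation (3) can be sketched directly in code. The function name `encode` and the four-symbol sequences are assumptions chosen so the result can be checked by hand; practical sequences would be far longer.

```python
from typing import List, Sequence


def encode(ref: Sequence[int], pn: Sequence[int], b: int) -> List[int]:
    """Return c(k) = ref(k) + pn(k + b), with pn treated as periodic."""
    length = len(ref)
    if len(pn) != length:
        raise ValueError("ref and pn must have the same length")
    if not 0 <= b < length:
        raise ValueError("b must be a shift in the range 0 .. L-1")
    return [ref[k] + pn[(k + b) % length] for k in range(length)]


ref = [1, -1, 1, 1]     # toy reference code sequence
pn = [1, 1, -1, -1]     # toy information embedding code sequence
c = encode(ref, pn, 1)  # embed the value b = 1 as a phase shift of pn
print(c)
```

Because both inputs are ±1 sequences, every symbol of c(k) is −2, 0, or +2.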

A code of sequence length L can represent L distinct states, from shift 0 to shift L−1. Therefore, when a reference code sequence ref(k) and an information embedding code sequence pn(k) of sequence length L are used, log₂ L bits of information can be encoded using the phase difference between ref(k) and pn(k).
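As a worked instance of this capacity relation, using an illustrative sequence length that is an assumption rather than a value taken from the text:

```latex
L = 1024 \;\Rightarrow\; \log_2 L = 10 \text{ bits}, \qquad b \in \{0, 1, \dots, 1023\}, \qquad L \ge 2^{n} \text{ holds for } n \le 10
```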

Returning to the flowchart of FIG. 2: after the embedded information is encoded in step S13, the corrected parameter value series generation unit 404 modifies the parameter value series of the document obtained in step S12, that is, the character spacing series a(k), according to the encoded information sequence c(k) obtained in step S13, and obtains the corrected parameter value series d(k) (step S14).

The corrected parameter value series d(k) is expressed by the following equation (4).
d(k) = a(k) + δ·c(k) ... (4)
In equation (4), δ is a constant representing the embedding strength; the smaller its value, the less visible the embedded information.
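A minimal sketch of equation (4). The function name `modify_spacing`, the spacing values, and δ = 0.25 are assumptions picked so that the arithmetic is exact and easy to follow.

```python
from typing import List, Sequence


def modify_spacing(a: Sequence[float], c: Sequence[int], delta: float) -> List[float]:
    """Return d(k) = a(k) + delta * c(k) for every character gap k."""
    if len(a) != len(c):
        raise ValueError("a and c must have the same length")
    return [ak + delta * ck for ak, ck in zip(a, c)]


a = [10.0, 10.0, 12.0, 11.0]   # original character spacings (arbitrary units)
c = [2, -2, 0, 2]              # encoded information sequence, symbols in {-2, 0, 2}
d = modify_spacing(a, c, delta=0.25)
print(d)
```

Each gap moves by at most 2δ, so δ directly bounds how far any single character is displaced.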

As described later, in this embodiment, when code sequences with a sequence length L of several hundred to several thousand are used, the embedding strength δ in equation (4) can be made far smaller than the symbol values of the character spacing series a(k). In other words, using long code sequences raises invisibility to the point where the embedded information can hardly be detected by eye.

Next, the document correction unit 405 corrects the appearance of the document data input from the first document input unit 10 using the corrected parameter value series d(k) obtained in step S14 (step S15). In this document correction step, as shown in FIG. 7, the document is modified so that the character spacings obtained from the character strings on the corrected document are equal to the symbol values of the corrected parameter value series d(k).

Specifically, when the original document data is electronic document data in a page description language such as PostScript or in a word processor output format, the description of the electronic document data is modified so that the character spacings are equal to the symbol values of the corrected parameter value series d(k).

When the original document data is raster data of a document captured by a scanner or a digital camera, the result of the character segmentation performed in the parameter value series extraction step (step S12) is reused, and the character images are rearranged so that the distances between the segmented character rectangles are equal to the symbol values of the corrected parameter value series d(k).

Finally, the document data whose character spacing has been corrected using the corrected parameter value series d(k) is output from the document output unit 50 as a paper document or as electronic document data (step S16).

Executing the above series of processes yields a document in which the additional information is embedded. With the information embedding method of this embodiment, the additional information is not written alongside the document as a barcode or the like; it is embedded as the character spacing of the character strings, in a form inseparable from the document content. Therefore, when tracking information for a confidential document is embedded as additional information, an attempt to remove the embedded information in order to obstruct identification of the leak source destroys the document content itself, so a high level of security is obtained.

Also, the code sequences generated by the code sequence group generation unit 402 are random-like and have a small mean symbol value. With this in mind, equation (4) shows that each character spacing a(k) is randomly widened or narrowed, and that the sum of the modifications over a given range (for example, one line) is small. That is, the length of each line after correction is almost the same as in the original document, which increases the invisibility of the embedding.

As described above, the information embedding processing unit 400 or information embedding processing method of this embodiment obtains a series of parameter values relating to document appearance from the character strings on the document input as the embedding target, generates a reference code sequence and an information embedding code sequence, represents the embedded information as the phase difference between the two code sequences, and adds the two code sequences to form the encoded information sequence. The encoded information sequence is then combined with the parameter value series to obtain the corrected parameter value series, and the appearance of the electronic document data or document image is changed according to the corrected parameter value series, thereby embedding the information in the document. This provides the following effects.

Additional information such as tracking information can be embedded not only in the digital data of a document but also in a text document printed on paper. Moreover, because the additional information is embedded directly in parameter values of the document content itself, such as the character spacing, character position, word spacing, line spacing, character height, and character width obtained from the character strings on the document, it is inseparable from the document content, which makes tampering with or removing the information difficult.

Furthermore, because the information is embedded in the character images themselves, the embedded information (additional information) is preserved even after repeated copying as long as the characters themselves do not disappear. The method is therefore robust against the image quality degradation caused by repeated copying.

In particular, because the information is embedded as the phase difference between the reference code sequence and the information embedding code sequence, a character segmentation error during the segmentation performed at restoration time does not change that phase difference, that is, the embedded information. The embedded information can therefore be restored even when a character segmentation error occurs.

(Embedded information restoration process)
Next, the embedded information restoration process, which is a feature of the present invention and is executed in the document processing apparatus 1 of this embodiment described above, is described. This process is executed by the embedded information restoration processing unit 450 built in the processing control unit 40.

FIG. 8 is a block diagram showing an example of the functional configuration of the embedded information restoration processing unit 450. As shown in FIG. 8, the embedded information restoration processing unit 450 of this example has a parameter value series extraction unit 451, a code sequence group generation unit 452, and an embedded information restoration unit 453. The raster data or the like of a document in which additional information is embedded is input to the embedded information restoration processing unit 450 from the second document input unit 20.

The parameter value series extraction unit 451 has, for example, the same configuration as the parameter value series extraction unit 401 of the information embedding processing unit 400. It analyzes the raster data or electronic document data of the document with embedded additional information input from the second document input unit 20, and obtains a series of parameter values relating to document appearance, such as the character spacing, character position, word spacing, line spacing, character height, character width, and character slant obtained from the character strings on the document.

The code sequence group generation unit 452 has, for example, the same configuration as the code sequence group generation unit 402 of the information embedding processing unit 400. It generates a reference code sequence identical to the one generated by the information embedding processing unit 400, together with its shifted versions, and one or more information embedding code sequences identical to those generated by the information embedding processing unit 400, together with their shifted versions.

The embedded information restoration unit 453 restores the embedded information from the parameter value series, extracted by the parameter value series extraction unit 451, relating to the appearance of the document in which the additional information is embedded. It does so by performing decoding between the extracted parameter value series and each sequence in the code sequence group generated by the code sequence group generation unit 452.

The main purpose of the embedded information restoration processing unit 450 of this example is to restore embedded information from the image of a document that has been embedded with additional information and output on paper, but it can also restore the embedded information (additional information) from electronic document data in a page description language such as PostScript or in a word processor output format before output on paper. Therefore, the document data input from the second document input unit 20 is not limited to raster data and may be electronic document data in a page description language or a word processor output format.

Next, the embedded information restoration process executed in the embedded information restoration processing unit 450 configured as above is described.

FIG. 9 is a flowchart showing an example of the procedure of the embedded information restoration process in the embedded information restoration processing unit 450. This process is executed under the control of the CPU 41 (see FIG. 1). The following describes, as an example, restoring the n-bit embedded information b that was embedded in a paper document by the information embedding procedure described above. The input paper document is assumed to include image quality degradation due to repeated copying.

First, when the document data of a paper document with embedded information is input as raster data from the second document input unit 20, such as a scanner or a digital camera (step S21), the parameter value series extraction unit 451 analyzes the raster data of the document input from the second document input unit 20, obtains the spacing between the characters that make up the character strings on the document, and uses these spacings as the series of parameter values relating to the document appearance (step S22).

Here, as an example, the case of restoring embedded information from the image of a document output on paper with additional information embedded is described, but electronic document data in a page description language such as PostScript or in a word processor output format, before output on paper, may also be used as input. In either case, the method of extracting the character spacing series is the same as at embedding time, so its description is omitted.

The character spacing series a'(k) obtained by the parameter value series extraction unit 451 can be expressed by the following equation (5) using equations (3) and (4) above.
a'(k) = d(k) + e(k)
      = a(k) + δ·c(k) + e(k)
      = a(k) + δ·ref(k) + δ·pn(k + b) + e(k) ... (5)
In equation (5), e(k) represents the error component caused by thickening or thinning of characters through repeated copying or by noise and roughness on the characters, together with the detection error of the character spacing detection process.

Next, the code sequence group generation unit 452 generates a reference code sequence ref(k) identical to the one used at embedding time, together with its shifted versions, and an information embedding code sequence pn(k) identical to the one used at embedding time, together with its shifted versions (step S23). The code sequence group generated here is the same as that given by equations (1) and (2) above, so its description is omitted.

Next, the embedded information restoration unit 453 restores the n-bit embedded information b from the character spacing series a'(k) (step S24). The details of this processing are described with reference to FIG. 10.

FIG. 10 shows a case in which segmentation of the h-th character fails and it is segmented together with the preceding character. Because of this segmentation error, the character spacing series a'(k) has a phase shift from the h-th element onward.

Even when a segmentation error causes a phase shift in the character spacing series a'(k) in this way, the embedded information (additional information) can be restored correctly, because the information embedding process of this embodiment embeds the additional information using the phase difference between the reference code sequence ref(k) and the information embedding code sequence pn(k), as described earlier. The phase shift caused by a segmentation error affects ref(k) and pn(k) by the same amount, so their phase difference is unchanged. This principle is explained with reference to FIG. 11.

FIG. 11 shows how information is restored when segmentation of the h-th character fails and it is segmented together with the preceding character. The character spacing series a'(k) is obtained from the segmentation result. Since the characters up to the (h−1)-th are segmented correctly, equation (5) gives
a'(k) = a(k) + ref(k) + pn(k + b) + e(k) ... (6)
for k < h, where the embedding strength is set to δ = 1 to simplify the expressions.

On the other hand, because the h-th character is erroneously merged with the next character when segmented, for k ≥ h the character spacing series a'(k) is given by the following equation (7).
a'(k) = a(k + 1) + ref(k + 1) + pn(k + 1 + b) + e(k + 1) ... (7)
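The index shift in equations (6) and (7) can be sketched as the loss of one gap from the received series. The name `merge_characters` and the stand-in values are assumptions; the series is built so that element k simply holds the value k, which makes the shift visible.

```python
from typing import List, Sequence


def merge_characters(d: Sequence[float], h: int) -> List[float]:
    """Drop gap h, as happens when two adjacent characters are segmented as one."""
    if not 0 <= h < len(d):
        raise ValueError("h must index an existing gap")
    return list(d[:h]) + list(d[h + 1:])


d = [float(k) for k in range(8)]   # stand-in series in which element k equals k
h = 3                              # position of the segmentation error
observed = merge_characters(d, h)
print(observed)
```

Elements before h keep their index, as in equation (6), while every later element moves one position earlier, as in equation (7).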

For such a character spacing series a'(k), the embedded information restoration unit 453 restores the information by the following procedure.

〈Step 1〉
The partial correlations between the character spacing series a'(k) and the reference code sequence ref(k) and its shifted versions are computed. In FIG. 11, the range over which the partial correlation is computed is shown as a dotted rectangular window of width 3. In practice the window width needs to be several tens to several hundreds, but 3 is used here for simplicity.

At window position 0 in FIG. 11, when the partial correlations between the character spacing series a'(k) and the reference code sequence ref(k) and its shifted versions are computed, the correlation with ref(k) itself, that is, with {ref(0), ref(1), ref(2)}, is the largest. Because the reference code sequence has a sharp autocorrelation characteristic as shown in FIG. 5, the correlation with ref(k) is larger than the correlation with any of its shifted versions.

From equation (6), the character spacing series a'(k) also contains the other signal components a(k), pn(k + b), and e(k), but the correlation of these components with the reference code sequence ref(k) is very small and can be ignored. This is because ref(k) is a random-like sequence, and a random-like sequence has low correlation with any sequence other than itself.

〈Step 2〉
Next, the partial correlations between the character spacing series a'(k) and the information embedding code sequence pn(k) and its shifted versions are computed. For the same reason as in step 1, the partial correlation at window position 0 is largest for pn(k + b).

〈Step 3〉
The phase difference between the codes that gave the maximum partial correlation in steps 1 and 2 is computed and taken as the restored embedded information. At window position 0, the phase difference between the reference code sequence ref(k) and the information embedding code sequence pn(k + b), that is, b − 0 = b, is the embedded additional information.

〈Step 4〉
The window position is shifted and the processing is repeated from step 1. For example, at window position 1 the correlations with the reference code sequence ref(k + 1) and the information embedding code sequence pn(k + 1 + b) are the largest, so their phase difference, that is, (1 + b) − 1 = b, is the embedded additional information.

By repeating steps 1 to 4 above, the phase difference at each window position is obtained. Among these phase differences, the one that appears most often is restored as the embedded information. In other words, the embedded information is restored by majority vote.
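The windowed procedure above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: seeded pseudo-random ±1 sequences stand in for the code sequences, each window has its mean removed before correlation (a step not stated in the text, added so the average spacing does not dominate), and `make_code`, `best_shift`, `decode`, and all numeric values are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def make_code(length: int, seed: int) -> List[int]:
    """Reproducible pseudo-random +1/-1 sequence (stand-in for a real code)."""
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else -1 for _ in range(length)]


def best_shift(window: Sequence[float], code: Sequence[int]) -> int:
    """Return the circular shift of `code` with the largest partial correlation."""
    n = len(code)
    mean = sum(window) / len(window)
    centered = [x - mean for x in window]  # assumption: remove the average spacing

    def corr(shift: int) -> float:
        return sum(v * code[(shift + j) % n] for j, v in enumerate(centered))

    return max(range(n), key=corr)


def decode(observed: Sequence[float], ref: Sequence[int], pn: Sequence[int],
           width: int, step: int = 1) -> int:
    """Estimate b by majority vote over the per-window phase differences."""
    n = len(ref)
    votes: Counter = Counter()
    for w in range(0, len(observed) - width + 1, step):
        window = observed[w:w + width]
        s_ref = best_shift(window, ref)      # step 1: best shift of ref
        s_pn = best_shift(window, pn)        # step 2: best shift of pn
        votes[(s_pn - s_ref) % n] += 1       # step 3: phase difference
    return votes.most_common(1)[0][0]        # most frequent value wins


# Synthetic data following equation (5); every number here is an assumption.
L, b, delta = 256, 37, 1.0
ref, pn = make_code(L, 1), make_code(L, 2)
rng = random.Random(3)
a = [10.0 + rng.uniform(-0.3, 0.3) for _ in range(L)]
observed = [a[k] + delta * (ref[k] + pn[(k + b) % L]) + rng.gauss(0.0, 0.3)
            for k in range(L)]
decoded = decode(observed, ref, pn, width=64, step=8)
print(decoded)
```

With these settings the vote is expected to return the embedded value b. The `step` argument only thins out the window positions to keep the pure-Python run time short.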

The partial correlation between the other signal components a(k) and e(k) contained in the character spacing series a'(k) and the reference code sequence ref(k) or the information embedding code sequence pn(k) can occasionally take a large value, but the probability is low, so the correct information is restored by majority vote.

The above is the information restoration procedure in the embedded information restoration processing unit 450. The following shows that, as illustrated in FIG. 11, the information can be restored correctly even when character segmentation fails. In FIG. 11, information restoration after the position of the segmentation error proceeds as follows.

The character spacing series a'(k) at window position h in FIG. 11 is given by equation (7). Therefore, when the partial correlations between a'(k) and the reference code sequence ref(k) and its shifted versions are computed at window position h, the correlation with ref(k + h + 1) is the largest.

Likewise, the partial correlation with the information embedding code sequence pn(k) and its shifted versions is largest for pn(k + h + b + 1). The phase difference is therefore (h + b + 1) − (h + 1) = b, which shows that the correct embedded information b can still be restored despite the segmentation error.
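The invariance of the phase difference under a segmentation error can be checked on synthetic data. The sketch below is illustrative only: it assumes seeded pseudo-random ±1 sequences, a constant original spacing, δ = 1, no noise e(k), and mean removal inside the window; `make_code`, `best_shift`, and the numeric values are hypothetical.

```python
import random
from typing import List, Sequence


def make_code(length: int, seed: int) -> List[int]:
    """Reproducible pseudo-random +1/-1 sequence (stand-in for a real code)."""
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else -1 for _ in range(length)]


def best_shift(window: Sequence[float], code: Sequence[int]) -> int:
    """Return the circular shift of `code` with the largest partial correlation."""
    n = len(code)
    mean = sum(window) / len(window)
    centered = [x - mean for x in window]  # assumption: remove the average spacing
    return max(range(n),
               key=lambda s: sum(v * code[(s + j) % n] for j, v in enumerate(centered)))


L, b, h, width = 256, 37, 100, 128           # all values are assumptions
ref, pn = make_code(L, 1), make_code(L, 2)

# Equation (4) with delta = 1 and a constant original spacing of 10.
d = [10.0 + ref[k] + pn[(k + b) % L] for k in range(L)]

# Segmentation error at position h: one gap is lost, later gaps move up by one.
observed = d[:h] + d[h + 1:]

window = observed[h:h + width]               # window starting at the error position
s_ref = best_shift(window, ref)              # expected to be h + 1
s_pn = best_shift(window, pn)                # expected to be h + 1 + b
phase_difference = (s_pn - s_ref) % L
print(s_ref, s_pn, phase_difference)
```

Both detected shifts move by the same amount, so their difference still equals b, which is the property the text relies on.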

The embedded information (additional information) restored in this way by the embedded information restoration unit 453 is output from the information output unit 60 (step S25).

As described above, the embedded information restoration processing unit 450 or embedded information restoration processing method of this embodiment obtains a parameter value series relating to document appearance from the character strings on a document in which information was embedded by the information embedding processing unit 400 or information embedding processing method described earlier, and computes the partial correlations between this parameter value series and the reference code sequence, the information embedding code sequence, and their shifted versions.

Next, the phase at which the partial correlation is maximal is determined for the reference code sequence and for the information embedding code sequence, and the difference between these two phases is taken as the embedded information restoration value at that partial correlation position. Such restoration values are obtained for a plurality of partial correlation positions, and the value that appears most frequently is restored as the information that was embedded. This provides the following effect.

Because the additional information is embedded in the input document image or electronic document data as the phase difference between the reference code sequence and the information embedding code sequence, the embedded information can be restored even if a segmentation error occurs during character segmentation. FIG. 12 shows examples of character segmentation errors: (A) shows a case in which one character is split, and (B) shows a case in which several characters are concatenated.

Thus, even if a segmentation error, specifically the splitting of a character (A) or the concatenation of several characters (B), occurs during the character segmentation required to restore the embedded information, the phase difference between the embedded reference code sequence and the information embedding code sequence is unchanged, so the embedded information can be reliably restored from that phase difference.
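The final selection step, in which the restoration value that appears most often across the window positions is adopted, can be sketched as follows. The function name `final_restoration_value` and the sample values are hypothetical; on a tie, `Counter.most_common` returns the value that was encountered first.

```python
from collections import Counter


def final_restoration_value(window_values):
    """Return the restoration value that occurs most often across window positions."""
    if not window_values:
        raise ValueError("no window positions were evaluated")
    value, _count = Counter(window_values).most_common(1)[0]
    return value


# Hypothetical per-window values: most windows agree on 5, a few are outliers.
per_window = [5, 5, 12, 5, 5, 3, 5]
print(final_restoration_value(per_window))  # prints 5
```

Windows that straddle a segmentation error may yield an outlier, which is why a vote over many window positions is more robust than a single window.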

(Modification 1)
The embodiment above uses a single information embedding code sequence pn(k). It is also possible to use a plurality of information embedding code sequences pn(m, k) (m = 0, 1, ..., M−1) to multiplex and embed M pieces of information {b0, b1, ..., bM−1}. Multiplexing M pieces of information in this way increases the amount of information that can be embedded in a given character string.

As in the embodiment above, the M pieces of information {b0, b1, ..., bM−1} are embedded as phase differences between the reference code sequence ref(k) and the information embedding code sequences pn(m, k). That is, the information bm is represented by the information embedding code sequence pn(m, k+bm). Each of the sequences pn(m, k) (m = 0, 1, ..., M−1) should have a sharp autocorrelation characteristic as shown in FIG. 4 and a small cross-correlation with the reference code sequence and with every other information embedding code sequence.
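A minimal sketch of this multiplexing follows, under the assumption that independent random ±1 sequences are an acceptable stand-in for codes with sharp autocorrelation and small mutual cross-correlation. The names `embed_multi`, `peak_shift`, and `restore_multi`, the sizes `N`, `L`, and `M`, and the sample values are all illustrative and do not come from the source.

```python
import random

N = 1024   # code length (assumed)
L = 512    # correlation window length (assumed)
M = 3      # number of multiplexed embedding codes (assumed)

rng = random.Random(1)
ref = [rng.choice((-1, 1)) for _ in range(N)]
pns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(M)]


def embed_multi(bs):
    """Return ref(k) + sum over m of pn(m, k + b_m)."""
    return [ref[k] + sum(pns[m][(k + bs[m]) % N] for m in range(M))
            for k in range(N)]


def peak_shift(a, code, max_shift):
    """Return the shift s that maximizes sum_k a(k) * code(k + s)."""
    def corr(s):
        return sum(a[k] * code[(k + s) % N] for k in range(L))
    return max(range(max_shift), key=corr)


def restore_multi(a, max_shift=64):
    """Recover each b_m as the phase of pn(m, .) relative to the phase of ref."""
    s_ref = peak_shift(a, ref, max_shift)
    return [(peak_shift(a, pns[m], max_shift) - s_ref) % N for m in range(M)]


print(restore_multi(embed_multi([3, 17, 42])))
```

Each added code acts as noise for the others, so in this sketch the window is made longer than it would need to be for a single embedding code.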

(Modification 2)
In the embodiment above, it is also possible to monitor the partial correlation value at each window position (partial correlation position) in order to detect (estimate) the approximate position of a segmentation error and its type (concatenation or split), and, for example, to issue a warning that prompts the user to correct the segmentation result.

Step 1 of the embedded information restoration procedure described earlier showed that, at window position 0 in FIG. 11, when the partial correlations between the character interval sequence a'(k) and the reference code sequence ref(k) and its shifted versions are computed, the correlation with ref(k) is the largest. Step 4 showed that, at window position 1, the correlation with ref(k+1) is the largest.

In general, when there is no segmentation error, the partial correlation between a'(k) and the reference code sequence ref(k) and its shifted versions is largest for ref(k+w) at window position w.

However, as shown in FIG. 11, when there is a segmentation error (concatenation), the correlation at a window position w located after the error position is largest for ref(k+w+1). Similarly, when there is a segmentation error (split), the correlation at a window position w located after the error position is largest for ref(k+w−1).

Therefore, by monitoring at each window position the phase of the reference code sequence ref(k) that gives the maximum partial correlation, the approximate position of a segmentation error and its type (concatenation or split) can be detected. FIG. 13 shows the partial correlations between the character interval sequence a'(k) and the reference code sequences ref(k+w) and ref(k+w±1) at each window position.

In FIG. 13, (A) represents the case in which there is a segmentation error (concatenation) at position h. Until the error position h is reached, the partial correlation with ref(k+w) is large and the partial correlations with its shifted versions are small. Near the error position h, however, the partial correlation with ref(k+w) begins to decrease and eventually becomes almost zero.

In contrast, the partial correlation with ref(k+w+1) begins to increase near the error position h, until ref(k+w+1) gives the maximum correlation. When this pattern of falling and rising partial correlations is observed, a segmentation error (concatenation) is known to exist near position h, so the user can be prompted to correct the segmentation result.

In FIG. 13, (B) represents the case in which there is a segmentation error (split) at position h. Until the error position h is reached, the partial correlation with ref(k+w) is large and the partial correlations with its shifted versions are small. Near the error position h, however, the partial correlation with ref(k+w) begins to decrease and eventually becomes almost zero.

In contrast, the partial correlation with ref(k+w−1) begins to increase near the error position h, until ref(k+w−1) gives the maximum correlation. When this pattern of falling and rising partial correlations is observed, a segmentation error (split) is known to exist near position h, so the user can be prompted to correct the segmentation result.

Thus, by monitoring the partial correlations between the character interval sequence and the reference code sequence and its shifted versions, the approximate position and type (concatenation or split) of a segmentation error can be detected (estimated) and the user can be prompted to correct the segmentation result. The user can then reliably correct segmentation errors that are easily overlooked by visual inspection on a display, for example, and running the restoration process again on the corrected document allows the embedded information to be restored more reliably.
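The monitoring described in this modification can be sketched as follows. This is an illustration under stated assumptions rather than the patent's procedure: random ±1 sequences stand in for the codes, a concatenation is modeled as one lost interval and a split as one spurious interval (given the value 0 here), and only the three candidates ref(k+w−1), ref(k+w), and ref(k+w+1) are compared at each window. The names `best_offset` and `detect_segmentation_error`, the sizes `N` and `L`, and the window step are hypothetical.

```python
import random

N = 2048   # code length (assumed)
L = 512    # window length (assumed)

rng = random.Random(2)
ref = [rng.choice((-1, 1)) for _ in range(N)]
pn = [rng.choice((-1, 1)) for _ in range(N)]
b = 7
a = [ref[k] + pn[(k + b) % N] for k in range(N)]   # error-free embedded sequence


def best_offset(a_obs, w):
    """Return d in (0, +1, -1) maximizing the partial correlation with ref(k + w + d)."""
    def corr(d):
        return sum(a_obs[w + k] * ref[(w + k + d) % N] for k in range(L))
    return max((0, 1, -1), key=corr)


def detect_segmentation_error(a_obs, windows):
    """Return (window position, kind) for the first window whose best offset is not 0."""
    kinds = {1: "concatenation", -1: "split"}
    for w in windows:
        d = best_offset(a_obs, w)
        if d != 0:
            return w, kinds[d]
    return None


h = 600
windows = range(0, 1025, 64)
merged = a[:h] + a[h + 1:]      # two characters joined: one interval is lost
split = a[:h] + [0] + a[h:]     # one character split: one spurious interval appears
print(detect_segmentation_error(merged, windows))
print(detect_segmentation_error(split, windows))
print(detect_segmentation_error(a, windows))
```

The reported window position is only approximate: a window starts to favor the shifted reference once roughly half of its samples lie beyond the error, so in this sketch the first flagged window starts up to one window length before h.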

The embodiment above and Modifications 1 and 2 were described for a document processing apparatus that includes both the information embedding processing unit 400 and the embedded information restoration processing unit 450. The present invention is not limited to this application and is equally applicable to a document processing apparatus that includes only one of the two units.

Block diagram showing the schematic configuration of a document processing apparatus according to an embodiment of the present invention.
Block diagram showing an example of the functional configuration of the information embedding processing unit.
Flowchart showing an example of the information embedding procedure in the information embedding processing unit.
Explanatory diagram of the encoding of embedded information.
Explanatory diagram of the autocorrelation characteristics of the reference code sequence and the information embedding code sequence.
Explanatory diagram of the process of encoding the embedded information b using the phase difference between the two code sequences ref(k) and pn(k).
Explanatory diagram of the process of correcting the appearance of document data.
Block diagram showing an example of the functional configuration of the embedded information restoration processing unit.
Flowchart showing the embedded information restoration procedure in the embedded information restoration processing unit.
Diagram showing the case in which segmentation of the h-th character fails and the character is extracted joined to the preceding character.
Diagram showing how information is restored when segmentation of the h-th character fails and the character is extracted joined to the preceding character.
Diagram showing examples of character segmentation errors: (A) shows a case in which one character is split, and (B) shows a case in which several characters are concatenated.
Diagram showing the partial correlations between the character interval sequence a'(k) and the reference code sequences ref(k+w) and ref(k+w±1) at each window position.

Explanation of symbols

1 ... document processing apparatus, 10 ... first document input unit, 20 ... second document input unit, 30 ... embedding information input unit, 40 ... processing control unit, 41 ... CPU, 42 ... I/O circuit, 43 ... ROM, 44 ... RAM, 45 ... HDD (hard disk drive), 50 ... document output unit, 60 ... information output unit, 400 ... information embedding processing unit, 401 ... parameter value sequence extraction unit, 402 ... code sequence group generation unit, 403 ... information encoding unit, 404 ... corrected parameter value sequence generation unit, 405 ... document correction unit, 450 ... embedded information restoration processing unit, 451 ... parameter value sequence extraction unit, 453 ... code sequence group generation unit, 453 ... embedded information restoration unit

Claims

1. A document processing apparatus capable of embedding additional information in document data consisting of character strings, comprising:
first parameter sequence extraction means for extracting, from input document data, a parameter value sequence relating to the document appearance of the character string;
first code sequence group generation means for generating a reference code sequence and an information embedding code sequence;
information encoding means for generating, from input additional information, an encoded information sequence in which the additional information is represented by a phase difference between the reference code sequence generated by the first code sequence group generation means and the information embedding code sequence;
corrected parameter value sequence generation means for combining the encoded information sequence generated by the information encoding means with the parameter value sequence extracted by the first parameter sequence extraction means to generate a corrected parameter value sequence; and
document correction means for correcting the document data using the corrected parameter value sequence generated by the corrected parameter value sequence generation means.

2. The document processing apparatus according to claim 1, wherein the information encoding means phase-shifts the information embedding code sequence relative to the reference code sequence by a phase difference corresponding to the additional information, and adds the phase-shifted information embedding code sequence to the reference code sequence to obtain the encoded information sequence.

3. The document processing apparatus according to claim 1 or 2, wherein the first code sequence group generation means generates a plurality of the information embedding code sequences, and
the information encoding means generates a plurality of encoded information sequences, each represented by a phase difference between the reference code sequence generated by the first code sequence group generation means and a respective one of the plurality of information embedding code sequences.

4. A document processing apparatus capable of restoring the additional information from document data in which the additional information has been embedded by the document processing apparatus according to claim 1, comprising:
second code sequence group generation means for generating a reference code sequence identical to the reference code sequence generated by the first code sequence group generation means, shifted versions of the identical reference code sequence, an information embedding code sequence identical to the information embedding code sequence, and shifted versions of the identical information embedding code sequence;
second parameter sequence extraction means for extracting the parameter value sequence from the document data in which the additional information is embedded; and
information restoring means for restoring the additional information from the document data in which the additional information is embedded, based on partial correlations between the parameter value sequence extracted by the second parameter sequence extraction means and each of the identical reference code sequence, its shifted versions, the identical information embedding code sequence, and its shifted versions generated by the second code sequence group generation means.

5. The document processing apparatus according to claim 4, wherein the information restoring means computes the partial correlations between the parameter value sequence and each of the identical reference code sequence, its shifted versions, the identical information embedding code sequence, and its shifted versions; determines, for the reference code sequence and for the information embedding code sequence, the phase at which the maximum partial correlation value is obtained; takes the difference between those phases as an information restoration value at that partial correlation position; obtains such information restoration values for a plurality of partial correlation positions; and takes the information restoration value that appears most frequently as the final restoration value.

6. The document processing apparatus according to claim 5, wherein the information restoring means detects the approximate position and the type of a character segmentation error from the transition pattern of the partial correlation values, at the plurality of partial correlation positions, between the parameter value sequence and the identical reference code sequence and its shifted versions.

7. The document processing apparatus according to claim 6, wherein the information restoring means prompts the user to correct the character segmentation result when it detects the segmentation error.

8. A document processing method capable of embedding additional information in document data consisting of character strings, comprising:
extracting, from input document data, a parameter value sequence relating to the document appearance of the character string;
generating a reference code sequence and an information embedding code sequence;
generating, from input additional information, an encoded information sequence in which the additional information is represented by a phase difference between the reference code sequence and the information embedding code sequence;
combining the encoded information sequence with the parameter value sequence to generate a corrected parameter value sequence; and
correcting the document data using the corrected parameter value sequence.

9. A document processing method capable of restoring the additional information from document data in which the additional information has been embedded by the document processing method according to claim 8, comprising:
generating a reference code sequence identical to the reference code sequence, shifted versions of the identical reference code sequence, an information embedding code sequence identical to the information embedding code sequence, and shifted versions of the identical information embedding code sequence; and
restoring the additional information from the document data in which the additional information is embedded, based on partial correlations between the parameter value sequence extracted from that document data and each of the identical reference code sequence, its shifted versions, the identical information embedding code sequence, and its shifted versions.

10. A document processing program that causes a computer connected to a document input device for inputting document data consisting of character strings to function as:
first parameter sequence extraction means for extracting, from the document data input from the document input device, a parameter value sequence relating to the document appearance of the character string;
first code sequence group generation means for generating a reference code sequence and an information embedding code sequence;
information encoding means for generating, from input additional information, an encoded information sequence in which the additional information is represented by a phase difference between the reference code sequence generated by the first code sequence group generation means and the information embedding code sequence;
corrected parameter value sequence generation means for combining the encoded information sequence generated by the information encoding means with the parameter value sequence extracted by the first parameter sequence extraction means to generate a corrected parameter value sequence; and
document correction means for correcting the document data using the corrected parameter value sequence generated by the corrected parameter value sequence generation means.