JP4613807B2

JP4613807B2 - Document processing apparatus and document processing method

Info

Publication number: JP4613807B2
Application number: JP2005341139A
Authority: JP
Inventors: 景則長尾
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-11-25
Filing date: 2005-11-25
Publication date: 2011-01-19
Anticipated expiration: 2025-11-25
Also published as: JP2007150606A

Description

The present invention relates to a document processing apparatus and the like used in image forming apparatuses such as copiers, printers, facsimile machines, and multifunction machines that combine these functions.

In recent years, storing documents as electronic data, so-called "digitization", has become common in offices, and copiers, printers, and other devices that turn electronic data into paper documents have also become widespread. As a result, leakage of confidential information from inside companies has become a serious problem, with incidents in which highly confidential information such as customer lists and information on products under development has leaked outside the company. Electronic documents (electronic data) in particular can easily be duplicated into copies identical to the original and can be taken outside very easily over a network, so they have long been a major document management problem for companies. However, thanks to recent advances in encryption, authentication, and access restriction technologies, taking electronic documents out has become relatively difficult as long as appropriate measures are taken.
In contrast, once an electronic document has been printed on paper and become a paper document, there is no means of preventing it from being copied or taken out of the company. Consequently, countermeasures against leakage of confidential information from paper documents currently lag noticeably behind those for electronic documents.

One countermeasure against leakage of confidential information from paper documents is to add tracking information to the paper document in advance. With this technique, information such as the person who output the document, the output device, and the output date and time is recorded on the paper document when an electronic document is printed. Even if a paper document carrying confidential information leaks outside the company, the source of the leak can then be identified from the tracking information recorded on the paper document.
Such tracking information should be highly invisible and highly inseparable from the content. For example, if tracking information is recorded with a highly visible method such as a barcode, it is obvious where on the paper document the tracking information is recorded, which increases the risk that the recorded information will be deciphered or tampered with. Moreover, because a barcode is separable from the content, identification of the leak source can easily be obstructed by cutting out or painting over the barcode.
Tracking information should also be highly resistant to repeated copying. Leaked confidential information may circulate not only as the original but also as copies of it, and even as second-generation and third-generation copies.

Among methods for recording additional information such as tracking information on a paper document, the following have been proposed as highly invisible methods.
For example, there is a method of embedding information by shifting character positions to change the blank lengths before and after a character (see, for example, Patent Document 1). In this method, when the blank lengths before and after the character in which an information bit is to be embedded are P and S, the character position is shifted so that P > S if the information bit is 1 and P < S if it is zero. Additional information is embedded in the document by performing this operation on as many characters as there are bits in the information to be embedded. Because shifting the characters before and after punctuation marks is easily noticed, refinements such as not using characters around punctuation marks for embedding have also been made.
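The P/S rule of Patent Document 1 described above can be sketched as follows. This is only an illustration under assumptions: the function names, the `margin` parameter, and the choice to keep the total blank length P + S constant are hypothetical and are not taken from the cited document.

```python
def embed_bit(prev_gap, next_gap, bit, margin=1.0):
    """Shift one character so that P > S encodes 1 and P < S encodes 0.

    prev_gap (P) and next_gap (S) are the blank lengths before and after
    the character. The total P + S is kept constant (an assumption), and
    margin is the hypothetical difference |P - S| after the shift.
    """
    total = prev_gap + next_gap
    if margin >= total:
        raise ValueError("margin must be smaller than the total blank length")
    p = (total + margin) / 2 if bit == 1 else (total - margin) / 2
    return p, total - p


def read_bit(prev_gap, next_gap):
    """Recover the bit by comparing the two blank lengths."""
    return 1 if prev_gap > next_gap else 0
```

A small `margin` keeps the change hard to see but is easily disturbed by copy degradation, which is the trade-off the following paragraphs discuss.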

There is also a method of embedding information by changing the shape of character patterns, such as their size or font type (see, for example, Patent Document 2). In this method, characters that appear frequently in ordinary documents, such as "の" (no) and "は" (ha), are used as the embedding target characters. Information is embedded in the target characters in order from the beginning of the document. Only when the information bit is 1 is the character replaced with another font or its width increased by about 10% to express the embedded information; when the information bit is zero, nothing is done. To restore the embedded information, the document is compared with the original document before embedding: a different character pattern is interpreted as an embedded 1, and an identical pattern as an embedded zero.

Japanese Unexamined Patent Application Publication No. 2003-230001 (pages 5-7)
Japanese Unexamined Patent Application Publication No. 2001-251490 (pages 4-6)

However, with the methods of embedding additional information disclosed in Patent Document 1 and Patent Document 2, information tracking cannot work adequately on paper documents that have been copied many times. This is because a paper document that has been copied repeatedly suffers image quality degradation such as thickening or thinning of characters, and noise or roughness on the characters.
For example, with a method that embeds information in the shift amount of character positions, as in Patent Document 1, the blank lengths between characters change because of the image quality degradation caused by repeated copying, and the embedded information may not be restored correctly. To avoid this, the shift amount must be made large enough that the magnitude relationship between the blank lengths before and after a character is preserved even if the blank lengths change somewhat. If the shift amount is increased, however, the change made to the document becomes visually apparent and the invisibility of the embedded information is impaired.
In addition, in a document to which so-called kerning has been applied, in which character spacing is adjusted to the neighboring characters so that the character string looks natural, the character spacing is not constant. Changing blank lengths that originally satisfied P > S so that they satisfy P < S therefore requires a large shift of the character position, which again impairs invisibility.

Similarly, the method disclosed in Patent Document 2 also has low resistance to image quality degradation caused by repeated copying. To distinguish differences in shape, such as character pattern size or font type, even in a paper document whose image quality has degraded, the character size must be changed substantially or a font with a markedly different shape must be used. If the shape of the character pattern is changed substantially, however, the change made to the document becomes visually apparent and the invisibility of the embedded information is impaired.
There is also the problem that information cannot be embedded in a document in which embedding target characters such as "の" (no) and "は" (ha) appear infrequently.

The present invention has been made to solve the technical problems described above, and its object is to make it possible to record, on a paper document, additional information that is highly invisible and highly inseparable from the content.
Another object is to make it possible to record additional information that is highly robust against repeated copying.

To this end, the document processing apparatus of the present invention is a document processing apparatus capable of embedding additional information in document data consisting of character strings, and comprises: a parameter sequence extraction unit that extracts, from input document data, a parameter value sequence relating to the document layout of the character strings; an information input unit that inputs additional information to be embedded in the document data; a code sequence generation unit that generates a predetermined code sequence; an information encoding unit that uses the additional information input from the information input unit as a shift amount and generates an encoded information sequence by shifting the predetermined code sequence generated by the code sequence generation unit by that shift amount; a modified parameter sequence generation unit that generates a modified parameter value sequence by combining the encoded information sequence with the parameter value sequence; and a document modification unit that modifies the document data using the generated modified parameter value sequence.
The document data here includes electronic document data in a page description language such as PostScript or in a word processor output format, raster data of a document captured by a scanner or digital camera, and the like. The document layout of the character strings includes character spacing, character position, word spacing, line spacing, character height, character width, character slant, and the like. The same applies hereinafter.
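The embedding step described above can be sketched as follows, using character spacing as the layout parameter. This is a minimal sketch under assumptions: the text only says that the encoded information sequence is "combined" with the parameter value sequence, so the additive combination, the `strength` scale factor, the cyclic form of the shift, and all function names are hypothetical.

```python
def cyclic_shift(code, shift):
    """Return code rotated right by shift positions (modulo its length)."""
    shift %= len(code)
    return code[-shift:] + code[:-shift]


def embed(params, info, code, strength=0.5):
    """Embed the integer info as the shift amount of a +/-1 code sequence.

    params   : layout parameter values, e.g. character spacings
    info     : additional information, 0 <= info < len(code)
    strength : assumed scale of the modification applied to each value
    """
    if not 0 <= info < len(code):
        raise ValueError("info must be a valid shift amount for the code")
    if len(params) < len(code):
        raise ValueError("not enough parameter values to carry the code")
    encoded = cyclic_shift(code, info)       # encoded information sequence
    modified = list(params)
    for i, e in enumerate(encoded):          # combine with the parameters
        modified[i] += strength * e
    return modified
```

Because the information is carried by the shift of the whole sequence rather than by any single character, every modified spacing changes by only `strength`, which is what keeps the individual changes small.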

The apparatus may further comprise: a code sequence group generation unit that generates a code sequence identical to the predetermined code sequence generated by the code sequence generation unit, together with shifted versions of that code sequence; and an information restoration unit that restores the additional information embedded in the document data on the basis of correlation values between the parameter value sequence extracted by the parameter sequence extraction unit and the code sequence and its shifted versions generated by the code sequence group generation unit.
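Restoration by correlation can be sketched as follows. The type of code sequence is not specified in this passage, so a length-31 maximal-length sequence (m-sequence) is assumed here because its off-peak cyclic autocorrelation is uniformly low; the mean removal, the function names, and the use of the largest correlation value as the decision rule are likewise assumptions.

```python
def m_sequence():
    """Length-31 +/-1 sequence from the recurrence a[k] = a[k-3] XOR a[k-5]."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < 31:
        bits.append(bits[-3] ^ bits[-5])
    return [1 if b else -1 for b in bits]


def cyclic_shift(code, shift):
    """Return code rotated right by shift positions (modulo its length)."""
    shift %= len(code)
    return code[-shift:] + code[:-shift]


def recover(params, code):
    """Return the shift whose shifted code correlates best with params."""
    n = len(code)
    mean = sum(params[:n]) / n
    centered = [p - mean for p in params[:n]]   # remove the base spacing
    scores = [
        sum(c * s for c, s in zip(centered, cyclic_shift(code, k)))
        for k in range(n)
    ]
    return max(range(n), key=scores.__getitem__)
```

Since the decision uses the sum over all positions, moderate per-character disturbances such as those caused by copying or kerning tend to average out rather than flip the result.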

The apparatus may further comprise a partial information generation unit that divides the additional information input from the information input unit to generate a plurality of pieces of partial additional information, in which case the information encoding unit generates a partial encoded information sequence for each piece of partial additional information generated by the partial information generation unit, and concatenates the partial encoded information sequences to generate the encoded information sequence.
In this case, the apparatus may further comprise: a code sequence group generation unit that generates a code sequence identical to the predetermined code sequence generated by the code sequence generation unit, together with shifted versions of that code sequence; a partial parameter value sequence generation unit that divides the parameter value sequence extracted by the parameter sequence extraction unit to generate a plurality of partial parameter value sequences; and an information restoration unit that restores the additional information embedded in each partial parameter value sequence on the basis of correlation values between that partial parameter value sequence and the code sequence and its shifted versions generated by the code sequence group generation unit, and concatenates the restored additional information of the partial parameter value sequences to restore the information embedded in the document data.
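The division and concatenation described above can be sketched as follows. The fixed-width bit fields used to divide the additional information, the parameter names, and the function names are all assumptions made for illustration; the text does not say how the additional information is divided.

```python
def cyclic_shift(code, shift):
    """Return code rotated right by shift positions (modulo its length)."""
    shift %= len(code)
    return code[-shift:] + code[:-shift]


def split_info(value, n_parts, bits_per_part):
    """Divide an integer into n_parts fields, most significant field first."""
    mask = (1 << bits_per_part) - 1
    return [
        (value >> (bits_per_part * (n_parts - 1 - i))) & mask
        for i in range(n_parts)
    ]


def encode_concatenated(value, code, n_parts, bits_per_part):
    """Shift the code once per part and concatenate the shifted copies."""
    if (1 << bits_per_part) > len(code):
        raise ValueError("code is too short for this field width")
    encoded = []
    for part in split_info(value, n_parts, bits_per_part):
        encoded.extend(cyclic_shift(code, part))   # partial encoded sequence
    return encoded
```

With a code of length N, one shifted copy can carry roughly log2(N) bits, so dividing the information lets a longer payload be carried by a longer run of characters without lengthening the code itself.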

Furthermore, the apparatus may further comprise a partial information generation unit that divides the additional information input from the information input unit to generate a plurality of pieces of partial additional information, in which case the code sequence generation unit generates a code sequence unique to each piece of partial additional information generated by the partial information generation unit, and the information encoding unit generates a partial encoded information sequence for each piece of partial additional information using its unique code sequence, and multiplexes the partial encoded information sequences to generate the encoded information sequence.
In this case, the apparatus may further comprise: a code sequence group generation unit that generates code sequences identical to the unique code sequences generated by the code sequence generation unit for the respective pieces of partial additional information, together with shifted versions of those code sequences; a partial parameter value sequence generation unit that divides the parameter value sequence extracted by the parameter sequence extraction unit to generate a plurality of partial parameter value sequences; and an information restoration unit that restores the additional information embedded in each partial parameter value sequence on the basis of correlation values between the partial parameter value sequence generated by the partial parameter value sequence generation unit and the code sequences and their shifted versions generated by the code sequence group generation unit, and concatenates the additional information embedded in the partial parameter value sequences to restore the information embedded in the document data.
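Multiplexing with a unique code sequence per part can be sketched as follows. Element-wise addition of the shifted sequences is assumed as the multiplexing operation, and the function names are hypothetical; the text states only that the partial encoded information sequences are multiplexed.

```python
def cyclic_shift(code, shift):
    """Return code rotated right by shift positions (modulo its length)."""
    shift %= len(code)
    return code[-shift:] + code[:-shift]


def encode_multiplexed(parts, codes):
    """Shift each part's own code by that part's value and sum element-wise.

    parts : one shift amount per piece of partial additional information
    codes : one +/-1 code sequence per part, all of the same length
    """
    if len(parts) != len(codes):
        raise ValueError("need exactly one code sequence per part")
    n = len(codes[0])
    if any(len(c) != n for c in codes):
        raise ValueError("all code sequences must have the same length")
    total = [0] * n
    for part, code in zip(parts, codes):
        for i, e in enumerate(cyclic_shift(code, part)):
            total[i] += e                    # superimpose on the same positions
    return total
```

Unlike concatenation, all parts share the same run of characters here, so the codes would need low mutual cross-correlation for each part to be recovered by correlating against its own code.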

In addition, the apparatus may further comprise a partial information generation unit that divides the additional information input from the information input unit to generate a plurality of pieces of partial additional information, in which case the information encoding unit generates, in periodic repetition, the partial encoded information sequence for each piece of partial additional information generated by the partial information generation unit, concatenates the partial encoded information sequences to generate the encoded information sequence, and also generates a synchronization code synchronized with the encoded information sequence, and the modified parameter sequence generation unit multiplexes the synchronization code into the modified parameter value sequence.
In this case, the apparatus may further comprise: a parameter sequence extraction unit that extracts, from input document data, a parameter value sequence relating to the document layout of the character strings; a partial parameter value sequence generation unit that divides the parameter value sequence extracted by the parameter sequence extraction unit to generate a plurality of partial parameter value sequences; a code sequence group generation unit that generates a code sequence identical to the predetermined code sequence generated by the code sequence generation unit together with shifted versions of that code sequence, and a code sequence identical to the synchronization code synchronized with the encoded information sequence together with shifted versions of that code sequence; and an information restoration unit. The information restoration unit restores the additional information embedded in each partial parameter value sequence on the basis of correlation values between the partial parameter value sequence generated by the partial parameter value sequence generation unit and the code sequence and its shifted versions generated by the code sequence group generation unit. It also detects the position of the synchronization code sequence embedded in the parameter value sequence on the basis of correlation values between the parameter value sequence and the synchronization code sequence and its shifted versions, determines the order of the additional information of the partial parameter value sequences from the positional relationship between the detected synchronization code sequence position and the partial parameter value sequences, and concatenates the additional information of the partial parameter value sequences in that order to restore the additional information embedded in the document data.

Viewed as a document processing method, the present invention is a document processing method capable of embedding additional information in document data consisting of character strings, the method comprising: inputting document data consisting of character strings; extracting, from the input document data, a parameter value sequence relating to the document layout of the character strings; using the additional information to be embedded in the document data as a shift amount and generating an encoded information sequence by shifting a predetermined code sequence by that shift amount; generating a modified parameter value sequence by combining the encoded information sequence with the parameter value sequence; and modifying the document data using the modified parameter value sequence.
Here, the method may divide the additional information to generate a plurality of pieces of partial additional information, generate a partial encoded information sequence for each piece of partial additional information, concatenate the partial encoded information sequences to generate the encoded information sequence, and modify the document data using that encoded information sequence.

Furthermore, the document processing method of the present invention is a document processing method capable of restoring additional information from document data in which the additional information is embedded, the method comprising: inputting document data consisting of character strings; extracting, from the input document data, a parameter value sequence relating to the document layout of the character strings; generating a code sequence identical to the code sequence used when the additional information was embedded, together with shifted versions of that code sequence; and restoring the additional information embedded in the document data on the basis of correlation values between the parameter value sequence and the code sequence and its shifted versions.
Here, the method may divide the extracted parameter value sequence to generate a plurality of partial parameter value sequences, restore the additional information of each partial parameter value sequence on the basis of correlation values between the generated partial parameter value sequence and the code sequence and its shifted versions, and concatenate the restored additional information of the partial parameter value sequences to restore the additional information embedded in the document data.

According to the present invention, deciphering, tampering with, and removing the embedded additional information can be made difficult. In addition, the embedded additional information can be restored stably even from kerned documents and from paper documents that have been copied repeatedly.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
[Embodiment 1]
FIG. 1 is a block diagram showing the configuration of a document processing apparatus 1 to which this embodiment is applied. The document processing apparatus 1 shown in FIG. 1 comprises: a first document input unit 10 that inputs document data or the like in which additional information is to be embedded into a processing unit 30; a second document input unit 20 that inputs raster data or the like in which additional information has been embedded into the processing unit 30; an embedded information input unit 40 that inputs the additional information to be embedded in the document data or the like input from the first document input unit 10; the processing unit 30, which executes either a process of embedding the additional information input from the embedded information input unit 40 into the document data or the like input from the first document input unit 10, or a process of restoring the embedded additional information from the raster data or the like input from the second document input unit 20; a document output unit 50 that outputs, in a predetermined format, the document data or the like in which the processing unit 30 has embedded the additional information; and an information output unit 60 that outputs the additional information restored by the processing unit 30.

The first document input unit 10 inputs the document data in which additional information is to be embedded into the processing unit 30. The document data may be electronic document data in a page description language such as PostScript or in a word processor output format, or raster data of a document captured by a scanner or digital camera. The first document input unit 10 can be implemented by, for example, a mass storage device in which document data is stored, such as a hard disk drive (HDD) or a DVD (Digital Video Disc)-RAM/±RW/±R drive; a data transfer device that exchanges data via a network or the like; a scanner and its control device; or a memory reader that retrieves images stored in the memory (card) of a digital camera and its control device.

The second document input unit 20 inputs raster data of a document in which additional information has been embedded. The raster data of a document may be image data of a paper document captured by a scanner or digital camera. The second document input unit 20 can also have the same configuration as the information embedding processing unit 300 described later.

The embedded information input unit 40 inputs, into the processing unit 30, the additional information to be embedded in the document data input from the first document input unit 10. Any digital data can be used as the additional information, including numbers, characters, URLs, and multimedia data such as audio and images. The embedded information input unit 40 can be implemented by, for example, a keyboard for entering characters and the like; a mass storage device for inputting multimedia data and the like, such as a hard disk drive (HDD) or a DVD (Digital Video Disc)-RAM/±RW/±R drive; or a data transfer device that exchanges information via a network or the like.

The document output unit 50 outputs, in a predetermined format, the document data or the like whose document layout has been modified by the processing unit 30. The output of the document output unit 50 is, for example, a document printed on paper, electronic document data in a page description language such as PostScript or in a word processor output format, or a file in a raster data format. The document output unit 50 is therefore implemented by, for example, a printer and its control device, a read/write device for magnetic disks, memory cards, or the like and its control device, or a data transfer device that exchanges data via a network or the like.

The information output unit 60 outputs the restored additional information. The information output unit 60 is implemented by, for example, a display device such as a CRT (Cathode Ray Tube) or liquid crystal display and its control device, a printer and its control device, a read/write device for magnetic disks, memory cards, or the like and its control device, or a data transfer device that exchanges data via a network or the like.

The processing unit 30 includes: a CPU (Central Processing Unit) 31 that performs arithmetic processing; an I/O circuit 32 that manages input and output with peripheral devices such as the first document input unit 10, the second document input unit 20, the embedded information input unit 40, the document output unit 50, and the information output unit 60; a ROM 33 in which processing programs are stored; a RAM 34, made up of DRAM (Dynamic Random Access Memory) or the like, serving as a primary storage device; and a hard disk drive (HDD) 35 that stores the document data processed by the CPU 31.

次に、本実施の形態の文書処理装置１において実行される情報埋め込み処理について説明する。
文書処理装置１における情報埋め込み処理は、処理部３０内に構築された情報埋め込み処理部３００で実行される。図２は、情報埋め込み処理部３００の機能構成を説明するブロック図である。図２に示したように、情報埋め込み処理部３００は、パラメータ値系列抽出部３０１、符号系列生成部３０２、情報符号化部３０３、修正パラメータ値系列生成部３０４、文書修正部３０５を備えて構成されている。 Next, an information embedding process executed in the document processing apparatus 1 according to the present embodiment will be described.
The information embedding process in the document processing apparatus 1 is executed by the information embedding processing unit 300 built in the processing unit 30. FIG. 2 is a block diagram illustrating a functional configuration of the information embedding processing unit 300. As shown in FIG. 2, the information embedding processing unit 300 includes a parameter value sequence extraction unit 301, a code sequence generation unit 302, an information encoding unit 303, a correction parameter value sequence generation unit 304, and a document correction unit 305.

パラメータ値系列抽出部３０１は、第１文書入力部１０から入力された電子文書データや文書のラスタデータを解析し、文書上の文字列から得られる文字間隔、文字位置、単語間隔、行間隔、文字高さ、文字幅、文字の傾き等の文書体裁に関するパラメータ値の系列を求めるものである。
符号系列生成部３０２は、埋め込み情報入力部４０から入力される付加情報を符号化した符号化情報を生成するためのものであり、単一の符号系列を生成する。
情報符号化部３０３は、埋め込み情報入力部４０から入力された付加情報を、符号系列生成部３０２により生成された符号系列によって符号化した符号化情報系列を生成するものである。ここでは、入力された情報に応じて符号系列のシフト量を変えたものを符号化情報系列とする。 The parameter value series extraction unit 301 analyzes the electronic document data or the raster data of the document input from the first document input unit 10, and obtains a series of parameter values related to the document appearance, such as character spacing, character position, word spacing, line spacing, character height, character width, and character inclination, obtained from the character strings on the document.
The code sequence generation unit 302 is for generating encoded information obtained by encoding the additional information input from the embedded information input unit 40, and generates a single code sequence.
The information encoding unit 303 generates an encoded information sequence obtained by encoding the additional information input from the embedded information input unit 40 with the code sequence generated by the code sequence generating unit 302. Here, an encoded information sequence is obtained by changing the shift amount of the code sequence in accordance with the input information.

修正パラメータ値系列生成部３０４は、パラメータ値系列抽出部３０１により求められた入力文書に関するパラメータ値系列の修正を行なうものである。修正パラメータ値系列は、パラメータ値系列の各シンボル値を、情報符号化部３０３により求められた符号化情報系列の対応するシンボル値に応じて修正することにより得られる。
文書修正部３０５は、第１文書入力部１０から入力された文書の体裁を、修正パラメータ値系列生成部３０４により求められた修正パラメータ値系列の各シンボル値に応じて修正するものである。そして、ここでは、修正後の文書上の文字列から得られる文字間隔、文字位置、単語間隔、行間隔、文字高さ、文字幅、文字の傾き等の、文書体裁に関するパラメータ値が、修正パラメータ値系列の各シンボル値に等しくなるよう修正される。 The corrected parameter value series generation unit 304 corrects the parameter value series related to the input document obtained by the parameter value series extraction unit 301. The corrected parameter value series is obtained by correcting each symbol value of the parameter value series in accordance with the corresponding symbol value of the encoded information sequence obtained by the information encoding unit 303.
The document correction unit 305 corrects the appearance of the document input from the first document input unit 10 according to each symbol value of the correction parameter value series obtained by the correction parameter value series generation unit 304. Here, the document is corrected so that the parameter values related to the document format obtained from the character strings on the corrected document, such as character spacing, character position, word spacing, line spacing, character height, character width, and character inclination, are equal to the respective symbol values of the correction parameter value series.

続いて、本実施の形態の文書処理装置１で用いられる情報埋め込み処理方法について述べる。図３は、文書処理装置１で実行される情報埋め込み処理の手順の一例を示したフローチャートである。以下に示した例では、埋め込むべき付加情報はｎビットの２値データ、情報を埋め込むための文書体裁に関するパラメータとして文字間隔を用いる場合について説明する。 Next, an information embedding processing method used in the document processing apparatus 1 according to this embodiment will be described. FIG. 3 is a flowchart showing an example of the procedure of the information embedding process executed by the document processing apparatus 1. In the example shown below, a case will be described in which additional information to be embedded is n-bit binary data, and character spacing is used as a parameter relating to a document format for embedding information.

まず、文書入力ステップ（Ｓ１０１）において、第１文書入力部１０からパラメータ値系列抽出部３０１に対して、付加情報の埋め込み対象となる電子文書データが入力される。
次のパラメータ値系列抽出ステップ（Ｓ１０２）において、パラメータ値系列抽出部３０１は、第１文書入力部１０から入力された電子文書データや文書のラスタデータを解析し、文書上の文字列を構成する各文字間の間隔を求め、これを文書体裁に関するパラメータ値の系列とする。
具体的には、図４に示したように、第１文書入力部１０から入力された文書上の文字列を構成する各文字間の間隔を並べた系列を、この文書のパラメータ値系列ａ（ｋ）、（ｋ＝０，１，２，・・・，Ｌ−１）とする。文字間隔としては文書上のどの位置における文字間隔を用いても良いが、本実施の形態では、文書先頭からＬ個分(Ｌ≧２^ｎ)の文字間隔を用いるものとする。また、文書上の１行から得られる文字間隔数がＬ個に満たない場合は、続く複数行から得られる文字間隔パラメータを連結して系列長Ｌのパラメータ値系列を得るものとする。 First, in the document input step (S101), electronic document data to be embedded with additional information is input from the first document input unit 10 to the parameter value series extraction unit 301.
In the next parameter value series extraction step (S102), the parameter value series extraction unit 301 analyzes the electronic document data or the raster data of the document input from the first document input unit 10, obtains the intervals between the characters constituting the character strings on the document, and uses them as the series of parameter values relating to the document style.
Specifically, as shown in FIG. 4, the series formed by arranging the intervals between the characters constituting the character strings on the document input from the first document input unit 10 is taken as the parameter value series a(k) (k = 0, 1, 2, ..., L−1) of this document. Character spacings at any position on the document may be used, but in this embodiment the first L character spacings from the beginning of the document (L ≥ 2ⁿ) are used. When the number of character spacings obtained from one line of the document is less than L, the character spacing parameters obtained from the following lines are concatenated to obtain a parameter value series of length L.

ここで、パラメータ値系列抽出ステップ（Ｓ１０２）において、入力される電子文書データがＰｏｓｔＳｃｒｉｐｔ等のページ記述言語やワードプロセッサ出力形式である場合は、電子文書データを解析することにより文字間隔を直接求めることができる。電子文書データを解析する方法については特定の方法に限定されるものではなく、電子文書データのフォーマットに応じて適当な方法を用いることが可能である。
また、電子文書データがスキャナやデジタルカメラにより撮影された文書のラスタデータである場合は、文字認識の前処理として一般的に行われる文字の切り出し処理を行い、切り出された各文字矩形間の距離を文字間隔とすることができる。文字の切り出し処理についても特定の方法に限定されるものではなく、一般に用いられる適当な手法を用いることが可能である。
なお、本実施の形態では、情報を埋め込むための文書体裁に関するパラメータとして文字間隔を用いるが、パラメータ値としては文字位置、単語間隔、行間隔、文字高さ、文字幅、文字の傾き、さらにはこれらのいくつかを組み合わせたものを利用することも可能である。 Here, in the parameter value series extraction step (S102), if the input electronic document data is in a page description language such as PostScript or a word processor output format, the character spacing can be directly obtained by analyzing the electronic document data. it can. The method of analyzing the electronic document data is not limited to a specific method, and an appropriate method can be used according to the format of the electronic document data.
When the electronic document data is raster data of a document captured by a scanner or a digital camera, the character segmentation generally performed as preprocessing for character recognition is applied, and the distance between adjacent segmented character rectangles can be used as the character spacing. The character segmentation is likewise not limited to a specific method, and any commonly used method may be applied.
In this embodiment, character spacing is used as the document format parameter for embedding information, but character position, word spacing, line spacing, character height, character width, character inclination, or a combination of several of these may also be used as the parameter.
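
As a minimal sketch of the parameter value series extraction step (S102) for raster input, the following Python fragment computes the character spacing series a(k) from already segmented character rectangles, concatenating lines until L spacings are collected. The function name `spacing_series` and the representation of each rectangle as a `(left, right)` pair of horizontal coordinates are assumptions made for illustration; the character segmentation itself is not shown.

```python
def spacing_series(lines, L):
    """Return the first L character spacings a(0..L-1) of a document.

    `lines` is a list of text lines; each line is a list of (left, right)
    horizontal extents of its character rectangles, in reading order.
    (Hypothetical input format, not specified by the source.)
    """
    a = []
    for boxes in lines:
        # Gap between the right edge of one character and the left edge of the next.
        for (_, right), (next_left, _) in zip(boxes, boxes[1:]):
            a.append(next_left - right)
            if len(a) == L:
                return a
    raise ValueError("document yields fewer than L character spacings")


lines = [[(0, 8), (10, 18), (21, 29)], [(0, 8), (11, 19)]]
print(spacing_series(lines, 3))  # [2, 3, 3]
```

The second line contributes its gap only because the first line supplies fewer than L spacings, which mirrors the concatenation rule described above.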

次の埋め込み情報符号化ステップ(Ｓ１０３)では、埋め込み情報入力部４０から入力されたｎビットの２値データを、情報符号化部３０３にて、符号系列生成部３０２により生成される符号系列により符号化し、符号化情報系列を求める。ここでは、埋め込み情報入力部４０から入力されたｎビットの２値データをｂで表すこととする。 In the next embedded information encoding step (S103), n-bit binary data input from the embedded information input unit 40 is encoded by the code sequence generated by the code sequence generation unit 302 in the information encoding unit 303. To obtain an encoded information sequence. Here, n-bit binary data input from the embedded information input unit 40 is represented by b.

以下に、埋め込み情報符号化ステップ（Ｓ１０３）において符号化情報系列を求める処理の詳細について説明する。
符号系列生成部３０２により生成される符号系列は、シンボル値の平均が小さな値で、ランダム性を有するものが用いられる。また、生成される符号系列は図５に示すような鋭い自己相関特性を持つものとする。このような性質を有する公知の符号系列としてはｍ系列があるが、符号系列生成部３０２により生成される符号系列はこれらに限るものではなく、上記の性質を有するものであれば良い。
ここでは、符号系列生成部３０２により生成される符号系列として、１周期の系列長がＬのｍ系列を用いる例について記す。ただし、系列長Ｌは２^ｎ以上とする。また、符号系列をｐｎ（ｋ）で表し、そのシンボル値は±１のいずれかをとるものとする。すなわち、符号系列ｐｎ（ｋ）は、次式（１）で表される。なお、以下の記載において、式（１）から式（２６）は、[数１]から[数２６]に対応するものとする。 Details of the process for obtaining the encoded information sequence in the embedded information encoding step (S103) will be described below.
The code sequence generated by the code sequence generation unit 302 is one whose symbol values have a small mean and which has randomness. The generated code sequence is also assumed to have a sharp autocorrelation characteristic as shown in FIG. 5. The m-sequence is a known code sequence with these properties, but the code sequence generated by the code sequence generation unit 302 is not limited to it, and any code sequence having the above properties may be used.
Here, an example is described in which an m-sequence whose length per period is L is used as the code sequence generated by the code sequence generation unit 302, where the sequence length L is 2ⁿ or more. The code sequence is denoted pn(k), and each of its symbol values is either +1 or −1. That is, the code sequence pn(k) is expressed by the following equation (1). In the following description, equations (1) to (26) correspond to [Equation 1] to [Equation 26].
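
Equation (1) itself is not reproduced in this text. As an illustrative sketch only, the following Python fragment generates one period of a length-15 m-sequence with a 4-stage linear feedback shift register and maps it to symbol values ±1. The feedback polynomial x⁴ + x + 1, the initial state, and the bit-to-symbol mapping are assumptions, so the resulting sequence may differ from the pn(k) used in the embodiment; it is meant only to show the two stated properties (small symbol mean, sharp autocorrelation).

```python
def m_sequence():
    """One period (L = 15) of a +/-1 m-sequence from s[k+4] = s[k] XOR s[k+1].

    The recurrence corresponds to the primitive polynomial x^4 + x + 1
    (an assumed choice; the source does not state which one it uses).
    """
    bits = [1, 0, 0, 0]                      # assumed non-zero initial state
    while len(bits) < 15:
        bits.append(bits[-4] ^ bits[-3])
    return [2 * b - 1 for b in bits]         # map bit 0 -> -1, bit 1 -> +1


def autocorrelation(pn, shift):
    """Periodic autocorrelation of pn at the given shift."""
    L = len(pn)
    return sum(pn[k] * pn[(k + shift) % L] for k in range(L))


pn = m_sequence()
print(sum(pn))                                        # 1
print([autocorrelation(pn, i) for i in range(15)])    # 15 at shift 0, -1 elsewhere
```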

この符号系列ｐｎ（ｋ）を用いてｎビットの埋め込み情報ｂを情報符号化部３０３にて符号化する。本実施の形態では、図６に示したように、情報の符号化法として、符号系列ｐｎ（ｋ）のシフト量で情報を表現する。系列長Ｌの符号は、シフトゼロからシフトＬ−１までの、都合Ｌ通りの状態を表現し得る。したがって、系列長Ｌの符号系列ｐｎ（ｋ）を用いてｌｏｇ_２Ｌビットの情報を符号化できることになる。この原理を利用して埋め込み情報ｂを符号化した符号化情報系列ｃ（ｋ）を求める。すなわち、符号化情報系列ｃ（ｋ）は、次式（２）で表される。ただし、次式（２）において、符号系列ｐｎ（ｋ）は周期Ｌの周期符号系列（ｐｎ（ｋ＋Ｌ）＝ｐｎ（ｋ））とみなす。 The information encoding unit 303 encodes the n-bit embedded information b using this code sequence pn(k). In the present embodiment, as shown in FIG. 6, information is represented by the shift amount of the code sequence pn(k). A code of sequence length L can represent L distinct states, from shift 0 to shift L−1, so log₂L bits of information can be encoded using the code sequence pn(k) of sequence length L. Using this principle, the encoded information sequence c(k) that encodes the embedded information b is obtained. That is, the encoded information sequence c(k) is expressed by the following equation (2), in which the code sequence pn(k) is regarded as a periodic code sequence of period L (pn(k+L) = pn(k)).
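
Equation (2) is not reproduced in this text. A reading consistent with the later correlation against pn(k+i) is c(k) = pn(k+b) with indices taken modulo L; the sketch below uses that form as an assumption. The ±1 sequence `PN` is an illustrative length-15 m-sequence, not necessarily the one given in equation (4), and `encode` is a hypothetical helper name.

```python
# Illustrative length-15 m-sequence with symbols +/-1 (assumed, see lead-in).
PN = [1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, 1]


def encode(b, pn=PN):
    """Encode the integer b (0 <= b < L) as the cyclic shift c(k) = pn((k + b) mod L)."""
    L = len(pn)
    if not 0 <= b < L:
        raise ValueError("b must be in the range 0..L-1")
    return [pn[(k + b) % L] for k in range(L)]


c = encode(5)
print(c)  # [-1, -1, 1, 1, -1, 1, -1, 1, 1, 1, 1, -1, -1, -1, 1]
```

Each of the L shifts yields a different sequence, which is what allows a single code sequence to carry log₂L bits.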

次の修正パラメータ値系列生成ステップ（Ｓ１０４）では、修正パラメータ値系列生成部３０４にて、パラメータ値系列抽出ステップ（Ｓ１０２）にて求められた文字間隔の系列ａ（ｋ）を、埋め込み情報符号化ステップ（Ｓ１０３）で得られた符号化情報系列ｃ（ｋ）に応じて修正した、修正パラメータ値系列ｄ（ｋ）を求める。すなわち、修正パラメータ値系列生成部３０４において、次式（３）の演算が行なわれる。 In the next modified parameter value sequence generation step (S104), the modified parameter value sequence generation unit 304 obtains a modified parameter value sequence d(k) by modifying the character spacing sequence a(k) obtained in the parameter value sequence extraction step (S102) according to the encoded information sequence c(k) obtained in the embedded information encoding step (S103). That is, the modified parameter value sequence generation unit 304 performs the calculation of the following equation (3).

式（３）におけるδは情報の埋め込み強度を示す定数であり、小さな値にするほど埋め込み情報の不可視性が高まる。後述するように、本実施の形態では、系列長Ｌが数百から数千の符号系列を用いた場合、式（３）のδはａ（ｋ）の各シンボル値に比べてはるかに小さな値とすることができる。すなわち、整列長の長い符号系列を用いれば、埋め込み情報がほとんど目視により判別することができない程度にまで不可視性を高めることができる。
また、ここで注目すべきは、本実施の形態では、ｎビットの情報を埋め込むに際して、文字間隔の系列ａ（ｋ）の変動幅を [−δ,δ]に納めることができることである。つまり、ｎビットの情報を埋め込むに際して、それに対応したｎ個の符号化情報系列を多重化する従来の方法では、文字間隔の系列ａ（ｋ）を［−ｎδ,ｎδ］の範囲で変動させる必要があった。これに対して、本実施の形態では文字間隔の系列ａ（ｋ）の変動幅を従来の１／ｎにすることができるため、埋め込み情報の不可視性を大幅に改善することが可能となる。 In Equation (3), δ is a constant indicating the embedding strength of the information; the smaller its value, the higher the invisibility of the embedded information. As will be described later, in the present embodiment, when a code sequence with a sequence length L of several hundred to several thousand is used, δ in Equation (3) can be made much smaller than the symbol values of a(k). That is, if a code sequence with a long sequence length is used, invisibility can be raised to the point where the embedded information can hardly be discerned visually.
It should also be noted that, in the present embodiment, the variation of the character spacing series a(k) when embedding n bits of information can be kept within [−δ, δ]. In the conventional method, which multiplexes n encoded information sequences corresponding to the n bits, the character spacing series a(k) had to be varied within the range [−nδ, nδ]. In the present embodiment, by contrast, the variation of the character spacing series a(k) can be reduced to 1/n of the conventional range, so the invisibility of the embedded information can be greatly improved.
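
Equation (3) is not reproduced in this text. From the description of δ and of the variation range, it can be read as the update in the first line below, stated here as a reconstruction rather than a quotation. The second line contrasts it with multiplexing n sequences c_j(k), a hypothetical notation for the conventional method, whose summed modification can reach ±nδ.

```latex
\begin{align}
d(k) &= a(k) + \delta\, c(k), \qquad c(k) \in \{+1, -1\}
      \;\Rightarrow\; d(k) - a(k) \in \{-\delta, +\delta\}, \\
d_{\mathrm{conv}}(k) &= a(k) + \delta \sum_{j=1}^{n} c_j(k), \qquad c_j(k) \in \{+1, -1\}
      \;\Rightarrow\; d_{\mathrm{conv}}(k) - a(k) \in [-n\delta,\, n\delta].
\end{align}
```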

次の文書修正ステップ（Ｓ１０５）では、文書修正部３０５にて、修正パラメータ値系列生成ステップ（Ｓ１０４）にて求められた修正パラメータ値系列ｄ（ｋ）を用いて、第１文書入力部１０から入力された文書データの体裁を修正する。ここでは、図７に示したように、修正後の文書上の文字列から得られる文字間隔が、修正パラメータ値系列ｄ（ｋ）の各シンボル値に等しくなるよう修正される。 In the next document correction step (S105), the document correction unit 305 uses the correction parameter value series d (k) obtained in the correction parameter value series generation step (S104) from the first document input unit 10. Correct the appearance of the input document data. Here, as shown in FIG. 7, the character spacing obtained from the character string on the corrected document is corrected so as to be equal to each symbol value of the correction parameter value series d (k).

文書修正ステップ（Ｓ１０５）においては、元の文書データがＰｏｓｔＳｃｒｉｐｔ等のページ記述言語やワードプロセッサ出力形式の電子文書データである場合は、文字間隔が修正パラメータ値系列ｄ（ｋ）の各シンボル値に等しくなるよう、電子文書データの記述を修正する。
一方、文書データがスキャナやデジタルカメラにより撮影された文書のラスタデータである場合は、パラメータ値系列抽出ステップ（Ｓ１０２）で行なった文字の切り出し処理の結果を利用し、切り出された各文字矩形間の距離が修正パラメータ値系列ｄ（ｋ）の各シンボル値に等しくなるよう、各文字画像を再配置する。 In the document correction step (S105), when the original document data is electronic document data in a page description language such as PostScript or in a word processor output format, the description of the electronic document data is modified so that the character spacings are equal to the respective symbol values of the correction parameter value series d(k).
On the other hand, when the document data is raster data of a document captured by a scanner or a digital camera, the result of the character segmentation performed in the parameter value series extraction step (S102) is used, and each character image is rearranged so that the distances between the segmented character rectangles are equal to the respective symbol values of the correction parameter value series d(k).

そして、次の文書出力ステップ（Ｓ１０６）において、文書出力部５０から、修正パラメータ値系列ｄ（ｋ）を用いて文字間隔を修正した文書データが紙文書や電子文書データの形式で出力される。 In the next document output step (S106), the document output unit 50 outputs the document data in which the character spacing is corrected using the correction parameter value series d (k) in the form of a paper document or electronic document data.

このようにして付加情報が埋め込まれた文書が得られる。本実施の形態の文書処理方法では、情報はバーコードなどで文書に併記されるものと異なり、文字列の文字間隔として文書コンテンツと不可分の形式で埋め込まれる。したがって、機密文書の追跡情報などを埋め込み情報として文書に埋め込んだ場合、流出元の特定を妨害するために埋め込み情報を取り除こうとすると文書コンテンツそのものを失うことになり、高いセキュリティ性が得られる。
また上述したように、本実施の形態で用いられる符号系列は、シンボル値の平均が小さな値で、ランダム性を有するものである。このことを前提に式（３）を眺めると、各文字の間隔ａ（ｋ）はランダムに広げられたり狭められたりし、かつ、ある一定の範囲（例えば１行）を見れば、修正量の和は小さな値になることがわかる。つまり、修正後の各行の長さは元文書の行の長さとほとんど変わらないため、情報埋め込み処理の不可視性を高めることが可能となる。
さらに本実施の形態の情報埋め込み処理方法は、文字画像そのものに情報を埋め込むものである。そのため、繰り返しコピーをとった場合でも、文字そのものが消えてしまわない限り埋め込み情報が保存され、繰り返しコピーによる画質劣化に対して堅牢である（ロバスト性が高い）という特長も併せ持っている。 In this way, a document in which additional information is embedded is obtained. In the document processing method according to the present embodiment, information is embedded in a form that is inseparable from the document content as the character spacing of the character string, unlike information that is written in the document with a barcode or the like. Therefore, when the tracking information of a confidential document or the like is embedded as embedded information in the document, the document content itself is lost when the embedded information is removed in order to prevent the identification of the leakage source, and high security can be obtained.
Further, as described above, the code sequence used in the present embodiment has symbol values with a small mean and has randomness. Viewing equation (3) on this premise, the spacing a(k) between characters is randomly widened or narrowed, and within a certain range (for example, one line) the sum of the correction amounts is small. That is, the length of each line after correction is almost the same as the line length in the original document, so the invisibility of the information embedding can be increased.
Furthermore, the information embedding processing method of the present embodiment embeds information in the character image itself. Therefore, even when repeated copies are made, the embedded information is preserved as long as the characters themselves do not disappear, and it has the feature of being robust against image quality deterioration due to repeated copies (high robustness).
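
As a worked check of the line-length argument, assume the update d(k) = a(k) + δ c(k) with c(k) a cyclic shift of a ±1 m-sequence (both are assumptions, since equations (2) and (3) are not reproduced in this text). Over one full period an m-sequence has exactly one more symbol of one sign than of the other, so the total change in spacing has magnitude δ regardless of L:

```latex
\[
\sum_{k=0}^{L-1} \bigl(d(k) - a(k)\bigr)
  = \delta \sum_{k=0}^{L-1} c(k)
  = \pm\,\delta .
\]
```

For a range shorter than a full period, such as a single line, the sum is not fixed, but it is expected to remain small for a random-like sequence.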

ここで、上述した本実施の形態の文書処理装置１での情報埋め込み処理を具体例に基づき説明する。なお、本実施の形態の情報埋め込み処理はかかる具体例に限定されるものではない。
本具体例では、埋め込むべき情報ｂを３ビットの情報、その値を（ｂ＝５）とし、符号系列の系列長Ｌを１５とする簡単な例を示す。なお、本来、埋め込み情報の不可視性を高めるためには、符号系列の系列長Ｌを数百から数千に設定する必要があるが、ここでは理解を容易にするために、Ｌ＝１５とする。すなわち、本具体例では次式（４）を用いるものとする。 Here, the information embedding process in the document processing apparatus 1 of the present embodiment described above will be described based on a specific example. Note that the information embedding process of the present embodiment is not limited to such a specific example.
This specific example is a simple one in which the information b to be embedded is 3-bit information with the value b = 5, and the sequence length L of the code sequence is 15. In practice, the sequence length L of the code sequence must be set to several hundred to several thousand to increase the invisibility of the embedded information, but L = 15 is used here for ease of understanding. That is, the following equation (4) is used in this specific example.

この符号系列ｐｎ（ｋ）と埋め込み情報ｂから、符号化情報系列ｃ（ｋ）は式（２）に従って次式（５）のようになる。 From this code sequence pn (k) and embedded information b, the encoded information sequence c (k) is expressed by the following equation (5) according to equation (2).

求められた符号化情報系列ｃ（ｋ）を用いて、入力文書に対するパラメータ値系列である文字間隔ａ（ｋ）を式（３）に従って修正する。文字間隔ａ（ｋ）は例えば次式（６）のようなものであるとする。 Using the obtained encoded information sequence c (k), the character spacing a (k), which is a parameter value sequence for the input document, is corrected according to the equation (3). The character spacing a (k) is assumed to be, for example, the following equation (6).

このように、文字間隔は等間隔である必要はなく、文字列が見た目に自然に見えるように前後の文字に合わせて文字間隔を調節した、所謂カーニングの施されたものであっても良い。
また、情報の埋め込み強度は、δ＝２とする。符号系列の系列長Ｌを数百から数千とすれば、情報の埋め込み強度として小さな値を設定し、埋め込み情報の不可視性を高めることができる。しかし、この例では系列長Ｌ＝１５と小さいので、情報の埋め込み強度はある程度大きな値にする必要がある。
これらの前提の下に、修正パラメータ値系列ｄ（ｋ）を式（３）に従って求めると次式（７）のようになる。 As this shows, the character spacings need not be equal; they may be kerned, that is, adjusted to the neighboring characters so that the character string looks natural.
The information embedding strength is set to δ = 2. If the sequence length L of the code sequence is several hundred to several thousand, a small embedding strength can be chosen and the invisibility of the embedded information can be increased. In this example, however, the sequence length L = 15 is short, so the embedding strength needs to be somewhat large.
Under these assumptions, the corrected parameter value series d (k) is obtained according to the equation (3) as shown in the following equation (7).

最後に、文書の文字間隔が上記の式（７）のｄ（ｋ）となるように対応する文字の配置を修正し、紙文書や電子文書データの形式で出力する。なお、付加情報が埋め込まれた文書から元の３ビットの情報を復元する処理の具体例は後述する。 Finally, the arrangement of the corresponding characters is corrected so that the character spacing of the document becomes d (k) in the above formula (7), and the document is output in the form of a paper document or electronic document data. A specific example of processing for restoring original 3-bit information from a document in which additional information is embedded will be described later.
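
The following sketch walks through the same steps numerically with b = 5, L = 15, and δ = 2. Because equations (4), (6), and (7) are not reproduced in this text, the code sequence `PN` (an illustrative m-sequence) and the character spacings `a` are hypothetical values, and the forms c(k) = pn(k+b) and d(k) = a(k) + δ·c(k) are assumed readings of equations (2) and (3). The printed d(k) is therefore not the patent's equation (7).

```python
PN = [1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, 1]      # assumed +/-1 m-sequence, L = 15
a = [10, 9, 11, 10, 10, 9, 10, 11, 10, 10, 9, 11, 10, 10, 10]  # hypothetical kerned spacings
b, delta = 5, 2

L = len(PN)
c = [PN[(k + b) % L] for k in range(L)]          # assumed form of equation (2)
d = [a[k] + delta * c[k] for k in range(L)]      # assumed form of equation (3)

print(d)                 # [8, 7, 13, 12, 8, 11, 8, 13, 12, 12, 11, 9, 8, 8, 12]
print(sum(d) - sum(a))   # 2: net change in total spacing over the period
```

Every spacing moves by exactly δ in one direction or the other, while the total over the period changes by only δ.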

続いて、本実施の形態の文書処理装置１において実行される埋め込み情報復元処理について説明する。
文書処理装置１における埋め込み情報復元処理は、処理部３０内に構築された埋め込み情報復元処理部３５０で実行される。図８は、埋め込み情報復元処理部３５０の機能構成を説明するブロック図である。図８に示したように、埋め込み情報復元処理部３５０は、パラメータ値系列抽出部３０１、符号系列群生成部３０７、埋め込み情報復元部３０６を備えて構成されている。 Next, an embedded information restoration process executed in the document processing apparatus 1 according to the present embodiment will be described.
The embedded information restoration processing in the document processing apparatus 1 is executed by the embedded information restoration processing unit 350 built in the processing unit 30. FIG. 8 is a block diagram illustrating a functional configuration of the embedded information restoration processing unit 350. As illustrated in FIG. 8, the embedded information restoration processing unit 350 includes a parameter value sequence extraction unit 301, a code sequence group generation unit 307, and an embedded information restoration unit 306.

パラメータ値系列抽出部３０１は、第２文書入力部２０（図１も参照）から入力された、付加情報の埋め込まれた文書のラスタデータや電子文書データを解析し、文書上の文字列から得られる文字間隔、文字位置、単語間隔、行間隔、文字高さ、文字幅、文字の傾き、等の文書体裁に関するパラメータ値の系列を求めるものである。
符号系列群生成部３０７は、上述した情報埋め込み処理部３００で生成されたものと同一の符号系列と、そのシフトバージョンを生成するためのものであり、符号系列の生成部は情報埋め込み処理部３００と同一の構成をとることができる。
埋め込み情報復元部３０６は、パラメータ値系列抽出部３０１により抽出された、付加情報の埋め込まれた文書の文書体裁に関するパラメータ値系列から、埋め込み情報を復元するためのものである。このパラメータ値系列に対して、符号系列群生成部３０７により生成されたそれぞれの符号系列との間で復号処理を行うことにより、埋め込まれた付加情報を復元する。 The parameter value series extraction unit 301 analyzes the raster data or electronic document data of a document in which additional information is embedded, input from the second document input unit 20 (see also FIG. 1), and obtains a series of parameter values relating to the document style, such as character spacing, character position, word spacing, line spacing, character height, character width, and character inclination, obtained from the character strings on the document.
The code sequence group generation unit 307 generates the same code sequence as that generated by the information embedding processing unit 300, together with its shifted versions. Its code sequence generator can have the same configuration as that of the information embedding processing unit 300.
The embedded information restoration unit 306 is for restoring the embedded information from the parameter value series related to the document format of the document embedded with the additional information extracted by the parameter value series extraction unit 301. The parameter value series is decoded with each code series generated by the code series group generation unit 307 to restore the embedded additional information.

ここで、本実施の形態の埋め込み情報復元処理部３５０は、付加情報が埋め込まれ、紙に出力された文書の画像から、埋め込み情報を復元することを目的とするが、紙に出力する前のＰｏｓｔＳｃｒｉｐｔ等のページ記述言語やワードプロセッサ出力形式の電子文書データからも埋め込み情報を復元することができる。したがって、第２文書入力部２０から入力される文書データはラスタデータに限るものではなく、ページ記述言語やワードプロセッサ出力形式の電子文書データであっても良い。 The embedded information restoration processing unit 350 of the present embodiment is intended to restore embedded information from an image of a document in which additional information has been embedded and which has been output on paper, but it can also restore embedded information from electronic document data, before output on paper, in a page description language such as PostScript or in a word processor output format. Therefore, the document data input from the second document input unit 20 is not limited to raster data, and may be electronic document data in a page description language or a word processor output format.

続いて、本実施の形態の文書処理装置１で用いられる埋め込み情報復元処理方法について述べる。図９は、文書処理装置１で実行される埋め込み情報復元処理の手順の一例を示したフローチャートである。以下では、上述した情報埋め込み処理手順で紙文書に埋め込まれたｎビットの埋め込み情報ｂを復元する場合を例にとり説明する。入力となる紙文書は繰り返しコピーによる画質劣化を含むものとする。 Next, an embedded information restoration processing method used in the document processing apparatus 1 according to this embodiment will be described. FIG. 9 is a flowchart showing an example of the procedure of the embedded information restoration process executed by the document processing apparatus 1. In the following, a case will be described as an example where the n-bit embedded information b embedded in a paper document by the above-described information embedding processing procedure is restored. Assume that the input paper document includes image quality degradation due to repeated copying.

まず、文書入力ステップ（Ｓ２０１）において、情報埋め込み済みの紙文書を、スキャナやデジタルカメラのような第２文書入力部２０から入力する。
次のパラメータ値系列抽出ステップ（Ｓ２０２）において、第２文書入力部２０から得られた文書のラスタデータを解析し、文書上の文字列を構成する各文字間の間隔を求め、これを文書体裁に関するパラメータ値の系列とする。
本実施の形態では、付加情報の埋め込まれた紙に出力された文書の画像から埋め込み情報を復元する場合について説明するが、紙に出力する前のＰｏｓｔＳｃｒｉｐｔ等のページ記述言語やワードプロセッサ出力形式の電子文書データを入力データとしても良い。いずれの場合についても、文字間隔系列の抽出方法は上述した情報埋め込み時と同様であるため、説明は省略する。 First, in the document input step (S201), a paper document in which information is embedded is input from the second document input unit 20 such as a scanner or a digital camera.
In the next parameter value series extraction step (S202), the raster data of the document obtained from the second document input unit 20 is analyzed, the intervals between the characters constituting the character strings on the document are obtained, and these are used as the series of parameter values relating to the document format.
The present embodiment describes the case where embedded information is restored from an image of a document, with additional information embedded, that has been output on paper. However, electronic document data before output on paper, in a page description language such as PostScript or in a word processor output format, may also be used as input data. In either case, the method for extracting the character spacing series is the same as in the information embedding described above, so its description is omitted.

パラメータ値系列抽出ステップ（Ｓ２０２）で得られた文字間隔の系列ａ’（ｋ）は、上記した式（３）を用いて次式（８）のように書くことができる。 The character spacing series a ′ (k) obtained in the parameter value series extraction step (S202) can be written as the following expression (8) using the above expression (3).

式（８）におけるｅ（ｋ）は繰り返しコピーによる文字の太りや細り、あるいは文字上のノイズや荒れによる誤差成分、および文字間隔検出処理の検出誤差成分を表す。 In equation (8), e (k) represents an error component due to character thickening or thinning due to repetitive copying, or noise or roughness on the character, and a detection error component in the character interval detection processing.

次の符号系列群生成ステップ（Ｓ２０３）において、情報埋め込み時に使用されたものと同一の符号系列と、そのシフトバージョンが、符号系列群生成部３０７により生成される。生成される符号系列は上記した式（１）に示したものと同一であるため説明は省略する。 In the next code sequence group generation step (S203), the code sequence group generation unit 307 generates the same code sequence as that used at the time of information embedding and its shifted version. Since the generated code sequence is the same as that shown in the above equation (1), description thereof is omitted.

次の埋め込み情報復元ステップ（Ｓ２０４）では、文字間隔の系列ａ’（ｋ）から埋め込んだｎビットの埋め込み情報ｂが復元される。ここでの埋め込み情報復元処理の詳細を、図１０（埋め込み情報復元ステップ（Ｓ２０４）における処理の手順の一例を示したフローチャート）を用いて説明する。
図１０のフローチャートに示したように、符号系列群生成部３０７では、初期設定が行われた後（Ｓ３０１）、上述した情報埋め込み処理部３００で生成されたものと同一の符号系列と、そのシフトバージョンが生成され、その各々と文字間隔の系列ａ’（ｋ）との相関値が求められる（Ｓ３０２）。文字間隔の系列ａ’（ｋ）とシフト量ｉを持つ符号系列ｐｎ（ｋ＋ｉ）との相関値Ｒ（ｉ）は次式（９）のように書ける。 In the next embedded information restoring step (S204), the embedded n-bit information b is restored from the character spacing series a′(k). The details of this restoration are described with reference to FIG. 10, a flowchart showing an example of the processing procedure in the embedded information restoring step (S204).
As shown in the flowchart of FIG. 10, in the code sequence group generation unit 307, after the initial setting (S301), the same code sequence as that generated by the information embedding processing unit 300 described above and its shift A version is generated, and a correlation value between each version and the character spacing series a ′ (k) is obtained (S302). The correlation value R (i) between the character spacing sequence a ′ (k) and the code sequence pn (k + i) having the shift amount i can be written as the following equation (9).

この相関値Ｒ（ｉ）が最大値となるシフト量ｉｍａｘを求めれば（Ｓ３０３〜Ｓ３０６）、そのシフト量ｉｍａｘが埋め込んだ情報ｂに相当する。
このことを説明するため、式（９）を式（２）と式（８）とを用いて次式（１０）のように書き換える。 If the shift amount imax that maximizes the correlation value R (i) is obtained (S303 to S306), the shift amount imax corresponds to the embedded information b.
In order to explain this, Equation (9) is rewritten as Equation (10) below using Equation (2) and Equation (8).

ここで、上記した式（１０）の最終行の第２項に着目する。第２項は符号系列ｐｎ（ｋ）の自己相関値である。符号系列としては、図５に示すような鋭い自己相関特性を持つものを使用することは既に述べた。つまり、式（１０）の最終行の第２項が大きな値をとるのはｉ＝ｂの時のみであり、それ以外の場合は非常に小さな値になることを意味する。実際、本実施の形態において符号系列群として使用したｍ系列とそのシフトバージョンの間の相互相関値は、次のような性質を持つことが知られている。すなわち、式（１０）の最終行の第２項は、次式（１１）のように表される。 Here, attention is focused on the second term in the last row of the above-described formula (10). The second term is the autocorrelation value of the code sequence pn (k). As described above, the code sequence having a sharp autocorrelation characteristic as shown in FIG. 5 is used. In other words, the second term in the last row of Equation (10) takes a large value only when i = b, and in other cases it means a very small value. In fact, it is known that the cross-correlation value between the m-sequence used as the code sequence group and its shifted version in this embodiment has the following properties. That is, the second term in the last row of Expression (10) is expressed as the following Expression (11).
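
Equations (9) to (11) are not reproduced in this text. Under the assumed forms c(k) = pn(k+b) and d(k) = a(k) + δ c(k), together with a′(k) = d(k) + e(k) for equation (8), the derivation described above can be reconstructed as follows. The three sums in the third line correspond to the first, second, and third terms discussed in the text, and the value −1 for i ≠ b is the standard periodic autocorrelation property of a ±1 m-sequence.

```latex
\begin{align}
R(i) &= \sum_{k=0}^{L-1} a'(k)\, pn(k+i) \\
     &= \sum_{k=0}^{L-1} \bigl(a(k) + \delta\, pn(k+b) + e(k)\bigr)\, pn(k+i) \\
     &= \sum_{k=0}^{L-1} a(k)\, pn(k+i)
        + \delta \sum_{k=0}^{L-1} pn(k+b)\, pn(k+i)
        + \sum_{k=0}^{L-1} e(k)\, pn(k+i), \\
\sum_{k=0}^{L-1} pn(k+b)\, pn(k+i) &=
  \begin{cases} L & (i = b) \\ -1 & (i \neq b) \end{cases}
\end{align}
```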

符号系列の系列長Ｌは数百から数千のものを用いるため、情報の埋め込み強度δの値を適切に設定すれば、式（１１）の最終行の第２項はｉ＝ｂのときのみ非常に大きな値をとる(図１１参照)。
それに対して、式（１１）の最終行の第１項と第３項とはそれぞれ、文字間隔の系列および誤差成分と、符号系列との相互相関値を表しており、第２項と比較するとその絶対値ははるかに小さな値となる。なぜなら既に述べたように、符号系列はシンボル値の平均が小さな値で、ランダム性を有するため、自身以外のいかなる信号とも低相関となるからである。
さらに第１項は、情報埋め込みの対象となる元文書の文字間隔が等間隔である必要はなく、文字列が見た目に自然に見えるように前後の文字に合わせて文字間隔を調節した、所謂カーニングの施されたものであっても良いことを示している。第１項におけるａ（ｋ）は元文書の文字間隔の系列を表しているが、これがｋにより様々な値をとるものであっても、符号系列の持つ上記の性質により低相関となるためである。 Because a code sequence with a sequence length L of several hundred to several thousand is used, the second term in the last row of equation (11) takes a very large value only when i = b, provided the information embedding strength δ is set appropriately (see FIG. 11).
By contrast, the first and third terms in the last row of equation (11) are the cross-correlations of the code sequence with the character spacing series and with the error component, respectively, and their absolute values are far smaller than that of the second term. As already described, this is because the code sequence has symbol values with a small mean and has randomness, and is therefore only weakly correlated with any signal other than itself.
The first term further shows that the character spacings of the original document in which information is embedded need not be equal, and may be kerned, that is, adjusted to the neighboring characters so that the character string looks natural. Although a(k) in the first term represents the character spacing series of the original document, even if it takes various values depending on k, its correlation with the code sequence is low because of the above properties of the code sequence.

From the above, if the sequence length L of the code sequence is sufficiently large and the embedding strength δ is set appropriately, the correlation value R(i) of equation (9) becomes large only for the code sequence with shift amount b (= imax). The restored value of the embedded information is therefore b = imax.
The embedded information restored in this way by the embedded information restoration unit 306 is output from the information output unit 60 (S205).
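The restoration rule above (take the shift amount that gives the largest correlation) can be sketched in Python as follows. This is only an illustration, not the apparatus's implementation: the function name `restore_shift`, the cyclic-shift convention pn((k - i) mod L), and the 7-symbol code in the usage note are assumptions, because equation (9) is not reproduced in this section.

```python
def restore_shift(spacings, code):
    """Return the cyclic shift of `code` that correlates best with `spacings`.

    spacings: detected parameter values a'(k), e.g. character spacings
    code:     the predetermined code sequence pn(k), symbols +1 / -1
    Assumption: the shifted code is pn((k - i) mod L).
    """
    length = len(code)
    best_shift, best_corr = 0, float("-inf")
    for shift in range(length):
        # R(i) = sum over k of a'(k) * pn((k - i) mod L)
        corr = sum(spacings[k] * code[(k - shift) % length] for k in range(length))
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return best_shift
```

For instance, with the hypothetical code `[1, 1, 1, -1, -1, 1, -1]` and spacings `10 + 0.5 * code[(k - 3) % 7]`, the function returns 3, the shift that was used when embedding.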

The embedded information restoration process in the document processing apparatus 1 of this embodiment is now described using a specific example. The embedded information restoration process of this embodiment is not limited to this example.
The example shows the procedure for restoring the original 3-bit information from a document whose image quality has deteriorated, namely the information-embedded document obtained in the specific example of the information embedding process above after it has been copied repeatedly.

In the specific example of the information embedding process described above, the character spacing series d(k) of the information-embedded document was the one given by equation (7). As explained for equation (8), the series a′(k) of character spacings detected from repeated copies of this document contains a component e(k) that represents error due to thickening or thinning of characters caused by repeated copying, error due to noise and roughness on the characters, and the detection error of the character spacing detection process.
Suppose that this error component e(k) is the one given by equation (12).

Then the character spacing series a′(k) detected from the repeatedly copied document is given by equation (13), which is obtained by substituting equations (7) and (12) into equation (8).

When the cross-correlation values between the detected character spacing series a′(k) and code sequences with various shift amounts are computed using equation (9), the shift amount imax of the code sequence with the maximum correlation value is the embedded information b. FIG. 12 shows the cross-correlation values between the character spacing series a′(k) of equation (13) and the code sequence of equation (5) and its shifted versions.
As the results in FIG. 12 show, the maximum correlation value was obtained at a shift amount of 5, and the 3-bit embedded information b = 5 was restored.
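The worked example can be imitated end to end with stand-in data. The sketch below is an assumption-laden illustration: the sequences of equations (5), (7), (12), and (13) are not reproduced in this text, so it uses a 31-symbol maximal-length sequence, a uniform spacing of 10.0, δ = 1.0, and a small synthetic error pattern. The helper names are hypothetical.

```python
def m_sequence(length=31):
    """+1/-1 maximal-length sequence from the recurrence s[k] = s[k-3] XOR s[k-5]."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < length:
        bits.append(bits[-3] ^ bits[-5])
    return [1 if bit else -1 for bit in bits]

def embed(spacings, code, b, delta):
    """d(k) = a(k) + delta * pn((k - b) mod L): embed b as a cyclic shift (assumed convention)."""
    length = len(code)
    return [spacings[k] + delta * code[(k - b) % length] for k in range(length)]

def correlations(observed, code):
    """R(i) for every candidate shift i."""
    length = len(code)
    return [sum(observed[k] * code[(k - i) % length] for k in range(length))
            for i in range(length)]

code = m_sequence()
original = [10.0] * len(code)                      # a(k): stand-in uniform spacing
embedded = embed(original, code, b=5, delta=1.0)   # d(k)
# e(k): bounded synthetic disturbance standing in for copy degradation
error = [0.2 * (((7 * k) % 5) - 2) / 2 for k in range(len(code))]
observed = [d + e for d, e in zip(embedded, error)]  # a'(k)
scores = correlations(observed, code)
restored = max(range(len(scores)), key=scores.__getitem__)
print(restored)  # 5
```

Because the off-peak autocorrelation of a maximal-length sequence is small, the peak at shift 5 stays well above every other shift even with the added error, which mirrors the behaviour reported for FIG. 12.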

As described above, the document processing apparatus 1 of this embodiment receives the electronic document data or document image in which information is to be embedded and obtains a series of parameter values related to the document appearance from the character strings in the input document. Next, it generates a code sequence by shifting a predetermined code sequence, using the embedded information as the shift amount, and takes this as the encoded information sequence. It then obtains a modified parameter value series by combining the encoded information sequence with the parameter value series, and embeds the information in the document by changing the appearance of the electronic document data or document image according to the modified parameter value series.
To restore the information, the apparatus receives a document image or electronic document data in which information is embedded and obtains a parameter value series related to the document appearance from the character strings in the input document. Next, it computes the correlation values between this parameter value series and the predetermined code sequence and its shifted versions. The shift amount of the code sequence that gives the maximum correlation value is taken as the embedded information.

As a result, the document processing apparatus 1 of this embodiment can embed additional information such as tracking information not only in the digital data of a document but also in a text document printed on paper.
In addition, the additional information is embedded directly in parameter values of the document content itself that relate to the document appearance, such as the character spacing, character position, word spacing, line spacing, character height, and character width obtained from the character strings in the document. The additional information is therefore inseparable from the document content, which makes it difficult to tamper with or remove.

In particular, the document processing apparatus 1 of this embodiment uses only one encoded information sequence when embedding n-bit additional information in a character string. The amplitude of the encoded additional information can therefore be made extremely small. As a result, the embedded information is highly invisible, and it becomes difficult to decode, tamper with, or remove.
Furthermore, the decoding process using the code sequence greatly reduces components other than the additional information, such as the parameter values related to the document appearance and the image degradation components caused by repeated copying. The embedded information can therefore be restored stably even from a kerned document or from a paper document that has been copied repeatedly.

[Embodiment 2]
Embodiment 1 described the case in which the information embedding processing unit 300 generates an encoded information sequence c(k) by encoding n-bit embedded information b with a single code sequence pn(k), and the modified parameter value series generation unit 304 embeds it in the series a(k) of parameter values related to the document appearance. Embodiment 2 describes the case of embedding a larger number of bits: the embedded information b is divided into a plurality of partial embedded information sequences, each partial embedded information sequence is encoded, and the results are concatenated to form the encoded information sequence c(k) used for the information embedding process. Components similar to those of Embodiment 1 are denoted by the same reference numerals, and their detailed description is omitted here.

The information embedding process of this embodiment is described with reference to FIG. 13. In this embodiment, when the input information to be embedded is m × n bits long, the embedded information input unit 40 divides it into m partial embedded information sequences b_q of n bits each and outputs them to the information encoding unit 303. The partial embedded information sequences b_q are expressed by equation (14). Here, the embedded information input unit 40 also functions as a partial information generation unit.

The information encoding unit 303 encodes each partial embedded information sequence b_q obtained in this way with the code sequence generated by the code sequence generation unit 302, by the same method as in equation (2), to obtain a partial encoded information sequence c_q(k). The partial encoded information sequence c_q(k) is expressed by equation (15).

L in equation (15) is the sequence length of the code sequence pn(k), so each partial encoded information sequence c_q(k) also has sequence length L. The partial encoded information sequences c_q(k) obtained in this way are concatenated to form an encoded information sequence c(k) of sequence length m × L. The encoded information sequence c(k) is expressed by equation (16).

The modified parameter value series generation unit 304 then embeds the encoded information sequence c(k) of equation (16) in the series a(k), of length m × L, of parameter values related to the document appearance (see equation (3)) to obtain the modified parameter value series d(k). The document correction unit 305 uses the modified parameter value series d(k) to correct the appearance of the document data input from the first document input unit 10. The corrected document data is then output from the document output unit 50.

To restore the information embedded in this embodiment, on the other hand, the series a′(k) of parameter values related to the document appearance is obtained from the information-embedded document and divided into partial parameter value series a′_q(k) of length L. The partial parameter value series a′_q(k) are expressed by equation (17). Here, the parameter value series extraction unit 301 functions as a partial parameter value series extraction unit.

Next, the same embedded information restoration process as in Embodiment 1 is applied to each partial parameter value series a′_q(k) to restore the n-bit partial embedded information sequence b_q. Concatenating these restores the embedded information b of length m × n (equation (18)).
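The divide, encode, concatenate, and restore flow of this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: the bit order used when splitting (most significant part first), the cyclic-shift convention, the 31-symbol stand-in code, and all function names are hypothetical, since equations (14) to (18) are not reproduced here.

```python
def m_sequence(length=31):
    """+1/-1 maximal-length sequence from the recurrence s[k] = s[k-3] XOR s[k-5]."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < length:
        bits.append(bits[-3] ^ bits[-5])
    return [1 if bit else -1 for bit in bits]

def split_bits(value, m, n):
    """Split an (m*n)-bit integer into m parts of n bits, most significant part first."""
    mask = (1 << n) - 1
    return [(value >> (n * (m - 1 - q))) & mask for q in range(m)]

def join_bits(parts, n):
    """Concatenate n-bit parts back into one integer."""
    value = 0
    for part in parts:
        value = (value << n) | part
    return value

def embed_concatenated(spacings, code, parts, delta):
    """Segment q of length L carries part q as a cyclic shift of `code`."""
    length = len(code)
    out = list(spacings)
    for q, shift in enumerate(parts):
        for k in range(length):
            out[q * length + k] += delta * code[(k - shift) % length]
    return out

def restore_concatenated(observed, code, m):
    """Apply the single-code restoration to each length-L segment in turn."""
    length = len(code)
    parts = []
    for q in range(m):
        segment = observed[q * length:(q + 1) * length]
        scores = [sum(segment[k] * code[(k - i) % length] for k in range(length))
                  for i in range(length)]
        parts.append(max(range(length), key=scores.__getitem__))
    return parts

code = m_sequence()
parts = split_bits(0b101100111001, m=3, n=4)          # 12 bits -> [11, 3, 9]
document = embed_concatenated([10.0] * (3 * len(code)), code, parts, delta=1.0)
restored = join_bits(restore_concatenated(document, code, m=3), n=4)
print(parts, bin(restored))
```

Each n-bit part must be smaller than L so that it is a valid shift amount; here n = 4 gives shifts up to 15, which fits the 31-symbol code.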

In the information embedding method of this embodiment, m code sequences of sequence length L are used to embed information in a parameter value series, related to the document appearance, of length m × L. Since a code sequence of sequence length L can represent at most log₂ L bits of information, embedding information in a parameter value series of length m × L using m code sequences of sequence length L allows at most m log₂ L bits to be embedded.
In contrast, when a single code sequence is used as in Embodiment 1, the code sequence that can be used for a parameter value series of length m × L has length m × L, so the information it can represent is at most log₂(m × L) = log₂ m + log₂ L bits.
Since m log₂ L ≥ (log₂ m + log₂ L) always holds for m ≥ 2, decomposing the embedded information into partial sequences as in this embodiment makes it possible to embed more information in a parameter value series of the same length.
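The capacity comparison can be checked numerically. The two helper functions below are hypothetical names for the two bounds stated above, and the values m = 4 and L = 1024 are chosen only for illustration.

```python
import math

def capacity_split(m, L):
    """Upper bound in bits when m independent shifts of a length-L code are used."""
    return m * math.log2(L)

def capacity_single(m, L):
    """Upper bound in bits when one shift of a single length-(m*L) code is used."""
    return math.log2(m * L)

print(capacity_split(4, 1024), capacity_single(4, 1024))  # 40.0 12.0
```

With these example values, the same 4096 parameter values carry up to 40 bits when split into four segments, compared with up to 12 bits for a single long code.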

[Embodiment 3]
Embodiment 2 described the case in which the information embedding processing unit 300 divides the embedded information b into a plurality of partial embedded information sequences, obtains the corresponding partial encoded information sequences c_q(k), concatenates them into an encoded information sequence c(k) of sequence length m × L, and embeds it in the parameter value series a(k). Embodiment 3 describes the case in which, instead of concatenating the partial encoded information sequences, they are multiplexed to generate an encoded information sequence c(k) of sequence length L, which is then used for the information embedding process. Components similar to those of Embodiment 1 are denoted by the same reference numerals, and their detailed description is omitted here.

In the information embedding process of this embodiment, so that the generated partial encoded information sequences can be multiplexed, the information encoding unit 303 uses a different code sequence pn_q(k) to encode each partial embedded information sequence b_q. The partial encoded information sequence c_q(k) is then expressed by equation (19).

Each code sequence pn_q(k) in equation (19) has a small average symbol value and is random. Each generated code sequence also has a sharp autocorrelation characteristic as shown in FIG. 5. Furthermore, the code sequences in the group are mutually orthogonal. The information encoding unit 303 multiplexes the partial encoded information sequences c_q(k) obtained in this way to form an encoded information sequence c(k) of sequence length L. The encoded information sequence c(k) is expressed by equation (20).

The encoded information sequence c(k) of equation (20) is then embedded in the series a(k), of length L, of parameter values related to the document appearance (see equation (3)) to obtain the modified parameter value series d(k). The document correction unit 305 uses the modified parameter value series d(k) to correct the appearance of the document data input from the first document input unit 10. The corrected document data is then output from the document output unit 50.

To restore the embedded information, on the other hand, the series a′(k) of parameter values related to the document appearance is obtained from the information-embedded document, and its correlation with each code sequence is computed while varying the shift amount. Writing R(q, i) for the correlation value between the parameter value series a′(k) and the q-th code sequence with shift amount i, R(q, i) is given by equation (21).

Equation (21) can be further rewritten as equation (22).

Here, consider the second term in the last line of equation (22). As already stated, the code sequences in the group are mutually orthogonal, and each has a sharp autocorrelation characteristic as shown in FIG. 5. This means that the second term of equation (22) takes a large value only when j = q and b_j = i, and a very small value otherwise.
Therefore, by computing the correlation between the parameter value series a′(k) and each code sequence in the group while varying the shift amount, and finding the shift amount imax that gives the maximum correlation for the q-th code sequence, the partial embedded information sequence b_q can be restored as b_q = imax. Concatenating these restores the embedded information b of length m × n (equation (23)).
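The multiplexed scheme can be sketched as below. The sketch is an illustration under assumptions: equations (19) to (22) are not reproduced here, the two code sequences are 31-symbol maximal-length sequences with low (not exactly zero) cross-correlation standing in for the mutually orthogonal group, and the function names, spacing value, and δ are hypothetical.

```python
def lfsr_sequence(taps, length=31):
    """+1/-1 sequence from the recurrence s[k] = XOR of s[k-t] for t in taps."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < length:
        nxt = 0
        for t in taps:
            nxt ^= bits[-t]
        bits.append(nxt)
    return [1 if bit else -1 for bit in bits]

# Two different length-31 sequences used as stand-ins for pn_0(k) and pn_1(k).
codes = [lfsr_sequence((3, 5)), lfsr_sequence((1, 2, 3, 5))]

def embed_multiplexed(spacings, codes, parts, delta):
    """d(k) = a(k) + delta * sum over q of pn_q((k - b_q) mod L)."""
    length = len(spacings)
    return [spacings[k] + delta * sum(code[(k - b) % length]
                                      for code, b in zip(codes, parts))
            for k in range(length)]

def restore_multiplexed(observed, codes):
    """For each code q, return the shift i that maximises R(q, i)."""
    length = len(observed)
    parts = []
    for code in codes:
        scores = [sum(observed[k] * code[(k - i) % length] for k in range(length))
                  for i in range(length)]
        parts.append(max(range(length), key=scores.__getitem__))
    return parts

document = embed_multiplexed([10.0] * 31, codes, parts=[5, 12], delta=1.0)
restored = restore_multiplexed(document, codes)
print(restored)  # [5, 12]
```

Both partial values share the same 31 parameter values, so the series is one third as long as a two-segment concatenation with the same code length would need for three parts, at the cost of a larger combined amplitude.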

In the information embedding method of this embodiment, in which information is embedded as the shift amount of a code sequence, the amplitude of each generated partial encoded information sequence c_q(k) is small, so high invisibility is maintained even when the sequences are multiplexed to form the encoded information sequence c(k). Compared with Embodiment 2, multiplexing shortens the encoded information sequence, so more information can be embedded in a short character string without greatly impairing invisibility.

[Embodiment 4]
Embodiment 2 described the case in which the information embedding processing unit 300 divides the embedded information b into a plurality of partial embedded information sequences, obtains the corresponding partial encoded information sequences c_q(k), concatenates them into an encoded information sequence c(k) of sequence length m × L, and embeds it in the parameter value series a(k). This embodiment describes the case in which the encoded information sequence c(k) of sequence length m × L obtained as in Embodiment 2 is additionally repeated periodically when it is embedded in the series a(k) of parameter values related to the document appearance. Components similar to those of Embodiment 1 are denoted by the same reference numerals, and their detailed description is omitted here.

In this embodiment, the information encoding unit 303 concatenates a plurality of partial encoded information sequences to generate the encoded information sequence c(k), and the modified parameter value series generation unit 304 repeats the encoded information sequence c(k) periodically and embeds it in the series a(k) of parameter values related to the document appearance. As an example, the case is shown in which three partial encoded information sequences c_0, c_1, and c_2 are concatenated to generate the encoded information sequence c(k), which is then repeated periodically and embedded in the series a(k). The information encoding unit 303 forms the encoded information sequence c(k) as in equation (24).

To restore the embedded information from a partial parameter value series, it suffices to obtain a partial parameter value series of length at least 3L. This embodiment considers the case where a partial parameter value series of length 6L is obtained. As shown in FIG. 14, two periods of the encoded information sequence c(k) are then embedded in this partial parameter value series.
Following the procedure of Embodiment 2, three partial embedded information sequences b_a, b_b, and b_c can be restored from it, but it is not possible to determine which of the partial embedded information sequences b_0, b_1, and b_2 each corresponds to. The embedded information b therefore has three possible values, as shown in equation (25).

An error detection code or the like could be applied to the embedded information b in order to determine its correct order, but several bits would then have to be allocated as error detection bits, which reduces the number of information bits that can be embedded.
In this embodiment, therefore, a synchronization code as shown in FIG. 14 is periodically multiplexed with the encoded information sequence c(k) and embedded in the parameter value series. The synchronization code has a small average symbol value, is random, and has a sharp autocorrelation characteristic as shown in FIG. 5.

When restoring the embedded information b, the partial embedded information sequences b_a, b_b, and b_c are first restored by the embedded information restoration method of Embodiment 2.
Next, the position at which the maximum correlation is obtained is detected while varying the shift amount of the synchronization code. Because the synchronization code has a sharp autocorrelation characteristic as shown in FIG. 5, it produces a large correlation value only at the zero-shift position. It follows that the partial encoded information sequences are arranged in the order c_0, c_1, c_2 starting from this position, so the embedded information b can be restored as in equation (26).
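The role of the synchronization code can be sketched as follows. The layout here is an assumption made only for illustration, because FIG. 14 and equations (24) to (26) are not reproduced: the synchronization code is taken to be a length-L sequence multiplexed onto the first segment (c_0) of every period, and the excerpt is assumed to start on a segment boundary. All names and values are hypothetical.

```python
def lfsr_sequence(taps, length=31):
    """+1/-1 sequence from the recurrence s[k] = XOR of s[k-t] for t in taps."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < length:
        nxt = 0
        for t in taps:
            nxt ^= bits[-t]
        bits.append(nxt)
    return [1 if bit else -1 for bit in bits]

data_code = lfsr_sequence((3, 5))        # stand-in for pn(k)
sync_code = lfsr_sequence((1, 2, 3, 5))  # stand-in for the synchronization code
L = len(data_code)

def embed_period(parts, base=10.0, delta=1.0):
    """One period: segment q carries shift parts[q]; segment 0 also carries the sync code."""
    out = []
    for q, b in enumerate(parts):
        for k in range(L):
            value = base + delta * data_code[(k - b) % L]
            if q == 0:
                value += delta * sync_code[k]   # assumed sync placement
            out.append(value)
    return out

def restore_ordered(window):
    """window: values covering one period, starting at an unknown segment boundary."""
    m = len(window) // L
    shifts, sync_scores = [], []
    for r in range(m):
        seg = window[r * L:(r + 1) * L]
        scores = [sum(seg[k] * data_code[(k - i) % L] for k in range(L)) for i in range(L)]
        shifts.append(max(range(L), key=scores.__getitem__))
        # Zero-shift correlation with the sync code marks the segment holding c_0.
        sync_scores.append(sum(seg[k] * sync_code[k] for k in range(L)))
    start = max(range(m), key=sync_scores.__getitem__)
    return shifts[start:] + shifts[:start]

two_periods = embed_period([5, 12, 20]) * 2   # 6L values, as in the text
window = two_periods[L:4 * L]                 # 3L excerpt that starts at c_1
print(restore_ordered(window))                # [5, 12, 20]
```

Without the synchronization step the excerpt would decode as 12, 20, 5, which is one of the three candidate orders described above; the sync correlation selects the rotation that starts at c_0.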

With the information embedding method of this embodiment, even when only part of a document is available rather than the whole document, the embedded information can be restored as long as a parameter value series long enough to detect one period of the encoded information sequence c(k) is obtained.
In addition, the correct order of the detected partial embedded information sequences can be determined without adding error detection bits or the like, so more information can be embedded.

FIG. 1 is a block diagram showing the configuration of a document processing apparatus to which the present invention is applied.
FIG. 2 is a block diagram illustrating the functional configuration of the information embedding processing unit.
FIG. 3 is a flowchart showing an example of the procedure of the information embedding process executed by the document processing apparatus.
FIG. 4 is a diagram illustrating the parameter value series a(k) of a document, based on the spacing between the characters that make up a character string in the document.
FIG. 5 is a diagram showing the autocorrelation characteristic of a code sequence generated by the code sequence group generation unit.
FIG. 6 is a diagram illustrating how embedded information is represented by the shift amount of the code sequence pn(k) encoded by the information encoding unit.
FIG. 7 is a diagram showing a state in which the character spacing obtained from a character string in the corrected document has been made equal to each symbol value of the modified parameter value series d(k).
FIG. 8 is a block diagram illustrating the functional configuration of the embedded information restoration processing unit.
FIG. 9 is a flowchart showing an example of the procedure of the embedded information restoration process executed by the document processing apparatus.
FIG. 10 is a flowchart showing an example of the processing procedure in the embedded information restoration step.
FIG. 11 is a diagram illustrating that the second term in the last line of equation (11) takes a very large value only when i = b.
FIG. 12 is a diagram showing the cross-correlation values between the character spacing series a′(k) of equation (13) and the code sequence of equation (5) and its shifted versions.
FIG. 13 is a diagram illustrating the information embedding process in Embodiment 2.
FIG. 14 is a diagram illustrating, for Embodiment 4, how the encoded information sequence c(k) with the synchronization code periodically multiplexed is embedded in the parameter value series.

Explanation of symbols

1 ... document processing apparatus, 10 ... first document input unit, 20 ... second document input unit, 30 ... processing unit, 31 ... CPU (central processing unit), 32 ... I/O circuit, 33 ... ROM, 34 ... RAM, 35 ... hard disk drive (HDD), 40 ... embedded information input unit, 50 ... document output unit, 60 ... information output unit, 300 ... information embedding processing unit, 301 ... parameter value series extraction unit, 302 ... code sequence generation unit, 303 ... information encoding unit, 304 ... modified parameter value series generation unit, 305 ... document correction unit, 306 ... embedded information restoration unit, 307 ... code sequence group generation unit, 350 ... embedded information restoration processing unit

Claims

A document processing apparatus capable of embedding additional information in document data consisting of character strings, the apparatus comprising:
a parameter value series extraction unit that extracts, from input document data, a parameter value series related to the document appearance of the character strings;
an information input unit that inputs additional information to be embedded in the document data;
a partial information generation unit that divides the additional information input from the information input unit to generate a plurality of pieces of partial additional information;
a code sequence generation unit that generates a predetermined code sequence;
an information encoding unit that, using each piece of partial additional information generated by the partial information generation unit as a shift amount, generates for each piece of partial additional information a partial encoded information sequence obtained by shifting the predetermined code sequence generated by the code sequence generation unit by that shift amount, further generates an encoded information sequence in which the partial encoded information sequences are concatenated and periodically repeated, and generates a synchronization code synchronized with the encoded information sequence;
a modified parameter value series generation unit that generates a modified parameter value series by combining the encoded information sequence with the parameter value series, and multiplexes and embeds the synchronization code in the modified parameter value series; and
a document correction unit that corrects the document data based on the generated modified parameter value series.

The document processing apparatus according to claim 1, wherein the parameter value series extraction unit extracts a parameter value series related to the document appearance of the character strings from document data in which the additional information is embedded, the apparatus further comprising:
a partial parameter value series generation unit that divides the parameter value series extracted by the parameter value series extraction unit to generate a plurality of partial parameter value series;
a code sequence group generation unit that generates a code sequence identical to the predetermined code sequence generated by the code sequence generation unit and shifted versions of that code sequence, and generates a synchronization code identical to the synchronization code synchronized with the encoded information sequence and shifted versions of that synchronization code; and
an information restoration unit that restores the additional information embedded in each partial parameter value series based on the correlation values between the partial parameter value series generated by the partial parameter value series generation unit and the code sequence and its shifted versions generated by the code sequence group generation unit, detects the position of the embedded synchronization code based on the correlation values between the parameter value series and the synchronization code and its shifted versions, determines the order of the additional information of the partial parameter value series from the positional relationship between the detected position of the synchronization code and the partial parameter value series, and restores the additional information embedded in the document data by concatenating the additional information according to that order.

A document processing method capable of embedding additional information in document data consisting of character strings, the method comprising:
inputting document data consisting of character strings;
extracting, from the input document data, a parameter value series related to the document appearance of the character strings;
dividing the additional information to be embedded in the document data to generate a plurality of pieces of partial additional information;
using each of the plurality of pieces of partial additional information as a shift amount, generating for each piece of partial additional information a partial encoded information sequence obtained by shifting a predetermined code sequence by that shift amount, further generating an encoded information sequence in which the partial encoded information sequences are concatenated and periodically repeated, and generating a synchronization code synchronized with the encoded information sequence;
generating a modified parameter value series by combining the encoded information sequence with the parameter value series, and multiplexing and embedding the synchronization code in the modified parameter value series; and
correcting the document data based on the modified parameter value series.