JP2002269075A

JP2002269075A - Embedding of information in document and decoding

Info

Publication number: JP2002269075A
Application number: JP2001071565A
Authority: JP
Inventors: Kazuyoshi Nagaho; 和義長保
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2001-03-14
Filing date: 2001-03-14
Publication date: 2002-09-20

Abstract

PROBLEM TO BE SOLVED: To transmit secret information covertly by embedding separate information in ordinary digitized text and later decoding it. SOLUTION: Both parties share a dictionary database. In an arbitrary text (the original), every expression that can be rewritten with a word and/or notation expressing the same concept is replaced by its basic word, producing a standardized text (step S110). The data to be embedded is converted into a binary data string. Starting from the beginning of the standardized text, each rewritable expression is then set to the basic word for a 0 or to a word other than the basic word for a 1 (step S130). The resulting coded text reads very naturally and at a glance appears identical to the original. The embedded data can be reproduced from the coded text by decoding its differences from the standardized text with the inverse of this algorithm.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

1. Field of the Invention

The present invention relates to an information embedding technique for embedding desired information in arbitrary digitized text, and to an information decoding technique for decoding the embedded information.

[0002]

2. Description of the Related Art

Advances in word processors and character-reading (OCR) devices have led to the digitization of many kinds of writing. These include personal writing such as diaries and letters, business documents such as contracts, and specialized writing such as papers and novels. In addition, dramatic improvements in information distribution technology such as the Internet now make it possible to distribute digitized text instantly around the world or to disclose it to the public.

[0003] Digitized text, however, can easily be copied and edited, for example by changing or deleting only part of it. Unlike text written on paper, digitized text therefore lends itself to a virtually unlimited range of uses, depending on the user's ingenuity. This diversity of use can sometimes cause unforeseen harm to the rightful holder of the text. For this reason, techniques have been developed to protect digital information, covering not only text but also images and other data. Known examples include encryption for sending and receiving secret information over the Internet, and digital watermarking for embedding copyright information and the like in digital works. With these protection techniques, digitized text can be recognized only as a meaningless character string unless the encryption key is used. Use of the text can therefore be restricted except where the rightful holder has given permission.

[0004]

Problems to Be Solved by the Invention

Conventional encryption and digital watermarking techniques for digitized text have nevertheless left the following problems unsolved. Conventional encryption converts digitized text into a meaningless character string before distribution. The protected text therefore has no meaning unless it is restored with the encryption key, which is what allows the text to be distributed and disclosed without worry.

[0005] On the other hand, a third party who comes across a meaningless character string on the Internet or elsewhere can easily tell that it is encrypted text. The same third party can also infer that it conceals information important enough to be worth encrypting. Such encrypted strings therefore make attractive targets for so-called hackers and are exposed to the risk of having their encryption broken. Moreover, a meaningless string is immediately recognizable as encrypted text. As a result, even someone who simply wants to contact a friend in secret cannot hide from third parties the fact that encrypted messages are being exchanged.

[0006] A digital watermark, by contrast, does not encrypt the exchanged data itself, so the problem above does not arise. However, a watermark is normally embedded in the original data in a form that human sight or hearing cannot detect. This makes it easy to embed a considerable amount of data in highly redundant information such as images and music. It also makes it difficult to embed information in text data, particularly plain text with no formatting. In Japanese, for example, one character is represented by a 2-byte (16-bit) character code, and changing any single bit yields the code of a different character. As long as the constraint of one character = 16 bits is respected, there is no room in the code itself to embed any information.
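The 16-bit argument above can be checked directly. The sketch below is illustrative only. It uses UTF-16 (big-endian) as a stand-in for a 2-byte Japanese character encoding, since the text does not name one, and the helper name `flip_lowest_bit` is hypothetical. Flipping even the lowest bit of a character's code simply produces a different, equally valid character, so the code has no spare bit in which to hide data.

```python
def flip_lowest_bit(ch: str) -> str:
    """Flip the least significant bit of one BMP character's 16-bit code.

    UTF-16-BE is assumed here purely as an example of a 2-byte encoding.
    """
    code = bytearray(ch.encode("utf-16-be"))
    if len(code) != 2:
        raise ValueError("expected a single 2-byte (BMP) character")
    code[-1] ^= 0x01  # toggle the lowest bit of the low byte
    return code.decode("utf-16-be")


print(flip_lowest_bit("あ"))  # U+3042 becomes U+3043 (ぃ)
print(flip_lowest_bit("A"))   # U+0041 becomes U+0040 (@)
```

Both results are ordinary characters, so the change would be visible to any reader of the text.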

[0007] SUMMARY OF THE INVENTION: The present invention was made to solve the problems described above. Its object is to provide an information embedding technique and an information reproducing technique that can convey secret information covertly. They do so by embedding separate information in ordinary digitized text and later reproducing it.

[0008]

Means for Solving the Problems, and Their Operation and Effects

To achieve the above object, a method of handling information is described first as the basic form of the invention. This information handling method embeds desired information in digitized text and decodes it, and comprises four steps. The first is a standardization step that rewrites the text into a standard text, based on a predetermined rule. The rewriting uses words and/or notations that differ in form but express the same concept. The second is an embedding step. For each word and/or notation targeted for rewriting in the standardization step, this step decides which of the words and/or notations expressing the same concept to use, according to data corresponding to the desired information. It then rewrites the text accordingly to embed the data. The third is an extraction step that receives the encrypted text in which the data has been embedded. Based on the predetermined rule, it extracts the rewritten words and/or notations. The fourth is a decoding step that decodes the embedded data from the extracted words and/or notations.

[0009] Under this information handling method, the digitized text is standardized. During standardization, the data corresponding to the information to be embedded determines which of the words and/or notations expressing the same concept is used at each point. Rewriting the text accordingly embeds the data in it and produces the encrypted text. To decrypt the encrypted text, the rewritten words and/or notations are first extracted from it according to the predetermined rule. The embedded data is then decoded from them.
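A minimal sketch of the standardization and embedding steps follows. It makes several assumptions. The dictionary `VARIANT_GROUPS` is hypothetical, and each group has exactly two notations with the basic word first. The payload is a string of '0'/'1' characters, and a 0 selects the basic word while a 1 selects the alternative, the convention given in the Abstract. The two steps are folded into one pass here, because each site's final form depends only on its group and its bit.

```python
# Hypothetical dictionary: each group holds notations expressing one concept.
# Index 0 is the basic word used for standardization; index 1 is the alternative.
VARIANT_GROUPS = [
    ("書き換え", "書換"),
    ("エネルギー", "エネルギ"),
]


def find_sites(text, groups):
    """Split text into plain strings and (group, position) sites, longest match first."""
    lookup = {form: (g, p) for g, forms in enumerate(groups) for p, form in enumerate(forms)}
    forms = sorted(lookup, key=len, reverse=True)
    tokens, plain, i = [], "", 0
    while i < len(text):
        match = next((f for f in forms if text.startswith(f, i)), None)
        if match is None:
            plain += text[i]
            i += 1
            continue
        if plain:
            tokens.append(plain)
            plain = ""
        tokens.append(lookup[match])
        i += len(match)
    if plain:
        tokens.append(plain)
    return tokens


def embed(text, bits, groups=VARIANT_GROUPS):
    """Standardize every variant site, then set it to groups[g][bit]."""
    tokens = find_sites(text, groups)
    capacity = sum(isinstance(t, tuple) for t in tokens)
    if len(bits) > capacity:
        raise ValueError(f"text offers {capacity} sites but {len(bits)} bits were given")
    out, k = [], 0
    for t in tokens:
        if isinstance(t, tuple):
            group = groups[t[0]]
            # Sites beyond the payload stay at the basic word (bit 0).
            bit = int(bits[k]) if k < len(bits) else 0
            out.append(group[bit])
            k += 1
        else:
            out.append(t)
    return "".join(out)


original = "書換とエネルギーの話。書き換えはエネルギ不要。"
print(embed(original, "0110"))
```

Unused sites fall back to the basic word, so a practical system would also need to convey the payload length. The text does not say how, so that question is left open here.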

[0010] In other words, the information handling method of the present invention does not look for redundancy in the data itself. It finds redundancy in the places where differing words and/or notations of the digitized text can be standardized, and embeds the data corresponding to the desired information there. Other information can thus be embedded even in text data, which was thought to have almost no redundancy. Moreover, the digitized text can still be read as-is, so the presence of embedded data is not apparent from its appearance.

[0011] The information embedding apparatus of the present invention that corresponds to this method embeds desired information in digitized text and comprises two means. The standardization means rewrites the text into a standard text, based on a predetermined rule, using words and/or notations that differ in form but express the same concept. The embedding means considers each word and/or notation targeted for rewriting in the standardization step. According to data corresponding to the desired information, it decides which of the words and/or notations expressing the same concept to use. It then rewrites the text to generate an encrypted text in which the data is embedded.

[0012] The information embedding apparatus of the present invention can therefore generate an encrypted text from the digitized text. It does so by rewriting the text with words and/or notations that express the same concepts as the original wording. At first glance the encrypted text is a perfectly natural text that looks no different from the original. The very fact that other data is embedded in it is therefore not apparent from its appearance.

[0013] The encrypted text is produced in two stages. The words and/or notations of the original text are first standardized into a standard text. That standard text is then rewritten according to the data and the predetermined rule. The embedded data can therefore be extracted easily by detecting the differences between the encrypted text and the standard text, using the inverse of the predetermined rule. The embedded information can then be reproduced from the extracted code. The information embedding apparatus of the present invention has a wide range of possible applications. Examples include secret transmission, and digital watermarking that embeds copyright information or information identifying the distributor.
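The decoding side described above can be sketched as follows, as an illustration rather than the patent's own implementation. It assumes the same hypothetical two-form dictionary as the embedder. For each rewritable site it rebuilds the standard (basic) form, then reads a 0 where the received text matches the basic word and a 1 where it differs. Python's `re` alternation tries alternatives from left to right. Sorting the notations longest first therefore makes 「エネルギー」 win over its prefix 「エネルギ」.

```python
import re

# Hypothetical shared dictionary: (basic word, alternative) per concept.
VARIANT_GROUPS = [("書き換え", "書換"), ("エネルギー", "エネルギ")]


def decode(received, groups=VARIANT_GROUPS):
    """Recover the bit string by comparing each site with its standard (basic) form."""
    basic_of = {form: forms[0] for forms in groups for form in forms}
    pattern = re.compile("|".join(map(re.escape, sorted(basic_of, key=len, reverse=True))))
    bits = []
    for m in pattern.finditer(received):
        found = m.group()
        bits.append("0" if found == basic_of[found] else "1")
    return "".join(bits)


print(decode("書き換えとエネルギの話。書換はエネルギー不要。"))  # 0110
```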

[0014] Here, the standardization means and the embedding means preferably use the same dictionary database as the predetermined rule for rewriting words and/or notations. For the dictionary database, any of the various techniques used to reduce the variety of text caused by inconsistent wording and notation to a fixed form can be used. Examples are disclosed in JP-A-1-273171, JP-A-10-240743, JP-A-6-301718, and JP-A-62-17872. A so-called thesaurus or synonym dictionary may be used. Variations in Japanese orthography (表記のゆれ) may also be used. One example is the spelling pair 「書換」 and 「書き換え」, both meaning "rewrite", written without and with okurigana. Another is the inconsistent use of the long-vowel mark in 「エネルギ」 and 「エネルギー」 ("energy"). A dictionary database of this kind tends to become complex and large in pursuit of more natural text. Sharing one dictionary database between the standardization means and the embedding means therefore makes the information embedding apparatus smaller and cheaper and increases processing speed.
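The role of the shared dictionary database can be sketched as a mapping from every notation to its basic word. The entries below are hypothetical examples built from the variants named above, and which spelling serves as the basic word is an assumption. The sketch also exposes one practical detail. Because 「エネルギ」 is a prefix of 「エネルギー」, the lookup must prefer the longest match. Otherwise 「エネルギー」 would be matched as 「エネルギ」 and come out as 「エネルギーー」.

```python
import re

# Hypothetical dictionary database: basic word -> notation variants of it.
DICTIONARY = {
    "書き換え": ["書換"],
    "エネルギー": ["エネルギ"],
}


def compile_dictionary(dictionary):
    """Build a notation -> basic-word table and a longest-match-first pattern."""
    to_basic = {basic: basic for basic in dictionary}
    for basic, variants in dictionary.items():
        for v in variants:
            to_basic[v] = basic
    alternatives = sorted(to_basic, key=len, reverse=True)
    return to_basic, re.compile("|".join(map(re.escape, alternatives)))


def standardize(text, dictionary=DICTIONARY):
    """Rewrite every listed notation to its basic word."""
    to_basic, pattern = compile_dictionary(dictionary)
    return pattern.sub(lambda m: to_basic[m.group()], text)


print(standardize("エネルギと書換とエネルギー"))
```

If the embedder and the decoder both build their tables from the same `DICTIONARY`, they agree on every site and every basic word.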

[0015] The dictionary database preferably offers rewriting candidates at three stages: characters, words, and phrases. Rewriting into words and/or notations that express the same concept grows more complex in the following order. The simplest is character substitution, such as full-width to half-width katakana. Next comes word substitution using synonyms, and most complex is phrase substitution based on sentence analysis. In the same order, rewriting introduces increasingly subtle differences in meaning. The dictionary database should therefore be structured in character, word, and phrase stages. Rewriting can then be enabled step by step, from the character level up to the phrase level, according to the amount of information to be embedded. Information can then be embedded faster and with less change in meaning.
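Here is a sketch of the staged approach. The level names follow the text, but the entries in `LEVELS`, including the phrase-level pair, are hypothetical. Capacity is taken to be the number of rewritable sites. The function enables levels cumulatively from characters upward and stops at the first level that offers enough sites for the payload. The cheaper, more meaning-preserving rewrites are therefore used first.

```python
import re

# Hypothetical staged dictionary: each level adds (basic, alternative) pairs.
LEVELS = [
    ("character", [("エネルギー", "エネルギ")]),
    ("word", [("書き換え", "書換")]),
    ("phrase", [("しました。", "した。")]),
]


def count_sites(text, forms):
    """Count non-overlapping rewritable sites, longest notation first."""
    pattern = re.compile("|".join(map(re.escape, sorted(forms, key=len, reverse=True))))
    return len(pattern.findall(text))


def choose_level(text, n_bits, levels=LEVELS):
    """Return the lowest cumulative level whose site count can hold n_bits."""
    forms = []
    for name, pairs in levels:
        forms.extend(form for pair in pairs for form in pair)
        capacity = count_sites(text, forms)
        if capacity >= n_bits:
            return name, capacity
    raise ValueError("the text cannot carry that many bits even at the phrase level")


sample = "エネルギを書換しました。エネルギーを書き換えた。"
print(choose_level(sample, 3))
```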

[0016] It is further preferable for the dictionary database to be configurable, so that any entry can be disabled and/or modified. Consider an information embedding apparatus whose dictionary database has been customized in this way. The information it embeds can be reproduced only by someone whose dictionary database follows the same customized rules. This further improves the confidentiality of the embedded information.
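To show why a customized dictionary protects the payload, the sketch below derives a custom dictionary in two ways. It disables some groups, and it swaps which notation counts as basic in others. Swapping is only one possible kind of "modification", since the text does not specify which modifications are allowed. The dictionary contents and the `customize` helper are hypothetical. Decoding the same text with the stock dictionary and with the customized one yields different bits. A party without the customized rules therefore reads the wrong payload.

```python
import re

# Hypothetical stock dictionary: (basic word, alternative) per concept.
BASE = [("書き換え", "書換"), ("エネルギー", "エネルギ"), ("取り扱い", "取扱い")]


def customize(groups, disabled=(), swapped=()):
    """Drop disabled groups and reverse the basic/alternative roles of swapped ones."""
    custom = []
    for i, pair in enumerate(groups):
        if i in disabled:
            continue
        custom.append(pair[::-1] if i in swapped else pair)
    return custom


def decode(text, groups):
    """Read 0 for a site already in basic form, 1 otherwise."""
    basic_of = {form: pair[0] for pair in groups for form in pair}
    pattern = re.compile("|".join(map(re.escape, sorted(basic_of, key=len, reverse=True))))
    return "".join("0" if basic_of[m.group()] == m.group() else "1"
                   for m in pattern.finditer(text))


secret_rules = customize(BASE, disabled={1}, swapped={0})
text = "書換とエネルギと取扱い"
print(decode(text, BASE), decode(text, secret_rules))
```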

[0017] The standardization means may also include a difference data generating unit. When the digitized text is rewritten into the standard text, this unit generates difference data based on the differences between the two. Such difference data can be used to reproduce the original digitized text from the encrypted text.
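Here is a minimal sketch of the difference data generating unit, under stated assumptions. Every dictionary group is assumed to have exactly two notations, with the basic word first. The difference data is assumed to be one bit per rewritable site, following the convention of paragraph [0033]: 0 where the original already used the basic word and 1 where it was replaced. Because each group has only two forms, these bits alone are enough to restore the original wording from the standard text. The encrypted text standardizes back to the same standard text, so the original can be recovered from it as well. The names `standardize_with_diff` and `restore` are hypothetical.

```python
import re

# Hypothetical two-form dictionary: (basic word, alternative) per concept.
GROUPS = [("書き換え", "書換"), ("エネルギー", "エネルギ")]
POSITION = {form: (g, p) for g, pair in enumerate(GROUPS) for p, form in enumerate(pair)}
PATTERN = re.compile("|".join(map(re.escape, sorted(POSITION, key=len, reverse=True))))


def standardize_with_diff(original):
    """Return (standard text, difference bits); a 1 marks a site that was replaced."""
    bits = []

    def to_basic(m):
        g, p = POSITION[m.group()]
        bits.append("0" if p == 0 else "1")
        return GROUPS[g][0]

    standard = PATTERN.sub(to_basic, original)
    return standard, "".join(bits)


def restore(standard, diff):
    """Undo standardization: put the alternative back wherever diff has a 1."""
    if len(PATTERN.findall(standard)) != len(diff):
        raise ValueError("difference data does not match the number of sites")
    flags = iter(diff)

    def to_original(m):
        g, _ = POSITION[m.group()]
        return GROUPS[g][1] if next(flags) == "1" else GROUPS[g][0]

    return PATTERN.sub(to_original, standard)


original = "書換とエネルギーの話。書き換えはエネルギ不要。"
standard, diff = standardize_with_diff(original)
print(standard, diff)
```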

[0018] The information embedding apparatus described above is meaningful not only as a finished apparatus but also as a processing method, that is, as the algorithm behind its processing. Accordingly, another aspect of the invention is an information embedding method that embeds desired information in digitized text, and it comprises two steps. The standardization step rewrites the text into a standard text, based on a predetermined rule, using words and/or notations that differ in form but express the same concept. The embedding step considers each word and/or notation targeted for rewriting in the standardization step. According to data corresponding to the desired information, it decides which of the words and/or notations expressing the same concept to use. It then rewrites the text to embed the data.

[0019] This method clearly provides the same operation and effects as the information embedding apparatus described above. To carry out the method on a computer, it may be implemented as a program that causes a computer to execute it. The program may be distributed over the Internet or the like. It may also be recorded on a recording medium such as a CD-ROM or floppy (registered trademark) disk and distributed in that form.

[0020] Another aspect of the invention is an information reproducing apparatus. It analyzes, as follows, an encrypted text in which desired information has been embedded by the information embedding apparatus, method, or program described above, and it can easily reproduce the embedded information. Specifically, this information decoding apparatus decodes the data corresponding to the desired information from an encrypted text obtained by the information embedding apparatus according to claim 1. It comprises extraction means and decoding means. The extraction means extracts the rewritten words and/or notations of the encrypted text, based on the predetermined rule. The decoding means decodes the embedded data from the extracted words and/or notations.

[0021] In the present invention, the extraction means extracts the rewritten words and/or notations of the encrypted text based on the predetermined rule. The decoding means then decodes the embedded data from the words and/or notations so extracted.

[0022] The information reproducing apparatus of the present invention can thus reproduce desired data embedded in an encrypted text that is itself ordinary digitized text. The data can therefore be received covertly by way of that text. When an encrypted text carrying the desired information is received, third parties see only the ordinary content of the text itself. The information is therefore received without leaving any trace of secret communication. Because the apparatus does not needlessly reveal that data is embedded, the reliability of confidentiality is increased.

[0023] With these advantages, the information reproducing apparatus of the present invention can be paired with the information embedding apparatus described above for a wide range of applications. Examples include secret transmission, and digital watermarking that embeds copyright information or information identifying the distributor.

[0024] Here, the extraction means preferably uses the same dictionary database as the standardization means and embedding means described above. This database serves as its predetermined rule for extracting the rewritten words and/or notations. Using a predetermined dictionary database greatly improves the accuracy with which information is reproduced. Moreover, suppose the dictionary database has been customized. Then even an information embedding or reproducing apparatus that uses the same algorithm cannot reproduce the desired information, which further improves confidentiality.

[0025]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

To further clarify the configuration and operation of the present invention described above, embodiments of the invention are described below by way of examples.

(1) Configuration of the embodiment: FIG. 1 shows an example of the embodiment's setup. A program embodying the information embedding method and the information decoding method is installed on computers 20 and 30, which are connected to a network 10 such as the Internet. The many other computers 40 belong to unspecified users connected to the network 10. Computer 20 exchanges text data such as e-mail with the users of computers 30 and 40. Among the users of computers 40 are malicious users who try to illicitly obtain information passing over the network 10. This embodiment describes the following example. First, computer 20 embeds information in a digitized text (text data) and then e-mails the text to another computer 30 via the network 10. On receiving the mail, computer 30 reproduces from its text the information embedded by computer 20.

[0026] FIG. 2 shows the internal configuration of computers 20 and 30, using computer 20 as the example. Computer 20 comprises the following components:
a network interface (NT-I/F) 21 that controls data exchange with the network 10 via a modem or router 18;
a CPU 22 that performs processing;
a ROM 23 that stores processing programs and fixed data;
a RAM 24 used as a work area;
a timer 25 that manages time;
a display circuit 26 that controls display on a monitor 29;
a hard disk (HD) 27 that stores the various data described later; and
an input interface (I/F) 28 that handles the keyboard 11 and mouse 12.
The hard disk 27 is described as fixed, but it may be removable. A removable storage device, such as a CD-ROM, CD-R, CD-RW, DVD, or flexible disk, may also be used alongside it. In this embodiment the processing program of computer 20 is stored in ROM 23. It may instead be stored on hard disk 27 and loaded into RAM 24 for execution at startup. It may also be read from the removable recording medium mentioned above, or read from another server via the network 10 and executed.

[0027] On computer 20 shown in FIG. 1, the user edits a document (text data). The document may be entered from the keyboard 11 or brought in from outside via the network 10. The user finally stores the edited text on hard disk 27, attaches it to an e-mail, and sends it via the network 10 to the user of another computer 30.

[0028] In this embodiment, computers 20 and 30 store a dictionary database DDB for standardizing documents on their hard disks 27, in addition to a dictionary for kana-kanji conversion. FIG. 3 illustrates example data in this dictionary database. As shown, the dictionary database collects words and notations that differ in form but express the same concept, that is, data on so-called notation variants (表記のゆれ). It is, in effect, a dictionary for standardization. Computer 20 applies the following four levels of standardization to text that has been created and stored on hard disk 27.

1. Character standardization (replacing characters with predetermined characters). This unifies character-level notation variants that involve no change in meaning at all. Examples are the presence or absence of a long-vowel mark at the end of a katakana word, and full-width versus half-width katakana.

2. Notation unification (unifying notation variants to a predetermined notation). This unifies word-level notation variants, such as multiple kanji spellings of a word whose meanings barely differ.

3. Content-word processing (replacing a content word (自立語) with another content word according to a predetermined replacement criterion). This handles variation among content words by substituting between words whose meanings may differ slightly, so that a consistent set of content words is used.

4. Function-word processing (replacing a function word (付属語) with another function word according to a predetermined rule). This unifies phrase-level notation variants by reducing each of several pairs to one form. The pairs include a wordy sentence ending versus a noun-final ending (taigen-dome), a casual versus an ordinary expression, and passive expressions versus passive words (受け身表現と受け身語).
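For the first level, differences between full-width and half-width katakana can be removed with Unicode normalization. The sketch below assumes NFKC, from Python's standard `unicodedata` module, as a stand-in for the patent's character-level dictionary entries. Two caveats apply. NFKC also folds full-width ASCII letters and digits to half-width, which may or may not match the basic forms a given dictionary chooses. It also does nothing about katakana long-vowel variants, which still need dictionary entries.

```python
import unicodedata


def standardize_characters(text: str) -> str:
    """Character-level standardization via NFKC.

    Half-width katakana become full-width, and a separate half-width
    voiced-sound mark is composed into the preceding kana.
    """
    return unicodedata.normalize("NFKC", text)


print(standardize_characters("ﾃﾞｰﾀ"))      # half-width input becomes データ
print(standardize_characters("エネルギ"))  # unchanged: long-vowel variants need the dictionary
```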

[0029] The fourth level, function-word processing, handles phrase-level notation variants. It is performed by so-called text standardization techniques, which reduce the variety of text to a fixed form while taking into account the surrounding sentences and word usage. FIG. 3 shows an example of this level only to illustrate that it forms part of the data in the dictionary database.

[0030] Computer 20 runs the information embedding program shown in the flowchart of FIG. 4 to embed desired information in an arbitrary digital text. The program first specifies the digital text in which the information is to be embedded, hereinafter simply the original text (step S100). In this step it prompts the user of computer 20 to specify the original text. The original text may be a document stored on hard disk 27. It may also be a document composed as an e-mail or the like and temporarily saved within the mail software. FIG. 5 shows an example of an original text specified by the user of computer 20 in this step (step S100).

[0031] Once the original text has been specified, a standardization process (step S110) is executed that identifies and standardizes the notation variants present in the original text. FIG. 6 shows the screen displayed on computer 20 while this process runs. As shown, the screen provides the following controls:
a check box RB1 for specifying whether standardization is performed at the character, word, or phrase level;
a radio button RB2 for choosing whether to create the difference data described later; and
buttons BB3 and BB4 for executing or canceling standardization.
This embodiment describes the case shown in FIG. 6, in which the user selects the word level as the standardization level, selects "create" for the difference data, and clicks the execute button BB3.

[0032] When the standardization level is the word level, the original text shown in FIG. 5 is searched for notational variants defined at the character level and the word level of the dictionary database. A standardization process then replaces every hit with its basic word. In this embodiment, therefore, the original text is rewritten as shown in FIG. 7. In FIG. 7, notational variants that were left as written in the original text are marked with ( ), and those that were replaced with a basic word are marked with [ ]. Through this standardization process (step S110), the original text is rewritten into a text that uses the basic words of the dictionary database (hereinafter, the standardized text).
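The standardization step can be pictured with a short Python sketch. It is not the patent's implementation. A hypothetical two-pair dictionary stands in for the FIG. 3 database: 御社/貴社 and 工員/職工, taken from the example in [0035], with the basic word listed first. The sketch also assumes the text has already been split into tokens, since Japanese is written without spaces and the document does not say how words are located. The hypothetical function `standardize` returns the standardized tokens plus one flag per variant site: 0 for a ( ) site and 1 for a [ ] site in FIG. 7.

```python
# Hypothetical stand-in for the FIG. 3 dictionary: each pair lists the
# basic word first and its single alternative form second.
DICTIONARY = [
    ("御社", "貴社"),  # "your company"
    ("工員", "職工"),  # "factory worker"
]

# Map every form to (its basic word, its position within the pair).
FORM_INDEX = {
    form: (pair[0], i) for pair in DICTIONARY for i, form in enumerate(pair)
}


def standardize(tokens):
    """Replace each dictionary form with its basic word.

    Returns (standardized_tokens, flags), with one flag per variant site:
    0 if the original already used the basic word ("( )" in FIG. 7),
    1 if the site was rewritten to the basic word ("[ ]" in FIG. 7).
    """
    standardized, flags = [], []
    for tok in tokens:
        if tok in FORM_INDEX:
            basic, idx = FORM_INDEX[tok]
            standardized.append(basic)
            flags.append(0 if idx == 0 else 1)
        else:
            standardized.append(tok)  # not a variant site; copy unchanged
    return standardized, flags


tokens = ["貴社", "の", "工員", "は"]  # hypothetical pre-segmented text
print(standardize(tokens))
```

The flags, read in order, form the data string from which the difference data is built.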

[0033] When creation of difference data has been selected, a data string is generated from the notational variants in FIG. 7: each variant marked ( ) is assigned 0, and each variant marked [ ] is assigned 1. In the example of FIG. 7, the data string is therefore "010000000000001". This data string is divided into 8-bit groups from the beginning, and each group, read as the binary form of an 8-bit ASCII code, becomes part of the difference data. If the length of the data string is not a multiple of 8, 0 bits are appended at the end to make it one. The difference data in this embodiment is therefore the 2 bytes "01000000b" and "00000010b" ("b" denotes binary). Converted to ASCII, these bytes correspond to "@" and "SX" (the control code STX, 0x02).
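Packing the data string into difference data is plain bit arithmetic. The hypothetical helper `bits_to_bytes` below sketches the rule just described: take 8-bit groups from the start and pad the last group with 0s.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Pack a '0'/'1' data string into bytes, 8 bits at a time from the
    start, appending 0s at the end if the length is not a multiple of 8."""
    if set(bits) - {"0", "1"}:
        raise ValueError("data string may contain only '0' and '1'")
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


diff = bits_to_bytes("010000000000001")  # the FIG. 7 example data string
print(diff)  # b'@\x02' -> "@" (0x40) and the control code STX (0x02)
```

The number of variant sites can be counted in the text itself. A receiver can therefore tell that only the first 15 of the 16 bits are meaningful and ignore the padding bit.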

[0034] Standardization of the original text (step S110) produces the standardized text and the difference data. Once they are ready, a process for specifying the information to be embedded (step S120) prompts the user of the computer 20 for that information. The data to be embedded can be specified in several ways, for example by designating an electronic file in which that information has been written in advance, or by typing it directly on the keyboard 11 of the computer 20. The rest of this embodiment describes the case in which the half-width character "A" is typed on the keyboard as the data to be embedded.

[0035] When data is entered from the keyboard 11, an encryption process (step S130) is executed in response. Suppose the information to be embedded is the half-width "A". This process converts its ASCII code 41 (hexadecimal) into the binary data "01000001b". Following this bit string, it then processes the standardization sites marked ( ) and [ ] in FIG. 7 in order from the beginning of the text. A site matched to a 0 keeps its basic word, and a site matched to a 1 is rewritten to a synonym or other form that is not the basic word. In this example, the second and eighth notational variants from the beginning of the text are the basic words 御社 ("your company") and 工員 ("factory worker"). They are rewritten to the non-basic forms 貴社 and 職工, producing the text shown in FIG. 8 (hereinafter, the encrypted text).
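The embedding step just described can be sketched as follows. As before, the two-pair dictionary and the tokenized input are hypothetical assumptions, and the helper names `char_to_bits` and `embed` are illustrative only. Each bit of the ASCII code is matched to the next variant site in text order.

```python
# Hypothetical dictionary pairs: (basic word, alternative form).
DICTIONARY = [("御社", "貴社"), ("工員", "職工")]
BASIC_TO_ALT = dict(DICTIONARY)


def char_to_bits(ch: str) -> str:
    """8-bit binary form of an ASCII character, e.g. 'A' (0x41) -> '01000001'."""
    if ord(ch) > 0x7F:
        raise ValueError("this sketch only handles ASCII characters")
    return format(ord(ch), "08b")


def embed(standardized, bits):
    """Visit the variant sites of the standardized text from the start:
    bit '0' keeps the basic word, bit '1' swaps in the alternative form.
    Sites left over after the last bit keep their basic word."""
    out, k = [], 0
    for tok in standardized:
        if tok in BASIC_TO_ALT and k < len(bits):
            out.append(BASIC_TO_ALT[tok] if bits[k] == "1" else tok)
            k += 1
        else:
            out.append(tok)
    if k < len(bits):
        raise ValueError(f"only {k} variant sites for {len(bits)} bits")
    return out
```

Consider eight sites reading 工員, 御社, 御社, 御社, 御社, 御社, 御社, 工員. Embedding the bits of "A" changes only the second site (to 貴社) and the eighth (to 職工), matching the example above.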

[0036] Next, a process for outputting the difference data is performed (step S140). The difference data is the difference between the original text and the standardized text, converted in the same way into a bit string of 0s and 1s. In this embodiment the difference data is output attached to the encrypted text, but it may instead be output as a separate file.

[0037] In this way, the user of the computer 20 obtains an encrypted text in which the data "A" is embedded, and sends that encrypted text by e-mail to the computer 30. The difference data can be sent in either of two ways: attached to the e-mail, or embedded in the encrypted text as additional information.

[0038] The encryption process of this embodiment can superimpose different data on ordinary text data and embed it there. In other words, it exploits the redundancy exposed by standardizing the text and places other data in that redundancy. Moreover, the resulting encrypted text reads like ordinary text, so others have no reason to suspect that it is a ciphertext. Further, the data is embedded only by changing which words and notations the text uses, so the encrypted text still makes sense when read.

[0039] Next, the information decoding program with which the computer 30 reproduces the information "A" from the encrypted text will be described. FIG. 9 is a flowchart of the information decoding program executed by the computer 30. When this program is executed, the computer 30 first performs a decoding-mode determination process (step S200), which determines how the information is to be decoded. FIG. 10 shows the screen displayed on the monitor 29 of the computer 30 during this process. As shown, it provides the following controls:
- a check box RB11 for designating which of the character level, notational-variant level, independent-word level, and ancillary-word (phrase) level are to be used for decryption;
- a radio button RB12 for specifying whether difference data is present;
- buttons BB13 and BB14 for instructing execution and cancellation of decryption.

This embodiment assumes that the user of the computer 20 has sent a text encrypted at the word level, with the difference data attached to the e-mail. In this case, as shown in FIG. 10, the word-level check box and the radio button indicating that difference data is present are selected, and the execute button is clicked. Decryption is impossible unless the sender and the receiver use the same level. Keeping the decryption level secret from everyone except the two parties therefore makes the encryption still more reliable. The two parties must also use an identical dictionary database for decryption, which likewise makes the encryption highly reliable.

[0040] Next, an encrypted-text specifying process (step S210) prompts the user to specify the text file, and the specified encrypted text is read. At this point the user of the computer 30 specifies the encrypted text (FIG. 8) received as an e-mail attachment, and the difference data is read as well.

[0041] In the subsequent decryption process (step S220), the specified encrypted text is searched from its beginning for notational variants, using the dictionary database illustrated in FIG. 3. This produces a data string in which each variant contributes 0 if it is a basic word and 1 if it is not. The data string is then divided into 8-bit groups from the beginning. Take the encrypted text shown in FIG. 8 as an example. The first eight words that can be standardized are, in order, "basic word", "non-basic word", "basic word", "basic word", and so on. Converting these first eight sites into binary yields the data string "01000001b", and converting this to ASCII decodes the character "A".
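Decoding reverses the embedding step. The sketch below again assumes a hypothetical two-pair dictionary and a tokenized text. The document does not say how the receiver learns the length of the embedded message. As an assumption, this sketch decodes every whole 8-bit group and drops trailing NUL characters. Those NULs come from sites that stayed as basic words after the message ended, and the message itself is assumed to contain no NULs.

```python
# Hypothetical dictionary pairs: (basic word, alternative form).
DICTIONARY = [("御社", "貴社"), ("工員", "職工")]
SITE_BIT = {}
for basic, alt in DICTIONARY:
    SITE_BIT[basic] = "0"  # basic word -> 0
    SITE_BIT[alt] = "1"    # any other form -> 1


def extract_bits(tokens) -> str:
    """Read one bit per variant site, in order from the start of the text."""
    return "".join(SITE_BIT[t] for t in tokens if t in SITE_BIT)


def decode_text(bits: str) -> str:
    """Decode whole 8-bit groups as ASCII and drop trailing NULs, which come
    from sites that were left as basic words after the message ended."""
    usable = len(bits) - len(bits) % 8
    chars = "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))
    return chars.rstrip("\x00")
```

With fifteen sites, the first eight carry "01000001" and the last seven remain basic words. The incomplete final group is ignored, and the result is "A".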

[0042] The following original-text reproduction process (step S230) is executed only if step S200 indicated that difference data is present. The computer 30 prompts the user to specify the difference data, obtains it, and reproduces the original text from the encrypted text. The process first converts every notational variant to its basic word, turning the encrypted text back into the standardized text. It then converts the specified difference data "@" and "SX" into binary. Working from the beginning of the standardized text, it leaves each site as the basic word where the bit is 0 and rewrites it to the non-basic form where the bit is 1. This converts the encrypted text shown in FIG. 8 back into the original text shown in FIG. 5, reproducing the text first written by the user of the computer 20.

For ease of understanding, the description above assumed that the embedded data is not compressed. In practice, however, the data embedded in the text is compressed, using a well-known technique such as LZ coding. Compressing the information to be embedded increases the amount of information the text can carry. The benefit is especially large when, as in this embodiment, the difference information needed to restore the original text is embedded along with the data.
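The restoration step can be sketched under the same assumptions: a hypothetical two-pair dictionary and tokenized text. The sketch does not cover the compression mentioned above. It re-standardizes every site and then applies the difference bits in order.

```python
# Hypothetical dictionary pairs: (basic word, alternative form).
DICTIONARY = [("御社", "貴社"), ("工員", "職工")]
TO_BASIC = {form: pair[0] for pair in DICTIONARY for form in pair}
BASIC_TO_ALT = dict(DICTIONARY)


def bytes_to_bits(data: bytes) -> str:
    """Expand bytes back into a '0'/'1' string, 8 bits per byte."""
    return "".join(format(b, "08b") for b in data)


def restore_original(encrypted, diff: bytes):
    """Turn every variant site back into its basic word (re-standardizing),
    then apply the difference bits in order: 0 keeps the basic word,
    1 restores the alternative form. Padding bits past the last site are
    simply never read."""
    bits = bytes_to_bits(diff)
    out, k = [], 0
    for tok in encrypted:
        if tok in TO_BASIC:
            if k >= len(bits):
                raise ValueError("difference data has fewer bits than sites")
            basic = TO_BASIC[tok]
            out.append(BASIC_TO_ALT[basic] if bits[k] == "1" else basic)
            k += 1
        else:
            out.append(tok)
    return out
```

The restore is deterministic here only because each basic word has exactly one alternative, as in FIG. 3. A dictionary with several alternatives would need multi-bit difference data.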

[0043] With the information embedding and reproduction methods of this embodiment, the encrypted text sent over the network 10 looks like an entirely ordinary text (see FIG. 8). No one notices that other data ("A" in this embodiment) is embedded in it. Different information can be embedded in an ordinary digitized text to form an encrypted text, and that encrypted text can carry the desired information secretly. A third party who sees such a text distributed takes the text itself to be the message. The desired information can therefore be distributed without leaving any trace of secret communication. The text also does not become a target of code breaking by hackers and the like, so highly reliable secrecy is achieved. In addition, the encrypted text makes sense when read, so it can itself be used to convey information. This embodiment describes embedding the information "A" for encryption. The method can also serve as a digital watermark if copyright information is embedded instead.

[0044] In the dictionary database shown in FIG. 3, rewriting candidates can be set freely at any of three levels: the character level, the word level, and the phrase level. This allows a wide range of uses. When the original text and the encrypted text should differ only slightly in expression, the character level or the notational-variant level is selected. When a large amount of information is to be embedded, the independent-word level and the ancillary-word (phrase) level are selected as well. Moreover, difference data is created during standardization of the original text (step S110), so the original text can be faithfully reproduced from the encrypted text. An encrypted text can therefore be created with confidence even from a text whose subtle nuances matter, and the embedded information can be sent secretly.

[0045] In the dictionary database shown in FIG. 3, each notation subject to standardization has exactly two alternatives. In reality, however, an independent word such as 重要な ("important") has many synonyms besides 大切な, for example 大事な and 肝要な. Likewise, some words have several notational variants, such as 書き換え, 書換, and 書換え (all meaning "rewriting"). Such words can be handled using the order of words in the dictionary. When an independent word with many synonyms is to be replaced, the word listed immediately after the basic word used for standardization serves as the replacement. The difference data is then generated as a multi-bit value recording which word in the list the original text used. For example, suppose the dictionary database lists the synonyms 重要な, 大切な, 大事な, and 肝要な in this order, with 重要な as the basic word. Suppose also that the original text contained 大事な, which standardization replaced with the basic word 重要な. The distance from the basic word is then 2, so the difference data "10" is generated. With this scheme, encryption, decryption, and recovery of the original text remain possible even when the standardization dictionary holds several replacement words for one basic word.
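The multi-synonym scheme just described can be sketched in Python. The synonym list comes from the example above. Two points are assumptions of this sketch: the fixed code width of ⌈log2 n⌉ bits for an n-word group (the document gives only the single example "10"), and the helper names.

```python
# Hypothetical synonym group in dictionary order; the first entry is the basic word.
GROUP = ["重要な", "大切な", "大事な", "肝要な"]


def code_width(group) -> int:
    """Fixed number of bits needed to index any member of the group
    (a fixed width is an assumption; the document gives only one example)."""
    return max(1, (len(group) - 1).bit_length())


def diff_code(original_word, group=GROUP) -> str:
    """Difference data for one site: distance of the original word from the
    basic word, written in binary."""
    return format(group.index(original_word), f"0{code_width(group)}b")


def restore_word(code: str, group=GROUP) -> str:
    """Inverse of diff_code: recover the original word from its code."""
    return group[int(code, 2)]


def embedding_form(group=GROUP) -> str:
    """Form written when a 1 is embedded: the word listed right after the basic word."""
    return group[1]
```

For 大事な, the distance is 2, giving the code "10" as in the text.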

[0046] The present invention is not limited to the above embodiment and can of course be embodied in various forms without departing from its gist. For example, a dictionary utility program could be added to the above embodiment to allow customization of the dictionary database. The utility would make it possible to prohibit the use of any entry in the database and/or to change entries. Suppose the users of the computers 20 and 30 share a dictionary database customized in this way. Another computer 40 that sends and receives ciphertexts with the same algorithm may illicitly obtain the encrypted text. Even so, it can reproduce neither the original text nor the embedded information, which further improves confidentiality.

[0047] In the above embodiment, using the basic word corresponds to 0 and using a non-basic form corresponds to 1, so each notational variant embeds 1 bit of information. However, multiple bits may be embedded in a particular variant. For example, suppose four replacement words are registered for the basic word 先日 ("the other day") in the dictionary database illustrated in FIG. 3: 先日, 以前 ("previously"), 過日 ("the other day", formal), and 先頃 ("recently"). The codes can then be assigned as "0" for the basic word 先日, "1" for 以前, "00" for 過日, and "01" for 先頃. With this assignment, 2 bits of information can be embedded using a single variant.
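The code table above assigns codes of unequal length, so the embedder needs a rule for choosing among them. The sketch below is one possible rule, not the patent's stated method. It takes the table from the text and picks, at each site, the word whose code is the longest prefix of the remaining bits. The greedy choice and the function names are assumptions.

```python
# Hypothetical code table following the text; one site carries 1 or 2 bits.
CODES = {"先日": "0", "以前": "1", "過日": "00", "先頃": "01"}
BASIC = "先日"


def embed_multibit(num_sites: int, bits: str):
    """Pick a word for each site by greedy longest-prefix match against the
    remaining bits (the tie-break is an assumption of this sketch)."""
    words, pos = [], 0
    for _ in range(num_sites):
        rest = bits[pos:]
        if not rest:
            words.append(BASIC)  # no data left: keep the basic word
            continue
        word = max((w for w, c in CODES.items() if rest.startswith(c)),
                   key=lambda w: len(CODES[w]))
        words.append(word)
        pos += len(CODES[word])
    if pos < len(bits):
        raise ValueError("not enough sites for the data")
    return words


def extract_multibit(words) -> str:
    """Concatenate the codes of the words found at the sites, in order."""
    return "".join(CODES[w] for w in words)
```

The codes are not prefix-free ("0" is a prefix of "00" and "01"). Even so, decoding is unambiguous because each site is a separate word, so the decoder maps whole words to codes rather than splitting a bit stream. Sites left as the basic word after the data ends decode as extra 0 bits.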

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of the Internet connection of the computers 20 and 30 to which the information embedding method and information reproducing method according to an embodiment of the present invention are applied.

FIG. 2 is a block diagram showing the internal configuration of the computer 20.

FIG. 3 is an explanatory diagram illustrating the configuration of the dictionary database stored in the computers 20 and 30.

FIG. 4 is a flowchart of the information embedding program executed by the computer 20.

FIG. 5 is an explanatory diagram illustrating an original text in which information is to be embedded.

FIG. 6 is an explanatory diagram of the display screen of the computer 20 during standardization of the original text.

FIG. 7 is an explanatory diagram illustrating the standardized text obtained by standardizing the original text.

FIG. 8 is an explanatory diagram illustrating the encrypted text in which the information "A" is embedded in the standardized text.

FIG. 9 is a flowchart of the information reproduction program executed by the computer 30.

FIG. 10 is an explanatory diagram of the display screen of the computer 30 during decryption of an encrypted text.

[Explanation of symbols]

10 ... Network
11 ... Keyboard
12 ... Mouse
18 ... Router
20 ... Computer
22 ... CPU
23 ... ROM
24 ... RAM
25 ... Timer
26 ... Display circuit
27 ... Hard disk
29 ... Monitor
30 ... Computer
40 ... Computer

Claims

1. An information embedding device for embedding desired information in a digitized text, comprising: standardization means for rewriting the text into a standard text by replacing, based on a predetermined rule, words and/or notations that differ in form but represent the same concept with a common word and/or notation; and embedding means for generating an encrypted text in which data corresponding to the desired information is embedded, by determining, for each word and/or notation subject to rewriting in the standardization process, which of the words and/or notations representing the same concept is to be used according to the data, and rewriting the text accordingly.

2. The information embedding device according to claim 1, wherein the standardization means and the embedding means use the same dictionary database as the predetermined rule for rewriting words and/or notations.

3. The information embedding device according to claim 2, wherein the dictionary database has, as rewriting candidates, at least one of a character level, a word level, and a phrase level.

4. The information embedding device according to claim 2, wherein the dictionary database allows the use of any entry to be prohibited and/or entries to be changed.

5. The information embedding device according to any one of claims 1 to 4, wherein the standardization means includes difference data generating means for generating, when the digitized text is rewritten into the standard text, difference data based on the difference between the digitized text and the standard text.

6. An information embedding method for embedding desired information in a digitized text, comprising: a standardization step of rewriting the text into a standard text by replacing, based on a predetermined rule, words and/or notations that differ in form but represent the same concept with a common word and/or notation; and an embedding step of embedding data corresponding to the desired information by determining, for each word and/or notation rewritten in the standardization step, which of the words and/or notations representing the same concept is to be used according to the data, and rewriting the text accordingly.

7. An information decoding device for decoding data corresponding to the desired information from an encrypted text obtained by the information embedding device according to claim 1, comprising: extracting means for extracting, based on the predetermined rule, the rewritten words and/or notations of the encrypted text; and decoding means for decoding the embedded data based on the extracted words and/or notations.

8. The information decoding device according to claim 7, wherein, when extracting the rewritten words and/or notations, the extracting means uses the same dictionary database as that used when the data was embedded.

9. A program for causing a computer to implement a process of embedding desired information in a digitized text, the program causing the computer to implement: a function of rewriting the text into a standard text by replacing, based on a predetermined rule, words and/or notations that differ in form but represent the same concept with a common word and/or notation; and a function of embedding data corresponding to the desired information by determining, for each word and/or notation subject to the rewriting, which of the words and/or notations representing the same concept is to be used according to the data, and rewriting the text accordingly.

10. An information handling method for embedding desired information in a digitized text and decoding it, comprising: a standardization step of rewriting the text into a standard text by replacing, based on a predetermined rule, words and/or notations that differ in form but represent the same concept with a common word and/or notation; an embedding step of embedding data corresponding to the desired information by determining, for each word and/or notation rewritten in the standardization step, which of the words and/or notations representing the same concept is to be used according to the data, and rewriting the text accordingly; a step of receiving the encrypted text in which the data has been embedded by the above steps and extracting, based on the predetermined rule, the rewritten words and/or notations of the encrypted text; and a step of decoding the embedded data based on the extracted words and/or notations.