WO2000004501A1 - Electronic watermarking - Google Patents



Publication number
WO2000004501A1
Authority
WO
WIPO (PCT)
Prior art keywords
characters
character
text
code data
watermark
Prior art date
Application number
PCT/GB1999/002265
Other languages
French (fr)
Inventor
Laurence Frank Turner
Athanassios Manikas
Original Assignee
Laurence Frank Turner
Athanassios Manikas
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Laurence Frank Turner, Athanassios Manikas
Priority to AU49216/99A (AU4921699A)
Publication of WO2000004501A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • H04N1/32149 Methods relating to embedding, encoding, decoding, detection or retrieval operations
    • H04N1/32203 Spatial or amplitude domain methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • H04N1/32149 Methods relating to embedding, encoding, decoding, detection or retrieval operations
    • H04N1/32203 Spatial or amplitude domain methods
    • H04N1/32219 Spatial or amplitude domain methods involving changing the position of selected pixels, e.g. word shifting, or involving modulating the size of image components, e.g. of characters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • H04N1/32149 Methods relating to embedding, encoding, decoding, detection or retrieval operations
    • H04N1/32203 Spatial or amplitude domain methods
    • H04N1/32261 Spatial or amplitude domain methods in binary data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0062 Embedding of the watermark in text images, e.g. watermarking text documents using letter skew, letter distance or row distance

Definitions

  • the invention relates to a method of inserting code data into text data, which is sometimes called watermarking a text.
  • Watermarking of documents allows audit trails to be set up easily. For example, if a particular document had to be watermarked by each owner before that owner was allowed to sell on the document to someone else, then from the different watermarks in the document it would be possible to identify all the previous legitimate owners of the document.
  • a selected character is marked using overwriting or underwriting, that is, the placing of one or more other character(s) at the same position in the text as the selected character.
  • these other characters might all be the same as the selected character, so that the under/overwritten characters cannot be seen.
  • the other characters may include a 'space' or a 'null' character, or a combination of these overwritten one on top of another, making them invisible in the printed text. This combination of 'space' and/or 'null' characters is then overwritten with the selected text character.
  • the encoded material is inserted in such a way that it does not affect in any way the quality of the text.
  • Other characters which may be used in the set of characters with which each selected character is replaced may be colourless characters, including colourless underlining or overscoring characters, that is, characters which are at the same position in the string of characters as the selected character but which, in the text document, appear above or below the selected character.
  • the message or information to be inserted into the text is preferably encrypted and it is placed into the text in such a way as to make it difficult for unauthorised persons to remove the message, or watermark, from the text.
  • Figure 1 shows a schematic of a system according to one embodiment of the invention
  • Figure 2 shows a Watermarking Engine in more detail
  • Figure 3 shows a block diagram of the structure of the Watermark Encoding Unit.
  • the term 'watermark' is used to mean a message which is to be inserted into the text.
  • the watermarking of a text-formatted file which is referred to as the Input File 12 and which has been created using a Wordprocessor 11, is performed using a Watermarking Engine 13.
  • the watermarked text-formatted file produced is referred to as the Output File 14.
  • the Watermarking Engine 13 reads the Input File 12 and embeds, according to pre-specified rules, watermarks that are contained in a Watermark Holding Unit 15 and finally saves the result into the Output File 14. Inserted watermarks can be 'seen' using the Watermark Visualisation Unit 16.
  • Figure 2 shows, in general, the elements of a text watermarking system according to one embodiment of the invention.
  • the Watermarking Engine 13, as shown in Figure 2, consists of a Watermark Encoding Unit 21, a Watermark Detection and Identification Unit 22, a Mapping Unit 23 and an Encryption Unit 24.
  • the Watermark Holding Unit 15 which is not part of the Watermarking Engine 13 itself, may contain a single watermark or a collection of a number of different watermarks. This unit 15 may be centralised or distributed throughout an electronic trading network. Centralised operation would enable some, or all, of the watermarks to be inserted by an authorised agent, while the distribution of the unit would make it possible for specified types of watermark to be inserted by an authorised party, or parties, over a network.
  • a watermark obtained from the Watermark Holding Unit 15 is defined by two parameters, the first parameter is the 'Type' of the watermark, and the second is the 'Body' of the watermark.
  • the 'Type' is an identifier and the 'Body' is the content, comprising a sequence of digits, symbols etc. corresponding to, for example, an ISBN as used in publishing, or a name which may be the name of an author or distributor, or any such combination of alpha-numeric and special characters.
  • Two illustrative examples of watermarks are given in Table 1.
  • the sequence of characters representing the Type and Body of the watermark is taken from the Watermark Holding Unit 15 and fed to a Watermark Mapping Unit 23 where the body of the watermark is mapped into a longer sequence of characters.
  • the mapping is dependent, not only on the Body, but also on the Type of the watermark. Through the mapping the Body of the watermark is converted, generally, into binary form. The invention is not restricted to the situation in which the output is a sequence of binary digits.
  • the output from the Mapping Unit 23 is input to the Encryption Unit 24 which operates in a manner well known to those in the art [see “Cryptography and Secure Communications” Rhee M Y, McGraw Hill Book Company, Singapore, 1994 which is incorporated herein by reference] .
  • the output of the Encryption Unit 24 represents a binary, encrypted form of the watermark information that is to be embedded in the formatted text file.
  • the Watermark Detection and Identification Unit 22 examines the Input File 12 and, from Line 26, obtains information regarding the type of watermark contained in the Watermark Holding Unit 15 and checks the Input File 12 for the presence of all watermark 'Types'.
  • a TRUE, hence 'inhibit', command is sent along Line 27 to the Watermark Encoding Unit 21 in order to prevent watermarking.
  • the Watermark Encoding Unit 21 which is shown in general form in Figure 3, consists of five main sub-units: a Buffer 31 for holding the mapped, encrypted body of the watermark to be inserted, an Intelligent Reader 30 which reads the input file in a manner to be described later, a Text-Character Watermarking Unit 32, a Writer 33 which writes text characters, watermarked or otherwise, to the Output File and a Text-Character Selection Procedure Unit (TCSP Unit) 34.
  • TCSP Unit Text-Character Selection Procedure Unit
  • the operation of the Watermark Encoding Unit 21, which is under the general control of the Text-Character Selection Procedure Unit 34 will now be described in general terms.
  • the operation begins once a FALSE signal is received on Line 27 from the Watermark Detection and Identification Unit 22. Then the counter 36 shown in Figure 3 is reset to 0 and the Intelligent Reader 30 starts reading the Input File 12.
  • the Intelligent Reader 30 reads the text file 12 according to its rules of operation which depend on the format and type of the input file and any other information available in respect to the input file. If the Intelligent Reader reads a batch of characters then they are written directly by the Writer 33 to the Output File 14. If the Reader 30 reads a single character (a byte) then a check is made to determine whether this character is a text-character.
  • the Counter 36 is incremented by 1 and the TCSP Unit 34 takes over and, on the basis of a set of rules, controls whether or not this text character should be watermarked by the Text-Character Watermarking Unit. If the decision is that the text character should not be watermarked then the Text-Character Watermarking Unit 32 receives a FALSE command on Line 35 and the text character is written to the Output File. However, if the decision is that the text character should be watermarked then the command on Line 35 is TRUE and this instructs the Text-Character Watermarking Unit to perform the following two operations:
  • the watermarked text character is written to the Output File and the control passes back to the Intelligent Reader 30 which then reads the next character, or a batch of characters.
  • the Text-Character Selection Procedure Unit 34 controls the watermark insertion: it decides which characters should be watermarked according to a predefined set of rules, specific examples of which will be given later, and signals each decision as TRUE or FALSE on Line 35.
  • the constraint rules are based on the information provided by the Detection and Identification Unit 22 to the TCSP Unit 34. This is comprehensive information as to the types of the existing watermarks within the text and the ranges of positions within which the watermarked text characters fall.
  • the constraint rules become active every time the content of the counter is such that it falls within the constraint ranges referred to above.
  • the command on Line 35 then becomes FALSE thereby prohibiting watermarking of any text character under consideration.
  • the short jumping rules, which come into effect when the constraint rules are inactive, determine whether or not a text-character is to be watermarked by a digit of the specified watermark. In addition the short jumping rules determine the number of text characters that there is to be between successive watermarked characters. This separation may be by a predetermined number of text characters, or by an integer number generated at random, subject to a maximum value.
  • the long jumping rules are activated by the buffer when the last digit/symbol of the mapped and encrypted Watermark Body has been inserted into the text file. These rules are determined in a straightforward manner depending on the number of different types of watermarks to be inserted, the lengths of the watermarks, the frequency with which they are to be inserted and the length of the text file. The frequency of repetitive insertions of the same watermark should be such as to leave space for the insertion of the other types of watermarks.
  • if N is the content of the counter when the last digit of the Watermark Body has been inserted then no further text characters should be watermarked so long as the condition (content of counter) ≤ N+Fr holds.
  • the Text-Character Selection Procedure Unit 34 maintains the state of Line 35 at FALSE, which inhibits further watermarking of read text characters until such time as the content of the Counter 36 exceeds N+Fr.
  • the long jumping rules become inactive and the TCSP Unit 34 proceeds with the second insertion of the same specified watermark type based again on the short jumping rules which have returned to the active state.
  • a next specified watermark type is selected and, provided it follows the ordering rules, relating to the order in which watermark types are permitted to be inserted, the insertion process is repeated.
  • different watermark types are inserted in accordance with the position-division-multiplexing scheme.
  • if a selected text character is to be watermarked with a binary digit 0 which is part of the Body of the watermark to be embedded then the 'space' symbol is overwritten with itself X times and then overwritten with the selected text character. If the selected text character is to be watermarked with the digit 1 then the space symbol is overwritten with itself Y times and then overwritten with the selected character, where X≠Y.
  • PDF Portable Document Format
  • a document formatted according to a particular word processing system may be watermarked by using software to change the basic commands of the word processing system to allow overwriting.
  • the document could be converted from the format of a particular word processing system, such as Word (trade mark) or WordPerfect (trade mark), into the PostScript language and then, if necessary, into PDF using, for example, an Adobe Acrobat Distiller (registered trade marks).
  • To extract the watermark from the Output File, the file 14 must be input to the Watermark Visualisation Unit 16. This first identifies the type of watermark(s) contained in the file and then, by performing generally the opposite procedure to that which was carried out by the Mapping Unit 23 and Encryption Unit 24 in order to encode the text with the watermark, the Visualisation Unit detects, reads and displays the watermark Body and Type contained within the text.
  • the binary zeros and ones representing the watermark are watermarked into the text by using X overwrites of a selected character by itself if the binary digit is a zero and by Y overwrites of the character by itself (with X≠Y) if the binary digit is a one.
  • binary zeros and ones are represented and watermarked into the text by respectively invisibly underlining or invisibly over-scoring the selected text character.
  • the method is the same as in the first embodiment described except that a colourless character is used instead of the 'space' character.
  • the method is the same as in the first embodiment described except that the 'null' character is used instead of the 'space' character.
  • a binary zero to be embedded is represented by a colourless character, say P, overwritten by the selected text character, and a binary one is represented by a different colourless character, say Q, overwritten by the selected text character.
  • This process has the advantage of increasing the bandwidth efficiency of the watermarking procedure.
  • binary zeros and ones which are part of the body of the watermark to be inserted, or combinations of a number of binary digits, are encrypted and impressed on the formatted text file using combinations of characters that are invisible in the sense related to the previous embodiments.
  • the invention thus provides an improved method of watermarking which is insensitive to formatting changes, unlike known watermarking methods. Moreover, the watermarking is more secure and cannot be detected simply by scanning the watermarked text.

Abstract

An improved method of inserting code data into a text data file, or 'watermarking' is provided. Selected characters are overwritten or underwritten with one or more other characters in such a way that only the selected character is visible. Unlike other known methods, this method is insensitive to formatting changes, such as change of font or justification.

Description

Electronic Watermarking
The invention relates to a method of inserting code data into text data, which is sometimes called watermarking a text.
With the increase in electronic trading and the delivery of text in electronic form there is a growing need for secure methods of identifying ownership of written texts and for identifying the parties involved in electronic transactions. This can be achieved using electronic watermarking. This involves the insertion of information into the text material. The information can later be detected, read and displayed and used to identify aspects such as ownership. The information is, ideally, inserted in such a way as to not affect the perceived quality of the text.
Watermarking of documents allows audit trails to be set up easily. For example, if a particular document had to be watermarked by each owner before that owner was allowed to sell on the document to someone else, then from the different watermarks in the document it would be possible to identify all the previous legitimate owners of the document.
Techniques for watermarking text material are known, for example see "Challenges for copyright in a digital age", I D Bramhill & M R C Sims, B T Technol J Vol 15 No 2 April 1997. These methods involve making small alterations to specific characters, for example slight broadening of a comma or full stop, producing physically different characters which appear unaltered to the naked eye. The differences can be picked up electronically, for example when the document is scanned. However, these techniques are extremely sensitive to even minimal formatting changes, such as changes of font and justification. The invention provides an improved, more robust method for inserting code data into a text data file representing a string of characters. The invention is defined in the appended claims to which reference should now be made.
In the present invention a selected character is marked using overwriting or underwriting, that is, the placing of one or more other character(s) at the same position in the text as the selected character. These other characters might all be the same as the selected character, so that the under/overwritten characters cannot be seen. Alternatively, the other characters may include a 'space' or a 'null' character, or a combination of these overwritten one on top of another, making them invisible in the printed text. This combination of 'space' and/or 'null' characters is then overwritten with the selected text character. Thus the encoded material is inserted in such a way that it does not affect in any way the quality of the text.
Other characters which may be used in the set of characters with which each selected character is replaced may be colourless characters, including colourless underlining or overscoring characters, that is, characters which are at the same position in the string of characters as the selected character but which, in the text document, appear above or below the selected character.
The message or information to be inserted into the text is preferably encrypted and it is placed into the text in such a way as to make it difficult for unauthorised persons to remove the message, or watermark, from the text.
Preferred embodiments of the invention will now be described with reference to the drawings in which:
Figure 1 shows a schematic of a system according to one embodiment of the invention;
Figure 2 shows a Watermarking Engine in more detail; and
Figure 3 shows a block diagram of the structure of the Watermark Encoding Unit.
In the following description the term 'watermark' is used to mean a message which is to be inserted into the text. As is shown in Figure 1, the watermarking of a text-formatted file, which is referred to as the Input File 12 and which has been created using a Wordprocessor 11, is performed using a Watermarking Engine 13. The watermarked text-formatted file produced is referred to as the Output File 14. The Watermarking Engine 13 reads the Input File 12 and embeds, according to pre-specified rules, watermarks that are contained in a Watermark Holding Unit 15 and finally saves the result into the Output File 14. Inserted watermarks can be 'seen' using the Watermark Visualisation Unit 16.
Figure 2 shows, in general, the elements of a text watermarking system according to one embodiment of the invention. The Watermarking Engine 13, as shown in Figure 2, consists of a Watermark Encoding Unit 21, a Watermark Detection and Identification Unit 22, a Mapping Unit 23 and an Encryption Unit 24.
The Watermark Holding Unit 15, which is not part of the Watermarking Engine 13 itself, may contain a single watermark or a collection of a number of different watermarks. This unit 15 may be centralised or distributed throughout an electronic trading network. Centralised operation would enable some, or all, of the watermarks to be inserted by an authorised agent, while the distribution of the unit would make it possible for specified types of watermark to be inserted by an authorised party, or parties, over a network. A watermark obtained from the Watermark Holding Unit 15 is defined by two parameters: the first is the 'Type' of the watermark, and the second is the 'Body' of the watermark. The 'Type' is an identifier and the 'Body' is the content, comprising a sequence of digits, symbols etc. corresponding to, for example, an ISBN as used in publishing, or a name which may be the name of an author or distributor, or any such combination of alpha-numeric and special characters. Two illustrative examples of watermarks are given in Table 1.
Table 1 (image imgf000006_0001, not reproduced)
The sequence of characters representing the Type and Body of the watermark is taken from the Watermark Holding Unit 15 and fed to a Watermark Mapping Unit 23 where the body of the watermark is mapped into a longer sequence of characters.
The mapping is dependent, not only on the Body, but also on the Type of the watermark. Through the mapping the Body of the watermark is converted, generally, into binary form. The invention is not restricted to the situation in which the output is a sequence of binary digits.
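As an illustration only, the Type/Body pair and a Type-dependent mapping into binary form might be sketched as follows. The class name `Watermark`, the function `map_body_to_bits` and the one-byte checksum tag are assumptions made for this sketch; the description does not specify the mapping actually used by the Mapping Unit 23.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Watermark:
    """A watermark as held in the Watermark Holding Unit: a Type and a Body."""
    type: str  # identifier of the watermark (values are hypothetical)
    body: str  # content, e.g. an ISBN or a name


def map_body_to_bits(wm: Watermark) -> List[int]:
    """Map the Body to a longer binary sequence (hypothetical scheme).

    A one-byte tag derived from the Type is prepended to the UTF-8 bytes of
    the Body, so the result depends on both Type and Body. Each byte is then
    expanded into 8 bits, most significant bit first.
    """
    type_tag = sum(wm.type.encode("utf-8")) % 256  # assumption: simple checksum tag
    data = bytes([type_tag]) + wm.body.encode("utf-8")
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
```

In this sketch the bit list would then be passed to whatever encryption step is chosen before being placed in the Buffer 31.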
The output from the Mapping Unit 23 is input to the Encryption Unit 24 which operates in a manner well known to those in the art [see "Cryptography and Secure Communications", Rhee M Y, McGraw Hill Book Company, Singapore, 1994, which is incorporated herein by reference]. The output of the Encryption Unit 24 represents a binary, encrypted form of the watermark information that is to be embedded in the formatted text file.
The Watermark Detection and Identification Unit 22 examines the Input File 12 and, from Line 26, obtains information regarding the type of watermark contained in the Watermark Holding Unit 15. It checks the Input File 12 for the presence of all watermark 'Types' and displays any that are found to exist in the Input File. If any of the found watermarks is of the same type as that which is currently to be embedded then a TRUE, hence 'inhibit', command is sent along Line 27 to the Watermark Encoding Unit 21 in order to prevent watermarking. Otherwise, a FALSE signal is sent to the Watermark Encoding Unit 21 along Line 27 instructing the unit 21 to start its operation. Also, if rules exist for the order in which watermark types are to be inserted then any attempt to insert a watermark in violation of these rules is inhibited via Line 27.
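The decision placed on Line 27 can be sketched as a small predicate. This is a minimal sketch under stated assumptions: the function name `line_27_inhibit` is hypothetical, and the ordering rule shown (every type that precedes the new type in a permitted order must already be present) is only one possible reading, since the description does not define the ordering rules.

```python
from typing import Iterable, Optional, Sequence


def line_27_inhibit(found_types: Iterable[str],
                    new_type: str,
                    permitted_order: Optional[Sequence[str]] = None) -> bool:
    """Return True (inhibit) or False (start encoding) for Line 27."""
    found = set(found_types)
    # A watermark of the same type already exists in the Input File: inhibit.
    if new_type in found:
        return True
    if permitted_order is not None:
        # Assumption: a type absent from the permitted order may not be inserted.
        if new_type not in permitted_order:
            return True
        # Assumption: all earlier types in the order must already be embedded.
        idx = permitted_order.index(new_type)
        if any(t not in found for t in permitted_order[:idx]):
            return True
    return False
```

For example, with the hypothetical order `["ISBN", "AUTHOR"]`, an attempt to insert `"AUTHOR"` into a file that carries no `"ISBN"` watermark would be inhibited.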
The Watermark Encoding Unit 21 which is shown in general form in Figure 3, consists of five main sub-units: a Buffer 31 for holding the mapped, encrypted body of the watermark to be inserted, an Intelligent Reader 30 which reads the input file in a manner to be described later, a Text-Character Watermarking Unit 32, a Writer 33 which writes text characters, watermarked or otherwise, to the Output File and a Text-Character Selection Procedure Unit (TCSP Unit) 34.
The operation of the Watermark Encoding Unit 21, which is under the general control of the Text-Character Selection Procedure Unit 34 will now be described in general terms. The operation begins once a FALSE signal is received on Line 27 from the Watermark Detection and Identification Unit 22. Then the counter 36 shown in Figure 3 is reset to 0 and the Intelligent Reader 30 starts reading the Input File 12. The Intelligent Reader 30 reads the text file 12 according to its rules of operation which depend on the format and type of the input file and any other information available in respect to the input file. If the Intelligent Reader reads a batch of characters then they are written directly by the Writer 33 to the Output File 14. If the Reader 30 reads a single character (a byte) then a check is made to determine whether this character is a text-character. If it is not a text character then it is again written directly to the output file. If, however, it is a text character then the Counter 36 is incremented by 1 and the TCSP Unit 34 takes over and, on the basis of a set of rules, controls whether or not this text character should be watermarked by the Text-Character Watermarking Unit. If the decision is that the text character should not be watermarked then the Text-Character Watermarking Unit 32 receives a FALSE command on Line 35 and the text character is written to the Output File. However, if the decision is that the text character should be watermarked then the command on Line 35 is TRUE and this instructs the Text-Character Watermarking Unit to perform the following two operations:
1) to read the next symbol of the mapped and encrypted Watermark Body from the Buffer 31;
2) to watermark the selected text character in accordance with one of the watermarking methods to be described later.
The watermarked text character is written to the Output File and the control passes back to the Intelligent Reader 30 which then reads the next character, or a batch of characters.
It is clear from the above discussion that the Text-Character Selection Procedure Unit 34 controls the watermark insertion: it decides which characters should be watermarked according to a predefined set of rules, specific examples of which will be given later, and signals each decision as TRUE or FALSE on Line 35.
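The control flow just described can be sketched as a loop. Everything named here is an assumption for illustration: the input is modelled as a list of string tokens (a token longer than one character stands for a batch), `str.isalnum` stands in for the text-character check, and the callables `select` and `mark` stand in for the TCSP Unit 34 and the Text-Character Watermarking Unit 32.

```python
from typing import Callable, Iterable, List


def encode(tokens: Iterable[str],
           bits: List[int],
           select: Callable[[int], bool],
           mark: Callable[[str, int], str]) -> List[str]:
    """Copy tokens to the output, watermarking the selected text characters."""
    buffer = list(bits)     # Buffer 31: mapped, encrypted watermark digits
    counter = 0             # Counter 36, reset to 0 before reading starts
    output: List[str] = []  # stands in for the Output File 14
    for token in tokens:
        if len(token) != 1:
            output.append(token)  # a batch is written directly
            continue
        if not token.isalnum():   # assumption: text character means alphanumeric
            output.append(token)  # not a text character: written directly
            continue
        counter += 1
        if buffer and select(counter):  # Line 35 is TRUE
            output.append(mark(token, buffer.pop(0)))
        else:                           # Line 35 is FALSE
            output.append(token)
    return output
```

A digit is removed from the buffer only when a character is actually selected, so an unselected character leaves the pending digit ready for the next selection.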
This decision taken by the TCSP Unit as to whether or not a text character should be watermarked is based on the following three general rules:
1. constraint rules;
2. short jumping rules;
3. long jumping rules.
The constraint rules are based on the information provided by the Detection and Identification Unit 22 to the TCSP Unit 34. This is comprehensive information as to the types of the existing watermarks within the text and the ranges of positions within which the watermarked text characters fall. The constraint rules become active every time the content of the counter is such that it falls within the constraint ranges referred to above. The command on Line 35 then becomes FALSE thereby prohibiting watermarking of any text character under consideration.
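A constraint check of this kind might look like the following sketch. The representation of the constraint ranges as inclusive `(first, last)` pairs of counter values, and the name `constraint_active`, are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def constraint_active(counter: int, ranges: Iterable[Tuple[int, int]]) -> bool:
    """Return True when the counter falls inside any existing-watermark range.

    When this returns True the command on Line 35 would be FALSE, so the
    text character under consideration is not watermarked.
    """
    return any(first <= counter <= last for first, last in ranges)
```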
The short jumping rules, which come into effect when the constraint rules are inactive, determine whether or not a text-character is to be watermarked by a digit of the specified watermark. In addition the short jumping rules determine the number of text characters that there is to be between successive watermarked characters. This separation may be by a predetermined number of text characters, or by an integer number generated at random, subject to a maximum value.
One such short jumping rule is that in which the counter content is evaluated modulo-M, where M is a predefined integer number, and the modulo-M value is then compared with some pre-selected integer number K which satisfies the condition 0<K<M. If the two numbers are the same, that is, (counter content modulo-M) = K, then the text character under consideration is selected to be watermarked and the Writer 33 is instructed to write the watermarked text character to the Output File 14. If the two integer numbers are not the same, that is, K ≠ (counter content modulo-M), then the text character is written to the Output File 14 without it being watermarked. In this case the watermarking digit under consideration is kept ready for use until such time as another text-character is read and is selected to be watermarked. The process is repeated until all of the digits that make up the mapped and encrypted Body of the watermark have been inserted.
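The modulo-M rule translates almost directly into code. The function name `short_jump_selects` is hypothetical; the condition 0<K<M is taken from the text.

```python
def short_jump_selects(counter: int, m: int, k: int) -> bool:
    """Return True when (counter content modulo-M) equals K."""
    if not 0 < k < m:
        raise ValueError("K must satisfy 0 < K < M")
    return counter % m == k
```

With M = 5 and K = 2, for instance, the text characters counted 2, 7 and 12 are selected among the first twelve, so successive watermarked characters are separated by four unmarked text characters.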
The long jumping rules are activated by the buffer when the last digit/symbol of the mapped and encrypted Watermark Body has been inserted into the text file. These rules are determined in a straight forward manner depending on the number of different types of watermarks to be inserted, the lengths of the watermarks, the frequency with which they are to be inserted and the length of the text file. The frequency of repetitive insertions of the same watermark should be such as to leave space for the insertion of the other types of watermarks.
One such long jumping rule is the following: If N is the content of the counter when the last digit of the Watermark Body has been inserted then no further text characters should be watermarked so long as the condition
(content of counter) ≤ N+Fr
holds, where Fr is an integer depending on the frequency of repetitive insertion of the same watermark.
Thus, so long as the long jumping rules are active, the Text-Character Selection Procedure Unit 34 maintains the state of Line 35 at FALSE, which inhibits further watermarking of read text characters until such time as the content of the Counter 36 exceeds N+Fr. Once the counter content exceeds N+Fr, the long jumping rules become inactive and the TCSP Unit 34 proceeds with the second insertion of the same specified watermark type, based again on the short jumping rules, which have returned to the active state. When the same specified watermark type has been inserted throughout the text, a next specified watermark type is selected and, provided it follows the ordering rules relating to the order in which watermark types are permitted to be inserted, the insertion process is repeated. Thus different watermark types are inserted in accordance with the position-division-multiplexing scheme.
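The interaction between the short and long jumping rules for repeated insertions of one watermark type might be sketched as below. The function name `repeated_positions`, the counter starting at 1, and the decision to discard an insertion left incomplete at the end of the text are all assumptions of the sketch; the constraint rules are left out for brevity.

```python
def repeated_positions(text_length, num_digits, M, K, Fr):
    """Return one list of counter values per complete insertion of the
    same watermark, using the modulo-M short jumping rule and the
    N + Fr long jumping rule.
    """
    insertions = []
    current = []
    resume_after = 0  # watermarking is inhibited while counter <= resume_after
    for counter in range(1, text_length + 1):
        if counter <= resume_after:
            continue  # long jumping rule active: Line 35 held at FALSE
        if counter % M == K:  # short jumping rule selects this character
            current.append(counter)
            if len(current) == num_digits:
                # Last digit of the Body inserted at counter value N.
                insertions.append(current)
                current = []
                resume_after = counter + Fr  # N + Fr
    # An insertion still incomplete at the end of the text is dropped
    # here (an assumption of this sketch).
    return insertions


print(repeated_positions(text_length=40, num_digits=2, M=5, K=2, Fr=10))
# [[2, 7], [22, 27]]
```

In this example the first insertion ends at N = 7, so the counter values 12 and 17, which would otherwise satisfy the short jumping rule, are skipped because they do not exceed N + Fr = 17.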
In a preferred embodiment of the invention, if a selected text character is to be watermarked with a binary digit 0 which is part of the Body of the watermark to be embedded then the 'space' symbol is overwritten with itself X times and then overwritten with the selected text character. If the selected text character is to be watermarked with the digit 1 then the space symbol is overwritten with itself Y times and then overwritten with the selected character, where X≠Y.
It will be appreciated that the roles of binary zeroes and ones can be interchanged. The overwriting watermarking procedure, which leaves the actual text character unchanged, can be carried out using computer commands that are well known in the art.
Preferably, Portable Document Format (PDF) is used as the document format for watermarking, as it represents a document in a manner independent of the application software, hardware and operating system used to create it.
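The overwriting scheme of the preferred embodiment can be modelled abstractly as follows. This sketch does not produce PDF or PostScript; it represents one character position as a list of glyphs drawn on top of one another. The function name `embed_bit`, the default values X = 1 and Y = 2, and the reading of "overwritten with itself X times" as one initial space followed by X further spaces are assumptions made for illustration.

```python
SPACE = " "


def embed_bit(char, bit, x=1, y=2):
    """Return the stack of glyphs occupying one character position.

    The space is written once, overwritten with itself x times (bit 0)
    or y times (bit 1), and finally overwritten with the selected text
    character, so only that character remains visible.
    """
    if x == y:
        raise ValueError("X and Y must differ")
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    overwrites = x if bit == 0 else y
    return [SPACE] * (1 + overwrites) + [char]


print(embed_bit("e", 0))  # [' ', ' ', 'e']
print(embed_bit("e", 1))  # [' ', ' ', ' ', 'e']
```

Swapping the roles of zero and one, as the text allows, amounts to swapping the values passed for x and y.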
Alternatively, a document formatted according to a particular word processing system may be watermarked by using software to change the basic commands of the word processing system to allow overwriting.
In another alternative method, the document could be converted from the format of a particular word processing system, such as Word (trade mark) or WordPerfect (trade mark), into the PostScript language and then, if necessary, into PDF using, for example, an Adobe Acrobat Distiller (registered trade marks).
To extract the watermark from the Output File, the file 14 must be input to the Watermark Visualisation Unit 16. This first identifies the type of watermark(s) contained in the file and then, by performing generally the opposite procedure to that which was carried out by the Mapping Unit 23 and Encryption Unit 24 in order to encode the text with the watermark, the Visualisation Unit detects, reads and displays the watermark Body and Type contained within the text.
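The detection step for the preferred embodiment might be sketched as below, again using an abstract model in which each character position is a list of overprinted glyphs. The function names, the default values X = 1 and Y = 2, and the counting convention (one initial space plus X or Y overwrites) are assumptions; the inverse mapping, decryption and watermark type identification performed by the Visualisation Unit are not modelled.

```python
def extract_bits(stacks, x=1, y=2):
    """Recover the embedded binary digits from a sequence of glyph stacks.

    A position carries a digit when its visible character sits on top
    of a run of spaces: 1 + x spaces means 0, 1 + y spaces means 1.
    Positions holding a single glyph are unmarked.
    """
    bits = []
    for stack in stacks:
        hidden = stack[:-1]
        if len(stack) >= 2 and all(glyph == " " for glyph in hidden):
            overwrites = len(hidden) - 1
            if overwrites == x:
                bits.append(0)
            elif overwrites == y:
                bits.append(1)
    return bits


def visible_text(stacks):
    """Return the text as displayed: the topmost glyph at each position."""
    return "".join(stack[-1] for stack in stacks)


marked = [["a"], [" ", " ", "b"], ["c"], [" ", " ", " ", "d"]]
print(extract_bits(marked))  # [0, 1]
print(visible_text(marked))  # abcd
```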
According to a second embodiment of the invention the binary zeros and ones representing the watermark are watermarked into the text by using X overwrites of a selected character by itself if the binary digit is a zero and by Y overwrites of the character by itself (with X≠Y) if the binary digit is a one.
According to a third embodiment of the invention, binary zeros and ones are represented and watermarked into the text by respectively invisibly underlining or invisibly over-scoring the selected text character.
According to a fourth embodiment of the invention, the method is the same as in the first embodiment described except that a colourless character is used instead of the 'space' character.
According to a fifth embodiment of the invention, the method is the same as in the first embodiment described except that the 'null' character is used instead of the 'space' character.
According to a sixth embodiment of the invention, a binary zero to be embedded is represented by a colourless character, say P, overwritten by the selected text character, and a binary one is represented by a different colourless character, say Q, overwritten by the selected text character.
According to yet another embodiment of the invention pairs of binary digits taken from the binary sequence representing the watermark Body to be inserted are encoded as follows:
00 is encoded as a zero as in the first embodiment
01 is encoded as a one as in the first embodiment
10 is encoded as a zero as in the second embodiment
11 is encoded as a one as in the second embodiment
This process has the advantage of increasing the bandwidth efficiency of the watermarking procedure.
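The pair encoding listed above can be sketched as follows, so that each selected character carries two binary digits instead of one. The function name `embed_pair`, the default values X = 1 and Y = 2, and the list-of-glyphs model of overwriting are assumptions of the sketch; it also assumes that the selected character is not itself a space, since the two embodiments would then be indistinguishable.

```python
def embed_pair(char, pair, x=1, y=2):
    """Return the stack of glyphs encoding a pair of binary digits.

    The first digit chooses the embodiment (0: spaces beneath the
    character, 1: the character overwritten by itself); the second
    digit chooses the number of overwrites (0: x, 1: y).
    """
    if pair not in ("00", "01", "10", "11"):
        raise ValueError("pair must be one of 00, 01, 10, 11")
    if x == y:
        raise ValueError("X and Y must differ")
    overwrites = x if pair[1] == "0" else y
    if pair[0] == "0":
        # First embodiment: space, overwritten by itself, then the character.
        return [" "] * (1 + overwrites) + [char]
    # Second embodiment: the character overwritten by itself.
    return [char] * (1 + overwrites)


for pair in ("00", "01", "10", "11"):
    print(pair, embed_pair("e", pair))
```

The four pairs yield four distinct stacks, which is what allows a single character position to carry two digits.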
According to yet a further embodiment of the invention, binary zeros and ones, which are part of the body of the watermark to be inserted, or combinations of a number of binary digits, are encrypted and impressed on the formatted text file using combinations of characters that are invisible in the sense related to the previous embodiments.
The invention thus provides an improved method of watermarking which is insensitive to formatting changes, unlike known watermarking methods. Moreover, the watermarking is more secure and cannot be detected simply by scanning the watermarked text.

Claims
1. A method of inserting code data into a text data file representing a string of characters, comprising the steps of: selecting a character to be coded with at least part of the code data, reading at least part of the code data, and, in dependence on the code data read, replacing in the text data file the selected character by a plurality of characters, including the selected character, each of the plurality of characters being allocated the same position in the string of characters as the selected character.
2. A method according to claim 1, in which the steps of selecting a character, reading part of the code data, and replacing the selected character are repeated until all the code data has been inserted into the text data file.
3. A method according to claim 1 or 2, including the step of encrypting the code data to be inserted into the text data file.
4. A method according to claim 1, 2 or 3, including the step of storing the code data in one or more buffers, and reading at least part of the code data from a buffer.
5. A method according to claim 4 in which the code data is stored as a string of symbols, successive symbols being read from the buffer as successive selected characters are coded.
6. A method according to claim 5 in which the symbols are read in pairs or groups.
7. A method according to any preceding claim, in which said plurality of characters includes the 'space' or 'null' character or a combination thereof.
8. A method according to any preceding claim, in which said plurality of characters includes colourless characters.
9. A method according to claim 8, in which said plurality of characters includes an underlining or overscoring character.
10. An apparatus for inserting code data into a text data file representing a string of characters, comprising: means for selecting a character to be coded with at least part of the code data; and means for replacing the selected character in the text data file by a plurality of characters, including the selected character, with each of the plurality of characters being placed at the same position in the string of characters as the selected character, said plurality of characters being chosen in dependence on the code data read.
11. An apparatus according to claim 10, including means for encrypting the code data to be inserted into the text data file.
12. An apparatus according to claim 10 or 11, including one or more buffers for storing the code data.
13. An apparatus according to claim 10, 11 or 12, including means for identifying whether a text data file has code data inserted therein.
14. A method of inserting code data into a text data file substantially as herein described.
15. An apparatus for inserting code data into a text data file substantially as herein described with reference to the drawings.
PCT/GB1999/002265 1998-07-14 1999-07-14 Electronic watermarking WO2000004501A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU49216/99A AU4921699A (en) 1998-07-14 1999-07-14 Electronic watermarking

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9815300.0 1998-07-14
GB9815300A GB2339656B (en) 1998-07-14 1998-07-14 electronic watermarking

Publications (1)

Publication Number Publication Date
WO2000004501A1 true WO2000004501A1 (en) 2000-01-27

Family

ID=10835521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1999/002265 WO2000004501A1 (en) 1998-07-14 1999-07-14 Electronic watermarking

Country Status (3)

Country Link
AU (1) AU4921699A (en)
GB (1) GB2339656B (en)
WO (1) WO2000004501A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7644281B2 (en) 2004-09-27 2010-01-05 Universite De Geneve Character and vector graphics watermark for structured electronic documents security
CN102096787B (en) * 2009-12-14 2013-06-05 南京信息工程大学 Method and device for hiding information based on word2007 text segmentation

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000099501A (en) * 1998-09-17 2000-04-07 Internatl Business Mach Corp <Ibm> Method and system for padding information into document data
US20030145206A1 (en) * 2002-01-25 2003-07-31 Jack Wolosewicz Document authentication and verification
CN109558705A (en) * 2018-12-10 2019-04-02 万兴科技股份有限公司 Watermark Tiling methods, device, computer equipment and storage medium based on PDF

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5467447A (en) * 1990-07-24 1995-11-14 Vogel; Peter S. Document marking system employing context-sensitive embedded marking codes
US5953415A (en) * 1996-03-25 1999-09-14 Sun Microsystems, Inc. Fingerprinting plain text information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BRASSIL J ET AL: "ELECTRONIC MARKING AND IDENTIFICATION TECHNIQUES TO DISCOURAGE DOCUMENT COPYING", PROCEEDINGS OF THE CONFERENCE ON COMPUTER COMMUNICATIONS (INFOCOM), TORONTO, JUNE 12 - 16, 1994, vol. 3, 12 June 1994 (1994-06-12), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 1278 - 1287, XP000496591, ISBN: 0-8186-5572-0 *

Also Published As

Publication number Publication date
AU4921699A (en) 2000-02-07
GB2339656B (en) 2003-05-21
GB9815300D0 (en) 1998-09-09
GB2339656A (en) 2000-02-02

Similar Documents

Publication Publication Date Title
EP1333658B1 (en) Apparatus and method for producing a watermarked document and for authenticating the same
KR100548983B1 (en) Computer system and method for verifying the authenticity of digital documents
US5765176A (en) Performing document image management tasks using an iconic image having embedded encoded information
EP1410619B1 (en) Method of invisibly embedding and hiding data into soft-copy text documents
US20040001606A1 (en) Watermark fonts
US6664976B2 (en) Image management system and methods using digital watermarks
US6769061B1 (en) Invisible encoding of meta-information
US6782509B1 (en) Method and system for embedding information in document
US7730037B2 (en) Fragile watermarks
US20030145206A1 (en) Document authentication and verification
US20020124024A1 (en) Image management system and methods using digital watermarks
US7475429B2 (en) Method of invisibly embedding into a text document the license identification of the generating licensed software
US20050053258A1 (en) System and method for watermarking a document
WO1995020291A1 (en) Method of and apparatus for manipulating digital data works
EP1131769A2 (en) Printing and validation of self validating security documents
EP1291819A2 (en) Digital watermark embeddig
CN1193587C (en) Image and video authentication system
JP2011155688A (en) System for controlling distribution and use of digital work
WO2000004501A1 (en) Electronic watermarking
WO2008004221A2 (en) Inserting digital signatures into a transformed document
US8576049B2 (en) Document authentication and identification
Saber et al. Steganography in MS excel document using unicode system characteristics
EP1343097A1 (en) Method for embedding of information in media files
EP4261714A1 (en) Method and system for encoding and decoding information in texts
CN1282437A (en) Method for identifying image or document

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase