CN100367274C - Method for embedding and extracting watermark in English texts - Google Patents

Publication number
CN100367274C
Authority
CN
China
Prior art keywords
watermark
sentence
mentioned
watermark information
bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2005100774713A
Other languages
Chinese (zh)
Other versions
CN1700205A (en)
Inventor
王建民
张荣奇
李德毅
叶晓俊
张指浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CNB2005100774713A
Publication of CN1700205A
Application granted
Publication of CN100367274C
Legal status: Expired - Fee Related

Landscapes

  • Editing Of Facsimile Originals (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a method for embedding and extracting watermarks in English text and belongs to the technical field of text copyright protection. The copyright holder's copyright information is converted into a binary bit string. The text is read into a computer, spaces and special characters are filtered out, and a hash operation over the resulting character string and the copyright holder's private key yields an integer Z. If Z is exactly divisible by the embedding ratio, the next sentence is a watermark information sentence. The remainder of Z divided by the length of the copyright information bit string determines which watermark bit is to be embedded, and the remainder of Z divided by the number of characters in the watermark information sentence determines the position of that bit. The sentence is adjusted so that the 0 or 1 expressed by the relative size of the character codes of the two adjacent letters at that position equals the watermark bit to be embedded, and this continues until the text ends. Watermark extraction is the inverse of the embedding process. The method offers good watermark concealment and high security, is in particular fully resistant to format conversion attacks, and does not lower the quality of the text.

Description

A method for embedding and extracting a watermark in English text
Technical field
The present invention relates to a method for embedding and extracting a watermark in English text and belongs to the technical field of text copyright protection.
Background art
As an effective means of copyright protection, digital watermarking is increasingly a focus of research. At present, however, most research on digital watermarking concentrates on image, audio and video data, and comparatively little addresses text watermarking. This is mainly because text has particular properties that make watermarking it difficult:
(1) A text consists of content and format, and because document content can be presented in different ways, text documents come in different formats. There are many kinds of text and many file formats, such as Word documents (*.doc), Web pages, plain text and PDF. A watermarking scheme can be designed for any single file format, but it is hard to find one watermarking technique that suits all file formats.
(2) Files in different formats can usually be converted into one another, and the plain text content can even be extracted from a file directly. For example, Paste Special in Word copies and pastes only the unformatted text, which completely destroys any text watermark that is based on format. In the most extreme case the text can simply be retyped, after which all watermark information originally embedded in the format is gone.
(3) A multimedia object is made up of a large number of bits, many of which are unimportant, so a watermark can be hidden in them. Text is made up of characters, and each character has a fixed code, so there is no redundant space in which watermark information can be embedded.
(4) Some parts of a multimedia object can be deleted or replaced at will without causing a perceptible change. In text, changing even a single character may make the whole passage hard to understand or even reverse its meaning.
For these reasons, while multimedia digital watermarking has begun to enter practical use, text watermarking is still at the theoretical and experimental stage. Most existing text watermarking techniques make slight adjustments to the text format and embed the watermark information by giving the text a specific format; the main ones are line-shift coding, word-shift coding and feature coding. A text watermark based on file format does not, in essence, protect the text content that embodies human intellectual work, but only one published form of it. When the file format changes, the watermark information disappears with it.
Summary of the invention
The object of the present invention is to propose a method for embedding and extracting a watermark in English text, so that watermark information can be embedded in, extracted from and detected in a text, ultimately protecting the copyright of the original text.
The method for embedding and extracting a watermark in text proposed by the present invention comprises the following steps:
(1) The copyright holder's copyright information is converted into a binary bit string of length L_b, and a watermark embedding ratio R is set;
(2) The first sentence of the English text is read in;
(3) Spaces and special characters are filtered out of the sentence read in, giving a character string that contains only English letters; a one-way hash operation is performed on this character string and the copyright holder's private key information, giving a long integer Z;
(4) Z is divided by the embedding ratio R. If Z is exactly divisible, the sentence read in is a watermark information judgment sentence and step (5) is carried out; if not, the next sentence of the English text is read in and steps (3) and (4) are repeated;
(5) The next sentence is read in and defined as the watermark information sentence, and its number of characters L_s is computed. The remainder of Z divided by L_s is the watermark position P_w, and the remainder of Z divided by the bit string length L_b of the copyright information identifies the watermark information bit B_i;
(6) The character code at position P_w is compared with the character code at position P_w+1. If the comparison result differs from the watermark information bit B_i, the watermark information sentence is rewritten so that the comparison result equals B_i; if the comparison result already equals B_i, position P_w implicitly carries B_i. Steps (3) to (6) are repeated until the end of the text, at which point watermark embedding is complete;
(7) The first sentence of the watermarked English text is read in;
(8) A one-way hash operation is performed on the character string and the copyright holder's private key information, giving a long integer Z;
(9) Z is divided by the embedding ratio R set above. If Z is exactly divisible, the sentence read in is a watermark information judgment sentence and step (10) is carried out; if not, the next sentence of the watermarked English text is read in and steps (8) and (9) are repeated;
(10) The next sentence is read in; this sentence is the watermark information sentence, and its number of characters L_s is computed. The remainder of Z divided by L_s is the watermark position P_w, and the remainder of Z divided by the bit string length L_b of the copyright information identifies the watermark information bit B_i;
(11) The character code at position P_w is compared with the character code at position P_w+1; the bit representing the comparison result is the first extracted bit of the watermark information;
(12) The next sentence is read in and steps (8) to (10) are repeated until the end of the file, giving a group of bit values for each bit position of the bit string that represents the copyright information;
(13) The group of bit values for each bit position is decided by voting, giving the value of each bit;
(14) The bits obtained from this decision are combined and restored to the copyright information.
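Steps (1), (3) and (4) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, MD5 is used because the detailed description mentions it, and the way the private key is combined with the sentence (simple concatenation here) and the bit encoding of the copyright string (UTF-8, most significant bit first) are assumptions, since the text does not specify them.

```python
import hashlib
import string

# The 52 upper- and lower-case English letters kept by the filter.
LETTERS = set(string.ascii_letters)


def to_bits(copyright_info: str) -> list[int]:
    """Step (1): turn the copyright string into 0/1 bits (assumed UTF-8, MSB first)."""
    return [(byte >> shift) & 1
            for byte in copyright_info.encode("utf-8")
            for shift in range(7, -1, -1)]


def letters_only(sentence: str) -> str:
    """Step (3): drop spaces and every symbol that is not an English letter."""
    return "".join(ch for ch in sentence if ch in LETTERS)


def hash_to_int(sentence: str, private_key: str) -> int:
    """Step (3): hash the filtered sentence with the key and read it as an integer Z.

    Assumption: the key is appended to the letters before hashing.
    """
    data = (letters_only(sentence) + private_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def is_judgment_sentence(sentence: str, private_key: str, ratio: int) -> bool:
    """Step (4): a sentence is a judgment sentence when R divides Z exactly."""
    return hash_to_int(sentence, private_key) % ratio == 0
```

Because punctuation and spaces are filtered before hashing, reformatting or re-spacing a sentence leaves Z unchanged, which is what makes the scheme independent of file format. On average about one sentence in R is selected as a judgment sentence, so R trades embedding capacity against the amount of rewriting required.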
The method for embedding and extracting a watermark in English text proposed by the present invention has the following advantages:
(1) The proposed text watermarking method departs from existing methods, which embed a watermark by making slight adjustments to the text format, and instead hides the watermark information directly in the sentences of the text. The watermark information and the text content are tightly fused into an inseparable whole. Concealment is good and security is high; in particular, the method is fully resistant to format conversion attacks.
(2) With the proposed text watermarking method, the watermark is embedded naturally. The watermarked text does not read any worse, and the text is not degraded by the presence of the watermark information.
(3) With the proposed text watermarking method, the watermark information is hidden in the text through the copyright holder's own manual revision of some of its sentences, so every sentence in the text remains the copyright holder's intellectual work. The watermarked text therefore preserves the copyright holder's writing style well.
Description of drawings
Fig. 1 is a flowchart of the watermark embedding process in the method of the invention.
Fig. 2 is a flowchart of embedding a single bit during the watermark embedding process.
Fig. 3 is a flowchart of the watermark extraction process in the method of the invention.
Fig. 4 is a flowchart of restoring the watermark information during the extraction process.
Embodiment
The flow of watermark embedding in the method proposed by the present invention is shown in Fig. 1. First, the copyright holder's copyright information is converted into a binary bit string of length L_b. The first sentence of the English text is read in, and spaces and special characters are filtered out of it; special characters are all symbols other than the English letters (52 in total, counting upper and lower case). This gives a character string that contains only English letters. A one-way hash operation is performed on this character string and the copyright holder's private key information. A one-way hash operation takes a message M of arbitrary length as input and returns an integer of fixed length; here it gives a long integer Z.
A watermark embedding ratio R is set according to the needs of the application, and Z is divided by R. If Z is not exactly divisible, the next sentence of the English text is read in and the above steps are repeated. If Z is exactly divisible, the sentence just read is a watermark information judgment sentence and the following steps, shown in Fig. 2, are carried out. The next sentence is read in and defined as the watermark information sentence, and its number of characters L_s is computed. The remainder of Z divided by L_s is the watermark position P_w, and the remainder of Z divided by the bit string length L_b of the copyright information identifies the watermark information bit B_i. The character code at position P_w is compared with the character code at position P_w+1. If the comparison result differs from B_i, the watermark information sentence is rewritten so that the result equals B_i; if the result already equals B_i, position P_w implicitly carries B_i. These steps are repeated until the end of the text.
The flow of watermark extraction is shown in Fig. 3. The first sentence of the watermarked English text is read in, and a one-way hash operation is performed on its character string and the copyright holder's private key information, giving a long integer Z. Z is divided by the embedding ratio R set above. If Z is not exactly divisible, the next sentence of the watermarked text is read in and the above steps are repeated. If Z is exactly divisible, the sentence just read is a watermark information judgment sentence and the following steps are carried out. The next sentence is read in; it is the watermark information sentence, and its number of characters L_s is computed. The remainder of Z divided by L_s is the watermark position P_w, and the remainder of Z divided by L_b identifies the watermark information bit B_i. The character code at position P_w is compared with the character code at position P_w+1, and the bit representing the comparison result is the first extracted bit of the watermark information. The next sentence is read in and the above steps are repeated until the end of the file, giving a group of bit values for each bit position of the bit string that represents the copyright information.
The extracted watermark bits are then processed to obtain the character string that represents the copyright information, as shown in Fig. 4. The group of bit values for each bit position is decided by voting: the 0s and 1s in each group are counted, and the value that occurs more often is taken as the value of that bit. The bits obtained from this decision are combined and restored to a character string.
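The voting and restoration step of Fig. 4 can be sketched as below. The function names are hypothetical, and two details are assumptions because the text does not state them: a tie (or a bit position with no votes) falls back to 0, and the bit string is packed into bytes most significant bit first and decoded as UTF-8.

```python
from collections import Counter


def majority_bits(votes: list[list[int]]) -> list[int]:
    """Pick the more frequent value for each bit position.

    Assumption: a tie or an empty group of votes yields 0.
    """
    result = []
    for position_votes in votes:
        counts = Counter(position_votes)
        result.append(1 if counts[1] > counts[0] else 0)
    return result


def bits_to_text(bits: list[int]) -> str:
    """Pack bits (MSB first) into bytes and decode them as UTF-8 (assumed encoding)."""
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(
        int("".join(map(str, bits[i:i + 8])), 2)
        for i in range(0, usable, 8)
    )
    return data.decode("utf-8", errors="replace")


# Each inner list holds the values extracted for one bit position.
votes = [[0, 0, 1], [1, 1], [0], [0, 0], [0], [], [0, 1, 0], [1, 1, 0]]
print(bits_to_text(majority_bits(votes)))  # prints "A"
```

Voting is what lets the scheme tolerate some damaged or rewritten sentences: a bit is recovered correctly as long as most of the sentences that carry it are intact.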
In the method of the invention, a watermark information sentence is a sentence in which one bit of watermark information has been embedded. A watermark information judgment sentence is the sentence immediately preceding a watermark information sentence. The judgment sentence determines which sentence contains watermark information and, together with the watermark information sentence, determines the exact position at which the watermark information is embedded and which part of the watermark information is embedded.
The text watermarking approach proposed by the present invention departs from earlier methods that represent watermark information through file format, and instead expresses watermark information through a relationship between characters. The relationship can be any defined association. For example, the 26 English letters can be divided into a set of vowels and a set of consonants; any two characters then belong either to the same set or to different sets, and these two cases can be defined to represent the watermark values "0" and "1". The most common relationship is size. Every character has a specific, unique code value in a computer, so watermark information can be represented by comparing the code values of two characters. Taking the ASCII codes of English characters as an example, there are only three possible relationships between two English characters C_1 and C_2:
(1) code(C_1) > code(C_2);
(2) code(C_1) = code(C_2);
(3) code(C_1) < code(C_2).
The first two cases, code(C_1) >= code(C_2), can be defined to represent one bit of watermark information with the value "1", and the last case, code(C_1) < code(C_2), to represent "0".
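The two relationships described above can be written as small Python functions. The names are hypothetical. For the vowel/consonant variant, the text does not say which case maps to which bit value, so the mapping used here (different sets give 1) is an assumption.

```python
VOWELS = set("aeiouAEIOU")


def size_bit(c1: str, c2: str) -> int:
    """Return 1 when code(c1) >= code(c2), otherwise 0 (the size relationship)."""
    return 1 if ord(c1) >= ord(c2) else 0


def set_bit(c1: str, c2: str) -> int:
    """Vowel/consonant relationship.

    Assumption: letters from different sets give 1, letters from the same set give 0.
    """
    return 1 if (c1 in VOWELS) != (c2 in VOWELS) else 0


print(size_bit("a", "b"))  # 0, because 97 < 98
print(size_bit("b", "a"))  # 1
print(size_bit("e", "e"))  # 1, equal codes count as ">="
print(set_bit("a", "t"))   # 1, vowel next to consonant
```

Note that the comparison is case-sensitive: every upper-case letter has a smaller ASCII code than every lower-case letter, so capitalization at the watermark position affects the extracted bit.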
The text watermark embedding process proposed by the present invention is shown in Fig. 1 and Fig. 2. First, the copyright holder's copyright information is converted into a binary string of 0 and 1 bits. The first sentence is read from the text, and spaces and special characters are filtered out of it, giving a character string made up entirely of letters. An MD5 operation is performed on this character string combined with the copyright holder's private key, and the result is a 128-bit long integer Z. Z is taken modulo the watermark embedding ratio R. If Z is exactly divisible (the remainder is 0), the next sentence is a watermark information sentence and one bit of watermark information will be embedded in it; otherwise the next sentence is read in and the process is repeated.
When one bit is embedded in a watermark information sentence, Z is first taken modulo the length of the watermark information, and the remainder decides which bit of the watermark information is embedded: a remainder of 3 means the 3rd bit of the watermark information, and a remainder of 0 means its last bit. Once the bit to embed is known, its exact position in the watermark information sentence is located. Z is taken modulo the number of letters in the watermark information sentence, and the remainder gives the position. For example, if the remainder is 5, the current bit is represented by the relationship between the 5th and 6th letters of the watermark information sentence; if Z is exactly divisible, it is represented by the relationship between the last letter and the first letter of the sentence. The relationship is the relative size of the letter codes. For example, the ASCII code of the English letter a is 97 and that of b is 98, so a is less than b, which is represented by the bit value 0; conversely, if the first letter is greater than the second, the relationship is represented by the bit value 1.
Of course, the original relationship between the letters at the watermark position is not necessarily the same as the watermark bit to be embedded. If it differs, the watermark information sentence is rewritten, preserving its original meaning, until the relationship between the letters at the watermark position equals the watermark bit to be embedded. If repeated rewriting cannot satisfy the condition, or the sentence cannot be changed at all for some reason, the sentence preceding the watermark information sentence is rewritten instead, so that the sentence is no longer a watermark information sentence. This process is carried out in turn from the first sentence of the text until the end of the text, at which point watermark embedding is complete.
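The position arithmetic in the paragraphs above, including the two wrap-around cases (remainder 0 selects the last bit, and remainder 0 pairs the last letter with the first), can be sketched as follows. The function names are hypothetical, indices are returned 0-based for use with Python lists, and the sketch assumes the sentence contains at least one letter. The rewriting itself is done by a person and is not modelled; the code only reports whether a rewrite is needed.

```python
import string

LETTERS = set(string.ascii_letters)


def locate(z: int, n_letters: int, n_bits: int) -> tuple[int, int, int]:
    """Return (bit_index, first, second) as 0-based indices.

    Remainder r of Z mod n_bits selects the r-th bit (1-based); r == 0 selects the last bit.
    Remainder r of Z mod n_letters pairs the r-th and (r+1)-th letters (1-based);
    r == 0 pairs the last letter with the first.
    """
    bit_index = (z % n_bits - 1) % n_bits
    r = z % n_letters
    first = (r - 1) % n_letters
    second = r % n_letters
    return bit_index, first, second


def needs_rewrite(sentence: str, z: int, watermark_bits: list[int]) -> bool:
    """True when the letter pair at the watermark position does not yet encode the bit."""
    letters = [ch for ch in sentence if ch in LETTERS]
    bit_index, first, second = locate(z, len(letters), len(watermark_bits))
    current = 1 if ord(letters[first]) >= ord(letters[second]) else 0
    return current != watermark_bits[bit_index]


print(locate(35, 10, 8))  # (2, 4, 5): third bit, 5th and 6th letters
print(locate(40, 10, 8))  # (7, 9, 0): last bit, last and first letters
```

Since an unmodified letter pair already matches the target bit roughly half the time, only some of the selected watermark information sentences need to be rewritten at all.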
The text watermark extraction process proposed by the present invention is shown in Fig. 3 and Fig. 4 and is essentially the inverse of embedding. First, the first sentence is read from the text, and spaces and special characters are filtered out to give a string of letters only. An MD5 operation is performed on this letter string combined with the copyright holder's private key information, giving a long integer Z. Z is taken modulo the embedding ratio R. If Z is exactly divisible, the next sentence is a watermark information sentence, and one bit of watermark information will be extracted from it; if not, the next sentence is read in and the operation is repeated.
When watermark information is extracted from a watermark information sentence, Z is first taken modulo the number of letters in the sentence, and the remainder gives the position of the watermark information. For example, if the remainder is 5, the current bit is expressed by the relative size of the ASCII values of the 5th and 6th letters of the watermark information sentence; if Z is exactly divisible, it is expressed by the relationship between the last letter and the first letter of the sentence. One bit of watermark information is extracted from the relationship between the letters at the watermark position. Z is then taken modulo the length of the watermark information; the remainder indicates which bit of the watermark information has just been extracted, and the value is added to the set for that bit position.
All watermark information is extracted in this way, in turn from the first sentence of the text to its end, giving one set of values for each watermark bit position. Voting over the set for each bit position gives the value of that watermark bit, and the resulting bit string is finally converted into the character string that represents the copyright information.
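The scanning loop of the extraction process can be sketched as below. It only collects the per-position sets of bit values; the voting step would then be applied to the result. Several points are assumptions rather than statements from the text: the function names, the concatenation of the key with the letter string before MD5, the fact that scanning resumes with the sentence after each watermark information sentence, and the skipping of a watermark information sentence that contains no letters. Sentence splitting is assumed to have been done already.

```python
import hashlib
import string

LETTERS = set(string.ascii_letters)


def letters_of(sentence: str) -> list[str]:
    """Keep only the English letters of a sentence."""
    return [ch for ch in sentence if ch in LETTERS]


def hash_to_int(sentence: str, private_key: str) -> int:
    """MD5 over letters + key (assumed combination), read as an integer Z."""
    data = ("".join(letters_of(sentence)) + private_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def collect_votes(sentences: list[str], private_key: str,
                  ratio: int, n_bits: int) -> list[list[int]]:
    """Scan the text and gather the extracted values for each watermark bit position."""
    votes: list[list[int]] = [[] for _ in range(n_bits)]
    i = 0
    while i + 1 < len(sentences):
        z = hash_to_int(sentences[i], private_key)
        if z % ratio != 0:          # not a judgment sentence, move on
            i += 1
            continue
        letters = letters_of(sentences[i + 1])  # the watermark information sentence
        if letters:                 # assumption: skip sentences without letters
            r = z % len(letters)
            first = letters[(r - 1) % len(letters)]   # r == 0 wraps to the last letter
            second = letters[r % len(letters)]
            bit = 1 if ord(first) >= ord(second) else 0
            votes[(z % n_bits - 1) % n_bits].append(bit)
        i += 2                      # assumption: resume after the watermark sentence
    return votes


text = ["One short line.", "Two words here.", "Three is next.", "Four ends it."]
print(collect_votes(text, "secret", ratio=1, n_bits=8))
```

With ratio=1 every sentence read as a candidate is a judgment sentence, so the four sentences above yield exactly two extracted values; with a larger ratio fewer sentences are selected. The extractor needs the same private key, embedding ratio R and watermark length L_b that were used for embedding.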

Claims (1)

1. A method for embedding and extracting a watermark in English text, characterized in that the method comprises the following steps:
(1) the copyright holder's copyright information is converted into a binary bit string of length L_b, and a watermark embedding ratio R is set;
(2) the first sentence of the English text is read in;
(3) spaces and special characters are filtered out of the sentence read in, giving a character string that contains only English letters, and a one-way hash operation is performed on this character string and the copyright holder's private key information, giving a long integer Z;
(4) Z is divided by the embedding ratio R; if Z is exactly divisible, the sentence read in is a watermark information judgment sentence and step (5) is carried out; if not, the next sentence of the English text is read in and steps (3) and (4) are repeated;
(5) the next sentence is read in and defined as the watermark information sentence, and its number of characters L_s is computed; the remainder of Z divided by L_s is the watermark position P_w, and the remainder of Z divided by the bit string length L_b of the copyright information identifies the watermark information bit B_i;
(6) the character code at position P_w is compared with the character code at position P_w+1; if the comparison result differs from the watermark information bit B_i, the watermark information sentence is rewritten so that the comparison result equals B_i; if the comparison result already equals B_i, position P_w implicitly carries B_i; steps (3) to (6) are repeated until the end of the text, at which point watermark embedding is complete;
(7) the first sentence of the watermarked English text is read in;
(8) a one-way hash operation is performed on the character string and the copyright holder's private key information, giving a long integer Z;
(9) Z is divided by the embedding ratio R set above; if Z is exactly divisible, the sentence read in is a watermark information judgment sentence and step (10) is carried out; if not, the next sentence of the watermarked English text is read in and steps (8) and (9) are repeated;
(10) the next sentence is read in, this sentence being the watermark information sentence, and its number of characters L_s is computed; the remainder of Z divided by L_s is the watermark position P_w, and the remainder of Z divided by the bit string length L_b of the copyright information identifies the watermark information bit B_i;
(11) the character code at position P_w is compared with the character code at position P_w+1, and the bit representing the comparison result is the first extracted bit of the watermark information;
(12) the next sentence is read in and steps (8) to (10) are repeated until the end of the file, giving a group of bit values for each bit position of the bit string that represents the copyright information;
(13) the group of bit values for each bit position is decided by voting, giving the value of each bit;
(14) the bits obtained from this decision are combined and restored to the copyright information.
CNB2005100774713A 2005-06-24 2005-06-24 Method for embedding and extracting watermark in English texts Expired - Fee Related CN100367274C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100774713A CN100367274C (en) 2005-06-24 2005-06-24 Method for embedding and extracting watermark in English texts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005100774713A CN100367274C (en) 2005-06-24 2005-06-24 Method for embedding and extracting watermark in English texts

Publications (2)

Publication Number Publication Date
CN1700205A CN1700205A (en) 2005-11-23
CN100367274C true CN100367274C (en) 2008-02-06

Family

ID=35476272

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100774713A Expired - Fee Related CN100367274C (en) 2005-06-24 2005-06-24 Method for embedding and extracting watermark in English texts

Country Status (1)

Country Link
CN (1) CN100367274C (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2021976A4 (en) * 2006-05-17 2010-09-29 Verimatrix Inc Efficient application of video marking technologies
CN102567938B (en) * 2010-12-23 2014-05-14 北大方正集团有限公司 Watermark image blocking method and device for western language watermark processing
CN102946531A (en) * 2012-08-24 2013-02-27 南京大学 GOP (group of picture) frame structure combined video watermarking method and system
WO2014067102A1 (en) * 2012-10-31 2014-05-08 Empire Technology Development Llc Information coding method, system and computer-readable medium
CN107330306B (en) * 2017-06-28 2020-07-28 百度在线网络技术(北京)有限公司 Text watermark embedding and extracting method and device, electronic equipment and storage medium
CN110414194B (en) * 2019-07-02 2023-08-04 南京理工大学 Text watermark embedding and extracting method
CN110489945B (en) * 2019-07-26 2021-03-30 山东科技大学 Resume information protection and divulgence tracing method
WO2022047440A1 (en) * 2020-08-31 2022-03-03 Hewlett-Packard Development Company, L.P. Watermarks for text documents
CN112215011A (en) * 2020-10-21 2021-01-12 北京嘉和美康信息技术有限公司 Method and device for processing medical documents
CN112948776A (en) * 2021-02-03 2021-06-11 海信集团控股股份有限公司 Digital watermark adding method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1259709A (en) * 1998-09-17 2000-07-12 国际商业机器公司 Method and system for inserting information into piles
CN1389825A (en) * 2002-07-12 2003-01-08 哈尔滨工业大学 Method of embedding digital watermark into and separating and recovering digital watermark from media information
CN1447233A (en) * 2003-01-23 2003-10-08 同济大学 Multi-media data protection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An effective document watermarking technique. Zhang Xiaohua, Liu Fang, Jiao Licheng. Journal on Communications, Vol. 24, No. 5, 2003 *

Also Published As

Publication number Publication date
CN1700205A (en) 2005-11-23

Similar Documents

Publication Publication Date Title
CN100367274C (en) Method for embedding and extracting watermark in English texts
CN100447812C (en) Document data waterprint embedded method
Alotaibi et al. Improved capacity Arabic text watermarking methods based on open word space
Shirali-Shahreza et al. A new approach to Persian/Arabic text steganography
Tayyeh et al. Novel steganography scheme using Arabic text features in Holy Quran
Roy et al. A novel approach to format based text steganography
CN102360413B (en) Steganographic method with misguiding function of controllable secret key sequence
CN102096787B (en) Method and device for hiding information based on word2007 text segmentation
CN102393892B (en) Word document copyright protection method
CN110414194B (en) Text watermark embedding and extracting method
Gutub et al. Utilizing diacritic marks for Arabic text steganography
Kingslin et al. Evaluative approach towards text steganographic techniques
Alotaibi et al. Arabic text watermarking: A review
CN103761459A (en) Document multiple digital watermarking insertion method and device, and document multiple digital watermarking extraction method and device
Khairullah A novel text steganography system using font color of the invisible characters in microsoft word documents
Stojanov et al. A new property coding in text steganography of Microsoft Word documents
Chen et al. Text watermarking algorithm based on semantic role labeling
Jalil et al. Text watermarking using combined image-plus-text watermark
CN102194205A (en) Method and device for text recoverable watermark based on synonym replacement
Shah et al. Text steganography using character spacing after normalization
CN109800547B (en) Method for quickly embedding and extracting information for WORD document protection and distribution tracking
Alla et al. An evolution of Hindi text steganography
Chao et al. Information hiding in text using typesetting tools with stego-encoding
CN109993681A (en) A kind of digital watermark method of the OOX format file based on color attribute value transformation
Cheng et al. A robust text digital watermarking algorithm based on fragments regrouping strategy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080206

Termination date: 20160624
