CN103761459A - Document multiple digital watermarking insertion method and device, and document multiple digital watermarking extraction method and device - Google Patents

Document multiple digital watermarking insertion method and device, and document multiple digital watermarking extraction method and device

Info

Publication number
CN103761459A
Authority
CN
China
Prior art keywords
document
watermark information
layer
new
first layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410035906.7A
Other languages
Chinese (zh)
Other versions
CN103761459B (en
Inventor
陈小军
时金桥
徐睿
蒲以国
赵亮
张锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201410035906.7A
Publication of CN103761459A
Application granted
Publication of CN103761459B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Abstract

The invention relates to a method and device for embedding multiple digital watermarks in a document, and to a method and device for extracting them. The embedding method comprises the following steps: obtaining original watermark information entered by a user, a key, and a document to be processed; computing a digest of the original watermark information to generate new watermark information; storing the original watermark information and the new watermark information together in a database as one database record; dividing the characters of the document into two layers; determining the number of groups of new watermark information to embed in the first layer from the total number of first-layer characters and the bit length of the new watermark information; embedding the groups of new watermark information into the attribute bits of the first layer in order from front to back; and embedding the groups of new watermark information into the attribute bits of the second layer in order from back to front. The method is based on the character attributes of Word-format documents: the key improves security, repeated embedding strengthens robustness, and multiple embedding increases watermark capacity.

Description

A method and device for embedding and extracting multiple digital watermarks in a document
Technical field
The present invention relates to the field of digital watermarking, and in particular to a method and device for embedding and extracting multiple digital watermarks in a document.
Background
In recent years, with the rapid development of multimedia and network technology, protecting the copyright of digital works has become a popular topic in academic research. Digital watermarking, an important research direction within information hiding, is valuable for copyright protection of multimedia such as text, video, and audio. A digital watermark embeds copyright information such as a serial number, text, or a logo into multimedia data, serving purposes such as copyright protection, covert communication, authentication of data files, and product marking.
Existing practical text watermarking methods fall mainly into two classes: format-based text watermarking and natural-language-based text watermarking. Format-based watermarking is the largest class to date; it has developed from the initial line shifting, word shifting, and feature coding to later methods such as changing font size and color. Research on this class is very active, but the methods suffer from weak security and low watermark capacity. Natural-language-based text watermarking was first proposed in 2002 by Mikhail J. Atallah, Victor Raskin, and others at Purdue University in the United States. It adds watermark information mainly by changing sentence structure, substituting synonyms, and similar means. Natural-language watermarking changes the content of the text but not its meaning or format, so the watermark is hard to detect after embedding and is not easily destroyed. For standardized documents with strict format requirements, however, this approach may alter the semantics and is therefore unsuitable. In addition, computer processing of natural language is not yet mature, which has become a bottleneck for natural-language watermarking.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for embedding and extracting multiple digital watermarks in a document that is based on the character attributes of Word-format files, uses a key to improve security, embeds repeatedly to strengthen robustness, and uses multiple embedding to increase watermark capacity.
The technical solution of the present invention to the above problem is as follows: a document multiple digital watermark embedding method, comprising the following steps:
Step 1: obtain the original watermark information entered by the user, a key, and the document to be processed;
Step 2: use a digest algorithm to compute the digest of the original watermark information, generating new watermark information, and obtain the bit length of the new watermark information;
Step 3: store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be queried when the watermark is extracted;
Step 4: divide the characters of the document into two layers; from the total number of characters in the first layer and the bit length of the new watermark information, obtain the number of groups of new watermark information to be embedded in the first layer; embed the groups of new watermark information into the attribute bits of the first layer in order from front to back, with a separator between groups;
Step 5: embed the groups of new watermark information into the attribute bits of the second layer in order from back to front, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.
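The embedding order in steps 4 and 5 can be illustrated with a minimal Python sketch. It only pairs character positions with payload bits and does not touch a Word document. The function names are hypothetical, and the sketch assumes one bit per first-layer character and two bits per second-layer character, as the attribute mapping described later in this document suggests.

```python
def assign_first_layer(indices, payload):
    """Pair first-layer character positions with payload bits, front to back."""
    return {i: bit for i, bit in zip(indices, payload)}


def assign_second_layer(indices, payload):
    """Pair second-layer positions with 2-bit payload chunks, back to front."""
    usable = len(payload) - len(payload) % 2  # drop a trailing odd bit, if any
    pairs = [payload[k:k + 2] for k in range(0, usable, 2)]
    return {i: pair for i, pair in zip(reversed(indices), pairs)}
```

Here `indices` is the ascending list of character positions belonging to a layer, and `payload` is the bit string formed by the repeated groups and their separators.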
The beneficial effects of the present invention are as follows: the invention is based on the character attributes of Word-format files; the key improves security, repeated embedding strengthens robustness, and multiple embedding increases watermark capacity.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, the method of dividing the characters of the document into two layers comprises the following steps:
Obtain the Unicode code of the character used as the key, convert it into a binary sequence, and take the last two bits of the binary sequence as the key sequence;
Obtain the Unicode codes of all characters in the document, and convert the Unicode code of each character into a binary sequence;
XOR the key sequence with the binary sequence of each character in the document; if the result is 00 or 10, the character is assigned to the first layer of the document; if the result is 01 or 11, it is assigned to the second layer.
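The layering rule can be sketched in Python as follows. This is an illustrative model, not the patented implementation: the function names are hypothetical, `ord()` is used as a stand-in for the Unicode code of a character, and only the last two bits of each character are XORed with the key sequence, which is how the embodiment later in this document describes the operation.

```python
def key_sequence(key_char):
    """Last two bits of the key character's Unicode code point."""
    return ord(key_char) & 0b11


def layer_of(ch, key):
    """Return 1 if the XOR result is 00 or 10, and 2 if it is 01 or 11."""
    result = (ord(ch) & 0b11) ^ key
    return 1 if result in (0b00, 0b10) else 2


def split_layers(text, key_char):
    """Return the character positions of the first and second layers."""
    key = key_sequence(key_char)
    first, second = [], []
    for i, ch in enumerate(text):
        (first if layer_of(ch, key) == 1 else second).append(i)
    return first, second
```

For example, with the key character `K` (last two bits 11), the text `abcd` splits into positions 0 and 2 for the first layer and positions 1 and 3 for the second layer. Because the results 00 and 10 share a low bit of 0, the layer is in effect decided by the lowest bit of the XOR result.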
Further, the separator is the binary sequence of any rarely used non-visible character in the Unicode encoding.
Further, embedding the groups of new watermark information into the different attribute bits of the document comprises the following steps:
For the first layer, modify the NoProofing attribute value of each character in the first layer: if the current bit of new watermark information to be embedded is 1, set the NoProofing attribute value to True; otherwise, keep the original value False unchanged;
For the second layer, modify the LanguageIDOther attribute value of each character in the second layer: if the current bits of new watermark information to be embedded are 00, keep the original value unchanged; if they are 01, set the LanguageIDOther attribute value to wdBasque; if they are 10, set it to wdVenda; if they are 11, set it to wdEstonian.
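The bit-to-attribute mapping above can be written down as a small lookup, sketched here in Python. The enumeration names are kept as plain strings so that the sketch does not depend on Word; the function names are hypothetical, and applying the values to a real document would have to go through the Word object model.

```python
# Mapping from two watermark bits to the LanguageIDOther value named in the text.
SECOND_LAYER_VALUES = {"01": "wdBasque", "10": "wdVenda", "11": "wdEstonian"}


def first_layer_value(bit):
    """NoProofing value for one watermark bit: True for 1, False for 0."""
    return bit == "1"


def second_layer_value(pair, original):
    """LanguageIDOther value for two watermark bits; 00 keeps the original."""
    if pair == "00":
        return original
    return SECOND_LAYER_VALUES[pair]


def second_layer_bits(value):
    """Inverse mapping used on extraction; any other value reads as 00."""
    for pair, name in SECOND_LAYER_VALUES.items():
        if name == value:
            return pair
    return "00"
```

The inverse function treats every value outside the three chosen languages as 00, which matches the rule that 00 leaves the original value in place.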
Further, a document multiple digital watermark extraction method comprises the following steps:
Step 1a: detect whether watermark information has been embedded in the document to be processed; if so, divide all characters into two layers according to the rule and proceed to step 2a; otherwise, end processing;
Step 2a: extract watermark information from the attribute bits of the first layer of the document and from the attribute bits of the second layer, and use the separator to obtain the actual number of groups of watermark information extracted from each layer;
Step 3a: from the total number of characters in the first layer of the document and the bit length of the watermark information extracted from the first and second layers, obtain the expected number of groups of watermark information embedded in the first layer and in the second layer;
Step 4a: when the extracted groups of watermark information are consistent and all match one database record, and the actual number of extracted groups in each layer equals the expected number, all watermark information is normal and the document has not been attacked; output the original watermark information after querying the database; otherwise, perform error correction.
Further, the condition in step 4a also includes that the number of groups of watermark information extracted from the attribute bits of the second layer is twice the number of groups extracted from the attribute bits of the first layer; only then is all watermark information considered normal.
Further, the NoProofing and LanguageIDOther attribute values of every character in a document are preset by the system to default values. The character attributes of each character in the document from which watermark information is to be extracted are checked one by one; if any character has a NoProofing value or a LanguageIDOther value that differs from its default, the document contains embedded watermark information; otherwise, it does not.
Further, the error correction comprises the following steps:
Step 3a.1: for the groups of watermark information extracted by separator, if the groups are not fully consistent but at least one group matches a database record, return the extracted watermark information and report the damage to the document; otherwise, go to step 3a.2;
Step 3a.2: if none of the groups of watermark information matches any database record, report that the document is seriously damaged and that watermark extraction has failed.
Further, a document multiple digital watermark embedding device comprises an acquisition module, a generation module, a storage module, a first embedding module, and a second embedding module;
The acquisition module is used to obtain the original watermark information entered by the user, a key, and the document to be processed;
The generation module is used to compute the digest of the original watermark information with a digest algorithm, generate new watermark information, and obtain the bit length of the new watermark information;
The storage module is used to store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be queried when the watermark is extracted;
The first embedding module is used to divide the characters of the document into two layers, obtain the number of groups of new watermark information to be embedded in the first layer from the total number of first-layer characters and the bit length of the new watermark information, and embed the groups of new watermark information into the attribute bits of the first layer in order from front to back, with a separator between groups;
The second embedding module is used to embed the groups of new watermark information into the attribute bits of the second layer in order from back to front, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.
Further, a document multiple digital watermark extraction device comprises a detection module, an extraction module, a computing module, and a matching module;
The detection module is used to detect whether watermark information has been embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and control passes to the extraction module; otherwise, processing ends;
The extraction module is used to extract watermark information from the attribute bits of the first layer and of the second layer of the document, and to obtain, by separator, the actual number of groups of watermark information extracted from each layer;
The computing module is used to obtain the expected number of groups of watermark information embedded in the first layer and in the second layer, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers;
The matching module is used to determine that all watermark information is normal and the document has not been attacked when the extracted groups of watermark information are consistent and all match one database record and the actual number of extracted groups in each layer equals the expected number, and then to output the original watermark information after querying the database; otherwise, error correction is performed.
Brief description of the drawings
Fig. 1 is a flowchart of the document multiple digital watermark embedding method of the present invention;
Fig. 2 is a flowchart of the document multiple digital watermark extraction method of the present invention;
Fig. 3 is a structural diagram of the document multiple digital watermark embedding device of the present invention;
Fig. 4 is a structural diagram of the document multiple digital watermark extraction device of the present invention.
In the drawings, the components represented by the reference numerals are as follows:
1, acquisition module; 2, generation module; 3, storage module; 4, first embedding module; 5, second embedding module; 6, detection module; 7, extraction module; 8, computing module; 9, matching module.
Detailed description
The principles and features of the present invention are described below with reference to the drawings. The examples serve only to explain the invention and are not intended to limit its scope.
Fig. 1 is a flowchart of the document multiple digital watermark embedding method of the present invention; Fig. 2 is a flowchart of the extraction method; Fig. 3 is a structural diagram of the embedding device; Fig. 4 is a structural diagram of the extraction device.
Embodiment 1
A document multiple digital watermark embedding method comprises the following steps:
Step 1: obtain the original watermark information entered by the user, a key, and the document to be processed;
Step 2: use a digest algorithm to compute the digest of the original watermark information, generating new watermark information, and obtain the bit length of the new watermark information;
Step 3: store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be queried when the watermark is extracted;
Step 4: divide the characters of the document into two layers; from the total number of characters in the first layer and the bit length of the new watermark information, obtain the number of groups of new watermark information to be embedded in the first layer; embed the groups of new watermark information into the attribute bits of the first layer in order from front to back, with a separator between groups;
Step 5: embed the groups of new watermark information into the attribute bits of the second layer in order from back to front, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.
The method of dividing the characters of the document into two layers comprises the following steps:
Obtain the Unicode code of the character used as the key, convert it into a binary sequence, and take the last two bits of the binary sequence as the key sequence;
Obtain the Unicode codes of all characters in the document, and convert the Unicode code of each character into a binary sequence;
XOR the key sequence with the binary sequence of each character in the document; if the result is 00 or 10, the character is assigned to the first layer of the document; if the result is 01 or 11, it is assigned to the second layer.
The separator is the binary sequence of any rarely used non-visible character in the Unicode encoding.
Embedding the groups of new watermark information into the different attribute bits of the document comprises the following steps:
For the first layer, modify the NoProofing attribute value of each character in the first layer: if the current bit of new watermark information to be embedded is 1, set the NoProofing attribute value to True; otherwise, keep the original value False unchanged;
For the second layer, modify the LanguageIDOther attribute value of each character in the second layer: if the current bits of new watermark information to be embedded are 00, keep the original value unchanged; if they are 01, set the LanguageIDOther attribute value to wdBasque; if they are 10, set it to wdVenda; if they are 11, set it to wdEstonian.
A document multiple digital watermark extraction method comprises the following steps:
Step 1a: detect whether watermark information has been embedded in the document to be processed; if so, divide all characters into two layers according to the rule and proceed to step 2a; otherwise, end processing;
Step 2a: extract watermark information from the attribute bits of the first layer of the document and from the attribute bits of the second layer, and use the separator to obtain the actual number of groups of watermark information extracted from each layer;
Step 3a: from the total number of characters in the first layer of the document and the bit length of the watermark information extracted from the first and second layers, obtain the expected number of groups of watermark information embedded in the first layer and in the second layer;
Step 4a: when the extracted groups of watermark information are consistent and all match one database record, and the actual number of extracted groups in each layer equals the expected number, all watermark information is normal and the document has not been attacked; output the original watermark information after querying the database; otherwise, perform error correction.
The condition in step 4a also includes that the number of groups of watermark information extracted from the attribute bits of the second layer is twice the number of groups extracted from the attribute bits of the first layer; only then is all watermark information considered normal.
In step 1a, watermark information is detected as follows:
The NoProofing and LanguageIDOther attribute values of every character in a document are preset by the system to default values. The character attributes of each character in the document from which watermark information is to be extracted are checked one by one; if any character has a NoProofing value or a LanguageIDOther value that differs from its default, the document contains embedded watermark information; otherwise, it does not.
The error correction comprises the following steps:
Step 3a.1: for the groups of watermark information extracted by separator, if the groups are not fully consistent but at least one group matches a database record, return the extracted watermark information and report the damage to the document; otherwise, go to step 3a.2;
Step 3a.2: if none of the groups of watermark information matches any database record, report that the document is seriously damaged and that watermark extraction has failed.
A document multiple digital watermark embedding device comprises an acquisition module 1, a generation module 2, a storage module 3, a first embedding module 4, and a second embedding module 5;
The acquisition module 1 is used to obtain the original watermark information entered by the user, a key, and the document to be processed;
The generation module 2 is used to compute the digest of the original watermark information with a digest algorithm, generate new watermark information, and obtain the bit length of the new watermark information;
The storage module 3 is used to store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be queried when the watermark is extracted;
The first embedding module 4 is used to divide the characters of the document into two layers, obtain the number of groups of new watermark information to be embedded in the first layer from the total number of first-layer characters and the bit length of the new watermark information, and embed the groups of new watermark information into the attribute bits of the first layer in order from front to back, with a separator between groups;
The second embedding module 5 is used to embed the groups of new watermark information into the attribute bits of the second layer in order from back to front, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.
A document multiple digital watermark extraction device comprises a detection module 6, an extraction module 7, a computing module 8, and a matching module 9;
The detection module 6 is used to detect whether watermark information has been embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and control passes to the extraction module 7; otherwise, processing ends;
The extraction module 7 is used to extract watermark information from the attribute bits of the first layer and of the second layer of the document, and to obtain, by separator, the actual number of groups of watermark information extracted from each layer;
The computing module 8 is used to obtain the expected number of groups of watermark information embedded in the first layer and in the second layer, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers;
The matching module 9 is used to determine that all watermark information is normal and the document has not been attacked when the extracted groups of watermark information are consistent and all match one database record and the actual number of extracted groups in each layer equals the expected number, and then to output the original watermark information after querying the database; otherwise, error correction is performed.
In a concrete implementation, the embedding method of the present invention comprises the following 6 steps:
1) Input the original watermark information, the key, and the Word document to be processed;
2) Compute the digest of the original watermark with a message digest algorithm such as MD5 or SHA1, and use the digest as the watermark data in the subsequent steps;
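Step 2) can be sketched with Python's standard `hashlib` module. The function name `digest_bits` and the choice of UTF-8 for encoding the watermark text are assumptions made for illustration; the source only names MD5 and SHA1 as example digest algorithms.

```python
import hashlib


def digest_bits(watermark, algorithm="md5"):
    """Return the digest of the watermark text as a string of 0s and 1s."""
    raw = hashlib.new(algorithm, watermark.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in raw)
```

With MD5 the resulting bit string has length M = 128, and with SHA1 it has length M = 160; this M is the bit length used when the number of groups is computed.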
3) Store the generated watermark information and the original watermark information in the database as one record, so that the original information can be queried during extraction;
4) Divide all characters of the Word document into two layers, and embed the watermark information into a different attribute bit for each layer;
5) If the total number of characters is N and the digest bit length is M, K = N/M groups of the watermark are embedded, with K rounded down. A separator is needed between groups; for example, the non-visible Unicode character RLO can be chosen as the separator, with the value 0010000000101110. For first-layer characters, K groups of the watermark are embedded in order from front to back;
6) For second-layer characters, proceed as in step 5) and embed 2*K groups of the watermark in order from back to front.
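The group count and separator from step 5) can be checked with a short Python sketch. The names are hypothetical. The sketch follows the formula K = N/M exactly as stated, so the bits occupied by separators are not subtracted from the available capacity; a practical implementation would probably have to account for them.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, the separator character suggested in the text
SEPARATOR_BITS = format(ord(RLO), "016b")


def group_count(n_chars, m_bits):
    """K = N / M, rounded down."""
    return n_chars // m_bits


def layer_payload(watermark_bits, groups):
    """Repeat the watermark `groups` times with the separator between copies."""
    return SEPARATOR_BITS.join([watermark_bits] * groups)
```

The 16-bit binary form of U+202E is 0010000000101110, which matches the separator value given in step 5). For N = 1000 characters and a 128-bit digest, `group_count` gives K = 7.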
Steps 4), 5), and 6) above are the core of this method.
In step 4), the text is layered as follows: obtain the Unicode code of the key character, convert it into a binary sequence, and take the last two bits as the key. During embedding, obtain the Unicode code of each text character in turn, convert it into a binary sequence as well, take the last two bits, and XOR them with the key. If
● the result is 00 or 10, the character is assigned to the first layer and its NoProofing attribute is modified;
● the result is 01 or 11, the character is assigned to the second layer and its LanguageIDOther attribute is modified.
In steps 5) and 6), the method uses Microsoft's official OLE interface to operate on character attributes. The basic principle of embedding is to exploit two attributes of individual characters in a Word document: NoProofing and LanguageIDOther. Their functions are as follows. For a Selection object (such as a single character), if the NoProofing attribute is True, the spelling and grammar checker ignores the specified text. The LanguageIDOther attribute of a character can be set to the enumerated value of a language with few users; Microsoft recommends this attribute for setting or returning the language of Western-language text in a document created in a right-to-left language version of Microsoft Word. LanguageIDOther has 64 enumerated values; after screening, this method selects three values for languages with few users (wdBasque, wdVenda, wdEstonian) as the modified values. Each second-layer character can therefore carry two watermark bits, so the second layer can carry twice as much information as the first layer, which increases watermark capacity. These two character attributes can only be discovered, added, and modified programmatically; ordinary operations in Word do not remove the watermark, so it has strong concealment and resistance to attack. The watermark is embedded repeatedly to improve robustness: even if the document suffers attacks such as deletion or modification, the original watermark information can be recovered as long as one group of the watermark remains intact.
The extracting method of watermark is the inverse process of embedding grammar, for:
1) all characters of Word document to be detected are divided into two-layer by rule;
2) character of every layer is pressed to embedding method reading out data one by one, obtain n group watermark information;
3) when the watermark of n group is consistent, and can match a data-base recording time, can show that all watermark informations are completely normal, document is not attacked, and exports original watermark information after Query Database.Otherwise, turn error correct algorithm.
The error-correction method is:
1) For the n groups of watermark information extracted using the separator, if the groups are not all identical but at least one group matches a database record, the document has been damaged by an attack such as inserting or deleting characters; return the extracted watermark information and report the damage to the document. Otherwise, go to 2);
2) If none of the n groups matches a database record, every group of watermark information has been damaged to some degree; report that the document is seriously damaged and that the watermark information cannot be extracted.
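The extraction decision and the error-correction fallback above can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the database is modeled as a dictionary from embedded (digest) watermark strings to original watermark strings, the function name and status labels are hypothetical, and returning the first matching group in the damaged case is an illustrative choice, not something the text specifies.

```python
from typing import Dict, List, Optional, Tuple


def classify_extraction(
    groups: List[str], database: Dict[str, str]
) -> Tuple[str, Optional[str]]:
    """Return (status, original watermark or None) for the extracted groups."""
    # Case 1: all groups identical and matching a record -> document intact.
    if groups and all(g == groups[0] for g in groups) and groups[0] in database:
        return "intact", database[groups[0]]

    # Case 2 (error correction, step 1): at least one group still matches.
    for group in groups:
        if group in database:
            return "damaged", database[group]

    # Case 3 (error correction, step 2): nothing matches any record.
    return "unrecoverable", None


if __name__ == "__main__":
    db = {"1011": "owner: Alice"}  # hypothetical record
    print(classify_extraction(["1011", "1011"], db))
    print(classify_extraction(["1011", "1001"], db))
    print(classify_extraction(["0000", "1001"], db))
```

The three branches correspond to the intact case, error-correction step 1), and error-correction step 2), in that order.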
The watermark detection method is:
In Word, the default values of NoProofing and LanguageIDOther for each character are False and 1033 (wdEnglishUS), respectively. The character attributes of the input document are checked one by one; if there is a character whose values for these two attributes are not the defaults, the document contains an embedded watermark.
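The detection rule can be sketched as follows. This is a minimal sketch, assuming each character is represented as a hypothetical (NoProofing, LanguageIDOther) pair already read from the document, and interpreting the rule as "either attribute differs from its default", since first-layer characters change only NoProofing and second-layer characters change only LanguageIDOther.

```python
from typing import Iterable, Tuple

DEFAULT_NO_PROOFING = False
DEFAULT_LANGUAGE_ID_OTHER = 1033  # wdEnglishUS, as stated in the text


def is_watermarked(chars: Iterable[Tuple[bool, int]]) -> bool:
    """True if any character carries a non-default value in either attribute."""
    return any(
        no_proofing != DEFAULT_NO_PROOFING
        or language_id != DEFAULT_LANGUAGE_ID_OTHER
        for no_proofing, language_id in chars
    )
```

A document made only of default pairs is reported as clean; a single modified character is enough to flag the document.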
Beneficial effects
The character attributes that carry the watermark information are invisible attributes, so the document is visually unchanged after embedding, which gives good concealment.
Statistically, each layer contains on average 50% of the characters: when 100 characters are divided into two layers, each layer has 50 characters on average, so the watermark capacity is 150% (50 bits from the first layer plus 100 bits from the second layer, that is, 150 bits per 100 characters). Experimental results, listed in Table 1, show that the actual watermark capacity is close to 150%. This is a substantial improvement over other text watermarking algorithms, as shown in Table 2.
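The capacity figure follows from one bit per first-layer character and two bits per second-layer character. A minimal sketch of the arithmetic (the function name is hypothetical, and separator overhead is ignored):

```python
def capacity_bits_per_char(layer1_chars: int, layer2_chars: int) -> float:
    """Raw embedding capacity in bits per document character."""
    total = layer1_chars + layer2_chars
    if total == 0:
        raise ValueError("document has no characters")
    # First layer: 1 bit per character; second layer: 2 bits per character.
    return (1 * layer1_chars + 2 * layer2_chars) / total


if __name__ == "__main__":
    print(f"{capacity_bits_per_char(50, 50):.0%}")  # prints 150%
```

An uneven split moves the figure between 100% (all characters in the first layer) and 200% (all characters in the second layer).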
During embedding, the original watermark information is transformed with a message digest algorithm, so even if the embedded watermark information is obtained, the original watermark information cannot be derived from it, which improves the security of the watermark. In addition, a key is used for layering: if a wrong key is entered during extraction, the characters are layered incorrectly, the extracted attributes are misaligned, and no watermark information is obtained, which further improves security.
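Both security mechanisms can be sketched in a few lines of Python. This is a sketch under stated assumptions: the text does not name the digest algorithm, so SHA-256 is used purely as a stand-in; the key is assumed to be a single character; and the XOR is applied to the last two bits of each character's Unicode code point, following the layering rule given in claim 2. The function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def digest_bits(original_watermark: str) -> str:
    """Bit string of the digest of the original watermark (SHA-256 assumed)."""
    digest = hashlib.sha256(original_watermark.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def layer_of(char: str, key_char: str) -> int:
    """Layer 1 if the two-bit XOR result is 00 or 10, layer 2 if 01 or 11."""
    result = (ord(char) ^ ord(key_char)) & 0b11
    return 1 if result in (0b00, 0b10) else 2


def split_layers(text: str, key_char: str) -> Tuple[List[int], List[int]]:
    """Return the character indices assigned to layer 1 and to layer 2."""
    layer1 = [i for i, ch in enumerate(text) if layer_of(ch, key_char) == 1]
    layer2 = [i for i, ch in enumerate(text) if layer_of(ch, key_char) == 2]
    return layer1, layer2


if __name__ == "__main__":
    print(split_layers("abc", "a"))  # layering with one key
    print(split_layers("abc", "b"))  # a different key changes the layering
```

With the key "a" the characters of "abc" fall into layers 1, 2, 1; with the key "b" they fall into layers 2, 1, 2, so attributes read with the wrong key are taken from the wrong characters.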
If the watermarked document suffers attacks such as inserting or deleting characters, the extracted watermark information is evaluated using the separators. As shown in Figure 2, the underlined parts are separators and the boxed parts are watermark information. If a complete group of watermark information follows a separator, it is extracted, which ensures the robustness of the watermarking method to a certain extent.
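Separator-based recovery can be sketched as follows. This is a minimal sketch, assuming the extracted attributes have already been decoded into a bit string laid out as separator, group, separator, group, and so on. The separator (the 16-bit code of U+200B, ZERO WIDTH SPACE) is a hypothetical choice of a rarely used invisible character, and the sketch does not handle the case where the separator pattern happens to occur inside a watermark group.

```python
from typing import List

SEPARATOR = format(0x200B, "016b")  # assumed separator bit pattern


def split_groups(stream: str, group_len: int, sep: str = SEPARATOR) -> List[str]:
    """Return every complete watermark group that follows a separator."""
    parts = stream.split(sep)[1:]  # ignore anything before the first separator
    groups = []
    for index, part in enumerate(parts):
        if index == len(parts) - 1:
            # The last group may be followed by unused trailing bits.
            part = part[:group_len]
        if len(part) == group_len:  # keep only undamaged groups
            groups.append(part)
    return groups


if __name__ == "__main__":
    wm = "10110010"
    intact = SEPARATOR + wm + SEPARATOR + wm + "000"
    damaged = SEPARATOR + wm[:6] + SEPARATOR + wm  # first group lost two bits
    print(split_groups(intact, len(wm)))
    print(split_groups(damaged, len(wm)))
```

In the damaged stream only the second group has the expected length, so it alone is returned; that single surviving group is enough to query the database.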
Table 1: Watermark capacity statistics (provided as an image in the original publication)
Table 2: Capacity comparison of text watermarking algorithms (provided as an image in the original publication)
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A document multiple digital watermark embedding method, characterized by comprising the following steps:
Step 1: obtain the original watermark information, the key, and the document to be processed, as input by the user;
Step 2: use a digest algorithm to compute the digest of the original watermark information, generating new watermark information, and obtain the bit length of the new watermark information from the new watermark information;
Step 3: store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be queried when the watermark is extracted;
Step 4: divide the characters of the document into two layers; from the total number of characters in the first layer of the document and the bit length of the new watermark information, obtain the number of groups of new watermark information to be embedded in the first layer; embed the multiple groups of new watermark information, in front-to-back order, into the attribute bits of the first layer of the document, with the groups separated by separators;
Step 5: embed multiple groups of new watermark information, in back-to-front order, into the attribute bits of the second layer of the document, with the groups separated by separators; the number of groups of new watermark information embedded in the second layer is twice the number of groups embedded in the first layer.
2. The document multiple digital watermark embedding method according to claim 1, characterized in that dividing the characters of the document into two layers specifically comprises the following steps:
Obtain the Unicode code of the character used as the key, convert it into a binary sequence, and take the last two bits of the binary sequence as the key sequence;
Obtain the Unicode codes of all characters in the document, and convert the Unicode code of each character in the document into a binary sequence;
Perform an XOR operation between the key sequence and the binary sequence of each character in the document; if the result is 00 or 10, assign the character to the first layer of the document; if the result is 01 or 11, assign it to the second layer of the document.
3. The document multiple digital watermark embedding method according to claim 1, characterized in that the separator is the binary sequence of any rarely used invisible character in the Unicode encoding.
4. The document multiple digital watermark embedding method according to claim 1, characterized in that embedding the multiple groups of new watermark information into the different attribute bits of the document specifically comprises the following steps:
For the first layer, modify the NoProofing attribute value of each character in the first layer: if the current bit of new watermark information to be embedded is 1, set the NoProofing attribute value to True; otherwise, keep the original value False unchanged;
For the second layer, modify the LanguageIDOther attribute value of each character in the second layer: if the current bits of new watermark information to be embedded are 00, keep the original value unchanged; if they are 01, set the LanguageIDOther attribute value to wdBasque; if they are 10, set it to wdVenda; if they are 11, set it to wdEstonian.
5. A document multiple digital watermark extraction method, characterized by comprising the following steps:
Step 1a: detect whether watermark information is embedded in the document to be processed; if so, divide all characters into two layers according to the rule and proceed to step 2a; otherwise, end processing;
Step 2a: extract watermark information from the attribute bits of the first layer of the document and from the attribute bits of the second layer of the document, and obtain, according to the separators, the actual number of extracted groups of watermark information for each layer;
Step 3a: from the total number of characters in the first layer of the document and the bit length of the watermark information extracted from the first and second layers of the document, obtain the expected number of extracted groups of watermark information embedded in the first layer and in the second layer, respectively;
Step 4a: if the extracted groups of watermark information are identical and all match one database record, and the actual number of extracted groups in each layer equals the expected number of extracted groups, then all watermark information is intact and the document has not been attacked; query the database and output the original watermark information; otherwise, perform error correction.
6. The document multiple digital watermark extraction method according to claim 5, characterized in that, in step 4a, in addition to the extracted groups of watermark information being identical and all matching one database record and the actual number of extracted groups in each layer equaling the expected number of extracted groups, all watermark information is determined to be intact when the number of groups of watermark information extracted from the attribute bits of the second layer of the document is also twice the number of groups extracted from the attribute bits of the first layer of the document.
7. The document multiple digital watermark extraction method according to claim 5, characterized in that, in step 1a, watermark information is detected as follows:
The NoProofing attribute value and the LanguageIDOther attribute value of each character in a document are preset to default values by the system; check the character attributes of each character in the document from which watermark information is to be extracted one by one; if there is a character whose NoProofing attribute value and LanguageIDOther attribute value differ from the default values, the document is a document with embedded watermark information; otherwise, the document has no embedded watermark information.
8. The document multiple digital watermark extraction method according to claim 5, characterized in that the error correction specifically comprises the following steps:
Step 3a.1: for the multiple groups of watermark information extracted using the separators, if the groups are not all identical and at least one group of watermark information matches a database record, return the extracted watermark information and report the damage to the document; otherwise, go to step 3a.2;
Step 3a.2: if none of the groups of watermark information matches any database record, report that the document is seriously damaged and that extraction of the watermark information has failed.
9. A document multiple digital watermark embedding device, characterized by comprising an acquisition module (1), a generation module (2), a storage module (3), a first embedding module (4), and a second embedding module (5);
The acquisition module (1) is configured to obtain the original watermark information, the key, and the document to be processed, as input by the user;
The generation module (2) is configured to use a digest algorithm to compute the digest of the original watermark information, generate new watermark information, and obtain the bit length of the new watermark information from the new watermark information;
The storage module (3) is configured to store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be queried when the watermark is extracted;
The first embedding module (4) is configured to divide the characters of the document into two layers; obtain, from the total number of characters in the first layer of the document and the bit length of the new watermark information, the number of groups of new watermark information to be embedded in the first layer; and embed the multiple groups of new watermark information, in front-to-back order, into the attribute bits of the first layer of the document, with the groups separated by separators;
The second embedding module (5) is configured to embed multiple groups of new watermark information, in back-to-front order, into the attribute bits of the second layer of the document, with the groups separated by separators; the number of groups of new watermark information embedded in the second layer is twice the number of groups embedded in the first layer.
10. A document multiple digital watermark extraction device, characterized by comprising a detection module (6), an extraction module (7), a calculation module (8), and a matching module (9);
The detection module (6) is configured to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and processing proceeds to the extraction module (7); otherwise, processing ends;
The extraction module (7) is configured to extract watermark information from the attribute bits of the first layer of the document and from the attribute bits of the second layer of the document, and to obtain, according to the separators, the actual number of extracted groups of watermark information for each layer;
The calculation module (8) is configured to obtain, from the total number of characters in the first layer of the document and the bit length of the watermark information extracted from the first and second layers of the document, the expected number of extracted groups of watermark information embedded in the first layer and in the second layer, respectively;
The matching module (9) is configured to determine, when the extracted groups of watermark information are identical and all match one database record and the actual number of extracted groups in each layer equals the expected number of extracted groups, that all watermark information is intact and the document has not been attacked, and to query the database and output the original watermark information; otherwise, error correction is performed.
CN201410035906.7A 2014-01-24 2014-01-24 A kind of document multiple digital watermarking embedding, extracting method and device Active CN103761459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410035906.7A CN103761459B (en) 2014-01-24 2014-01-24 A kind of document multiple digital watermarking embedding, extracting method and device

Publications (2)

Publication Number Publication Date
CN103761459A true CN103761459A (en) 2014-04-30
CN103761459B CN103761459B (en) 2016-08-17

Family

ID=50528695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410035906.7A Active CN103761459B (en) 2014-01-24 2014-01-24 A kind of document multiple digital watermarking embedding, extracting method and device

Country Status (1)

Country Link
CN (1) CN103761459B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104376236A (en) * 2014-12-02 2015-02-25 上海出版印刷高等专科学校 Scheme self-adaptive digital watermark embedding and extracting method based on camouflage technology
CN104504342A (en) * 2014-12-04 2015-04-08 中国科学院信息工程研究所 Method for hiding information by using invisible characters based on Unicode codes
CN104715168A (en) * 2015-02-13 2015-06-17 陈佳阳 File security control and trace method and system based on digital fingerprints
CN106803047A (en) * 2017-01-13 2017-06-06 中国电建集团成都勘测设计研究院有限公司 Database water mark labeling method
CN109800547A (en) * 2019-01-09 2019-05-24 杭州基尔区块链科技有限公司 A method of the information for WORD document protection and distribution tracking is quickly embedded in and extracts
CN110414194A (en) * 2019-07-02 2019-11-05 南京理工大学 A kind of insertion and extracting method of Text Watermarking
CN110874456A (en) * 2018-08-31 2020-03-10 浙江大学 Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11564003B1 (en) 2021-09-20 2023-01-24 The Nielsen Company (Us), Llc Systems, apparatus, and methods to improve watermark detection in acoustic environments

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050280876A1 (en) * 2000-12-29 2005-12-22 Xin Wang Multi-stage watermarking process and system
US8189861B1 (en) * 2011-04-05 2012-05-29 Google Inc. Watermarking digital documents
CN102708535A (en) * 2012-05-11 2012-10-03 宁波大学 Zero-watermark insertion and extraction methods with multiple keys for digital images
CN102890760A (en) * 2012-10-30 2013-01-23 南京信息工程大学 Textual zero-knowledge watermark detection method based on asymmetric encryption
CN103093127A (en) * 2013-01-21 2013-05-08 深圳大学 Method and system of dynamic copyright protection based on sudoku and multiple digital watermarks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
袁树雄 等: "《中文文本多重水印算法应用研究》", 《计算机工程与应用》, 1 May 2009 (2009-05-01), pages 96 - 99 *
袁树雄 等: "《英文文本多重数字水印算法设计与实现》", 《计算机工程》, 5 August 2006 (2006-08-05) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104376236A (en) * 2014-12-02 2015-02-25 上海出版印刷高等专科学校 Scheme self-adaptive digital watermark embedding and extracting method based on camouflage technology
CN104376236B (en) * 2014-12-02 2017-08-29 上海理工大学 Scheme self-adaptive digital watermark embedding grammar and extracting method based on camouflage science
CN104504342A (en) * 2014-12-04 2015-04-08 中国科学院信息工程研究所 Method for hiding information by using invisible characters based on Unicode codes
CN104715168A (en) * 2015-02-13 2015-06-17 陈佳阳 File security control and trace method and system based on digital fingerprints
CN104715168B (en) * 2015-02-13 2018-10-09 陈佳阳 A kind of file security management and control based on digital finger-print and the method and system traced to the source
CN106803047A (en) * 2017-01-13 2017-06-06 中国电建集团成都勘测设计研究院有限公司 Database water mark labeling method
CN110874456A (en) * 2018-08-31 2020-03-10 浙江大学 Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method
CN110874456B (en) * 2018-08-31 2022-04-26 浙江大学 Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method
CN109800547A (en) * 2019-01-09 2019-05-24 杭州基尔区块链科技有限公司 A method of the information for WORD document protection and distribution tracking is quickly embedded in and extracts
CN109800547B (en) * 2019-01-09 2023-04-07 杭州基尔区块链科技有限公司 Method for quickly embedding and extracting information for WORD document protection and distribution tracking
CN110414194A (en) * 2019-07-02 2019-11-05 南京理工大学 A kind of insertion and extracting method of Text Watermarking
CN110414194B (en) * 2019-07-02 2023-08-04 南京理工大学 Text watermark embedding and extracting method

Also Published As

Publication number Publication date
CN103761459B (en) 2016-08-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant