CN103761459B - Method and device for embedding and extracting multiple digital watermarks in a document - Google Patents


Info

Publication number
CN103761459B
Authority
CN
China
Prior art keywords
document
watermark information
layer
new
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410035906.7A
Other languages
Chinese (zh)
Other versions
CN103761459A (en
Inventor
陈小军
时金桥
徐睿
蒲以国
赵亮
张锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201410035906.7A
Publication of CN103761459A
Application granted
Publication of CN103761459B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10: Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16: Program or content traceability, e.g. by watermarking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/126: Character encoding

Abstract

The present invention relates to a method and device for embedding and extracting multiple digital watermarks in a document. The embedding method comprises the following steps: obtaining the original watermark information, the key, and the document to be processed, as input by the user; computing a digest of the original watermark information to generate new watermark information; storing the original watermark information and the new watermark information together in a database as one database record; dividing the characters of the document into two layers, determining the number of groups of new watermark information to embed in the first layer from the total number of characters in the first layer and the bit length of the new watermark information, and embedding the groups of new watermark information into the attribute bits of the first layer in front-to-back order; and embedding groups of new watermark information into the attribute bits of the second layer in back-to-front order. The invention is based on the character attributes of Word-format documents: it uses a key to improve security, repeated embedding to strengthen robustness, and multiple embedding to increase watermark capacity.

Description

Method and device for embedding and extracting multiple digital watermarks in a document
Technical field
The present invention relates to the field of digital watermarking, and in particular to a method and device for embedding and extracting multiple digital watermarks in a document.
Background
In recent years, with the rapid development of multimedia and the Internet, protecting the copyright of digital works has become a hot research topic in academia. Digital watermarking, an important research direction within information hiding, is of considerable value for the copyright protection of text, video, audio, and other multimedia. It embeds copyright information such as serial numbers, text, or logos into multimedia data, serving purposes such as copyright protection, covert communication, authentication of data files, and product marking.
Existing text watermarking methods with high usability fall into two main classes: format-based and natural-language-based. Format-based text watermarking is the most common class to date. It has evolved from the early line shifting, word shifting, and feature coding to later methods that alter font size, color, and similar properties. Research on this class is the most active, but these methods suffer from weak security and low watermark capacity. Natural-language text watermarking was proposed as early as 2002 by Mikhail J. Atallah, Victor Raskin, and colleagues at Purdue University in the United States. It adds watermark information mainly by changing sentence structure, substituting synonyms, and similar means. Such watermarking changes the content of the text but not its meaning or format, so the watermark is hard to notice and not easily destroyed. For standardized documents, however, whose format requirements are relatively strict, these methods may alter the semantics and are therefore unsuitable. In addition, computer processing of natural language is not yet mature, which has become the bottleneck of natural-language-based text watermarking.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for embedding and extracting multiple digital watermarks in a document, based on the character attributes of Word-format documents, that uses a key to improve security, repeated embedding to strengthen robustness, and multiple embedding to increase watermark capacity.
The technical solution of the present invention is as follows: a method for embedding multiple digital watermarks in a document, comprising the following steps:
Step 1: obtain the original watermark information, the key, and the document to be processed, as input by the user;
Step 2: use a digest algorithm to compute the digest of the original watermark information, generating new watermark information, and obtain the bit length of the new watermark information;
Step 3: store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be looked up when the watermark is extracted;
Step 4: divide the characters of the document into two layers; from the total number of characters in the first layer and the bit length of the new watermark information, obtain the number of groups of new watermark information to be embedded in the first layer; embed the groups of new watermark information into the attribute bits of the first layer in front-to-back order, with a separator between groups;
Step 5: embed groups of new watermark information into the attribute bits of the second layer in back-to-front order, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.
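Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes MD5 as the digest algorithm (one of the options named later in the description), and the function names and the in-memory dictionary standing in for the database are hypothetical.

```python
import hashlib

# Hypothetical in-memory stand-in for the database of step 3:
# maps the new watermark (digest bits) back to the original watermark.
database: dict[str, str] = {}


def digest_bits(original_watermark: str, algorithm: str = "md5") -> str:
    """Return the digest of the original watermark as a string of '0'/'1'."""
    digest = hashlib.new(algorithm, original_watermark.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def register_watermark(original_watermark: str) -> str:
    """Generate the new watermark and store it with the original (steps 2-3)."""
    new_watermark = digest_bits(original_watermark)
    database[new_watermark] = original_watermark
    return new_watermark


new_watermark = register_watermark("Copyright 2014 IIE CAS")
bit_length = len(new_watermark)  # an MD5 digest is 128 bits long
```

Keying the record by the digest bits means extraction only needs a dictionary lookup to recover the original text.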
The beneficial effects of the invention are as follows: based on the character attributes of Word-format documents, the invention uses a key to improve security, repeated embedding to strengthen robustness, and multiple embedding to increase watermark capacity.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, dividing the characters of the document into two layers specifically comprises the following steps:
Obtain the Unicode encoding of the character used as the key, convert it to a binary sequence, and take the last two bits of that sequence as the key sequence;
Obtain the Unicode encoding of every character in the document and convert each to a binary sequence;
XOR the binary sequence of each character in the document with the key sequence; if the result is 00 or 10, assign the character to the first layer; if the result is 01 or 11, assign it to the second layer.
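The layering rule can be sketched in a few lines. The sketch assumes that only the last two bits of each character's code point take part in the XOR, as the detailed description states later; the function names are hypothetical.

```python
def key_sequence(key_char: str) -> int:
    """Last two bits of the key character's Unicode code point."""
    return ord(key_char) & 0b11


def layer_of(ch: str, key_bits: int) -> int:
    """Return 1 or 2 according to the XOR of the character's low two bits
    with the key sequence (00/10 -> layer 1, 01/11 -> layer 2)."""
    result = (ord(ch) & 0b11) ^ key_bits
    return 1 if result in (0b00, 0b10) else 2


def split_layers(text: str, key_char: str) -> tuple[list[int], list[int]]:
    """Return the character indices assigned to layer 1 and to layer 2."""
    key_bits = key_sequence(key_char)
    layer1: list[int] = []
    layer2: list[int] = []
    for index, ch in enumerate(text):
        (layer1 if layer_of(ch, key_bits) == 1 else layer2).append(index)
    return layer1, layer2


layer1, layer2 = split_layers("abcd", "a")
```

Returning indices rather than characters keeps the original document order available for the front-to-back and back-to-front traversals.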
Further, the separator is the binary sequence of any rarely used non-visible character in the Unicode encoding.
Further, embedding the groups of new watermark information into the different attribute bits of the document specifically comprises the following steps:
For the first layer, modify the NoProofing attribute value of each character in the first layer: if the new watermark bit currently to be embedded is 1, set NoProofing to True; otherwise, keep the original value False;
For the second layer, modify the LanguageIDOther attribute value of each character in the second layer: if the new watermark bits currently to be embedded are 00, keep the original value; if they are 01, set LanguageIDOther to wdBasque; if they are 10, set it to wdVenda; if they are 11, set it to wdEstonian.
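The bit-to-attribute mapping above can be sketched without touching Word at all. In this illustration the attribute values are plain Python values and symbolic names; the function names are hypothetical, `None` is used here to mean "keep the original value", and no Word automation call is made.

```python
# Symbolic names only: this sketch plans the attribute values and does not
# drive Word. None means "leave LanguageIDOther at its original value".
LAYER2_VALUES = {
    "00": None,
    "01": "wdBasque",
    "10": "wdVenda",
    "11": "wdEstonian",
}


def encode_layer1(bits: str) -> list[bool]:
    """One NoProofing value per first-layer character: True for bit 1."""
    return [bit == "1" for bit in bits]


def encode_layer2(bits: str) -> list:
    """One LanguageIDOther value per second-layer character (two bits each)."""
    if len(bits) % 2 != 0:
        raise ValueError("second-layer payload must have an even bit count")
    return [LAYER2_VALUES[bits[i:i + 2]] for i in range(0, len(bits), 2)]


layer1_plan = encode_layer1("101")
layer2_plan = encode_layer2("011100")
```

Because the second layer has four states per character, it carries two bits where the first layer carries one.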
Further, a method for extracting multiple digital watermarks from a document comprises the following steps:
Step 1a: detect whether watermark information is embedded in the document to be processed; if so, divide all characters into two layers according to the layering rule and go to step 2a; otherwise, end processing;
Step 2a: extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and use the separators to obtain the actual number of groups of watermark information extracted from each layer;
Step 3a: from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, obtain the predetermined number of groups of watermark information embedded in the first layer and in the second layer, respectively;
Step 4a: if the extracted groups of watermark information are all consistent and all match one database record, and the actual number of groups extracted from each layer equals the predetermined number, then all watermark information is intact and the document has not been attacked; query the database and output the original watermark information; otherwise, perform error correction.
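The check in step 4a can be sketched as a single predicate over the extracted groups. The function and parameter names are hypothetical, the database is modelled as a dictionary from watermark bits to original watermark text, and the predetermined group counts are passed in rather than derived.

```python
def verify_groups(layer1_groups, layer2_groups, expected1, expected2, database):
    """Return the original watermark if step 4a's conditions hold, else None.

    layer1_groups / layer2_groups: bit strings extracted from each layer.
    expected1 / expected2: predetermined group counts from step 3a.
    database: dict mapping watermark bit strings to original watermarks.
    """
    groups = list(layer1_groups) + list(layer2_groups)
    if not groups:
        return None
    consistent = len(set(groups)) == 1
    counts_ok = (len(layer1_groups) == expected1
                 and len(layer2_groups) == expected2)
    if consistent and counts_ok and groups[0] in database:
        return database[groups[0]]
    return None  # the caller falls through to error correction


db = {"1010": "owner"}
intact = verify_groups(["1010"] * 2, ["1010"] * 4, 2, 4, db)
```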
Further, in step 4a, in addition to the conditions that the extracted groups of watermark information are all consistent and all match one database record and that the actual number of groups extracted from each layer equals the predetermined number, all watermark information is judged intact only when the number of groups extracted from the attribute bits of the second layer is twice the number extracted from the attribute bits of the first layer.
Further, the NoProofing and LanguageIDOther attribute values of every character in a document are preset by the system to default values; the attributes of each character in the document from which watermark information is to be extracted are checked one by one; if any character has a NoProofing or LanguageIDOther attribute value that differs from its default, the document contains embedded watermark information; otherwise, it does not.
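The detection rule can be sketched as a scan over per-character attributes. The defaults used here (False and 1033, wdEnglishUS) are the ones the detailed description gives later; the `CharAttributes` record and function name are hypothetical stand-ins for whatever the Word object model returns, and a character counts as modified if either attribute is non-default.

```python
from dataclasses import dataclass

DEFAULT_NO_PROOFING = False
DEFAULT_LANGUAGE_ID_OTHER = 1033  # wdEnglishUS, per the description


@dataclass
class CharAttributes:
    """Hypothetical record of the two attributes read from one character."""
    no_proofing: bool = DEFAULT_NO_PROOFING
    language_id_other: int = DEFAULT_LANGUAGE_ID_OTHER


def has_watermark(chars) -> bool:
    """True if any character carries a non-default value in either attribute."""
    return any(
        c.no_proofing != DEFAULT_NO_PROOFING
        or c.language_id_other != DEFAULT_LANGUAGE_ID_OTHER
        for c in chars
    )


clean_document = [CharAttributes() for _ in range(3)]
marked_document = [CharAttributes(), CharAttributes(no_proofing=True)]
```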
Further, the error correction specifically comprises the following steps:
Step 3a.1: for the groups of watermark information extracted by separator, if the groups are not fully consistent but at least one group matches a database record, return the extracted watermark information and report the damage to the document; otherwise, go to step 3a.2;
Step 3a.2: if none of the groups matches any database record, report that the document is severely damaged and that watermark extraction has failed.
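The two error-correction branches can be sketched as one function that returns the recovered watermark together with a status message. The names and message wording are hypothetical, and the database is again modelled as a dictionary.

```python
def correct_errors(groups, database):
    """Return (original_watermark_or_None, status message).

    groups: bit strings extracted by separator, possibly damaged.
    database: dict mapping watermark bit strings to original watermarks.
    """
    matched = [g for g in groups if g in database]
    if matched:
        # Step 3a.1: at least one group survives; report how many did not.
        damaged = len(groups) - len(matched)
        status = f"{damaged} of {len(groups)} groups damaged"
        return database[matched[0]], status
    # Step 3a.2: no group matches any record.
    return None, "document severely damaged; watermark extraction failed"


db = {"1010": "owner"}
recovered, status = correct_errors(["1010", "1110", "1010"], db)
```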
Further, a device for embedding multiple digital watermarks in a document comprises an acquisition module, a generation module, a storage module, a first embedding module, and a second embedding module;
The acquisition module is configured to obtain the original watermark information, the key, and the document to be processed, as input by the user;
The generation module is configured to use a digest algorithm to compute the digest of the original watermark information, generating new watermark information, and to obtain the bit length of the new watermark information;
The storage module is configured to store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be looked up when the watermark is extracted;
The first embedding module is configured to divide the characters of the document into two layers; to obtain, from the total number of characters in the first layer and the bit length of the new watermark information, the number of groups of new watermark information to be embedded in the first layer; and to embed the groups of new watermark information into the attribute bits of the first layer in front-to-back order, with a separator between groups;
The second embedding module is configured to embed groups of new watermark information into the attribute bits of the second layer in back-to-front order, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.
Further, a device for extracting multiple digital watermarks from a document comprises a detection module, an extraction module, a computing module, and a matching module;
The detection module is configured to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the layering rule and processing passes to the extraction module; otherwise, processing ends;
The extraction module is configured to extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and to use the separators to obtain the actual number of groups of watermark information extracted from each layer;
The computing module is configured to obtain, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, the predetermined number of groups of watermark information embedded in the first layer and in the second layer, respectively;
The matching module is configured so that, if the extracted groups of watermark information are all consistent and all match one database record, and the actual number of groups extracted from each layer equals the predetermined number, all watermark information is judged intact and the document is judged not to have been attacked, and the original watermark information is output after the database is queried; otherwise, error correction is performed.
Brief description of the drawings
Fig. 1 is a flowchart of the document multiple digital watermark embedding method of the present invention;
Fig. 2 is a flowchart of the document multiple digital watermark extraction method of the present invention;
Fig. 3 is a structural diagram of the document multiple digital watermark embedding device of the present invention;
Fig. 4 is a structural diagram of the document multiple digital watermark extraction device of the present invention.
In the drawings, the components represented by the reference numerals are as follows:
1, acquisition module; 2, generation module; 3, storage module; 4, first embedding module; 5, second embedding module; 6, detection module; 7, extraction module; 8, computing module; 9, matching module.
Detailed description
The principles and features of the present invention are described below with reference to the drawings. The examples serve only to explain the invention and are not intended to limit its scope.
Fig. 1 is a flowchart of the document multiple digital watermark embedding method of the present invention; Fig. 2 is a flowchart of the document multiple digital watermark extraction method; Fig. 3 is a structural diagram of the document multiple digital watermark embedding device; Fig. 4 is a structural diagram of the document multiple digital watermark extraction device.
Embodiment 1
A method for embedding multiple digital watermarks in a document comprises the following steps:
Step 1: obtain the original watermark information, the key, and the document to be processed, as input by the user;
Step 2: use a digest algorithm to compute the digest of the original watermark information, generating new watermark information, and obtain the bit length of the new watermark information;
Step 3: store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be looked up when the watermark is extracted;
Step 4: divide the characters of the document into two layers; from the total number of characters in the first layer and the bit length of the new watermark information, obtain the number of groups of new watermark information to be embedded in the first layer; embed the groups of new watermark information into the attribute bits of the first layer in front-to-back order, with a separator between groups;
Step 5: embed groups of new watermark information into the attribute bits of the second layer in back-to-front order, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.
Dividing the characters of the document into two layers specifically comprises the following steps:
Obtain the Unicode encoding of the character used as the key, convert it to a binary sequence, and take the last two bits of that sequence as the key sequence;
Obtain the Unicode encoding of every character in the document and convert each to a binary sequence;
XOR the binary sequence of each character in the document with the key sequence; if the result is 00 or 10, assign the character to the first layer; if the result is 01 or 11, assign it to the second layer.
The separator is the binary sequence of any rarely used non-visible character in the Unicode encoding.
Embedding the groups of new watermark information into the different attribute bits of the document specifically comprises the following steps:
For the first layer, modify the NoProofing attribute value of each character in the first layer: if the new watermark bit currently to be embedded is 1, set NoProofing to True; otherwise, keep the original value False;
For the second layer, modify the LanguageIDOther attribute value of each character in the second layer: if the new watermark bits currently to be embedded are 00, keep the original value; if they are 01, set LanguageIDOther to wdBasque; if they are 10, set it to wdVenda; if they are 11, set it to wdEstonian.
A method for extracting multiple digital watermarks from a document comprises the following steps:
Step 1a: detect whether watermark information is embedded in the document to be processed; if so, divide all characters into two layers according to the layering rule and go to step 2a; otherwise, end processing;
Step 2a: extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and use the separators to obtain the actual number of groups of watermark information extracted from each layer;
Step 3a: from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, obtain the predetermined number of groups of watermark information embedded in the first layer and in the second layer, respectively;
Step 4a: if the extracted groups of watermark information are all consistent and all match one database record, and the actual number of groups extracted from each layer equals the predetermined number, then all watermark information is intact and the document has not been attacked; query the database and output the original watermark information; otherwise, perform error correction.
In step 4a, in addition to the conditions that the extracted groups of watermark information are all consistent and all match one database record and that the actual number of groups extracted from each layer equals the predetermined number, all watermark information is judged intact only when the number of groups extracted from the attribute bits of the second layer is twice the number extracted from the attribute bits of the first layer.
In step 1a, watermark information is detected as follows:
The NoProofing and LanguageIDOther attribute values of every character in a document are preset by the system to default values; the attributes of each character in the document from which watermark information is to be extracted are checked one by one; if any character has a NoProofing or LanguageIDOther attribute value that differs from its default, the document contains embedded watermark information; otherwise, it does not.
The error correction specifically comprises the following steps:
Step 3a.1: for the groups of watermark information extracted by separator, if the groups are not fully consistent but at least one group matches a database record, return the extracted watermark information and report the damage to the document; otherwise, go to step 3a.2;
Step 3a.2: if none of the groups matches any database record, report that the document is severely damaged and that watermark extraction has failed.
A device for embedding multiple digital watermarks in a document comprises an acquisition module 1, a generation module 2, a storage module 3, a first embedding module 4, and a second embedding module 5;
The acquisition module 1 is configured to obtain the original watermark information, the key, and the document to be processed, as input by the user;
The generation module 2 is configured to use a digest algorithm to compute the digest of the original watermark information, generating new watermark information, and to obtain the bit length of the new watermark information;
The storage module 3 is configured to store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be looked up when the watermark is extracted;
The first embedding module 4 is configured to divide the characters of the document into two layers; to obtain, from the total number of characters in the first layer and the bit length of the new watermark information, the number of groups of new watermark information to be embedded in the first layer; and to embed the groups of new watermark information into the attribute bits of the first layer in front-to-back order, with a separator between groups;
The second embedding module 5 is configured to embed groups of new watermark information into the attribute bits of the second layer in back-to-front order, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.
A device for extracting multiple digital watermarks from a document comprises a detection module 6, an extraction module 7, a computing module 8, and a matching module 9;
The detection module 6 is configured to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the layering rule and processing passes to the extraction module 7; otherwise, processing ends;
The extraction module 7 is configured to extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and to use the separators to obtain the actual number of groups of watermark information extracted from each layer;
The computing module 8 is configured to obtain, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, the predetermined number of groups of watermark information embedded in the first layer and in the second layer, respectively;
The matching module 9 is configured so that, if the extracted groups of watermark information are all consistent and all match one database record, and the actual number of groups extracted from each layer equals the predetermined number, all watermark information is judged intact and the document is judged not to have been attacked, and the original watermark information is output after the database is queried; otherwise, error correction is performed.
In a specific implementation, the embedding method of the present invention comprises the following six steps:
1) Input the original watermark information, the key, and the Word document to be processed;
2) Compute the digest of the original watermark with a message digest algorithm such as MD5 or SHA1, and use this digest as the watermark data from this point on;
3) Store the generated watermark information and the original watermark information in the database as one record, so that the original information can be looked up at extraction time;
4) Divide all characters of the Word document into two layers; watermark information is embedded into a different attribute bit for each layer;
5) If the total number of characters is N and the digest is M bits long, embed K = N/M groups of watermark, rounding the group count down. A separator is required between groups; for example, the non-visible Unicode character RLO, whose value is 0010000000101110, can be chosen as the separator between groups. For first-layer characters, embed K groups of watermark in front-to-back order;
6) For second-layer characters, proceed as in step 5, embedding 2*K groups of watermark in back-to-front order.
Steps 4), 5), and 6) above are the core of the method.
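The group count and traversal order in steps 5) and 6) can be sketched as follows. The sketch follows the formula K = N/M literally, taking N as the first-layer character count (an assumption based on the summary), and does not model how separator bits consume capacity, which the text leaves open; the function names are hypothetical.

```python
# RLO (U+202E) as a 16-bit binary string, the separator suggested in step 5.
RLO_SEPARATOR = format(0x202E, "016b")


def group_count(layer1_chars: int, watermark_bits: int) -> int:
    """K = N / M, rounded down (separator overhead is not modelled here)."""
    return layer1_chars // watermark_bits


def embedding_order(layer1_indices, layer2_indices):
    """Layer 1 is filled front to back, layer 2 back to front."""
    return list(layer1_indices), list(reversed(layer2_indices))


k = group_count(1000, 128)           # groups for the first layer
second_layer_groups = 2 * k          # step 6 embeds 2*K groups
order1, order2 = embedding_order([0, 2, 5], [1, 3, 4])
```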
Step 4): the text is layered as follows. Obtain the Unicode encoding of the character used as the key, convert it to a binary sequence, and take the last two bits as the key. During embedding, obtain the Unicode encoding of each text character in turn, likewise convert it to a binary sequence, take the last two bits, and XOR them with the key. If
● the result is 00 or 10, the character is assigned to the first layer and its NoProofing bit is modified;
● the result is 01 or 11, the character is assigned to the second layer and its LanguageIDOther bit is modified.
Steps 5) and 6): the method uses Microsoft's official OLE interface technology to manipulate character attributes. The basic principle of embedding is to exploit two attributes of individual characters in a Word document: NoProofing and LanguageIDOther. Their roles are as follows. For a Selection object (for example, a single character), if the NoProofing attribute is True, the spelling and grammar checkers ignore the specified text. The LanguageIDOther attribute of a character can be set to the enumeration value of a language with few users; Microsoft recommends using this attribute to set or return the language of Western-language text in documents created in right-to-left language versions of Microsoft Word. LanguageIDOther has 64 enumeration values; after research and screening, the method selects the enumeration values of three languages with few users (wdBasque, wdVenda, wdEstonian) as the modified values, so that each character can carry two watermark bits and the second layer can carry twice as much information as the first, increasing watermark capacity. Both character attributes can only be found, added, and modified programmatically, and ordinary operations in Word cannot remove the watermark, so it has strong concealment and resistance to attack. The watermark is embedded repeatedly to improve robustness: even after attacks such as deletion or modification, the original watermark information can be recovered as long as one group of the watermark remains intact.
The watermark extraction method is the inverse of the embedding method:
1) Divide all characters of the Word document under examination into two layers according to the layering rule;
2) For each layer, read the data character by character in the same manner as embedding, obtaining n groups of watermark information;
3) If the n groups of watermark are consistent and match one database record, all watermark information is fully intact and the document has not been attacked; query the database and output the original watermark information. Otherwise, go to the error correction algorithm.
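The read-out in step 2) can be sketched as the inverse of the embedding map. This is an illustration under assumptions: attribute values arrive as plain Python values in traversal order (front to back for the first layer, back to front for the second), any language name other than the three chosen ones is read as 00, and groups are split on the RLO bit pattern with a plain string split, which ignores the possibility of the pattern occurring inside the payload. All names are hypothetical.

```python
RLO_SEPARATOR = format(0x202E, "016b")  # '0010000000101110'

BITS_BY_LANGUAGE = {"wdBasque": "01", "wdVenda": "10", "wdEstonian": "11"}


def read_layer1(no_proofing_values) -> str:
    """One bit per first-layer character: True reads as 1, False as 0."""
    return "".join("1" if value else "0" for value in no_proofing_values)


def read_layer2(language_names) -> str:
    """Two bits per second-layer character; unchanged values read as 00."""
    return "".join(BITS_BY_LANGUAGE.get(name, "00") for name in language_names)


def split_groups(bits: str, separator: str = RLO_SEPARATOR) -> list[str]:
    """Split the bit stream into watermark groups, dropping empty pieces."""
    return [group for group in bits.split(separator) if group]


stream = "1010" + RLO_SEPARATOR + "1010"
groups = split_groups(stream)
```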
The watermark error correction method is:
1) take the n groups of watermark information extracted using the separator; if the n groups are not fully consistent but at least one group matches a database record, the document has been damaged by an attack such as adding or deleting characters; return the extracted watermark information and report that the document is damaged. Otherwise, go to 2);
2) if none of the n groups of watermark information matches a database record, every group has been damaged to some degree; report that the document is severely damaged and that the watermark information cannot be extracted.
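The extraction and error correction decisions above can be sketched as a single function. This is an illustration under stated assumptions, not the patented implementation: the database is modeled as a plain dictionary from new (digest) watermark to original watermark, and the name `classify_extraction` and the three status strings are hypothetical.

```python
def classify_extraction(groups, database):
    """Decide the outcome for the n extracted watermark groups.

    groups:   list of extracted watermark bit strings
    database: dict mapping new watermark -> original watermark (assumed model)
    """
    matched = [g for g in groups if g in database]
    # All groups identical and matching a record: document was not attacked.
    if groups and len(set(groups)) == 1 and matched:
        return "intact", database[groups[0]]
    # Groups disagree, but at least one still matches a record.
    if matched:
        return "damaged", database[matched[0]]
    # No group matches any record.
    return "unrecoverable", None
```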
The watermark detection method is:
In Word, the default values of NoProofing and LanguageIDOther for each character are False and 1033 (wdEnglishUS) respectively. The character attributes of the input document are checked one by one; if any character has a non-default value for these attributes, the document is a document with an embedded watermark.
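A minimal sketch of this detection rule follows. It assumes the two attributes of every character have already been read into `(no_proofing, language_id_other)` tuples, and it treats a document as watermarked when either attribute differs from its default; both the input shape and that reading of the rule are assumptions.

```python
DEFAULT_NO_PROOFING = False
DEFAULT_LANGUAGE_ID_OTHER = 1033  # wdEnglishUS, the default given in the text


def is_watermarked(chars):
    """Return True if any character carries a non-default attribute value.

    chars: iterable of (no_proofing, language_id_other) tuples (assumed shape)
    """
    return any(
        no_proofing != DEFAULT_NO_PROOFING
        or lang_id != DEFAULT_LANGUAGE_ID_OTHER
        for no_proofing, lang_id in chars
    )
```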
Beneficial effects
The character attributes that carry the watermark information are invisible attributes, so the watermark cannot be perceived visually after embedding and is well concealed.
Statistically, each layer holds on average 50% of the characters. If 100 characters are divided into two layers, each layer holds 50 characters on average; the first layer then carries 50 × 1 = 50 bits and the second layer 50 × 2 = 100 bits, so the watermark capacity is 150 bits per 100 characters, that is, 150%. Experiments, with results shown in Table 1, confirm that the actual watermark capacity is close to 150%. This is a large improvement over other text watermarking algorithms, as shown in Table 2.
When the watermark is embedded, the original watermark information is hashed with a message digest algorithm, so even if the embedded watermark information is obtained, the original watermark information cannot be recovered from it, which improves the security of the watermark. In addition, a key is used for layering: if the wrong key is supplied during extraction, the characters are layered incorrectly and the extracted attributes are misaligned, so no watermark information is obtained, which further improves the security of the watermark.
If the document is attacked after embedding, for example by adding or deleting characters, the extracted watermark information is evaluated using the separator, as shown in Fig. 2, where the underlined part is the separator and the rectangular box is the watermark information. If complete watermark information follows a separator, it is extracted, which ensures the robustness of the watermarking method to a certain extent.
Table 1. Watermark capacity statistics
Table 2. Capacity comparison of text watermarking algorithms
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A document multiple digital watermark embedding method, characterized by comprising the following steps:
Step 1: obtain the original watermark information entered by the user and the document to be processed;
Step 2: use a digest algorithm to compute the digest of the original watermark information, generating new watermark information, and obtain the bit length of the new watermark information from it;
Step 3: store the original watermark information and the new watermark information together in a database as one database record, used to look up the original watermark information when the watermark is extracted;
Step 4: divide the characters in the document into two layers; from the total number of characters in the first layer of the document and the bit length of the new watermark information, obtain the number of groups of new watermark information to embed in the first layer; embed the groups of new watermark information into the attribute bits of the first layer in front-to-back order, with the groups separated by a separator;
Step 5: embed the groups of new watermark information into the attribute bits of the second layer of the document in back-to-front order, with the groups separated by a separator, the number of groups embedded in the second layer being twice the number of groups embedded in the first layer.
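As an illustration of steps 4 and 5, the sketch below plans how repeated watermark groups could be laid out over the characters of one layer. It rests on several assumptions that the claim does not fix: each group is followed by one separator, the group count is the layer's bit capacity divided by the combined length, and "back to front" means the stream is assigned starting from the last character of the layer. The name `plan_layer` is hypothetical.

```python
def plan_layer(n_chars, bits_per_char, watermark, separator, back_to_front=False):
    """Return (group_count, slots), where slots[i] holds the bits
    assigned to the i-th character of the layer (None if unused)."""
    unit = watermark + separator            # one group plus its separator
    capacity = n_chars * bits_per_char      # total bits the layer can hold
    groups = capacity // len(unit)          # whole groups that fit
    stream = unit * groups
    chunks = [stream[i:i + bits_per_char]
              for i in range(0, len(stream), bits_per_char)]
    slots = [None] * n_chars
    order = range(n_chars - 1, -1, -1) if back_to_front else range(n_chars)
    for index, chunk in zip(order, chunks):
        slots[index] = chunk
    return groups, slots


# Two layers of 12 characters each: 1 bit per character in the first layer,
# 2 bits per character in the second layer.
g1, s1 = plan_layer(12, 1, "1011", "00")
g2, s2 = plan_layer(12, 2, "1011", "00", back_to_front=True)
```

With equally sized layers, the second layer holds twice as many groups as the first, consistent with step 5.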
2. The document multiple digital watermark embedding method according to claim 1, characterized in that dividing the characters in the document into two layers specifically comprises the following steps:
obtain the Unicode encoding of the character used as the key, convert it into a binary sequence, and take the last two bits of the binary sequence as the key sequence;
obtain the Unicode encodings of all characters in the document and convert the encoding of each character into a binary sequence;
XOR the binary sequence of each character in the document with the key sequence; if the result is 00 or 10, assign the character to the first layer of the document; if the result is 01 or 11, assign it to the second layer of the document.
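The layering rule can be sketched as below. The claim does not state which bits of each character's binary sequence take part in the XOR; this sketch assumes the last two bits, matching the two-bit key sequence. The function names are hypothetical.

```python
def layer_of(char, key_char):
    """Return 1 or 2: the layer the character falls into under the key."""
    key_bits = ord(key_char) & 0b11          # last two bits of the key's code
    result = (ord(char) & 0b11) ^ key_bits   # assumed: XOR on the last two bits
    return 1 if result in (0b00, 0b10) else 2


def split_layers(text, key_char):
    """Return the character indices belonging to layer 1 and layer 2."""
    layer1, layer2 = [], []
    for index, char in enumerate(text):
        (layer1 if layer_of(char, key_char) == 1 else layer2).append(index)
    return layer1, layer2
```

Under this reading, a key whose last bit differs from the correct key's swaps the two layers, so attributes read with the wrong key no longer line up with the embedded bits.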
3. The document multiple digital watermark embedding method according to claim 1, characterized in that the separator is the binary sequence of any rarely used non-visible character in the Unicode encoding.
4. The document multiple digital watermark embedding method according to claim 1, characterized in that embedding the groups of new watermark information into the different attribute bits of the document specifically comprises the following steps:
for the first layer, modify the NoProofing attribute value of the characters in the first layer in turn: if the current bit of new watermark information to be embedded is 1, set the NoProofing attribute value to True; otherwise, keep the original value False;
for the second layer, modify the LanguageIDOther attribute value of the characters in the second layer in turn: if the current bits of new watermark information to be embedded are 00, keep the original value; if they are 01, set the LanguageIDOther attribute value to wdBasque; if they are 10, set it to wdVenda; if they are 11, set it to wdEstonian.
5. A document multiple digital watermark extraction method, characterized by comprising the following steps:
Step 1a: detect whether watermark information is embedded in the document to be processed; if so, divide all characters into two layers according to the rule and go to step 2a; otherwise, end processing;
Step 2a: extract watermark information from the attribute bits of the first layer of the document and from the attribute bits of the second layer of the document, and use the separator to obtain the actual number of extracted groups of watermark information for each layer;
Step 3a: from the total number of characters in the first layer of the document and the bit length of the watermark information extracted from the first and second layers of the document, obtain the predetermined number of extraction groups for the watermark information embedded in the first layer and in the second layer respectively;
Step 4a: when the extracted groups of watermark information are consistent and all match one database record, and the actual number of extracted groups in each layer equals the predetermined number of extraction groups, all watermark information is intact and the document has not been attacked; output the original watermark information after querying the database; otherwise, perform error correction.
6. The document multiple digital watermark extraction method according to claim 5, characterized in that in step 4a, when the extracted groups of watermark information are consistent and all match one database record, and the actual number of extracted groups in each layer equals the predetermined number of extraction groups, all watermark information is determined to be intact only if, in addition, the number of groups of watermark information extracted from the attribute bits of the second layer of the document is twice the number of groups extracted from the attribute bits of the first layer.
7. The document multiple digital watermark extraction method according to claim 5, characterized in that the watermark information detection method in step 1a is:
the NoProofing attribute value and the LanguageIDOther attribute value of each character in a document are preset by the system to default values; check the character attributes of each character in the document from which watermark information is to be extracted one by one; if there exist characters whose NoProofing attribute value and LanguageIDOther attribute value differ from the default values, the document is a document with embedded watermark information; otherwise, the document is a document without embedded watermark information.
8. The document multiple digital watermark extraction method according to claim 5, characterized in that the error correction specifically comprises the following steps:
Step 3a.1: take the groups of watermark information extracted using the separator; if the groups are not fully consistent but at least one group matches a database record, return the extracted watermark information and report that the document is damaged; otherwise, go to step 3a.2;
Step 3a.2: if none of the groups of watermark information matches any database record, report that the document is severely damaged and that extraction of the watermark information has failed.
9. A document multiple digital watermark embedding device, characterized by comprising an acquisition module (1), a generation module (2), a storage module (3), a first embedding module (4) and a second embedding module (5);
the acquisition module (1) is configured to obtain the original watermark information entered by the user and the document to be processed;
the generation module (2) is configured to use a digest algorithm to compute the digest of the original watermark information, generate new watermark information, and obtain the bit length of the new watermark information from it;
the storage module (3) is configured to store the original watermark information and the new watermark information together in a database as one database record, used to look up the original watermark information when the watermark is extracted;
the first embedding module (4) is configured to divide the characters in the document into two layers, obtain the number of groups of new watermark information to embed in the first layer from the total number of characters in the first layer of the document and the bit length of the new watermark information, and embed the groups of new watermark information into the attribute bits of the first layer in front-to-back order, with the groups separated by a separator;
the second embedding module (5) is configured to embed the groups of new watermark information into the attribute bits of the second layer of the document in back-to-front order, with the groups separated by a separator, the number of groups embedded in the second layer being twice the number of groups embedded in the first layer.
10. A document multiple digital watermark extraction device, characterized by comprising a detection module (6), an extraction module (7), a calculation module (8) and a matching module (9);
the detection module (6) is configured to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and processing passes to the extraction module (7); otherwise, processing ends;
the extraction module (7) is configured to extract watermark information from the attribute bits of the first layer of the document and from the attribute bits of the second layer of the document, and to use the separator to obtain the actual number of extracted groups of watermark information for each layer;
the calculation module (8) is configured to obtain, from the total number of characters in the first layer of the document and the bit length of the watermark information extracted from the first and second layers of the document, the predetermined number of extraction groups for the watermark information embedded in the first layer and in the second layer respectively;
the matching module (9) is configured so that, when the extracted groups of watermark information are consistent and all match one database record, and the actual number of extracted groups in each layer equals the predetermined number of extraction groups, all watermark information is intact and the document has not been attacked, and the original watermark information is output after querying the database; otherwise, error correction is performed.
CN201410035906.7A 2014-01-24 2014-01-24 A kind of document multiple digital watermarking embedding, extracting method and device Active CN103761459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410035906.7A CN103761459B (en) 2014-01-24 2014-01-24 A kind of document multiple digital watermarking embedding, extracting method and device

Publications (2)

Publication Number Publication Date
CN103761459A CN103761459A (en) 2014-04-30
CN103761459B true CN103761459B (en) 2016-08-17

Family

ID=50528695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410035906.7A Active CN103761459B (en) 2014-01-24 2014-01-24 A kind of document multiple digital watermarking embedding, extracting method and device

Country Status (1)

Country Link
CN (1) CN103761459B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023043931A1 (en) * 2021-09-20 2023-03-23 The Nielsen Company (Us), Llc Systems, apparatus, and methods to improve watermark detection in acoustic environments

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104376236B (en) * 2014-12-02 2017-08-29 上海理工大学 Scheme self-adaptive digital watermark embedding grammar and extracting method based on camouflage science
CN104504342B (en) * 2014-12-04 2018-04-03 中国科学院信息工程研究所 Method using invisible character hiding information is encoded based on Unicode
CN104715168B (en) * 2015-02-13 2018-10-09 陈佳阳 A kind of file security management and control based on digital finger-print and the method and system traced to the source
CN106803047A (en) * 2017-01-13 2017-06-06 中国电建集团成都勘测设计研究院有限公司 Database water mark labeling method
CN110874456B (en) * 2018-08-31 2022-04-26 浙江大学 Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method
CN109800547B (en) * 2019-01-09 2023-04-07 杭州基尔区块链科技有限公司 Method for quickly embedding and extracting information for WORD document protection and distribution tracking
CN110414194B (en) * 2019-07-02 2023-08-04 南京理工大学 Text watermark embedding and extracting method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8189861B1 (en) * 2011-04-05 2012-05-29 Google Inc. Watermarking digital documents
CN102708535A (en) * 2012-05-11 2012-10-03 宁波大学 Zero-watermark insertion and extraction methods with multiple keys for digital images
CN102890760A (en) * 2012-10-30 2013-01-23 南京信息工程大学 Textual zero-knowledge watermark detection method based on asymmetric encryption
CN103093127A (en) * 2013-01-21 2013-05-08 深圳大学 Method and system of dynamic copyright protection based on sudoku and multiple digital watermarks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6912294B2 (en) * 2000-12-29 2005-06-28 Contentguard Holdings, Inc. Multi-stage watermarking process and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《中文文本多重水印算法应用研究》;袁树雄 等;《计算机工程与应用》;20090501;第96-99页 *
《英文文本多重数字水印算法设计与实现》;袁树雄 等;《计算机工程》;20060805;第146-148、154页 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023043931A1 (en) * 2021-09-20 2023-03-23 The Nielsen Company (Us), Llc Systems, apparatus, and methods to improve watermark detection in acoustic environments
US11843825B2 (en) 2021-09-20 2023-12-12 The Nielsen Company (Us), Llc Systems, apparatus, and methods to improve watermark detection in acoustic environments

Also Published As

Publication number Publication date
CN103761459A (en) 2014-04-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant