CN103761459A - Document multiple digital watermarking insertion method and device, and document multiple digital watermarking extraction method and device

Info

Publication number: CN103761459A
Application number: CN201410035906.7A
Authority: CN
Inventors: 陈小军; 时金桥; 徐睿; 蒲以国; 赵亮; 张锐
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2014-01-24
Filing date: 2014-01-24
Publication date: 2014-04-30
Anticipated expiration: 2034-01-24
Also published as: CN103761459B

Abstract

The invention relates to a method and device for embedding multiple digital watermarks in a document, and a method and device for extracting them. The embedding method comprises the following steps: obtaining an original watermark message input by a user, a key, and a document to be processed; computing a digest of the original watermark message to generate a new watermark message, and storing the original and new watermark messages together in a database as one record; dividing the characters of the document into two layers; determining the number of groups of the new watermark message to embed in the first layer from the total number of first-layer characters and the bit length of the new watermark message; embedding the groups of the new watermark message into the attribute bits of the first layer in front-to-back order; and embedding groups of the new watermark message into the attribute bits of the second layer in back-to-front order. The method is based on the character attributes of Word-format documents: the key improves security, repeated embedding strengthens robustness, and multiple embedding increases watermark capacity.

Description

Method and device for embedding and extracting multiple digital watermarks in a document

Technical field

The present invention relates to the field of digital watermarking, and in particular to a method and device for embedding and extracting multiple digital watermarks in a document.

Background art

In recent years, with the rapid development of multimedia and network technology, protecting the copyright of digital works has become a popular topic in academic research. Digital watermarking, an important research direction within information hiding, is valuable for protecting the copyright of multimedia such as text, video, and audio. A digital watermark embeds copyright information such as a serial number, text, or a logo into multimedia data, serving purposes such as copyright protection, covert communication, verification of the authenticity of data files, and product marking.

Existing practical text watermarking methods fall into two broad classes: format-based text watermarking and natural-language-based text watermarking. Format-based watermarking is the class with the most published methods to date. It has evolved from the original line shifting, word shifting, and feature coding to later methods that change font size, color, and the like. Research on this class is very active, but the methods suffer from weaknesses such as weak security and low watermark capacity. Natural-language-based text watermarking was first proposed in 2002 by Mikhail J. Atallah, Victor Raskin, and others at Purdue University in the United States. It adds watermark information mainly by changing sentence structure, substituting synonyms, and similar methods. Natural-language watermarking changes the content of the text but not its meaning or format, so the watermark is hard to detect and hard to destroy once added. For standards documents, however, whose formatting requirements are strict, this approach may change the semantics and is therefore unsuitable. In addition, computer processing of natural language is not yet mature, which has become a bottleneck for natural-language-based watermarking.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method and device for embedding and extracting multiple digital watermarks in a document that are based on the character attributes of Word-format files, use a key to improve security, embed repeatedly to strengthen robustness, and use multiple embedding to increase watermark capacity.

The technical solution by which the present invention solves the above technical problem is as follows: a method for embedding multiple digital watermarks in a document, comprising the following steps:

Step 1: obtain the original watermark information input by the user, the key, and the document to be processed;

Step 2: compute a digest of the original watermark information with a digest algorithm to generate new watermark information, and obtain the bit length of the new watermark information;

Step 3: store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be looked up when the watermark is extracted;

Step 4: divide the characters of the document into two layers; from the total number of characters in the first layer and the bit length of the new watermark information, obtain the number of groups of new watermark information to embed in the first layer; embed the groups of new watermark information into the attribute bits of the first layer in front-to-back order, separating consecutive groups with a separator;

Step 5: embed groups of new watermark information into the attribute bits of the second layer in back-to-front order, separating consecutive groups with a separator; the number of groups embedded in the second layer is twice the number of groups embedded in the first layer.

The beneficial effects of the present invention are as follows: the invention is based on the character attributes of Word-format files; the use of a key improves security, repeated embedding strengthens robustness, and multiple embedding increases watermark capacity.

On the basis of the above technical solution, the present invention can also be improved as follows.

Further, dividing the characters of the document into two layers specifically comprises the following steps:

Obtain the Unicode code of the character used as the key, convert it into a binary sequence, and take the last two bits of the binary sequence as the key sequence;

Obtain the Unicode codes of all characters in the document and convert the Unicode code of each character into a binary sequence;

XOR the key sequence with the last two bits of the binary sequence of each character in the document; if the result is 00 or 10, assign the character to the first layer; if the result is 01 or 11, assign it to the second layer.
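The layering rule can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes that the "Unicode code" of a character is its code point as returned by `ord`, and the function names `key_sequence`, `layer_of`, and `split_layers` are hypothetical.

```python
def key_sequence(key_char: str) -> int:
    """Return the last two bits of the key character's Unicode code point."""
    return ord(key_char) & 0b11


def layer_of(ch: str, key_bits: int) -> int:
    """Return 1 or 2, the layer that a document character is assigned to."""
    result = (ord(ch) & 0b11) ^ key_bits
    # XOR result 00 or 10 -> first layer; 01 or 11 -> second layer.
    return 1 if result in (0b00, 0b10) else 2


def split_layers(text: str, key_char: str):
    """Return two lists of character indices (first layer, second layer) in document order."""
    key_bits = key_sequence(key_char)
    first, second = [], []
    for index, ch in enumerate(text):
        (first if layer_of(ch, key_bits) == 1 else second).append(index)
    return first, second
```

For example, `split_layers("abcd", "k")` places indices 0 and 2 in the first layer and indices 1 and 3 in the second, while the key `"j"` swaps the two layers, which is why a wrong key misaligns extraction.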

Further, the separator is the binary sequence of any rarely used, non-visible character in Unicode.

Further, embedding the groups of new watermark information into the different attribute bits of the document specifically comprises the following steps:

For the first layer, modify the NoProofing property of each character in the first layer: if the watermark bit to be embedded is 1, set NoProofing to True; otherwise, leave the original value False unchanged;

For the second layer, modify the LanguageIDOther property of each character in the second layer: if the watermark bits to be embedded are 00, leave the original value unchanged; if they are 01, set LanguageIDOther to wdBasque; if they are 10, set it to wdVenda; if they are 11, set it to wdEstonian.
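The mapping from watermark bits to attribute values can be modeled without Word. The sketch below is an assumption-laden illustration: `CharAttrs` is a hypothetical stand-in for the two character attributes, the default language name `wdEnglishUS` is the default quoted elsewhere in this document, and "back to front" is read as starting from the last character of the second layer.

```python
from dataclasses import dataclass

DEFAULT_LANGUAGE = "wdEnglishUS"  # assumed default value of LanguageIDOther

# Two-bit symbol -> LanguageIDOther value; "00" keeps the original value.
LANGUAGE_FOR_BITS = {"01": "wdBasque", "10": "wdVenda", "11": "wdEstonian"}
BITS_FOR_LANGUAGE = {name: bits for bits, name in LANGUAGE_FOR_BITS.items()}


@dataclass
class CharAttrs:
    """Hypothetical model of the two Word character attributes used as carriers."""
    no_proofing: bool = False
    language_id_other: str = DEFAULT_LANGUAGE


def embed_first_layer(chars, bits: str) -> None:
    """Write one bit per first-layer character, front to back."""
    for attrs, bit in zip(chars, bits):
        if bit == "1":
            attrs.no_proofing = True  # a 0 bit leaves the default False


def embed_second_layer(chars, bits: str) -> None:
    """Write two bits per second-layer character, back to front."""
    pairs = [bits[i:i + 2] for i in range(0, len(bits) - 1, 2)]
    for attrs, pair in zip(reversed(chars), pairs):
        if pair != "00":
            attrs.language_id_other = LANGUAGE_FOR_BITS[pair]


def read_first_layer(chars) -> str:
    """Read the first-layer bits back in front-to-back order."""
    return "".join("1" if attrs.no_proofing else "0" for attrs in chars)


def read_second_layer(chars) -> str:
    """Read the second-layer bits back in back-to-front order."""
    return "".join(
        BITS_FOR_LANGUAGE.get(attrs.language_id_other, "00")
        for attrs in reversed(chars)
    )
```

Because the symbol 00 is encoded as "no change", an unmodified second-layer character and an embedded 00 are indistinguishable in this model; the separator between groups is what lets an extractor find group boundaries.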

Further, a method for extracting multiple digital watermarks from a document comprises the following steps:

Step 1a: detect whether watermark information is embedded in the document to be processed; if so, divide all characters into two layers according to the rule and go to step 2a; otherwise, end processing;

Step 2a: extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and use the separator to obtain the actual number of groups of watermark information extracted from each layer;

Step 3a: from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, obtain the expected number of groups of watermark information embedded in the first layer and in the second layer;

Step 4a: when the extracted groups of watermark information are mutually consistent and all match one database record, and the actual number of groups extracted from each layer equals the expected number, all watermark information is intact and the document has not been attacked; query the database and output the original watermark information. Otherwise, perform error correction.

Further, in step 4a, the condition for all watermark information being intact also includes that the number of groups extracted from the attribute bits of the second layer is twice the number of groups extracted from the attribute bits of the first layer.

Further, the NoProofing and LanguageIDOther properties of each character in a document are preset by the system to default values. The character attributes of each character in the document to be examined are checked one by one; if any character has a NoProofing or LanguageIDOther value that differs from the default, the document contains embedded watermark information; otherwise, it does not.
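The detection rule reduces to a scan for non-default attribute values. The following is a minimal sketch that assumes the character attributes have already been read into `(no_proofing, language_id_other)` tuples; the function name `has_watermark` is hypothetical, and the defaults are the ones this document quotes for Word (False and 1033, wdEnglishUS).

```python
DEFAULT_NO_PROOFING = False
DEFAULT_LANGUAGE_ID_OTHER = 1033  # wdEnglishUS, the default quoted in this document


def has_watermark(chars) -> bool:
    """Return True if any character carries a non-default carrier attribute.

    chars: iterable of (no_proofing, language_id_other) tuples, one per character.
    """
    return any(
        no_proofing != DEFAULT_NO_PROOFING or language != DEFAULT_LANGUAGE_ID_OTHER
        for no_proofing, language in chars
    )
```

A document whose watermark bits happen to be all zeros would not be detected by this check, since zeros are encoded as "leave the default"; a digest of realistic length makes that case unlikely.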

Further, the error correction specifically comprises the following steps:

Step 3a.1: split the extracted watermark information into groups at the separators; if the groups are not fully consistent but at least one group matches a database record, return the extracted watermark information and report the damage to the document; otherwise, go to step 3a.2;

Step 3a.2: if none of the groups matches any database record, report that the document is severely damaged and that watermark extraction has failed.
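The decision logic of step 4a together with the two error-correction steps can be sketched as a single classification function. This is an illustration under assumptions: the database is modeled as a plain dictionary from digest bit string to original watermark, and the name `classify` and the status labels are hypothetical.

```python
def classify(groups, database, expected_groups=None):
    """Classify extracted watermark groups against the database.

    groups: list of digest bit strings extracted from the document.
    database: dict mapping digest bit string -> original watermark text.
    expected_groups: expected number of groups, or None to skip the count check.
    Returns (status, original), where status is "intact", "damaged", or "failed".
    """
    matched = [group for group in groups if group in database]
    if not matched:
        # Step 3a.2: no group matches any record.
        return "failed", None
    original = database[matched[0]]
    consistent = len(set(groups)) == 1 and len(matched) == len(groups)
    count_ok = expected_groups is None or len(groups) == expected_groups
    if consistent and count_ok:
        # Step 4a: all groups agree, match a record, and the count is as expected.
        return "intact", original
    # Step 3a.1: at least one group matches, but the groups are not all intact.
    return "damaged", original
```

With `database = {"1010": "owner"}`, two identical groups `"1010"` and an expected count of 2 are classified as intact, a mix of `"1010"` and `"1110"` as damaged, and a lone `"0000"` as failed.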

Further, a device for embedding multiple digital watermarks in a document comprises an acquisition module, a generation module, a storage module, a first embedding module, and a second embedding module;

The acquisition module is configured to obtain the original watermark information input by the user, the key, and the document to be processed;

The generation module is configured to compute a digest of the original watermark information with a digest algorithm, generate new watermark information, and obtain the bit length of the new watermark information;

The storage module is configured to store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be looked up when the watermark is extracted;

The first embedding module is configured to divide the characters of the document into two layers; to obtain, from the total number of characters in the first layer and the bit length of the new watermark information, the number of groups of new watermark information to embed in the first layer; and to embed the groups into the attribute bits of the first layer in front-to-back order, separating consecutive groups with a separator;

The second embedding module is configured to embed groups of new watermark information into the attribute bits of the second layer in back-to-front order, separating consecutive groups with a separator; the number of groups embedded in the second layer is twice the number of groups embedded in the first layer.

Further, a device for extracting multiple digital watermarks from a document comprises a detection module, an extraction module, a computing module, and a matching module;

The detection module is configured to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and control passes to the extraction module; otherwise, processing ends;

The extraction module is configured to extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and to use the separator to obtain the actual number of groups of watermark information extracted from each layer;

The computing module is configured to obtain, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, the expected number of groups of watermark information embedded in the first layer and in the second layer;

The matching module is configured to determine that, when the extracted groups of watermark information are mutually consistent and all match one database record, and the actual number of groups extracted from each layer equals the expected number, all watermark information is intact and the document has not been attacked, and to output the original watermark information after querying the database; otherwise, error correction is performed.

Brief description of the drawings

Fig. 1 is a flowchart of the document multiple digital watermark embedding method of the present invention;

Fig. 2 is a flowchart of the document multiple digital watermark extraction method of the present invention;

Fig. 3 is a structural diagram of the document multiple digital watermark embedding device of the present invention;

Fig. 4 is a structural diagram of the document multiple digital watermark extraction device of the present invention.

In the drawings, the reference numerals denote the following parts:

1, acquisition module; 2, generation module; 3, storage module; 4, first embedding module; 5, second embedding module; 6, detection module; 7, extraction module; 8, computing module; 9, matching module.

Detailed description of the embodiments

The principles and features of the present invention are described below with reference to the drawings. The examples serve only to explain the invention and are not intended to limit its scope.

Fig. 1 is a flowchart of the document multiple digital watermark embedding method of the present invention; Fig. 2 is a flowchart of the extraction method; Fig. 3 is a structural diagram of the embedding device; Fig. 4 is a structural diagram of the extraction device.

Embodiment 1

A method for embedding multiple digital watermarks in a document comprises the following steps:

Dividing the characters of the document into two layers specifically comprises the following steps:

The separator is the binary sequence of any rarely used, non-visible character in Unicode.

Embedding the groups of new watermark information into the different attribute bits of the document specifically comprises the following steps:

A method for extracting multiple digital watermarks from a document comprises the following steps:

In step 4a, the condition for all watermark information being intact also includes that the number of groups extracted from the attribute bits of the second layer is twice the number of groups extracted from the attribute bits of the first layer.

In step 1a, watermark information is detected as follows:

The NoProofing and LanguageIDOther properties of each character in a document are preset by the system to default values. The character attributes of each character in the document to be examined are checked one by one; if any character has a NoProofing or LanguageIDOther value that differs from the default, the document contains embedded watermark information; otherwise, it does not.

The error correction specifically comprises the following steps:

A device for embedding multiple digital watermarks in a document comprises an acquisition module 1, a generation module 2, a storage module 3, a first embedding module 4, and a second embedding module 5;

The acquisition module 1 is configured to obtain the original watermark information input by the user, the key, and the document to be processed;

The generation module 2 is configured to compute a digest of the original watermark information with a digest algorithm, generate new watermark information, and obtain the bit length of the new watermark information;

The storage module 3 is configured to store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be looked up when the watermark is extracted;

The first embedding module 4 is configured to divide the characters of the document into two layers; to obtain, from the total number of characters in the first layer and the bit length of the new watermark information, the number of groups of new watermark information to embed in the first layer; and to embed the groups into the attribute bits of the first layer in front-to-back order, separating consecutive groups with a separator;

The second embedding module 5 is configured to embed groups of new watermark information into the attribute bits of the second layer in back-to-front order, separating consecutive groups with a separator; the number of groups embedded in the second layer is twice the number of groups embedded in the first layer.

A device for extracting multiple digital watermarks from a document comprises a detection module 6, an extraction module 7, a computing module 8, and a matching module 9;

The detection module 6 is configured to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and control passes to the extraction module 7; otherwise, processing ends;

The extraction module 7 is configured to extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and to use the separator to obtain the actual number of groups of watermark information extracted from each layer;

The computing module 8 is configured to obtain, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, the expected number of groups of watermark information embedded in the first layer and in the second layer;

The matching module 9 is configured to determine that, when the extracted groups of watermark information are mutually consistent and all match one database record, and the actual number of groups extracted from each layer equals the expected number, all watermark information is intact and the document has not been attacked, and to output the original watermark information after querying the database; otherwise, error correction is performed.

In a concrete implementation, the embedding method of the present invention comprises the following six steps:

1) Input the original watermark information, the key, and the Word document to be processed;

2) Compute a digest of the original watermark with a message digest algorithm such as MD5 or SHA1, and use the digest as the watermark data from this point on;

3) Store the generated watermark information and the original watermark information in the database as one record, so that the original information can be looked up at extraction time;

4) Divide all characters of the Word document into two layers; for the different layers, the watermark information is embedded in different attribute bits;

5) If the total number of characters is N and the digest is M bits long, embed K = N/M groups of the watermark, with K rounded down. A separator is needed between groups; for example, the non-visible Unicode character RLO can be chosen as the separator between groups, with the value 0010000000101110. For first-layer characters, embed K groups of the watermark in front-to-back order;

6) For second-layer characters, proceed as in step 5), embedding 2*K groups of the watermark in back-to-front order.

Steps 4), 5), and 6) above are the core of this method.
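Steps 2) and 5) can be illustrated with Python's standard `hashlib`. This is a sketch under stated assumptions rather than the patented implementation: the original watermark is assumed to be UTF-8 text, N is taken to be the number of characters available in the layer, and the formula K = N/M is applied as written, without reserving room for separator bits. The function names are hypothetical.

```python
import hashlib

SEPARATOR = "0010000000101110"  # value of the RLO character given in step 5)


def digest_bits(original: str, algorithm: str = "md5") -> str:
    """Hash the original watermark and return the digest as a string of 0s and 1s."""
    digest = hashlib.new(algorithm, original.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def group_count(n_chars: int, m_bits: int) -> int:
    """K = N / M, rounded down."""
    return n_chars // m_bits


def build_stream(bits: str, groups: int, separator: str = SEPARATOR) -> str:
    """Repeat the digest `groups` times with a separator between consecutive groups."""
    return separator.join([bits] * groups)
```

An MD5 digest gives M = 128 and a SHA1 digest gives M = 160, so a layer with 1000 characters yields K = 7 groups for MD5. The second layer would use `2 * group_count(...)` groups, in line with step 6).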

Step 4), the method for text layering is: obtain the Unicode coding as key character, convert thereof into binary sequence, get last two as key.In telescopiny, obtain one by one text character Unicode coding simultaneously, also convert thereof into binary sequence, get last two, carry out xor operation with key, if

● result is 00,10, is divided into ground floor, revises NoProofing position;

● result is 01,11, is divided into the second layer, revises LanguageIDOther position.

In steps 5) and 6), the method uses Microsoft's official OLE interface to operate on character attributes. The basic principle of embedding is to use two attributes of individual characters in a Word document: NoProofing and LanguageIDOther. These attributes work as follows. For a Selection object (such as a single character), if NoProofing is True, the spelling and grammar checkers ignore the specified text. The LanguageIDOther attribute of a character can be set to the enumerated value of a language with few users; Microsoft recommends this attribute for setting or returning the language of Western text in a document created in a right-to-left language version of Microsoft Word. LanguageIDOther has 64 enumerated values. After study and screening, this method selects the enumerated values of three languages with few users (wdBasque, wdVenda, wdEstonian) as modification values, so each second-layer character can carry two watermark bits and the second layer can carry twice as much information as the first, which increases watermark capacity. Both attributes can be discovered, added, and modified only programmatically, and ordinary operations in Word do not remove the watermark, so the watermark is well concealed and resistant to attack. The watermark is embedded repeatedly to improve robustness: even if the document suffers attacks such as deletion or modification, the original watermark information can be recovered as long as one group of the watermark remains intact.

Watermark extraction is the inverse of embedding:

1) Divide all characters of the Word document to be examined into two layers according to the rule;

2) Read the data from the characters of each layer one by one, following the embedding method, to obtain n groups of watermark information;

3) When the n groups of the watermark are consistent and match one database record, all watermark information is intact and the document has not been attacked; query the database and output the original watermark information. Otherwise, go to the error-correction algorithm.
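Recovering the n groups from an extracted bit stream amounts to splitting at the separator and keeping the pieces of the expected length. The sketch below assumes the stream has already been read from one layer as a string of 0s and 1s and that the separator is the RLO value used at embedding time; `split_groups` is a hypothetical name. It also assumes the separator pattern does not occur by chance inside a digest, which a real implementation would need to handle.

```python
SEPARATOR = "0010000000101110"  # RLO value assumed to be the one used at embedding time


def split_groups(stream: str, group_bits: int, separator: str = SEPARATOR):
    """Return the complete watermark groups found in an extracted bit stream.

    Pieces whose length differs from group_bits (for example, a group truncated
    by deleted characters) are discarded.
    """
    return [piece for piece in stream.split(separator) if len(piece) == group_bits]


def all_consistent(groups) -> bool:
    """True when at least one group was found and all groups are identical."""
    return len(groups) > 0 and len(set(groups)) == 1
```

If the tail of the stream is lost to a deletion attack, the truncated last piece is dropped and the remaining complete groups can still be compared with each other and with the database.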

The watermark error-correction method is:

1) Split the extracted watermark into n groups at the separators. If the n groups are not fully consistent but at least one group matches a database record, for example because the document was damaged by attacks such as inserting or deleting characters, return the extracted watermark information and report the damage to the document. Otherwise, go to 2);

2) None of the n groups matches a database record, which means that every group of watermark information has been damaged to some degree; report that the document is severely damaged and that the watermark information cannot be extracted.

The watermark detection method is:

In Word, the default values of NoProofing and LanguageIDOther for each character are FALSE and 1033 (wdEnglishUS) respectively. The character attributes of the input document are checked one by one; if any character has a value other than the default for either of these two attributes, the document contains an embedded watermark.

Beneficial effects

The character attributes that carry the watermark information are invisible, so the document looks unchanged after embedding and the watermark is well concealed.

Statistically, each layer holds on average 50% of the characters: 100 characters divided into two layers give on average 50 characters per layer. The first layer carries one bit per character and the second layer two bits per character, so 100 characters carry 150 bits and the watermark capacity is 150%. Experiments, with results shown in Table 1, confirm that the actual watermark capacity approaches 150%. This is a considerable improvement over other text watermarking algorithms, as shown in Table 2.
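The 150% figure follows from simple arithmetic, shown here as a small sketch. The function name `capacity_ratio` is hypothetical, and the calculation counts raw embeddable bits only, ignoring separators and repetition.

```python
def capacity_ratio(first_layer_chars: int, second_layer_chars: int) -> float:
    """Embeddable bits per character: 1 bit per first-layer character, 2 per second-layer character."""
    total = first_layer_chars + second_layer_chars
    if total == 0:
        return 0.0
    return (first_layer_chars * 1 + second_layer_chars * 2) / total
```

An even split of 50 and 50 characters gives 1.5 bits per character, that is, 150%; a document whose characters fall mostly into the first layer would come out lower.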

During embedding, the original watermark information is hashed with a message digest algorithm, so even someone who obtains the embedded watermark information cannot recover the original watermark information, which improves the security of the watermark. In addition, the key controls the layering: if a wrong key is supplied at extraction time, the layering is wrong, the extracted attributes are misaligned, and no watermark information is obtained, which further improves security.

If the watermarked document suffers attacks such as the insertion or deletion of characters, the extracted watermark information is evaluated separator by separator, as shown in Fig. 2, where underlining marks separators and rectangles mark watermark information. Whenever complete watermark information follows a separator, it is extracted, which gives the watermarking method a degree of robustness.

Table 1 watermark capacity statistics

Table 2 text watermarking algorithm capacity comparison

The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for embedding multiple digital watermarks in a document, characterized in that it comprises the following steps:

2. The method for embedding multiple digital watermarks in a document according to claim 1, characterized in that dividing the characters of the document into two layers specifically comprises the following steps:

3. The method for embedding multiple digital watermarks in a document according to claim 1, characterized in that the separator is the binary sequence of any rarely used, non-visible character in Unicode.

4. The method for embedding multiple digital watermarks in a document according to claim 1, characterized in that embedding the groups of new watermark information into the different attribute bits of the document specifically comprises the following steps:

5. A method for extracting multiple digital watermarks from a document, characterized in that it comprises the following steps:

6. The method for extracting multiple digital watermarks from a document according to claim 5, characterized in that, in step 4a, the condition for all watermark information being intact also includes that the number of groups extracted from the attribute bits of the second layer is twice the number of groups extracted from the attribute bits of the first layer.

7. The method for extracting multiple digital watermarks from a document according to claim 5, characterized in that, in step 1a, watermark information is detected as follows:

8. The method for extracting multiple digital watermarks from a document according to claim 5, characterized in that the error correction specifically comprises the following steps:

9. A device for embedding multiple digital watermarks in a document, characterized in that it comprises an acquisition module (1), a generation module (2), a storage module (3), a first embedding module (4), and a second embedding module (5);

The acquisition module (1) is configured to obtain the original watermark information input by the user, the key, and the document to be processed;

The generation module (2) is configured to compute a digest of the original watermark information with a digest algorithm, generate new watermark information, and obtain the bit length of the new watermark information;

The storage module (3) is configured to store the original watermark information and the new watermark information together in a database as one database record, so that the original watermark information can be looked up when the watermark is extracted;

The first embedding module (4) is configured to divide the characters of the document into two layers; to obtain, from the total number of characters in the first layer and the bit length of the new watermark information, the number of groups of new watermark information to embed in the first layer; and to embed the groups into the attribute bits of the first layer in front-to-back order, separating consecutive groups with a separator;

The second embedding module (5) is configured to embed groups of new watermark information into the attribute bits of the second layer in back-to-front order, separating consecutive groups with a separator; the number of groups embedded in the second layer is twice the number of groups embedded in the first layer.

10. A device for extracting multiple digital watermarks from a document, characterized in that it comprises a detection module (6), an extraction module (7), a computing module (8), and a matching module (9);

The detection module (6) is configured to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and control passes to the extraction module (7); otherwise, processing ends;

The extraction module (7) is configured to extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and to use the separator to obtain the actual number of groups of watermark information extracted from each layer;

The computing module (8) is configured to obtain, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, the expected number of groups of watermark information embedded in the first layer and in the second layer;

The matching module (9) is configured to determine that, when the extracted groups of watermark information are mutually consistent and all match one database record, and the actual number of groups extracted from each layer equals the expected number, all watermark information is intact and the document has not been attacked, and to output the original watermark information after querying the database; otherwise, error correction is performed.