CN104134023A - Watermark processing method and system - Google Patents

Watermark processing method and system

Info

Publication number
CN104134023A
Authority
CN
China
Prior art keywords
quinary
information
watermark
character
blank character
Legal status
Granted
Application number
CN201410403059.5A
Other languages
Chinese (zh)
Other versions
CN104134023B (en)
Inventor
郭燕慧
李祺
高晓梦
杨昕雨
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201410403059.5A
Publication of CN104134023A
Application granted
Publication of CN104134023B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10: Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16: Program or content traceability, e.g. by watermarking
    • G06F21/106: Enforcing content protection by specific content processing
    • G06F21/1063: Personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a watermark processing method and system. Watermark character-string information is encoded into a quinary (base-5) digit stream, and quinary start-check information and quinary end-check information are added to the stream to obtain the quinary embedding information. A pdf file is loaded and its standard blank characters, viewed in hexadecimal, are unified to a single value; then, according to a preset mapping between quinary digit values and the hexadecimal representations of the standard blank characters, the content of the standard blank characters is modified so that the quinary embedding information is embedded into the pdf file. To extract the watermark, the pdf file is parsed to obtain the content of the standard blank characters, the quinary embedding information is recovered according to the preset mapping, the quinary digit stream is extracted from it using the start-check and end-check information, and the stream is decoded back into the watermark character-string information.

Description

Watermark processing method and system
Technical field
The invention belongs to the field of digital watermarking, and in particular relates to a watermark processing method and system.
Background art
Digital watermarking adds digital information to multimedia data (such as images, audio, and video signals) to support functions such as authenticity verification and copyright protection. The embedded watermark is hidden in the host file and affects neither the viewing quality nor the integrity of the original. In other words, digital watermarking embeds into a protected digital object (a still image, video, audio, and so on) information that can prove copyright ownership or trace infringement, such as a serial number, a company logo, or a meaningful text from the author. Many concepts are close or closely related to watermarking; the current literature includes information hiding, steganography, digital watermarking, and digital fingerprinting.
The Portable Document Format (pdf) is a distinctive cross-platform file format developed by Adobe. Built on the PostScript imaging model, a pdf file produces accurate colors and precise print output on any printer and faithfully reproduces every character, color, and image of the original. It is an electronic document format that is independent of the operating system: a pdf file can be used on Windows, Unix, and Apple's Mac OS alike. This property makes pdf an ideal format for distributing electronic documents and disseminating digital information on the Internet, and more and more e-books, product descriptions, company announcements, online materials, and e-mails are published as pdf files.
According to their characteristics, existing text watermarking techniques for pdf files fall roughly into four classes: watermarking based on text format, on text data, on text features, and on text content.
Watermark embedding based on text format:
Format-based embedding appeared in the early stage of text watermarking research. Because image watermarking was relatively mature, these methods tried to process text as an image, for example by converting the document into a bitmap. However, a bitmapped document has few gray levels and resembles a binary image, so most watermarking algorithms designed for digital images are hard to apply to document watermarking. These methods also exploit features of the text format, such as character spacing, word spacing, and line spacing, to carry the watermark.
Watermark embedding based on text data:
Data-based embedding modifies the text itself: hard-to-notice spelling, grammar, punctuation, or even content errors are deliberately inserted into the text to carry the watermark. Synonym substitution is one result of research in this direction. However, none of these methods produces a robust watermark, and in some cases the deliberate modification or substitution of text data changes the meaning of the text and thus lowers its quality.
Watermark embedding based on text features:
Feature-based embedding computes text features by statistical or syntactic methods and writes a simulation function that generates secret information spread through the whole text. The secret information is hidden in the statistical regularities or grammatical coding of the text; the generated text is regular and readable to a computer but meaningless in itself, so these methods reduce readability.
Watermark embedding based on text content:
Content-based embedding is mainly an information-hiding technique that adds the watermark through small changes in syntactic structure; the longer the text, the better the experimental results. Information hiding based on text semantics developed from syntactic-structure watermarking: it analyses text semantics with ontology knowledge and adjusts the content of natural-language sentences without changing their meaning, which makes watermarking of short texts effective. These methods embed the watermark into the text content itself, overcome the limitations of earlier watermarking techniques, and are highly robust.
The prior art has the following shortcomings:
(1) Watermark capacity is limited by the carrier: fine-tuning features of the pdf document such as font spacing, font color, and font brightness can embed a watermark, but the statistical features of the format change noticeably and the embedding capacity is small.
(2) Poor concealment: natural-language techniques embed the watermark through synonym substitution, punctuation processing, sentence-pattern transformation, and similar means, and can resist format-conversion attacks. However, because Chinese is polysemous and complex, it is hard to generate semantically and logically coherent paragraphs, so implementation is difficult and concealment is poor.
Summary of the invention
Embodiments of the present invention propose a watermark processing method and system that improve watermark capacity and concealment.
The technical solution of the embodiments is as follows:
A watermark processing method, comprising:
Encoding watermark character-string information into a quinary digit stream, and adding quinary start-check information and quinary end-check information to the stream to obtain quinary embedding information;
Loading a pdf file, unifying the standard blank characters of the pdf file in hexadecimal mode, and embedding the quinary embedding information into the pdf file by modifying the content of the standard blank characters according to a preset mapping between quinary digit values and the hexadecimal representations of the standard blank characters, where the standard blank characters comprise the null character (NUL), tab, line feed, carriage return, and space;
Parsing the pdf file to obtain the content of the standard blank characters, extracting the quinary embedding information according to the preset mapping, extracting the quinary digit stream from it according to the quinary start-check and end-check information, and decoding the stream into the watermark character-string information.
When the quinary start-check and end-check information is added to the quinary digit stream, quinary operating-system information is also added; the method further comprises:
Extracting the quinary operating-system information and, when it is determined to differ from the local operating system, performing a conversion operation on the quinary digit stream.
Unifying the standard blank characters of the pdf file in hexadecimal mode comprises: setting every standard blank character of the pdf file to NUL (0x00), TAB (0x09), LF (0x0A), CR (0x0D), or SPACE (0x20).
The method further comprises:
Before embedding the quinary embedding information into the pdf file by modifying the standard blank characters, determining whether the number of standard blank characters is greater than or equal to the number of digits of the quinary embedding information; if so, embedding the information by modifying the standard blank characters; if not, exiting the process.
The quinary operating-system information indicates at least one of the following operating systems:
Windows, Unix, Linux, or FreeBSD.
A watermark processing system, comprising an encoding module, a merging module, and an extraction module, wherein:
The encoding module is configured to encode watermark character-string information into a quinary digit stream and to add quinary start-check information and quinary end-check information to the stream, thereby obtaining quinary embedding information;
The merging module is configured to load a pdf file, unify the standard blank characters of the pdf file in hexadecimal mode, and embed the quinary embedding information into the pdf file by modifying the content of the standard blank characters according to the preset mapping between quinary digit values and the hexadecimal representations of the standard blank characters, where the standard blank characters comprise the null character (NUL), tab, line feed, carriage return, and space;
The extraction module is configured to parse the pdf file to obtain the content of the standard blank characters, extract the quinary embedding information according to the preset mapping, extract the quinary digit stream from it according to the quinary start-check and end-check information, and decode the stream into the watermark character-string information.
The encoding module is further configured to add quinary operating-system information when adding the quinary start-check and end-check information to the quinary digit stream;
The extraction module is further configured to extract the quinary operating-system information and, when it is determined to differ from the local operating system, perform a conversion operation on the quinary digit stream.
The merging module is configured to set every standard blank character of the pdf file, in hexadecimal mode, to NUL (0x00), TAB (0x09), LF (0x0A), CR (0x0D), or SPACE (0x20).
The merging module is further configured to determine, before embedding the quinary embedding information by modifying the standard blank characters, whether the number of standard blank characters is greater than or equal to the number of digits of the quinary embedding information; if so, to embed the information by modifying the standard blank characters; if not, to exit the process.
The quinary operating-system information indicates at least one of the following operating systems:
Windows, Unix, Linux, or FreeBSD.
As the technical solution above shows, this patent encodes watermark character-string information into a quinary digit stream carrying start-check and end-check information; embeds it into a pdf file by unifying and then modifying the file's standard blank characters (NUL, tab, line feed, carriage return, and space) according to a preset mapping between quinary digit values and their hexadecimal representations; and extracts it by reading the blank characters back through the same mapping, locating the check information, and decoding the digit stream. Watermark processing of pdf files is thus realized through a mapping between quinary digits and the hexadecimal representations of the five standard blank characters.
Compared with other current techniques, this patent has the following advantages and effects:
(1) Large watermark capacity
First, a pdf document contains a large number of standard blank characters, so the document itself can carry a large amount of information. Second, the watermark is encoded as a quinary stream, which needs far fewer digits than a traditional binary stream, further increasing the amount of watermark information that fits. Taken together, these two points give this system a much larger embedding capacity than other watermark embedding methods.
(2) Good concealment
Because pdf readers treat the five standard blank characters identically, the document looks exactly the same before and after embedding, so a watermarked pdf produced by this system is better concealed than one produced by other methods.
(3) Easy to operate
Embedding and extraction are straightforward to implement, whereas some schemes consume considerable material and manpower; this saves time. Operation is also simple for the user: to embed a watermark, the user only enters the path of the pdf document to be watermarked, the watermark information, and the path for the new watermarked pdf (a default path is used if none is given); to extract a watermark, the user only enters the path of the pdf document to be examined.
(4) Multiple operating systems supported
Compared with other watermark embedding systems, this system supports most current mainstream operating systems, including the Windows series, Unix, Linux, and FreeBSD.
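To make the capacity argument in (1) concrete, assume each byte of the watermark text is encoded with a fixed number of quinary digits (an assumption; the text does not fix the encoding width). Since one standard blank character carries one digit, the number of blank characters needed per byte follows from:

$$
5^{3} = 125 < 256 \le 625 = 5^{4}
\quad\Rightarrow\quad
1 \text{ byte} = 8 \text{ binary digits} = 4 \text{ quinary digits},
\qquad
\left\lceil 8n \log_{5} 2 \right\rceil \approx 3.45\,n \text{ quinary digits for } n \text{ bytes}.
$$

Under this assumption a byte occupies 4 blank characters instead of 8, halving the carriers required; a variable-length encoding could approach the bound of about 3.45 digits per byte.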
Brief description of the drawings
Fig. 1 is a flowchart of the watermark processing method according to the present invention.
Fig. 2 is a flowchart of pdf watermark embedding according to the present invention.
Fig. 3 is a flowchart of pdf watermark extraction according to the present invention.
Fig. 4 is a structural diagram of the watermark processing system according to the present invention.
Fig. 5 is a structural diagram of the pdf digital watermarking system according to the present invention.
Fig. 6 is a schematic diagram of a user embedding a watermark according to the present invention.
Fig. 7 is a schematic diagram of a user extracting a watermark according to the present invention.
Embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
For brevity and clarity, the solution of the present invention is set forth below through several representative embodiments. The many details in the embodiments serve only to aid understanding; clearly, the technical solution can be implemented without being limited to them. To avoid unnecessarily obscuring the solution, some embodiments are not described in detail and only their framework is given. Hereinafter, "comprising" means "including but not limited to", and "according to..." means "at least according to..., but not only according to...". Following Chinese usage, when the number of a component is not specified, the component may be one or more, i.e. at least one.
With the rapid development of computer and Internet technology, digital text has become easier to distribute and copy, so its copyright protection has become a very important problem. Compared with image watermarking, text watermarking is still at an early stage, yet pdf documents play a very important role among digital products. To solve the copyright-protection problem, text watermarking must be studied in depth and a reasonably complete copyright-protection scheme designed; such a scheme should set requirements on the watermark content, the rights of each copyright holder, and the embedding strategy and technique at each stage, and manage them effectively.
The present invention embeds the watermark by replacing special characters of a pdf document. In a pdf document, apart from blank characters inside comments, strings, streams, compressed encodings, and encrypted characters, all blank characters (the five characters space, null, line feed, carriage return, and tab) are equivalent, and a run of consecutive blank characters is treated as a single blank character. The method therefore strikes a good balance between capacity and concealment and effectively protects the copyright of pdf documents.
In the pdf file structure there are five blank characters in total, as shown in Table 1:
Table 1: The five equivalent blank characters in pdf documents
In a pdf document, all blank characters except those inside compressed encodings, strings, streams, comments, and encrypted characters are equivalent, and a run of consecutive blank characters is equivalent to a single blank character. The blank characters outside compressed encodings, strings, streams, comments, and encrypted characters are therefore called standard blank characters. This patent exploits the way pdf readers treat these characters, embedding the watermark by substituting one standard blank character for another.
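Which bytes count as standard blank characters can be sketched with a small scanner. The helper below, `standard_blank_positions`, is hypothetical: it skips comments, literal and hexadecimal strings, and stream contents, but it cannot recognise encrypted data and does not fully tokenise pdf syntax, so treat it as an illustration under those assumptions rather than the patent's own parser.

```python
# Hypothetical helper: offsets of "standard" blank characters in a pdf body.
STANDARD_BLANKS = frozenset(b"\x00\t\n\r ")   # NUL, TAB, LF, CR, SPACE
BOUNDARY = b"\x00\t\n\r\x0c >"                 # bytes that may precede 'stream'


def standard_blank_positions(body: bytes) -> list[int]:
    """Return offsets of blank characters outside comments, strings and streams."""
    positions = []
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c == ord("%"):                          # comment: skip up to end of line
            while i < n and body[i] not in b"\r\n":
                i += 1
        elif c == ord("("):                        # literal string, may nest and escape
            depth = 0
            while i < n:
                ch = body[i]
                if ch == ord("\\"):
                    i += 2                         # skip the escaped byte
                    continue
                if ch == ord("("):
                    depth += 1
                elif ch == ord(")"):
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
        elif body.startswith(b"<<", i) or body.startswith(b">>", i):
            i += 2                                 # dictionary delimiters
        elif c == ord("<"):                        # hexadecimal string <...>
            end = body.find(b">", i)
            i = n if end < 0 else end + 1
        elif body.startswith(b"stream", i) and (i == 0 or body[i - 1] in BOUNDARY):
            end = body.find(b"endstream", i + 6)   # leave stream data untouched
            i = n if end < 0 else end + len(b"endstream")
        else:
            if c in STANDARD_BLANKS:
                positions.append(i)
            i += 1
    return positions
```

For example, in `1 0 obj\n<</Title (a b) /K [1 2]>>` the space inside `(a b)` is skipped while the spaces between tokens are reported.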
The main objective of the present invention is to develop a pdf text watermarking scheme with large information capacity and good concealment.
First, the pdf file structure is parsed. The file structure (physical structure) of a pdf consists of four parts: the file header (Header), the file body (Body), the cross-reference table (Cross-reference Table), and the file trailer (Trailer). The present invention operates mainly on the file body to embed the watermark.
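Locating the four parts can be sketched as follows. This minimal illustration assumes a classic, non-incremental pdf whose cross-reference section is a plain-text `xref` table rather than a cross-reference stream; the function name `split_pdf_sections` is hypothetical.

```python
def split_pdf_sections(data: bytes) -> dict:
    """Split a classic pdf into header, body, xref table and trailer (sketch)."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a pdf: missing %PDF- header")
    header_end = data.find(b"\n") + 1          # end of the "%PDF-x.y" line
    xref_start = data.rfind(b"\nxref")          # last plain-text xref table
    trailer_start = data.rfind(b"trailer")
    if header_end == 0 or xref_start < 0 or trailer_start < xref_start:
        raise ValueError("no classic xref table / trailer found")
    xref_start += 1                             # skip the newline before 'xref'
    return {
        "header": data[:header_end],
        "body": data[header_end:xref_start],    # the part the watermark modifies
        "xref": data[xref_start:trailer_start],
        "trailer": data[trailer_start:],
    }
```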
Second, the character-string information is converted programmatically into quinary information, and the corresponding check digits are added.
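Converting the watermark string to quinary digits could look like the following. The text specifies neither the byte encoding nor the digit width, so this sketch assumes UTF-8 bytes and a fixed four base-5 digits per byte (5^4 = 625 ≥ 256); `to_quinary` and `from_quinary` are hypothetical names, and check digits are left out here.

```python
DIGITS_PER_BYTE = 4  # assumption: 5**4 = 625 >= 256, so 4 base-5 digits cover a byte


def to_quinary(text: str) -> list[int]:
    """Encode text as UTF-8, then each byte as 4 base-5 digits (most significant first)."""
    digits = []
    for byte in text.encode("utf-8"):
        group = []
        for _ in range(DIGITS_PER_BYTE):
            byte, r = divmod(byte, 5)
            group.append(r)
        digits.extend(reversed(group))
    return digits


def from_quinary(digits: list[int]) -> str:
    """Inverse of to_quinary: regroup 4 digits per byte and decode UTF-8."""
    if len(digits) % DIGITS_PER_BYTE:
        raise ValueError("digit count is not a multiple of 4")
    out = bytearray()
    for k in range(0, len(digits), DIGITS_PER_BYTE):
        value = 0
        for d in digits[k:k + DIGITS_PER_BYTE]:
            value = value * 5 + d
        out.append(value)
    return out.decode("utf-8")
```

For instance, `"A"` (byte 65) becomes `[0, 2, 3, 0]`, since 65 = 2·25 + 3·5.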
Finally, the five standard blank characters are replaced, under the pdf document's hexadecimal view, according to the quinary watermark information: all standard blank characters are first unified to NUL (0x00); then, if the watermark digit is '0', the blank character stays NUL (0x00); if it is '1', the blank character is replaced with Tab (0x09); and so on.
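The two-step substitution (unify every standard blank to NUL, then write one digit per blank) can be sketched as follows, using the digit-to-byte mapping listed in Table 2. The function names and the idea of passing in precomputed blank positions are assumptions for illustration.

```python
# Table 2: quinary digit -> standard blank character byte (hex view)
DIGIT_TO_BLANK = {0: 0x00, 1: 0x09, 2: 0x0A, 3: 0x0D, 4: 0x20}
BLANK_TO_DIGIT = {v: k for k, v in DIGIT_TO_BLANK.items()}


def write_digits(data: bytes, positions: list[int], digits: list[int]) -> bytes:
    """Unify every standard blank to NUL, then store one digit per blank, in order."""
    if len(digits) > len(positions):
        raise ValueError("not enough standard blank characters for the watermark")
    out = bytearray(data)
    for p in positions:
        out[p] = DIGIT_TO_BLANK[0]          # unify: all blanks become NUL (0x00)
    for p, d in zip(positions, digits):
        out[p] = DIGIT_TO_BLANK[d]          # digit d -> blank byte from Table 2
    return bytes(out)


def read_digits(data: bytes, positions: list[int]) -> list[int]:
    """Inverse mapping: read one quinary digit back from each blank."""
    return [BLANK_TO_DIGIT[data[p]] for p in positions]
```

Because every substitution replaces one byte with one byte, the file length and all cross-reference offsets stay unchanged. Blanks after the last digit are left as NUL, which is why the extractor needs end-check information to know where the watermark stops.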
If the watermark is so large that not all of its quinary digits can be embedded in the pdf document, the fact that pdf treats a run of blank characters as a single blank character can be exploited instead: the five blank characters can be arranged in 5! = 120 different orders, so the watermark can be encoded in base 120 and the blank characters of the pdf document modified accordingly. Because pdf documents generally contain many standard blank characters, this system mainly adopts the first method: the watermark is encoded in base 5, and the pdf blank characters are modified digit by digit. Watermark extraction is the inverse of embedding: all blank characters are extracted from the pdf in hexadecimal view; a NUL (0x00) sets the watermark digit at that position to '0', a Tab (0x09) sets it to '1', and so on, yielding the quinary data stream of the watermark, which is finally converted back into the text string.
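The base-120 alternative rests on the 5! = 120 orderings of the five blank characters. A minimal sketch of the digit-to-run mapping follows, assuming the permutations are numbered in lexicographic order of their byte values (the text does not fix an order); the names are hypothetical.

```python
from itertools import permutations

BLANKS = (0x00, 0x09, 0x0A, 0x0D, 0x20)        # NUL, TAB, LF, CR, SPACE (sorted)
PERMS = list(permutations(BLANKS))             # 5! = 120 orderings, lexicographic
PERM_INDEX = {p: i for i, p in enumerate(PERMS)}


def digit_to_run(d: int) -> bytes:
    """Base-120 digit -> a run of the five blank characters in a specific order."""
    return bytes(PERMS[d])


def run_to_digit(run: bytes) -> int:
    """Inverse: recover the base-120 digit from a five-byte blank run."""
    return PERM_INDEX[tuple(run)]
```

Each base-120 digit occupies a run of five blank characters. Where the original file has a shorter run, lengthening it changes byte offsets, so the cross-reference table would have to be rebuilt.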
Fig. 1 is a flowchart of the watermark processing method according to the present invention.
As shown in Fig. 1, the method comprises:
Step 101: encode the watermark character-string information into a quinary digit stream, and add quinary start-check information and quinary end-check information to the stream to obtain the quinary embedding information.
Here, the watermark character string may contain any copyright information, such as "Beijing University of Posts and Telecommunications" or "All rights reserved; reproduction of this book will be prosecuted".
Step 102: load the pdf file, unify its standard blank characters in hexadecimal mode, and embed the quinary embedding information into the pdf file by modifying the content of the standard blank characters according to the preset mapping between quinary digit values and the hexadecimal representations of the standard blank characters, where the standard blank characters comprise the null character (NUL), tab, line feed, carriage return, and space;
Step 103: parse the pdf file to obtain the content of the standard blank characters, extract the quinary embedding information according to the preset mapping, extract the quinary digit stream from it according to the quinary start-check and end-check information, and decode the stream into the watermark character-string information.
In one embodiment:
When the quinary start-check and end-check information is added to the quinary digit stream, quinary operating-system information is also added; the method further comprises:
Extracting the quinary operating-system information and, when it is determined to differ from the local operating system, performing a conversion operation on the quinary digit stream.
In one embodiment:
Unifying the standard blank characters of the pdf file in hexadecimal mode comprises: setting every standard blank character of the pdf file to NUL (0x00), TAB (0x09), LF (0x0A), CR (0x0D), or SPACE (0x20).
In one embodiment:
The method further comprises:
Before embedding the quinary embedding information into the pdf file by modifying the standard blank characters, determining whether the number of standard blank characters is greater than or equal to the number of digits of the quinary embedding information; if so, embedding the information by modifying the standard blank characters; if not, exiting the process.
In one embodiment:
The quinary operating-system information indicates at least one of the following operating systems:
Windows, Unix, Linux, FreeBSD, and so on.
In the present invention, the watermark embedding flow mainly comprises a watermark information processing flow and a watermark information embedding flow. Fig. 3 is the pdf watermark extraction flowchart of the present invention.
For a pdf document, the watermark embedding flow is as follows:
(1) First, process the watermark information (for example, the watermark character-string information) as follows:
Encrypt the text watermark information, encode it into a quinary information stream, add the start-check information, the operating-system information of the embedding platform, and the end-check information, and count the number of digits in the resulting quinary watermark data stream.
(2) Parse the pdf document.
(3) In the parsed pdf document, count the standard blank characters, as follows:
Parse the pdf document and operate on the pdf file body: locate the standard blank characters and count all of them. The file body of a pdf document looks like this:
1 0 obj
<</Type/Pages/Count 3/Kids[3 0 R 6 0 R 9 0 R]/MediaBox[0 0 612 792]/MediaBox[0 0 612 792]>>
endobj
2 0 obj
<</Producer(TTKN)/CreationDate(D:20101210203208-08'00')/Author(CNKI)/Creator(ReaderEx_DIS 2.0.0Build 3356)/ModDate(D:20130318154548+08'00')>>
endobj
3 0 obj
<</Contents 4 0 R/Type/Page/Parent 1 0 R/Rotate 0/MediaBox[0 0 595 842]/CropBox[0 0 595 842]/Resources 5 0 R>>
endobj
4 0 obj
[62 0 R 63 0 R 64 0 R 65 0 R 66 0 R 67 0 R 68 0 R 69 0 R]
endobj
……
In the example above, Kids[3 0 R 6 0 R 9 0 R] contains 10 standard blank characters, MediaBox[0 0 612 792] contains 4, the second MediaBox[0 0 612 792] also contains 4, and so on; in this way the standard blank characters in the file body of the whole pdf document are counted.
(4) Count all standard blank characters in the pdf document.
(5) Determine whether the number of standard blank characters is greater than the number of digits in the quinary watermark data stream; if so, continue with step (6); otherwise, embedding fails.
(6) Unify all standard blank characters in the pdf document to NUL (0x00) (any of the other four, TAB (0x09), LF (0x0A), CR (0x0D), or SPACE (0x20), could also be used), then modify the standard blank characters in the pdf document according to the quinary watermark data stream until the whole stream has been embedded. The specific operations are as follows:
First, all blank characters are unified to NUL (0x00). Any of the other four characters could be used instead, but the choice must be consistent; here NUL (0x00) is used.
Second, the standard blank characters are modified according to the quinary watermark stream, based on the preset mapping between quinary digit values and the hexadecimal representations of the standard blank characters.
As an example, the modification can follow Table 2:
Table 2: Modification of standard blank characters
Current quinary watermark digit    Standard blank character modification
0    0x00→0x00
1    0x00→0x09
2    0x00→0x0A
3    0x00→0x0D
4    0x00→0x20
The modification scheme above is only an example; those skilled in the art will understand that it is not intended to limit the scope of protection of the present invention.
Finally, the above operation is applied to every digit of the quinary watermark information until all of it is embedded in the pdf document, yielding a new pdf document carrying the watermark.
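Steps (1) and (4) to (6) of the embedding flow can be sketched end to end as follows. The start/end check digits, the one-digit operating-system code, and the fixed four-digits-per-byte encoding are all hypothetical choices (the text does not define them), encryption is omitted, and the blank positions are assumed to come from a scanner that already skips strings, comments, and streams.

```python
import platform

# Hypothetical framing constants; the patent does not publish its check values.
START = [0, 1, 0, 1, 0, 1]                          # start-check digits
END = [4, 3, 4, 3]                                  # end-check digits
OS_CODES = {"Windows": 0, "Linux": 1, "FreeBSD": 2}  # anything else -> 3 ("other Unix")
DIGIT_TO_BLANK = (0x00, 0x09, 0x0A, 0x0D, 0x20)     # Table 2: digit -> blank byte


def byte_to_quinary(b: int) -> list[int]:
    """Four base-5 digits, most significant first (0..255 -> 0000..2010)."""
    return [b // 125, (b // 25) % 5, (b // 5) % 5, b % 5]


def build_frame(text: str, os_name: str) -> list[int]:
    """Step (1): start check + OS digit + payload + end check."""
    payload = [d for b in text.encode("utf-8") for d in byte_to_quinary(b)]
    return START + [OS_CODES.get(os_name, 3)] + payload + END


def embed(body: bytes, positions: list[int], text: str, os_name: str = "") -> bytes:
    frame = build_frame(text, os_name or platform.system())
    if len(positions) < len(frame):               # step (5): capacity check
        raise ValueError(f"need {len(frame)} blank characters, found {len(positions)}")
    out = bytearray(body)
    for p in positions:                           # step (6): unify every blank to NUL
        out[p] = 0x00
    for p, d in zip(positions, frame):            # then one quinary digit per blank
        out[p] = DIGIT_TO_BLANK[d]
    return bytes(out)
```

The end marker here uses only the digits 3 and 4. Each payload group encodes a byte value of at most 255 (2010 in base 5) and so starts with 0, 1, or 2, which means an extractor that checks for the marker at group boundaries cannot confuse it with payload.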
In the present invention, the PDF watermark extraction procedure mainly consists of a watermark information extraction flow and a watermark information processing flow. Fig. 4 is a structural diagram of the watermark processing system according to the present invention.
The PDF watermark extraction procedure of the present invention is as follows:
(1): Parse the PDF document and obtain the information stream carried by all standard blank characters, as follows:
Scan all standard blank characters in the body of the PDF file and, from the content of each one, record one digit of the watermark information.
For example, the mapping corresponding to Table 2 is shown in Table 3.
Table 3: Watermark information generation
Standard blank character content | Current quinary watermark digit
0x00 (NUL) | 0
0x09 (HT) | 1
0x0A (LF) | 2
0x0D (CR) | 3
0x20 (SP) | 4
The quinary watermark information stream is obtained using the correspondence in the table above.
(2): Determine whether the standard blank character information stream contains both the start check bits and the end check bits; if so, go to step (3); otherwise the PDF document carries no watermark and extraction ends;
(3): Extract the valid watermark information data stream (from the start check bits to the end check bits);
(4): From the extracted data stream, determine the operating system that was used when the watermark was embedded;
(5): Process the extracted data stream by removing all check bits, leaving a data stream that contains only the watermark information;
(6): Decode the watermark information data stream, as follows:
When decoding the quinary watermark information, first check whether the operating systems used for embedding and for extraction are the same. If they are, decoding directly yields the text watermark information; otherwise a further decoding conversion is needed to obtain the watermark text, which avoids garbled characters caused by the mismatch between systems.
(7): Decrypt the watermark information to obtain the text watermark information.
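Extraction steps (1), (2), (3) and (5) can be sketched in Python as below. This is a hypothetical illustration, not the patent's code: the start and end check sequences (`START`, `END`) are placeholder values because this section does not specify them, and decryption (step 7) and OS handling (step 4) are left out. Table 3 supplies the byte-to-digit mapping.

```python
from typing import Optional

# Hypothetical sketch of extraction steps (1)-(3) and (5). START and END are
# placeholder check sequences; the patent does not give their values here.

# Table 3: standard blank character -> quinary digit
BLANK_TO_DIGIT = {0x00: 0, 0x09: 1, 0x0A: 2, 0x0D: 3, 0x20: 4}

START = [4, 4, 4, 3]  # hypothetical start check digits
END = [3, 4, 4, 4]    # hypothetical end check digits


def read_digit_stream(body: bytes) -> list[int]:
    """Step (1): map each standard blank character to one quinary digit."""
    return [BLANK_TO_DIGIT[b] for b in body if b in BLANK_TO_DIGIT]


def find_pattern(seq: list[int], pattern: list[int], start: int = 0) -> int:
    """Index of the first occurrence of pattern in seq at or after start, else -1."""
    n = len(pattern)
    for i in range(start, len(seq) - n + 1):
        if seq[i:i + n] == pattern:
            return i
    return -1


def extract_payload(body: bytes) -> Optional[list[int]]:
    """Steps (2), (3), (5): locate the check digits and strip them off."""
    stream = read_digit_stream(body)
    s = find_pattern(stream, START)
    if s < 0:
        return None  # no start check bits: the document carries no watermark
    e = find_pattern(stream, END, s + len(START))
    if e < 0:
        return None  # no end check bits: treat the document as unwatermarked
    return stream[s + len(START):e]
```

Returning `None` mirrors step (2), where the absence of check bits ends extraction with the conclusion that the document has no watermark.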
Based on the above analysis, Fig. 4 is a structural diagram of the watermark processing system according to the present invention.
As shown in Fig. 4, the system comprises a coding module, a merge module, and an extraction module, wherein:
Coding module: used to encode the watermark character string into a quinary digit stream and to add quinary start check information and quinary end check information to the quinary digit stream, thereby obtaining the quinary embedding information;
Merge module: used to load a PDF file, unify the standard blank characters of the PDF file in hexadecimal mode, and embed the quinary embedding information into the PDF file by modifying the contents of the standard blank characters according to a preset mapping between quinary digit values and the hexadecimal representations of the standard blank characters; wherein the standard blank characters comprise the null character (NUL), tab, line feed, carriage return, and space;
Extraction module: used to parse the PDF file to obtain the contents of the standard blank characters, extract the quinary embedding information according to the preset mapping, extract the quinary digit stream from the quinary embedding information according to the quinary start check information and quinary end check information, and decode the quinary digit stream into the watermark character string.
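The coding module's string-to-quinary step, and the matching final decode in the extraction module, can be sketched as follows. The section does not state how characters become quinary digits, so this sketch assumes UTF-8 bytes with each byte written as exactly 4 base-5 digits (5^4 = 625 ≥ 256); the check sequences `START`/`END` and all function names are hypothetical.

```python
# Hypothetical sketch of the coding module and the final decoding step.
# Assumptions (not specified in this section): the watermark string is
# encoded as UTF-8, each byte becomes exactly 4 quinary digits
# (5**4 = 625 >= 256), and START/END are placeholder check sequences.

START = [4, 4, 4, 3]  # hypothetical start check digits
END = [3, 4, 4, 4]    # hypothetical end check digits


def byte_to_quinary(value: int) -> list[int]:
    """Convert one byte (0-255) to 4 base-5 digits, most significant first."""
    digits = []
    for _ in range(4):
        digits.append(value % 5)
        value //= 5
    return digits[::-1]


def encode_watermark(text: str) -> list[int]:
    """Coding module: string -> quinary digit stream wrapped in check info."""
    payload = [d for b in text.encode("utf-8") for d in byte_to_quinary(b)]
    return START + payload + END


def decode_payload(digits: list[int]) -> str:
    """Extraction module: quinary digits (check info removed) -> string."""
    data = bytearray()
    for i in range(0, len(digits), 4):
        value = 0
        for d in digits[i:i + 4]:
            value = value * 5 + d  # rebuild the byte from its base-5 digits
        data.append(value)
    return data.decode("utf-8")
```

The placeholder markers were chosen so they cannot collide with the payload: the largest byte, 255, is 2010 in base 5, so the leading digit of every 4-digit group is at most 2, and any 4 consecutive payload digits include one such leading digit, whereas `START` and `END` contain only 3s and 4s.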
In one embodiment:
Coding module: further used to add quinary operating system information when adding the quinary start check information and quinary end check information to the quinary digit stream;
Extraction module: further used to extract the quinary operating system information and, upon determining that it differs from its own operating system information, to perform a conversion operation on the quinary digit stream.
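This embodiment does not define the "conversion operation". One plausible reading, consistent with the goal of avoiding garbled characters stated in extraction step (6), is a text-encoding conversion. The sketch below assumes, purely for illustration, that one quinary digit identifies the embedding OS and that Windows used GBK while Unix, Linux and FreeBSD used UTF-8; the digit assignments, encodings, and the name `convert_payload` are all hypothetical.

```python
# Hypothetical sketch of the operating-system embodiment. Assumptions: one
# quinary digit identifies the embedding OS, and the "conversion operation"
# transcodes the decoded watermark bytes from the embedding OS's text
# encoding to the local one. Neither detail is specified by the patent.

OS_CODES = {0: "Windows", 1: "Unix", 2: "Linux", 3: "FreeBSD"}  # hypothetical
OS_ENCODING = {"Windows": "gbk", "Unix": "utf-8", "Linux": "utf-8", "FreeBSD": "utf-8"}


def convert_payload(payload: bytes, embed_os_digit: int, local_os: str) -> bytes:
    """Return watermark bytes in the local OS's encoding."""
    embed_enc = OS_ENCODING[OS_CODES[embed_os_digit]]
    local_enc = OS_ENCODING[local_os]
    if embed_enc == local_enc:
        return payload  # systems agree: decode directly, no conversion
    # Systems disagree: transcode so the local decoder does not produce mojibake.
    return payload.decode(embed_enc).encode(local_enc)


if __name__ == "__main__":
    gbk_bytes = "水印".encode("gbk")  # watermark as embedded on Windows
    print(convert_payload(gbk_bytes, 0, "Linux").decode("utf-8"))  # 水印
```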
In one embodiment:
Merge module: used to unify the standard blank characters of the PDF file in hexadecimal mode to NUL (0x00), HT (0x09), LF (0x0A), CR (0x0D), or SP (0x20).
In one embodiment:
Merge module: further used, before embedding the quinary embedding information into the PDF file by modifying the contents of the standard blank characters, to determine whether the number of standard blank characters is greater than or equal to the number of digits of the quinary embedding information; if so, the quinary embedding information is embedded into the PDF file by modifying the contents of the standard blank characters; if not, the process exits.
In one embodiment:
The quinary operating system information identifies at least one of the following operating systems: Windows, Unix, Linux, FreeBSD, etc.
Media organizations, institutions, and individuals can use this patent to protect their PDF documents. For example, when an organization publishes a patent-related PDF document, it can process the document with this system to embed key information such as the organization's name and the responsible person. The document shows no visible change, while the organization's patent is protected.
Fig. 5 is a structural diagram of the PDF digital watermarking system according to the present invention; Fig. 6 is a schematic diagram of a user embedding a watermark according to the present invention; Fig. 7 is a schematic diagram of a user extracting a watermark according to the present invention. Figs. 6 and 7 show the flow of a user actually embedding and extracting a watermark.
Compared with other current techniques, the advantages and effects of this patent include:
(1): Large watermark capacity
First, a PDF document contains a large number of standard blank characters, so the document itself can hold a large amount of information. Second, the watermark information is encoded as a quinary information stream, which greatly shortens the watermark data stream compared with a conventional binary stream and further increases the amount of watermark information that can be carried. Taken together, these two points give this system a much larger embedding capacity than other watermark embedding methods.
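The capacity argument can be checked with a little arithmetic. Assuming fixed-width coding of each byte (an assumption; the patent does not fix the width here), a byte needs 8 binary digits but only 4 quinary digits, because 5^3 = 125 < 256 ≤ 5^4 = 625; equivalently, each blank character that can take one of five values carries log2 5 ≈ 2.32 bits instead of 1. The helper name `digits_per_byte` is hypothetical.

```python
import math


def digits_per_byte(base: int) -> int:
    """Smallest k with base**k >= 256, i.e. base-`base` digits needed per byte."""
    k = 1
    while base ** k < 256:
        k += 1
    return k


if __name__ == "__main__":
    for base in (2, 5):
        # bits carried by one blank character, and carriers needed per byte
        print(f"base {base}: {math.log2(base):.2f} bits/symbol, "
              f"{digits_per_byte(base)} digits/byte")
    # base 2: 1.00 bits/symbol, 8 digits/byte
    # base 5: 2.32 bits/symbol, 4 digits/byte
```

Under this assumption a quinary watermark occupies half as many blank characters as a binary one of the same text.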
(2): Good concealment
Because a PDF document treats the five blank characters above identically, there is no visible change before and after the watermark is embedded. The watermarked PDF documents obtained by this system are therefore better concealed than those obtained by other methods.
(3): Easy to operate
The embedding and extraction processes of this system are easy to implement and save time, whereas some schemes consume considerable material and human resources. User operation is also simple: to embed a watermark, the user only needs to enter the path of the PDF document to be watermarked, the watermark information, and the path for the new watermarked PDF (if none is entered, a default path is used); to extract a watermark, the user only needs to enter the path of the PDF document whose watermark is to be extracted.
(4): Support for multiple operating systems
Compared with other watermark embedding systems, this system supports most current mainstream operating systems, including the Windows series, Unix, Linux, FreeBSD, etc.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A watermark processing method, characterized in that the method comprises:
Encoding a watermark character string into a quinary digit stream, and adding quinary start check information and quinary end check information to the quinary digit stream, thereby obtaining quinary embedding information;
Loading a PDF file, unifying the standard blank characters of the PDF file in hexadecimal mode, and embedding the quinary embedding information into the PDF file by modifying the contents of the standard blank characters according to a preset mapping between quinary digit values and the hexadecimal representations of the standard blank characters; wherein the standard blank characters comprise the null character (NUL), tab, line feed, carriage return, and space;
Parsing the PDF file to obtain the contents of the standard blank characters, extracting the quinary embedding information according to the preset mapping, extracting the quinary digit stream from the quinary embedding information according to the quinary start check information and quinary end check information, and decoding the quinary digit stream into the watermark character string.
2. The watermark processing method according to claim 1, characterized in that, when the quinary start check information and quinary end check information are added, quinary operating system information is further added to the quinary digit stream; the method further comprises:
Further extracting the quinary operating system information and, upon determining that it differs from the local operating system information, performing a conversion operation on the quinary digit stream.
3. The watermark embedding method according to claim 1, characterized in that unifying the standard blank characters of the PDF file in hexadecimal mode comprises: unifying the standard blank characters of the PDF file in hexadecimal mode to NUL (0x00), HT (0x09), LF (0x0A), CR (0x0D), or SP (0x20).
4. The watermark embedding method according to claim 1, characterized in that the method further comprises:
Before embedding the quinary embedding information into the PDF file by modifying the contents of the standard blank characters, further determining whether the number of standard blank characters is greater than or equal to the number of digits of the quinary embedding information; if so, embedding the quinary embedding information into the PDF file by modifying the contents of the standard blank characters; if not, exiting the process.
5. The watermark embedding method according to any one of claims 1-4, characterized in that the quinary operating system information comprises at least one of the following operating systems:
Windows, Unix, Linux, or FreeBSD.
6. A watermark processing system, characterized in that it comprises a coding module, a merge module, and an extraction module, wherein:
Coding module: used to encode the watermark character string into a quinary digit stream and to add quinary start check information and quinary end check information to the quinary digit stream, thereby obtaining the quinary embedding information;
Merge module: used to load a PDF file, unify the standard blank characters of the PDF file in hexadecimal mode, and embed the quinary embedding information into the PDF file by modifying the contents of the standard blank characters according to a preset mapping between quinary digit values and the hexadecimal representations of the standard blank characters; wherein the standard blank characters comprise the null character (NUL), tab, line feed, carriage return, and space;
Extraction module: used to parse the PDF file to obtain the contents of the standard blank characters, extract the quinary embedding information according to the preset mapping, extract the quinary digit stream from the quinary embedding information according to the quinary start check information and quinary end check information, and decode the quinary digit stream into the watermark character string.
7. The watermark processing system according to claim 6, characterized in that:
Coding module: further used to add quinary operating system information when adding the quinary start check information and quinary end check information to the quinary digit stream;
Extraction module: further used to extract the quinary operating system information and, upon determining that it differs from its own operating system information, to perform a conversion operation on the quinary digit stream.
8. The watermark processing system according to claim 6, characterized in that:
Merge module: used to unify the standard blank characters of the PDF file in hexadecimal mode to NUL (0x00), HT (0x09), LF (0x0A), CR (0x0D), or SP (0x20).
9. The watermark processing system according to claim 6, characterized in that:
Merge module: further used, before embedding the quinary embedding information into the PDF file by modifying the contents of the standard blank characters, to determine whether the number of standard blank characters is greater than or equal to the number of digits of the quinary embedding information; if so, the quinary embedding information is embedded into the PDF file by modifying the contents of the standard blank characters; if not, the process exits.
10. The watermark processing system according to any one of claims 6-9, characterized in that the quinary operating system information comprises at least one of the following operating systems:
Windows, Unix, Linux, or FreeBSD.
CN201410403059.5A 2014-08-15 Watermark processing method and watermark processing system Active CN104134023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410403059.5A CN104134023B (en) 2014-08-15 Watermark processing method and watermark processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410403059.5A CN104134023B (en) 2014-08-15 Watermark processing method and watermark processing system

Publications (2)

Publication Number Publication Date
CN104134023A true CN104134023A (en) 2014-11-05
CN104134023B CN104134023B (en) 2017-01-04


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780280A (en) * 2016-11-30 2017-05-31 深圳Tcl数字技术有限公司 Digital watermarking encryption method and device
CN108830772A (en) * 2018-05-25 2018-11-16 珠海奔图电子有限公司 Watermark encoder conversion method and device
CN110609809A (en) * 2019-09-23 2019-12-24 中国银行股份有限公司 Method and device for acquiring digital file
US20220019697A1 (en) * 2020-07-16 2022-01-20 Humanscape Inc. System for embedding digital verification fingerprint and method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050053258A1 (en) * 2000-11-15 2005-03-10 Joe Pasqua System and method for watermarking a document
CN102646179A (en) * 2012-02-27 2012-08-22 中山大学 PDF (Portable Document Format) document information embedding and extraction method based on PDF documents
CN103530574A (en) * 2013-09-23 2014-01-22 中山大学 Method for inserting and extracting hidden information based on English PDF document
CN103577729A (en) * 2013-10-31 2014-02-12 北京锐安科技有限公司 Method for stamping electronic seal on PDF (portable document format) file

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
谭国律: "A Digital Watermarking Algorithm for PDF Documents", Computer Engineering and Applications *
钟征燕 et al.: "Digital Watermarking Algorithm Based on PDF Document Structure", Journal of Computer Applications *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780280A (en) * 2016-11-30 2017-05-31 深圳Tcl数字技术有限公司 Digital watermarking encryption method and device
WO2018098879A1 (en) * 2016-11-30 2018-06-07 深圳Tcl数字技术有限公司 Method and device for encrypting digital watermark
CN108830772A (en) * 2018-05-25 2018-11-16 珠海奔图电子有限公司 Watermark encoder conversion method and device
CN110609809A (en) * 2019-09-23 2019-12-24 中国银行股份有限公司 Method and device for acquiring digital file
US20220019697A1 (en) * 2020-07-16 2022-01-20 Humanscape Inc. System for embedding digital verification fingerprint and method thereof
US11836274B2 (en) * 2020-07-16 2023-12-05 Humanscape Inc. System for embedding digital verification fingerprint and method thereof

Similar Documents

Publication Publication Date Title
US10482222B2 (en) Methods, apparatus, and articles of manufacture to encode auxiliary data into text data and methods, apparatus, and articles of manufacture to obtain encoded data from text data
CN102360413B (en) Steganographic method with misguiding function of controllable secret key sequence
Taleby Ahvanooey et al. A comparative analysis of information hiding techniques for copyright protection of text documents
CN107330306B (en) Text watermark embedding and extracting method and device, electronic equipment and storage medium
Roy et al. A novel approach to format based text steganography
CN102096787B (en) Method and device for hiding information based on word2007 text segmentation
Mali et al. Implementation of text watermarking technique using natural language watermarks
CN103761459B (en) Method and device for embedding and extracting multiple digital watermarks in a document
Kaur et al. An existential review on text watermarking techniques
Singh et al. A survey on text based steganography
CN103544408A (en) Method for embedment and extraction of PDF document hidden information according to composite font
Stojanov et al. A new property coding in text steganography of Microsoft Word documents
Chen et al. Text watermarking algorithm based on semantic role labeling
CN110874456B (en) Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method
Myers et al. Signal separation for nonlinear dynamical systems
Khairullah et al. Steganography in bengali unicode text
CN109800547B (en) Method for quickly embedding and extracting information for WORD document protection and distribution tracking
CN102682248B (en) Watermark embedding and extracting method for ultrashort Chinese text
Zhang et al. New digital text watermarking algorithm based on new-defined characters
Chao et al. Information hiding in text using typesetting tools with stego-encoding
CN115048665A (en) Excel file-based information hiding method, device, equipment and storage medium
CN104134023B (en) Watermark processing method and watermark processing system
CN104134023A (en) Watermark processing method and system
CN114091080A (en) Subtitle file encryption and decryption method, system, storage medium and electronic equipment
CN110008663B (en) Method for quickly embedding and extracting information for PDF document protection and distribution tracking

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant