CN101751656A - Watermark embedding and extraction method and device - Google Patents

Info

Publication number
CN101751656A
CN101751656A CN200810240483A CN200810240483A CN101751656A CN 101751656 A CN101751656 A CN 101751656A CN 200810240483 A CN200810240483 A CN 200810240483A CN 200810240483 A CN200810240483 A CN 200810240483A CN 101751656 A CN101751656 A CN 101751656A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200810240483A
Other languages
Chinese (zh)
Other versions
CN101751656B (en)
Inventor
康凯
于权
崔晓瑜
吴於茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Founder Electronics Chief Information Technology Co ltd
New Founder Holdings Development Co ltd
Peking University
Original Assignee
BEIJING FOUNDER E-GOVERNMENT INFORMATION TECHNOLOGY Co Ltd
Peking University
Peking University Founder Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING FOUNDER E-GOVERNMENT INFORMATION TECHNOLOGY Co Ltd, Peking University, Peking University Founder Group Co Ltd
Priority to CN2008102404837A
Publication of CN101751656A
Application granted
Publication of CN101751656B
Legal status: Expired - Fee Related

Landscapes

  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses a watermark embedding and extraction method and device for use in the copyright management of digital content. The method comprises: determining, according to a set region determination rule, the available punctuation regions in a text document in which information is to be embedded; determining, according to a set position determination rule, the original position of the punctuation mark in each available punctuation region; and, for each available punctuation region, adjusting the position of the punctuation mark according to its original position and the corresponding code to be embedded, so that the code is embedded in that region. During watermark extraction, the corresponding rules are used to extract the embedded code from each available punctuation region. The method is simple to operate, the embedded watermark information is well hidden and highly stable, and a very good visual effect is achieved.

Description

Watermark embedding and extraction method and device
Technical field
The present invention relates to the field of digital rights management, and in particular to a punctuation-based text digital watermark embedding and extraction method and device for digital text.
Background art
With the accelerating digitization of information worldwide, text information has emerged in large quantities: personal profiles, medical records, academic certificates, patent certificates, handwritten signatures, library collections, confidential documents and so on are all common forms of text, and the importance of such text information is self-evident. In addition, with the growing popularity of e-commerce and e-government, there is an increasingly urgent need to trace piracy of electronic publications distributed online and to verify the authenticity and integrity of electronic letters, official documents, and faxes that are exchanged. Embedding a watermark (that is, adding additional information) in text information in order to protect property rights and maintain information security is therefore particularly important.
The text information described above has two common carrier formats: paper and electronic documents. There are likewise two ways of embedding a watermark: explicit and implicit. Explicit embedding means the added information is plainly visible to the human eye, for example a translucent image watermark, a pattern printed on the background, or an added bar code. Implicit embedding means the added information is difficult for the human eye to notice; for example, the various implicit image watermarks can be identified only with a specific instrument combined with corresponding software. Among the combinations of these cases, implicitly embedding a watermark in a paper document is relatively difficult, and it is most difficult in a black-and-white bi-level monochrome paper document (hereinafter, bi-level paper). Information is most commonly embedded in paper documents explicitly, for example by adding a specific background image to the paper, by writing specific text on it (a sample document that a bank shows to a customer may have the word "sample" written in certain areas), or by using special paper as the carrier. Embedding information implicitly in bi-level paper, for example in an ordinary "black on white" document, is however very difficult.
Some techniques for embedding digital watermarks in bi-level paper have appeared in the prior art.
For example, the technical literature "Brassil J, Low S, Maxemchuk N F. Copyright Protection for the Electronic Distribution of Text Documents. Proceedings of the IEEE, 1999, 87(7): 1181-1196" discloses adjusting the typesetting format of a document by modifying line spacing and word spacing in order to embed information. The principle of this method is simple and it is easy to implement. Its main problem is that the conflict between the stability of the embedded information and the visual effect is hard to resolve: if the change in line or word spacing is too small, accurate extraction is difficult; if the change is large enough to guarantee accurate extraction, it is easily noticed by the reader and the "implicit" effect is lost.
The patent application "Method for encrypting electronic official documents or files and distinguishing genuine from forged ones" (publication number CN1588351) discloses applying slight deformations to ordinary Chinese characters. The deformations are hard for the human eye to notice but can be identified manually or by OCR, so information is embedded implicitly. This method embeds a relatively large amount of information with high stability, but it involves font libraries, OCR, printed output, a specific electronic text format (for printing), and data training; its operation is cumbersome, its workload is large, and its production cost is very high.
The patent application "Text digital watermarking technique based on character topology" (publication number CN1684115) discloses a method that changes the topology of a character by changing the connection or disconnection relationships between the strokes that make up the character (string), so that the number of connected components formed by the strokes of the modified character changes, thereby embedding information. The main drawback of this method is, first, its high production cost. In addition, its robustness during detection is unsatisfactory: slight soiling of the paper may join strokes that were originally disconnected, while photocopying can easily break strokes that were originally connected. Either effect changes the connected components, so detection based on their number becomes very unstable.
The patent application "A digital watermark embedding and extraction method and device" (publication number CN1945622A) discloses dividing each character in the text into regions and flipping points in each region to change the number of black pixels in the region, thereby embedding information. Embedding a watermark this way has a high production cost, and, because of constraints imposed by visual characteristics, the flipped points can lie only on stroke edges, which in effect changes the stroke thickness; the visual effect is therefore also less than ideal.
As can be seen, the prior-art methods for encrypting documents (embedding watermarks) suffer from a large volume of embedded data, a heavy workload, cumbersome production, and high production cost. Their stability is also poor and is hard to maintain through photocopying, and the embedded information is easily noticed by the reader, so both concealment and visual effect are poor.
Summary of the invention
Embodiments of the invention provide a watermark embedding and extraction method and device to solve the prior-art problems of poor stability and poor concealment when watermark information is embedded in a text document.
A watermark embedding method comprises:
determining, according to a set region determination rule, the available punctuation regions in a text document in which information is to be embedded;
determining, according to a set position determination rule, the original position of the punctuation mark in each available punctuation region; and
for each available punctuation region, adjusting the position of the punctuation mark in it according to the mark's original position and the corresponding code to be embedded, so that the code is embedded in that region.
A watermark extraction method comprises:
determining, according to the same region determination rule used when the watermark information was embedded, the available punctuation regions in a text document in which information has been embedded;
determining, according to the same position determination rule used when the watermark information was embedded, the position of the punctuation mark in each available punctuation region; and
determining the embedded code of each available punctuation region according to the position of its punctuation mark.
A watermark embedding device comprises:
a region determination module, configured to determine, according to a set region determination rule, the available punctuation regions in a text document in which information is to be embedded;
a position determination module, configured to determine, according to a set position determination rule, the original position of the punctuation mark in each available punctuation region; and
an information embedding module, configured to adjust, for each available punctuation region, the position of the punctuation mark in it according to the mark's original position and the corresponding code to be embedded, so that the code is embedded in that region.
A watermark extraction device comprises:
a region determination module, configured to determine, according to the same region determination rule used when the watermark information was embedded, the available punctuation regions in a text document in which information has been embedded;
a position determination module, configured to determine, according to the same position determination rule used when the watermark information was embedded, the position of the punctuation mark in each available punctuation region; and
a code extraction module, configured to determine the embedded code of each available punctuation region according to the position of its punctuation mark.
The watermark embedding and extraction method and device provided by the embodiments of the invention select available punctuation regions and adjust the position of the punctuation mark in each region, so that the code to be embedded is embedded in the corresponding region. When the watermark is extracted, the corresponding rules are used to extract the embedded code of each region from the adjusted punctuation positions. The method is simple to operate. Moreover, because the human eye is far less sensitive to changes in punctuation position than to changes in character position, relatively large changes can be made, so the embedded watermark information is highly stable and well hidden while a good visual effect is maintained.
Brief description of the drawings
Fig. 1 is a flowchart of the watermark embedding method in an embodiment of the invention;
Fig. 2 is an example of determining the available punctuation regions in a document fragment in an embodiment of the invention;
Fig. 3 is a schematic diagram of dividing the determined available punctuation regions into frequency bands in an embodiment of the invention;
Fig. 4 is an example of a text fragment after information has been embedded in the available punctuation regions in an embodiment of the invention;
Fig. 5 is a flowchart of the watermark extraction method in an embodiment of the invention;
Fig. 6 is a structural diagram of the watermark embedding device in an embodiment of the invention;
Fig. 7 is a structural diagram of the watermark extraction device in an embodiment of the invention.
Detailed description
In the watermark embedding and extraction method provided by the embodiments of the invention, available punctuation regions are selected, according to a set region selection rule, in the text document in which watermark information is to be embedded; the position of the punctuation mark in each available punctuation region is determined; and the watermark information is then embedded by adjusting the position of the punctuation mark in each region. During extraction, the same rules are used to determine the available punctuation regions in the watermarked text document and the position of the punctuation mark in each region, and the embedded watermark information is obtained from those positions.
The watermark embedding method provided by the embodiments of the invention embeds watermark information by adjusting the positions of punctuation marks in a text document. Its flowchart is shown in Fig. 1, and it comprises the following steps:
S101: Find and determine the available punctuation regions in the text document in which information is to be embedded, according to the set region selection rule.
First, OCR is used to perform page layout analysis on the text document in which information is to be embedded, and non-text features such as frames, table lines, images, and decorative borders are removed to obtain the plain-text regions.
Character segmentation and punctuation analysis are then performed on the plain-text regions to find all available punctuation marks and determine the available punctuation regions.
An available punctuation mark is one that has at least one other character both before and after it, i.e. its position satisfies the relation "other character, punctuation mark, other character". Punctuation marks that do not satisfy this condition are discarded. For example, a punctuation mark at the end of a text line has no other character after it, so it does not qualify and is discarded. Here, "other character" covers every symbol other than punctuation, such as Chinese characters, digits, and letters.
A start boundary and an end boundary are defined from the available punctuation mark and the two adjacent characters before and after it, giving the available punctuation region. The start boundary may be the left boundary, right boundary, center of gravity, or center of the preceding character, and the end boundary may be the left boundary, right boundary, center of gravity, or center of the following character.
For example, in the text document shown in Fig. 2, the available punctuation marks include the "," between "show" and "be", the "." between "figure" and "First", and the "," between "earlier" and "literary composition", seven in all. The start boundary of each region is taken as the left boundary of the preceding character and the end boundary as the right boundary of the following character, giving the seven available punctuation regions shown in Fig. 2, such as "show, be", "figure. First", and "earlier, literary composition".
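The selection rule of step S101 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `Glyph` record, the `PUNCTUATION` set, and the function name `find_available_regions` are hypothetical and do not come from the patent, and the glyph boundaries are assumed to have been produced already by OCR and character segmentation. The boundary choice follows the Fig. 2 example (left boundary of the preceding character, right boundary of the following character).

```python
from dataclasses import dataclass

# Hypothetical punctuation set; a real system would rely on the OCR engine's
# own character classification.
PUNCTUATION = set(",.;:!?，。；：！？、")


@dataclass
class Glyph:
    char: str
    left: float   # x coordinate of the glyph's left boundary
    right: float  # x coordinate of the glyph's right boundary


def find_available_regions(line):
    """Return (start, end, index) for every punctuation mark on one text line
    that has a non-punctuation character directly before and after it."""
    regions = []
    for i in range(1, len(line) - 1):
        prev, cur, nxt = line[i - 1], line[i], line[i + 1]
        if (cur.char in PUNCTUATION
                and prev.char not in PUNCTUATION
                and nxt.char not in PUNCTUATION):
            # Start boundary: left edge of the preceding character.
            # End boundary: right edge of the following character.
            regions.append((prev.left, nxt.right, i))
    return regions


# The comma qualifies; the full stop at the end of the line does not.
line = [Glyph("a", 0, 10), Glyph(",", 11, 14), Glyph("b", 16, 26), Glyph(".", 27, 30)]
print(find_available_regions(line))
```

The index returned with each region identifies the punctuation glyph, so later steps can look up its pixels and move it.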
S102: Determine the original position of the punctuation mark in each available punctuation region, according to the set position determination rule. This comprises:
(1) Dividing each available punctuation region into several frequency bands.
The length of each available punctuation region, i.e. the distance from its start boundary to its end boundary, is calculated according to the set boundary rule. Each region is then divided evenly along that length into several parts, each of which is one frequency band. If a region is divided into k parts, the band indices from the first frequency band to the last are 0, 1, 2, ..., k-1.
Continuing the example above, each of the seven available punctuation regions determined in Fig. 2 is divided into frequency bands, for example 16 bands per region; the result is shown in Fig. 3.
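The even division into k frequency bands might be sketched as below. The function name `band_edges` and the sample coordinates are assumptions for illustration; only the even split and the default of 16 bands come from the text.

```python
def band_edges(start, end, k=16):
    """Split the region [start, end] into k equal frequency bands and return
    the k + 1 edge coordinates; band i spans edges[i] to edges[i + 1]."""
    if k <= 0 or end <= start:
        raise ValueError("need k > 0 and end > start")
    width = (end - start) / k
    return [start + i * width for i in range(k + 1)]


edges = band_edges(0, 32, k=16)
print(edges[6], edges[7])  # coordinate range covered by band index 6
```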
(2) Determining, from the coordinate position of the punctuation mark in each available punctuation region, the frequency band in which the mark lies and the corresponding band index.
The position of a punctuation mark can be represented by a position parameter such as its center of gravity, center, left boundary, or right boundary; the frequency band containing that position, and the corresponding band index, are then determined.
When the frequency band is determined from the center of gravity, the coordinate of the center of gravity must be calculated first. The formula for the barycentric coordinate of a punctuation mark is as follows:
$$x_C = \frac{\sum_i x_i \, \Delta S_i}{S}$$
where $\Delta S_i$ is the number of black pixels of the punctuation mark whose horizontal coordinate is $x_i$;
$x_i$ is any horizontal coordinate value;
$S$ is the total number of black pixels of the punctuation mark;
$x_C$ is the barycentric coordinate of the punctuation mark.
The frequency band containing the punctuation mark is then obtained from the calculated coordinate: using the coordinate range covered by each frequency band, determine which band the barycentric coordinate $x_C$ falls in and the corresponding band index.
For example, for the first region "show, be" in Fig. 3, after the horizontal barycentric coordinate of the comma is calculated, the frequency band containing it is found to be the band with band index 6.
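A minimal sketch of the center-of-gravity calculation and the band lookup follows. The function names, the dictionary representation of the pixel columns, and the sample pixel counts are assumptions made for illustration; the values are chosen so that the comma lands in band index 6, as in the Fig. 3 example.

```python
def horizontal_centroid(column_counts):
    """Center of gravity x_C of a punctuation mark.

    column_counts maps a horizontal coordinate x_i to the number of black
    pixels of the mark in that column (Delta S_i in the formula above)."""
    total = sum(column_counts.values())  # S, the total number of black pixels
    if total == 0:
        raise ValueError("punctuation mark has no black pixels")
    return sum(x * n for x, n in column_counts.items()) / total


def band_index(x, start, end, k=16):
    """Index (0 .. k-1) of the frequency band of [start, end] containing x."""
    if not start <= x <= end:
        raise ValueError("x lies outside the region")
    idx = int((x - start) * k / (end - start))
    return min(idx, k - 1)  # x == end is counted in the last band


comma = {12: 2, 13: 4, 14: 2}       # hypothetical pixel columns of a comma
x_c = horizontal_centroid(comma)    # (12*2 + 13*4 + 14*2) / 8
print(x_c, band_index(x_c, start=0, end=32))
```

Because every black pixel contributes to the sum, losing one or two pixels of the comma's tail shifts `x_c` only slightly, which is the stability argument the description makes for the center of gravity.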
In particular, the horizontal center of gravity is calculated for horizontally typeset text, whereas the vertical center of gravity is calculated for vertically typeset text; the formula for the vertical center of gravity is analogous to the horizontal one.
Corresponding formulas also exist for calculation based on the center; they are not listed here.
Note that many punctuation marks have a tail; a comma, for example, has a pointed tail. When the paper is scanned into an image by a device such as a scanner, the tip of the tail easily disappears or breaks, so that points of the tail are missing. If the position of the punctuation mark is represented by its center or by its left or right boundary, errors then arise. If the position is represented by the center of gravity, the loss or addition of a few points changes the center of gravity of the whole mark very little, and after rounding in the actual calculation the resulting barycentric coordinate is essentially unchanged. Representing the position of a punctuation mark by its center of gravity is therefore the best choice and gives better stability.
S103: Determine the code to be embedded in each available punctuation region, according to the set information embedding rule. The information embedding rule may be set and chosen arbitrarily. This comprises:
First, determine the binary number to be embedded from the information to be embedded in the text document; the number of bits of this binary number is less than or equal to the number of available punctuation regions determined.
If the information to be embedded is itself a binary number whose number of bits is less than or equal to the number of available punctuation regions, that information is used directly as the binary number to be embedded.
If the information to be embedded is itself a binary number but has more bits than there are available punctuation regions, a binary number whose number of bits is less than or equal to the number of regions is selected as the binary number to be embedded, a correspondence between the selected binary number and the information to be embedded is established, and the information together with that correspondence is recorded, for example in a database.
If the information to be embedded is not itself a binary number but can be converted to one by base conversion, and the number of bits of the converted binary number is less than or equal to the number of available punctuation regions, the converted binary number is used as the binary number to be embedded.
If the information to be embedded is not itself a binary number but can be converted to one by base conversion, and the converted binary number has more bits than there are available punctuation regions, a binary number whose number of bits is less than or equal to the number of regions is selected as the binary number to be embedded, a correspondence between the selected binary number and the information to be embedded is established, and the information together with that correspondence is recorded, for example in a database.
If the information to be embedded is not a binary number and cannot be converted to one, a binary number whose number of bits is less than or equal to the number of available punctuation regions is selected as the binary number to be embedded, a correspondence between the selected binary number and the information to be embedded is established, and the information together with that correspondence is recorded, for example in a database.
Then, determine the code to be embedded in each available punctuation region from the binary number to be embedded and the set information embedding rule. Specifically, the codes are determined, according to the embedding rule, from the number of bits of the binary number and the number of available punctuation regions.
If the number of bits of the binary number to be embedded equals the number of available punctuation regions, the bits are assigned directly: each binary digit of the number is assigned in order to one available punctuation region as its code to be embedded.
If the binary number to be embedded has fewer bits than there are available punctuation regions, its binary digits are assigned to the regions redundantly. Let M be the number of available punctuation regions and N the number of bits of the binary number, with N < M; M/N is calculated to obtain a quotient and a remainder. The available punctuation regions corresponding to the remainder are discarded, and codes are assigned to the remaining regions. For example, if the quotient is 3, regions 1 to 3 are assigned the first binary digit as their code to be embedded, regions 4 to 6 are assigned the second binary digit, and so on.
Continuing the example above, if the number 123 is to be embedded in the text document, it is converted to the binary number 1111011. This binary number has 7 bits and 7 available punctuation regions were determined, so 1111011 is assigned directly to the 7 available punctuation regions shown in Fig. 2: the region "show, be" is assigned the first binary digit, 1; the region "figure. First" is assigned the second binary digit, 1; and so on.
As another example, if the number 345 is the information to be embedded, it is converted to the binary number 101011001. This binary number has 9 bits, but only 7 available punctuation regions were determined, so a binary number whose number of bits is less than or equal to the number of available punctuation regions is selected at random; here a 7-bit binary number, or one with fewer than 7 bits, may be selected. The correspondence between the selected binary number and the information 345 is established, and the information 345 is stored together with its correspondence to the selected binary number.
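The bit assignment of step S103 can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements of the patent: the discarded "remainder" regions are taken to be the trailing ones, and a discarded region is marked with `None`. The fallback for over-long information (selecting a substitute binary number and recording the correspondence in a database) is not modelled; the sketch simply raises an error in that case.

```python
def to_bits(value):
    """Binary digits of a non-negative integer, most significant bit first."""
    return [int(b) for b in format(value, "b")]


def assign_bits(bits, region_count):
    """Return one code per region: 0, 1, or None for a discarded region.

    With N bits and M regions (N <= M), each bit is repeated M // N times
    over consecutive regions; the M % N trailing regions are left unused
    (which regions are discarded is an assumption of this sketch)."""
    n = len(bits)
    if n == 0 or n > region_count:
        raise ValueError("bit string must be non-empty and fit the regions")
    repeat, leftover = divmod(region_count, n)
    codes = [b for b in bits for _ in range(repeat)]
    return codes + [None] * leftover


print(assign_bits(to_bits(123), 7))  # 7 bits, 7 regions: one bit per region
print(assign_bits([1, 0], 7))        # 2 bits, 7 regions: quotient 3, remainder 1
print(len(to_bits(345)))             # 9 bits: too long for 7 regions
```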
In particular, when the code to be embedded for each available punctuation region is already known or is set at random, step S103 can be omitted and step S104 is performed directly after step S102.
S104: Adjust the position of the punctuation mark in each available punctuation region according to the mark's original position and the corresponding code to be embedded. Adjusting the positions of the punctuation marks embeds each region's code in that region.
The code to be embedded for each available punctuation region is 1 or 0. The position of the punctuation mark is adjusted according to the parity of the code to be embedded and the parity of the band index of the frequency band containing the mark (the index parity for short), so that the change of position embeds the information. Specifically:
(i) If the band index corresponding to the original position of the punctuation mark in an available punctuation region is odd and the code to be embedded for that region is 0, the mark is moved to a frequency band whose band index is even, i.e. the mark is moved so that its band index becomes even.
(ii) If the band index corresponding to the original position of the punctuation mark in an available punctuation region is odd and the code to be embedded for that region is 1, the frequency band containing the mark is not changed, i.e. the parity of its band index stays the same. To make the embedded information more stable, the mark may still be moved within its band as appropriate; see the explanation below.
(iii) If the band index corresponding to the original position of the punctuation mark in an available punctuation region is even and the code to be embedded for that region is 0, the frequency band containing the mark is not changed, i.e. the parity of its band index stays the same. To make the embedded information more stable, the mark may still be moved within its band as appropriate; see the explanation below.
(iv) If the band index corresponding to the original position of the punctuation mark in an available punctuation region is even and the code to be embedded for that region is 1, the mark is moved to a frequency band whose band index is odd, i.e. the mark is moved so that its band index becomes odd.
Once all available punctuation regions that require a code have been processed in this way, the embedding of watermark information in the text document is complete.
Continuing the example above, when the information to be embedded is the number 123, the code to be embedded for the first available punctuation region "show, be" is 1. The band index of the frequency band containing the punctuation mark in this region is 6, which is even, so under rule (iv) above the mark must be moved to a frequency band with an odd band index, i.e. to the adjacent band with index 5 or 7; in this embodiment it is moved to the band with index 5. The punctuation marks of the other six regions are moved according to the same rules, and the resulting positions are shown in Fig. 4.
In cases (ii) and (iii) above the parity of the band containing the punctuation mark need not change. However, if the mark lies right next to the edge of its band, its position is unstable, and interference (for example from printing or scanning) is likely to flip the parity of its band index. For example, suppose the mark in some region lies in the band with index 3 at position 3.01; if interference shifts it to 2.99, its band index changes from the odd value 3 to the even value 2, and subsequent detection will be wrong. Moving a mark whose index parity need not flip to the middle of its band therefore greatly increases stability: in the example, the position 3.01 can be adjusted to 3.5.
Likewise, a mark whose band index parity must change should be moved to the middle of the target band so that its position is more stable. For the region "show, be" above, when the index is changed from 6 to 5 the mark's position is set to 5.5.
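Rules (i) to (iv), together with the centering refinement, can be sketched as a single function. The name `target_position` is hypothetical, positions are expressed in band units (so 6.3 lies in band index 6), and preferring the lower adjacent band when the parity must flip is an assumption of this sketch; the description allows either neighbor. The sketch also assumes a non-negative position and at least two bands per region.

```python
def target_position(position, bit):
    """Return the adjusted position of a punctuation mark, in band units.

    The result is the center of a band whose index parity equals `bit`
    (0 -> even band index, 1 -> odd band index)."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    index = int(position)  # band index of the original position
    if index % 2 != bit:
        # Rules (i) and (iv): flip the parity by moving to an adjacent band.
        # Preferring the lower neighbor is an assumption of this sketch.
        index = index - 1 if index > 0 else index + 1
    # Rules (ii) and (iii), and the centering step for moved marks:
    # place the mark in the middle of its band for stability.
    return index + 0.5


print(target_position(6.3, 1))   # band 6 is even, bit 1: move to band 5 -> 5.5
print(target_position(3.01, 1))  # parity already matches: center the mark -> 3.5
```

After adjustment the band index parity of every processed region equals its assigned bit, and the half-band margin on either side is what protects that parity against small print and scan displacements.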
The watermark extraction method provided by the embodiment of the present invention is used to extract watermark information embedded by the above watermark embedding method. Its flowchart is shown in Figure 5, and its steps are as follows:
S201: According to the same region selection rule that was used when the watermark information was embedded, search for and determine the available punctuation regions in the text document in which information has been embedded.
The process of determining the available punctuation regions in this step is the same as in step S101 and is not repeated here. The difference is that in step S101 the region selection rule, that is, the definition of the start boundary and termination boundary of an available punctuation region, can be chosen freely, whereas in this step the boundary selection rule must be the one that was used when watermark information was embedded in the text document being processed, so that each available punctuation region has the same start boundary and termination boundary as at embedding time.
Continuing the above example, 7 available punctuation regions can be determined in the text fragment shown in Figure 4.
S202: According to the same position determination rule that was used when the watermark information was embedded, determine the position of the punctuation mark in each available punctuation region.
The process of determining the position of the punctuation mark in an available punctuation region is the same as in step S102.
The difference is that, when determining the position, the same position parameter (the center of gravity, center, left boundary, or right boundary of the punctuation mark, etc.) must be used to represent the position of the punctuation mark as was used when the information was embedded. For example, if the center of gravity represented the position of the punctuation mark during embedding, the center of gravity must also represent it in this step. This ensures that the frequency band containing the punctuation mark, and the corresponding band index, are determined accurately.
Continuing the above example, for the first available punctuation region ("shows, is"), the frequency band containing the punctuation mark is determined to have band index 5 or 7, depending on which adjacent frequency band the punctuation mark was moved to during embedding.
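Step S202 can be illustrated with a short sketch that locates the punctuation mark by its center of gravity and maps that coordinate to a band index. The equal-width division of the region and the names `centroid_x`, `band_index`, `start_x`, `end_x`, and `num_bands` are assumptions made for illustration; the source only states that the region is divided into several frequency bands according to the distance between the two boundaries.

```python
from typing import Iterable, Tuple


def centroid_x(pixels: Iterable[Tuple[float, float]]) -> float:
    """Horizontal center of gravity of the punctuation mark's pixels."""
    xs = [x for x, _ in pixels]
    if not xs:
        raise ValueError("punctuation mark has no pixels")
    return sum(xs) / len(xs)


def band_index(punct_x: float, start_x: float, end_x: float,
               num_bands: int) -> int:
    """Map a coordinate to a band index, assuming equal-width bands.

    start_x and end_x are the start and termination boundaries of the
    region; the same definitions must be used for embedding and extraction.
    """
    if end_x <= start_x or num_bands < 1:
        raise ValueError("invalid region")
    if not start_x <= punct_x < end_x:
        raise ValueError("punctuation mark lies outside the region")
    band_width = (end_x - start_x) / num_bands
    # Guard against floating-point rounding at the far edge of the region.
    return min(int((punct_x - start_x) / band_width), num_bands - 1)


# Hypothetical pixel coordinates of a comma between two characters.
x = centroid_x([(154, 10), (155, 10), (156, 11)])
print(band_index(x, start_x=100.0, end_x=200.0, num_bands=10))
```

Using the left boundary or the geometric center instead of the center of gravity only changes how `punct_x` is computed; the requirement in this step is that extraction uses the same choice as embedding.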
S203: According to the position of the punctuation mark in each available punctuation region, determine the embedded code of each available punctuation region. Specifically:
The embedded code of an available punctuation region is determined from the parity of the band index of the frequency band containing the punctuation mark in that region.
If the band index of the frequency band containing the punctuation mark in the available punctuation region is even, the embedded code of that region is determined to be 0.
If the band index of the frequency band containing the punctuation mark in the available punctuation region is odd, the embedded code of that region is determined to be 1.
Continuing the above example, since the frequency band containing the punctuation mark in the first available punctuation region ("shows, is") has band index 5 or 7, the embedded code obtained for this region is 1.
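The parity rule of step S203 is small enough to state directly in code. As a sketch only, the function below assumes that the position is expressed in band units, so that the band index is its integer part; `extract_bit` is a hypothetical name.

```python
import math


def extract_bit(position: float) -> int:
    """Embedded code of one region: 0 for an even band index, 1 for odd."""
    return math.floor(position) % 2


# A mark embedded at the centre of band 5 still decodes to 1 after
# moderate displacement by printing or scanning.
for noisy_position in (5.5, 5.2, 5.9):
    assert extract_bit(noisy_position) == 1

# A mark left near a band edge does not: 3.01 decodes to 1, 2.99 to 0.
print(extract_bit(3.01), extract_bit(2.99))
```

The second case is the failure that the move-to-centre adjustment during embedding is intended to avoid.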
The above watermark extraction method may further comprise the following step:
S204: According to the same information embedding rule that was used when the watermark information was embedded, and the embedded codes determined for the available punctuation regions, obtain the information embedded in the text document. Specifically:
According to the same information embedding rule that was used when the watermark information was embedded, and the embedded codes determined for the available punctuation regions, obtain the binary number embedded in the text document.
According to the binary number obtained, obtain the embedded information corresponding to the text document. Specifically: if the binary number is itself the embedded information, the embedded information is obtained directly; if the binary number is not the embedded information, the stored correspondence between embedded binary numbers and embedded information is looked up to obtain the embedded information.
Continuing the above example, according to the embedded codes of the available punctuation regions and the allocation rule used when the watermark information was embedded, the binary number embedded in the text fragment is 1111011, from which the information embedded in the text document, 123, can be recovered.
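The arithmetic of the running example can be checked with a few lines of Python. The helper names `to_bits` and `from_bits` are hypothetical, and the sketch covers only the case in which the embedded information is itself a number with as many binary digits as there are regions.

```python
from typing import List, Sequence


def to_bits(value: int) -> List[int]:
    """Binary digits of a non-negative integer, most significant first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return [int(digit) for digit in format(value, "b")]


def from_bits(bits: Sequence[int]) -> int:
    """Inverse of to_bits: rebuild the integer from its binary digits."""
    return int("".join(str(bit) for bit in bits), 2)


codes = to_bits(123)       # one code per available punctuation region
print(codes, len(codes))   # seven digits for the seven regions
print(from_bits(codes))    # recovers the embedded number
```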
In the above watermark embedding and extraction methods, the division into frequency bands when embedding the watermark can be performed by computer software analysing a scanned image of the text, or manually, directly on paper, using tools such as a ruler. When extracting the watermark, the extraction can likewise be performed by computer software or manually, to obtain the embedded code of each available punctuation region, from which the embedded binary number can be recovered.
Based on the above watermark embedding method, a watermark embedding device can be constructed. As shown in Figure 6, it comprises: a region determination module 101, a position determination module 102, and an information embedding module 103.
The region determination module 101 is configured to determine, according to a set region determination rule, the available punctuation regions in the text document in which information is to be embedded.
Preferably, the region determination module 101 may further comprise: an acquiring unit 1011, a punctuation determining unit 1012, and a region determining unit 1013.
The acquiring unit 1011 is configured to obtain the plain text area in the text document.
The punctuation determining unit 1012 is configured to perform character segmentation and punctuation analysis on the plain text area obtained by the acquiring unit 1011, and to determine the available punctuation marks it contains; each available punctuation mark has at least one adjacent other character both before and after it.
The region determining unit 1013 is configured to define a start boundary and a termination boundary according to each available punctuation mark determined by the punctuation determining unit 1012 and the two other characters adjacent before and after it, thereby obtaining an available punctuation region.
The position determination module 102 is configured to determine, according to a set position determination rule, the original position of the punctuation mark in each available punctuation region.
Preferably, the position determination module 102 may further comprise: a frequency band division unit 1021 and a position determination unit 1022.
The frequency band division unit 1021 is configured to calculate the distance from the start boundary to the termination boundary of each available punctuation region, and to divide each available punctuation region into several frequency bands according to that distance.
The position determination unit 1022 is configured to determine, according to the coordinate position of the punctuation mark in each available punctuation region, the frequency band containing the punctuation mark in each available punctuation region and the corresponding band index.
The information embedding module 103 is configured, for each available punctuation region, to adjust the position of the punctuation mark in the region according to its original position and the corresponding code to be embedded, so that the corresponding code to be embedded is embedded in that available punctuation region.
The above watermark embedding device further comprises a code allocation module 104, configured to determine the binary number to be embedded according to the information to be embedded corresponding to the text document, where the number of bits of the binary number is less than or equal to the number of available punctuation regions determined; and to determine the code to be embedded in each available punctuation region according to the binary number to be embedded and a set information embedding rule.
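One possible information embedding rule for the code allocation module can be sketched as follows. The source does not specify the rule in this passage, so cyclic repetition of the binary number across the regions is purely an assumption chosen for illustration, and `allocate_codes` is a hypothetical name.

```python
from typing import List, Sequence


def allocate_codes(bits: Sequence[int], num_regions: int) -> List[int]:
    """Assign one code to be embedded to each available punctuation region.

    Assumed rule: when there are more regions than bits, the bit sequence
    is repeated cyclically so that every region carries a code.
    """
    if not bits:
        raise ValueError("binary number to be embedded is empty")
    if len(bits) > num_regions:
        raise ValueError("more bits than available punctuation regions")
    return [bits[i % len(bits)] for i in range(num_regions)]


# Seven bits in seven regions: each region receives exactly one bit.
print(allocate_codes([1, 1, 1, 1, 0, 1, 1], 7))
# Three bits in seven regions: the sequence wraps around.
print(allocate_codes([1, 0, 1], 7))
```

Whatever rule is chosen, the extraction side must apply the same rule in reverse, which is why the method requires the information embedding rule to be identical at embedding and extraction time.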
Based on the above watermark extraction method, a watermark extraction device can be constructed. As shown in Figure 7, it comprises: a region determination module 201, a position determination module 202, and a code extraction module 203.
The region determination module 201 is configured to determine, according to the same region determination rule that was used when the watermark information was embedded, the available punctuation regions in the text document in which information has been embedded.
Preferably, the region determination module 201 may further comprise: an acquiring unit 2011, a punctuation determining unit 2012, and a region determining unit 2013.
The acquiring unit 2011 is configured to obtain the plain text area in the text document.
The punctuation determining unit 2012 is configured to perform character segmentation and punctuation analysis on the plain text area obtained by the acquiring unit 2011, and to determine the available punctuation marks it contains; each available punctuation mark has at least one adjacent other character both before and after it.
The region determining unit 2013 is configured to determine each available punctuation region according to an available punctuation mark determined by the punctuation determining unit 2012, the two other characters adjacent before and after it, and the start boundary and termination boundary defined when the watermark information was embedded.
The position determination module 202 is configured to determine, according to the same position determination rule that was used when the watermark information was embedded, the position of the punctuation mark in each available punctuation region.
Preferably, the position determination module 202 may further comprise: a frequency band division unit 2021 and a position determination unit 2022.
The frequency band division unit 2021 is configured to calculate the distance from the start boundary to the termination boundary of each available punctuation region, and to divide each available punctuation region into several frequency bands according to that distance.
The position determination unit 2022 is configured to determine, according to the coordinate position of the punctuation mark in each available punctuation region, the frequency band containing the punctuation mark in each available punctuation region and the corresponding band index.
The code extraction module 203 is configured to determine the embedded code of each available punctuation region according to the position of the punctuation mark determined by the position determination module 202.
The above watermark extraction device further comprises an information recovery module 204, configured to obtain the binary number embedded in the text document according to the same information embedding rule that was used when the watermark information was embedded and the embedded codes determined for the available punctuation regions; and to obtain the embedded information corresponding to the text document according to that binary number.
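As an illustration of what the information recovery module might do when the binary number was embedded with redundancy, the sketch below recovers each bit by majority vote. It assumes, hypothetically, that the bit sequence was repeated cyclically across the regions and that the number of bits `num_bits` is known to the extractor; neither detail is stated in the source, and `recover_bits` is an invented name.

```python
from collections import Counter
from typing import List, Sequence


def recover_bits(codes: Sequence[int], num_bits: int) -> List[int]:
    """Recover a cyclically repeated bit sequence by majority vote.

    codes holds the embedded code extracted from each available punctuation
    region, in document order. Ties go to the value seen first.
    """
    if num_bits < 1 or num_bits > len(codes):
        raise ValueError("num_bits must be between 1 and the region count")
    recovered = []
    for i in range(num_bits):
        votes = Counter(codes[i::num_bits])  # every copy of bit i
        recovered.append(votes.most_common(1)[0][0])
    return recovered


# Nine regions carrying three copies of the bits 1, 0, 1; one copy of the
# middle bit was read incorrectly, and the vote corrects it.
print(recover_bits([1, 0, 1, 1, 1, 1, 1, 0, 1], 3))
```

With an odd number of copies per bit, a single misread region per bit position cannot change the recovered value.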
The watermark embedding and extraction methods and devices provided by the embodiments of the present invention select available punctuation regions and adjust the position of the punctuation mark in each available punctuation region, so that the code to be embedded is embedded in the corresponding available punctuation region. When the watermark is extracted, the corresponding rules are applied to extract the embedded code of each available punctuation region from the adjusted position of its punctuation mark. The method is simple to operate. Because the human eye is far less sensitive to changes in the position of punctuation marks than to changes in the position of characters, relatively large changes can be made, so that the embedded watermark information is highly stable and well concealed while a good visual effect is retained.
In addition, a binary number to be embedded can be determined from the information to be embedded, and a code to be embedded can then be allocated to each available punctuation region by a set information embedding rule. The position of each punctuation mark is adjusted according to the value of the code to be embedded and the original position of the punctuation mark (the frequency band containing it), which completes the embedding of the information. The embedding is secure and reliable, and the accuracy of information extraction is also relatively high. In particular, when the watermark information is embedded by changing the center of gravity of the punctuation mark, embedding and detection and extraction are more stable and more accurate.
With the above methods and devices provided by the embodiments of the present invention, the embedded watermark information is well hidden and highly robust: it can resist repeated printing, copying, and scaling attacks. Watermark extraction can be performed as blind detection, and computation is fast. The methods and devices are suitable for applications with high requirements on both visual quality and robustness. They solve the problem that the prior art is difficult to use in practice, either because its implementation cost is too high or because the conflict between stability and visual effect cannot be resolved.
The above are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation, replacement, or application to other similar devices that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (24)

1. A watermark embedding method, characterized by comprising:
determining, according to a set region determination rule, the available punctuation regions in a text document in which information is to be embedded;
determining, according to a set position determination rule, the original position of the punctuation mark in each of the available punctuation regions;
for each available punctuation region, adjusting the position of the punctuation mark in the region according to its original position and the corresponding code to be embedded, so that the code to be embedded is embedded in the corresponding available punctuation region.
2. The method of claim 1, characterized in that determining, according to a set region determination rule, the available punctuation regions in the text document in which information is to be embedded specifically comprises:
obtaining the plain text area in the text document;
performing character segmentation and punctuation analysis on the plain text area, and determining the available punctuation marks it contains, each available punctuation mark having at least one adjacent other character both before and after it;
defining a start boundary and a termination boundary according to the available punctuation mark and the two other characters adjacent before and after it, to obtain an available punctuation region.
3. The method of claim 2, characterized in that the start boundary of the available punctuation region comprises: the left boundary, right boundary, center-of-gravity position, or center position of the preceding character;
the termination boundary of the available punctuation region comprises: the left boundary, right boundary, center-of-gravity position, or center position of the following character.
4. The method of claim 2, characterized in that determining, according to a set position determination rule, the original position of the punctuation mark in each of the available punctuation regions specifically comprises:
calculating the distance from the start boundary to the termination boundary of each available punctuation region, and dividing each available punctuation region into several frequency bands according to the distance;
and determining, according to the coordinate position of the punctuation mark in each available punctuation region, the frequency band containing the punctuation mark in each available punctuation region and the corresponding band index.
5. The method of claim 4, characterized in that the coordinate position of the punctuation mark specifically comprises: the coordinate position of the center of gravity, center, left boundary, or right boundary of the punctuation mark.
6. The method of claim 4, characterized in that adjusting the position of the punctuation mark according to its original position and the corresponding code to be embedded specifically comprises:
if the band index corresponding to the original position of the punctuation mark is odd and the corresponding code to be embedded is 0, moving the punctuation mark to a frequency band whose band index is even;
if the band index corresponding to the original position of the punctuation mark is odd and the corresponding code to be embedded is 1, leaving the frequency band containing the punctuation mark unchanged;
if the band index corresponding to the original position of the punctuation mark is even and the corresponding code to be embedded is 0, leaving the frequency band containing the punctuation mark unchanged;
if the band index corresponding to the original position of the punctuation mark is even and the corresponding code to be embedded is 1, moving the punctuation mark to a frequency band whose band index is odd.
7. The method of any one of claims 1-6, characterized by further comprising:
determining a binary number to be embedded according to the information to be embedded corresponding to the text document, the number of bits of the binary number being less than or equal to the number of available punctuation regions determined;
determining the code to be embedded in each of the available punctuation regions according to the binary number to be embedded and a set information embedding rule.
8. The method of claim 7, characterized in that determining a binary number to be embedded according to the information to be embedded corresponding to the text document specifically comprises:
if the information to be embedded is itself, or can be converted into, a binary number whose number of bits is less than or equal to the number of available punctuation regions, determining the information to be embedded, or the binary number obtained by conversion, as the binary number to be embedded;
if the information to be embedded is itself, or can be converted into, a binary number whose number of bits is greater than the number of available punctuation regions, or if the information to be embedded is not and cannot be converted into a binary number, selecting a binary number whose number of bits is less than or equal to the number of available punctuation regions as the binary number to be embedded, and establishing a correspondence between the binary number to be embedded and the information to be embedded.
9. The method of claim 8, characterized in that determining the code to be embedded in each of the available punctuation regions according to the binary number to be embedded and a set information embedding rule specifically comprises:
if the number of bits of the binary number to be embedded equals the number of available punctuation regions, allocating the binary digits of the binary number to be embedded directly to the available punctuation regions, one per region, as the codes to be embedded;
if the number of bits of the binary number to be embedded is less than the number of available punctuation regions, allocating the binary digits of the binary number to be embedded to the available punctuation regions as the codes to be embedded by means of a redundancy algorithm.
10. A watermark extraction method, characterized by comprising:
determining, according to the same region determination rule that was used when the watermark information was embedded, the available punctuation regions in a text document in which information has been embedded;
determining, according to the same position determination rule that was used when the watermark information was embedded, the position of the punctuation mark in each of the available punctuation regions;
determining the embedded code of each available punctuation region according to the position of the punctuation mark.
11. The method of claim 10, characterized in that determining, according to the same region determination rule that was used when the watermark information was embedded, the available punctuation regions in the text document in which information has been embedded specifically comprises:
obtaining the plain text area in the text document;
performing character segmentation and punctuation analysis on the plain text area, and determining the available punctuation marks it contains, each available punctuation mark having at least one adjacent other character both before and after it;
determining an available punctuation region according to the available punctuation mark, the two other characters adjacent before and after it, and the start boundary and termination boundary defined when the watermark information was embedded.
12. The method of claim 11, characterized in that the start boundary of the available punctuation region comprises: the left boundary, right boundary, center-of-gravity position, or center position of the preceding character;
the termination boundary of the available punctuation region comprises: the left boundary, right boundary, center-of-gravity position, or center position of the following character.
13. The method of claim 11, characterized in that determining, according to the same position determination rule that was used when the watermark information was embedded, the position of the punctuation mark in each of the available punctuation regions specifically comprises:
calculating the distance from the start boundary to the termination boundary of each available punctuation region, and dividing each available punctuation region into several frequency bands according to the distance;
and determining, according to the coordinate position of the punctuation mark in each available punctuation region, the frequency band containing the punctuation mark in each available punctuation region and the corresponding band index.
14. The method of claim 13, characterized in that the coordinate position of the punctuation mark specifically comprises: the coordinate position of the center of gravity, center, left boundary, or right boundary of the punctuation mark.
15. The method of claim 13, characterized in that determining the embedded code of each available punctuation region according to the position of the punctuation mark specifically comprises:
if the band index of the frequency band containing the punctuation mark in the available punctuation region is even, determining that the embedded code of that available punctuation region is 0;
if the band index of the frequency band containing the punctuation mark in the available punctuation region is odd, determining that the embedded code of that available punctuation region is 1.
16. The method of any one of claims 10-15, characterized by further comprising:
obtaining the binary number embedded in the text document according to the same information embedding rule that was used when the watermark information was embedded and the embedded codes determined for the available punctuation regions; and
obtaining the embedded information corresponding to the text document according to the binary number.
17. A watermark embedding device, characterized by comprising:
a region determination module, configured to determine, according to a set region determination rule, the available punctuation regions in a text document in which information is to be embedded;
a position determination module, configured to determine, according to a set position determination rule, the original position of the punctuation mark in each of the available punctuation regions;
an information embedding module, configured, for each available punctuation region, to adjust the position of the punctuation mark in the region according to its original position and the corresponding code to be embedded, so that the code to be embedded is embedded in the corresponding available punctuation region.
18. The device of claim 17, characterized in that the region determination module comprises:
an acquiring unit, configured to obtain the plain text area in the text document;
a punctuation determining unit, configured to perform character segmentation and punctuation analysis on the plain text area, and to determine the available punctuation marks it contains, each available punctuation mark having at least one adjacent other character both before and after it;
a region determining unit, configured to define a start boundary and a termination boundary according to the available punctuation mark and the two other characters adjacent before and after it, to obtain an available punctuation region.
19. The device of claim 18, characterized in that the position determination module comprises:
a frequency band division unit, configured to calculate the distance from the start boundary to the termination boundary of each available punctuation region, and to divide each available punctuation region into several frequency bands according to the distance;
a position determination unit, configured to determine, according to the coordinate position of the punctuation mark in each available punctuation region, the frequency band containing the punctuation mark in each available punctuation region and the corresponding band index.
20. The device of any one of claims 17-19, characterized by further comprising:
a code allocation module, configured to determine a binary number to be embedded according to the information to be embedded corresponding to the text document, the number of bits of the binary number being less than or equal to the number of available punctuation regions determined; and to determine the code to be embedded in each of the available punctuation regions according to the binary number to be embedded and a set information embedding rule.
21. A watermark extraction device, characterized by comprising:
a region determination module, configured to determine, according to the same region determination rule that was used when the watermark information was embedded, the available punctuation regions in a text document in which information has been embedded;
a position determination module, configured to determine, according to the same position determination rule that was used when the watermark information was embedded, the position of the punctuation mark in each of the available punctuation regions;
a code extraction module, configured to determine the embedded code of each available punctuation region according to the position of the punctuation mark.
22. The device of claim 21, characterized in that the region determination module comprises:
an acquiring unit, configured to obtain the plain text area in the text document;
a punctuation determining unit, configured to perform character segmentation and punctuation analysis on the plain text area, and to determine the available punctuation marks it contains, each available punctuation mark having at least one adjacent other character both before and after it;
a region determining unit, configured to determine an available punctuation region according to the available punctuation mark, the two other characters adjacent before and after it, and the start boundary and termination boundary defined when the watermark information was embedded.
23. The device of claim 22, characterized in that the position determination module comprises:
a frequency band division unit, configured to calculate the distance from the start boundary to the termination boundary of each available punctuation region, and to divide each available punctuation region into several frequency bands according to the distance;
a position determination unit, configured to determine, according to the coordinate position of the punctuation mark in each available punctuation region, the frequency band containing the punctuation mark in each available punctuation region and the corresponding band index.
24. The device of any one of claims 21-23, characterized by further comprising:
an information recovery module, configured to obtain the binary number embedded in the text document according to the same information embedding rule that was used when the watermark information was embedded and the embedded codes determined for the available punctuation regions; and to obtain the embedded information corresponding to the text document according to the binary number.
CN2008102404837A 2008-12-22 2008-12-22 Watermark embedding and extraction method and device Expired - Fee Related CN101751656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102404837A CN101751656B (en) 2008-12-22 2008-12-22 Watermark embedding and extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102404837A CN101751656B (en) 2008-12-22 2008-12-22 Watermark embedding and extraction method and device

Publications (2)

Publication Number Publication Date
CN101751656A true CN101751656A (en) 2010-06-23
CN101751656B CN101751656B (en) 2012-03-28

Family

ID=42478602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102404837A Expired - Fee Related CN101751656B (en) 2008-12-22 2008-12-22 Watermark embedding and extraction method and device

Country Status (1)

Country Link
CN (1) CN101751656B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682248A (en) * 2012-05-15 2012-09-19 西北大学 Watermark embedding and extracting method for ultrashort Chinese text
CN102737204A (en) * 2011-04-01 2012-10-17 北京利云技术开发公司 Safety document, and method and apparatus for generating and detecting the safety document
CN106780280A (en) * 2016-11-30 2017-05-31 深圳Tcl数字技术有限公司 Digital watermarking encryption method and device
CN108174051A (en) * 2017-12-08 2018-06-15 新华三技术有限公司 A kind of vector watermark decoding method, device and electronic equipment
CN110457873A (en) * 2018-05-08 2019-11-15 中移(苏州)软件技术有限公司 A kind of watermark embedding and detection method and device
CN110914820A (en) * 2019-05-20 2020-03-24 阿里巴巴集团控股有限公司 Identifying copyrighted material using embedded copyright information
CN111985208A (en) * 2020-08-18 2020-11-24 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing punctuation mark filling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003230001A (en) * 2002-02-01 2003-08-15 Canon Inc Apparatus for embedding electronic watermark to document, apparatus for extracting electronic watermark from document, and control method therefor
CN100505621C (en) * 2005-05-18 2009-06-24 上海龙方信息技术有限公司 Method for digital signature locking localization
CN101093574A (en) * 2007-07-23 2007-12-26 中国人民解放军信息工程大学 Watermark method of vectorial geographical spatial data based on integral wavelet transforms

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737204A (en) * 2011-04-01 2012-10-17 北京利云技术开发公司 Safety document, and method and apparatus for generating and detecting the safety document
CN102737204B (en) * 2011-04-01 2015-08-05 北京利云技术开发公司 The method and apparatus of a kind of security document and generation and this security document of detection
CN102682248A (en) * 2012-05-15 2012-09-19 西北大学 Watermark embedding and extracting method for ultrashort Chinese text
CN102682248B (en) * 2012-05-15 2015-01-07 西北大学 Watermark embedding and extracting method for ultrashort Chinese text
CN106780280A (en) * 2016-11-30 2017-05-31 深圳Tcl数字技术有限公司 Digital watermarking encryption method and device
WO2018098879A1 (en) * 2016-11-30 2018-06-07 深圳Tcl数字技术有限公司 Method and device for encrypting digital watermark
CN108174051A (en) * 2017-12-08 2018-06-15 新华三技术有限公司 Vector watermark decoding method, device and electronic equipment
CN108174051B (en) * 2017-12-08 2019-11-08 新华三技术有限公司 Vector watermark decoding method, device and electronic equipment
CN110457873A (en) * 2018-05-08 2019-11-15 中移(苏州)软件技术有限公司 Watermark embedding and detection method and device
CN110914820A (en) * 2019-05-20 2020-03-24 阿里巴巴集团控股有限公司 Identifying copyrighted material using embedded copyright information
CN111985208A (en) * 2020-08-18 2020-11-24 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing punctuation mark filling
CN111985208B (en) * 2020-08-18 2024-03-26 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing punctuation mark filling

Also Published As

Publication number Publication date
CN101751656B (en) 2012-03-28

Similar Documents

Publication Publication Date Title
CN101751656B (en) Watermark embedding and extraction method and device
KR101016712B1 (en) Watermark information detection method
US8224019B2 (en) Embedding information in document blank space
EP1906645B1 (en) Electronic watermark embedment apparatus and electronic watermark detection apparatus
Amano et al. A feature calibration method for watermarking of document images
US8243982B2 (en) Embedding information in document border space
US8335342B2 (en) Protecting printed items intended for public exchange with information embedded in blank document borders
CN102567938B (en) Watermark image blocking method and device for western language watermark processing
US20130170751A1 (en) Methods and devices for processing scanned book's data
US20100128299A1 (en) Prevention of unauthorized copying or scanning
TW200540728A (en) Text region recognition method, storage medium and system
Tan et al. Print-Scan Resilient Text Image Watermarking Based on Stroke Direction Modulation for Chinese Document Authentication.
Richter et al. Forensic analysis and anonymisation of printed documents
JP2011070558A (en) Document image processor, document image processing method and document image processing program
JP4871794B2 (en) Printing apparatus and printing method
JP6578858B2 (en) Information processing apparatus and program
CN112650992A (en) Document tracking encryption method based on digital watermark
TWI411927B (en) Method of embedding information in input image, method of extracting information from input image and related apparatuses thereof
JP2008085579A (en) Device for embedding information, information reader, method for embedding information, method for reading information and computer program
US7796777B2 (en) Digital watermarking system according to matrix margin and digital watermarking method
Yang et al. A SVM based text steganalysis algorithm for spacing coding
RU2431192C1 (en) Method of inserting secret digital message into printed documents and extracting said message
US20060285719A1 (en) Digital watermarking system according to pixel display property and digital watermarking method
Awan et al. Utilization of maximum data hiding capacity in object-based text document authentication
Makur et al. Watermark based recovery of tampered documents

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2022-09-14

Address after: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District

Patentee after: Peking University

Patentee after: New Founder Holdings Development Co., Ltd.

Patentee after: BEIJING FOUNDER ELECTRONICS CHIEF INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District

Patentee before: Peking University

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: BEIJING FOUNDER ELECTRONICS CHIEF INFORMATION TECHNOLOGY Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2012-03-28
