Embodiment
In the watermark embedding and extraction methods provided by the embodiments of the invention, available punctuation zones are selected, according to a set zone selection rule, in the text document in which watermark information is to be embedded; the position of the punctuation mark in each available punctuation zone is determined; and the watermark information is then embedded by adjusting the position of the punctuation mark in each zone. During extraction, the same rules are used to determine the available punctuation zones in the watermarked text document and the position of the punctuation mark in each zone, and the embedded watermark information is obtained from those positions.
The watermark embedding method provided by the embodiment of the invention embeds watermark information by adjusting the positions of punctuation marks in the text document. Its flowchart is shown in Figure 1, and it comprises the following steps:
S101: According to the set zone selection rule, search for and determine the available punctuation zones in the text document in which information is to be embedded.
First, optical character recognition (OCR) is used to perform layout analysis on the text document in which information is to be embedded; non-text features such as frames, table lines, images and decorative borders are removed to obtain the plain text area.
Character segmentation and punctuation analysis are then performed on the plain text area to find all available punctuation marks and determine the available punctuation zones.
An available punctuation mark is one that has at least one other character both before and after it; that is, its position satisfies the relation "other character, punctuation mark, other character". Punctuation marks that do not satisfy this condition are discarded. For example, a punctuation mark at the end of a text line has no other character after it, so it does not qualify and is discarded. Here, "other character" covers every symbol other than punctuation, such as Chinese characters, digits and letters.
A start boundary and an end boundary are defined from the available punctuation mark and the two other characters adjacent to it, which yields the available punctuation zone. The start boundary may be the left boundary, right boundary, center of gravity or center of the preceding character, among others; the end boundary may be the left boundary, right boundary, center of gravity or center of the following character, among others.
For example, in the text document shown in Figure 2, seven available punctuation marks are determined, including the "," in "show, be", the "." in "figure.First" and the "," in "earlier, literary composition". The start boundary of each available punctuation zone is taken as the left boundary of the preceding character and the end boundary as the right boundary of the following character, which yields the seven available punctuation zones shown in Figure 2, such as "show, be", "figure.First" and "earlier, literary composition".
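The selection rule of step S101 can be sketched as follows. This is only an illustration under stated assumptions: the glyph list stands in for the output of OCR segmentation, the `PUNCTUATION` set and the function name `available_zones` are hypothetical, and the boundaries follow the choice used in the Figure 2 example (left boundary of the preceding character, right boundary of the following character).

```python
# Hypothetical punctuation set; a real system would take this from the OCR engine.
PUNCTUATION = set(",.;:!?\u3001\u3002\uff0c\uff1b\uff1a\uff01\uff1f")

def available_zones(glyphs):
    """Find available punctuation zones on one text line.

    glyphs: list of (char, left, right) tuples in reading order, where left
    and right are the horizontal bounds of the glyph (assumed OCR output).
    Returns a list of (glyph_index, zone_start, zone_end).
    """
    zones = []
    # The first and last glyphs of a line can never qualify, because they
    # lack a neighbour on one side.
    for i in range(1, len(glyphs) - 1):
        ch, _, _ = glyphs[i]
        prev_ch, prev_left, _ = glyphs[i - 1]
        next_ch, _, next_right = glyphs[i + 1]
        if (ch in PUNCTUATION
                and prev_ch not in PUNCTUATION
                and next_ch not in PUNCTUATION):
            # Start boundary: left edge of the preceding character.
            # End boundary: right edge of the following character.
            zones.append((i, prev_left, next_right))
    return zones

line = [("a", 0, 10), (",", 11, 14), ("b", 16, 26), (".", 27, 29)]
zones = available_zones(line)
```

In this sketch the trailing "." is discarded because no other character follows it on the line, matching the rule described above.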
S102: According to the set position determination rule, determine the original position of the punctuation mark in each available punctuation zone. Specifically:
(1) Divide each available punctuation zone into several frequency bands.
According to the set boundary rule, the length of each available punctuation zone, that is, the distance from the start boundary to the end boundary, is calculated. Based on this distance, each available punctuation zone is divided evenly into several parts, each part being one frequency band. If a zone is divided into k parts, for example, the band indexes from the first band to the last are 0, 1, 2, ..., k-1.
Continuing the example above, all 7 available punctuation zones determined in Figure 2 are divided into frequency bands, for example 16 bands per zone; the result is shown in Figure 3.
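As a minimal sketch of the even division described in (1), the band edges of one zone can be computed as below. The function name `band_edges` and the pixel coordinates are assumptions made for illustration only.

```python
def band_edges(start: float, end: float, k: int = 16) -> list[float]:
    """Return the k + 1 edge coordinates that split [start, end] into k equal bands.

    Band j (0 <= j <= k - 1) covers the interval from edges[j] to edges[j + 1].
    """
    if k < 1 or end <= start:
        raise ValueError("need k >= 1 and end > start")
    width = (end - start) / k
    return [start + j * width for j in range(k + 1)]

# Hypothetical zone spanning 32 pixels, divided into 16 bands of 2 pixels each.
edges = band_edges(0.0, 32.0, 16)
```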
(2) According to the coordinate position of the punctuation mark in each available punctuation zone, determine the frequency band in which the mark lies and the corresponding band index.
The position of a punctuation mark may be represented by a position parameter such as its center of gravity, center, left boundary or right boundary; the band in which that parameter lies, and the corresponding band index, are then determined.
When the frequency band is determined from the center of gravity of the punctuation mark, the coordinate of the center of gravity must be calculated first. The formula for the center-of-gravity coordinate of a punctuation mark is:
$$x_C = \frac{1}{S}\sum_i x_i \,\Delta S_i, \qquad S = \sum_i \Delta S_i$$
where $\Delta S_i$ is the number of black pixels of the punctuation mark whose horizontal coordinate is $x_i$; $x_i$ is any horizontal coordinate value; $S$ is the total number of black pixels of the punctuation mark; and $x_C$ is the center-of-gravity coordinate of the punctuation mark.
The band containing the calculated coordinate is then taken as the band of the punctuation mark. That is, from the coordinate range of each band, it is determined which band the coordinate $x_C$ falls into, together with the corresponding band index.
For example, for the first zone "show, be" in Figure 3, after the horizontal center-of-gravity coordinate of the comma is calculated, the band containing it is determined to be the band with index 6.
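The center-of-gravity calculation and the band lookup of step S102 can be sketched as follows. The column histogram, the zone bounds and the function names are hypothetical values chosen so that the result lands in band 6, as in the example; they are not measurements from Figure 3.

```python
def horizontal_centroid(columns: dict[int, int]) -> float:
    """Center of gravity x_C of a glyph.

    columns maps each horizontal coordinate x_i to the number of black
    pixels in that column (the Delta S_i of the description).
    """
    total = sum(columns.values())  # S, the total black-pixel count
    if total == 0:
        raise ValueError("glyph has no black pixels")
    return sum(x * n for x, n in columns.items()) / total

def band_index(x: float, start: float, end: float, k: int = 16) -> int:
    """Index (0 .. k - 1) of the band of [start, end] that contains x."""
    if not start <= x <= end:
        raise ValueError("coordinate lies outside the zone")
    width = (end - start) / k
    # A coordinate exactly on the end boundary is assigned to the last band.
    return min(int((x - start) // width), k - 1)

comma = {12: 2, 13: 4, 14: 2}          # hypothetical black-pixel counts per column
x_c = horizontal_centroid(comma)       # (12*2 + 13*4 + 14*2) / 8
index = band_index(x_c, 0.0, 32.0, 16)
```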
In particular, for horizontally set text the horizontal center of gravity is calculated, whereas for vertically set text the vertical center of gravity is calculated; the formula for the vertical center of gravity is obtained by analogy with the horizontal one.
When the calculation is based on the center instead, corresponding formulas exist and are not enumerated here.
It should be noted that many punctuation marks have a tail; a comma, for example, ends in a thin pointed tail. When a paper document is converted to an image by a device such as a scanner, this tail is prone to vanish or break, so that pixels at its tip are lost. If the position of the mark is represented by its center or by its left or right boundary, errors then occur. If it is represented by the center of gravity, the loss or addition of a few pixels changes the center of gravity of the whole mark very little, and after rounding in the actual calculation the coordinate is essentially unchanged. Representing the position of a punctuation mark by its center of gravity is therefore the preferred choice and gives better stability.
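The stability argument can be illustrated numerically. The sketch below uses an invented column histogram for a comma whose thin tail occupies the two rightmost columns, then removes the tail to imitate scanner damage; the numbers are hypothetical and serve only to compare how far each position parameter moves.

```python
def centroid(columns):
    """Center of gravity of a glyph given black-pixel counts per column."""
    total = sum(columns.values())
    return sum(x * n for x, n in columns.items()) / total

def bbox_center(columns):
    """Midpoint between the leftmost and rightmost non-empty columns."""
    xs = [x for x, n in columns.items() if n > 0]
    return (min(xs) + max(xs)) / 2

# Hypothetical comma: dense body in columns 10-13, one-pixel tail in 14-15.
intact = {10: 3, 11: 4, 12: 4, 13: 3, 14: 1, 15: 1}
# The same glyph after the tail has been lost in scanning.
damaged = {x: n for x, n in intact.items() if x < 14}

centroid_shift = abs(centroid(intact) - centroid(damaged))
center_shift = abs(bbox_center(intact) - bbox_center(damaged))
```

With these invented values the center of gravity moves by less than half as much as the bounding-box center, which is the behavior the paragraph above relies on.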
S103: According to the set information embedding rule, determine the code to be embedded in each available punctuation zone. The information embedding rule may be set and chosen freely. Specifically:
First, the binary number to be embedded is determined from the information to be embedded in the text document; the number of bits of this binary number is less than or equal to the number of available punctuation zones determined.
If the information to be embedded is itself a binary number and its bit count is less than or equal to the number of available punctuation zones, the information itself is taken directly as the binary number to be embedded.
If the information to be embedded is itself a binary number but its bit count is greater than the number of available punctuation zones, a binary number whose bit count is less than or equal to the number of zones is selected as the binary number to be embedded, the correspondence between the selected binary number and the information to be embedded is established, and the information together with this correspondence is recorded, for example in a database.
If the information to be embedded is not itself a binary number but can be converted into one by base conversion, and the bit count of the converted binary number is less than or equal to the number of available punctuation zones, the converted binary number is taken as the binary number to be embedded.
If the information to be embedded is not itself a binary number but can be converted into one by base conversion, and the bit count of the converted binary number is greater than the number of available punctuation zones, a binary number whose bit count is less than or equal to the number of zones is selected as the binary number to be embedded, the correspondence between the selected binary number and the information to be embedded is established, and the information together with this correspondence is recorded, for example in a database.
If the information to be embedded is not a binary number and cannot be converted into one, a binary number whose bit count is less than or equal to the number of available punctuation zones is selected as the binary number to be embedded, the correspondence between the selected binary number and the information to be embedded is established, and the information together with this correspondence is recorded, for example in a database.
Then, the code to be embedded in each available punctuation zone is determined according to the binary number to be embedded and the set information embedding rule, taking into account the number of bits of the binary number and the number of available punctuation zones.
If the bit count of the binary number to be embedded equals the number of available punctuation zones, the bits are allocated directly: each bit of the binary number is assigned to one available punctuation zone as its code to be embedded.
If the bit count of the binary number to be embedded is less than the number of available punctuation zones, each bit of the binary number is assigned to several zones by redundant allocation. Let the number of available punctuation zones be M and the number of bits be N, with M > N. M/N is calculated to obtain a quotient and a remainder. The zones corresponding to the remainder are discarded, and codes are allocated to the remaining zones. If the quotient is 3, for example, the first bit of the binary number is allocated to zones 1 to 3 as their code to be embedded, the second bit to zones 4 to 6, and so on.
Continuing the example above, if the number 123 is to be embedded in the text document, it is converted into the binary number 1111011. This binary number has 7 bits and there are also 7 available punctuation zones, so 1111011 is allocated directly to the 7 zones shown in Figure 2: the zone "show, be" receives the first bit, 1; the zone "figure.First" receives the second bit, 1; and so on.
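The direct and redundant allocation rules can be sketched in one function. This is an illustration only: the names `to_bits` and `allocate_bits` are hypothetical, and the sketch assumes that the zones discarded for the remainder are the trailing ones, which the description does not specify.

```python
def to_bits(value: int) -> list[int]:
    """Binary digits of a non-negative integer, most significant bit first."""
    return [int(b) for b in format(value, "b")]

def allocate_bits(bits: list[int], zone_count: int) -> list[int]:
    """Assign one code (0 or 1) to each used zone, in zone order.

    With M zones and N bits (M >= N), each bit is repeated M // N times;
    the M % N leftover zones are left unused (assumed here to be the last ones).
    """
    n = len(bits)
    if n == 0 or n > zone_count:
        raise ValueError("bit count must be between 1 and the zone count")
    repeat = zone_count // n  # the quotient of M / N
    return [bit for bit in bits for _ in range(repeat)]

codes_direct = allocate_bits(to_bits(123), 7)   # 7 bits, 7 zones: one bit per zone
codes_redundant = allocate_bits([1, 0], 7)      # 2 bits, 7 zones: quotient 3, remainder 1
```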
As another example, if the number 345 is the information to be embedded, it is converted into the binary number 101011001. This binary number has 9 bits, but there are only 7 available punctuation zones, so a binary number whose bit count is less than or equal to the number of available punctuation zones is selected at random; it may have exactly 7 bits or fewer. The correspondence between the selected binary number and the information 345 is established, and the information 345 is stored together with its correspondence to the selected binary number.
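The substitute-code case can be sketched with an in-memory dictionary standing in for the database. The function name `assign_code` is hypothetical, and the requirement that each selected code be unused is an assumption added so that the stored correspondence stays unambiguous; the description only says the number is selected at random.

```python
import secrets

def assign_code(info: str, zone_count: int, table: dict[str, str]) -> str:
    """Pick a random zone_count-bit code for info and record the correspondence.

    table maps code -> information and plays the role of the database.
    """
    if len(table) >= 2 ** zone_count:
        raise RuntimeError("no unused code of this length remains")
    while True:
        # Zero-padded binary string of exactly zone_count digits.
        code = format(secrets.randbelow(2 ** zone_count), f"0{zone_count}b")
        if code not in table:  # assumed uniqueness check
            table[code] = info
            return code

table: dict[str, str] = {}
code = assign_code("345", 7, table)
```

At extraction time the recovered code is looked up in the same table to obtain the original information.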
In particular, when the code to be embedded for each available punctuation zone is already known or is set at random, step S103 may be omitted and step S104 is entered directly after step S102.
S104: According to the original position of the punctuation mark in each available punctuation zone and the corresponding code to be embedded, adjust the position of the punctuation mark in each zone. Adjusting the position of the punctuation mark embeds the corresponding code in the zone.
The code to be embedded for each available punctuation zone is 1 or 0. The position of the punctuation mark is adjusted according to the value of the code to be embedded and the parity of the band index of the band containing the mark (the index parity for short), so that the changed position carries the information. Specifically:
(i) If the band index corresponding to the original position of the punctuation mark in an available punctuation zone is odd and the code to be embedded for the zone is 0, the mark is moved to a band whose index is even; that is, moving the mark makes its band index even.
(ii) If the band index corresponding to the original position of the punctuation mark in an available punctuation zone is odd and the code to be embedded for the zone is 1, the band containing the mark is not changed; that is, the parity of its band index stays the same. To make the embedded information more stable, the mark may still be moved within its band as appropriate; see the explanation below.
(iii) If the band index corresponding to the original position of the punctuation mark in an available punctuation zone is even and the code to be embedded for the zone is 0, the band containing the mark is not changed; that is, the parity of its band index stays the same. To make the embedded information more stable, the mark may still be moved within its band as appropriate; see the explanation below.
(iv) If the band index corresponding to the original position of the punctuation mark in an available punctuation zone is even and the code to be embedded for the zone is 1, the mark is moved to a band whose index is odd; that is, moving the mark makes its band index odd.
Once all available punctuation zones that require a code have been processed in this way, embedding of the watermark information in the text document is complete.
Continuing the above example, when the information to be embedded is the number 123, the code to be embedded for the first available punctuation zone "show, be" is 1. The band index of the band containing the mark in this zone is 6, which is even, so under rule (iv) above the mark must be moved to a band with an odd index, that is, to the adjacent band with index 5 or 7; this embodiment moves it to the band with index 5. The marks in the other 6 zones are moved according to the same rules, and the resulting positions of the marks in each available punctuation zone are shown in Figure 4.
In cases (ii) and (iii) above, the parity of the band containing the mark need not change. However, if the mark lies close to a band edge, its position is unstable: interference of various kinds (for example printing or scanning) may well flip the parity of its band index. For example, suppose a mark lies in the band with index 3 at position 3.01 and some interference shifts it to 2.99; its band index then changes from the odd value 3 to the even value 2, and subsequent detection goes wrong. Moving a mark whose index parity need not change to the middle of its band therefore greatly increases stability. In the example, the mark at position 3.01 can be adjusted to 3.5, which effectively increases its stability.
Likewise, a mark whose band index parity must change should be moved to the middle of its target band so that its position is more stable. For example, when the index for the zone "show, be" above is adjusted from 6 to 5, the mark is placed at position 5.5.
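Rules (i) to (iv) together with the centering refinement can be sketched as a single function. Positions are expressed in band units, so a value of 6.3 means 30% of the way through band 6, matching the 3.01 and 5.5 figures in the text. The function name `target_position` and the preference for the lower adjacent band (as in the index 6 to 5 example) are assumptions of this sketch.

```python
def target_position(position: float, bit: int, k: int = 16) -> float:
    """Return the adjusted position (in band units) that encodes bit.

    The mark ends up in the middle of a band whose index parity equals bit:
    even index for 0, odd index for 1.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    if k < 2 or not 0 <= position <= k:
        raise ValueError("need k >= 2 and a position inside the zone")
    index = min(int(position), k - 1)  # band containing the mark
    if index % 2 != bit:
        # Rules (i) and (iv): move to an adjacent band of the other parity.
        # The lower neighbour is preferred, except at band 0.
        index = index - 1 if index > 0 else index + 1
    # Rules (ii) and (iii) keep the band; in every case the mark is centred.
    return index + 0.5

moved = target_position(6.3, 1)      # even band, code 1: moved to band 5
centred = target_position(3.01, 1)   # odd band, code 1: kept in band 3, centred
```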
The watermark extraction method provided by the embodiment of the invention extracts watermark information embedded with the above watermark embedding method. Its flowchart is shown in Figure 5, and it comprises the following steps:
S201: According to the same zone selection rule used when the watermark information was embedded, search for and determine the available punctuation zones in the text document in which information has been embedded.
The process of determining the available punctuation zones in this step is the same as in step S101 and is not repeated here. The difference is that in step S101 the zone selection rule, that is, the definition of the start boundary and end boundary of an available punctuation zone, may be chosen freely, whereas in this step the boundary selection rule must be the one used when watermark information was embedded in the document being processed, so that each available punctuation zone has the same start boundary and end boundary as at embedding time.
Continuing the example above, 7 available punctuation zones are determined in the text fragment shown in Figure 4.
S202: According to the same position determination rule used when the watermark information was embedded, determine the position of the punctuation mark in each available punctuation zone.
The process of determining the position of the punctuation mark in an available punctuation zone is the same as in step S102.
The difference is that the position of the available punctuation mark must be represented by the same position parameter (center of gravity, center, left boundary, right boundary, etc.) as was used at embedding. For example, if the center of gravity represented the position of the mark at embedding, the center of gravity must also be used in this step. This ensures that the band containing the mark and the corresponding band index are determined accurately.
Continuing the example above, for the first available punctuation zone "show, be", the band containing the mark is determined to have index 5 or 7.
S203: According to the position of the punctuation mark in each available punctuation zone, determine the embedded code in each zone. Specifically:
The embedded code of an available punctuation zone is determined from the parity of the band index of the band containing the punctuation mark in that zone.
If the band index of the band containing the punctuation mark in an available punctuation zone is even, the embedded code in the zone is determined to be 0.
If the band index of the band containing the punctuation mark in an available punctuation zone is odd, the embedded code in the zone is determined to be 1.
Continuing the example above, the band containing the mark in the first available punctuation zone "show, be" has index 5 or 7, so the embedded code obtained for this zone is 1.
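The parity rule of step S203 amounts to a one-line lookup. The sketch below again expresses the measured position in band units; the function name `extract_bit` is hypothetical.

```python
def extract_bit(position: float, k: int = 16) -> int:
    """Embedded code of a zone: 0 for an even band index, 1 for an odd one.

    position is the measured position of the mark in band units (0 .. k).
    """
    if not 0 <= position <= k:
        raise ValueError("position lies outside the zone")
    index = min(int(position), k - 1)  # band containing the mark
    return index % 2

bit_band5 = extract_bit(5.5)
bit_band7 = extract_bit(7.5)
```

Because the mark was placed at the middle of its band during embedding, a measurement anywhere between 5.0 and 6.0 still yields the same code, which is what gives the scheme its tolerance to small displacements.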
The above watermark extraction method may further comprise the following step:
S204: According to the same information embedding rule used when the watermark information was embedded and the embedded codes determined for the available punctuation zones, obtain the information embedded in the text document. Specifically:
According to the same information embedding rule used at embedding and the embedded codes determined for the available punctuation zones, the binary number embedded in the text document is obtained.
The embedded information of the text document is then obtained from this binary number. If the binary number is itself the embedded information, the information is obtained directly; if it is not, the stored correspondence between embedded binary numbers and embedded information is looked up to obtain the embedded information.
Continuing the example above, from the embedded code in each available punctuation zone and the allocation rule used at embedding, the binary number embedded in the text fragment is found to be 1111011, from which the embedded information 123 is recovered.
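Step S204 can be sketched as the inverse of the allocation rule. Two points are assumptions of this sketch rather than statements of the description: the extractor is taken to know the bit count N, and when each bit was repeated over several zones the repeated codes are combined by majority vote, with a tie resolved to 0. The function names are hypothetical.

```python
def recover_bits(zone_codes: list[int], n_bits: int) -> list[int]:
    """Collapse per-zone codes back into the n_bits embedded bits."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    repeat = len(zone_codes) // n_bits  # zones used per bit at embedding
    if repeat == 0:
        raise ValueError("fewer zones than bits")
    bits = []
    for j in range(n_bits):
        group = zone_codes[j * repeat:(j + 1) * repeat]
        # Majority vote over the redundant copies (assumed combination rule).
        bits.append(1 if 2 * sum(group) > len(group) else 0)
    return bits

def bits_to_int(bits: list[int]) -> int:
    """Interpret the bits, most significant first, as an integer."""
    return int("".join(str(b) for b in bits), 2)

value = bits_to_int(recover_bits([1, 1, 1, 1, 0, 1, 1], 7))
```

With one zone per bit the vote is trivial and the codes are simply concatenated, as in the 1111011 example.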
In the above watermark embedding and extraction methods, the band division at embedding may be performed by computer software analyzing a scanned image of the text, or manually, directly on paper, with a tool such as a ruler. Extraction may likewise be carried out by computer software or manually to obtain the embedded code for each available punctuation zone, from which the embedded binary number is recovered.
Based on the above watermark embedding method, a watermark embedding apparatus can be constructed. As shown in Figure 6, it comprises a zone determination module 101, a position determination module 102 and an information embedding module 103.
The zone determination module 101 is configured to determine, according to the set zone determination rule, the available punctuation zones in the text document in which information is to be embedded.
Preferably, the zone determination module 101 may further comprise an acquiring unit 1011, a punctuation determining unit 1012 and a zone determining unit 1013.
The acquiring unit 1011 is configured to obtain the plain text area of the text document.
The punctuation determining unit 1012 is configured to perform character segmentation and punctuation analysis on the plain text area obtained by the acquiring unit 1011 and to determine the available punctuation marks it contains, where each available punctuation mark has at least one adjacent other character both before and after it.
The zone determining unit 1013 is configured to define a start boundary and an end boundary from each available punctuation mark determined by the punctuation determining unit 1012 and the two other characters adjacent to it, thereby obtaining the available punctuation zone.
The position determination module 102 is configured to determine, according to the set position determination rule, the original position of the punctuation mark in each available punctuation zone.
Preferably, the position determination module 102 may further comprise a frequency band division unit 1021 and a position determination unit 1022.
The frequency band division unit 1021 is configured to calculate the distance from the start boundary to the end boundary of each available punctuation zone and to divide each zone into several frequency bands according to that distance.
The position determination unit 1022 is configured to determine, according to the coordinate position of the punctuation mark in each available punctuation zone, the frequency band containing the mark and the corresponding band index.
The information embedding module 103 is configured to adjust, for each available punctuation zone, the position of the punctuation mark according to its original position and the corresponding code to be embedded, so that the corresponding code is embedded in that zone.
The watermark embedding apparatus further comprises a code allocation module 104, configured to determine the binary number to be embedded from the information to be embedded in the text document, the bit count of the binary number being less than or equal to the number of available punctuation zones determined, and to determine the code to be embedded in each available punctuation zone according to the binary number to be embedded and the set information embedding rule.
Based on the above watermark extraction method, a watermark extraction apparatus can be constructed. As shown in Figure 7, it comprises a zone determination module 201, a position determination module 202 and a code extraction module 203.
The zone determination module 201 is configured to determine, according to the same zone determination rule used when the watermark information was embedded, the available punctuation zones in the text document in which information has been embedded.
Preferably, the zone determination module 201 may further comprise an acquiring unit 2011, a punctuation determining unit 2012 and a zone determining unit 2013.
The acquiring unit 2011 is configured to obtain the plain text area of the text document.
The punctuation determining unit 2012 is configured to perform character segmentation and punctuation analysis on the plain text area obtained by the acquiring unit 2011 and to determine the available punctuation marks it contains, where each available punctuation mark has at least one adjacent other character both before and after it.
The zone determining unit 2013 is configured to determine the available punctuation zones from the available punctuation marks determined by the punctuation determining unit 2012, the two other characters adjacent to each mark, and the start boundary and end boundary defined when the watermark information was embedded.
The position determination module 202 is configured to determine, according to the same position determination rule used when the watermark information was embedded, the position of the punctuation mark in each available punctuation zone.
Preferably, the position determination module 202 may further comprise a frequency band division unit 2021 and a position determination unit 2022.
The frequency band division unit 2021 is configured to calculate the distance from the start boundary to the end boundary of each available punctuation zone and to divide each zone into several frequency bands according to that distance.
The position determination unit 2022 is configured to determine, according to the coordinate position of the punctuation mark in each available punctuation zone, the frequency band containing the mark and the corresponding band index.
The code extraction module 203 is configured to determine the embedded code in each available punctuation zone according to the position of the punctuation mark determined by the position determination module 202.
The watermark extraction apparatus further comprises an information recovery module 204, configured to obtain the binary number embedded in the text document according to the same information embedding rule used when the watermark information was embedded and the embedded codes determined for the available punctuation zones, and to obtain the embedded information of the text document from that binary number.
The watermark embedding and extraction methods and apparatuses provided by the embodiments of the invention select available punctuation zones and adjust the position of the punctuation mark in each zone so as to embed the corresponding code in that zone. When the watermark is extracted, the corresponding rules are applied to the adjusted punctuation positions to extract the embedded code from each available punctuation zone. The method is simple to operate. Because the human eye is far less sensitive to changes in the position of punctuation marks than to changes in the position of characters, relatively large adjustments can be made, so the embedded watermark information is highly stable and well concealed while the visual appearance remains good.
In addition, the binary number to be embedded can be determined from the information to be embedded, a code to be embedded can be allocated to each available punctuation zone by the set information embedding rule, and the punctuation mark can be moved according to the value of the code and its original position (the band containing it) to complete the embedding. Embedding is secure and reliable, and extraction is comparatively accurate. In particular, when watermark information is embedded by changing the center of gravity of punctuation marks, embedding and detection are more stable and more accurate.
With the method and apparatus provided by the embodiments of the invention, the embedded watermark information is well concealed and highly robust: it withstands repeated printing, copying and scaling, and the watermark can be extracted by blind detection with fast computation. The method suits applications with demanding requirements on both visual quality and robustness. It addresses the problem that prior-art approaches are difficult to use in practice because their implementation cost is too high or because they cannot reconcile stability with visual quality.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any variation, substitution or application to other similar devices that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be defined by the claims.