CN103761700A

CN103761700A - Watermark method capable of resisting printing scanning attack and based on character refinement

Info

Publication number: CN103761700A
Application number: CN201310717967.7A
Authority: CN
Inventors: 夏志华; 王淑芳; 孙星明
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2013-12-23
Filing date: 2013-12-23
Publication date: 2014-04-30

Abstract

The invention discloses a watermarking method, based on character thinning, that resists print-scan attacks. The method comprises watermark embedding and watermark extraction. For embedding, the text image first undergoes binarization preprocessing, character segmentation, character thinning, and segmentation of the thinning results, which yields higher robustness; each line is then divided into several embedding units, a 1 is embedded by raising the height of the characters in the front half of an embedding unit, and a 0 is embedded by raising the height of the characters in the rear half. For extraction, the embedded information is read by comparing the heights of the front-half characters of each embedding unit with the heights of the rear-half characters. The method has a simple structure, large embedding capacity, and high robustness, and it resists print-scan attacks effectively.

Description

A watermarking method based on character thinning that resists print-scan attacks

Technical field

The present invention relates to the technical field of text watermarking, and in particular to a watermarking method based on character thinning that resists print-scan attacks.

Background art

With the rapid development of digital technology, electronic documents are widely used in many fields, but the ease with which they can be copied or modified has caused many security problems, which makes the protection of electronic documents an urgent need. Text digital watermarking is an effective way to prevent the illegal distribution of documents by embedding information such as copyright into the electronic document. However, printing and scanning has become a common way of copying and distributing documents, so text watermarking techniques that protect only the electronic document do not resist print-scan attacks well. Research on text watermarking that can withstand printing and scanning is therefore of great significance for current text copyright protection.

Digital watermarking has developed rapidly since the 1990s. Text watermarking, an important branch of digital watermarking, is a technique for protecting text data: its goal is to hide information in a text carrier so that it is retained in the data intact and can be extracted and recovered whenever needed, thereby providing strong evidence for establishing misuse of a work.

Current text digital watermarking algorithms fall into three main categories: methods based on electronic document structure, methods based on natural language processing, and methods based on text images.

Methods based on the electronic document format usually embed the watermark by adjusting document format information that the human eye cannot discern. They offer large hiding capacity and good concealment, but their robustness is low and they cannot effectively resist print-scan attacks.

Natural-language watermarking methods mainly use natural language processing techniques to embed the watermark in the text, without changing its original meaning, by means such as equivalent-information substitution and conversion of grammatical voice. They currently fall into two classes: one based on syntactic structure and the other based on semantics. These methods also conceal the watermark well, but their embedding capacity is very limited; they are suited to situations in which the carrier content may not be modified arbitrarily.

Current text-image watermarking methods operate mainly in the transform domain or the spatial domain of the image. Transform-domain algorithms usually embed the watermark in the high-frequency coefficients of a transform such as the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). Although transform-domain methods are fairly robust, compared with spatial-domain algorithms they cannot guarantee the visual quality of the text image after embedding, and their robustness is not as good as might be expected.

Spatial-domain algorithms are realized mainly by two kinds of methods: pixel flipping and document segmentation. Text-image watermarking based on pixel flipping usually defines a flipping rule that minimizes the visual distortion of the image after the pixels are flipped, but it is very sensitive to noise such as that introduced by printing and scanning. Watermarking based on document segmentation is generally realized by adjusting document format information that the human eye cannot discern (such as font, color, line spacing, and word spacing).

Brassil et al. proposed three methods for text document watermarking: line-shift coding, word-shift coding, and feature coding. The first two shift a line or character block up/down or left/right according to the embedded information, while the lines or blocks directly adjacent to the shifted one stay fixed as control information. Because both methods must reserve the elements adjacent to the shifted element as control information, the embedding capacity is greatly reduced; moreover, detection requires the original document, so blind detection is impossible, which is impractical. Feature coding embeds information by altering specific text features (such as vertical end lines) in a specific way; it likewise requires the original, unwatermarked document during detection, and the watermark is very easily affected by noise.

Tan et al. proposed embedding watermark information using character strokes and the upper-lower structure of characters. Although such methods based on the structural features of characters resist print-scan attacks well, they require a series of complex operations on the characters, such as stroke extraction and stroke segmentation, and are somewhat complicated to implement.

In summary, the above methods have the following defects: watermarking based on electronic document structure cannot effectively resist print-scan attacks; watermarking based on natural language processing has very limited embedding capacity; the three methods proposed by Brassil cannot achieve blind extraction and have comparatively low embedding capacity; and watermarking based on character structure is complicated to implement. On the other hand, under print-scan attacks the line spacing and word spacing of a text image are relatively stable features, and human vision is insensitive to small changes in them, so watermarking algorithms based on line spacing and word spacing are relatively plentiful. Building on this idea, the present method combines text-image line spacing with the robustness of character thinning to obtain a watermarking method that can resist print-scan attacks.

Summary of the invention

The technical problem to be solved by the present invention is to overcome the defects of the background art by providing a watermarking method that is simple in structure, has large embedding capacity and high robustness, and effectively resists print-scan attacks.

To solve the above technical problem, the present invention adopts the following technical solution:

A watermarking method based on character thinning that resists print-scan attacks comprises two parts, watermark embedding and watermark extraction. The watermark embedding steps are as follows:

Step A), binarize the text image;

Step B), perform character segmentation on the binarized text image to obtain the outer-frame boundary information of each character;

Step C), thin each character;

Step D), perform character segmentation on the thinning results to obtain the inner-frame boundary information of each character;

Step E), compute the width and height of each character from the inner-frame boundary information, and discard characters whose width and height are smaller than the mean width and the mid-height value of their row;

Step F), take the mean of the upper and lower inner-frame boundaries of a character as the character's height, compute the average height of each row, and move the remaining (non-discarded) characters to the average height of their row;

Step G), divide each row into at least one embedding unit according to the number of characters in the row; for each embedding unit, if the watermark bit is 1, move each character in the front half of the unit upward so that its height is above the row's average height; if the watermark bit is 0, move each character in the rear half of the unit upward.
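Steps F) and G) can be sketched for a single embedding unit as follows. This is only an illustrative sketch under stated assumptions: each character is reduced to the top and bottom rows of its inner frame, image rows are numbered from the top (so moving a character up means a negative offset), the unit is treated as a whole row when averaging, and the function name `embed_offsets` and the shift `m` are hypothetical.

```python
def embed_offsets(boxes, bit, m=2):
    """Return the vertical offset (pixels, negative = up) for each character.

    boxes: list of (top, bottom) inner-frame rows of the characters in one unit.
    bit:   watermark bit (0 or 1) to embed in this unit.
    m:     hypothetical shift in pixels applied to the selected half.
    """
    # Step F: character height = mean of upper and lower inner-frame boundary.
    heights = [(top + bottom) / 2 for top, bottom in boxes]
    average = sum(heights) / len(heights)
    half = len(boxes) // 2
    offsets = []
    for index, height in enumerate(heights):
        offset = average - height          # align the character to the average height
        in_front_half = index < half
        # Step G: bit 1 raises the front half, bit 0 raises the rear half.
        if (bit == 1 and in_front_half) or (bit == 0 and not in_front_half):
            offset -= m
        offsets.append(offset)
    return offsets
```

Applying the returned offsets leaves the non-selected half at the average height and places the selected half `m` pixels above it.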

The watermark extraction steps are as follows:

Step 1), scan the text document containing the watermark into a grayscale image;

Step 2), apply binarization and skew correction to the grayscale image;

Step 3), perform character segmentation on the binarized, skew-corrected image to obtain the outer-frame boundary information of each character;

Step 4), thin each character;

Step 5), perform character segmentation on the thinned characters to obtain the inner-frame boundary information of each character;

Step 6), divide each row into at least one embedding unit using the same grouping method as in Step G), and compare the sum of the heights of the front-half characters of each embedding unit with the sum of the heights of the rear-half characters: if the front-half sum is less than the rear-half sum, the extracted watermark bit for this unit is 1; otherwise, if the front-half sum is greater than the rear-half sum, the extracted bit is 0;

Step 7), output the extracted watermark information.
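The comparison in Step 6) can be illustrated with a short sketch. It assumes that a character's height is the mean of its inner-frame top and bottom rows, with rows numbered from the top, so a raised character has a smaller value. The name `extract_bit` is hypothetical; dropping the middle character of an odd-sized unit and returning 0 on a tie are choices of this sketch, not something the text specifies.

```python
def extract_bit(heights):
    """Recover one watermark bit from the character heights of one embedding unit."""
    if len(heights) < 2:
        raise ValueError("an embedding unit needs at least two characters")
    half = len(heights) // 2
    front_sum = sum(heights[:half])      # front-half characters
    rear_sum = sum(heights[-half:])      # same number of characters from the rear
    # A smaller sum means the group sits higher on the page.
    return 1 if front_sum < rear_sum else 0
```

For example, heights `[14, 14, 16, 16]` give bit 1 because the front half is the raised one.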

The key to a watermarking algorithm is to preserve the visual quality of the carrier image as far as possible while guaranteeing correct extraction of the watermark. On the one hand, the watermark is embedded by adjusting the relative height of characters, and a larger adjustment better guarantees extraction accuracy. On the other hand, the embedded watermark must withstand print-scan attacks and also be imperceptible, so the adjustment amplitude must be kept small. To reconcile these two requirements, the height that is adjusted is measured not on the character itself but on the thinning result of the character, so that the robustness of thinning carries over to the watermark. In addition, to guarantee the robustness of the watermark, the characters can be adjusted following an 'n'-shaped pattern, which balances extraction accuracy against visual quality.

As a further refinement of the watermarking method of the present invention, the binarization steps are as follows:

Step A1), obtain the maximum and minimum gray values of the text image, and take their mean as the threshold;

Step A2), divide the pixels of the text image into two groups according to the threshold: one group with gray values greater than or equal to the threshold, the other with gray values less than the threshold;

Step A3), compute the average gray value of each group, and take half of their sum as the new threshold;

Step A4), repeat Steps A2) to A3) until the average gray values of the two groups no longer change;

Step A5), take half of the sum of the two groups' average gray values as the segmentation threshold;

Step A6), binarize the pixels of the text image with the segmentation threshold: pixels whose gray value is greater than or equal to the segmentation threshold are set to 0, and pixels whose gray value is less than the segmentation threshold are set to 1.
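Steps A1) to A6) amount to an iterative threshold selection, which might look like the sketch below. The image is assumed to be a flat list of gray values, and the function names are hypothetical; a uniform image, which the steps do not cover, simply keeps the initial threshold here.

```python
def iterative_threshold(pixels):
    """Return the segmentation threshold for a flat list of gray values."""
    threshold = (max(pixels) + min(pixels)) / 2                # Step A1
    previous_means = None
    while True:
        high = [p for p in pixels if p >= threshold]           # Step A2
        low = [p for p in pixels if p < threshold]
        if not low or not high:                                # uniform image
            return threshold
        means = (sum(low) / len(low), sum(high) / len(high))   # Step A3
        threshold = (means[0] + means[1]) / 2
        if means == previous_means:                            # Step A4: means are stable
            return threshold                                   # Step A5
        previous_means = means


def binarize(pixels):
    """Step A6: dark pixels (below the threshold) become 1, the rest 0."""
    threshold = iterative_threshold(pixels)
    return [0 if p >= threshold else 1 for p in pixels]
```

With this convention the ink of the characters is 1 and the paper background is 0.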

The noise introduced by printing and scanning is very likely to make the character segmentation results before and after embedding inconsistent, so the image must be preprocessed with suitable denoising and binarization algorithms. Because binarization algorithms generally have some denoising effect, this system directly applies the iterative method to denoise and binarize the text image.

As a further refinement of the watermarking method of the present invention, the skew correction steps are as follows:

Step 201), read in the image, compute its horizontal projection, and calculate the sum of its inter-line blank spacings;

Step 202), rotate the image clockwise and counterclockwise by an angle smaller than 0.08° and calculate the corresponding blank-spacing sums;

Step 203), compare the results of Step 202) with the result of Step 201); the rotation direction that gives the larger value is the correct direction for correcting the image;

Step 204), rotate the image in the correct direction and calculate the inter-line blank-spacing sum before and after the rotation; repeat until the sum after rotation is less than or equal to the sum before rotation;

Step 205), reduce the rotation step;

Step 206), repeat Steps 204) to 205) until the rotation step is smaller than the preset precision;

Step 207), accumulate the angles of all rotations to obtain the skew angle of the image.
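The search in Steps 201) to 207) can be sketched independently of any image library by passing in a callable. Here `spacing_sum(angle)` is a hypothetical function that rotates the original image by `angle` degrees and returns its inter-line blank-spacing sum; the initial step of 0.05°, the halving of the step, and re-testing both directions at every step size are assumptions of this sketch, since the text only says that the step is reduced.

```python
def estimate_skew(spacing_sum, step=0.05, precision=0.0001):
    """Return the rotation angle (degrees) that maximizes spacing_sum."""
    angle = 0.0
    best = spacing_sum(angle)                           # Step 201
    while step >= precision:                            # Step 206
        # Steps 202-203: probe both directions, follow the larger sum.
        clockwise = spacing_sum(angle + step)
        counterclockwise = spacing_sum(angle - step)
        direction = 1 if clockwise >= counterclockwise else -1
        # Step 204: keep rotating while the spacing sum still grows.
        while True:
            trial = spacing_sum(angle + direction * step)
            if trial <= best:
                break
            angle += direction * step                   # Step 207: accumulate
            best = trial
        step /= 2                                       # Step 205
    return angle
```

A synthetic stand-in such as `lambda a: -abs(a - 0.4)` can be used to check that the search converges on the peak before plugging in a real rotate-and-project routine.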

A text image may go through several analog-to-digital and digital-to-analog conversions during printing and scanning. These can introduce random noise and are very likely to skew the text image, which interferes with the watermark. To address this, we computed statistics of several numerical features of text before and after skewing and found that the inter-line blank-spacing sum differs greatly between the two: it decreases as the skew angle increases. We therefore only need to rotate the image step by step and find the rotation angle at which the inter-line blank-spacing sum is largest; that angle is the skew angle of the image. Finally, a rotation function is used to correct the image. Experimental data show that this algorithm can hold the correction precision at 0.0001, which lays the foundation for accurate character segmentation and correct watermark extraction.

As a further refinement of the watermarking method of the present invention, the thinning steps are as follows:

Step C1), read in the image and, for each character, scan its boundary pixels; delete boundary pixels of the character by matching them against the templates;

Step C2), repeat Step C1) until the structure of the character image no longer changes.
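The text does not list the templates themselves, so the sketch below substitutes the well-known Zhang-Suen thinning algorithm, which follows the same pattern of repeatedly deleting boundary pixels until nothing changes. It is a stand-in rather than the template set used by the method, and it assumes a binary image (1 = character pixel) with at least a one-pixel background border.

```python
def zhang_suen_thin(image):
    """Thin a binary image given as a list of rows of 0/1; returns a new image."""
    img = [row[:] for row in image]
    height, width = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:                           # Step C2: stop when nothing changes
        changed = False
        for phase in (0, 1):
            to_delete = []
            for y in range(1, height - 1):
                for x in range(1, width - 1):
                    if img[y][x] != 1:
                        continue
                    n = neighbours(y, x)
                    count = sum(n)           # number of foreground neighbours
                    # number of 0 -> 1 transitions around the pixel
                    transitions = sum(n[i] == 0 and n[(i + 1) % 8] == 1
                                      for i in range(8))
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if phase == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= count <= 6 and transitions == 1 and ok:
                        to_delete.append((y, x))
            for y, x in to_delete:           # delete simultaneously after the scan
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return img
```

Deleting the marked pixels only after each full scan keeps the result independent of the scan order, which matters when the embedding and extraction sides must obtain the same skeleton.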

In morphology, image thinning is realized by successive erosion and opening operations. Each round of thinning uses 8 templates, and a character is thinned repeatedly until the result no longer changes, at which point the final thinning result is obtained. The image region can be pictured as a flat prairie covered with dry grass: if a fire is lit along the border of the prairie, the flame front advances toward the center at the same speed everywhere. The skeleton of the region is the set of points that flame fronts from more than one place reach at the same moment. Although a thinned character is only one pixel wide, it retains the geometric and topological properties of the character, and these properties play a very important role in robust character recognition.

The character segmentation steps are as follows:

Step 1), project the image first by rows and then by columns to obtain the connected-domain segmentation result;

Step 2), for each connected domain, compute its aspect ratio and the aspect ratio of the domain obtained by merging it with the connected domain to its right, and compare the two values. Based on the observation that, whatever the font or font size, the height-to-width ratio of all but a few Chinese characters is close to 1, the connected domain whose value is closer to 1 is taken as one character.
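The row-then-column projection of Step 1) can be sketched as below, assuming a binary image given as rows of 0/1 with 1 for ink. The helper names `runs` and `segment` are hypothetical, and the boxes produced are projection segments, which only approximate true connected components.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    found, start = [], None
    for index, value in enumerate(profile):
        if value and start is None:
            start = index
        elif not value and start is not None:
            found.append((start, index))
            start = None
    if start is not None:
        found.append((start, len(profile)))
    return found


def segment(image):
    """Return boxes (top, bottom, left, right), bottom and right exclusive."""
    boxes = []
    row_profile = [sum(row) for row in image]            # horizontal projection
    for top, bottom in runs(row_profile):                # one run per text line
        band = image[top:bottom]
        column_profile = [sum(row[x] for row in band)    # vertical projection
                          for x in range(len(image[0]))]
        for left, right in runs(column_profile):
            boxes.append((top, bottom, left, right))
    return boxes
```

The boxes of one text line come out sorted from left to right, which is the order the aspect-ratio comparison of Step 2) needs.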

Because the embedding and extraction algorithms of this system rely on the relative height of characters, accurate character segmentation is an important prerequisite for correct watermark extraction. The system first projects the image by rows and then by columns to obtain an accurate segmentation into connected domains. Then, using a segmentation method based on connected-domain structure together with processing rules designed for various special characters, it merges and splits connected domains to obtain an accurate character segmentation. The criterion for merging and splitting is that, whatever the font or font size, the height-to-width ratio of all but a few Chinese characters (punctuation marks, " ", etc.) is close to 1.

The correctness of character segmentation directly affects whether the watermark can be extracted correctly and the visual quality of the text image before and after embedding. Projection-based segmentation is simple in principle but cannot handle characters with a left-right structure well. Segmentation based on connected components must additionally solve the problem of combining the connected components of a character. Preliminary experiments show that neither scheme solves character segmentation perfectly; to achieve better segmentation, special rules for various special characters must be formulated on top of the existing algorithms.

Compared with the prior art, the above technical solution of the present invention has the following technical effects:

The present invention proposes a new print-scan-resistant watermarking scheme for Chinese text images based on character thinning, which achieves watermark robustness chiefly by combining the robustness of thinning with the watermarking method. The watermark embedded in the text can still be extracted effectively even after the electronic document has undergone a print-scan attack, so the goal of print-scan resistance for text-image watermarking is achieved. The method avoids complex processing of the text image, and the algorithm is easy to understand and implement.

The present invention provides a new key technique for copyright protection of text content, alleviates the long-standing problem of low robustness in text digital watermarking, and provides a theoretical basis and methodological support for the wider application of digital watermarking.

Brief description of the drawings

Fig. 1 is the watermark embedding flow;

Fig. 2 is the watermark extraction flow;

Fig. 3 is a schematic diagram of watermark embedding;

Fig. 4 is the horizontal projection result of a non-skewed image;

Fig. 5 is the horizontal projection result of a skewed image;

Fig. 6 is a schematic diagram of the 'n'-shaped moving pattern.

Detailed description of the embodiments

The technical solution of the present invention is described in further detail below with reference to the drawings:

As shown in Figs. 1 and 2, the invention discloses a watermarking method based on character thinning that resists print-scan attacks, comprising two parts, watermark embedding and watermark extraction. The watermark embedding steps are as follows:

Step 1), binarize the text image as preprocessing;

Step 2), perform character segmentation on the binarized text image to obtain the outer-frame boundary information of each character;

Step 3), on the basis of the character segmentation, thin each character;

Step 4), segment each thinning result with a method similar to character segmentation to obtain the inner-frame boundary information of each character;

Step 5), compute the width and height of each character from the outer-frame boundary information, and discard characters whose width and height are smaller than the mean width and the mid-height value of their row;

Step 6), take the mean of the upper and lower inner-frame boundaries of a character as its height; then compute the average height of each row, and move the remaining (non-discarded) characters to the average height of their row;

Step 7), divide each row into 1 embedding unit (fewer than 20 characters) or 2 embedding units (more than 20 characters) according to the number of characters in the row, and divide the characters of each embedding unit into 2 groups; if the watermark bit is 1, move each character of the first group in the unit upward so that its height is above the row's average height; otherwise, if the watermark bit is 0, move each character of the second group upward.

Fig. 3 shows a concrete example of watermark embedding. From top to bottom the stages are: input text image; character segmentation; character thinning and re-segmentation; computation of each row's average height; and moving characters according to the watermark information to obtain the embedding result.
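The grouping rule of Step 7) might be written as the sketch below. The function name `split_units` is hypothetical; a row of exactly 20 characters, which the step leaves unspecified, is treated here as two units, and any odd character goes to the rear group.

```python
def split_units(n_chars):
    """Return [(front_indices, rear_indices), ...] for a row of n_chars valid characters."""
    n_units = 1 if n_chars < 20 else 2        # assumption: exactly 20 counts as 2 units
    unit_size = n_chars // n_units
    units = []
    for unit in range(n_units):
        start = unit * unit_size
        # The last unit absorbs any leftover character.
        end = n_chars if unit == n_units - 1 else start + unit_size
        middle = start + (end - start) // 2
        units.append((list(range(start, middle)), list(range(middle, end))))
    return units
```

Using the same function on the embedding and the extraction side keeps the grouping identical, which the extraction step relies on.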

The watermark extraction steps are as follows:

Step 2), apply binarization and skew correction to the grayscale image to obtain a binary image;

Step 3), perform character segmentation on the result of Step 2) to obtain the outer-frame boundaries of the characters;

Step 4), thin the characters on the basis of Step 3);

Step 5), segment the thinned characters with a method similar to character segmentation to obtain the inner-frame boundaries;

Step 6), using the same grouping method as in embedding, divide each text line into embedding units and into groups within each unit, and compare the height sums of the front and rear groups of each embedding unit;

Step 7), if the sum of the front group is less than that of the rear group, extract watermark bit 1; otherwise extract 0;

Step 8), output the watermark information.

The key to a watermarking algorithm is to preserve the visual quality of the carrier image as far as possible while guaranteeing correct extraction of the watermark. On the one hand, the watermark is embedded by adjusting the relative height of characters, and a larger adjustment better guarantees extraction accuracy. On the other hand, the embedded watermark must withstand print-scan attacks and also be imperceptible, so the adjustment amplitude must be kept small. To reconcile these two requirements, the height that is adjusted is measured not on the character itself but on the thinning result of the character, so that the robustness of thinning carries over to the watermark. In addition, the characters are adjusted following an 'n'-shaped pattern, which balances extraction accuracy against visual quality.

The binarization steps are as follows:

Step 1), obtain the maximum and minimum gray values of the text image, and take their mean as the threshold of the image;

Step 2), divide the pixels of the text image into two groups according to the threshold obtained in the previous step: one group contains the pixels whose gray value is greater than the threshold, the other the pixels whose gray value is less than the threshold;

Step 3), compute the average gray values μ₁ and μ₂ of the two groups;

Step 4), compute the threshold T₁ = (μ₁ + μ₂)/2;

Step 5), repeat Steps 2) to 4) until the group averages μ₁ and μ₂ no longer change; T₁ is then the segmentation threshold;

Step 6), binarize the pixels of the text image with T₁: pixels whose gray value is greater than or equal to T₁ are set to 0, and pixels whose gray value is less than T₁ are set to 1.

Before watermark extraction, the paper document containing the watermark must be scanned into an electronic document. In the experiments, an all-in-one print-scan device, the HP LaserJet M1536dnf MFP, was used to print and scan the documents. The scanned text image is generally a grayscale image, so it must be binarized before further processing, which also serves to remove noise. After binarization, we found that the horizontal projection of the text differs markedly before and after the image is skewed. Figs. 4 and 5 show the horizontal projections of the same test text in the non-skewed and skewed states, respectively. It can be seen that the larger the skew angle, the smaller the inter-line blank-spacing sum (S) obtained from the horizontal projection. On this basis, we find the skew angle of the image by repeatedly rotating it and computing the corresponding inter-line blank-spacing sum, and then correct it.
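One plausible way to compute the quantity S from the horizontal projection is sketched below. The text does not define S precisely, so this sketch assumes it is the number of completely blank pixel rows lying between the first and the last text row of a binary image (1 = ink); the name `line_spacing_sum` is hypothetical.

```python
def line_spacing_sum(image):
    """Count blank rows between the first and last inked rows of a binary image."""
    profile = [sum(row) for row in image]              # horizontal projection
    inked = [index for index, value in enumerate(profile) if value > 0]
    if not inked:
        return 0
    return sum(1 for index in range(inked[0], inked[-1] + 1)
               if profile[index] == 0)


# Two straight text lines leave blank rows between them; when the lines are
# skewed, ink from neighbouring lines overlaps in the projection and the
# blank rows disappear.
straight = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
skewed = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
print(line_spacing_sum(straight), line_spacing_sum(skewed))
```

The toy images only illustrate the trend described above: the skewed layout yields a smaller S than the straight one.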

The pseudocode for text-image skew correction is as follows:

Notation: I: Chinese text image; S_r (S_l): inter-line blank-spacing sum after rotating image I clockwise (counterclockwise); S_a (S_b): blank-spacing sum after (before) rotating image I; A: rotation angle; P: given precision.

Load image I, compute its horizontal projection, and calculate its inter-line blank-spacing sum S;

Rotate the image clockwise and counterclockwise and calculate the corresponding S_a, S_b;

To evaluate the above skew-correction algorithm, we rotated 10 different text images clockwise and counterclockwise by 0.4° and then used the algorithm to compute their skew angles. The results are shown in Table 1; as the table shows, the correction precision of the algorithm is held at 0.0001 and the maximum error is 0.00547°.

Watermark embedding and extraction in this scheme are realized mainly by adjusting the relative height of characters. Embedding proceeds line by line, and the embedded watermark is a bit string of 0s and 1s. The characters are segmented, thinned, and segmented again, which yields the frame of each thinned character. In this scheme, the height of a character is represented by the mean of the upper and lower boundaries of its inner frame. First, the average height of the characters in each row is computed from the upper and lower inner-frame boundaries, and the characters of the row are adjusted to the same height. The valid characters of the row are then divided into two or four groups; to embed a 1, the front-half characters of the embedding unit are moved up by M pixels; to embed a 0, the rear-half characters are moved up by M pixels.

The character segmentation steps are as follows:

Step 2), post-processing of special characters after connected-domain segmentation;

Some special characters affect the subsequent Chinese character segmentation. Therefore, immediately after the preliminary division into connected domains, they are identified by their characteristic information (such as position coordinates and point-set density) and given special treatment in the later merging operation. Such characters include punctuation marks such as commas, quotation marks, and ellipses, as well as strokes in some Chinese character structures that are easily confused with punctuation marks.

Likewise, some Chinese characters, because of their unusual structure, would disturb the segmentation of the characters before and after them in the later Chinese character segmentation, and thereby affect the overall result. They are therefore handled directly at this point: after connected-domain segmentation and before Chinese character segmentation, they are identified according to preset rules, and thereafter they no longer take part in Chinese character segmentation.

Step 3), connected-domain merging;

A connected domain is not the same thing as a Chinese character: many characters with a left-right structure are split into two connected domains by the preliminary segmentation, so Chinese character segmentation is still required. The segmentation must rely on a feature that is independent of font size, font, bold, and italics. This step uses the height-to-width ratio, on the grounds that, whatever the font or font size, the height-to-width ratio of all but a few Chinese characters is close to 1. The procedure is as follows:

Step 3.1), compute the height-to-width ratio of the selected connected domain. If it is below a preset threshold (a value less than 1), the width of the connected domain exceeds its height and the connected domain is taken to be a Chinese character on its own; otherwise go to Step 3.2);

Step 3.2), compute the height-to-width ratio obtained by merging the selected connected domain with the next adjacent connected domain. If this ratio is closer to 1 than that of the selected connected domain and the distance between the two connected domains is below a certain threshold (determined case by case), merge the two into one connected domain and repeat Step 3.2) to check whether further parts can be merged; otherwise go to Step 3.3);

Step 3.3), take the connected domain merged in Step 3.2) as one Chinese character;

Step 3.4), select the next connected domain and go to Step 3.1).
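Steps 3.1) to 3.4) can be illustrated with the following sketch. It assumes the connected domains of one text line are given as boxes `(top, bottom, left, right)` sorted from left to right, with bottom and right exclusive; the name `merge_components` and the default values of `ratio_threshold` and `max_gap` are hypothetical placeholders for the thresholds the text leaves open.

```python
def merge_components(boxes, ratio_threshold=0.8, max_gap=3):
    """Merge adjacent connected domains into characters by height-to-width ratio."""
    def ratio(box):
        return (box[1] - box[0]) / (box[3] - box[2])    # height / width

    characters, index = [], 0
    while index < len(boxes):                           # Step 3.4: next domain
        current = boxes[index]
        index += 1
        if ratio(current) >= ratio_threshold:           # Step 3.1 not satisfied
            while index < len(boxes):                   # Step 3.2
                following = boxes[index]
                merged = (min(current[0], following[0]),
                          max(current[1], following[1]),
                          current[2], following[3])
                gap = following[2] - current[3]
                closer_to_one = abs(ratio(merged) - 1) < abs(ratio(current) - 1)
                if closer_to_one and gap < max_gap:
                    current = merged
                    index += 1
                else:
                    break
        characters.append(current)                      # Step 3.3
    return characters
```

In the sketch, two narrow halves of a left-right character are fused because the merged box is closer to square, while a neighbouring full-width character is left alone because merging would move the ratio away from 1.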

Step 4): connected domain is processed after merging again;

After connected domain is merged, the Chinese character of the overwhelming majority will correctly be cut apart, but still exists following special circumstances need to carry out special aftertreatment:

A) Several Chinese characters adhere to one another because of noise, cuts, and similar causes, so the joined region has a low height-to-width ratio and is identified as a single character. This case is handled as follows:

Step a1) Calculate the height-to-width ratio of each connected component; if it is below the preset threshold, go to step a2);

Step a2) Determine whether the low height-to-width ratio is caused by the shape of one of the very few inherently wide Chinese characters or by the adhesion of several characters. A preliminary distinction is made using height information; if the cause is the latter, go to step a3);

Step a3) Use the vertical projection to find the weakest junction and split the region there into individual Chinese characters (this step achieves correct segmentation with high reliability but cannot guarantee fully accurate segmentation); then return to step a1).
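The vertical-projection split of step a3) can be illustrated as follows. This is a sketch under stated assumptions: the image is a list of rows of 0/1 pixels with 1 for black, the "weakest junction" is taken to be the interior column with the fewest black pixels, and the `margin` parameter (which keeps the cut away from the outer edges) is a hypothetical addition.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels, 1 = black ink, 0 = background


def vertical_projection(img: Image) -> List[int]:
    """Number of black pixels in each column."""
    return [sum(column) for column in zip(*img)]


def weakest_column(img: Image, margin: int = 1) -> int:
    """Index of the interior column with the fewest black pixels."""
    proj = vertical_projection(img)
    candidates = range(margin, len(proj) - margin)
    if len(candidates) == 0:
        raise ValueError("region too narrow to split")
    # min() returns the leftmost column when several columns tie.
    return min(candidates, key=lambda c: proj[c])


def split_at(img: Image, col: int) -> Tuple[Image, Image]:
    """Cut the region into a left part (columns < col) and a right part."""
    return [row[:col] for row in img], [row[col:] for row in img]
```

A region made of two solid blobs joined by a one-pixel bridge is cut at the bridge column. Real scans may have several equally weak columns, in which case this sketch simply picks the leftmost one.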

B) The structure of certain Chinese characters causes a single character to be split into several by mistake. For example, in the FangSong (imitation Song) typeface, the dot on the far left of the character "quiet" has a height-to-width ratio closer to 1 than the whole character does, so it tends to be split off as a separate character, leaving the remaining part as another character. In this case, the leftmost dot is identified by its shape and position and is then prevented from forming a character on its own, so that it is merged with the part that follows and the character is segmented correctly.

C) Because of uneven ink density in printing, damage to the paper surface after printing, and similar causes, a character that was originally a single whole is often identified as two separate parts after scanning. This case is handled as follows: because the parts come from one whole that has broken apart, the gap between such broken strokes is shorter than the spacing between characters, and the width and height of a broken stroke are themselves relatively small. A threshold is therefore set: when the distance between two parts is less than this threshold and the width and height of a part are smaller than those of a normal character, the parts are merged regardless of whether the height-to-width ratio satisfies the merging condition above.

The steps of character thinning are as follows:

Step 1), Read in the image and, for each character, scan its boundary pixels; delete character boundary pixels using the given templates. Each template is a 3×3 matrix with 9 elements in total, and each element can only take the value 0 or 1, so there are 2⁹ = 512 distinct templates. The templates are numbered 0 to 511; the number equals the weighted sum of the pixels in the template, with the following pixel weights:

[\begin{matrix} 1 & 8 & 64 \\ 2 & 16 & 128 \\ 4 & 32 & 256 \end{matrix}]

Two of the templates are shown below, where '1' denotes a black pixel in the text image, '0' denotes the white background, and '*' denotes a pixel that may be either 0 or 1. The other 6 templates are obtained by rotating these two templates by 90°, 180°, and 270°.

[\begin{matrix} 1 & 1 & 1 \\ * & 0 & * \\ 0 & 0 & 0 \end{matrix}] [\begin{matrix} * & 1 & 1 \\ 0 & 1 & * \\ 0 & 0 & * \end{matrix}]

Step 2), Repeat step 1) until the structure of the character image no longer changes.

In morphology, image thinning is realized by successive erosion and opening operations. Each round of thinning applies the 8 templates, and the character is thinned repeatedly until the result no longer changes, which gives the final thinning result. Although the thinned character is only one pixel wide, it retains the geometric and topological properties of the character. These properties play a very important role in the robust identification of characters.
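The template numbering and the rotation of templates can be made concrete with a short sketch. The weight matrix is the one given in the text; the function names, the use of the string `'*'` for a wildcard, and the choice of clockwise rotation are assumptions made for illustration. The sketch only covers numbering and matching, not the full thinning loop.

```python
from itertools import product
from typing import List, Set, Union

Cell = Union[int, str]  # 0, 1, or '*' (either value allowed)

# Pixel weights from the text: the number of a 3x3 window is its weighted sum.
WEIGHTS = [[1, 8, 64],
           [2, 16, 128],
           [4, 32, 256]]


def template_number(window: List[List[int]]) -> int:
    """Number (0-511) of a fully specified 3x3 window of 0/1 pixels."""
    return sum(WEIGHTS[r][c] * window[r][c] for r in range(3) for c in range(3))


def rotate90(template: List[List[Cell]]) -> List[List[Cell]]:
    """Rotate a 3x3 template by 90 degrees clockwise."""
    return [list(row) for row in zip(*template[::-1])]


def matching_numbers(template: List[List[Cell]]) -> Set[int]:
    """Numbers of all 0/1 windows that match a template containing wildcards."""
    wild = [(r, c) for r in range(3) for c in range(3) if template[r][c] == '*']
    numbers = set()
    for values in product((0, 1), repeat=len(wild)):
        window = [[0 if cell == '*' else cell for cell in row] for row in template]
        for (r, c), v in zip(wild, values):
            window[r][c] = v
        numbers.add(template_number(window))
    return numbers


# Second template from the text, with '*' as wildcard.
T2 = [['*', 1, 1],
      [0, 1, '*'],
      [0, 0, '*']]
```

With three wildcards, `matching_numbers(T2)` contains 8 numbers, so a neighbourhood can be tested against a template with one set lookup on its number. Applying `rotate90` once, twice, and three times yields the 90°, 180°, and 270° variants.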

The steps of the watermarking method based on relative character height are as follows:

Step 1), First apply the binarization algorithm to obtain the binarized text image; then apply the character segmentation algorithm to obtain the upper and lower boundaries of each character in the text image.

Step 2), On the basis of the character segmentation, apply character thinning to each character;

Step 3), As mentioned above, in this algorithm the relative height of a character is measured by the mean of the upper and lower boundaries of its inner frame. First, the height of each character is calculated from the upper and lower boundaries obtained in step 1), and then the average height of each text line in the text image is obtained.

Step 4), Some punctuation marks (such as ",") differ greatly in height from most characters, and moving them would cause visible distortion. To ensure the robustness of the watermark and its imperceptibility after embedding, punctuation marks and characters whose height differs greatly from the average height, judged from the outer frame boundary information, are treated as unavailable characters. After the unavailable characters are removed, the average height of the available characters is recalculated and used as the final average height of each line.

Step 5), To ensure the robustness of the watermark, a text line is defined as embeddable only if it contains at least 10 available characters. The average height of each embeddable line serves as the reference line for moving the characters of that line, so the available characters of each line are moved up or down until their height equals the average height of their line.

Step 6), The characters of one text line are divided into 2 or 4 groups: 2 groups when the number of available characters is less than or equal to 20, and 4 groups otherwise. Every two groups carry one bit of watermark information. Characters left over after equal division are discarded. If the watermark bit is '1', each character in the first half is moved so that its height is above the average height of its line, and the characters in the second half are left unmoved; if the watermark bit is '0', the characters in the first half are left unmoved and each character in the second half is moved so that its height is above the average height.

Step 7), Because the algorithm is based on relative height, the larger the relative height between characters after moving, the more robust the watermark; but if the relative height is too large, the watermarked image becomes visibly distorted. To balance robustness and imperceptibility, the characters are moved in an 'n'-shaped pattern. For example, for the characters shown in Fig. 6, a1 and a5 are moved by 1, a2 and a4 by 2, and a3 by 3.

Step 8), Obtain the watermarked text image.

Watermark extraction likewise first requires binarization and skew correction of the text image, followed by character segmentation and character thinning. After the available characters of each line are grouped, the heights of the front and rear parts of each group are compared to recover the embedded information.
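The embedding steps for a single text line can be sketched as below. This is an illustration under assumptions, not the patented implementation: the tolerance `tol` used to reject characters of unusual height, the function names, and the application of the 1, 2, 3, 2, 1 pattern to every moved group are choices made for the example, and the unit of the shift is left abstract.

```python
from typing import List


def available(heights: List[float], tol: float = 0.3) -> List[int]:
    """Indices of characters whose height is close to the line average.

    `tol` is an assumed relative tolerance; the text gives no value.
    """
    mean = sum(heights) / len(heights)
    return [i for i, h in enumerate(heights) if abs(h - mean) <= tol * mean]


def n_pattern(k: int) -> List[int]:
    """Shifts that rise to the middle and fall again, e.g. 1, 2, 3, 2, 1."""
    return [min(i + 1, k - i) for i in range(k)]


def plan_shifts(n_chars: int, bits: List[int]) -> List[int]:
    """Upward shift for each available character of one line (0 = not moved)."""
    if n_chars < 10:
        return [0] * n_chars            # fewer than 10 characters: not embeddable
    groups = 2 if n_chars <= 20 else 4
    size = n_chars // groups             # leftover characters are discarded
    shifts = [0] * n_chars
    # Every two groups form one embedding unit that carries one bit.
    for unit, bit in enumerate(bits[: groups // 2]):
        start = unit * 2 * size          # first group of this unit
        if bit == 0:
            start += size                # bit 0 raises the second group instead
        shifts[start:start + size] = n_pattern(size)
    return shifts
```

For a line with 10 available characters and the bit 1, the first five characters receive the shifts 1, 2, 3, 2, 1 and the last five stay in place. A line with 22 available characters carries two bits, and its last two characters are left over and not moved.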
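The comparison used during extraction can be sketched for one text line as follows. The sketch assumes that `positions` holds one value per available character, measured so that a larger value means the character sits higher on the page, and that the grouping rule is the same as in embedding; the function name and this input convention are assumptions for illustration.

```python
from typing import List


def extract_bits(positions: List[float]) -> List[int]:
    """Recover the bits of one line by comparing front and rear group heights."""
    n = len(positions)
    if n < 10:
        return []                        # line was not embeddable
    groups = 2 if n <= 20 else 4
    size = n // groups                   # leftover characters are ignored
    bits = []
    for unit in range(groups // 2):
        start = unit * 2 * size
        front = positions[start:start + size]
        rear = positions[start + size:start + 2 * size]
        # A higher front half reads as 1, a higher rear half as 0.
        bits.append(1 if sum(front) / size > sum(rear) / size else 0)
    return bits
```

Comparing group averages rather than single characters is what makes the reading tolerant of small per-character errors introduced by printing and scanning.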

Claims

1. A watermarking method resistant to print-scan attacks based on character thinning, comprising two parts, watermark embedding and watermark extraction, characterized in that the steps of watermark embedding are as follows:

Step A), Binarize the text image;

Step C), Apply character thinning to each character;

Step E), According to the inner frame boundary information, calculate the width and height of each character, and discard the characters whose width and height are smaller than the average width and average height of their line;

Step G), According to the number of characters in each line, divide the line into at least one embedding unit; for each embedding unit, if the watermark bit is 1, move each character in the first half of the embedding unit upward so that its height is above the average height of its line, and if the watermark bit is 0, move each character in the second half of the embedding unit upward;

The steps of watermark extraction are as follows:

Step 4), Apply character thinning to each character;

Step 7), Output the extracted watermark information.

2. The watermarking method resistant to print-scan attacks based on character thinning according to claim 1, characterized in that the specific steps of the binarization described in step A) and step 2) are as follows:

3. The watermarking method resistant to print-scan attacks based on character thinning according to claim 1, characterized in that the specific steps of the skew correction described in step 2) are as follows:

Step 205), Reduce the rotation step size;

4. The watermarking method resistant to print-scan attacks based on character thinning according to claim 1, characterized in that the steps of the character thinning described in step C) and step 4) are as follows: