CN107644391B - Digital watermark processing method and device for tracing the source of printed documents
- Publication number
- CN107644391B (application CN201710838786.8A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- euler
- numbers
- digital watermark
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Editing Of Facsimile Originals (AREA)
- Image Processing (AREA)
Abstract
Embodiments of the invention provide a digital watermark processing method and device for tracing the source of printed documents. The method includes: converting a text document into an image, and segmenting the image into character images, one for each character in the text document; calculating the Euler number of the character in each character image, and determining the digital feature of the character image from the parity of the Euler number; obtaining the digital watermark information to be embedded in each character image, and judging whether the digital feature matches it; and, if they do not match, changing the topological structure of the character in the character image and calculating the Euler number of the changed character, so that the digital feature given by that Euler number matches the digital watermark information to be embedded. Applying embodiments of the invention can improve the security of confidential paper documents on output.
Description
Technical field
The present invention relates to the field of information security, and in particular to a digital watermark processing method and device for tracing the source of printed documents.
Background art
With the rapid development of electronic information technology and the steady rise in the level of informatization across society, multimedia files of all kinds, such as electronic documents, images and videos, are widely used in daily life. Electronic documents are quick to create, save space and are easy to transfer, which makes them a convenient vehicle for exchanging information. Many enterprises and organizations store and transmit their everyday files, and even confidential information, as electronic documents. These documents contain a wide variety of information and have great economic and practical value. Once a document has been printed, however, the paper copy and its photocopies carry no tracing information, so the source of the printout cannot be determined. This leads to uncontrolled printing and to intentional or unintentional illegal distribution of paper documents, and makes document printing harder to control. Digital watermarking emerged against this background. It can effectively protect the copyright of a document and verify its authenticity, and it is widely used in copyright protection, covert communication, access control and other fields. It protects electronic documents and also secures the output of confidential paper documents.
Digital watermarking is a copyright protection technique in which watermark information (a specific mark) is embedded in a carrier such as a video, image or document, or in which some specific structure of the carrier is modified. After embedding, the carrier contains the watermark information; the watermark is hard to notice or modify, and the original value of the carrier is unaffected. The embedder can identify and extract the watermark information, use it to identify the copyright owner, the authorization and similar information, and judge whether the work has been modified.
Existing digital watermark processing methods for tracing the source of printed documents embed the watermark by changing the line spacing or the word spacing of the document text. Take the line-spacing method, with the document as the carrier. The spacing of each line of text is calculated first, then the ratio of each pair of consecutive line spacings, and the watermark bit is determined from that ratio. If the ratio of adjacent line spacings does not agree with the digital watermark information to be embedded, the bit is embedded by changing the line spacing. For example, suppose the rule is that a ratio of two consecutive line spacings greater than 1 encodes the bit 1, and a ratio not greater than 1 encodes the bit 0. If the ratio of the spacing between the first and second lines to the spacing between the second and third lines is 1.2 and the bit to be embedded is 0, the spacing between the first and second lines is changed so that the ratio is no greater than 1, which embeds the bit 0 between the first and second lines. The word-spacing method works in the same way: the spacing between each pair of adjacent characters is calculated, then the ratio of each pair of consecutive word spacings, and the watermark bit is determined from that ratio; if the ratio does not agree with the information to be embedded, the bit is embedded by changing the word spacing.
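The line-spacing rule described above can be sketched in a few lines of Python. This is only an illustration of the prior-art scheme as the text states it; the function names and the `margin` parameter are assumptions introduced here, and gaps are assumed to be positive numbers.

```python
def bits_from_line_gaps(gaps):
    """Read one bit per pair of consecutive line gaps.

    A ratio greater than 1 encodes 1; a ratio not greater than 1 encodes 0.
    Gaps are assumed to be positive.
    """
    return [1 if upper / lower > 1 else 0 for upper, lower in zip(gaps, gaps[1:])]


def adjust_first_gap(upper, lower, bit, margin=1.1):
    """Return a value for the upper gap so that upper / lower encodes `bit`.

    `margin` is a hypothetical safety factor, not taken from the source.
    """
    current = 1 if upper / lower > 1 else 0
    if current == bit:
        return upper  # already encodes the wanted bit, leave it alone
    return lower * margin if bit == 1 else lower / margin


# The example from the text: ratio 1.2, bit to embed is 0.
new_upper = adjust_first_gap(12.0, 10.0, 0)
```

Each pair of lines carries at most one bit, which is why the capacity of this scheme is small.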
These existing methods have drawbacks. With the line-spacing algorithm, the watermark capacity is too small. With the word-spacing algorithm, the watermark is carried in the spacing between characters; when a printed document is photocopied or scanned, pixels at the edges of characters may flip, which changes the word spacing, and if the document is scaled during copying or scanning, the difference in word spacing may fall below the threshold. Resisting these attacks means sacrificing transparency by making the spacing changes larger. As a result the watermark can hardly be both transparent and robust, its usability is poor, and the security of confidential paper documents on output is reduced.
Summary of the invention
The purpose of embodiments of the present invention is to provide a digital watermark processing method and device for tracing the source of printed documents, so as to improve the usability of the digital watermark and thereby the security of confidential paper documents on output. The specific technical solution is as follows:
An embodiment of the invention discloses a digital watermark processing method for tracing the source of printed documents, the method comprising:
converting a text document into an image, and segmenting the image into character images, one for each character in the text document;
calculating the Euler number of the character in each character image, and determining the digital feature of the character image from the parity of the Euler number;
obtaining the digital watermark information to be embedded in each character image, and judging whether the digital feature matches the digital watermark information to be embedded;
if the digital feature does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image, and calculating the Euler number of the character after the change, so that the digital feature given by that Euler number matches the digital watermark information to be embedded.
Optionally, after changing the topological structure of the character in the character image and calculating the Euler number of the changed character so that its digital feature matches the digital watermark information to be embedded, the method further comprises:
merging the character images whose digital features match the digital watermark information to be embedded.
Optionally, segmenting the image into character images, one for each character in the text document, comprises:
binarizing the image to obtain a binary image;
scanning the binary image row by row from top to bottom, counting the pixels in each scanned row, and obtaining the horizontal projection of the image from the pixel counts;
using the gaps that the blank space between lines of text leaves in the horizontal projection, dividing the image into separate lines to obtain line images;
scanning each line image from left to right, counting the pixels of all the characters in the scanned line image, and obtaining the vertical projection of the image from the pixel counts;
using the gaps that the blank space between characters in each line image leaves in the vertical projection, segmenting the line image into individual character blocks, each character block being the character image of one character in the text document;
judging whether the spacing between two adjacent character blocks is greater than a preset threshold;
when the spacing is not greater than the preset threshold, merging the two character blocks into one character block.
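The projection-based segmentation steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the image is already binarized as a list of rows with 1 for ink and 0 for background, and the names `runs`, `segment_characters` and `merge_gap` are introduced here for the example.

```python
def runs(profile):
    """Return (start, end) pairs, end-exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary, merge_gap=1):
    """Split a binary page into (top, bottom, left, right) character boxes.

    Blocks on the same line separated by no more than `merge_gap` blank
    columns are merged, which keeps the parts of one character together.
    """
    boxes = []
    row_profile = [sum(row) for row in binary]  # horizontal projection
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]  # vertical projection
        merged = []
        for left, right in runs(col_profile):
            if merged and left - merged[-1][1] <= merge_gap:
                merged[-1] = (merged[-1][0], right)  # spacing within threshold
            else:
                merged.append((left, right))
        boxes.extend((top, bottom, left, right) for left, right in merged)
    return boxes
```

The merge step matters for characters made of horizontally separated parts: without it, each part would be treated as a character of its own and would receive its own watermark bit.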
Optionally, calculating the Euler number of the character in each character image and determining the digital feature of the character image from the parity of the Euler number comprises:
using an image recognition algorithm, identifying the number of connected regions and the number of holes of the character in each character image;
calculating the Euler number of the character as the difference between the number of connected regions and the number of holes;
when the Euler number is odd, the digital feature of the character image is "1";
when the Euler number is even, the digital feature of the character image is "0".
Optionally, if the digital feature does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image and calculating the Euler number of the changed character so that its digital feature matches the digital watermark information to be embedded comprises:
if the digital feature does not match the digital watermark information to be embedded, extracting the character skeleton from the character image and determining, in the character skeleton, an embedding point for the digital watermark information, the embedding point being a point where strokes of the character intersect;
dilating the embedding point so as to disconnect the stroke intersection, thereby changing the topological structure of the character in the character image, and calculating the Euler number of the character after the change, so that the digital feature given by that Euler number matches the digital watermark information to be embedded.
Optionally, extracting the character skeleton from the character image and determining, in the character skeleton, the embedding point for the digital watermark information comprises:
using a morphological image algorithm, converting the character image into a connected character skeleton that is only one pixel wide;
extracting at least one corner point of the character skeleton, and taking as the embedding point any corner point other than those located at the edge of the character image.
Optionally, dilating the embedding point so as to disconnect the stroke intersection comprises:
obtaining, in the character skeleton, the longest of all the straight lines adjacent to the embedding point;
constructing a structuring element from the slope of the longest straight line, and dilating the embedding point with the structuring element, so as to disconnect the stroke intersection.
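To see why disconnecting a stroke intersection can flip the parity of the Euler number, consider the toy sketch below. It is a deliberate simplification: instead of skeleton extraction and dilation with a structuring element, it clears a single ink pixel next to a junction in a tiny hand-made glyph. The glyph, the cut position and all function names are assumptions made for illustration; ink is taken as 8-connected and background as 4-connected.

```python
from collections import deque

FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_regions(grid, value, steps):
    """Count connected regions of `value` using breadth-first flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen, count = set(), 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return count


def euler_number(glyph):
    """Connected ink regions minus holes, for a glyph given as rows of 0/1."""
    width = len(glyph[0])
    padded = [[0] * (width + 2)]
    padded += [[0] + list(row) + [0] for row in glyph]
    padded += [[0] * (width + 2)]
    holes = count_regions(padded, 0, FOUR) - 1  # drop the outer background
    return count_regions(padded, 1, EIGHT) - holes


def cut_stroke(glyph, row, col):
    """Return a copy of the glyph with the ink pixel at (row, col) cleared."""
    result = [list(r) for r in glyph]
    result[row][col] = 0
    return result


# A "P"-like glyph: one connected region enclosing one hole.
P_SHAPE = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
before = euler_number(P_SHAPE)                   # 1 region - 1 hole
after = euler_number(cut_stroke(P_SHAPE, 2, 1))  # hole opened to the outside
```

Cutting the horizontal stroke where it meets the vertical one opens the enclosed hole, so the Euler number goes from 0 to 1 and the digital feature changes from "0" to "1" while the glyph stays a single connected shape.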
An embodiment of the invention also discloses a digital watermark processing device for tracing the source of printed documents, the device comprising:
a segmentation module, configured to convert a text document into an image and segment the image into character images, one for each character in the text document;
a calculation module, configured to calculate the Euler number of the character in each character image and determine the digital feature of the character image from the parity of the Euler number;
an acquisition module, configured to obtain the digital watermark information to be embedded in each character image and judge whether the digital feature matches the digital watermark information to be embedded;
a processing module, configured to, if the digital feature does not match the digital watermark information to be embedded, change the topological structure of the character in the character image and calculate the Euler number of the character after the change, so that the digital feature given by that Euler number matches the digital watermark information to be embedded.
Optionally, the device further comprises:
a merging module, configured to merge the character images whose digital features match the digital watermark information to be embedded.
Optionally, the segmentation module comprises:
a processing submodule, configured to binarize the image to obtain a binary image;
a first scanning submodule, configured to scan the binary image row by row from top to bottom, count the pixels in each scanned row, and obtain the horizontal projection of the image from the pixel counts;
a first segmentation submodule, configured to use the gaps that the blank space between lines of text leaves in the horizontal projection to divide the image into separate lines and obtain line images;
a second scanning submodule, configured to scan each line image from left to right, count the pixels of all the characters in the scanned line image, and obtain the vertical projection of the image from the pixel counts;
a second segmentation submodule, configured to use the gaps that the blank space between characters in each line image leaves in the vertical projection to segment the line image into individual character blocks, each character block being the character image of one character in the text document;
a judging submodule, configured to judge whether the spacing between two adjacent character blocks is greater than a preset threshold;
a merging submodule, configured to merge the two character blocks into one character block when the spacing is not greater than the preset threshold.
Optionally, the calculation module comprises:
an identification submodule, configured to use an image recognition algorithm to identify the number of connected regions and the number of holes of the character in each character image;
a calculation submodule, configured to calculate the Euler number of the character as the difference between the number of connected regions and the number of holes;
a first determination submodule, configured to set the digital feature of the character image to "1" when the Euler number is odd;
a second determination submodule, configured to set the digital feature of the character image to "0" when the Euler number is even.
Optionally, the processing module comprises:
an extraction submodule, configured to, if the digital feature does not match the digital watermark information to be embedded, extract the character skeleton from the character image and determine, in the character skeleton, an embedding point for the digital watermark information, the embedding point being a point where strokes of the character intersect;
a dilation submodule, configured to dilate the embedding point so as to disconnect the stroke intersection, thereby changing the topological structure of the character in the character image, and to calculate the Euler number of the character after the change, so that the digital feature given by that Euler number matches the digital watermark information to be embedded.
Optionally, the extraction submodule comprises:
a conversion unit, configured to use a morphological image algorithm to convert the character image into a connected character skeleton that is only one pixel wide;
an extraction unit, configured to extract at least one corner point of the character skeleton and take as the embedding point any corner point other than those located at the edge of the character image.
Optionally, the dilation submodule comprises:
an acquisition unit, configured to obtain, in the character skeleton, the longest of all the straight lines adjacent to the embedding point;
a dilation unit, configured to construct a structuring element from the slope of the longest straight line and dilate the embedding point with the structuring element, so as to disconnect the stroke intersection.
An embodiment of the invention also discloses an electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another over the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the steps of the above digital watermark processing method for tracing the source of printed documents when executing the program stored in the memory.
In another aspect of the invention, a computer-readable storage medium is also disclosed. Instructions are stored in the computer-readable storage medium which, when run on a computer, cause the computer to execute any of the above digital watermark processing methods for tracing the source of printed documents.
In the digital watermark processing method and device for tracing the source of printed documents provided by embodiments of the invention, the method first converts a text document into an image and segments the image into character images, then calculates the Euler number of the character in each character image and determines the digital feature of the character image from the parity of the Euler number, and then judges whether the digital feature matches the digital watermark information to be embedded. If it does not, the topological structure of the character in the character image is changed and the Euler number of the changed character is calculated, so that the digital feature given by that Euler number matches the digital watermark information to be embedded. Adjusting the Euler number of a character through its topological structure in this way resists malicious or unintentional watermark attacks. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printout, which completes the tracing of the leaked document, improves the usability of the digital watermark and thereby improves the security of confidential paper documents on output. Of course, a product or method implementing the invention need not achieve all of the above advantages at once.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a digital watermark processing method for tracing the source of printed documents provided by an embodiment of the invention;
Fig. 2 is a flow diagram of a digital watermark processing method for tracing the source of printed documents provided by an embodiment of the invention;
Fig. 3 is an effect comparison diagram for a digital watermark processing method for tracing the source of printed documents provided by an embodiment of the invention;
Fig. 4 is a structural diagram of a digital watermark processing device for tracing the source of printed documents provided by an embodiment of the invention;
Fig. 5 is a structural diagram of an electronic device provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Digital watermarking is an effective way to protect information security and to achieve anti-counterfeiting, source tracing and copyright protection, and it is an important branch and research direction of information hiding. To stop others from easily appropriating our documents, or for advertising purposes, we commonly add watermarks to our documents. A digital watermark has the following properties. 1. Transparency: whether the carrier, after the watermark is embedded, shows any visually perceptible change. 2. Robustness: a measure of the watermark's ability to resist attacks such as compression, rotation and cropping. 3. Capacity: the amount of watermark information the carrier can hold, usually measured in bits. 4. Security: the location and content of the hidden watermark are not publicly known, converting the file format does not lose the watermark data, and unauthorized users cannot detect or destroy the watermark.
Compared with general images, text images have simple color and texture and little redundancy in the transform domain, so it is difficult to embed digital watermark information with general transform-domain methods. Invisible watermark embedding methods designed around the ratio of black to white pixels in the document image solve, to some extent, problems such as visual detectability and cropping attacks. But because printers and copiers differ in system functions such as their halftoning algorithms, such algorithms are not robust enough, and after a manuscript has been photocopied several times the watermark may be removed completely. For this reason, the digital watermark processing method for tracing the source of printed documents provided by an embodiment of the invention adjusts the Euler number of a character through its topological structure, so that the digital feature given by the Euler number matches the digital watermark information to be embedded. Embedding is therefore unaffected by line spacing and word spacing: however they change, the embedded watermark information does not. This improves the usability of the digital watermark and the security of confidential paper documents on output. The detailed process is as follows:
Referring to Fig. 1, Fig. 1 is a flow diagram of a digital watermark processing method for tracing the source of printed documents provided by an embodiment of the invention, which includes the following steps:
S101: convert a text document into an image, and segment the image into character images, one for each character in the text document.
Specifically, because the digital watermark processing method provided by the invention is image-based, the text document must first be converted into an image, that is, the document format is converted into a picture format. The document can be saved directly in a picture format, or converted into a picture format with a file format conversion tool, and so on.
In addition, the invention performs digital watermark processing on each character in the image, so after the text document has been converted into an image, the image must also be segmented into character images, one for each character in the text document; that is, the characters in the image are segmented. The acquired text image contains not only the individual characters that make up the text, but also the blank space between lines and between characters, and may even carry various punctuation marks. The characters therefore have to be cut out one by one to form an image array for each single character, ready for single-character recognition processing. The task of character segmentation is to split every character in a multi-line or multi-character image out of the whole image as a single character. Here the image is segmented into character images, one per character in the text document, so that the digital watermark can be embedded more accurately in the character of each segmented character image.
S102: calculate the Euler number of the character in each character image, and determine the digital feature of the character image from the parity of the Euler number.
Specifically, the Euler number is defined as the difference between the number of connected regions and the number of holes. If an image contains H holes and the objects in it form C connected regions, the Euler number is E = C - H. The Euler number is a region descriptor based on the geometric features of an image, and it is not affected by stretching or rotation, so the Euler number of a character can be regarded as unaffected by printing and scanning. To determine the digital feature of a character image from the parity of the character's Euler number, the Euler number of the character must first be identified. The rule is that an Euler number with remainder 1 on division by 2 represents the digital feature "1", and an Euler number with remainder 0 on division by 2 represents the digital feature "0".
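The definition E = C - H and the parity rule can be tried out with a small pure-Python sketch. It assumes a glyph is given as rows of 0/1 with 1 for ink, treats ink as 8-connected and background as 4-connected, and counts a hole as a background region that does not reach the image border; these conventions and the function names are choices made here, not details taken from the source.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, value, neighbours):
    """Count connected components of cells equal to `value`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def euler_number(glyph):
    """E = C - H for a binary glyph."""
    cols = len(glyph[0])
    # Pad with background so the outside forms exactly one region.
    padded = [[0] * (cols + 2)]
    padded += [[0] + list(row) + [0] for row in glyph]
    padded += [[0] * (cols + 2)]
    components = count_components(padded, 1, N8)
    holes = count_components(padded, 0, N4) - 1  # exclude the outer background
    return components - holes


def digital_feature(glyph):
    """Remainder of the Euler number on division by 2: 1 for odd, 0 for even."""
    return euler_number(glyph) % 2
```

Python's `%` returns a non-negative remainder for a positive divisor, so a negative odd Euler number such as -1 still maps to the feature 1.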
S103: obtain the digital watermark information to be embedded in each character image, and judge whether the digital feature matches the digital watermark information to be embedded.
Specifically, the digital watermark information to be embedded is information set by the user. Since the digital feature is "1" or "0", the digital watermark information to be embedded in each character image is likewise either "1" or "0". For example, suppose the digital watermark information to be embedded in the six-character text document "mentality determines everything" is "010110" and the digital features of its characters are "001110". Comparing the two shows that the digital features of the second and third characters of the document do not match the digital watermark information to be embedded. Judging whether the digital feature matches the digital watermark information to be embedded quickly determines whether a character image needs to be changed, so that the embedded watermark information ends up identical to the information to be embedded, which improves the usability of the digital watermark information.
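The comparison in this example amounts to a position-by-position check of two bit strings. A minimal sketch, with a function name introduced here for illustration:

```python
def mismatched_positions(features, watermark):
    """Return the 0-based indices where the digital feature differs from the bit to embed."""
    if len(features) != len(watermark):
        raise ValueError("one watermark bit is expected per character image")
    return [i for i, (f, w) in enumerate(zip(features, watermark)) if f != w]


# Values from the example in the text.
to_change = mismatched_positions("001110", "010110")
```

Only the character images at the returned positions need their topological structure changed; the others already carry the right bit.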
S104: if the digital feature does not match the digital watermark information to be embedded, change the topological structure of the character in the character image, and calculate the Euler number of the character after the change, so that the digital feature given by that Euler number matches the digital watermark information to be embedded.
Specifically, if the digital feature does not match the digital watermark information to be embedded, the topological structure of the character in the character image must be changed. Adjusting the Euler number of the character through its topological structure makes the digital feature of the changed character image consistent with the digital watermark information to be embedded.
If the digital feature already matches the digital watermark information to be embedded, the character image is left unchanged.
It can be seen that the digital watermark processing method for tracing the source of printed documents provided by an embodiment of the invention first converts a text document into an image and segments the image into character images, then calculates the Euler number of the character in each character image and determines the digital feature of the character image from the parity of the Euler number, and then judges whether the digital feature matches the digital watermark information to be embedded. If it matches, the character image is not processed. If it does not, the topological structure of the character in the character image is changed and the Euler number of the changed character is calculated, so that the digital feature given by that Euler number matches the digital watermark information to be embedded. Adjusting the Euler number of a character through its topological structure in this way resists malicious or unintentional watermark attacks. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printout, which completes the tracing of the leaked document, improves the usability of the digital watermark and thereby improves the security of confidential paper documents on output.
It in embodiments of the present invention, can also be by number after the Euler's numbers by the topologies adjusting text of text
Word feature merges with the character image that digital watermark information to be embedded matches.
Specifically, the numerical characteristic and digital watermark information to be embedded of the Euler's numbers due to the text after change topological structure
Match, therefore, the character image that numerical characteristic matches with digital watermark information to be embedded is merged, after obtaining merging
Character image be embedded in watermark information after entire text document corresponding to image, for protect document copyright and trace to the source
Provide foundation.
In an optional embodiment of the present invention, segmenting the image into the character image corresponding to each character in the text document may specifically proceed as follows:
First step: binarize the image to obtain a binary image.
Specifically, because character fonts are diverse, a typical character recognition system first binarizes the image, then performs row segmentation and column segmentation to obtain the binary dot-matrix image of each individual character, which serves as the input to single-character recognition. Binarizing an image means setting the gray value of every pixel to 0 or 255, so that the whole image shows a clear black-and-white visual effect. Binary images play an important role in digital image processing: binarization facilitates further processing, simplifies the image, reduces the data volume, and highlights the contours of the targets of interest.
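Binarization as described above can be sketched in a few lines. This is only an illustration: the fixed threshold of 128 is an assumption, since the text does not say how the threshold is chosen, and the function name `binarize` is hypothetical.

```python
def binarize(gray, threshold=128):
    """Set every gray value to 0 (black) or 255 (white).

    `gray` is a list of rows of integers in the range 0..255. The fixed
    threshold is an assumption made for this sketch only.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


gray_image = [
    [12, 200, 130],
    [90, 255, 0],
]
binary_image = binarize(gray_image)
```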
Second step: scan the binary image row by row from top to bottom, compute the sum of the pixel values of each scanned row, and obtain the horizontal projection of the image from these sums.
Specifically, row segmentation and column segmentation of the characters in a binary image generally use the projection method, which exploits the gaps between characters to separate individual characters. The projection method first scans the input binarized character image row by row from top to bottom and then computes the sum of the pixel values of each scan line, which gives the horizontal projection of the character image.
Third step: using the blank gaps that the spacing between lines of text forms in the horizontal projection, segment the image into separate rows to obtain row images.
Specifically, the horizontal projection of a character image is fairly regular. Each peak of the projection corresponds to one line of text in the image, and between two adjacent lines there is a relatively wide interval in which the projection is 0. This interval corresponds to the blank area between adjacent lines, that is, the blank gap that the line spacing forms in the horizontal projection. This regularity makes row cutting straightforward: once the whole character image has been projected in the horizontal direction, row segmentation can be applied to it directly, which improves both the efficiency and the accuracy of row segmentation.
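The horizontal projection and row cutting described above can be sketched as follows. The sketch assumes an inverted binary image in which stroke pixels are 1 and background pixels are 0, so that blank lines sum to 0; the function names are hypothetical.

```python
def nonzero_runs(profile):
    """Return (start, end) pairs of maximal runs where profile > 0; end is exclusive."""
    runs, start = [], None
    for index, value in enumerate(profile):
        if value > 0 and start is None:
            start = index
        elif value == 0 and start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def split_rows(ink):
    """Cut the image into row images at the zero gaps of the horizontal projection."""
    horizontal_projection = [sum(row) for row in ink]
    return [ink[top:bottom] for top, bottom in nonzero_runs(horizontal_projection)]


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
row_images = split_rows(page)
```

Here the projection is `[0, 3, 1, 0, 2]`, so the page is cut into one row image of two pixel rows and one row image of a single pixel row.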
Fourth step: scan each row image from left to right, compute the sum of the pixel values in each column of the scanned row image, and obtain the vertical projection of the image from these sums.
Specifically, after the image has been segmented into rows, column segmentation is carried out within the row image of each line of text to obtain the individual characters. Column segmentation cuts at the blank gaps that the spacing between characters in a row image forms in the vertical projection. This requires first scanning each row image from left to right and then computing the sum of the pixel values in each column, which gives the vertical projection of the image. Here too, the whole image is inverted before column segmentation, from black text on a white background to white text on a black background, so that black pixels have the value 0 and white pixels have the value 1. The projection value at an inter-character gap (blank space) is therefore 0, that is, the sum of the pixel values in each column of the gap is 0. Single characters can thus be segmented accurately from each row image according to the column sums at the gaps.
Fifth step: using the blank gaps that the spacing between characters in each row image forms in the vertical projection, segment the row image into single character blocks; each character block is the character image corresponding to one character in the text document.
Specifically, the blank areas between characters in each row image form blank gaps in the vertical projection. Cutting at these gaps yields single character images, so each segmented character image contains a single character.
Sixth step: judge whether the spacing between two adjacent character blocks is greater than a preset threshold.
A row image may contain characters with a left-right structure or a left-middle-right structure. Because the row image is cut into character blocks at the blank gaps in the vertical projection, a character block may be only part of a character, for example a single radical. The spacing between two adjacent characters, however, is larger than the spacing inside one character. For example, in the two adjacent characters "如果" (if), the spacing between "如" and "果" is larger than the spacing between "女" and "口" inside "如". It is therefore necessary to judge whether the spacing between two adjacent character blocks is greater than the preset threshold, to ensure that every character block is a complete character before digital watermark information is embedded into individual characters. The preset threshold may be set according to the character spacing in the text document or according to actual needs.
Seventh step: when the spacing is not greater than the preset threshold, merge the two character blocks into one character block.
Specifically, when the spacing between two adjacent character blocks is not greater than the preset threshold, the two blocks are likely to be two parts of one character, so they are merged into one character block. This yields a complete character, which is the minimum unit for embedding digital watermark information, and makes watermark embedding into single characters more accurate.
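The column cutting and block merging of the fourth to seventh steps can be sketched as follows. As before, the sketch assumes stroke pixels are 1 and background pixels are 0; the threshold value and the function names are hypothetical choices made for illustration.

```python
def column_runs(row_image):
    """Return (start, end) column spans that contain ink; end is exclusive."""
    width = len(row_image[0])
    vertical_projection = [sum(row[x] for row in row_image) for x in range(width)]
    runs, start = [], None
    # The trailing 0 is a sentinel that closes a run touching the right border.
    for x, value in enumerate(vertical_projection + [0]):
        if value > 0 and start is None:
            start = x
        elif value == 0 and start is not None:
            runs.append((start, x))
            start = None
    return runs


def merge_close_blocks(runs, threshold):
    """Merge neighbouring blocks whose gap is not greater than the threshold."""
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] <= threshold:
            merged[-1] = (merged[-1][0], end)  # treat as parts of one character
        else:
            merged.append((start, end))
    return merged


# One pixel row standing in for a row image: two radicals one pixel apart,
# then a separate character three pixels further right.
row_image = [[1, 1, 0, 1, 1, 0, 0, 0, 1, 1]]
blocks = column_runs(row_image)
characters = merge_close_blocks(blocks, threshold=1)
```

With a threshold of 1, the first two blocks are merged into one character and the third block stays separate.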
In an optional embodiment of the present invention, calculating the Euler number of the character in each character image and determining the numerical characteristic of the character image from the parity of the Euler number may specifically proceed as follows:
First step: use an image recognition algorithm to identify the number of connected regions and the number of holes of the character in each character image.
Specifically, in image processing the value 0 of a binary image corresponds to black and is treated as background, while the value 1 corresponds to white and is treated as foreground. Printed Chinese characters, however, are black and should be the foreground, with the white area as the background. Therefore, when the Euler number is computed, the character image is inverted so that the strokes of the character become the foreground, that is, the character image becomes white text on a black background. The number of connected regions of a character is the number of mutually disconnected groups of white strokes in the character. For example, after the character image of "洞" (hole) is inverted, the character is white on black and contains 6 mutually disconnected groups of white strokes, so the number of connected regions of "洞" is 6. The number of holes of a character is the number of closed areas enclosed by its strokes (closed curves). In the present invention, the number of holes may be identified as follows: in the character image with black text on a white background, count the connected white regions; the number of holes is that count minus one, because one of those regions is the outer background.
Second step: calculate the Euler number of the character as the difference between the number of connected regions and the number of holes.
Specifically, the number of holes is subtracted from the number of connected regions identified, and the difference is the Euler number of the character. For example, if the number of holes in an image is H and the number of connected regions of the object is C, the Euler number is E = C - H.
Third step: when the Euler number is odd, the numerical characteristic of the character image is "1".
Specifically, the numerical characteristic of every character image for which the difference between the number of connected regions and the number of holes is odd is set to "1".
Fourth step: when the Euler number is even, the numerical characteristic of the character image is "0".
Specifically, the numerical characteristic of every character image for which the difference between the number of connected regions and the number of holes is even is set to "0". The numerical characteristic of a character image is thus determined by the parity of the Euler number; conversely, changing the parity of the Euler number of a character makes the character carry different digital watermark information.
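The calculation E = C - H and the parity rule can be sketched as follows. The sketch assumes stroke pixels are 1 and background pixels are 0, treats strokes as 8-connected and background as 4-connected, and pads the image so that the outer background forms a single region; these connectivity and padding choices, and all function names, are assumptions not stated in the text.

```python
from collections import deque

N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def count_regions(grid, value, neighbours):
    """Count connected regions of cells equal to `value` by breadth-first search."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for y in range(height):
        for x in range(width):
            if grid[y][x] != value or seen[y][x]:
                continue
            regions += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in neighbours:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def euler_number(ink):
    """E = C - H, with H counted as background regions minus the outer one."""
    width = len(ink[0])
    blank = [0] * (width + 2)
    padded = [blank] + [[0] + list(row) + [0] for row in ink] + [blank]
    connected = count_regions(padded, 1, N8)
    holes = count_regions(padded, 0, N4) - 1
    return connected - holes


def numerical_characteristic(ink):
    """Return 1 for an odd Euler number and 0 for an even one."""
    return euler_number(ink) % 2


ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]   # one region, one hole
bar = [[1, 1, 1]]                          # one region, no hole
```

The closed ring has Euler number 0 and therefore carries "0", while the plain bar has Euler number 1 and carries "1".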
In embodiments of the present invention, if the numerical characteristic does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image and calculating the Euler number of the modified character may specifically proceed as follows:
First step: if the numerical characteristic does not match the digital watermark information to be embedded, extract the character skeleton from the character image and determine in the skeleton the insertion point at which the digital watermark information is embedded.
Specifically, a skeleton is a description that captures the connectivity and topological structure of a shape; in a character image, the skeleton carries the most important information about the character. Because the topological structure reflects the most basic information of the character, stroke intersections are easier to find on the skeleton. If the numerical characteristic does not match the digital watermark information to be embedded, the topological structure of the character in the character image must be changed. This requires first extracting the character skeleton and then determining in the skeleton the insertion point for the digital watermark information. For example, suppose the watermark information to be embedded in the current character "口" is "1", while the information the character currently carries (its numerical characteristic) is "0", so its Euler number must be changed. The skeleton of "口" is extracted first; it is a rectangle whose four sides are each one pixel wide. Corner points are then extracted from the skeleton, giving the four corners of the rectangle. One corner point is selected (for example, the lower-right corner), it is confirmed that this corner point is indeed the intersection of two strokes, and the stroke cutting operation is then carried out at that point. Here, the insertion point is a point at which strokes of the character intersect.
Second step: dilate the insertion point to disconnect the strokes at their intersection, thereby changing the topological structure of the character in the character image, and calculate the Euler number of the modified character, so that the numerical characteristic of that Euler number matches the digital watermark information to be embedded.
Specifically, after the insertion point has been determined in the character skeleton, the insertion point is dilated to disconnect the strokes at their intersection. Once the intersection is disconnected, the number of connected regions or the number of holes of the character changes, and so does its Euler number. Changing the parity of the Euler number makes the character carry different digital watermark information. Adjusting the Euler number of a character through its topological structure in this way, so that the numerical characteristic matches the digital watermark information to be embedded, can resist malicious or unintentional watermark attacks. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printed document, which completes the tracing of the leaked document, improves the usability of the digital watermark, and thereby improves the security of confidential paper document output.
Figure 2 is a schematic flow diagram of the digital watermark processing method for tracing printed documents provided in embodiments of the present invention. In Fig. 2, the character image on the left is the source character image. The character image in the middle is the left image after binarization; it also marks the insertion points at which watermark information can be embedded, namely the points indicated by the white boxes in the figure. The character image on the right is the character image after the topological structure of the character has been changed. It is obtained by dilating an insertion point in the middle image (the intersection of the second and third strokes of the character "will"), which disconnects the second stroke from the third. In the resulting right-hand image, the point at which the second and third strokes of "will" are disconnected carries the digital watermark information to be embedded. Adjusting the Euler number of a character through its topological structure in this way, so that the numerical characteristic matches the digital watermark information to be embedded, can resist malicious or unintentional watermark attacks. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printed document, which completes the tracing of the leaked document, improves the usability of the digital watermark, and thereby improves the security of confidential paper document output.
Figure 3 is an effect comparison for the digital watermark processing method for tracing printed documents provided in embodiments of the present invention. In Fig. 3, the upper line of text, "this is the effect of processing method", is the source text document, and the lower line is the text document after the digital watermark information has been embedded. In the source text document, the numerical characteristics of the characters "Yes", "place", "side" and " " differ from the digital watermark information to be embedded, so the topological structures of these characters are changed. Comparing the upper and lower lines in Fig. 3 shows that the sixth and seventh strokes of "Yes" are disconnected, the first and third strokes of "place" are disconnected, the third and fourth strokes of "side" are disconnected, and the first and second strokes of " " are disconnected. Dilating the insertion points at these stroke intersections disconnects the strokes, which changes the topological structure of the characters in the character images and thereby embeds the digital watermark information to be embedded.
In an optional embodiment of the present invention, extracting the character skeleton from the character image and determining in the skeleton the insertion point for the digital watermark information may specifically proceed as follows:
First step: use an image morphology algorithm to convert the character image into a connected character skeleton that is only one pixel wide.
Specifically, to find the stroke split point (insertion point), the skeleton of the character is first extracted with an image morphology algorithm, and the stroke split point is then located on the skeleton. This includes: extracting the skeleton while keeping the topological structure of the original character unchanged, so that the Euler number of the skeleton equals that of the original character; performing corner extraction on the skeleton to obtain several corner points as candidates; and, among the extracted corner points, selecting any point other than those located at the edge of the character as the watermark insertion point and recording its coordinates.
In addition, converting the character image into a connected one-pixel-wide skeleton with an image morphology algorithm amounts to thinning the character image. Thinning is generally used as an image preprocessing technique whose purpose is to extract the skeleton of the source image: lines wider than one pixel are thinned to a width of one pixel, forming the skeleton, after which the image is easier to analyze, for example when extracting its features. The basic idea of thinning is layer-by-layer peeling: pixels are stripped inward from the line edges, one layer at a time, until the line is one pixel wide. Thinning greatly compresses the data volume of the original image while keeping the basic topological structure of its shape unchanged, which lays a foundation for applications such as feature extraction in character recognition.
Second step: extract at least one corner point of the character skeleton, and take any corner point, other than those located at the edge of the character image, as the insertion point for embedding information.
Specifically, the number of corner points differs from skeleton to skeleton. All corner points of the character skeleton are extracted first (there is at least one), and any corner point other than those located at the edge of the character image is then selected as the insertion point for embedding information, which greatly improves the usability of the embedded digital watermark.
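The text does not name a corner detector, so the following is only a crude stand-in under stated assumptions: on a one-pixel-wide skeleton (1 = skeleton pixel), a pixel is treated as a corner candidate when it has both a horizontal and a vertical skeleton neighbour, and candidates on the border of the character image are discarded. The function name `corner_candidates` and the detection rule are hypothetical.

```python
def corner_candidates(skeleton):
    """Return (row, column) positions where horizontal and vertical strokes meet.

    This simple rule also flags T-junctions and crossings, which are stroke
    intersections as well. Pixels on the image border are excluded.
    """
    height, width = len(skeleton), len(skeleton[0])

    def on(y, x):
        return 0 <= y < height and 0 <= x < width and skeleton[y][x] == 1

    corners = []
    for y in range(height):
        for x in range(width):
            if not on(y, x):
                continue
            horizontal = on(y, x - 1) or on(y, x + 1)
            vertical = on(y - 1, x) or on(y + 1, x)
            at_edge = y in (0, height - 1) or x in (0, width - 1)
            if horizontal and vertical and not at_edge:
                corners.append((y, x))
    return corners


# A rectangular skeleton standing in for "口", with a one-pixel margin.
skeleton = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
candidates = corner_candidates(skeleton)
insertion_point = candidates[-1]  # any candidate may be chosen; here the lower-right one
```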
Dilating the insertion point to disconnect the strokes at their intersection may specifically proceed as follows:
First step: in the character skeleton, obtain the longest of all straight lines adjacent to the insertion point.
Specifically, the longest of all straight lines adjacent to the insertion point is obtained in the character skeleton, so that dilating the insertion point improves the usability of the embedded digital watermark. For example, suppose the insertion point of the character "扑" is the intersection of the right-hand vertical stroke and the dot stroke. To dilate the insertion point so as to disconnect that intersection, the longest of all straight lines adjacent to the insertion point should be obtained, namely the whole right-hand vertical stroke, rather than either of the two parts (upper and lower) into which the intersection divides that vertical stroke.
Second step: construct a structuring element from the slope of the longest straight line, and dilate the insertion point with this structuring element to disconnect the strokes at their intersection.
Specifically, a structuring element is constructed from the slope of the longest straight line. By choosing an appropriate structuring element and dilating the region around the stroke split point of the original character, two strokes that were connected become disconnected, the topological structure changes, and the parity of the Euler number changes, so that the Chinese character carries different watermark information. For example, for the character "口" the Euler number is 0 and needs to be changed to 1. The longest vertical straight line is extracted at the lower-right corner point and used as the structuring element to dilate the lower-right corner of "口". After dilation, the horizontal stroke and the right-hand vertical stroke are disconnected, and the Euler number becomes 1. Using dilation to disconnect strokes at their intersection embeds the digital watermark information to be embedded and improves the usability of the embedded digital watermark.
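The "口" example above can be sketched as follows. Several points are assumptions made for illustration: stroke pixels are 1, erasing stroke pixels along the structuring element stands in for dilating the white background at the insertion point, the 5 by 5 ring is a toy stand-in for "口", and the Euler number is computed with the 2 by 2 quad-counting formula for 8-connected shapes rather than by counting regions and holes as the text does. All function names are hypothetical.

```python
def euler_number_8(ink):
    """Euler number of 8-connected strokes via 2x2 quad counting."""
    width = len(ink[0])
    blank = [0] * (width + 2)
    p = [blank] + [[0] + list(row) + [0] for row in ink] + [blank]
    q1 = q3 = qd = 0
    for y in range(len(p) - 1):
        for x in range(width + 1):
            a, b = p[y][x], p[y][x + 1]
            c, d = p[y + 1][x], p[y + 1][x + 1]
            total = a + b + c + d
            if total == 1:
                q1 += 1
            elif total == 3:
                q3 += 1
            elif total == 2 and a == d:  # two pixels on a diagonal
                qd += 1
    return (q1 - q3 - 2 * qd) // 4


def vertical_element(length):
    """Offsets of a vertical line running upwards from the insertion point."""
    return [(-k, 0) for k in range(length)]


def cut_at(ink, point, element):
    """Return a copy of `ink` with strokes erased along the structuring element."""
    height, width = len(ink), len(ink[0])
    result = [list(row) for row in ink]
    py, px = point
    for dy, dx in element:
        y, x = py + dy, px + dx
        if 0 <= y < height and 0 <= x < width:
            result[y][x] = 0
    return result


mouth = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
opened = cut_at(mouth, (4, 4), vertical_element(2))
before, after = euler_number_8(mouth), euler_number_8(opened)
```

Opening the lower-right corner removes the hole while leaving the strokes in one piece, so the Euler number goes from 0 to 1 and the carried bit flips from "0" to "1".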
Referring to Fig. 4, Fig. 4 is a structural schematic diagram of a digital watermark processing device for tracing printed documents provided in an embodiment of the present invention. The device includes the following modules:
Segmentation module 401, configured to convert the text document into an image and segment the image into the character image corresponding to each character in the text document;
Computing module 402, configured to calculate the Euler number of the character in each character image and determine the numerical characteristic of the character image from the parity of the Euler number;
Obtaining module 403, configured to obtain the digital watermark information to be embedded for each character image and judge whether the numerical characteristic matches the digital watermark information to be embedded, wherein the digital watermark information to be embedded takes one of the values of the numerical characteristic;
Processing module 404, configured to change the topological structure of the character in the character image if the numerical characteristic does not match the digital watermark information to be embedded, and calculate the Euler number of the modified character, so that the numerical characteristic of that Euler number matches the digital watermark information to be embedded.
It can be seen that, in the digital watermark processing device for tracing printed documents provided in this embodiment of the invention, the segmentation module first converts the text document into an image and segments the image into character images. The computing module then calculates the Euler number of the character in each character image and determines the numerical characteristic of the character image from the parity of the Euler number. The obtaining module obtains the digital watermark information to be embedded for each character image and judges whether the numerical characteristic matches it. If they do not match, the processing module changes the topological structure of the character in the character image and calculates the Euler number of the modified character, so that the numerical characteristic of that Euler number matches the digital watermark information to be embedded. Adjusting the Euler number of a character through its topological structure in this way can resist malicious or unintentional watermark attacks. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printed document, which completes the tracing of the leaked document, improves the usability of the digital watermark, and thereby improves the security of confidential paper document output.
Further, the device also includes:
a merging module, configured to merge the character images whose numerical characteristics match the digital watermark information to be embedded.
Further, the segmentation module 401 includes:
a processing submodule, configured to binarize the image to obtain a binary image;
a first scanning submodule, configured to scan the binary image row by row from top to bottom, compute the sum of the pixel values of each scanned row, and obtain the horizontal projection of the image from these sums;
a first segmentation submodule, configured to segment the image into separate rows, using the blank gaps that the spacing between lines of text forms in the horizontal projection, to obtain row images;
a second scanning submodule, configured to scan each row image from left to right, compute the sum of the pixel values in each column of the scanned row image, and obtain the vertical projection of the image from these sums;
a second segmentation submodule, configured to segment each row image into single character blocks, using the blank gaps that the spacing between characters in the row image forms in the vertical projection, each character block being the character image corresponding to one character in the text document;
a judging submodule, configured to judge whether the spacing between two adjacent character blocks is greater than a preset threshold;
a merging submodule, configured to merge the two character blocks into one character block when the spacing is not greater than the preset threshold.
Further, the computing module 402 includes:
an identification submodule, configured to identify, using an image recognition algorithm, the number of connected regions and the number of holes of the character in each character image;
a calculation submodule, configured to calculate the Euler number of the character as the difference between the number of connected regions and the number of holes;
a first determination submodule, configured to set the numerical characteristic of the character image to "1" when the Euler number is odd;
a second determination submodule, configured to set the numerical characteristic of the character image to "0" when the Euler number is even.
Further, the processing module 404 includes:
an extraction submodule, configured to extract the character skeleton from the character image if the numerical characteristic does not match the digital watermark information to be embedded, and determine in the skeleton the insertion point at which the digital watermark information is embedded, the insertion point being a point at which strokes of the character intersect;
a dilation submodule, configured to dilate the insertion point to disconnect the strokes at their intersection, thereby changing the topological structure of the character in the character image, and to calculate the Euler number of the modified character.
Further, the extraction submodule includes:
a conversion unit, configured to convert the character image, using an image morphology algorithm, into a connected character skeleton that is only one pixel wide;
an extraction unit, configured to extract at least one corner point of the character skeleton and take any corner point, other than those located at the edge of the character image, as the insertion point for embedding information.
Further, the dilation submodule includes:
an acquisition unit, configured to obtain, in the character skeleton, the longest of all straight lines adjacent to the insertion point;
a dilation unit, configured to construct a structuring element from the slope of the longest straight line and dilate the insertion point with this structuring element to disconnect the strokes at their intersection.
An embodiment of the present invention also provides an electronic device. As shown in Fig. 5, it includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 communicate with one another through the communication bus 504.
The memory 503 is configured to store a computer program.
The processor 501 is configured to implement the following steps when executing the program stored in the memory 503:
converting the text document into an image, and segmenting the image into the character image corresponding to each character in the text document;
calculating the Euler number of the character in each character image, and determining the numerical characteristic of the character image from the parity of the Euler number;
obtaining the digital watermark information to be embedded for each character image, and judging whether the numerical characteristic matches the digital watermark information to be embedded;
if the numerical characteristic does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image, and calculating the Euler number of the modified character, so that the numerical characteristic of that Euler number matches the digital watermark information to be embedded.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It can be seen that the electronic device provided by this embodiment of the present invention first converts the text document into an image and divides the image into character images. It then calculates the Euler number of the character in each character image, determines the numerical characteristic of the character image from the parity of the Euler number, and judges whether that numerical characteristic matches the digital watermark information to be embedded. If they do not match, it changes the topological structure of the character in the character image and calculates the Euler number of the character after the change, so that the numerical characteristic of that Euler number matches the digital watermark information to be embedded. Adjusting the Euler number of a character through its topological structure, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded, can resist malicious or unintentional attacks on the digital watermark. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printed document, which completes the tracing of the leaked document. This improves the usability of the digital watermark and thereby the security of paper-based confidential document output.
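As a rough illustration of the segmentation step mentioned above, the following Python sketch splits a binary image into lines and character blocks using horizontal and vertical projections, then merges neighbouring blocks whose gap does not exceed a threshold. It assumes the image has already been binarized into nested lists with 1 for ink and 0 for background. The function names and the `max_gap` parameter are illustrative assumptions and do not come from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = ink pixel, 0 = background


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the projection is non-zero."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # a blank gap ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def merge_close(runs: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Merge adjacent runs whose blank gap is not greater than max_gap."""
    merged: List[Tuple[int, int]] = []
    for start, end in runs:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def segment_characters(img: Image, max_gap: int = 1) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) boxes, one per character block."""
    boxes = []
    row_profile = [sum(row) for row in img]             # horizontal projection
    for top, bottom in runs_of_ink(row_profile):        # one run per text line
        line = img[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]  # vertical projection
        for left, right in merge_close(runs_of_ink(col_profile), max_gap):
            boxes.append((top, bottom, left, right))
    return boxes
```

The merge step is there for characters made of horizontally separated parts, which a plain vertical projection would otherwise split into several blocks. A suitable threshold depends on font and resolution and would have to be tuned.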
In another embodiment provided by the present invention, a computer-readable storage medium is also provided. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute the digital watermark processing method for tracing the source of printed documents described in any of the above embodiments. The digital watermark processing method for tracing the source of printed documents includes:
converting a text document into an image, and dividing the image into character images corresponding to each character in the text document;
calculating the Euler number of the character in each character image, and determining the numerical characteristic corresponding to the character image according to the parity of the Euler number;
obtaining the digital watermark information to be embedded for each character image, and judging whether the numerical characteristic matches the digital watermark information to be embedded;
if the numerical characteristic does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image, and calculating the Euler number of the character after the topological structure is changed, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded.
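The Euler number and parity test in the steps above can be sketched in Python as follows. The claims define the Euler number as the number of connected regions minus the number of holes, with odd mapping to "1" and even to "0". The sketch assumes a binary character image stored as nested lists (1 = ink, 0 = background), 8-connectivity for ink and 4-connectivity for background. These connectivity choices and all function names are assumptions for illustration, not details taken from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = ink pixel, 0 = background

N4: List[Tuple[int, int]] = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8: List[Tuple[int, int]] = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(img: Image, target: int, steps: List[Tuple[int, int]]) -> int:
    """Count connected regions of pixels equal to target using flood fill."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != target or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and img[ny][nx] == target):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


def euler_number(img: Image) -> int:
    """Connected regions minus holes, as defined in the claims."""
    width = len(img[0]) + 2
    # Pad with background so the outside forms exactly one background region.
    padded = [[0] * width] + [[0] + list(row) + [0] for row in img] + [[0] * width]
    regions = count_components(padded, 1, N8)
    holes = count_components(padded, 0, N4) - 1  # subtract the outer background
    return regions - holes


def numerical_characteristic(img: Image) -> int:
    """Odd Euler number -> 1, even Euler number -> 0."""
    return euler_number(img) % 2


def needs_modification(img: Image, watermark_bit: int) -> bool:
    """True when the character's parity does not match the bit to embed."""
    return numerical_characteristic(img) != watermark_bit
```

For a ring-shaped glyph (one region, one hole) the Euler number is 0, so the characteristic is 0. A glyph with two holes has Euler number -1, which Python's `%` maps to 1, consistent with treating -1 as odd.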
It can be seen that the computer-readable storage medium provided by this embodiment of the present invention first converts the text document into an image and divides the image into character images. It then calculates the Euler number of the character in each character image, determines the numerical characteristic of the character image from the parity of the Euler number, and judges whether that numerical characteristic matches the digital watermark information to be embedded. If they do not match, it changes the topological structure of the character in the character image and calculates the Euler number of the character after the change, so that the numerical characteristic of that Euler number matches the digital watermark information to be embedded. Adjusting the Euler number of a character through its topological structure, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded, can resist malicious or unintentional attacks on the digital watermark. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printed document, which completes the tracing of the leaked document. This improves the usability of the digital watermark and thereby the security of paper-based confidential document output.
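The parity adjustment summarized above can be illustrated with a small Python sketch that cuts a stroke at a candidate insertion point and re-checks the Euler number. This is a simplification under stated assumptions: the claims build a structuring element from the slope of the longest adjacent skeleton line, whereas this sketch simply clears a square neighbourhood, and the candidate points are supplied by hand rather than found from a skeleton. The names `break_at`, `embed_bit`, and `radius` are hypothetical.

```python
from typing import Iterable, List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = ink pixel, 0 = background


def _regions(img: Image, target: int, diagonal: bool) -> int:
    """Count connected regions of target pixels (8- or 4-connected)."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    rows, cols = len(img), len(img[0])
    seen, count = set(), 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != target or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and img[ny][nx] == target and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


def euler_number(img: Image) -> int:
    """Connected regions minus holes; padding isolates the outer background."""
    width = len(img[0]) + 2
    padded = [[0] * width] + [[0, *row, 0] for row in img] + [[0] * width]
    return _regions(padded, 1, True) - (_regions(padded, 0, False) - 1)


def break_at(img: Image, point: Tuple[int, int], radius: int = 1) -> Image:
    """Return a copy with a square of side 2*radius+1 around point cleared."""
    py, px = point
    out = [list(row) for row in img]
    for y in range(max(0, py - radius), min(len(img), py + radius + 1)):
        for x in range(max(0, px - radius), min(len(img[0]), px + radius + 1)):
            out[y][x] = 0
    return out


def embed_bit(img: Image, bit: int, candidates: Iterable[Tuple[int, int]],
              radius: int = 1) -> Image:
    """Cut at the first candidate point that makes the Euler parity equal bit."""
    if euler_number(img) % 2 == bit:
        return img                          # already matches, leave untouched
    for point in candidates:
        changed = break_at(img, point, radius)
        if euler_number(changed) % 2 == bit:
            return changed
    raise ValueError("no candidate point yields the requested parity")
```

Re-computing the Euler number after each cut matters because not every cut changes the parity. For example, clearing a junction so that one region falls apart into three changes the Euler number by 2 and leaves the parity as it was.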
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, because the device, electronic device, and computer-readable storage medium embodiments are substantially similar to the method embodiment, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (5)
1. A digital watermark processing method for tracing the source of printed documents, characterized in that the method comprises:
converting a text document into an image, and dividing the image into character images corresponding to each character in the text document;
calculating the Euler number of the character in each character image, and determining the numerical characteristic corresponding to the character image according to the parity of the Euler number;
obtaining the digital watermark information to be embedded for each character image, and judging whether the numerical characteristic matches the digital watermark information to be embedded;
if the numerical characteristic does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image, and calculating the Euler number of the character after the topological structure is changed, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded;
wherein dividing the image into character images corresponding to each character in the text document comprises:
binarizing the image to obtain a binary image;
scanning the binary image row by row from top to bottom, calculating the pixels of each scanned row, and obtaining the horizontal projection of the image according to the pixels;
dividing the image into separate lines, using the blank gaps that the blank spacing between lines of text in the image forms in the horizontal projection, to obtain line images;
scanning each line image from left to right, calculating the pixels of all characters in each scanned line image, and obtaining the vertical projection of the image according to the pixels;
dividing the line image into individual character blocks, using the blank gaps that the blank spacing between characters in each line image forms in the vertical projection, the character blocks being the character images corresponding to each character in the text document;
judging whether the spacing between two adjacent character blocks is greater than a preset threshold;
when the spacing is not greater than the preset threshold, merging the two character blocks into one character block;
wherein calculating the Euler number of the character in each character image, and determining the numerical characteristic corresponding to the character image according to the parity of the Euler number, comprises:
identifying, using an image recognition algorithm, the number of connected regions and the number of holes of the character in each character image;
calculating the Euler number of the character as the difference between the number of connected regions and the number of holes;
when the Euler number is odd, the numerical characteristic corresponding to the character image is "1";
when the Euler number is even, the numerical characteristic corresponding to the character image is "0";
wherein, if the numerical characteristic does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image, and calculating the Euler number of the character after the topological structure is changed, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded, comprises:
if the numerical characteristic does not match the digital watermark information to be embedded, extracting the character skeleton from the character image, and determining in the character skeleton an insertion point for embedding the digital watermark information, the insertion point being a point where strokes of the character intersect;
dilating the insertion point to disconnect the point where the strokes intersect, thereby changing the topological structure of the character in the character image, and calculating the Euler number of the character after the topological structure is changed, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded;
wherein extracting the character skeleton from the character image, and determining in the character skeleton the insertion point for embedding the digital watermark information, comprises:
converting the character image, using a morphological image algorithm, into a connected character skeleton that is only one pixel wide;
extracting at least one corner point of the character skeleton, and taking any corner point, other than corner points located at the edge of the character image, as the insertion point for embedding information;
wherein dilating the insertion point to disconnect the point where the strokes intersect comprises:
obtaining, in the character skeleton, the longest straight line among all straight lines adjacent to the insertion point;
constructing a structuring element using the slope of the longest straight line, and dilating the insertion point with the structuring element, to disconnect the point where the strokes intersect.
2. The method according to claim 1, characterized in that, after changing the topological structure of the character in the character image and calculating the Euler number of the character after the topological structure is changed, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded, the method further comprises:
merging the character images corresponding to the characters whose numerical characteristics match the digital watermark information to be embedded.
3. A digital watermark processing device for tracing the source of printed documents, characterized in that the device comprises:
a segmentation module, configured to convert a text document into an image, and divide the image into character images corresponding to each character in the text document;
a computing module, configured to calculate the Euler number of the character in each character image, and determine the numerical characteristic corresponding to the character image according to the parity of the Euler number;
an obtaining module, configured to obtain the digital watermark information to be embedded for each character image, and judge whether the numerical characteristic matches the digital watermark information to be embedded;
a processing module, configured to, if the numerical characteristic does not match the digital watermark information to be embedded, change the topological structure of the character in the character image, and calculate the Euler number of the character after the topological structure is changed, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded;
wherein the segmentation module comprises:
a processing submodule, configured to binarize the image to obtain a binary image;
a first scanning submodule, configured to scan the binary image row by row from top to bottom, calculate the pixels of each scanned row, and obtain the horizontal projection of the image according to the pixels;
a first segmentation submodule, configured to divide the image into separate lines, using the blank gaps that the blank spacing between lines of text in the image forms in the horizontal projection, to obtain line images;
a second scanning submodule, configured to scan each line image from left to right, calculate the pixels of all characters in each scanned line image, and obtain the vertical projection of the image according to the pixels;
a second segmentation submodule, configured to divide the line image into individual character blocks, using the blank gaps that the blank spacing between characters in each line image forms in the vertical projection, the character blocks being the character images corresponding to each character in the text document;
a judging submodule, configured to judge whether the spacing between two adjacent character blocks is greater than a preset threshold;
a merging submodule, configured to merge the two character blocks into one character block when the spacing is not greater than the preset threshold;
wherein the computing module comprises:
an identification submodule, configured to identify, using an image recognition algorithm, the number of connected regions and the number of holes of the character in each character image;
a calculation submodule, configured to calculate the Euler number of the character as the difference between the number of connected regions and the number of holes;
a first determining submodule, configured so that, when the Euler number is odd, the numerical characteristic corresponding to the character image is "1";
a second determining submodule, configured so that, when the Euler number is even, the numerical characteristic corresponding to the character image is "0";
wherein the processing module comprises:
an extraction submodule, configured to, if the numerical characteristic does not match the digital watermark information to be embedded, extract the character skeleton from the character image, and determine in the character skeleton an insertion point for embedding the digital watermark information, the insertion point being a point where strokes of the character intersect;
a dilation submodule, configured to dilate the insertion point to disconnect the point where the strokes intersect, thereby changing the topological structure of the character in the character image, and to calculate the Euler number of the character after the topological structure is changed, so that the numerical characteristic of the Euler number matches the digital watermark information to be embedded;
wherein the extraction submodule comprises:
a conversion unit, configured to convert the character image, using a morphological image algorithm, into a connected character skeleton that is only one pixel wide;
an extraction unit, configured to extract at least one corner point of the character skeleton, and take any corner point, other than corner points located at the edge of the character image, as the insertion point for embedding information;
wherein the dilation submodule comprises:
an acquisition unit, configured to obtain, in the character skeleton, the longest straight line among all straight lines adjacent to the insertion point;
a dilation unit, configured to construct a structuring element using the slope of the longest straight line, and dilate the insertion point with the structuring element, to disconnect the point where the strokes intersect.
4. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-2 when executing the program stored in the memory.
5. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710838786.8A CN107644391B (en) | 2017-09-18 | 2017-09-18 | Digital watermark processing method and device for tracing the source of printed documents |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107644391A CN107644391A (en) | 2018-01-30 |
CN107644391B true CN107644391B (en) | 2019-11-26 |
Family
ID=61111903
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710838786.8A Active CN107644391B (en) | Digital watermark processing method and device for tracing the source of printed documents | 2017-09-18 | 2017-09-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107644391B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428356B (en) * | 2019-07-22 | 2023-04-28 | 中孚安全技术有限公司 | Paper printed part hidden watermark tracing method, system, terminal and storage medium |
CN111028123B (en) * | 2019-11-11 | 2022-05-20 | 浙江大学 | Anti-printing large-capacity text digital watermarking method |
CN113139547B (en) * | 2020-01-20 | 2022-04-29 | 阿里巴巴集团控股有限公司 | Text recognition method and device, electronic equipment and storage medium |
CN112053275B (en) * | 2020-07-14 | 2023-03-21 | 清华大学 | Printing and scanning attack resistant PDF document watermarking method and device |
CN117350909A (en) * | 2023-10-24 | 2024-01-05 | 江苏群杰物联科技有限公司 | Text watermark processing method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102368328A (en) * | 2011-09-19 | 2012-03-07 | 北京航空航天大学 | Digital watermarking method applied to counterfeit prevention for print documents |
CN102592126A (en) * | 2010-11-15 | 2012-07-18 | 柯尼卡美能达美国研究所有限公司 | Method for binarizing scanned document images containing gray or light colored text printed with halftone pattern |
CN105260148A (en) * | 2015-10-22 | 2016-01-20 | 苏州恒盛信息技术有限公司 | Printing file authenticating and tracing method and system based on electronic label |
CN106845475A (en) * | 2016-12-15 | 2017-06-13 | 西安电子科技大学 | Natural scene character detecting method based on connected domain |
Non-Patent Citations (1)
Title |
---|
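Print-scan-resistant text watermarking algorithm based on character Euler numbers; Li Yan (李艳) et al.; Proceedings of the 9th Annual Academic Conference of the China Institute of Communications; 20131029; pp. 421-425 * |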
Also Published As
Publication number | Publication date |
---|---|
CN107644391A (en) | 2018-01-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Yang Yu, Chen Yuwei, Lei Min, Li Deyin, Zhan Rui. Inventor before: Yang Yu, Chen Yuwei, Lei Min |
GR01 | Patent grant | ||