CN107644391B - Digital watermark processing method and device for tracing the source of printed documents - Google Patents


Info

Publication number
CN107644391B
Authority
CN
China
Prior art keywords
text
image
euler
numbers
digital watermark
Prior art date
Legal status
Active
Application number
CN201710838786.8A
Other languages
Chinese (zh)
Other versions
CN107644391A (en)
Inventor
杨榆
陈雨薇
雷敏
李德印
詹瑞
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201710838786.8A
Publication of CN107644391A
Application granted
Publication of CN107644391B
Legal status: Active
Anticipated expiration


Landscapes

  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the invention provides a digital watermark processing method and device for tracing the source of printed documents. The method includes: converting a text document into an image and dividing the image into character images, one for each character in the document; computing the Euler number of the character in each character image and determining the digital feature of the character image from the parity of the Euler number; obtaining the digital watermark information to be embedded in each character image and judging whether the digital feature matches it; and, if the digital feature does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image and computing the Euler number of the modified character, so that the digital feature of that Euler number matches the digital watermark information to be embedded. The embodiment of the invention can improve the security of the printed output of confidential paper documents.

Description

Digital watermark processing method and device for tracing the source of printed documents
Technical field
The present invention relates to the field of information security, and in particular to a digital watermark processing method and device for tracing the source of printed documents.
Background art
With the rapid development of electronic information technology and the steady rise in informatization across society, multimedia files such as electronic documents, images and videos are widely used in daily life. Electronic documents in particular are quick to create, save space and are easy to transmit, and have become a convenient vehicle for exchanging information. Many enterprises and organizations store and transmit their everyday files, and even confidential information, as electronic documents. These documents contain a wide variety of information of considerable economic and practical value. However, once a document has been printed on paper or photocopied, the lack of source-tracing information makes it impossible to determine where the printout came from. This encourages uncontrolled printing and intentional or unintentional illegal distribution of paper documents, and makes document printing even harder to control. Digital watermarking emerged against this background. It can effectively protect document copyright and verify product authenticity, and is widely used in copyright protection, covert communication, access control and other fields. Besides securing electronic documents, it also helps secure the printed output of confidential paper documents.
Digital watermarking is a copyright protection technique that embeds watermark information (a specific identifier) into carriers such as videos, images and documents, or modifies certain specific structures of the carrier. The carrier then contains the watermark information, which is hard to notice or modify, while the original value of the carrier is unaffected. The embedder can detect and extract the watermark information, identify the copyright owner and the authorization details from it, and determine whether the work has been modified.
Existing digital watermark processing methods for tracing printed documents embed the watermark information by changing the line spacing or the character spacing of the document text. Taking the document as the carrier, the line-spacing method first measures the spacing between each pair of adjacent text lines and then computes the ratio of each two consecutive spacings; each ratio encodes one watermark bit. If a ratio does not agree with the bit to be embedded, the line spacing is changed to embed the bit. For example, suppose the rule is that a ratio greater than 1 encodes the bit 1 and a ratio not greater than 1 encodes the bit 0. If the ratio of the spacing between lines 1 and 2 to the spacing between lines 2 and 3 is 1.2 and the bit to be embedded is 0, the spacing between lines 1 and 2 is changed until the ratio is no greater than 1, which embeds the bit 0 between lines 1 and 2. Likewise, the character-spacing method first measures the spacing between every two adjacent characters, then computes the ratio of each two consecutive spacings and reads a bit from each ratio; if a ratio does not agree with the bit to be embedded, the character spacing is changed to embed the bit.
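The prior-art line-spacing rule described above can be sketched as follows. This is a minimal illustration of that rule as stated (ratio > 1 encodes 1, otherwise 0), not of the claimed method. The function names and the `margin` parameter are hypothetical, and spacings are assumed to be measured in pixels.

```python
def line_spacing_bits(spacings):
    """Read one bit per pair of consecutive line gaps: ratio > 1 -> 1, otherwise 0."""
    # spacings[k] is the gap between text line k and text line k + 1
    return [1 if a / b > 1 else 0 for a, b in zip(spacings, spacings[1:])]


def embed_line_spacing_bit(spacings, k, bit, margin=0.05):
    """Return a copy of `spacings` in which spacings[k] / spacings[k + 1] encodes `bit`."""
    out = list(spacings)
    if line_spacing_bits(out)[k] != bit:
        ref = out[k + 1]
        # Bit 1 needs a ratio strictly above 1; bit 0 needs a ratio of at most 1.
        out[k] = ref * (1 + margin) if bit == 1 else ref * (1 - margin)
    return out


# The worked example above: the gaps give a ratio of 12 / 10 = 1.2, but bit 0 must be embedded.
gaps = [12.0, 10.0, 10.0]
print(line_spacing_bits(gaps))                                 # [1, 0]
print(line_spacing_bits(embed_line_spacing_bit(gaps, 0, 0)))   # [0, 0]
```

The bit is carried entirely by a ratio of measured gaps, so any distortion that shifts those gaps by more than the margin can change the decoded bit.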
However, the existing methods have drawbacks. The line-spacing algorithm has a very small watermark capacity. In the character-spacing algorithm, the watermark is carried by the spacing between characters, so when the printed document is photocopied or scanned, pixels at character edges may flip and change the spacing. Moreover, if the document is scaled during copying or scanning, the spacing differences may fall below the detection threshold. Resisting these attacks requires sacrificing transparency by making large spacing changes. The watermark therefore struggles to be both transparent and robust, its usability is poor, and the security of the printed output of confidential paper documents suffers.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a digital watermark processing method and device for tracing the source of printed documents, so as to improve the usability of digital watermarks and thereby the security of the printed output of confidential paper documents. The specific technical solutions are as follows:
An embodiment of the present invention discloses a digital watermark processing method for tracing the source of printed documents. The method includes:
converting a text document into an image, and dividing the image into character images, one for each character in the text document;
computing the Euler number of the character in each character image, and determining the digital feature of the character image from the parity of the Euler number;
obtaining the digital watermark information to be embedded in each character image, and judging whether the digital feature matches the digital watermark information to be embedded;
if the digital feature does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image and computing the Euler number of the character after the topological structure has been changed, so that the digital feature of that Euler number matches the digital watermark information to be embedded.
Optionally, after changing the topological structure of the character in the character image and computing the Euler number of the modified character so that its digital feature matches the digital watermark information to be embedded, the method further includes:
merging the character images whose digital features match the digital watermark information to be embedded.
Optionally, dividing the image into character images, one for each character in the text document, includes:
binarizing the image to obtain a binary image;
scanning the binary image row by row from top to bottom, counting the pixels of each scanned row, and obtaining the horizontal projection of the image from those pixel counts;
dividing the image into separate lines, obtaining line images, by using the blank gaps that the space between text lines forms in the horizontal projection;
scanning each line image from left to right, counting the character pixels in each scanned column, and obtaining the vertical projection from those pixel counts;
dividing each line image into individual character blocks by using the blank gaps that the space between characters forms in the vertical projection, each character block being the character image corresponding to one character in the text document;
judging whether the spacing between two adjacent character blocks is greater than a preset threshold;
when the spacing is not greater than the preset threshold, merging the two character blocks into one character block.
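The projection-profile segmentation in the steps above can be sketched in plain Python. Assumptions: the page is already binarized into a list of rows with 1 for ink and 0 for background, a "blank gap" is a run of all-zero rows or columns, and `gap_threshold` (a hypothetical name for the preset threshold) is measured in blank columns.

```python
def runs(profile):
    """(start, end) pairs, end exclusive, of the non-zero runs in a projection profile."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(page, gap_threshold=1):
    """Split a binary page into character boxes (top, bottom, left, right), ends exclusive."""
    boxes = []
    horizontal = [sum(row) for row in page]               # ink pixels per row
    for top, bottom in runs(horizontal):                  # blank rows separate text lines
        line = page[top:bottom]
        vertical = [sum(col) for col in zip(*line)]       # ink pixels per column of the line
        merged = []
        for left, right in runs(vertical):                # blank columns separate blocks
            # A gap no wider than the threshold joins two blocks into one character,
            # e.g. the left and right components of a character such as 明.
            if merged and left - merged[-1][1] <= gap_threshold:
                merged[-1] = (merged[-1][0], right)
            else:
                merged.append((left, right))
        boxes.extend((top, bottom, left, right) for left, right in merged)
    return boxes


page = [
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0],
]
print(segment_characters(page))
# [(0, 2, 0, 4), (0, 2, 6, 8), (3, 4, 0, 1), (3, 4, 4, 5)]
```

In the first line, the blocks at columns 0-1 and 3 are separated by a single blank column, so with `gap_threshold=1` they are merged into one character box.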
Optionally, computing the Euler number of the character in each character image and determining the digital feature of the character image from the parity of the Euler number includes:
identifying, with an image recognition algorithm, the number of connected regions and the number of holes of the character in each character image;
computing the Euler number of the character as the number of connected regions minus the number of holes;
when the Euler number is odd, the digital feature of the character image is "1";
when the Euler number is even, the digital feature of the character image is "0".
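The Euler-number feature above can be sketched with breadth-first flood fill on a binary glyph (1 = ink). The patent only says "an image recognition algorithm". This sketch assumes the usual pairing of 8-connectivity for ink and 4-connectivity for background, and counts a hole as a background region that does not touch the image border. These choices and all names are assumptions.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_regions(grid, value, neighbours, holes_only=False):
    """Count connected regions of `value`; with holes_only, skip regions touching the border."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, on_border = deque([(r, c)]), False
            while queue:
                y, x = queue.popleft()
                on_border = on_border or y in (0, h - 1) or x in (0, w - 1)
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and grid[ny][nx] == value:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not (holes_only and on_border):
                count += 1
    return count


def euler_number(glyph):
    c = count_regions(glyph, 1, N8)                    # connected ink regions
    h = count_regions(glyph, 0, N4, holes_only=True)   # enclosed background regions (holes)
    return c - h


def digital_feature(glyph):
    # Python's % with divisor 2 is never negative, so E = -1 gives feature 1 (odd).
    return euler_number(glyph) % 2


ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]                          # one region, one hole
eight = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]]   # one region, two holes
dots = [[1, 0, 1]]                                                # two regions, no hole
for g in (ring, eight, dots):
    print(euler_number(g), digital_feature(g))
# 0 0
# -1 1
# 2 0
```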
Optionally, if the digital feature does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image and computing the Euler number of the modified character so that its digital feature matches the digital watermark information to be embedded includes:
if the digital feature does not match the digital watermark information to be embedded, extracting the skeleton of the character in the character image and determining in the skeleton an insertion point for embedding the digital watermark information, the insertion point being a point where strokes of the character intersect;
dilating the insertion point so as to disconnect the strokes at the intersection, thereby changing the topological structure of the character in the character image, and computing the Euler number of the modified character so that its digital feature matches the digital watermark information to be embedded.
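A minimal sketch of the embed-and-recheck step, under simplifying assumptions. The Euler number is computed with Gray's bit-quad counts, an equivalent shortcut for 8-connected ink that is swapped in here for brevity. "Dilating" the insertion point is approximated by erasing a small square of ink around it. The candidate points, `radius`, and every function name are hypothetical. The example also shows why the Euler number is recomputed after each change: not every cut flips the parity.

```python
def euler8(img):
    """Euler number of 8-connected ink via Gray's bit-quad counts on a zero-padded copy."""
    w = len(img[0])
    pad = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]
    q1 = q3 = qd = 0
    for r in range(len(pad) - 1):
        for c in range(w + 1):
            a, b = pad[r][c], pad[r][c + 1]
            d, e = pad[r + 1][c], pad[r + 1][c + 1]
            n = a + b + d + e
            if n == 1:
                q1 += 1
            elif n == 3:
                q3 += 1
            elif n == 2 and a == e:          # the two ink pixels sit on a diagonal
                qd += 1
    return (q1 - q3 - 2 * qd) // 4


def cut_at(img, point, radius=1):
    """Erase ink in a (2 * radius + 1)-pixel square centred on `point`."""
    out = [list(row) for row in img]
    y0, x0 = point
    for y in range(max(0, y0 - radius), min(len(out), y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(len(out[0]), x0 + radius + 1)):
            out[y][x] = 0
    return out


def embed_bit(img, bit, candidates, radius=1):
    """Return a glyph whose Euler-number parity equals `bit`, or None if no cut achieves it."""
    if euler8(img) % 2 == bit:
        return img                           # feature already matches: leave glyph unchanged
    for point in candidates:
        cut = cut_at(img, point, radius)
        if euler8(cut) % 2 == bit:           # recompute: a cut may change E by an even amount
            return cut
    return None


box = [[1, 1, 1, 1, 1],
       [1, 0, 0, 0, 1],
       [1, 0, 0, 0, 1],
       [1, 0, 0, 0, 1],
       [1, 1, 1, 1, 1]]                      # E = 1 - 1 = 0, feature 0
tee = [[1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0]]                      # E = 1, feature 1

opened = embed_bit(box, 1, [(0, 0)])         # opening a corner removes the hole
print(euler8(box), euler8(opened))           # 0 1
print(euler8(cut_at(tee, (0, 2))))           # 3: the T splits into three pieces, parity unchanged
print(embed_bit(tee, 0, [(0, 2)]))           # None
```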
Optionally, extracting the skeleton of the character in the character image and determining in the skeleton the insertion point for embedding the digital watermark information includes:
converting the character image, with a morphological image-processing algorithm, into a character skeleton that is one pixel wide;
extracting at least one corner point of the character skeleton, and taking any corner point other than those located at the edge of the character image as the insertion point for embedding information.
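The patent does not name its corner detector. As a stand-in consistent with the description of the insertion point as a stroke intersection, this sketch flags skeleton pixels whose crossing number (the count of 0-to-1 transitions around the 8-neighbourhood) is at least 3. It skips points within a hypothetical `margin` of the image edge. The skeleton is assumed to be already one pixel wide.

```python
# 8-neighbour offsets in clockwise order, starting due north
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel, y, x):
    """Number of 0 -> 1 transitions when walking once around pixel (y, x)."""
    vals = [skel[y + dy][x + dx] for dy, dx in RING]
    return sum(1 for i in range(8) if vals[i] == 0 and vals[(i + 1) % 8] == 1)


def insertion_points(skel, margin=1):
    """Skeleton junctions (crossing number >= 3) that are not at the image edge."""
    margin = max(margin, 1)              # keep the 8-neighbourhood inside the image
    h, w = len(skel), len(skel[0])
    return [(y, x)
            for y in range(margin, h - margin)
            for x in range(margin, w - margin)
            if skel[y][x] and crossing_number(skel, y, x) >= 3]


plus = [[0] * 7 for _ in range(7)]
for i in range(1, 6):
    plus[3][i] = 1                       # horizontal stroke
    plus[i][3] = 1                       # vertical stroke
print(insertion_points(plus))            # [(3, 3)]
print(insertion_points(plus, margin=4))  # []: the junction is too close to the edge
```

Pixels along a stroke have a crossing number of 2 and stroke ends have 1, so only the intersection itself is reported.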
Optionally, dilating the insertion point to disconnect the strokes at the intersection includes:
obtaining, in the character skeleton, the longest of all straight lines adjacent to the insertion point;
constructing a structuring element from the slope of that longest straight line, and dilating the insertion point with the structuring element so as to disconnect the strokes at the intersection.
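A sketch of a line-shaped structuring element built from a slope. The patent states only that the element is built from the slope of the longest line. Everything else here is an assumption: the line is given by two end points, its slope is turned into an angle in degrees, a digital line of `length` pixels at that angle serves as the element, and the dilated insertion point is then erased from the glyph. The element length, the rounding used to digitize it, and all names are also assumptions.

```python
import math


def stroke_angle(p0, p1):
    """Angle in degrees of the segment p0 -> p1, with (row, col) points and rows growing downwards."""
    (y0, x0), (y1, x1) = p0, p1
    return math.degrees(math.atan2(-(y1 - y0), x1 - x0))


def line_element(angle_deg, length):
    """(dy, dx) offsets of a digital line segment of `length` samples centred on the origin."""
    t = math.radians(angle_deg)
    half = (length - 1) / 2
    offsets = {(round(-(i - half) * math.sin(t)), round((i - half) * math.cos(t)))
               for i in range(length)}
    return sorted(offsets)


def erase_dilated_point(img, point, element):
    """Dilating a single point by `element` translates the element; clear those pixels."""
    out = [list(row) for row in img]
    y, x = point
    for dy, dx in element:
        if 0 <= y + dy < len(out) and 0 <= x + dx < len(out[0]):
            out[y + dy][x + dx] = 0
    return out


print(round(stroke_angle((4, 0), (0, 4)), 6))  # 45.0: rising to the right
print(line_element(90, 3))                     # [(-1, 0), (0, 0), (1, 0)]
plus = [[0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0]]
cut = erase_dilated_point(plus, (2, 2), line_element(90, 3))
print(cut[2])                                  # [1, 1, 0, 1, 1]: the horizontal stroke is split
```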
An embodiment of the present invention further discloses a digital watermark processing device for tracing the source of printed documents. The device includes:
a segmentation module, configured to convert a text document into an image and to divide the image into character images, one for each character in the text document;
a computing module, configured to compute the Euler number of the character in each character image and to determine the digital feature of the character image from the parity of the Euler number;
an obtaining module, configured to obtain the digital watermark information to be embedded in each character image and to judge whether the digital feature matches the digital watermark information to be embedded;
a processing module, configured to, if the digital feature does not match the digital watermark information to be embedded, change the topological structure of the character in the character image and compute the Euler number of the modified character, so that the digital feature of that Euler number matches the digital watermark information to be embedded.
Optionally, the device further includes:
a merging module, configured to merge the character images whose digital features match the digital watermark information to be embedded.
Optionally, the segmentation module includes:
a processing submodule, configured to binarize the image to obtain a binary image;
a first scanning submodule, configured to scan the binary image row by row from top to bottom, count the pixels of each scanned row, and obtain the horizontal projection of the image from those pixel counts;
a first segmentation submodule, configured to divide the image into separate lines, obtaining line images, by using the blank gaps that the space between text lines forms in the horizontal projection;
a second scanning submodule, configured to scan each line image from left to right, count the character pixels in each scanned column, and obtain the vertical projection from those pixel counts;
a second segmentation submodule, configured to divide each line image into individual character blocks by using the blank gaps that the space between characters forms in the vertical projection, each character block being the character image corresponding to one character in the text document;
a judging submodule, configured to judge whether the spacing between two adjacent character blocks is greater than a preset threshold;
a merging submodule, configured to merge the two character blocks into one character block when the spacing is not greater than the preset threshold.
Optionally, the computing module includes:
an identification submodule, configured to identify, with an image recognition algorithm, the number of connected regions and the number of holes of the character in each character image;
a computation submodule, configured to compute the Euler number of the character as the number of connected regions minus the number of holes;
a first determining submodule, configured to set the digital feature of the character image to "1" when the Euler number is odd;
a second determining submodule, configured to set the digital feature of the character image to "0" when the Euler number is even.
Optionally, the processing module includes:
an extraction submodule, configured to, if the digital feature does not match the digital watermark information to be embedded, extract the skeleton of the character in the character image and determine in the skeleton an insertion point for embedding the digital watermark information, the insertion point being a point where strokes of the character intersect;
a dilation submodule, configured to dilate the insertion point so as to disconnect the strokes at the intersection, thereby changing the topological structure of the character in the character image, and to compute the Euler number of the modified character so that its digital feature matches the digital watermark information to be embedded.
Optionally, the extraction submodule includes:
a conversion unit, configured to convert the character image, with a morphological image-processing algorithm, into a character skeleton that is one pixel wide;
an extraction unit, configured to extract at least one corner point of the character skeleton and to take any corner point other than those located at the edge of the character image as the insertion point for embedding information.
Optionally, the dilation submodule includes:
an acquisition unit, configured to obtain, in the character skeleton, the longest of all straight lines adjacent to the insertion point;
a dilation unit, configured to construct a structuring element from the slope of that longest straight line and to dilate the insertion point with the structuring element so as to disconnect the strokes at the intersection.
An embodiment of the present invention further discloses an electronic device including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the steps of the above digital watermark processing method for tracing the source of printed documents when executing the program stored in the memory.
In another aspect of the present invention, a computer-readable storage medium is further disclosed. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute any of the above digital watermark processing methods for tracing the source of printed documents.
The embodiments of the present invention provide a digital watermark processing method and device for tracing the source of printed documents. The method first converts a text document into an image and divides the image into character images. It then computes the Euler number of the character in each character image, determines the digital feature of the character image from the parity of the Euler number, and judges whether the digital feature matches the digital watermark information to be embedded. If it does not, the topological structure of the character in the character image is changed and the Euler number of the modified character is computed, so that the digital feature of that Euler number matches the digital watermark information to be embedded. Because the Euler number is adjusted through the topology of the character itself, the watermark can resist malicious or unintentional watermark attacks. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine where it was printed, completing the tracing of the leaked document. This improves the usability of the digital watermark and thereby the security of the printed output of confidential paper documents. Of course, any product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a digital watermark processing method for tracing the source of printed documents provided in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a digital watermark processing method for tracing the source of printed documents provided in an embodiment of the present invention;
Fig. 3 is an effect comparison diagram of a digital watermark processing method for tracing the source of printed documents provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a digital watermark processing device for tracing the source of printed documents provided in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Digital watermarking is an effective means of protecting information security, enabling anti-counterfeiting source tracing and protecting copyright, and is an important branch and research direction in the field of information hiding. To prevent others from easily appropriating their documents, or to publicize themselves, people generally add watermarks to their own documents. Digital watermarks have the following characteristics. 1. Transparency: whether the carrier with the embedded watermark causes any visually perceptible change. 2. Robustness: a measure of the watermark's ability to withstand various attacks, such as compression, rotation and cropping. 3. Capacity: the amount of watermark information the carrier can hold, usually measured in bits. 4. Security: the position and content of the hidden watermark information are not publicly known, a change of file format does not cause loss of the watermark data, and unauthorized users cannot detect or destroy the watermark.
Compared with general images, text images have simple colors and textures and little redundancy in transform domains, so it is difficult to embed watermark information with general transform-domain methods. Invisible watermark embedding methods designed around the ratio of black to white pixels in document images solve the problems of visual detection and cropping attacks. However, because printers and copiers differ in system functions such as halftoning algorithms, such algorithms lack robustness: after a document has been copied repeatedly, the watermark information may be removed completely. On this basis, the digital watermark processing method for tracing printed documents provided by the embodiments of the present invention adjusts the Euler number of each character through the character's topological structure, so that the digital feature of the Euler number matches the digital watermark information to be embedded. Embedding therefore does not depend on line spacing or character spacing: however the line and character spacing change, the embedded watermark information stays the same. This improves both the usability of the digital watermark and the security of the printed output of confidential paper documents. The specific process is as follows:
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a digital watermark processing method for tracing the source of printed documents provided in an embodiment of the present invention, which includes the following steps:
S101: convert the text document into an image, and divide the image into character images, one for each character in the text document.
Specifically, because the digital watermark processing method for tracing printed documents provided by the present invention is image-based, the text document must first be converted into an image, that is, from a document format into a picture format. The document may be stored directly in a picture format, or it may be converted into a picture format with a file-format conversion tool, and so on.
In addition, the present invention performs digital watermark processing on each character in the image, so after the text document is converted into an image, the image must also be divided into character images, one for each character in the text document; in other words, the characters in the image are segmented. The text image contains not only the individual characters that make up the text but also the line spacing, the blank space between characters, and possibly various punctuation marks, so each character must be cut out of the text to form the image matrix of a single character for per-character processing. The task of character segmentation is to separate each character in a multi-line or multi-character image from the whole image as an individual character. Dividing the image into per-character images allows the digital watermark to be embedded more precisely in the character of each segmented character image.
S102: compute the Euler number of the character in each character image, and determine the digital feature of the character image from the parity of the Euler number.
Specifically, the Euler number is defined as the number of connected regions minus the number of holes. If an object in an image has H holes and C connected regions, its Euler number is E = C - H. The Euler number is a region descriptor based on the geometric features of the image and is unaffected by stretching or rotation, so the Euler number of a character can be considered unaffected by printing and scanning. To use the parity of a character's Euler number as the digital feature of its character image, the Euler number of the character is first identified. By convention, a Euler number that leaves a remainder of 1 when divided by 2 represents the digital feature "1", and a Euler number that leaves a remainder of 0 represents the digital feature "0".
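As a worked illustration of the parity rule (the glyph counts below assume a typical sans-serif Latin font and are not taken from the patent):

$$E = C - H, \qquad b = E \bmod 2 \in \{0, 1\}$$

$$\text{i: } E = 2 - 0 = 2 \Rightarrow b = 0; \qquad \text{A: } E = 1 - 1 = 0 \Rightarrow b = 0; \qquad \text{B: } E = 1 - 2 = -1 \Rightarrow b = 1$$

Because E can be negative when a glyph has more holes than connected regions, "remainder after division by 2" should be read as the non-negative remainder: -1 = 2(-1) + 1, so b = 1. In C, where `-1 % 2` evaluates to -1, the parity is safer to test as `E & 1` or `((E % 2) + 2) % 2`.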
S103: obtain the digital watermark information to be embedded in each character image, and judge whether the digital feature matches the digital watermark information to be embedded.
Specifically, the digital watermark information to be embedded is set by the user. Since the digital feature is either "1" or "0", the digital watermark information to be embedded in each character image is likewise either "1" or "0". For example, suppose the text document is "心态决定一切" ("attitude determines everything"), the digital watermark information to be embedded is "010110", and the digital features are "001110". Comparing the digital features with the watermark bits shows that the characters "态" and "决" do not match the digital watermark information to be embedded. Judging whether each digital feature matches its watermark bit thus quickly determines which character images need the watermark embedded, so that the embedded watermark is identical to the intended watermark, which improves the usability of the digital watermark information.
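The matching check in this example can be reproduced in a few lines. The function name is hypothetical, and the digital features are taken as given rather than computed from glyphs.

```python
def mismatched_characters(text, watermark_bits, features):
    """(index, character) pairs whose parity feature differs from the bit to embed."""
    return [(i, ch)
            for i, (ch, want, have) in enumerate(zip(text, watermark_bits, features))
            if want != have]


print(mismatched_characters("心态决定一切", "010110", "001110"))
# [(1, '态'), (2, '决')]
```

Only these two character images go on to step S104; the other four already match and are left unchanged.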
S104: if the digital feature does not match the digital watermark information to be embedded, change the topological structure of the character in the character image, and compute the Euler number of the modified character so that its digital feature matches the digital watermark information to be embedded.
Specifically, if the digital feature does not match the digital watermark information to be embedded, the topological structure of the character in the character image must be changed. Adjusting the Euler number through the character's topology makes the digital feature of the modified character image consistent with the digital watermark information to be embedded.
In addition, not changing the character image if numerical characteristic matches with digital watermark information to be embedded.
It can be seen that the digital watermark processing method for tracing printed documents provided by this embodiment first converts the text document into an image and divides the image into character images. It then calculates the Euler number of the character in each character image and determines the corresponding numerical characteristic from the parity of the Euler number. Next, it checks whether the numerical characteristic matches the digital watermark information to be embedded: if it matches, the character image is not processed; if not, the topological structure of the character is changed and the Euler number of the modified character is recalculated, so that its numerical characteristic matches the digital watermark information to be embedded. Because the watermark is carried by the Euler number, which is adjusted through the topological structure of the character, the method can resist malicious or unintentional watermark attacks. When a document is leaked, the watermark information can be extracted from the leaked copy and used to determine the source of the printout, completing the tracing of the leaked document, improving the usability of the digital watermark, and thus improving the security of paper output of confidential files.
In this embodiment, after the Euler numbers have been adjusted through the characters' topological structures, the character images whose numerical characteristics match the digital watermark information to be embedded can also be merged.
Specifically, since the numerical characteristic of the Euler number of each modified character now matches the digital watermark information to be embedded, merging these character images yields the image of the entire text document with the watermark embedded, which provides a basis for protecting the copyright of the document and tracing its source.
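Merging can be sketched as pasting each character image back at the position it was cut from. The Python sketch below is a minimal illustration that assumes segmentation recorded each block's top-left corner and that blocks do not overlap; merge_character_images is a hypothetical helper, not named in the patent:

```python
from typing import List, Tuple

Grid = List[List[int]]


def merge_character_images(page: Grid,
                           blocks: List[Tuple[int, int, Grid]]) -> Grid:
    """Paste (top, left, character_image) blocks back onto a copy of the page.

    Areas not covered by any block keep their original pixels; the input
    page itself is not modified.
    """
    out = [row[:] for row in page]
    for top, left, block in blocks:
        for dr, row in enumerate(block):
            # Overwrite one row segment of the page with one row of the block.
            out[top + dr][left:left + len(row)] = row
    return out
```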
In an optional embodiment of the present invention, dividing the image into the character images corresponding to the individual characters of the text document may specifically proceed as follows:
Step 1: binarize the image to obtain a binary image.
Specifically, because fonts vary widely, a general character recognition system first binarizes the image, then performs line segmentation and character segmentation, producing for each character a binary pixel matrix that serves as input for single-character recognition. Binarization sets the gray value of every pixel in the image to 0 or 255, so that the whole image has a clear black-and-white appearance. Binary images play an important role in digital image processing: binarization simplifies further processing, reduces the amount of data, and highlights the contours of the targets of interest.
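A minimal binarization sketch in Python. It assumes an 8-bit grayscale image stored as nested lists and a fixed global threshold of 128 (a hypothetical default; an adaptive method such as Otsu's would usually choose it from the histogram). Unlike the 0/255 convention above, it outputs 0/1 with ink as 1, which already folds in the inversion applied later before projection and Euler-number counting:

```python
from typing import List


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map 8-bit gray levels to a 0/1 image with ink (dark pixels) as 1.

    Pixels darker than `threshold` become 1 (stroke); all others become 0 (paper).
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]
```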
Step 2: scan the binary image row by row from top to bottom, sum the pixels of each scanned row, and obtain the horizontal projection of the image from these sums.
Specifically, line segmentation and character segmentation of the binary image are usually performed by projection, which exploits the spacing between characters to separate them. The binary character image is first scanned row by row from top to bottom and the pixel values of each scan line are summed, giving the horizontal projection of the character image.
Step 3: use the blank gaps that the spaces between text lines form in the horizontal projection to divide the image into separate lines, obtaining line images.
Specifically, the horizontal projection of a text image along the line direction is fairly regular: each peak of the projection corresponds to one text line, and between two adjacent lines there is a fairly wide run of zero projection values that corresponds to the white space between them, i.e. the blank gap that the inter-line spacing forms in the horizontal projection. Following this pattern, line segmentation is straightforward: projecting the whole image horizontally and then cutting at these gaps both speeds up line segmentation and improves its accuracy.
Step 4: scan each line image from left to right, sum the pixels of each column of the line image, and obtain the vertical projection from these sums.
Specifically, after the image has been divided into lines, each line image is segmented by columns to obtain the individual characters. Column segmentation relies on the blank gaps that the spaces between characters form in the vertical projection, so each line image is first scanned from left to right and the pixel values of each column are summed to obtain the vertical projection. Before column segmentation the whole image is also inverted, turning black text on a white background into white text on a black background. In the binary image black is 0 and white is 1, so after inversion the projection value at an inter-character gap (blank space) is 0, i.e. the column sum at a gap is 0. Single characters can therefore be cut out of each line image more accurately at these zero-sum columns.
Step 5: use the blank gaps that the spaces between characters in each line image form in the vertical projection to divide the line image into single character blocks; each character block is the character image of one character in the text document.
Specifically, the white space between adjacent characters in a line produces blank gaps in the vertical projection, and cutting at these gaps yields single character images, each containing one character.
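Steps 2 through 5 reduce to one operation applied twice: find runs of non-zero values in a projection profile, first over rows to get lines, then over the columns of each line to get character blocks. The Python sketch below assumes a 0/1 image with ink as 1 (i.e. after the inversion described above); the function names are hypothetical, and the radical merging of steps 6 and 7 is not included:

```python
from typing import List, Tuple

Grid = List[List[int]]


def zero_gap_runs(projection: List[int]) -> List[Tuple[int, int]]:
    """Split a projection profile into [start, end) runs of non-zero values.

    Zero entries are the blank gaps between text lines (horizontal
    projection) or between characters (vertical projection).
    """
    runs, start = [], None
    for i, total in enumerate(projection):
        if total > 0 and start is None:
            start = i                      # entering ink
        elif total == 0 and start is not None:
            runs.append((start, i))        # a blank gap closes the run
            start = None
    if start is not None:                  # run touches the far edge
        runs.append((start, len(projection)))
    return runs


def segment_characters(binary: Grid) -> List[Tuple[int, int, int, int]]:
    """Return half-open (top, bottom, left, right) boxes, line by line."""
    boxes = []
    row_sums = [sum(row) for row in binary]              # horizontal projection
    for top, bottom in zero_gap_runs(row_sums):
        line = binary[top:bottom]
        col_sums = [sum(col) for col in zip(*line)]      # vertical projection
        for left, right in zero_gap_runs(col_sums):
            boxes.append((top, bottom, left, right))
    return boxes
```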
Step 6: determine whether the spacing between two adjacent character blocks is greater than a preset threshold.
For characters with a left-right or left-middle-right structure, splitting a line image at the blank gaps in its vertical projection may produce a character block that is only part of a character, for example a single radical. However, the spacing between two adjacent characters is larger than the spacing within a character: in the two adjacent characters "如果" ("if"), the gap between "如" and "果" is larger than the gap between "女" and "口" inside "如". It is therefore necessary to check whether the spacing between two adjacent character blocks exceeds a preset threshold, to ensure that every segmented block is a complete character, so that the digital watermark information can be embedded character by character. The preset threshold can be set from the inter-character spacing in the text document, or according to actual needs.
Step 7: when the spacing is not greater than the preset threshold, merge the two character blocks into one character block.
Specifically, when the spacing between two adjacent character blocks is not greater than the preset threshold, the two blocks are likely two parts of the same character, so they are merged into one block. This yields a complete character, which is the smallest unit for embedding the digital watermark information, and makes watermark embedding into individual characters more accurate.
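Steps 6 and 7 can be sketched as a single left-to-right pass over the column ranges of one line. This is a minimal Python illustration assuming the blocks are sorted and that spacing is measured as the number of blank columns between neighbours; the threshold value is left to the caller, as in the text:

```python
from typing import List, Tuple


def merge_close_blocks(blocks: List[Tuple[int, int]],
                       threshold: int) -> List[Tuple[int, int]]:
    """Merge adjacent [left, right) column ranges whose gap is <= threshold.

    `blocks` must be sorted left to right, as produced by a vertical
    projection scan; the gap is the number of blank columns between blocks.
    """
    merged: List[Tuple[int, int]] = []
    for left, right in blocks:
        if merged and left - merged[-1][1] <= threshold:
            # Small gap: probably two components (e.g. radicals) of one character.
            merged[-1] = (merged[-1][0], right)
        else:
            merged.append((left, right))
    return merged
```

For instance, with threshold=2 the ranges (0, 3) and (4, 7), separated by one blank column, are merged into (0, 7), while (12, 15) stays separate.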
In an optional embodiment of the present invention, calculating the Euler number of the character in each character image and determining the numerical characteristic of the character image from the parity of the Euler number may specifically proceed as follows:
Step 1: use an image recognition algorithm to identify the number of connected regions and the number of holes of the character in each character image.
Specifically, in image processing a binary image uses 0 (black) for the background and 1 (white) for the foreground. In a printed document, however, the Chinese characters are black and should be the foreground, while the white area is the background. Therefore, before the Euler number is computed, the character image is inverted so that the strokes of the character become the foreground, i.e. the character image shows a white character on a black background. The number of connected regions of a character is the number of mutually unconnected white stroke groups it contains. For example, after the character image of "洞" ("hole") is inverted, "洞" appears as a white character on a black background and contains 6 mutually unconnected white stroke groups, so its number of connected regions is 6. The number of holes of a character is the number of closed regions (closed curves) enclosed by its strokes. In the present invention, the number of holes can be identified as follows: in the character image with a white background, count the connected white regions; the number of holes is that count minus one.
Step 2: calculate the Euler number of the character as the difference between the number of connected regions and the number of holes.
Specifically, the number of identified connected regions minus the number of holes is taken as the Euler number of the character. For example, if an image contains C connected object regions and H holes, its Euler number is E = C - H.
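A minimal Python sketch of steps 1 and 2, assuming a 0/1 character image with strokes as 1. The patent does not state which pixel connectivity it uses; the sketch assumes the common pairing of 8-connectivity for strokes and 4-connectivity for background, and counts holes as in the text: background regions minus the one outer region. Function names are hypothetical:

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_regions(grid: Grid, value: int, steps: List[Tuple[int, int]]) -> int:
    """Count connected regions of pixels equal to `value` by breadth-first search."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] != value or seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def euler_number(glyph: Grid) -> int:
    """E = C - H for a 0/1 character image whose strokes are 1."""
    w = len(glyph[0])
    # A blank 1-pixel frame keeps the outer background a single region.
    padded = [[0] * (w + 2)] + [[0] + row + [0] for row in glyph] + [[0] * (w + 2)]
    c = count_regions(padded, 1, N8)        # connected stroke regions
    h = count_regions(padded, 0, N4) - 1    # background regions minus the outside
    return c - h
```

For a closed square outline this gives E = 1 - 1 = 0, for two separate dots E = 2, and for a figure of two stacked squares (like "日") E = 1 - 2 = -1.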
Step 3: when the Euler number is odd, the numerical characteristic of the character image is "1".
Specifically, every character image for which the difference between the number of connected regions and the number of holes is odd is assigned the numerical characteristic "1".
Step 4: when the Euler number is even, the numerical characteristic of the character image is "0".
Specifically, every character image for which the difference between the number of connected regions and the number of holes is even is assigned the numerical characteristic "0". The numerical characteristic of a character image is thus determined by the parity of its Euler number; conversely, changing the parity of a character's Euler number lets the character carry a different watermark bit.
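Written compactly, steps 3 and 4 map the Euler number to a bit by its least non-negative residue modulo 2, so negative Euler numbers (characters with more holes than connected regions) still map to 0 or 1. The shapes below are idealized outlines assumed for illustration: a closed square like "口", the same square with one corner cut, and two stacked squares like "日".

```latex
\begin{aligned}
b &= E \bmod 2 = (C - H) \bmod 2 \in \{0, 1\} \\
\text{closed square:} \quad E &= 1 - 1 = 0 \;\Rightarrow\; b = 0 \\
\text{square, one corner cut:} \quad E &= 1 - 0 = 1 \;\Rightarrow\; b = 1 \\
\text{two stacked squares:} \quad E &= 1 - 2 = -1 \;\Rightarrow\; b = 1
\end{aligned}
```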
In this embodiment, if the numerical characteristic does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image and calculating the Euler number of the modified character may specifically proceed as follows:
Step 1: if the numerical characteristic does not match the digital watermark information to be embedded, extract the skeleton of the character in the character image, and determine on the skeleton the insertion point at which the digital watermark information is to be embedded.
Specifically, a skeleton is a description that captures the connectivity and topological structure of a shape; in a text image, the skeleton carries the most important information about the character. Because the topological structure of a character reflects its most basic information, stroke intersections are easier to find on the skeleton. If the numerical characteristic does not match the digital watermark information to be embedded, the topological structure of the character in the character image must be changed. This requires first extracting the character's skeleton and then determining on it the insertion point for the digital watermark information. For example, suppose the watermark bit to be embedded in the current character "口" is "1", while the numerical characteristic it currently carries is "0", so its Euler number must be changed. The skeleton of "口" is extracted first; it is a rectangle whose four sides are one pixel thick. Corner points are then extracted from the skeleton, yielding the four corners of the rectangle. One of them (for example, the lower-right corner) is selected, it is confirmed to be the intersection of two strokes, and the stroke-cutting operation is performed at that point. The insertion point is therefore a point where strokes of the character intersect.
Step 2: dilate the insertion point to cut the strokes that intersect there, thereby changing the topological structure of the character in the character image, and calculate the Euler number of the modified character so that its numerical characteristic matches the digital watermark information to be embedded.
Specifically, once the insertion point has been determined on the character skeleton, it is dilated to cut the strokes where they intersect. When the strokes are disconnected there, the number of connected regions or the number of holes of the character changes, and so does its Euler number; changing the parity of the Euler number in this way lets the character carry a different watermark bit. As noted above, adjusting the Euler number through the character's topological structure makes the watermark resistant to malicious or unintentional attacks and allows a leaked printout to be traced to its source.
Figure 2 is a schematic diagram of the process of the digital watermark processing method for tracing printed documents provided in the embodiment of the present invention. In Figure 2, the left image is the source character image; the middle image is the left image after binarization, with the insertion points at which watermark information can be embedded marked by the white boxes; and the right image is the character image after the character's topological structure has been changed. The right image is obtained by dilating an insertion point of the middle image (the intersection of the second and third strokes of the character meaning "will"), which disconnects the second stroke from the third. The point at which these two strokes are disconnected in the right image carries the digital watermark information to be embedded.
Figure 3 is a comparison of the effect of the digital watermark processing method for tracing printed documents provided in the embodiment of the present invention. In Figure 3, the upper line of text ("this is the effect of the processing method") is the source text document, and the lower line is the text document after the digital watermark information has been embedded. In the source text, the numerical characteristics of "是", "处", "方" and "的" differ from the digital watermark information to be embedded, so the topological structures of these characters are changed. Comparing the two lines in Figure 3 shows that the sixth and seventh strokes of "是" have been disconnected, as have the first and third strokes of "处", the third and fourth strokes of "方", and the first and second strokes of "的". Dilating the insertion points at these stroke intersections cuts the strokes apart, which changes the topological structure of the characters in the character images and thereby embeds the digital watermark information to be embedded.
In an optional embodiment of the present invention, extracting the skeleton of the character in the character image and determining on it the insertion point for the digital watermark information may specifically proceed as follows:
Step 1: use a morphological image algorithm to convert the character image into a character skeleton made of single-pixel-wide connected lines.
Specifically, to find stroke cut points (insertion points), the skeleton of the character is first extracted with a morphological image algorithm, and the stroke cut points are then located on the skeleton. This involves: extracting the skeleton while keeping the original topological structure of the character unchanged, so that the Euler number of the skeleton equals that of the original character; performing corner extraction on the skeleton to obtain several candidate corner points; and, among these, selecting any point that is not on the edge of the character as the watermark insertion point and recording its coordinates.
In addition, converting the character image into a skeleton of single-pixel-wide lines amounts to thinning the character image. Thinning usually appears as an image preprocessing step; its purpose is to extract the skeleton of the source image by reducing lines wider than one pixel to a width of exactly one pixel, forming a "skeleton" that makes the image easier to analyze, for example for feature extraction. The basic idea of thinning is "peeling layer by layer": pixels are removed inward from the edges of each line, one layer at a time, until only a one-pixel-wide line remains. Thinning greatly reduces the amount of image data while keeping the basic topological structure of the shape unchanged, laying the foundation for applications such as feature extraction in character recognition.
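The patent does not name its thinning algorithm. As a stand-in, the Python sketch below uses the classic Zhang-Suen layer-peeling method, which removes boundary pixels in two alternating sub-passes while checking a connectivity condition. It assumes a 0/1 image with strokes as 1 and a blank one-pixel border. Zhang-Suen preserves connectivity in most practical cases but does not keep the Euler number unchanged for every shape (for example, a 2x2 block of ink is erased entirely), so the skeleton's Euler number should be checked against the original:

```python
from typing import List

Grid = List[List[int]]


def zhang_suen_thin(glyph: Grid) -> Grid:
    """Thin a 0/1 glyph (strokes = 1) to one-pixel-wide lines.

    Two sub-iterations per pass peel boundary pixels from opposite sides
    until nothing changes. Border pixels are never examined, so the glyph
    should have at least a 1-pixel blank margin.
    """
    img = [row[:] for row in glyph]
    h, w = len(img), len(img[0])

    def neighbours(r: int, c: int) -> List[int]:
        # P2..P9 clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, h - 1):
                for c in range(1, w - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)                                   # ink neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1    # 0->1 transitions
                            for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_delete.append((r, c))
            for r, c in to_delete:                               # delete in a batch
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img
```

A three-pixel-thick horizontal bar, for example, is reduced to a single centre row, slightly shortened at both ends.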
Step 2: extract at least one corner point of the character skeleton, and take any corner point other than those located at the edge of the character image as the insertion point for the embedded information.
Specifically, different character skeletons have different numbers of corner points. All corner points of the skeleton (at least one) are extracted first, and then any corner point not located at the edge of the character image is selected as the insertion point for the embedded information, which improves the usability of the embedded digital watermark.
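The patent extracts corner points with an unspecified corner detector. The Python sketch below swaps in a much simpler test that suits the mostly horizontal and vertical strokes of printed Chinese characters: a skeleton pixel is a candidate if it has a horizontal neighbour and a vertical neighbour, i.e. it sits at an L-, T- or cross-shaped junction. Reading "edge of the character image" as "within a fixed margin of the image border" is also an assumption, and diagonal strokes are not handled:

```python
from typing import List, Tuple

Grid = List[List[int]]


def insertion_candidates(skeleton: Grid, margin: int = 1) -> List[Tuple[int, int]]:
    """Skeleton pixels where a horizontal and a vertical stroke meet.

    A pixel qualifies if it has ink to its left or right AND ink above or
    below it. Pixels closer than `margin` to the image border are skipped.
    """
    h, w = len(skeleton), len(skeleton[0])

    def ink(r: int, c: int) -> bool:
        return 0 <= r < h and 0 <= c < w and skeleton[r][c] == 1

    points = []
    for r in range(margin, h - margin):
        for c in range(margin, w - margin):
            if not ink(r, c):
                continue
            horizontal = ink(r, c - 1) or ink(r, c + 1)
            vertical = ink(r - 1, c) or ink(r + 1, c)
            if horizontal and vertical:
                points.append((r, c))
    return points
```

On a one-pixel square outline this returns exactly the four corners, matching the "口" example, and any of them may be chosen as the insertion point.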
The insertion point is dilated to cut the strokes that intersect there, which may specifically proceed as follows:
Step 1: on the character skeleton, find the longest of all straight lines adjacent to the insertion point.
Specifically, the longest of all straight lines adjacent to the insertion point is found on the character skeleton, so that dilating the insertion point afterwards improves the usability of the embedded digital watermark. For example, suppose the insertion point of "扑" is part of the intersection of its right-hand vertical stroke and the dot. To cut the vertical stroke and the dot apart by dilating the insertion point, the longest line adjacent to the insertion point should be taken, namely the whole right-hand vertical stroke, rather than either the upper or the lower part into which the junction with the dot divides that stroke.
Step 2: construct a structuring element from the slope of the longest straight line, and dilate the insertion point with this structuring element to cut the strokes that intersect there.
Specifically, a structuring element is constructed from the slope of the longest straight line, and with this suitably chosen element the region around the cut point of the original character is dilated. After dilation, the two strokes that were originally connected are separated, the topological structure changes, and the parity of the Euler number flips, so that the Chinese character carries a different watermark bit. For example, the Euler number of "口" is 0 and must be changed to 1: the longest vertical line at the lower-right corner is extracted and used as the structuring element for dilating the lower-right corner of "口". After dilation, the horizontal stroke and the right-hand vertical stroke are disconnected and the Euler number becomes 1. Cutting the strokes by dilation in this way embeds the digital watermark information to be embedded and improves the usability of the embedded digital watermark.
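A minimal Python sketch of steps 1 and 2, under several assumptions not stated in the patent: slopes are quantized to four orientations (0, 45, 90 and 135 degrees); the structuring element is a straight line of 2 * half_length + 1 pixels; and "dilating the insertion point" is read as computing the dilation of the single point by that element and setting the result to background. Ties between equally long runs go to the orientation checked first. Function names are hypothetical:

```python
from typing import List, Tuple

Grid = List[List[int]]

# Four line orientations (dy, dx); each is walked in both senses.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def longest_direction(skeleton: Grid, point: Tuple[int, int]) -> Tuple[int, int]:
    """Orientation of the longest straight skeleton run leaving `point`."""
    h, w = len(skeleton), len(skeleton[0])
    best, best_len = DIRECTIONS[0], -1
    for dy, dx in DIRECTIONS:
        for sy, sx in ((dy, dx), (-dy, -dx)):
            r, c, length = point[0] + sy, point[1] + sx, 0
            while 0 <= r < h and 0 <= c < w and skeleton[r][c] == 1:
                length += 1
                r, c = r + sy, c + sx
            if length > best_len:
                best, best_len = (dy, dx), length
    return best


def cut_at(glyph: Grid, skeleton: Grid, point: Tuple[int, int],
           half_length: int = 1) -> Grid:
    """Erase the dilation of {point} by a line element along the longest run."""
    dy, dx = longest_direction(skeleton, point)
    element = [(k * dy, k * dx) for k in range(-half_length, half_length + 1)]
    out = [row[:] for row in glyph]
    for oy, ox in element:      # dilating one point = shifted copies of it
        r, c = point[0] + oy, point[1] + ox
        if 0 <= r < len(out) and 0 <= c < len(out[0]):
            out[r][c] = 0       # the dilated region becomes background
    return out
```

On a one-pixel upright rectangle, the longest run from the lower-right corner is the right-hand vertical side, so the element is vertical and the cut separates the bottom stroke from the right stroke, as in the "口" example. Real glyphs with thicker strokes would need a longer or wider element.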
Referring to Figure 4, which is a schematic structural diagram of the digital watermark processing device for tracing printed documents provided in an embodiment of the present invention, the device includes the following modules:
Segmentation module 401, configured to convert the text document into an image and divide the image into the character images corresponding to the individual characters of the text document;
Calculation module 402, configured to calculate the Euler number of the character in each character image and determine the numerical characteristic of the character image from the parity of the Euler number;
Acquisition module 403, configured to obtain the digital watermark information to be embedded for each character image and determine whether the numerical characteristic matches it, where the digital watermark information to be embedded takes one of the values of the numerical characteristic;
Processing module 404, configured to, if the numerical characteristic does not match the digital watermark information to be embedded, change the topological structure of the character in the character image and calculate the Euler number of the modified character, so that the numerical characteristic of that Euler number matches the digital watermark information to be embedded.
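The module structure can be read as a pipeline of interchangeable stages. The Python sketch below is a hypothetical wiring, not the patent's implementation: modules 401, 402 and 404 are injected as callables, the comparison of module 403 is the loop body, and the recheck after modification mirrors the requirement that the processing module recalculates the Euler number:

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

Img = TypeVar("Img")


@dataclass
class WatermarkDevice(Generic[Img]):
    """Hypothetical wiring of modules 401-404 as injected callables."""
    segment: Callable[[Img], List[Img]]   # ~ module 401: page -> character images
    euler: Callable[[Img], int]           # ~ module 402: character -> Euler number
    modify: Callable[[Img], Img]          # ~ module 404: cut strokes at an insertion point

    def embed(self, page: Img, bits: str) -> List[Img]:
        out: List[Img] = []
        for char, bit in zip(self.segment(page), bits):
            if self.euler(char) % 2 == int(bit):     # ~ module 403: bit already matches
                out.append(char)
                continue
            changed = self.modify(char)
            if self.euler(changed) % 2 != int(bit):  # recalculate after modification
                raise ValueError("modification did not flip the Euler parity")
            out.append(changed)
        return out
```

With stub stages in which each "character" is an integer equal to its own Euler number and modification adds one, embedding "010110" into characters with Euler numbers 0, 0, 1, 1, 1, 0 (numerical characteristics "001110", as in the example of step S103) changes only the second and third characters.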
It can be seen that in the digital watermark processing device for tracing printed documents provided by the embodiment of the present invention, the segmentation module first converts the text document into an image and divides it into character images; the calculation module then calculates the Euler number of the character in each character image and determines the numerical characteristic from its parity; the acquisition module obtains the digital watermark information to be embedded for each character image and checks whether the numerical characteristic matches it; and, if not, the processing module changes the topological structure of the character and recalculates the Euler number so that its numerical characteristic matches the digital watermark information to be embedded. As with the method, adjusting the Euler number through the character's topological structure makes the watermark resistant to malicious or unintentional attacks and allows a leaked printout to be traced to its source, improving the usability of the digital watermark and the security of paper output of confidential files.
Further, the device also includes:
Merging module, configured to merge the character images whose numerical characteristics match the digital watermark information to be embedded.
Further, the segmentation module 401 includes:
Processing submodule, configured to binarize the image to obtain a binary image;
First scanning submodule, configured to scan the binary image row by row from top to bottom, sum the pixels of each scanned row, and obtain the horizontal projection of the image from these sums;
First segmentation submodule, configured to divide the image into separate lines, using the blank gaps that the spaces between text lines form in the horizontal projection, to obtain line images;
Second scanning submodule, configured to scan each line image from left to right, sum the pixels of each column of the line image, and obtain the vertical projection from these sums;
Second segmentation submodule, configured to divide each line image into single character blocks, using the blank gaps that the spaces between characters form in the vertical projection, where each character block is the character image of one character in the text document;
Judging submodule, configured to determine whether the spacing between two adjacent character blocks is greater than a preset threshold;
Merging submodule, configured to merge the two character blocks into one character block when the spacing is not greater than the preset threshold.
Further, the calculation module 402 includes:
Identification submodule, configured to identify, using an image recognition algorithm, the number of connected regions and the number of holes of the character in each character image;
Calculation submodule, configured to calculate the Euler number of the character as the difference between the number of connected regions and the number of holes;
First determination submodule, configured to set the numerical characteristic of the character image to "1" when the Euler number is odd;
Second determination submodule, configured to set the numerical characteristic of the character image to "0" when the Euler number is even.
Further, the processing module 404 includes:
Extraction submodule, configured to, if the numerical characteristic does not match the digital watermark information to be embedded, extract the skeleton of the character in the character image and determine on the skeleton the insertion point for the digital watermark information, the insertion point being a point where strokes of the character intersect;
Dilation submodule, configured to dilate the insertion point to cut the strokes that intersect there, thereby changing the topological structure of the character in the character image, and to calculate the Euler number of the modified character.
Further, the extraction submodule includes:
Conversion unit, configured to convert the character image, using a morphological image algorithm, into a character skeleton made of single-pixel-wide connected lines;
Extraction unit, configured to extract at least one corner point of the character skeleton and take any corner point other than those located at the edge of the character image as the insertion point for the embedded information.
Further, the dilation submodule includes:
Acquisition unit, configured to find, on the character skeleton, the longest of all straight lines adjacent to the insertion point;
Dilation unit, configured to construct a structuring element from the slope of the longest straight line and dilate the insertion point with it to cut the strokes that intersect there.
An embodiment of the present invention further provides an electronic device, as shown in Figure 5, including a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with one another via the communication bus 504;
the memory 503 is configured to store a computer program;
the processor 501 is configured to implement the following steps when executing the program stored in the memory 503:
converting the text document into an image, and dividing the image into the character images corresponding to the individual characters of the text document;
calculating the Euler number of the character in each character image, and determining the numerical characteristic of the character image from the parity of the Euler number;
obtaining the digital watermark information to be embedded for each character image, and determining whether the numerical characteristic matches it;
if the numerical characteristic does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image and calculating the Euler number of the modified character, so that the numerical characteristic of that Euler number matches the digital watermark information to be embedded.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration it is drawn as a single thick line in the figure, which does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
It can be seen that, with the electronic device provided by the embodiments of the present invention, a text document is first converted into an image and the image is segmented into character images. The Euler number of the character in each character image is then calculated, and the digital feature of the character image is determined from the parity of that Euler number. Next, it is judged whether the digital feature matches the digital watermark information to be embedded; if it does not, the topological structure of the character in the character image is changed and the Euler number of the modified character is calculated, so that the digital feature of the Euler number matches the watermark information. Adjusting a character's Euler number through its topological structure in this way makes the watermark resistant to malicious or unintentional attacks. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printed document, completing the tracing of the leaked document. This improves the usability of the digital watermark and thereby the security of printing confidential paper documents.
Another embodiment of the present invention further provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute any of the digital watermark processing methods for tracing the source of printed documents described in the above embodiments. The digital watermark processing method for tracing the source of printed documents comprises:
Convert the text document into an image, and segment the image into the character image corresponding to each character in the text document;
Calculate the Euler number of the character in each character image, and determine the digital feature of each character image from the parity of its Euler number;
Obtain the digital watermark information to be embedded in each character image, and judge whether the digital feature matches the digital watermark information to be embedded;
If the digital feature does not match the digital watermark information to be embedded, change the topological structure of the character in the character image and calculate the Euler number of the character after the change, so that the digital feature of the Euler number matches the digital watermark information to be embedded.
It can be seen that, with the computer-readable storage medium provided by the embodiments of the present invention, a text document is first converted into an image and the image is segmented into character images. The Euler number of the character in each character image is then calculated, and the digital feature of the character image is determined from the parity of that Euler number. Next, it is judged whether the digital feature matches the digital watermark information to be embedded; if it does not, the topological structure of the character in the character image is changed and the Euler number of the modified character is calculated, so that the digital feature of the Euler number matches the watermark information. Adjusting a character's Euler number through its topological structure in this way makes the watermark resistant to malicious or unintentional attacks. When a document is leaked, the watermark information can be extracted from the leaked document and used to determine the source of the printed document, completing the tracing of the leaked document. This improves the usability of the digital watermark and thereby the security of printing confidential paper documents.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device, electronic device, and computer-readable storage medium embodiments are substantially similar to the method embodiment and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (5)

1. A digital watermark processing method for tracing the source of printed documents, characterized in that the method comprises:
Converting a text document into an image, and segmenting the image into the character image corresponding to each character in the text document;
Calculating the Euler number of the character in each character image, and determining the digital feature corresponding to the character image according to the parity of the Euler number;
Obtaining the digital watermark information to be embedded in each character image, and judging whether the digital feature matches the digital watermark information to be embedded;
If the digital feature does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image, and calculating the Euler number of the character after the topological structure is changed, so that the digital feature of the Euler number matches the digital watermark information to be embedded;
Wherein segmenting the image into the character image corresponding to each character in the text document comprises:
Binarizing the image to obtain a binary image;
Scanning the binary image row by row from top to bottom, counting the pixels in each scanned row, and obtaining the horizontal projection of the image from the pixel counts;
Segmenting the image into separate lines, using the blank regions formed in the horizontal projection by the spaces between lines of text, to obtain line images;
Scanning each line image from left to right, counting the text pixels in each scanned column, and obtaining the vertical projection of the image from the pixel counts;
Segmenting each line image into individual character blocks, using the blank regions formed in the vertical projection by the spaces between characters, each character block being the character image corresponding to one character in the text document;
Judging whether the spacing between two adjacent character blocks is greater than a preset threshold;
When the spacing is not greater than the preset threshold, merging the two character blocks into one character block;
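The projection-based segmentation in the steps above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: dark text on a light background, a fixed binarization threshold (the claim does not name a binarization method; Otsu's method would be a common alternative), and a gap threshold in pixels chosen by the caller. Merging blocks separated by small gaps rejoins characters with left and right components, such as 明, that the vertical projection would otherwise split in two.

```python
from typing import List, Tuple

import numpy as np

Span = Tuple[int, int]  # half-open [start, end) index range


def binarize(gray: np.ndarray, thresh: int = 128) -> np.ndarray:
    """Ink (dark) pixels -> 1, background -> 0. The fixed threshold is an assumption."""
    return (gray < thresh).astype(np.uint8)


def runs(profile: np.ndarray) -> List[Span]:
    """Spans of consecutive non-zero entries in a projection profile."""
    padded = np.concatenate(([0], (profile > 0).astype(np.int8), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def merge_close(spans: List[Span], threshold: int) -> List[Span]:
    """Merge neighbouring spans whose gap is not greater than `threshold`."""
    merged: List[Span] = []
    for start, end in spans:
        if merged and start - merged[-1][1] <= threshold:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def segment_characters(binary: np.ndarray, gap_threshold: int = 2) -> List[np.ndarray]:
    """Split a binary page (1 = ink) into character images in reading order."""
    chars = []
    row_profile = binary.sum(axis=1)            # horizontal projection: ink per row
    for top, bottom in runs(row_profile):       # blank rows separate text lines
        line = binary[top:bottom]
        col_profile = line.sum(axis=0)          # vertical projection of one line
        for left, right in merge_close(runs(col_profile), gap_threshold):
            chars.append(line[:, left:right])
    return chars
```

In a real system each character's bounding box would also be kept so the modified glyph can later be written back to the same position.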
Wherein calculating the Euler number of the character in each character image and determining the digital feature corresponding to the character image according to the parity of the Euler number comprises:
Identifying, using an image recognition algorithm, the number of connected regions and the number of holes of the character in each character image;
Calculating the Euler number of the character as the difference between the number of connected regions and the number of holes;
When the Euler number is odd, the digital feature corresponding to the character image is "1";
When the Euler number is even, the digital feature corresponding to the character image is "0";
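One way to compute the Euler number and its parity bit, assuming SciPy is available: count 8-connected ink components, then count holes as the 4-connected background regions that do not touch the (padded) image border. The 8/4 connectivity pairing is a conventional choice made here for illustration; the patent only refers to an unspecified image recognition algorithm.

```python
import numpy as np
from scipy import ndimage

_EIGHT = np.ones((3, 3), dtype=int)  # 8-connectivity for the character strokes


def euler_number(char_img: np.ndarray) -> int:
    """Connected components minus holes of a binary character image (1 = ink)."""
    ink = char_img.astype(bool)
    _, n_components = ndimage.label(ink, structure=_EIGHT)
    # Pad with background so all outside background forms a single region,
    # then label the background with 4-connectivity (SciPy's default structure).
    background = np.pad(~ink, 1, mode="constant", constant_values=True)
    _, n_background = ndimage.label(background)
    n_holes = n_background - 1  # every background region except the outer one
    return int(n_components - n_holes)


def feature_bit(char_img: np.ndarray) -> int:
    """Digital feature: odd Euler number -> 1, even -> 0."""
    return euler_number(char_img) % 2
```

For a one-pixel-thick 口 this gives 1 - 1 = 0 (bit 0), and for 日 it gives 1 - 2 = -1 (bit 1).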
Wherein, if the digital feature does not match the digital watermark information to be embedded, changing the topological structure of the character in the character image and calculating the Euler number of the character after the topological structure is changed, so that the digital feature of the Euler number matches the digital watermark information to be embedded, comprises:
If the digital feature does not match the digital watermark information to be embedded, extracting the character skeleton in the character image, and determining in the character skeleton an embedding point for embedding the digital watermark information, the embedding point being a point where strokes of the character intersect;
Dilating the embedding point so as to disconnect the strokes at the intersection point, thereby changing the topological structure of the character in the character image, and calculating the Euler number of the character after the topological structure is changed, so that the digital feature of the Euler number matches the digital watermark information to be embedded;
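Cutting a stroke crossing can change the Euler number by more than one (splitting a plus-shaped crossing into four pieces raises it from 1 to 4, for example), so the new value has to be recomputed and its parity checked. A hedged sketch of one possible retry loop follows. It assumes caller-supplied callables `break_at` (modify the image at one candidate embedding point) and `euler` (compute the Euler number), both hypothetical, and tries each candidate independently on the original image; that is a design choice, not something the claims specify.

```python
from typing import Callable, Iterable, Optional, TypeVar

Img = TypeVar("Img")
P = TypeVar("P")


def embed_bit(img: Img, bit: int, candidates: Iterable[P],
              break_at: Callable[[Img, P], Img],
              euler: Callable[[Img], int]) -> Optional[Img]:
    """Return an image whose Euler-number parity equals `bit`, or None."""
    if euler(img) % 2 == bit:
        return img                    # already carries the required bit
    for point in candidates:
        trial = break_at(img, point)  # each trial starts from the unmodified image
        if euler(trial) % 2 == bit:
            return trial
    return None                       # no single cut produced the required parity
```

The claims do not specify what happens when no embedding point yields the required parity, so that case is left to the caller here.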
Wherein extracting the character skeleton in the character image and determining in the character skeleton the embedding point for embedding the digital watermark information comprises:
Converting the character image, using a morphological image algorithm, into a character skeleton that is only one pixel wide;
Extracting at least one corner point of the character skeleton, and taking any of the corner points, other than those located at the edge of the character image, as the embedding point for the information;
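The claim calls the embedding points "corner points" of the skeleton but also requires them to be stroke intersections. A simple stand-in for detecting such intersections, swapped in here for illustration, is the crossing number: walk the eight neighbours of a skeleton pixel in order and count 0-to-1 transitions; a value of 3 or more marks a branch point, while a plain bend scores 2. The sketch assumes the skeleton is already one pixel wide (for example from scikit-image's `skeletonize`, which is not called here) and skips pixels within `margin` of the border, mirroring the claim's exclusion of edge points.

```python
from typing import List, Tuple

import numpy as np

# The eight neighbours in clockwise order, starting north: N, NE, E, SE, S, SW, W, NW.
_RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel: np.ndarray, r: int, c: int) -> int:
    """Number of 0-to-1 transitions when walking once around pixel (r, c)."""
    ring = [int(skel[r + dr, c + dc]) for dr, dc in _RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def embedding_points(skel: np.ndarray, margin: int = 1) -> List[Tuple[int, int]]:
    """Skeleton branch points that are not near the image edge."""
    if margin < 1:
        raise ValueError("margin must be >= 1 so every neighbour lies inside the image")
    s = skel.astype(bool)
    h, w = s.shape
    points = []
    for r, c in zip(*np.nonzero(s)):
        if margin <= r < h - margin and margin <= c < w - margin:
            if crossing_number(s, r, c) >= 3:
                points.append((int(r), int(c)))
    return points
```

On real glyphs a thick crossing can skeletonize into a short cluster of branch pixels, so some post-filtering of neighbouring candidates would probably be needed.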
Wherein dilating the embedding point so as to disconnect the strokes at the intersection point comprises:
Obtaining, in the character skeleton, the longest of all the straight lines adjacent to the embedding point;
Constructing a structuring element from the slope of the longest straight line, and dilating the embedding point with the structuring element so as to disconnect the strokes at the intersection point.
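The claim leaves open exactly how dilating the embedding point disconnects the strokes. The sketch below assumes one plausible reading: approximate each skeleton branch by the segment from the embedding point to the branch end, build a short line-shaped structuring element at the angle of the longest branch, dilate the single embedding point with it, and erase the dilated pixels from the character image. Branch tracing is not shown; `branch_ends` is a hypothetical input, and the element `length` is an illustrative parameter.

```python
import math
from typing import Sequence, Tuple

import numpy as np
from scipy import ndimage

Point = Tuple[int, int]  # (row, col)


def longest_branch_angle(point: Point, branch_ends: Sequence[Point]) -> float:
    """Angle in radians (counter-clockwise from +x, y up) of the longest branch."""
    r0, c0 = point
    r1, c1 = max(branch_ends, key=lambda e: math.hypot(e[0] - r0, e[1] - c0))
    return math.atan2(-(r1 - r0), c1 - c0)  # image rows grow downward


def line_structuring_element(angle: float, length: int = 3) -> np.ndarray:
    """length x length boolean array holding a centred digital line at `angle`."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the element has a centre")
    half = length // 2
    se = np.zeros((length, length), dtype=bool)
    for k in range(-half, half + 1):
        row = half - int(round(k * math.sin(angle)))
        col = half + int(round(k * math.cos(angle)))
        se[row, col] = True
    return se


def break_crossing(char_img: np.ndarray, point: Point, angle: float,
                   length: int = 3) -> np.ndarray:
    """Dilate the embedding point with the oriented element, then erase the
    dilated pixels from the character (the erase step is an assumption)."""
    mask = np.zeros(char_img.shape, dtype=bool)
    mask[point] = True
    cut = ndimage.binary_dilation(mask, structure=line_structuring_element(angle, length))
    return char_img.astype(bool) & ~cut
```

On a one-pixel-wide 十 with a horizontal element of length 3, this erases three pixels along the horizontal stroke and leaves four separate pieces, so the Euler number goes from 1 to 4 and the parity bit flips. For glyphs with thicker strokes the element would likely need to be longer than the stroke width.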
2. The method according to claim 1, characterized in that, after changing the topological structure of the character in the character image and calculating the Euler number of the character after the topological structure is changed so that the digital feature of the Euler number matches the digital watermark information to be embedded, the method further comprises:
Merging the character images corresponding to the characters whose digital features match the digital watermark information to be embedded.
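Claim 2 merges the processed character images back together. A minimal sketch, assuming each character image was saved with its bounding box (top, bottom, left, right; half-open, in page coordinates) during segmentation. The claims do not say how positions are recorded, so this bookkeeping and the `reassemble` name are hypothetical.

```python
from typing import Iterable, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), half-open, page coords


def reassemble(page: np.ndarray, chars: Iterable[Tuple[Box, np.ndarray]]) -> np.ndarray:
    """Write each (possibly modified) character image back into a copy of the page."""
    out = page.copy()  # leave the original page untouched
    for (top, bottom, left, right), img in chars:
        if img.shape != (bottom - top, right - left):
            raise ValueError("character image does not fit its bounding box")
        out[top:bottom, left:right] = img
    return out
```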
3. A digital watermark processing device for tracing the source of printed documents, characterized in that the device comprises:
A segmentation module, configured to convert a text document into an image and segment the image into the character image corresponding to each character in the text document;
A calculation module, configured to calculate the Euler number of the character in each character image and determine the digital feature corresponding to the character image according to the parity of the Euler number;
An acquisition module, configured to obtain the digital watermark information to be embedded in each character image and judge whether the digital feature matches the digital watermark information to be embedded;
A processing module, configured to, if the digital feature does not match the digital watermark information to be embedded, change the topological structure of the character in the character image and calculate the Euler number of the character after the topological structure is changed, so that the digital feature of the Euler number matches the digital watermark information to be embedded;
The segmentation module comprises:
A processing submodule, configured to binarize the image to obtain a binary image;
A first scanning submodule, configured to scan the binary image row by row from top to bottom, count the pixels in each scanned row, and obtain the horizontal projection of the image from the pixel counts;
A first segmentation submodule, configured to segment the image into separate lines, using the blank regions formed in the horizontal projection by the spaces between lines of text, to obtain line images;
A second scanning submodule, configured to scan each line image from left to right, count the text pixels in each scanned column, and obtain the vertical projection of the image from the pixel counts;
A second segmentation submodule, configured to segment each line image into individual character blocks, using the blank regions formed in the vertical projection by the spaces between characters, each character block being the character image corresponding to one character in the text document;
A judgment submodule, configured to judge whether the spacing between two adjacent character blocks is greater than a preset threshold;
A merging submodule, configured to merge the two character blocks into one character block when the spacing is not greater than the preset threshold;
The calculation module comprises:
An identification submodule, configured to identify, using an image recognition algorithm, the number of connected regions and the number of holes of the character in each character image;
A computation submodule, configured to calculate the Euler number of the character as the difference between the number of connected regions and the number of holes;
A first determination submodule, configured to set the digital feature corresponding to the character image to "1" when the Euler number is odd;
A second determination submodule, configured to set the digital feature corresponding to the character image to "0" when the Euler number is even;
The processing module comprises:
An extraction submodule, configured to, if the digital feature does not match the digital watermark information to be embedded, extract the character skeleton in the character image and determine in the character skeleton an embedding point for embedding the digital watermark information, the embedding point being a point where strokes of the character intersect;
A dilation submodule, configured to dilate the embedding point so as to disconnect the strokes at the intersection point, thereby changing the topological structure of the character in the character image, and to calculate the Euler number of the character after the topological structure is changed, so that the digital feature of the Euler number matches the digital watermark information to be embedded;
The extraction submodule comprises:
A conversion unit, configured to convert the character image, using a morphological image algorithm, into a character skeleton that is only one pixel wide;
An extraction unit, configured to extract at least one corner point of the character skeleton and take any of the corner points, other than those located at the edge of the character image, as the embedding point for the information;
The dilation submodule comprises:
An acquisition unit, configured to obtain, in the character skeleton, the longest of all the straight lines adjacent to the embedding point;
A dilation unit, configured to construct a structuring element from the slope of the longest straight line and dilate the embedding point with the structuring element so as to disconnect the strokes at the intersection point.
4. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement the method steps of any one of claims 1-2 when executing the program stored in the memory.
5. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps of any one of claims 1-2 are implemented.
CN201710838786.8A 2017-09-18 2017-09-18 A digital watermark processing method and device for tracing the source of printed documents Active CN107644391B (en)


Publications (2)

Publication Number Publication Date
CN107644391A CN107644391A (en) 2018-01-30
CN107644391B true CN107644391B (en) 2019-11-26

Family

ID=61111903


Country Status (1)

Country Link
CN (1) CN107644391B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428356B (en) * 2019-07-22 2023-04-28 中孚安全技术有限公司 Paper printed part hidden watermark tracing method, system, terminal and storage medium
CN111028123B (en) * 2019-11-11 2022-05-20 浙江大学 Anti-printing large-capacity text digital watermarking method
CN113139547B (en) * 2020-01-20 2022-04-29 阿里巴巴集团控股有限公司 Text recognition method and device, electronic equipment and storage medium
CN112053275B (en) * 2020-07-14 2023-03-21 清华大学 Printing and scanning attack resistant PDF document watermarking method and device
CN117350909A (en) * 2023-10-24 2024-01-05 江苏群杰物联科技有限公司 Text watermark processing method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102368328A (en) * 2011-09-19 2012-03-07 北京航空航天大学 Digital watermarking method applied to counterfeit prevention for print documents
CN102592126A (en) * 2010-11-15 2012-07-18 柯尼卡美能达美国研究所有限公司 Method for binarizing scanned document images containing gray or light colored text printed with halftone pattern
CN105260148A (en) * 2015-10-22 2016-01-20 苏州恒盛信息技术有限公司 Printing file authenticating and tracing method and system based on electronic label
CN106845475A (en) * 2016-12-15 2017-06-13 西安电子科技大学 Natural scene character detecting method based on connected domain

Non-Patent Citations (1)

Title
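基于字符欧拉数的抗打印扫描文本水印算法 (Print-scan-resistant text watermarking algorithm based on character Euler numbers); Li Yan et al.; Proceedings of the 9th Annual Academic Conference of the China Institute of Communications; 2013-10-29; pp. 421-425 *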

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Yang Yu

Inventor after: Chen Yuwei

Inventor after: Lei Min

Inventor after: Li Deyin

Inventor after: Zhan Rui

Inventor before: Yang Yu

Inventor before: Chen Yuwei

Inventor before: Lei Min

GR01 Patent grant