CN1601956A - Text digital Watermark tech using character's features for carrying watermark information - Google Patents

Publication number
CN1601956A
Authority
CN
China
Prior art keywords
character
string
font
coding
watermark information
Legal status
Pending
Application number
CN 200410040853
Other languages
Chinese (zh)
Inventor
刘�东
Current Assignee
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN 200410040853 (CN1601956A)
Publication of CN1601956A
Priority to CN 200510065893 (CN1684115B)
Priority to PCT/CN2005/001703 (WO2006042460A1)

Landscapes

  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The basic principle of the invention is to design multiple glyph forms for the same character (or string) with the same meaning by changing which strokes of the character (string) are connected and which are disconnected. The topological structure of these glyphs is then encoded, and the codes of the different glyphs of a character (string) carry the digital watermark information. The invention comprises: (1) a method for designing multiple glyphs for a character; (2) methods for encoding those glyphs; (3) a method of unified encoding; (4) a text digital watermarking technique in which glyphs are encoded separately; (5) a text digital watermarking technique in which glyphs are encoded uniformly. Its features are small visual impact, a simple and reliable detection method, good extensibility, and large information capacity.

Description

Text digital watermarking technique that uses the glyph-shape features of characters to carry watermark information
Technical field
The invention belongs to the field of command, control, communications, and information engineering, and specifically concerns data hiding, data encoding and decoding, and digital watermarking.
Background art
Digital watermarking is an important part of the information hiding field. It takes information that has a certain meaning (the digital watermark information) and uses a digital embedding method to hide it in digital products such as images, audio, video, and text. On the one hand, products carrying a watermark can be used normally without the watermark being readily perceived; on the other hand, the embedded watermark can be detected by specific technical means. Digital watermarking is widely used for copyright protection, content verification, anti-counterfeiting, prevention of illegal copying, usage tracking, covert data communication, and many other areas. By carrier, digital watermarks fall into several main kinds: image, audio, video, and text watermarks. The present invention mainly concerns text digital watermarking, in which the watermark information is hidden in text composed mainly of characters.
As the survey article "Text digital watermarking" (Journal of Chinese Information Processing, vol. 15, no. 5; authors: Huang Hua, Qi Chun, Li Jun, Zhu Weifang) describes, existing text watermarking techniques concentrate on using the format information of the text to store the watermark. That article mainly covers embedding by encoding the word spacing and line spacing of the text. The drawback of this approach is that word-spacing and line-spacing coding has some advantage for languages based on the Latin alphabet (such as English), but for ideographic languages such as Chinese, which have no word spacing in the English sense, embedding the watermark is comparatively difficult. In addition, detection of a watermark encoded in word spacing has a relatively large error, and a watermark encoded in line spacing carries relatively little information.
The article "Information hiding technology based on text documents" (Application Research of Computers, 2003.10, pp. 39-41; authors: Cao Weibing, Dai Guanzhong, Xia Yu, Mu Dejun) mainly covers techniques that carry watermark information by encoding punctuation information and by encoding the typeface of characters. Its shortcomings are as follows. Because punctuation marks are relatively sparse in text, punctuation coding carries little information. The main problem with typeface coding is that detecting the watermark is very difficult when the carrier file is printed matter, and the article does not mention a detection method for that case.
Both of the articles above also mention storing watermark information by encoding character features, mainly by changing the length of some strokes or the height of whole characters. The main problem with this technique is likewise that detection is difficult when the carrier file is printed matter; it can also cause considerable visual impact.
As the article "Research and simulation of binary text digital watermarking" (Journal of System Simulation, Vol. 16, No. 3, 2004.3; authors: Wang Huiqin, Li Renhou) describes, another main approach in existing text watermarking converts the text into an image file and loads the watermark using image-watermarking methods. The drawback of this approach is that most word processors cannot display or process the resulting watermarked image file.
As the article "Techniques for data hiding" (IBM Systems Journal, 1996, 35(3&4), Bender W, et al.) describes, another existing technique replaces particular words or phrases in the text with synonyms and encodes the different synonyms to carry the watermark. The drawback of this technique is that not every word has a suitable synonym, so the amount of watermark information a text can embed is quite limited.
Patent application No. 00805218.2, titled "Invisible encoding of meta-information" (applicant: Koninklijke Philips Electronics N.V.; inventor: K. dust henry; date of entry into China: 2001.09.18), provides a patented text watermarking technique. It encodes the order in which invisible symbols (such as spaces, carriage returns, and tabs) occur, and embeds these invisible symbols in the text to represent the watermark information. Because the watermark is concentrated in invisible symbols, the technique works only when the carrier file is an electronic file and cannot be used when the carrier file is printed matter. Moreover, the large number of visible symbols in the text carry no watermark information, so the watermark is unevenly distributed and a watermark loaded in this way is very easy for an attacker to remove.
Patent application No. 200410040307.0, titled "Text digital watermarking technique that carries hidden information in the redundant encoding of symbols" (applicant: Liu Dong; inventor: Liu Dong; filing date: 2004.7.26; pending grant), provides another patented text watermarking technique. It encodes characters redundantly and displays and processes the watermark information in combination with corresponding font files. When the carrier file is an electronic file, this method handles embedding, display, and detection of the watermark well. When the carrier file is printed matter, the method is in essence a character-glyph coding technique: it solves the embedding problem well, but detection of the watermark remains very difficult.
Summary of the invention
The purpose of the invention is to provide a text digital watermarking technique that uses the glyph-shape features of characters to carry digital watermark information, in order to address problems of existing text watermarking techniques: the visual impact the watermark has on the user, the limited amount of watermark information a carrier file can carry, and the difficulty of detecting watermark information in printed matter. The invention is particularly suited to cases where the carrier file of the watermark is printed matter.
The basic principle of the invention is to change the connection and disconnection relations between the strokes that make up a character (string), thereby designing multiple glyphs for a semantically identical character (string); to encode the glyph-shape features formed by the topological structure of each glyph; and to use those glyph codes to carry digital watermark information. Together these constitute a new text digital watermarking technique.
The invention comprises the following closely related parts:
(1) a method for designing multiple glyphs of the same character (string) to carry digital watermark information;
(2) several methods for encoding the multiple glyphs of the same character (string);
(3) a method for uniformly encoding the multiple glyphs of several characters (strings);
(4) a text digital watermarking technique based on separately encoding the multiple glyphs of several characters (strings);
(5) a text digital watermarking technique based on uniformly encoding the multiple glyphs of several characters (strings).
Method for designing multiple glyphs of the same character (string) to carry digital watermark information:
The main design idea is to change the topological structure of a character (string) by changing the connection and disconnection relations between its strokes, thereby designing multiple outlines for the same, semantically identical character (string).
The glyph design principle, which applies to all of the glyph coding methods of the invention, is as follows. According to the characteristics of each character (string), make small changes to individual strokes inside it so as to change the connection and disconnection relations between strokes and thus the topological structure of the glyph; a relatively small change to a stroke is thereby mapped to a larger change in glyph topology. The connection and disconnection relations between the strokes of a glyph should be unambiguous, and the minimum distance between strokes should exceed the resolution limit that the detection system can distinguish.
The glyph design method should take the following factors into account:
1) Glyph coding
In a practical watermarking system, glyph design should be tied to the specific coding method of the invention that is in use, with the glyph topology designed for that coding method. The general design principle is that a change in glyph topology should produce a different code under the chosen coding method. When designing the multiple glyphs of a character (string), the coding should be fully considered so that suitable glyphs yield as many codes as possible, which provides the capacity the character (string) needs to carry watermark information.
2) Typeface style
Glyph design should also fully consider typeface style. On the one hand, the characters (strings) that carry watermark information in a carrier file are usually required to share the same typeface style, which reduces the visual impact of the watermark. This requires not only that the different glyphs of one character (string) have essentially the same height, width, and typeface style, but also that the glyphs of different characters (strings) have similar height and width and the same typeface style. On the other hand, the characters of the original carrier file may themselves use different typefaces (for example Songti and Kaiti), so for the same character (string) one should also design glyphs that share a topological structure but differ in typeface style. In the invention, once the glyph topology and its coding method are fixed, the detection method for the watermark it carries is fixed as well; designing glyphs with the same topology in different typeface styles therefore gives the design method extensibility without changing the detection method. Fully considering typeface style during glyph design thus addresses both the reduction of visual impact and the extensibility of the glyph design.
In a practical application system, glyph design should weigh the factors above against the application background. The design rules recommended by the invention are:
(1) First choose a specific coding method according to the application, and change the connection and disconnection relations between strokes with that coding method in mind. The resulting changes in glyph topology should correspond to as many different codes as possible, which increases the amount of watermark information the character (string) can carry.
(2) For the glyphs that correspond to the different codes of one character (string), design a group of glyphs with essentially the same height, width, and typeface style. For example, the glyphs for the different codes of a character (string) are all in Songti style with similar height and width, and differ only in the connection and disconnection relations between their strokes. This design is mainly used when the carrier file uses a single typeface style, for example when every character in the carrier file is set in Songti.
(3) For extensibility, the glyphs of one character (string) that share the same code should also be available in different typeface styles. For example, the glyphs corresponding to one code of a character (string) exist in several typeface styles (such as Songti, Kaiti, and Fangsong). This design is mainly used when the carrier file mixes typeface styles, for example when some characters (strings) in the carrier file are in Songti and others in Kaiti, or even when the same character (string) is in Songti in one place and Kaiti in another.
Note in particular that the design method for character glyphs generalizes to the design of string glyphs. The topological structure of a string can be changed by changing the connection and disconnection relations between the characters that make it up, giving the string multiple shapes, and the different shapes of the string are then used to carry watermark information. Compared with the character-based method, the string-based method differs only in the smallest unit that carries watermark information: a single character in the former, a string of several characters in the latter. The glyph design idea and the coding rules of the two methods do not differ in essence; a string can be regarded as one character made up of complex strokes. In a practical text watermarking system that carries the watermark per string, the invention recommends using a complete word (or phrase) made up of several characters as the basic carrying unit, with the string glyph designed for the complete word (or phrase). Languages based on the Latin alphabet (such as English) are well suited to string-based carrying; for languages such as Chinese and Korean, strings in an artistic lettering style can be designed to carry the watermark.
Methods for encoding the multiple glyphs of the same character (string):
This part of the invention comprises 7 typical coding methods and a number of variants.
1) Coding method based on graph structure
This method comprises the following steps:
(1) Map the glyph of the character (string) to a graph, as defined in graph theory, according to an explicit rule.
One concrete implementation maps the endpoints, crossing points, and inflection points of the strokes to the nodes of the graph, and maps the stroke segments that connect those points to the edges of the graph. In this way a glyph is mapped to an undirected graph. The relation between glyphs and graphs is many-to-one: each glyph maps to one graph, while one graph may correspond to several glyphs. A specific spatial ordering rule (for example left to right, top to bottom) can also be added to the undirected graph so that the glyph is mapped to a directed graph.
(2) Glyphs whose graphs are isomorphic have the same code, and the codes of glyphs whose graphs are not isomorphic must not all be the same (that is, there must be at least two different codes).
Among the graphs obtained in step (1) for the different glyphs of one character (string), some may be isomorphic (in the graph-theoretic sense); glyphs with isomorphic graphs are given the same code. Normally, glyphs with non-isomorphic graphs should be given different codes as far as possible, to increase the amount of watermark information the character (string) can carry. However, in a practical watermarking system it is preferable for different characters (strings) to have the same capacity, and it is desirable for the number of codes per character (string) to be a round number (typically a multiple of 2 or a power of 2). For these reasons, several glyphs with non-isomorphic graphs are allowed to share a code. To guarantee that a character (string) can carry at least one binary digit of watermark information, at least two glyphs with non-isomorphic graphs must have different codes; that is, a character (string) that carries watermark information must have at least two distinct code states.
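The rule in step (2) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each glyph has already been reduced to a small undirected graph given as a node count and an edge list, and it decides isomorphism by brute force over node relabelings, which is only practical for the handful of nodes in one glyph. The names `canonical_form` and `assign_codes` are hypothetical.

```python
from itertools import permutations

def canonical_form(num_nodes, edges):
    """Return a label-independent key for a small undirected graph.

    Brute force: try every relabeling of the nodes and keep the
    lexicographically smallest sorted edge list.
    """
    best = None
    for perm in permutations(range(num_nodes)):
        relabeled = tuple(sorted(
            tuple(sorted((perm[a], perm[b]))) for a, b in edges
        ))
        if best is None or relabeled < best:
            best = relabeled
    return (num_nodes, best)

def assign_codes(glyph_graphs):
    """Give glyphs with isomorphic graphs the same code (first-seen order)."""
    codes, seen = [], {}
    for num_nodes, edges in glyph_graphs:
        key = canonical_form(num_nodes, edges)
        codes.append(seen.setdefault(key, len(seen)))
    return codes

# Three hypothetical variants of one character, each with three nodes:
variants = [
    (3, [(0, 1), (1, 2)]),   # two strokes joined at node 1
    (3, [(0, 2), (2, 1)]),   # same shape, nodes numbered differently
    (3, [(0, 1)]),           # one stroke detached: node 2 is isolated
]
print(assign_codes(variants))  # [0, 0, 1]
```

The first two variants are isomorphic and share a code; detaching a stroke changes the topology and yields a second code, which is the minimum needed to carry one binary digit.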
2) Coding method based on the arrangement formed by the components of the graph
This method comprises the following steps:
(1) Map the glyph of the character (string) to a graph, as defined in graph theory, according to an explicit rule.
This step is the same as step (1) of the "coding method based on graph structure".
(2) Take the components of the graph (its maximal connected subgraphs) as elements and arrange them into an ordered arrangement according to a clearly defined spatial order; isomorphic components (maximal connected subgraphs) are defined to be the same element.
The graph of a glyph may contain several independent connected subgraphs. In graph theory, these mutually disconnected maximal connected subgraphs are called the components of the graph. This method arranges the components of the glyph's graph, as elements, in a clearly defined spatial order (for example left to right first, then top to bottom). Since each glyph corresponds to a definite graph, it also corresponds to a definite arrangement of that graph's components. In such an arrangement, isomorphic components are regarded as the same element and non-isomorphic components as different elements.
(3) Glyphs with the same arrangement have the same code, and the codes of glyphs with different arrangements must not all be the same (that is, there must be at least two different codes).
Coding is performed on the arrangement of components obtained in step (2); glyphs with the same arrangement have the same code. For the same reasons given in step (2) of the "coding method based on graph structure", glyphs with different arrangements may have different codes or may share a code, but they must not all share one code (at least two glyphs with different arrangements must have different codes).
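A rough sketch of steps (1) to (3) follows. It assumes each node of the glyph's graph comes with an (x, y) position, orders components by their leftmost and then topmost coordinate, and identifies isomorphic components by the same brute-force relabeling idea as above. The function names and the exact ordering rule are illustrative assumptions; the text only requires that the spatial order be clearly defined.

```python
from itertools import permutations

def components(nodes, edges):
    """Split a graph into connected components (lists of node ids)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            n = stack.pop()
            comp.append(n)
            for m in adj[n] - seen:
                seen.add(m)
                stack.append(m)
        comps.append(comp)
    return comps

def shape_key(comp, edges):
    """Label-independent key for one component (brute force, small graphs)."""
    inner = [(a, b) for a, b in edges if a in comp and b in comp]
    best = None
    for perm in permutations(range(len(comp))):
        relabel = dict(zip(comp, perm))
        key = tuple(sorted(
            tuple(sorted((relabel[a], relabel[b]))) for a, b in inner
        ))
        if best is None or key < best:
            best = key
    return (len(comp), best)

def arrangement(positions, edges):
    """Ordered tuple of component keys: left to right, then top to bottom."""
    comps = components(list(positions), edges)
    comps.sort(key=lambda c: (min(positions[n][0] for n in c),
                              min(positions[n][1] for n in c)))
    return tuple(shape_key(c, edges) for c in comps)

# Hypothetical glyph variants: a dot and a two-segment stroke.
dot_then_stroke = arrangement(
    {0: (0, 0), 1: (2, 0), 2: (3, 0), 3: (3, 1)}, [(1, 2), (2, 3)])
stroke_then_dot = arrangement(
    {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (3, 0)}, [(0, 1), (1, 2)])
print(dot_then_stroke == stroke_then_dot)  # False
```

Both variants contain the same two components, so a coding based only on the unordered set of components could not separate them; the spatial order is what gives them different arrangements and therefore allows different codes.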
3) Coding method based on the number of independent connected regions
This method encodes the number of independent connected regions formed by the strokes of the glyph (that is, the connected regions that are not connected to one another). The number of independent connected regions of a glyph equals the number of components of the graph that corresponds to the glyph.
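As an illustration of what a detector would measure, the sketch below counts independent connected regions in a tiny bitmap. It assumes the glyph is given as a list of equal-length strings with `#` for ink, and it treats diagonally touching pixels as connected (8-connectivity); both choices are assumptions made for the example, not requirements stated in the text.

```python
def count_regions(bitmap, ink="#"):
    """Count 8-connected groups of ink pixels in a list of strings."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != ink or (r, c) in seen:
                continue
            # New region found: flood-fill it so it is counted once.
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == ink
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count

# Hypothetical T-shaped glyph, with the stem joined and with a small gap.
joined = ["#####",
          "..#..",
          "..#.."]
split = ["#####",
         ".....",
         "..#.."]
print(count_regions(joined), count_regions(split))  # 1 2
```

Opening a small gap between two strokes changes the count from 1 to 2, which is the kind of small stroke change with a large topological effect that the design principle calls for.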
The coding rule, for the same reasons given in step (2) of the "coding method based on graph structure", is that glyphs with the same number of independent connected regions have the same code, and the codes of glyphs with different numbers of independent connected regions must not all be the same (that is, there must be at least two different codes).
Both the "coding method based on graph structure" and the "coding method based on the arrangement formed by the components of the graph" require deciding whether different graphs are isomorphic, and graph isomorphism testing has relatively high theoretical computational complexity (the problem belongs to NP). Although the graph of a glyph is usually not very complex, so that processing it directly with existing graph isomorphism algorithms is feasible, the processing is still relatively involved. The present coding method is a simplification of the "coding method based on graph structure".
4) Coding method based on the combination of the number of independent connected regions and the number of independent closed regions
Some characters (strings), particularly in languages such as Chinese and Korean, contain one or more closed regions enclosed by strokes. In addition, the glyph design method of the invention can be used to design stroke-enclosed closed regions for particular characters (strings). This coding method encodes the combination of the number of independent connected regions formed by the strokes of the glyph and the number of independent closed regions enclosed by those strokes.
The coding rule, for the same reasons given in step (2) of the "coding method based on graph structure", is that glyphs with the same combination have the same code, and the codes of glyphs with different combinations must not all be the same (that is, there must be at least two different codes).
Compared with the "coding method based on the number of independent connected regions" alone, this method offers greater flexibility and a larger coding space.
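One way to obtain the combination is sketched below, again on a small `#`/`.` bitmap. Ink regions are counted with 8-connectivity and closed regions are counted as background areas (4-connectivity) that cannot reach the outside of the glyph; this pairing of connectivities and the helper names are assumptions made for the example.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(grid, target, neighbors):
    """Count connected groups of `target` cells under a neighborhood."""
    rows, cols = len(grid), len(grid[0])
    seen, count = set(), 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == target
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count

def topology_pair(bitmap):
    """Return (independent connected regions, independent closed regions)."""
    # Pad with background so everything outside the glyph is one area.
    width = len(bitmap[0]) + 2
    padded = ["." * width] + ["." + row + "." for row in bitmap] + ["." * width]
    regions = count_components(padded, "#", N8)
    holes = count_components(padded, ".", N4) - 1  # minus the outside area
    return regions, holes

closed_box = ["####",
              "#..#",
              "####"]
opened_box = ["####",
              "#...",
              "####"]
print(topology_pair(closed_box), topology_pair(opened_box))  # (1, 1) (1, 0)
```

The two box variants have the same number of connected regions, so method 3 could not tell them apart, but the pair (regions, closed regions) differs, which illustrates the larger coding space.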
5) Coding method based on the sum of the number of independent connected regions and the number of independent closed regions
This method encodes the sum of the number of independent connected regions formed by the strokes of the glyph and the number of independent closed regions enclosed by those strokes.
The coding rule, for the same reasons given in step (2) of the "coding method based on graph structure", is that glyphs for which this sum is the same have the same code, and the codes of glyphs for which the sum differs must not all be the same (that is, there must be at least two different codes).
Compared with the "coding method based on the combination of the number of independent connected regions and the number of independent closed regions", this method is simpler.
6) Coding method based on the remainder of the number of independent connected regions divided by an integer
This method encodes the remainder obtained when the number of independent connected regions formed by the strokes of the glyph is divided by an integer. Glyphs with the same remainder have the same code, and glyphs with different remainders have different codes.
The integer can be chosen flexibly. When it is 2, the method amounts to encoding the parity of the number of independent connected regions: glyphs with an odd number of independent connected regions share one code, glyphs with an even number share another, and glyphs of different parity have different codes. For this method an integer in the range 2 to 4 is suitable; for simplicity and ease of use, the invention recommends 2 or 4.
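The remainder rule can be sketched end to end as below. The variant table, the glyph names, and the region counts in it are invented for illustration, and detection is simulated by carrying the region count alongside each glyph rather than measuring it from an image. Carrier characters left over once the payload is used up are emitted unmarked here, which is a simplification of this sketch.

```python
# Hypothetical variant table: character -> {region count: glyph name}.
VARIANTS = {
    "i": {1: "i_joined", 2: "i_dotted"},
    "j": {1: "j_joined", 2: "j_dotted"},
}

def embed(text, digits, modulus=2):
    """Choose, per carrier character, a glyph whose region count has the
    required remainder. Returns a list of (glyph name, region count)."""
    pending = list(digits)
    glyphs = []
    for ch in text:
        table = VARIANTS.get(ch)
        if table and pending:
            want = pending.pop(0)
            count = next(n for n in table if n % modulus == want)
            glyphs.append((table[count], count))
        else:
            glyphs.append((ch, None))  # not a carrier, or payload exhausted
    return glyphs

def extract(glyphs, modulus=2):
    """Recover the digits from the region counts of the carrier glyphs."""
    return [count % modulus for _, count in glyphs if count is not None]

marked = embed("hi jim", [1, 0, 1])
print([name for name, _ in marked])
# ['h', 'i_joined', ' ', 'j_dotted', 'i_joined', 'm']
print(extract(marked))  # [1, 0, 1]
```

With modulus 2 each carrier character holds one binary digit; the detector only needs the parity of the region count, not the exact glyph.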
7) Coding method based on the remainder of the sum of the number of independent connected regions and the number of independent closed regions divided by an integer
This method encodes the remainder obtained when the sum of the number of independent connected regions formed by the strokes of the glyph and the number of independent closed regions enclosed by those strokes is divided by an integer. Glyphs with the same remainder have the same code, and glyphs with different remainders have different codes.
Compared with the preceding method, which uses only the remainder of the number of independent connected regions, this method offers greater flexibility but is essentially similar. For this method an integer in the range 2 to 8 is suitable; for simplicity and ease of use, the invention recommends 2, 4, or 8.
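The arithmetic of methods 5 and 7 is small enough to show directly. In this sketch the (regions, closed regions) pairs are made-up measurements for five variants of one character, and `sum_code` is a hypothetical helper.

```python
def sum_code(regions, holes, modulus=None):
    """Code from the sum of region and closed-region counts (method 5),
    optionally reduced modulo an integer (method 7)."""
    total = regions + holes
    return total if modulus is None else total % modulus

# Hypothetical (regions, closed regions) for five variants of one character.
variants = [(1, 0), (1, 1), (2, 1), (2, 2), (3, 2)]
print([sum_code(r, h) for r, h in variants])             # [1, 2, 3, 4, 5]
print([sum_code(r, h, modulus=4) for r, h in variants])  # [1, 2, 3, 0, 1]
```

With the integer set to 4 and all four remainders available, each carrier character can hold one base-4 digit, that is, two binary digits; the first and last variants fall into the same code class.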
It needs to be noted that above-mentioned 7 kinds of typical coding methods of the present invention have a variety of distortion.
A kind of mode of distortion is: at the component of " figure " of character (string) font correspondence, independent connected region number, independent closed area number one or multinomial arrangement set (perhaps composite set) as element are encoded.For example, can be according to the order that clearly defines, as an element, the number of independent connected region, independent closed area number are encoded as the composite set that two other element forms respectively with first component in the component of " figure " of character (string) font correspondence.
The mode of another kind of distortion is: at character (string) font independence connected region number, independent closed area number are encoded as the mathematical operation result of parameter.For example, similar aforesaid " based on the coding method of independent connected region number " can be done square the number of independent connected region, cube, ask parity, judge whether to encode for the result of mathematical operations such as prime number.Also can be similar aforesaid " based on independent connected region number and independent closed area number and coding method ", product to independent connected region number and independent closed area number is encoded, perhaps independent connected region number and independent closed area number are carried out mathematical operations such as exponent arithmetic, logarithm operation as parameter, its result is encoded.In addition, can also encode to the permutation and combination set that above-mentioned various mathematical operation results form.The rest may be inferred, can be out of shape a lot of coding methods.
A further variant is to apply several coding methods together to code a character (string) glyph. After coding with one method, the glyphs that share the same code value under that method are coded a second time with another method, and so on, so that several methods are combined in successive coding passes. For example, the "coding method based on the number of independent connected regions" is applied first; the glyphs with the same number of independent connected regions are then coded a second time, for which the "coding method based on the combination set of the number of independent connected regions and the number of independent closed regions" can be used. Character (string) glyphs with the same combination set of the number of independent connected regions and the number of independent closed regions can further be coded a third time, for which the "coding method based on graph structure" can be used. Many coding methods can be derived in this way and selected according to the topological structure of the character (string) glyphs themselves. Coding character (string) glyphs with several methods combined enlarges the capacity of a character (string) to carry digital watermark information.
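The two-pass case can be sketched as follows. This is only an illustrative sketch: the glyph names and their feature values are hypothetical, and numbering the codes in ascending order of the feature is an assumption.

```python
from collections import defaultdict

# Hypothetical glyph features: name -> (connected regions, closed regions).
GLYPHS = {
    "g1": (1, 0),
    "g2": (2, 0),
    "g3": (2, 1),
    "g4": (2, 2),
    "g5": (3, 1),
}


def cascade_codes(glyphs):
    """Assign a (primary, secondary) code to every glyph.

    Primary pass: code by the number of connected regions.
    Secondary pass: among glyphs sharing a primary code, code by the
    (connected, closed) pair, numbered from 0 within each primary group.
    """
    primary_values = sorted({connected for connected, _ in glyphs.values()})
    primary = {value: i for i, value in enumerate(primary_values)}

    groups = defaultdict(set)
    for connected, closed in glyphs.values():
        groups[connected].add((connected, closed))
    secondary = {}
    for pairs in groups.values():
        for i, pair in enumerate(sorted(pairs)):
            secondary[pair] = i

    return {name: (primary[f[0]], secondary[f]) for name, f in glyphs.items()}


codes = cascade_codes(GLYPHS)
primary_states = len({p for p, _ in codes.values()})
combined_states = len(set(codes.values()))
```

In this toy set the primary pass alone distinguishes three states, while the combined code distinguishes all five glyphs, which is the capacity gain the paragraph above refers to.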
Although these variants look different in form from the 7 typical coding methods of the invention, they are in essence simple extensions of those methods.
Methods for unified coding of the multiple glyphs of multiple characters (strings):
Within the set formed by the multiple glyphs of multiple characters (strings) (note: this includes the multiple glyphs of the same character (string)), one of the invention's "methods for coding the multiple glyphs of the same character (string)" is used to code the multiple glyphs of the multiple characters (strings) in a unified manner, so that the rule determining the glyph code values is the same across the characters (strings).
In a practical digital watermarking system, the digital watermark usually has to be carried jointly by multiple characters (strings) of the carrier file, and this method codes their multiple glyphs in a unified manner. It adopts one of the invention's "methods for coding the multiple glyphs of the same character (string)" (such as the 7 typical coding methods described in the preceding section and their variants), applies the same method to the multiple glyphs of all the characters (strings), and keeps the concrete correspondence between glyph shape features and codes the same across different characters (strings). For example, with the "coding method based on graph structure", not only do multiple glyphs of the same character (string) that correspond to isomorphic "graphs" have the same code, but multiple glyphs of different characters (strings) that correspond to isomorphic "graphs" also have the same code. As another example, with the "coding method based on the number of independent connected regions", all character (string) glyphs with the same number of independent connected regions correspond to the same code value, whether they are glyphs of the same character (string) or of different characters (strings).
Text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings):
In this text digital watermarking technique, the digital watermark information is embedded in the multiple glyphs of the characters (strings) of the carrier file, and the codes of the character (string) glyphs represent the digital watermark information. Multiple glyphs are designed for the same character (string) using the invention's method of changing the topological structure of a character (string) by changing the connection/disconnection relations between its strokes. The glyphs are coded with the invention's 7 typical methods for coding the multiple glyphs of the same character (string) and their variants.
An important feature of this technique is that the glyph coding methods used for different characters (strings) of the carrier file can differ: a specific glyph coding method can be chosen for a given character (string) according to its own characteristics. In general, the stroke complexity and topological structure of each character (string) have their own characteristics, and, while a given level of visual quality is maintained, the number of glyphs with different topological structures that can be designed differs from one character (string) to another. The ability of each character (string) to carry digital watermark information through glyph variation therefore differs in practice; using different glyph coding methods for different characters (strings) fully exploits this difference and so increases the capacity of the whole carrier file to carry digital watermark information. As for determining the specific code values, in this technique the correspondence rules between glyphs and code values are independent for different characters (strings): for the multiple glyphs of a given character (string), only the different topological structures of that character (string) are coded, without regard to any influence from the topological structures of the glyphs of other characters (strings), so coding that character (string) is relatively simple.
The watermark detection method of this technique is characterized by the need to know the specific glyph coding method of each character (string) in the carrier file. The detection process should normally first determine the semantic information of each character (string) in the carrier file (that is, which character (string) it is), look up the specific glyph coding method of each character (string) according to that semantic information, and then detect the corresponding glyph shape feature according to that method to determine the code of each character (string) glyph, thereby detecting the digital watermark information. For example, if the coding method for one character (string) is the "coding method based on graph structure", the "graph" structure feature corresponding to its glyph should be detected to determine the glyph code. If the coding method for another character (string) is the "coding method based on the number of connected regions", the number of connected regions contained in its glyph should be detected to determine the glyph code. Combining the glyph codes of all characters in the carrier file yields the digital watermark information carried by the whole carrier file. In this technique, detection rests on first knowing the specific glyph coding method of each character (string); detecting the watermark information carried by a character (string) is therefore tied to recognizing its semantic information.
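The detection order described above (recognize the character, look up its coding method, measure the matching glyph feature, read off the code) can be sketched as follows. This is a minimal sketch under stated assumptions: the characters "A" and "B", the method names, and both code tables are hypothetical and serve only to show the lookup order.

```python
# Hypothetical per-character configuration: which glyph feature each character
# uses, and how feature values map to code bits. None of these tables come
# from the invention; they only illustrate the lookup order.
METHOD_BY_CHAR = {
    "A": "connected_count",
    "B": "connected_closed_pair",
}
CODE_TABLES = {
    "A": {1: "0", 2: "1"},
    "B": {(1, 0): "00", (1, 1): "01", (2, 0): "10", (2, 1): "11"},
}


def glyph_feature(method, connected, closed):
    """Compute the feature that the named coding method looks at."""
    if method == "connected_count":
        return connected
    if method == "connected_closed_pair":
        return (connected, closed)
    raise ValueError(f"unknown coding method: {method}")


def extract_watermark(recognized):
    """recognized: list of (character, connected regions, closed regions).

    The character identity must already be known, because it selects both
    the coding method and the code table.
    """
    bits = []
    for char, connected, closed in recognized:
        feature = glyph_feature(METHOD_BY_CHAR[char], connected, closed)
        bits.append(CODE_TABLES[char][feature])
    return "".join(bits)


print(extract_watermark([("A", 2, 0), ("B", 1, 1), ("A", 1, 0)]))  # prints 1010
```

Note that "B" contributes two bits per glyph while "A" contributes one, reflecting the differing carrying capacity of different characters.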
Text digital watermarking technique based on unified coding of the multiple glyphs of multiple characters (strings):
As in the "text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings)" described above, in this technique the digital watermark information is embedded in the multiple glyphs of the characters (strings) of the carrier file, and the codes of the character (string) glyphs represent the digital watermark information. Multiple glyphs are again designed for the same character (string) using the invention's method of changing the topological structure of a character (string) by changing the connection/disconnection relations between its strokes.
Compared with the "text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings)", the main difference is that the glyphs are coded with the invention's "methods for unified coding of the multiple glyphs of multiple characters (strings)" and their variants.
An important feature of this technique is that the same glyph coding method is used for all the characters (strings) of the carrier file: a single common method codes the multiple glyphs of all of them. The rule determining the glyph code values is likewise unified across characters (strings): for glyphs of different characters (strings), as long as their topological structure features are the same, their code values should be the same. When coding the multiple glyphs of a given character (string), one should consider not only the glyphs of that character (string) with their different topological features but also how the glyphs of other characters (strings) are coded, so that the codes are mutually consistent.
The watermark detection method of this technique is characterized as follows: because all characters (strings) in the carrier file share a single common glyph coding method, the detection process does not need to know the semantic information of each character (string). The coding feature of each character (string) glyph can be detected directly according to the common coding method to determine the code of each glyph in the carrier file, thereby detecting the digital watermark information. For example, if the carrier file uses the "coding method based on graph structure", all characters (strings) in the carrier file use this same method, and the "graph" structure feature corresponding to each character (string) glyph can be detected directly to determine its code. As another example, if the carrier file uses the "method of coding the remainder of the number of independent connected regions divided by an integer" with the integer set to 2, detection is very simple: without knowing the semantic information of any character (string), the number of independent connected regions of each character (string) is counted directly, an odd count being one code (for example 1) and an even count the other (for example 0), which directly determines the code of each character (string) glyph. Combining the glyph codes of all characters in the carrier file yields the digital watermark information carried by the whole carrier file. In this technique, detecting the digital watermark information carried by a character (string) is independent of recognizing its semantic information.
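The parity case (integer 2) can be sketched directly on binarized glyph bitmaps. In this sketch, the use of '#' for stroke pixels, the 8-neighbour connectivity rule, and the two toy bitmaps are assumptions made for the example; the figures of the invention are not reproduced here.

```python
from collections import deque


def count_connected_regions(bitmap):
    """Count 8-connected groups of stroke pixels ('#') in a list of strings."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or seen[r][c]:
                continue
            # New unvisited stroke pixel: flood-fill its whole region.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions


def parity_bit(bitmap):
    """Odd region count -> 1, even region count -> 0 (integer set to 2)."""
    return count_connected_regions(bitmap) % 2


# Two toy glyph bitmaps (hypothetical shapes, not taken from the figures).
ONE_REGION = ["#####", "..#..", "#####"]
TWO_REGIONS = ["#####", ".....", "#####"]

watermark = "".join(
    str(parity_bit(g)) for g in (ONE_REGION, TWO_REGIONS, ONE_REGION)
)
```

No character recognition step appears anywhere in the sketch, which is the point of the unified technique: the bit is read from the glyph's topology alone.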
Compared with existing text digital watermarking techniques, the main features of the invention are:
(1) Detection of the digital watermark information depends only on the topological structure of the character (string) glyph and is independent of the size and inclination angle of the character (string), which makes detection easy. Scaling and rotation of glyphs do not affect watermark detection, and noise resistance is strong. In particular, with the glyph design and coding techniques based on the number of connected regions formed by the strokes and the number of closed regions enclosed by the strokes, detection is very simple and the error rate is low.
(2) The invention alters a character (string) glyph by changing the connection/disconnection relations of its strokes, which does not change the outline size or overall style of the glyph. The visual impact of the watermark information is small, and the embedded digital watermark information is hard to notice.
(3) The glyph design method of the invention is flexible. Once a specific coding rule has been fixed, multiple glyphs with the same code but different font styles can be designed for the same character (string) as needed, without changing the detection method or the associated programs, so extensibility is good.
(4) The "text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings)" allows glyphs to be designed specifically for each character (string) and a specific coding method to be selected for it, so that the capacity of a character (string) to carry watermark information is larger.
(5) The "text digital watermarking technique based on unified coding of the multiple glyphs of multiple characters (strings)" can determine the text digital watermark information carried by each character (string) directly, without detecting the semantic information of each character (string). Detection of the original carrier file content and detection of the digital watermark information are thus independent of each other, which simplifies watermark detection and removes a source of error.
Description of drawings
Fig. 1 shows, by way of example, the invention's method of designing multiple glyphs for the same character by changing the connection/disconnection relations of its strokes, and the correspondence between glyphs and "graphs" in the mathematical discipline of graph theory.
Fig. 2 shows, by way of example, glyph design and coding based on the number of connected regions formed by the strokes of a character and the number of closed regions enclosed by the strokes.
Fig. 3 shows, by way of example, the design and coding of multiple glyphs for the same character string.
Fig. 4 shows, by way of example, a group of characters each with its own glyph coding method.
Fig. 5 shows, by way of example, the principle of embedding and detecting watermark information in the "text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings)".
Fig. 6 shows, by way of example, the principle of embedding and detecting watermark information in the "text digital watermarking technique based on unified coding of the multiple glyphs of multiple characters (strings)".
Fig. 7 is a schematic diagram of the digital watermark extraction process of the "text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings)".
Fig. 8 is a schematic diagram of the digital watermark extraction process of the "text digital watermarking technique based on unified coding of the multiple glyphs of multiple characters (strings)".
Embodiment
Specific embodiments of the invention are described in detail below by way of example and with reference to the accompanying drawings.
Embodiment of the method for designing multiple glyphs for the same character (string):
As shown in Fig. 1, by changing the connection/disconnection relations between the strokes that make up the Chinese character "king" (100), different glyphs (110), (120), (130), (140) of the semantically identical character "king" are obtained. These glyphs have the same height and width and a consistent font style, all being thin surplus body. Compared with the original glyph (100), only a few strokes have been shortened slightly. These slight changes nevertheless alter the connection/disconnection relations between strokes and therefore the topological structure of the glyph; in topological terms, a small change to the strokes produces a large change in the glyph. In general, for characters such as Chinese characters, which have many strokes and relatively complex glyphs, there are many ways to change the connection/disconnection relations between strokes. In Fig. 2, the Chinese character "opens" has 15 different glyphs, and these are only some of the glyphs that can be formed by changing the connection/disconnection relations of its strokes. In practical glyph design, on the one hand the design should be coordinated with the specific glyph coding method of the invention, so that a change of glyph leads to a change of code wherever possible. On the other hand, the visual impact of the glyph change should be kept as small as possible in light of the character's own characteristics; for example, it is preferable to change the connection/disconnection relations between complete strokes rather than to split one complete stroke into separate parts.
As shown in Fig. 4, the multiple glyphs of the same character can be designed within a single font style: the Chinese character "you" has two glyphs (400), (401) in the Song typeface style. The glyphs of different characters can also be designed within a single font style; for example, the glyphs (410), (411) of the Chinese character "good" and the glyphs (430), (431), (432), (433) of the Chinese character "mother" are all in the Song typeface style. In a carrier file that carries digital watermark information and consists of glyphs in a single font style, the visual impact of adding the digital watermark information is very small. In addition, multiple glyphs of the same character with the same topological structure can correspond to different font styles. As shown in Fig. 4, each character has glyphs in two font styles, the Song typeface and lishu; the glyphs (400) and (402) of the Chinese character "you" have the same topological structure but different font styles, (400) being in the Song typeface and (402) in lishu. Such a design suits carrier files that contain characters in different font styles, and it also meets the extensibility requirement of designing further glyphs without changing the glyph coding method or the watermark detection method.
As shown in Fig. 3, the topological structure of the character string "draft" (300) is changed by changing the connection/disconnection relations between the characters that make it up. Multiple glyphs (311)~(314), (321), (322), (331), (332), (340) can be designed for the character string (300), and digital watermark information is carried with these string glyphs as the basic unit. As with character glyph design, appropriate connection/break points should be chosen when changing the connection/disconnection relations between the characters of a string; for the character string "draft" (300), the connection/break points between d and r, r and a, a and f, and f and t shown in the figure are suitable. In addition, the glyphs (311)~(314), (321), (322), (331), (332), (340) of "draft" (300) are all in a single font style (thin surplus body), so the visual impact of substituting one for another is very small. As with character glyphs, the glyphs of different character strings can also be designed within a single font style, and multiple glyphs of the same character string with the same topological structure can correspond to different font styles; this follows directly by analogy with the character glyph design method.
Embodiments of the methods for coding the multiple glyphs of the same character (string):
Note: for brevity, the coding examples below mainly use character glyphs, but this does not affect their applicability to character string glyphs; the character glyph examples extend straightforwardly to the coding of character string glyphs.
1) Embodiment of the coding method based on "graph" structure
As shown in Fig. 1, the different glyphs of "king" (100) are first mapped to "graphs" as defined in the mathematical discipline of graph theory according to a fixed rule: the end points, intersections, and inflection points of the glyph strokes are mapped to the nodes (vertices) of the "graph", and the stroke segments connecting these points are mapped to its edges. Here (110) is mapped to (111), (120) to (121), (130) to (131), and (140) to (141), where (111), (121), (131), (141) are undirected graphs. In addition, if the direction of an edge connecting two nodes is defined by the spatial layout of the glyph as left to right and top to bottom, the glyph can be mapped to a directed graph. For example, the directed graph of glyph (110) is (112), that of glyph (120) is (122), that of glyph (130) is (132), and that of glyph (140) is (142). In this way, the multiple glyphs of the Chinese character "king" (100) are represented as corresponding undirected or directed graphs.
The next step is to code on the basis of the structural properties of the undirected or directed graph corresponding to each glyph. Note that the "graph" structures corresponding to different glyphs of a character may be isomorphic in the graph-theoretic sense, and glyphs that map to isomorphic "graphs" should have the same code. For example, the undirected graphs (121) and (131) to which glyphs (120) and (130) map are isomorphic, so their codes should be the same. However, glyphs whose undirected graphs are isomorphic do not necessarily have isomorphic directed graphs. For example, the directed graphs (122), (132) corresponding to glyphs (120), (130) are not isomorphic. This coding method must therefore first specify whether glyphs correspond to undirected graphs or to directed graphs. If undirected graphs are used throughout, glyph (110) in Fig. 1 can be coded as "0" (ternary); because the undirected graphs (121), (131) corresponding to glyphs (120), (130) are isomorphic, their common code can be "1" (ternary); and (140) can be coded as "2" (ternary). If directed graphs are used throughout, the directed graphs (112), (122), (132), (142) corresponding to glyphs (110), (120), (130), (140) in Fig. 1 all differ in structure and can be coded as "00", "01", "10", "11" (binary) respectively.
In a practical digital watermarking system, in order to round the set of codes to a convenient size and to keep the coding scheme consistent across characters, it can be necessary to give the same code to glyphs whose "graphs" have different structures. For example, in Fig. 1, under the coding method based on undirected graphs, the undirected graphs corresponding to the glyphs of "king" (100) have three states, but practical systems do not usually code in ternary, so the three "graph" structures need to be reduced to two states for binary coding. The glyphs (110), (140), whose "graphs" (111), (141) are not isomorphic, can be given the same code "0", and the glyphs (120), (130), whose "graphs" (121), (131) are isomorphic, the same code "1". However, in order to carry at least one bit of digital watermark information, the glyphs corresponding to non-isomorphic "graphs" must have at least two different codes; that is, glyphs (110) to (140) cannot all have the same code.
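The rule that isomorphic "graphs" receive the same code can be sketched with a brute-force canonical form, which is adequate for the small graphs that a single glyph produces. The sketch below makes several assumptions: the four graphs are hypothetical stand-ins that merely reproduce the isomorphism pattern described for Fig. 1 (they are not the graphs of the figure), codes are numbered in order of first appearance, and the binary reduction table mirrors the example in the text.

```python
from itertools import permutations


def canonical_form(num_nodes, edges):
    """Smallest sorted edge list over all node relabelings.

    Brute force, so only practical for small graphs (roughly up to 8 nodes).
    """
    best = None
    for perm in permutations(range(num_nodes)):
        relabeled = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges)
        if best is None or relabeled < best:
            best = relabeled
    return tuple(best)


def assign_codes(graphs):
    """Give isomorphic graphs the same code, numbering classes in order seen."""
    class_code = {}
    codes = {}
    for name, (n, edges) in graphs.items():
        key = (n, canonical_form(n, edges))
        if key not in class_code:
            class_code[key] = len(class_code)
        codes[name] = class_code[key]
    return codes


# Hypothetical undirected graphs given as (node count, edge list).
GRAPHS = {
    "glyph_a": (4, [(0, 1), (1, 2), (2, 3)]),  # one component
    "glyph_b": (4, [(0, 1), (2, 3)]),          # two components
    "glyph_c": (4, [(0, 2), (1, 3)]),          # same shape as glyph_b, relabeled
    "glyph_d": (4, [(0, 1)]),                  # three components
}
codes = assign_codes(GRAPHS)

# Reduction of three states to two for binary coding, as in the text:
# the first and third classes share "0", the middle class gets "1".
BINARY = {0: "0", 1: "1", 2: "0"}
binary_codes = {name: BINARY[code] for name, code in codes.items()}
```

A production system would more likely use a dedicated graph-isomorphism routine; the permutation search is chosen here only because it is short and self-contained.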
2) Embodiment of the coding method based on the arrangement set formed by the components of the "graph"
First, this method maps the multiple glyphs of a character to undirected or directed graphs as defined in graph theory, in the same way as the "coding method based on graph structure". For example, in Fig. 1 the different glyphs (110), (120), (130), (140) of "king" (100) are mapped to the undirected graphs (111), (121), (131), (141) respectively.
The "graph" corresponding to a glyph may consist of several independent connected subgraphs. In graph theory, each such independent connected subgraph is called a component of the "graph", that is, one of its mutually disconnected maximal connected subgraphs. For example, in Fig. 1, "graph" (111) has a single component (1111); "graph" (121) has two independent connected subgraphs, namely the components (1211), (1212); "graph" (131) also has two components (1311), (1312); and "graph" (141) has three components (1411), (1412), (1413).
Next, the components of the "graph" corresponding to a glyph are arranged, as set elements, in a clearly defined spatial order. Since each glyph maps to a "graph", it thereby also maps to an arrangement (ordered) set of the components of that "graph". Arranging from top to bottom in Fig. 1, glyph (120) corresponds to the arrangement set { (1211), (1212) } formed by the two components of "graph" (121), and glyph (130) corresponds to the arrangement set { (1311), (1312) } formed by the two components of "graph" (131). Component (1211) is clearly isomorphic to (1312), and component (1212) is isomorphic to (1311). Isomorphic components (independent connected subgraphs) are treated as the same set element, so arrangement set (123) and arrangement set (133) consist of the same elements. An arrangement set is ordered, however, and the components of the two sets (123) and (133) appear in different orders, so the sets are different. This method codes the arrangement set, so the codes of the glyphs (120), (130) corresponding to the sets (123), (133) can differ. The coding method based on the undirected graph of a glyph is in effect equivalent to coding the combination (unordered) set formed by the components of the "graph", whereas this method codes the arrangement set formed by the components, which helps increase the capacity of a character to carry digital watermark information.
This method is in essence a simplification of the coding method based on the directed graph of a glyph: judging whether directed graphs are isomorphic is relatively complex, whereas judging whether undirected graphs are isomorphic is relatively simple. By coding the arrangement set of undirected-graph components, this method simplifies the isomorphism check while retaining, to some extent, the directed graph's advantage of having more non-isomorphic states.
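The difference between coding the ordered arrangement of components and coding their unordered combination can be sketched as follows. The labels "X" and "Y" are hypothetical names for the two isomorphism classes involved; only the pairing (1211) with (1312) and (1212) with (1311) is taken from the description of Fig. 1.

```python
from collections import Counter

# Components listed top to bottom, each replaced by the label of its
# isomorphism class. "X" and "Y" are placeholder labels.
COMPONENTS_TOP_TO_BOTTOM = {
    "120": ("X", "Y"),  # components (1211), (1212)
    "130": ("Y", "X"),  # components (1311), (1312)
}


def arrangement_key(components):
    """Order matters: the key is the sequence itself."""
    return tuple(components)


def combination_key(components):
    """Order is ignored: the key is the multiset of component classes."""
    return frozenset(Counter(components).items())


g120 = COMPONENTS_TOP_TO_BOTTOM["120"]
g130 = COMPONENTS_TOP_TO_BOTTOM["130"]

same_under_combination = combination_key(g120) == combination_key(g130)
same_under_arrangement = arrangement_key(g120) == arrangement_key(g130)
```

Under the combination key the two glyphs collapse to one state, while the arrangement key keeps them apart, so the arrangement-based method offers one more codable state for this pair.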
3) Embodiment of the coding method based on the number of independent connected regions
As shown in Fig. 1, glyph (110) has a single independent connected region (1101); glyph (120) has two independent connected regions (1201), (1202); glyph (130) also has two independent connected regions (1301), (1302); and glyph (140) has three independent connected regions (1401), (1402), (1403). This method codes the number of independent connected regions, and glyphs with the same number have the same code. Glyphs (120) and (130) both have 2 independent connected regions, so they should have the same code. The glyphs (110), (140), whose numbers of independent connected regions differ from that of (120) (or (130)), may each be given a code that is different from or the same as the code of (120) (or (130)); but for the character "king" (100) to be able to carry digital watermark information, the codes of these glyphs cannot all be the same.
This method is equivalent to coding the number of components (independent connected subgraphs) of the "graph" corresponding to the glyph. In Fig. 1, the number of independent connected regions of glyph (110) equals the number of components of "graph" (111), namely 1; that of glyph (120) equals the number of components of "graph" (121), namely 2; that of glyph (130) equals the number of components of "graph" (131), namely 2; and that of glyph (140) equals the number of components of "graph" (141), namely 3.
The character string shown in Fig. 3 can be coded with this method in the same way. The English word "draft" (300) has 9 different glyphs: the glyphs (311), (312), (313), (314) in glyph group (310) have 4 independent connected regions; the glyphs (321), (322) in glyph group (320) have 3; the glyphs (331), (332) in glyph group (330) have 2; and glyph (340) has 1. The 9 glyphs of "draft" (300) thus correspond to 4 different numbers of independent connected regions, giving 4 coding states that can be assigned 4 different codes. The coding of character string glyphs is therefore similar to the coding of character glyphs; the only difference is that the strokes forming an independent connected region of a character string may come from different characters.
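As a non-limiting illustration, the sketch below counts the components of a glyph's "graph" with a union-find structure and then codes the nine glyphs of "draft" by their region counts as listed above. The helper names and the ascending numbering of codes are assumptions; the region counts are those stated for Fig. 3.

```python
def count_components(num_nodes, edges):
    """Number of connected components of an undirected graph (union-find)."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(x) for x in range(num_nodes)})


# Independent connected region counts of the nine glyphs of "draft" (Fig. 3).
DRAFT_REGIONS = {
    311: 4, 312: 4, 313: 4, 314: 4,
    321: 3, 322: 3,
    331: 2, 332: 2,
    340: 1,
}


def codes_by_region_count(region_counts):
    """Map each distinct count to a code; ascending numbering is an assumption."""
    distinct = sorted(set(region_counts.values()))
    code_of = {count: i for i, count in enumerate(distinct)}
    return {glyph: code_of[count] for glyph, count in region_counts.items()}


codes = codes_by_region_count(DRAFT_REGIONS)
states = len(set(codes.values()))
whole_bits = states.bit_length() - 1  # floor(log2(states))
```

With 4 coding states, one occurrence of the string can carry 2 whole bits of watermark information.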
4) Embodiment of the coding method based on the combination set of the number of independent connected regions and the number of independent closed regions
As shown in Figure 2, the Chinese character "open" has 15 different glyphs. Glyphs (2001), (2002), (2003), (210), (220) each have 1 independent connected region; glyphs (2301), (2302), (2303), (240), (250) each have 2 independent connected regions; glyphs (2601), (2602), (2603), (270), (280) each have 3 independent connected regions. Meanwhile, in every glyph of glyph groups (200), (230), (260) the number of independent closed regions enclosed by strokes is 0; in glyphs (210), (240), (270) it is 1; in glyphs (220), (250), (280) it is 2. The number of independent connected regions and the number of independent closed regions of a glyph together form a combination set. For example, the combination set of glyph (210) is {independent connected regions: 1, independent closed regions: 1}, and that of glyph (280) is {independent connected regions: 3, independent closed regions: 2}. Thus the 15 glyphs in Fig. 2 correspond to 9 different combination sets, giving 9 coding states that can be encoded as 9 different codes.
It should be noted that the three glyphs (2001), (2002), (2003) in glyph group (200) have different topological structures (their corresponding "graphs" are not isomorphic), yet they correspond to the same combination set of independent connected regions and independent closed regions, so under this coding method they must be given the same code. Likewise, glyphs (2301), (2302), (2303) must share one code, and glyphs (2601), (2602), (2603) must share one code. The same situation occurs in Fig. 4, where the glyphs of the character "mother" are coded with this method: glyphs (431) and (435) both have 2 independent connected regions and 1 independent closed region and both have the code value "01", although their topological structures differ. Glyphs (433) and (437) are a similar case.
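The combination-set coding of Fig. 2 can be sketched as a simple lookup. The feature pairs below are the values listed in the text; the integer numbering of the codes and the function name combination_codes are assumptions for illustration, since the text does not assign concrete code values to the Fig. 2 glyphs.

```python
# (independent connected regions, independent closed regions) per Fig. 2 glyph,
# as listed in the text.
FIG2 = {
    "2001": (1, 0), "2002": (1, 0), "2003": (1, 0), "210": (1, 1), "220": (1, 2),
    "2301": (2, 0), "2302": (2, 0), "2303": (2, 0), "240": (2, 1), "250": (2, 2),
    "2601": (3, 0), "2602": (3, 0), "2603": (3, 0), "270": (3, 1), "280": (3, 2),
}

def combination_codes(features):
    """Give every distinct (regions, closed regions) pair its own integer code.

    The numbering (sorted order) is a hypothetical choice, not from the source.
    """
    states = sorted(set(features.values()))
    code_of = {state: index for index, state in enumerate(states)}
    return {glyph: code_of[state] for glyph, state in features.items()}

codes = combination_codes(FIG2)
print(len(set(codes.values())))   # number of distinct coding states
```

Glyphs (2001), (2002), (2003) end up with one shared code here, which matches the remark that topologically different glyphs with the same combination set are coded identically.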
5) Embodiment of the coding method based on the sum of the number of independent connected regions and the number of independent closed regions:
As shown in Figure 2, among the 15 different glyphs of the Chinese character "open", the sum of the number of independent connected regions and the number of independent closed regions is 1 for glyphs (2001), (2002), (2003); 2 for glyphs (210), (2301), (2302), (2303); 3 for glyphs (220), (240), (2601), (2602), (2603); 4 for glyphs (250), (270); and 5 for glyph (280). Thus the 15 glyphs in Fig. 2 correspond to 5 different sums, giving 5 coding states that can be encoded as 5 different codes.
6) Embodiment of the coding method based on the remainder of the number of independent connected regions after division by an integer
Suppose this method is used to code the 4 different glyphs (400)~(403) of the character "you" in Fig. 4, and the integer is taken as 2, which is equivalent to coding the parity of the number of independent connected regions in a glyph. Suppose further that a glyph with an odd number of independent connected regions is coded "1" and a glyph with an even number is coded "0". The result is as follows:
As shown in Figure 4, glyphs (400), (402) have 4 independent connected regions; the remainder after division by 2 is 0, i.e. the number is even, so they are coded "0". Glyphs (401), (403) have 5 independent connected regions; the remainder after division by 2 is 1, i.e. the number is odd, so they are coded "1". Thus, coded by this method, the 4 glyphs (400)~(403) of the character "you" correspond to 2 coding states and are encoded as 2 different codes.
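The remainder rule for the character "you" reduces to one modulo operation, as the short sketch below shows. The region counts are the Fig. 4 values quoted above; the dictionary layout and the function name remainder_code are illustrative assumptions.

```python
# Independent connected region counts for the glyphs of "you" in Fig. 4 (from the text).
YOU_REGIONS = {"400": 4, "401": 5, "402": 4, "403": 5}

def remainder_code(region_count, modulus=2):
    """Code a glyph by the remainder of its region count; modulus 2 codes parity."""
    return str(region_count % modulus)

codes = {glyph: remainder_code(count) for glyph, count in YOU_REGIONS.items()}
print(codes)
```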
7) Embodiment of the coding method based on the remainder, after division by an integer, of the sum of the number of independent connected regions and the number of independent closed regions
Suppose this method is used to code the 15 different glyphs (200)~(280) of the character "open" in Fig. 2, and the integer is taken as 4. A glyph whose sum of independent connected regions and independent closed regions leaves remainder 0 after division by 4 is coded "00", remainder 1 is coded "01", remainder 2 is coded "10", and remainder 3 is coded "11". The result is as follows:
As shown in Figure 2, for glyphs (2001), (2002), (2003) the sum is 1, the remainder after division by 4 is 1, and the glyph code is "01". For glyphs (210), (2301), (2302), (2303) the sum is 2, the remainder is 2, and the glyph code is "10". For glyphs (220), (240), (2601), (2602), (2603) the sum is 3, the remainder is 3, and the glyph code is "11". For glyphs (250), (270) the sum is 4, the remainder is 0, and the glyph code is "00". For glyph (280) the sum is 5, the remainder is 1, and the glyph code is "01", the same as the code of glyph group (200). Thus, coded by this method, the 15 glyphs in Fig. 2 correspond to 4 coding states and are encoded as 4 different codes.
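The sum-based coding of embodiment 5) and the sum-remainder coding of embodiment 7) can be checked together with a short sketch. The feature pairs are the Fig. 2 values given in the text and the two-bit strings follow the remainder-to-code assignment stated above; the function names are assumptions made for the example.

```python
# (independent connected regions, independent closed regions) per Fig. 2 glyph.
FIG2 = {
    "2001": (1, 0), "2002": (1, 0), "2003": (1, 0), "210": (1, 1), "220": (1, 2),
    "2301": (2, 0), "2302": (2, 0), "2303": (2, 0), "240": (2, 1), "250": (2, 2),
    "2601": (3, 0), "2602": (3, 0), "2603": (3, 0), "270": (3, 1), "280": (3, 2),
}

def sum_state(regions, closed):
    """Coding state of embodiment 5): the plain sum of the two counts."""
    return regions + closed

def sum_remainder_code(regions, closed, modulus=4):
    """Two-bit code of embodiment 7): (regions + closed) mod 4, e.g. remainder 1 -> '01'."""
    return format((regions + closed) % modulus, "02b")

sums = {glyph: sum_state(*pair) for glyph, pair in FIG2.items()}
codes = {glyph: sum_remainder_code(*pair) for glyph, pair in FIG2.items()}
print(len(set(sums.values())), len(set(codes.values())))
```

Glyph (280) has sum 5 and therefore wraps around to the same code as glyph group (200), which is why five sums collapse into four codes.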
Embodiment of coding by combining multiple coding methods:
For example, the 15 glyphs of the character "open" shown in Figure 2 are coded by combining multiple coding methods.
First, the "coding method based on the combination set of the number of independent connected regions and the number of independent closed regions" is applied. As described above, the 15 glyphs have 9 coding states and can be encoded as 9 different codes. The three glyphs (2001), (2002), (2003) in glyph group (200) share one code; the three glyphs (2301), (2302), (2303) in glyph group (230) share one code; and the three glyphs (2601), (2602), (2603) in glyph group (260) share one code.
Then the "coding method based on graph structure" is applied to the glyphs in glyph groups (200), (230), (260) as a second round of coding. If the coding based on undirected graphs is used, glyphs (2001), (2002), (2003) of glyph group (200) correspond to three structurally different "graphs"; glyphs (2301), (2302), (2303) of glyph group (230) correspond to two structurally different "graphs", because the "graphs" of glyphs (2302) and (2303) are isomorphic; and glyphs (2601), (2602), (2603) of glyph group (260) also correspond to three structurally different "graphs". After the two rounds of coding, the 15 glyphs in Fig. 2 have 14 different states and can be encoded as 14 different codes.
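The count of 14 states can be reproduced with a small sketch that pairs the first-round combination set with a second-round graph class. The feature pairs come from the text. The labels "a", "b", "c" are hypothetical placeholders for isomorphism classes, since the text only states which graphs are isomorphic and not what they look like.

```python
# (independent connected regions, independent closed regions) per Fig. 2 glyph.
FIG2 = {
    "2001": (1, 0), "2002": (1, 0), "2003": (1, 0), "210": (1, 1), "220": (1, 2),
    "2301": (2, 0), "2302": (2, 0), "2303": (2, 0), "240": (2, 1), "250": (2, 2),
    "2601": (3, 0), "2602": (3, 0), "2603": (3, 0), "270": (3, 1), "280": (3, 2),
}

# Hypothetical isomorphism-class labels for the glyphs that get a second round.
GRAPH_CLASS = {
    "2001": "a", "2002": "b", "2003": "c",
    "2301": "a", "2302": "b", "2303": "b",   # (2302) and (2303) are isomorphic
    "2601": "a", "2602": "b", "2603": "c",
}

def two_stage_state(glyph):
    """Combine the first-round combination set with the second-round graph class."""
    regions, closed = FIG2[glyph]
    return (regions, closed, GRAPH_CLASS.get(glyph))  # None: no second round needed

states = {glyph: two_stage_state(glyph) for glyph in FIG2}
print(len(set(states.values())))
```

Only glyphs (2302) and (2303) remain indistinguishable after both rounds, so 15 glyphs yield 14 states.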
Embodiment of the method of unified coding of the multiple glyphs of multiple characters (strings):
The characteristic of this coding method is that, within the set formed by the multiple glyphs of multiple characters (strings) (note: this includes the multiple glyphs of one and the same character (string)), a single method is used to code the glyphs of all the characters (strings) uniformly, so that the rule determining the glyph code values is the same for all of the characters (strings).
The specific coding method can be obtained by taking the "several methods of coding the multiple glyphs of the same character (string)" of the present invention and extending their rules, in accordance with the unified coding requirement, to the multiple glyphs of multiple characters. For example, the "coding method based on the number of independent connected regions" is used to code the glyphs of the character set "good", "!" in Fig. 4: glyphs (410), (412) of the character "good" and glyphs (440), (442) of the character "!" have 2 independent connected regions and the code value "0"; glyphs (411), (413) of the character "good" and glyphs (441), (443) of the character "!" have 1 independent connected region and the code value "1". The mapping rule between glyphs and code values is the same for every character in the character set "good", "!", so the requirement of this coding method is satisfied.
Conversely, if the "coding method based on the number of independent connected regions" is applied to the glyphs of the character set "you", "good", "!" in Fig. 4, the mapping rule between glyphs and code values shown in Fig. 4 does not satisfy the requirement of this coding method. This is because glyphs (400), (402) of the character "you", whose code value is "0", have 4 independent connected regions, whereas glyphs (410), (412) of the character "good" and glyphs (440), (442) of the character "!", whose code value is also "0", have 2 independent connected regions; the mapping rule between glyphs and code values for the character "you" is therefore inconsistent with that for the characters "good" and "!". Likewise, for glyphs (401), (403), (411), (413), (441), (443), whose code value is "1", the mapping rule for the character "you" is inconsistent with that for the character "good" (or "!"). Hence, under the "coding method based on the number of independent connected regions", the way the glyph code values of the character set "you", "good", "!" are determined in Fig. 4 does not satisfy the requirement of this coding method.
As a further example, if the "coding method based on the remainder of the number of independent connected regions after division by an integer" is applied to the glyphs of the same character set "you", "good", "!" with the integer taken as 2, the mapping rule between glyphs and code values shown in Fig. 4 satisfies the requirement of this coding method. Taking the integer as 2 is equivalent to coding the parity of the number of independent connected regions; suppose again that a glyph with an odd number of independent connected regions is coded "1" and a glyph with an even number is coded "0". As shown in Figure 4, glyphs (400), (402) have 4 independent connected regions, glyphs (410), (412) have 2, and glyphs (440), (442) also have 2. For all of these glyphs the remainder after division by 2 is 0, i.e. the number of independent connected regions is even, so they share the code "0". Glyphs (401), (403) have 5 independent connected regions, glyphs (411), (413) have 1, and glyphs (441), (443) also have 1. For all of these glyphs the remainder after division by 2 is 1, i.e. the number of independent connected regions is odd, so they share the code "1". In this example it should be noted in particular that the mapping rule between glyphs and code values is the same for all characters: the mapping between remainder and glyph code value does not change from one character to another. Hence, under the "coding method based on the remainder of the number of independent connected regions after division by an integer", the way the glyph code values of the character set "you", "good", "!" are determined in Fig. 4 satisfies the requirement of this coding method.
In summary, for the same character set "you", "good", "!" with the glyphs shown in Figure 4, the set cannot be uniformly coded with the "coding method based on the number of independent connected regions", but it can be uniformly coded with the "coding method based on the remainder of the number of independent connected regions after division by 2".
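The two outcomes above can be reproduced with a small consistency check. The sketch treats a rule as unified when, across all characters, each feature value maps to exactly one code value and each code value comes from exactly one feature value. That criterion is one reading of the text, not a definition given by the patent, and the names FIG4 and is_unified are hypothetical. The region counts and code values are those quoted for Fig. 4.

```python
# glyph -> (independent connected regions, code value), from the Fig. 4 values in the text.
FIG4 = {
    "you":  {"400": (4, "0"), "401": (5, "1"), "402": (4, "0"), "403": (5, "1")},
    "good": {"410": (2, "0"), "411": (1, "1"), "412": (2, "0"), "413": (1, "1")},
    "!":    {"440": (2, "0"), "441": (1, "1"), "442": (2, "0"), "443": (1, "1")},
}

def is_unified(tables, feature):
    """True if one feature <-> code mapping holds for every glyph of every character."""
    feature_to_code, code_to_feature = {}, {}
    for glyphs in tables.values():
        for count, code in glyphs.values():
            value = feature(count)
            if feature_to_code.setdefault(value, code) != code:
                return False          # same feature value, different codes
            if code_to_feature.setdefault(code, value) != value:
                return False          # same code, different feature values
    return True

by_count = lambda count: count        # coding by the region count itself
by_parity = lambda count: count % 2   # coding by the remainder after division by 2

subset = {name: FIG4[name] for name in ("good", "!")}
print(is_unified(subset, by_count), is_unified(FIG4, by_count), is_unified(FIG4, by_parity))
```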
Embodiment of the text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings):
Fig. 5 shows, by way of example, the principle of watermark loading and detection in this digital watermarking technique.
As shown in Figure 5, the flow from block diagrams (500), (501) to (510), then to block diagrams (520), (530), and finally to block diagrams (521), (531) is the watermark loading flow. It loads the digital watermark information "0101100" (500) into the text "hello, mother!" (501). First, according to the semantic information of each character in the text "hello, mother!" (501), the table shown in Figure 4 is consulted to determine the length (number of bits) of watermark information that each character carries: the characters "you", "good", "!" each carry one bit; the character "mother" carries two bits; the character "," cannot carry watermark information. The watermark information is then split by the method shown in (510). As shown in block diagram (510), the watermark information for the character "you" is "0"; for the character "good" it is "1"; the character "," corresponds to no watermark information (because, as shown in Figure 4, "," cannot carry watermark information); for the first character "mother" it is "01"; for the second character "mother" it is "10"; and for the character "!" it is "0". Next, the table in Fig. 4 is consulted again to find, for each character, the glyphs whose code equals the watermark information assigned to that character. As shown in Figure 4, among the glyphs of the character "you", (400), (402) are coded 0 and correspond to the watermark information "0"; among the glyphs of the character "good", (411), (413) are coded 1 and correspond to "1"; among the glyphs of the character "mother", (431), (435) are coded 01 and correspond to "01", and (432), (436) are coded 10 and correspond to "10"; among the glyphs of the character "!", (440), (442) are coded 0 and correspond to "0". The resulting correspondence between glyphs and watermark information is shown in block diagrams (520), (530). In Fig. 5, each character has two glyphs whose code equals its watermark information, and these two glyphs belong to two different typefaces: Song and Lishu. Finally, the glyphs of the same typeface are combined, yielding the text strings (521), (531) that carry the watermark information "0101100" (500); string (521) is set in Lishu and string (531) in Song. Because the typeface is uniform among the characters within each of strings (521), (531), loading the digital watermark information "0101100" (500) has very little visual impact on the reader.
Fig. 7 shows the watermark detection process of this technique. The carrier file (700) carrying digital watermark information passes through the character semantic recognition system (710), which recovers the original carrier electronic file (720). Meanwhile, the character glyph recognition system (730) uses the recognition result of the character semantic recognition system (710) to identify the glyph codes in the carrier file (700); once the code of each character glyph has been identified, the codes of the characters in the carrier file are combined to obtain the digital watermark information (740). The character glyph recognition system (730) needs the result of the character semantic recognition system (710) because, in this text digital watermarking technique, each character in the carrier file may use a different coding method, so watermark detection must first establish the specific glyph coding method of each character in the carrier file. Each character in the carrier file is therefore semantically recognized, the coding table shown in Figure 4 is looked up by the semantic information of the character to determine its specific glyph coding method, the glyph features corresponding to that coding method are detected, and the code of each character glyph is then determined. It should be noted that detecting the glyph features used by the coding methods of the present invention relies on existing mature techniques. For example, judging the isomorphism of the "graphs" to which character glyphs are mapped, and computing the number of independent connected regions and the number of independent closed regions of a character glyph, can all be done with existing mature techniques, which are not within the scope of the present invention.
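As one illustration of such an existing technique, the number of independent closed regions of a glyph can be counted by flood-filling the background. The sketch below is only an example under stated assumptions: the glyph is a small binary bitmap (a list of equal-length strings in which "#" is ink and "." is background), and background pixels are joined only through their four edge neighbours. The bitmap format, the function name count_closed_regions and the toy shapes are hypothetical.

```python
def count_closed_regions(bitmap):
    """Count background areas fully enclosed by ink ('#') in a list of equal-length strings."""
    height, width = len(bitmap), len(bitmap[0])
    # Pad with one ring of background so everything outside the glyph is one region.
    grid = ["." * (width + 2)] + ["." + row + "." for row in bitmap] + ["." * (width + 2)]
    seen = set()

    def flood(start):
        """Mark every background pixel 4-connected to start as seen."""
        seen.add(start)
        stack = [start]
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < height + 2 and 0 <= nx < width + 2
                        and grid[ny][nx] == "." and (ny, nx) not in seen):
                    seen.add((ny, nx))
                    stack.append((ny, nx))

    flood((0, 0))                     # the outside background is not a closed region
    closed = 0
    for y in range(height + 2):
        for x in range(width + 2):
            if grid[y][x] == "." and (y, x) not in seen:
                closed += 1           # background not reachable from outside
                flood((y, x))
    return closed

# Hypothetical toy shapes: a closed box, a box opened at the bottom, two boxes.
BOX = ["###", "#.#", "###"]
OPEN_BOX = ["###", "#.#", "#.#"]
TWO_BOXES = ["#####", "#.#.#", "#####"]

print(count_closed_regions(BOX), count_closed_regions(OPEN_BOX), count_closed_regions(TWO_BOXES))
```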
For example, to detect the digital watermark information from character string (521) or (531) in Figure 5, the detection system must know the specific glyph coding method of each character in the string. It therefore first recognizes the semantic information of each character and from that obtains the coding method of each character. In this example the characters "you", "good", "!" use the "coding method based on the number of independent connected regions", the character "mother" uses the "coding method based on the combination set of the number of independent connected regions and the number of independent closed regions", and the character "," carries no watermark information. The detection system then detects, according to the specific glyph coding method of each character, the glyph features that the coding method relies on: the number of independent connected regions for the glyphs of the characters "you", "good", "!", and the combination set of the number of independent connected regions and the number of independent closed regions for the glyphs of the character "mother". The detected glyph features give the corresponding glyph codes; the correspondence between each character glyph and its code in strings (521), (531) is shown in block diagrams (520), (530). Combining the codes of the character glyphs gives the digital watermark information carried by strings (521), (531), namely "0101100" (500).
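The loading and detection flow of Figs. 5 and 7 can be sketched as a round trip over a lookup table. This is a simplified illustration, not the patented implementation: glyph reference numerals stand in for rendered glyphs, detection looks the numeral up instead of measuring image features, and the text is written as a list of the character glosses used above. Only code values spelled out in the text are included in the table. The variant argument, which picks one of the two matching glyphs, is hypothetical because the text does not say which numeral belongs to which typeface.

```python
# code value -> glyphs, per character, from the Fig. 4 values quoted in the text.
# "," has no entry because it cannot carry watermark information.
FIG4 = {
    "you":    {"0": ["400", "402"], "1": ["401", "403"]},
    "good":   {"0": ["410", "412"], "1": ["411", "413"]},
    "mother": {"01": ["431", "435"], "10": ["432", "436"]},
    "!":      {"0": ["440", "442"], "1": ["441", "443"]},
}
TEXT = ["you", "good", ",", "mother", "mother", "!"]   # glosses of "hello, mother!"

def embed(text, bits, table, variant=0):
    """Split bits by each character's capacity and pick a glyph whose code matches."""
    marked, pos = [], 0
    for char in text:
        codes = table.get(char)
        if codes is None:                      # character carries no watermark
            marked.append((char, None))
            continue
        width = len(next(iter(codes)))         # bits carried by this character
        chunk = bits[pos:pos + width]
        pos += width
        marked.append((char, codes[chunk][variant]))
    if pos != len(bits):
        raise ValueError("watermark length does not match text capacity")
    return marked

def detect(marked, table):
    """Recover bits; the character identity selects which coding table applies."""
    bits = []
    for char, glyph in marked:
        if glyph is None:
            continue
        bits.append(next(code for code, glyphs in table[char].items() if glyph in glyphs))
    return "".join(bits)

marked = embed(TEXT, "0101100", FIG4)
print(marked)
print(detect(marked, FIG4))
```

Detection here needs the character identity before it can interpret a glyph, which mirrors the dependence of the glyph recognition system (730) on the semantic recognition system (710).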
Embodiment of the text digital watermarking technique based on unified coding of the multiple glyphs of multiple characters (strings):
Fig. 6 shows, by way of example, the principle of watermark loading and detection in this digital watermarking technique.
As shown in Figure 6, the flow from block diagrams (600), (601) to (610), then to block diagrams (620), (630), and finally to block diagrams (621), (631) is the watermark loading flow. It loads the digital watermark information "010" (600) into the text "hello!" (601). As shown in Figure 4, the characters of the text string "hello!" (601) all use the same coding method (the "coding method based on the remainder of the number of independent connected regions after division by 2"), and each character carries the same length (number of bits) of watermark information, so no special splitting of the watermark information is needed; the watermark bits are simply assigned to the characters in order and in equal lengths. As shown in block diagram (610), each character in sequence corresponds to one bit of the watermark information "010" (600). The table in Fig. 4 is then consulted to find, for each character, the glyphs whose code equals the watermark information assigned to that character. As shown in Figure 4, among the glyphs of the character "you", (400), (402) are coded 0 and correspond to the watermark information "0"; among the glyphs of the character "good", (411), (413) are coded 1 and correspond to "1"; among the glyphs of the character "!", (440), (442) are coded 0 and correspond to "0". The resulting correspondence between glyphs and watermark information is shown in block diagrams (620), (630). In Fig. 6, each character has two glyphs whose code equals its watermark information, and these two glyphs belong to two different typefaces: Song and Lishu. Finally, the glyphs of the same typeface are combined, yielding the text strings (621), (631) that carry the watermark information "010" (600); string (621) is set in Lishu and string (631) in Song. Because the typeface is uniform among the characters within each of strings (621), (631), loading the digital watermark information "010" (600) has very little visual impact on the reader.
Fig. 8 shows the watermark detection process of this technique. The carrier file (800) carrying digital watermark information passes through the character semantic recognition system (810), which recovers the original carrier electronic file (820). Meanwhile, the character glyph recognition system (830) identifies the glyph codes in the carrier file (800). Because all characters in the carrier file (800) share a common glyph coding method, the glyph features corresponding to that coding method can be detected directly for each character and the code of each character glyph determined from them; combining the codes of the characters in the carrier file yields the digital watermark information (840). The character semantic recognition process and the glyph code recognition process are independent of each other and do not interact directly.
For example, to detect the digital watermark information from character string (621) or (631) in Figure 6, the character glyph recognition system (830) performs glyph recognition according to the unified glyph coding method. In this example the characters "you", "good", "!" all use the "coding method based on the remainder of the number of independent connected regions after division by 2" (equivalent to coding the parity of the number of independent connected regions), so the character glyph recognition system (830) can directly determine the parity of the number of independent connected regions of each character glyph in string (621) or (631): an odd number is coded "1" and an even number is coded "0". This rule is the same for every character, so the code of each character glyph can be detected directly without knowing the semantic information of the character. The correspondence between each character glyph and its code in strings (621), (631) is shown in block diagrams (620), (630). Combining the codes of the character glyphs gives the digital watermark information carried by strings (621), (631), namely "010" (600).
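The unified variant can be sketched in the same simplified style. The region counts are the Fig. 4 values quoted earlier; the table layout and the function names are assumptions, and choosing the first matching glyph is an arbitrary illustrative choice. The point of the sketch is that detection takes only the measured region counts and never consults the character identity.

```python
# glyph -> number of independent connected regions, per character (Fig. 4 values).
REGIONS = {
    "you":  {"400": 4, "401": 5, "402": 4, "403": 5},
    "good": {"410": 2, "411": 1, "412": 2, "413": 1},
    "!":    {"440": 2, "441": 1, "442": 2, "443": 1},
}

def embed_unified(text, bits):
    """Give each character one bit and pick a glyph whose region-count parity equals it."""
    if len(text) != len(bits):
        raise ValueError("one watermark bit is needed per character")
    chosen = []
    for char, bit in zip(text, bits):
        glyph = next(g for g, count in REGIONS[char].items() if count % 2 == int(bit))
        chosen.append(glyph)
    return chosen

def detect_unified(region_counts):
    """Odd count -> '1', even count -> '0'; no knowledge of the characters is needed."""
    return "".join(str(count % 2) for count in region_counts)

text = ["you", "good", "!"]                       # glosses of "hello!"
glyphs = embed_unified(text, "010")
# Stand-in for measuring the rendered glyphs: look the counts up by reference numeral.
measured = [REGIONS[char][glyph] for char, glyph in zip(text, glyphs)]
print(glyphs, measured, detect_unified(measured))
```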
Compared with the "text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings)", the essential difference of this digital watermarking technique is that here every character in the carrier file carrying digital watermark information must use a common glyph coding method, whereas in the "text digital watermarking technique based on separately coding the multiple glyphs of multiple characters (strings)" each character in the carrier file may use a different glyph coding method.

Claims (14)

1. A method of designing multiple glyphs for the same character (string) for carrying digital watermark information, characterized in that: the topological structure of the character (string) is changed by changing the connection and disconnection relations between the strokes that make up the character (string), thereby obtaining multiple shapes of the same character (string) that are semantically identical.
2. A method of coding the multiple glyphs of the same character (string), comprising the steps of:
(1) mapping each character (string) glyph, according to explicit rules, to a "graph" as defined in the mathematical discipline of graph theory;
(2) giving glyphs whose "graphs" are isomorphic the same code, while the codes of glyphs whose "graphs" are not isomorphic must not all be identical (i.e. there are at least two different codes).
3. A method of coding the multiple glyphs of the same character (string), comprising the steps of:
(1) mapping each character (string) glyph, according to explicit rules, to a "graph" as defined in the mathematical discipline of graph theory;
(2) taking the components of the "graph" (each maximal connected subgraph) as elements and forming an arrangement set according to a clearly defined spatial order, wherein isomorphic components (maximal connected subgraphs) are treated as identical set elements;
(3) giving glyphs with identical arrangement sets the same code, while the codes of glyphs with different arrangement sets must not all be identical (i.e. there are at least two different codes).
4. A method of coding the multiple glyphs of the same character (string), characterized in that:
Coding is based on the number of independent connected regions formed by the strokes of the character (string); glyphs with the same number of independent connected regions have the same code, and the codes of glyphs with different numbers of independent connected regions must not all be identical (i.e. there are at least two different codes).
5. A method of coding the multiple glyphs of the same character (string), characterized in that:
Coding is based on the combination set of the number of independent connected regions formed by the strokes of the character (string) and the number of independent closed regions enclosed by the strokes of the character (string); glyphs with the same combination set have the same code, and the codes of glyphs with different combination sets must not all be identical (i.e. there are at least two different codes).
6. A method of coding the multiple glyphs of the same character (string), characterized in that:
Coding is based on the sum of the number of independent connected regions formed by the strokes of the character (string) and the number of independent closed regions enclosed by the strokes of the character (string); glyphs with the same sum have the same code, and the codes of glyphs with different sums must not all be identical (i.e. there are at least two different codes).
7. A method of coding the multiple glyphs of the same character (string), characterized in that:
Coding is based on the remainder of the number of independent connected regions formed by the strokes of the character (string) after division by an integer; glyphs with the same remainder have the same code, and glyphs with different remainders have different codes.
8. A method of coding the multiple glyphs of the same character (string), characterized in that:
Coding is based on the remainder, after division by an integer, of the sum of the number of independent connected regions formed by the strokes of the character (string) and the number of independent closed regions enclosed by the strokes of the character (string); glyphs with the same remainder have the same code, and glyphs with different remainders have different codes.
9. A method of coding the multiple glyphs of the same character (string), characterized in that:
The coding methods of claims 2~8 are used in combination to code the multiple glyphs of the same character (string).
10. A method of coding the multiple glyphs of multiple characters (strings), characterized in that:
Within the set formed by the multiple glyphs of multiple characters (strings) (note: this includes the multiple glyphs of one and the same character (string)), one of the methods of claims 2~9 is used to code the glyphs of all the characters (strings) uniformly, so that the rule determining the glyph code values is the same for all of the characters (strings).
11. A method of coding the multiple glyphs of a character (string), characterized in that:
Coding is based on an arrangement set (or combination set) whose elements are one or more of: the components of the "graph" corresponding to the character (string) glyph, the number of independent connected regions, and the number of independent closed regions.
12. A method of coding the multiple glyphs of a character (string), characterized in that:
Coding is based on the result of a mathematical operation that takes the number of independent connected regions and the number of independent closed regions of the character (string) glyph as parameters.
13. a text digital water mark technology is characterized in that:
(1) adopt suitable character (string) font of the method for claim 1 design, and with respectively the multiple font of a plurality of characters of carrier file (string) being encoded as the described method of claim 2~9.Digital watermark information is embedded in the multiple font of each character of carrier file (string), and the coding of character (string) font is used for representing digital watermark information.
(2) coding method of determining in (1) at each character (string) in the carrier file separately detects the font style characteristic of each character (string) in the carrier file respectively, with the coding of definite each character of carrier file (string) font, thereby detects digital watermark information.
14. A text digital watermarking technique, characterized in that:
(1) Suitable character (string) fonts are designed using the method of claim 1, and the multiple fonts of a plurality of characters (strings) in the carrier file are encoded uniformly using the method of claim 10. The digital watermark information is embedded in the multiple fonts of each character (string) of the carrier file, and the codes of the character (string) fonts are used to represent the digital watermark information.
(2) Using the common coding method determined in (1) for all characters (strings) in the carrier file, the font shape features of each character (string) in the carrier file are detected in a uniform manner to determine the code of each character (string) font in the carrier file, thereby detecting the digital watermark information.
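The counting, encoding, embedding and detection steps described in claims 11 to 14 can be illustrated with a small sketch. It is not the patent's implementation. It assumes glyphs are given as tiny bitmaps ('#' for ink), that "independent connected regions" are 8-connected ink regions, that "independent closed regions" are background holes enclosed by ink, and that the unspecified "mathematical operation" of claim 12 is the difference of the two counts taken modulo 2. All names (`glyph_features`, `unified_code`, `embed`, `detect`, the glyph tables) are hypothetical.

```python
from collections import deque

def _count_regions(grid, target, connectivity):
    """Count connected regions of cells equal to `target` by BFS flood fill."""
    h, w = len(grid), len(grid[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] != target or seen[y][x]:
                continue
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def glyph_features(rows):
    """Return (connected_regions, closed_regions) for a glyph drawn with '#'."""
    w = max(len(r) for r in rows)
    # Pad with a background border so the outside forms exactly one region.
    grid = [[0] * (w + 2)]
    for r in rows:
        grid.append([0] + [1 if c == '#' else 0 for c in r.ljust(w, '.')] + [0])
    grid.append([0] * (w + 2))
    connected = _count_regions(grid, 1, 8)
    closed = _count_regions(grid, 0, 4) - 1  # drop the outer background
    return connected, closed

def unified_code(rows):
    """Assumed unified rule: (connected - closed) mod 2 gives one bit."""
    connected, closed = glyph_features(rows)
    return (connected - closed) % 2

# Two hypothetical characters, each with a closed and an opened-stroke variant.
VARIANTS = {
    "kou": [["####", "#..#", "####"], ["####", "#..#", "##.#"]],
    "ri": [["###", "#.#", "###", "#.#", "###"],
           ["###", "#.#", "###", "#.#", "#.#"]],
}

def embed(chars, bits, variants=VARIANTS):
    """Pick, for each character, the variant whose code equals the bit."""
    return [next(g for g in variants[ch] if unified_code(g) == bit)
            for ch, bit in zip(chars, bits)]

def detect(glyphs):
    """Recover the bits from glyph shapes alone, using the same rule."""
    return [unified_code(g) for g in glyphs]

if __name__ == "__main__":
    chars = ["kou", "ri", "kou", "ri"]
    bits = [1, 0, 0, 1]
    print(detect(embed(chars, bits)) == bits)
```

Because the same rule is applied to every glyph, the detector needs no per-character table, which corresponds to the unified coding of claim 14. A per-character scheme in the sense of claim 13 would instead keep a separate mapping from feature tuples to code values for each character.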
CN 200410040853 2004-10-18 2004-10-18 Text digital Watermark tech using character's features for carrying watermark information Pending CN1601956A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN 200410040853 CN1601956A (en) 2004-10-18 2004-10-18 Text digital Watermark tech using character's features for carrying watermark information
CN 200510065893 CN1684115B (en) 2004-10-18 2005-04-20 Text digital water printing technology based on character topoloical structure
PCT/CN2005/001703 WO2006042460A1 (en) 2004-10-18 2005-10-17 Hidden data communication method and the application thereof in text digital watermark technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410040853 CN1601956A (en) 2004-10-18 2004-10-18 Text digital Watermark tech using character's features for carrying watermark information

Publications (1)

Publication Number Publication Date
CN1601956A true CN1601956A (en) 2005-03-30

Family

ID=34664803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410040853 Pending CN1601956A (en) 2004-10-18 2004-10-18 Text digital Watermark tech using character's features for carrying watermark information

Country Status (1)

Country Link
CN (1) CN1601956A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8311265B2 (en) 2005-09-16 2012-11-13 Beijing Sursen International Information Tech Co. Embedding and detecting hidden information
CN102096787B (en) * 2009-12-14 2013-06-05 南京信息工程大学 Method and device for hiding information based on word2007 text segmentation
CN105095699A (en) * 2014-05-20 2015-11-25 富士通株式会社 Watermark information embedding method and device, and watermark information decoding method
CN107037814A (en) * 2017-05-10 2017-08-11 中山市金马科技娱乐设备股份有限公司 The space positioning system and its localization method of trackless Ferris Wheel
CN109582926A (en) * 2018-11-26 2019-04-05 北京邮电大学 A kind of digital printing method of the anti printing and scanning attack based on fusion font
CN110874456A (en) * 2018-08-31 2020-03-10 浙江大学 Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method
CN111191414A (en) * 2019-11-11 2020-05-22 苏州亿歌网络科技有限公司 Page watermark generation method, identification method, device, equipment and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8311265B2 (en) 2005-09-16 2012-11-13 Beijing Sursen International Information Tech Co. Embedding and detecting hidden information
CN102096787B (en) * 2009-12-14 2013-06-05 南京信息工程大学 Method and device for hiding information based on word2007 text segmentation
CN105095699A (en) * 2014-05-20 2015-11-25 富士通株式会社 Watermark information embedding method and device, and watermark information decoding method
CN107037814A (en) * 2017-05-10 2017-08-11 中山市金马科技娱乐设备股份有限公司 The space positioning system and its localization method of trackless Ferris Wheel
CN107037814B (en) * 2017-05-10 2024-01-05 广东金马游乐股份有限公司 Space positioning system and method for trackless sightseeing vehicle
CN110874456A (en) * 2018-08-31 2020-03-10 浙江大学 Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method
CN110874456B (en) * 2018-08-31 2022-04-26 浙江大学 Watermark embedding method, watermark extracting method, watermark embedding device, watermark extracting device and data processing method
CN109582926A (en) * 2018-11-26 2019-04-05 北京邮电大学 A kind of digital printing method of the anti printing and scanning attack based on fusion font
CN111191414A (en) * 2019-11-11 2020-05-22 苏州亿歌网络科技有限公司 Page watermark generation method, identification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN1296806C (en) Reduced keyboard disambiguating system
CN1259632C (en) Method and system for filtering & selecting from a candidate listing generated by random inputting method
US9069753B2 (en) Determining proximity measurements indicating respective intended inputs
CN1024050C (en) Method and apparatus for encoding and recording Chinese characters
CN1684115A (en) Text digital water printing technology based on character topoloical structure
CN1648828A (en) System and method for disambiguating phonetic input
CN105229669A (en) Image processing apparatus and image processing method
CN103049458B (en) A kind of method and system revising user thesaurus
CN102156551A (en) Method and system for correcting error of word input
CN101727271A (en) Method and device for providing error correcting prompt and input method system
CN1075563A (en) Improving one's methods of the exchange code conversion of multi-byte character string characters
Hung et al. Micrography QR codes
CN1601956A (en) Text digital Watermark tech using character's features for carrying watermark information
CN1771494A (en) Automatic segmentation of texts comprising chunsk without separators
CN104331400B (en) A kind of Mongolian code conversion method and device
JPWO2008146583A1 (en) Dictionary registration system, dictionary registration method, and dictionary registration program
CN1674055A (en) Text digital water mark technology based on symbol redundancy encoding
CN103488616B (en) A kind of embedded font processing method and device
CN103123572B (en) A kind of method inputting character and electronic installation
CN1387109A (en) Numeral (keypad) input method for braille
CN104794140A (en) Text highlighting method and device
CN105487684B (en) The output intent of Chinese-character phonetic letter character and the output device of Chinese-character phonetic letter character
CN1059281C (en) Chinese phonetic coding method with initial consonant, simple or compound vowel and tone
CN106293368A (en) A kind of data processing method and electronic equipment
CN1257445C (en) Chinese-character 'Pronunciation-meaning code' input method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication