WO2006042460A1

WO2006042460A1 - Hidden data communication method and the application thereof in text digital watermark technology

Info

Publication number: WO2006042460A1
Application number: PCT/CN2005/001703
Authority: WO
Inventors: Dong Liu
Original assignee: Dong Liu
Priority date: 2004-10-18
Filing date: 2005-10-17
Publication date: 2006-04-27
Also published as: CN1684115B; CN1684115A

Abstract

A hidden data communication method and its application in text digital watermark technology are disclosed. The invention belongs to the field of communication and information engineering, and in particular to data hiding, data encoding and decoding, and digital watermarking. In the invention, hidden information is carried by the shapes of characters or character strings that have different topological structures. The invention offers easy detection, strong noise resistance and good robustness. The visual impact of the watermark information is small, and the embedded digital watermark is hard to perceive, extensible, and able to carry a large quantity of information.

Description

Hidden data communication method and application thereof in text digital watermarking technology

The invention belongs to the field of communication and information engineering, and specifically relates to data hiding, data encoding and decoding, and digital watermarking technology.

Background art

Digital watermarking is an important branch of information hiding. It hides specific information (digital watermark information) in digital products such as images, sound, video and text by means of digital embedding. On the one hand, a digital product carrying watermark information can be used normally, without the watermark being easily perceived; on the other hand, the watermark information embedded in the product can be detected by specific technical means. Digital watermarking is widely used in many fields, such as copyright protection of digital products, content verification and anti-counterfeiting, prevention of illegal copying, operation tracking, and secret data communication. According to the carrier, digital watermarks can be divided into image, sound, video and text digital watermarks. The invention relates mainly to text digital watermarking, in which watermark information is hidden in a file whose elements are characters. The main existing text digital watermarking techniques are as follows:

(1) Using the format information of the text file to store the digital watermark information. Watermark information is usually embedded by encoding the word spacing and line spacing of the text. The shortcomings of this approach are as follows. Coding by word spacing and line spacing has certain advantages for languages based on the Latin alphabet (such as English), but for languages based on block characters, such as Chinese, there is no word spacing in the English sense, so embedding the watermark is difficult. In addition, detecting watermark information encoded by word spacing is error-prone, and watermarking by line spacing carries little watermark information.

(2) Carrying watermark information by punctuation encoding and character font encoding. The disadvantages are as follows. Since a text file contains relatively few punctuation marks, punctuation encoding carries little information. The main problem with text digital watermarking techniques that use character font encoding is that the watermark information is difficult to detect when a printed document is the carrier file. A detection method for this case is not mentioned.

(3) Using the feature encoding of the character to save the digital watermark information. It mainly involves changing the length of a part of the character stroke or the height of the entire character to embed the watermark information. The main problem of this technology is also that it is difficult to detect the watermark information using the printed documents as the carrier file, and it will bring a large visual impact.

(4) The text file is converted into an image file, and the watermark information is loaded according to the method provided by the image digital watermarking technology. A disadvantage of this method is that the display and processing of image electronic files with watermark information cannot be performed with most word processing software.

(5) Encoding different words of the same meaning, by synonym substitution of specific phrases in the text, to load watermark information. The disadvantage of this technique is that it is difficult to find a proper synonym for every word, so the capacity of the text to embed watermark information is rather limited; after all, not every word has a corresponding synonym.

Summary of the invention

The object of the present invention is to provide a text digital watermarking technology that carries hidden watermark information in the topology of characters (strings). It addresses problems of existing text digital watermarking technologies, such as the large visual impact of the watermark information, the small capacity of the carrier file for watermark information, and the difficulty of detecting digital watermark information in printed matter. The technique of the present invention is suitable whether the carrier file of the digital watermark information is an electronic file or a printed document.

The term "character (string)" in this document is shorthand for "character or string".

The basic principle of the present invention is to design a plurality of glyphs of semantically identical characters (strings) by appropriately changing the topological structure of characters (strings), and to properly encode glyph features based on the character (string) glyph topology. The digital watermark information is embedded by the encoding of the character (string) glyph, thereby forming a new text digital watermarking technology.

The present invention includes the following closely related contents:

(1) A method for designing the same character (string) into a plurality of glyphs for carrying digital watermark information;

(2) Several methods of encoding multiple glyphs of the same character (string);

(3) a number of methods for uniformly encoding multiple glyphs of multiple characters (strings);

(4) A textual digital watermarking technique based on a plurality of glyphs for a plurality of characters (strings);

(5) A textual digital watermarking technique based on uniform coding of multiple glyphs for multiple characters (strings);

A method for designing the same character (string) into a plurality of glyphs for carrying digital watermark information:

The main design idea is to create several glyphs of the same character (string) that are semantically identical, by appropriately changing the topology of the character (string). The most natural design method is to change the topology by changing the connection relationships between the strokes that make up the character (string). The invention is not limited to this, however: any method of changing the character (string) topology is feasible, as long as the change does not make the character (string) semantically ambiguous to human visual recognition.

It should be specially noted that the design method for character glyphs extends naturally to string glyphs. In that case the whole string is treated as a unit whose topology is changed: not only can the topology of the individual characters that make up the string be changed, but the connection relationships between those characters can also be changed to alter the topology of the string. A variety of topologically distinct glyphs can thus be designed for the string and used to carry digital watermark information. When watermark information is carried by characters, the minimum carrier unit is a single character; when it is carried by strings, the minimum carrier unit is a string composed of multiple characters. The glyph design ideas and encoding rules of the two approaches are not fundamentally different: a string can be regarded as a character with a complex topology.

In an actual text digital watermarking system, if digital watermark information is to be carried in units of strings, the present invention recommends using a complete word (phrase) composed of several characters as the basic carrier unit, and designing the string glyphs for complete words (phrases). Languages based on the Latin alphabet (such as English) are well suited to carrying digital watermark information in strings, usually with handwritten or cursive glyphs. For block-character languages such as Chinese and Korean, glyphs in artistic font styles (or handwritten or cursive styles) can likewise be designed to carry digital watermark information.

Several ways to encode multiple glyphs of the same character (string):

The essence of the present invention is to represent hidden information by glyphs with different topologies of semantically identical characters (strings), which requires the glyphs of different topologies to be encoded. The basic encoding rules of the present invention are: character (string) glyphs with the same topology have the same encoding, and the encodings of character (string) glyphs with different topologies cannot all be identical (that is, there are at least two different encodings). In general, glyphs with different topologies should be given different code values as far as possible.

The following are six specific encoding methods that comply with the above rules and several variant encoding methods.

1) Encoding method based on "graph" structure

The method includes the following steps:

(1) The character (string) glyphs are mapped, according to certain rules, to "graphs" as defined in the mathematical discipline of graph theory. One specific rule is to map feature points of the character strokes, such as vertices, intersections and inflection points, to the nodes (endpoints) of a graph, and to map the strokes connecting these feature points to the edges of the graph. In this way, the glyphs of characters (strings) can be mapped to undirected graphs as defined in graph theory. The mapping between glyphs and graphs is many-to-one: each glyph of a character (string) maps to one graph, while one graph may correspond to several character (string) glyphs. Character (string) glyphs can also be mapped to directed graphs by adding specific spatial ordering rules (for example, left to right, top to bottom) on top of the undirected graph.

(2) Character (string) glyphs corresponding to isomorphic graphs have the same encoding, and the encodings of character (string) glyphs corresponding to different graphs cannot all be identical (that is, at least two different encodings are required). Among the graphs obtained in step (1) for the different glyphs of the same character (string), some may be isomorphic (according to the definition of isomorphism in graph theory); when encoding, the glyphs corresponding to isomorphic graphs should be given the same code. In general, glyphs corresponding to different graphs should be given different codes, to maximize the capacity of the character (string) to carry watermark information. However, in an actual text digital watermarking system it is preferable for different characters (strings) to have the same information-carrying capacity, and the number of codes per character (string) glyph is usually expected to be a round number (for example, a multiple of 2 or a power of 2); for such reasons, several glyphs with different structures are allowed to share the same encoding. To ensure that the character (string) can carry at least one bit of binary watermark information, at least two glyphs corresponding to different graphs must have different encodings; that is, a character (string) glyph carrying digital watermark information must have at least two different encoding states.
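The two steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the invention's own implementation: the representation of a glyph graph as a node count plus an edge list, the names `is_isomorphic` and `assign_codes`, and the brute-force permutation test (practical only for the small graphs that glyphs produce) are all hypothetical, and the graphs are treated as simple undirected graphs.

```python
from itertools import permutations


def is_isomorphic(n1, edges1, n2, edges2):
    """Brute-force isomorphism test for small simple undirected graphs."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    if n1 != n2 or len(e1) != len(e2):
        return False
    # Try every relabelling of the nodes of the first graph.
    return any(
        {frozenset(perm[v] for v in e) for e in e1} == e2
        for perm in permutations(range(n1))
    )


def assign_codes(graphs):
    """Give isomorphic glyph graphs the same code, in order of first appearance."""
    representatives = []  # one graph per isomorphism class seen so far
    codes = []
    for n, edges in graphs:
        for code, (rn, redges) in enumerate(representatives):
            if is_isomorphic(n, edges, rn, redges):
                codes.append(code)
                break
        else:
            representatives.append((n, edges))
            codes.append(len(representatives) - 1)
    return codes


# Hypothetical glyph graphs for one character: (node count, edge list).
glyph_graphs = [
    (3, [(0, 1), (1, 2)]),          # a path
    (3, [(0, 2), (2, 1)]),          # the same path, nodes labelled differently
    (3, [(0, 1), (1, 2), (0, 2)]),  # a closed triangle
    (3, [(0, 1)]),                  # one stroke plus an isolated point
]
codes = assign_codes(glyph_graphs)
print(codes)
```

In this sketch the first two glyphs share a code because their graphs are isomorphic, while the other two receive distinct codes.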

2) Encoding method based on the number of independent connected regions

The method encodes the number of independent connected regions (ie, those connected regions that are not connected to each other) contained in the character (string) glyph. The number of independent connected regions of the character (string) glyph is equal to the number of components of the "graph" corresponding to the character (string) glyph.
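Counting independent connected regions can be sketched as a flood fill over a binarized glyph image. The bitmap format (a list of equal-length strings with `#` for ink), the use of 8-connectivity, and the function name `count_connected_regions` are assumptions made for illustration only.

```python
def count_connected_regions(bitmap):
    """Count 8-connected groups of ink pixels ('#') in a list of strings."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or (r, c) in seen:
                continue
            # A new, unvisited ink pixel starts a new region.
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (
                            0 <= ny < rows
                            and 0 <= nx < cols
                            and bitmap[ny][nx] == "#"
                            and (ny, nx) not in seen
                        ):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count


# Two hypothetical glyph variants of the same character.
DOT_SEPARATE = [".#.", "...", ".#.", ".#."]  # dot detached from the stem
DOT_JOINED = [".#.", ".#.", ".#.", ".#."]    # dot merged into the stem

print(count_connected_regions(DOT_SEPARATE), count_connected_regions(DOT_JOINED))
```

The two variants remain readable as the same character but yield different region counts, so they can carry different code values.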

For the same reasons as in step (2) of the "encoding method based on graph structure", the coding rule is: character (string) glyphs with the same number of independent connected regions have the same encoding, and the encodings of character (string) glyphs with different numbers of independent connected regions cannot all be identical (that is, there are at least two different encodings).

The "encoding method based on graph structure" needs to determine whether different graphs are isomorphic, and in mathematical theory graph isomorphism testing has relatively high computational complexity (it is an NP problem). Because the graph corresponding to a character (string) glyph is usually not very complicated, it is feasible to use an existing graph isomorphism algorithm directly, but the process is still relatively complex. The present coding method is a simplification of the "encoding method based on graph structure".

3) Coding method based on a combined set of the number of independent connected regions and the number of independent closed regions

Some specific characters (strings), especially characters (strings) in languages such as Chinese and Korean, have one or more enclosed areas enclosed by strokes. Further, with the character (string) font design method of the present invention, it is also possible to design a closed area surrounded by character (string) font strokes for some specific characters (strings). The encoding method encodes a combined set of the number of independent connected regions included in the character (string) glyph and the number of independent closed regions included in the character (string) glyph.

For the same reasons as in step (2) of the "encoding method based on graph structure", the coding rule is: character (string) glyphs corresponding to the same combined set have the same encoding, and the encodings of character (string) glyphs corresponding to different combined sets cannot all be identical (that is, there are at least two different encodings).

Compared with the separate "encoding method based on the number of independent connected regions", this method provides greater flexibility and more coding space.
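The combined feature can be sketched by counting both ink regions and enclosed background regions of a binarized glyph. The bitmap format, the choice of 8-connectivity for ink and 4-connectivity for background, and the names `count_regions` and `glyph_features` are assumptions for illustration only.

```python
N8 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def count_regions(grid, target, neighbours):
    """Count connected groups of `target` cells using the given neighbourhood."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and grid[ny][nx] == target
                        and (ny, nx) not in seen
                    ):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


def glyph_features(bitmap):
    """Return (connected ink regions, enclosed background regions)."""
    width = len(bitmap[0]) + 2
    # Pad with background so the outside of the glyph forms a single region.
    padded = ["." * width] + ["." + row + "." for row in bitmap] + ["." * width]
    connected = count_regions(padded, "#", N8)
    closed = count_regions(padded, ".", N4) - 1  # subtract the outer background
    return (connected, closed)


RING = ["###", "#.#", "###"]                        # one closed area
OPEN_RING = ["###", "#.#", "#.#"]                   # the same strokes, opened
DOUBLE_RING = ["###", "#.#", "###", "#.#", "###"]   # two closed areas

print(glyph_features(RING), glyph_features(OPEN_RING), glyph_features(DOUBLE_RING))
```

Each distinct pair can then be assigned its own code value, which is how the combined set offers more coding space than the connected-region count alone.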

4) Encoding method based on the sum of the number of independent connected regions and the number of independent closed regions

The method encodes the sum of the number of independent connected regions included in the character (string) glyph and the number of independent closed regions included in the character (string) glyph.

For the same reasons as in step (2) of the "encoding method based on graph structure", the coding rule is: character (string) glyphs with the same sum of the number of independent connected regions and the number of independent closed regions have the same encoding, and the encodings of character (string) glyphs with different sums cannot all be identical (that is, there are at least two different encodings).

Compared with the "encoding method based on the combined set of the number of independent connected areas and the number of independent closed areas", the method is simpler.
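The trade-off between the sum and the combined set can be shown with a small comparison. The feature values and the helper name `code_space` are hypothetical; the point is only that summing merges some combinations that the combined set keeps apart.

```python
def code_space(features, key):
    """Return the sorted distinct code keys that `key` produces over the glyphs."""
    return sorted({key(f) for f in features})


# Hypothetical (connected regions, closed regions) counts for four glyph variants.
variants = [(1, 0), (1, 1), (2, 0), (2, 1)]

by_pair = code_space(variants, key=lambda f: f)           # combined-set coding
by_sum = code_space(variants, key=lambda f: f[0] + f[1])  # sum coding

print(len(by_pair), len(by_sum))
```

Here the variants (1, 1) and (2, 0) share the sum 2, so sum coding needs only one number per glyph at detection time but distinguishes fewer states.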

5) Method for encoding based on the remainder of dividing the number of independent connected regions by an integer

The method encodes the remainder obtained by dividing the number of independent connected regions contained in the character (string) glyph by an integer: character (string) glyphs with the same remainder have the same encoding, and character (string) glyphs with different remainders have different encodings.

The value of the integer in this method is flexible. When the integer is 2, the method is equivalent to encoding the parity of the number of independent connected regions: character (string) glyphs with an odd number of independent connected regions share one encoding, glyphs with an even number share another, and glyphs whose numbers of independent connected regions differ in parity have different encodings. In this method the integer preferably ranges from 2 to 4; for ease of use, the present invention uses 2 or 4.
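The remainder rule reduces to a single modulo operation once the region count is known. In this sketch the function name `remainder_code` and the sample counts are assumptions; the region count is taken as already measured from the glyph.

```python
def remainder_code(region_count, modulus=2):
    """Code value of a glyph: its region count modulo the chosen integer."""
    if modulus < 2:
        raise ValueError("modulus must be at least 2 to give two code states")
    return region_count % modulus


counts = [1, 2, 3, 4, 5]  # hypothetical region counts of five glyph variants
parity_codes = [remainder_code(n, 2) for n in counts]
mod4_codes = [remainder_code(n, 4) for n in counts]
print(parity_codes, mod4_codes)
```

With the integer 2 each glyph carries one binary digit; with 4 it has four code states, which corresponds to two binary digits.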

6) A method of encoding based on the remainder of the number of independent connected regions and the number of independent closed regions divided by an integer

The method encodes the remainder obtained by dividing, by an integer, the sum of the number of independent connected regions and the number of independent closed regions contained in the character (string) glyph: character (string) glyphs with the same remainder have the same encoding, and character (string) glyphs with different remainders have different encodings.

Compared with the preceding method, which encodes only the remainder of the number of independent connected regions divided by an integer, this method provides greater flexibility but is essentially similar. In this method the integer is preferably between 2 and 8; for ease of use, the present invention uses 2, 4 or 8.

It should be noted that there are many variations of the above six typical coding methods of the present invention.

One variant is as follows: the result of a mathematical operation on the number of independent connected regions and the number of independent closed regions of the character (string) is used as the parameter to be encoded. For example, similarly to the "encoding method based on the number of independent connected regions" above, it is possible to encode the result of a mathematical operation on the number of independent connected regions, such as its square, its cube, its parity, or whether it is a prime number. Similarly to the "encoding method based on the sum of the number of independent connected regions and the number of independent closed regions" above, it is also possible to encode the product of the two numbers, or to use the two numbers as parameters of mathematical operations such as exponentiation and logarithms and encode the results. Further, it is possible to encode arrays formed by permutations and combinations of the above results. By analogy, many further coding methods can be derived.

Another variant encodes the character (string) glyphs by a combination of several encoding methods. After encoding by one method, the character (string) glyphs that received the same code value under that method are further encoded by another method, and this can be repeated, giving multi-stage encoding by multiple methods. For example, the glyphs are first encoded by the "encoding method based on the number of independent connected regions"; the glyphs with the same number of independent connected regions are then encoded a second time, for instance by the "encoding method based on a combined set of the number of independent connected regions and the number of independent closed regions". The glyphs with the same combined set can in turn be encoded a third time, for instance by the "encoding method based on graph structure". By analogy, many encoding methods can be derived, and one can be selected according to the topology of the character (string) glyphs themselves. Combining multiple encoding methods expands the capacity of characters (strings) to carry digital watermark information.

Although the variants described above differ in form from the six typical coding methods of the present invention, they are in essence simple extensions of those typical coding methods.
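Multi-stage encoding can be sketched by keying each glyph on a tuple of stage results, so that a later stage only separates glyphs that an earlier stage left tied. The feature tuples, the stage functions and the name `cascade_codes` are hypothetical choices for illustration.

```python
def cascade_codes(glyphs, stages):
    """Assign codes from a sequence of stage functions applied in order.

    glyphs: dict mapping a glyph name to its measured features.
    stages: list of functions, each mapping features to a sortable value.
    """
    keys = {name: tuple(stage(f) for stage in stages) for name, f in glyphs.items()}
    index = {key: code for code, key in enumerate(sorted(set(keys.values())))}
    return {name: index[key] for name, key in keys.items()}


def by_connected(features):
    return features[0]


def by_closed(features):
    return features[1]


# Hypothetical (connected regions, closed regions) for four glyphs of one character.
GLYPHS = {"g1": (1, 0), "g2": (1, 1), "g3": (2, 0), "g4": (2, 0)}

one_stage = cascade_codes(GLYPHS, [by_connected])
two_stage = cascade_codes(GLYPHS, [by_connected, by_closed])
print(one_stage, two_stage)
```

In this example the second stage separates `g1` from `g2`, raising the number of code states from two to three, while `g3` and `g4` stay tied and would need a further stage, such as a graph-structure comparison, to be told apart.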

Several methods for uniformly encoding multiple glyphs of multiple characters (strings):

This method applies one of the methods of the present invention for "encoding multiple glyphs of the same character (string)" uniformly across the set of glyphs of a plurality of characters (strings) (note: the set includes the multiple glyphs of each character (string)), so that the rule for determining the code value of a character (string) glyph is the same for all of the characters (strings).

In an actual digital watermarking system, a digital watermark usually has to be carried by a plurality of characters (strings) of the carrier file, and this method uniformly encodes the multiple glyphs of those characters (strings). The method adopts one of the "methods for encoding multiple glyphs of the same character (string)" (such as the six typical coding methods and their variants described in the preceding section): the glyphs of all of the characters (strings) are encoded by the same method, and the specific correspondence between glyph features and code values is uniform across different characters (strings). For example, if the "encoding method based on graph structure" is used, the same encoding is given not only to the glyphs of one character (string) that correspond to isomorphic graphs, but also to the glyphs of different characters (strings) that correspond to isomorphic graphs. If the "encoding method based on the number of independent connected regions" is used, all character (string) glyphs with the same number of independent connected regions correspond to the same code value, regardless of whether they are glyphs of the same character (string) or of several different characters (strings).

A textual digital watermarking technique based on a plurality of glyphs for multiple characters (strings):

In the text digital watermarking technique of this item, the digital watermark information is embedded in a plurality of glyphs of the character (string) of the carrier file, and the encoding of the character (string) glyph is used to represent the digital watermark information. The design method of the glyph adopts the method of changing the topology of the character (string) glyph of the present invention, and designs a plurality of character (string) glyphs for the same character (string). The glyph coding method employs six typical methods of encoding a plurality of glyphs of the same character (string) of the present invention and a plurality of variations thereof.

An important feature of this text digital watermarking technique is that the glyph encoding methods for the different characters (strings) contained in the carrier file can differ: a dedicated glyph encoding method can be selected for a given character (string) according to the characteristics of the character (string) itself. Generally speaking, characters (strings) differ in stroke complexity and topology, so, for a given level of visual quality, the number of glyphs with different topologies that can be designed differs from one character (string) to another. In essence, each character (string) has a different capacity for carrying digital watermark information through glyph changes. Using different glyph encoding methods for different characters (strings) fully exploits this difference and thereby increases the capacity of the whole carrier file to carry digital watermark information. In determining specific code values, the rules that map glyphs to code values for different characters (strings) are independent of each other in this technique. When encoding the multiple glyphs of one character (string), only the different topologies of that character (string) need to be considered, regardless of any effect that the glyph topologies of other characters (strings) might have on it, so the encoding is simpler.

The watermark detection method of this text digital watermarking technique is characterized by the need to know the specific glyph encoding method of each character (string) in the carrier file. The detection process should generally first determine the semantic information of each character (string) in the carrier file, look up the glyph encoding method specific to that character (string) according to its semantic information, and then detect the corresponding glyph features according to that encoding method. In this way the encoding of each character (string) glyph in the carrier file is determined and the digital watermark information is detected. For example, if the encoding method for one character (string) is the "encoding method based on graph structure", the graph structure corresponding to the character (string) glyph should be detected to determine the glyph encoding. If the encoding method for another character (string) is the "encoding method based on the number of independent connected regions", the number of connected regions contained in the character (string) glyph should be detected to determine the glyph encoding. Combining the glyph codes of the characters (strings) in the carrier file yields the digital watermark information carried by the whole carrier file. In this technique, detection rests on first establishing the specific glyph encoding method of each character (string) in the carrier file, so the detection of the watermark information carried by a character (string) is usually tied to the semantic information of that character (string).

A textual digital watermarking technique based on a uniform encoding of multiple glyphs for multiple characters (strings):
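The independence of the per-character rules can be sketched with one codebook per character. Everything named here is hypothetical: the `CODEBOOKS` table, the glyph variant names (which stand in for topological features that a real detector would measure from the glyph), and the functions `embed` and `detect`.

```python
# Hypothetical per-character codebooks: glyph variant name -> code value.
# "A" has two designed variants (one binary digit); "B" has four (two binary digits).
CODEBOOKS = {
    "A": {"A_plain": 0, "A_split": 1},
    "B": {"B_plain": 0, "B_one_loop": 1, "B_open": 2, "B_split": 3},
}


def embed(text, symbols):
    """Choose, for each character, the glyph variant whose code equals the symbol."""
    glyphs = []
    for ch, symbol in zip(text, symbols):
        inverse = {code: name for name, code in CODEBOOKS[ch].items()}
        glyphs.append((ch, inverse[symbol]))
    return glyphs


def detect(glyphs):
    """Recover the symbols; the character identity selects the codebook to use."""
    return [CODEBOOKS[ch][glyph] for ch, glyph in glyphs]


marked = embed("ABBA", [1, 3, 0, 0])
recovered = detect(marked)
print(marked, recovered)
```

Note that `detect` needs to know which character each glyph represents before it can look up the code, and that the two characters offer different numbers of code states.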

Similar to the above-mentioned "text digital watermarking technique based on encoding a plurality of glyphs of a plurality of characters (strings)", in the text digital watermarking technique, digital watermark information is embedded in a plurality of glyphs of carrier file characters (strings). The encoding of the character (string) glyph is used to represent the digital watermark information. Still using the method of changing the topology of a character (string) of the present invention, a plurality of character (string) shapes are designed for the same character (string).

Compared with the "text digital watermarking technique based on separately encoding a plurality of glyphs of a plurality of characters (strings)", the main difference of this technique is that the glyphs are encoded using the "methods for uniformly encoding multiple glyphs of multiple characters (strings)" of the present invention and their variants.

An important feature of this text digital watermarking technique is that the glyph encoding method is the same for all of the characters (strings) of a carrier file: a single common method is used to encode the multiple glyphs of the multiple characters (strings). In determining specific code values, the rule for determining the glyph code value is uniform across different characters (strings); that is, for the glyphs of different characters (strings), as long as their corresponding topological features are the same, their code values should be the same. For the multiple glyphs of one character (string), the encoding must therefore take into account not only the different topological features of that character's (string's) own glyphs but also the encoding of the glyphs of other characters (strings), and must be coordinated with it.

The watermark detection method of this text digital watermarking technique is characterized as follows. Since all characters (strings) in the carrier file share one common glyph encoding method, the detection process does not need to know the semantic information of each character (string); it can directly detect the glyph features used by the common encoding method to determine the encoding of each character (string) glyph in the carrier file, and thereby detect the digital watermark information. For example, if the carrier file uses the "encoding method based on graph structure", the same method applies to all characters (strings) in the carrier file, and the graph structure corresponding to each character (string) glyph can be detected directly to determine its encoding. As another example, if the carrier file uses the "method of encoding based on the remainder of dividing the number of independent connected regions by an integer" with the integer taken as 2, detection is extremely simple: without knowing the semantic information of each character (string), the number of independent connected regions of each character (string) is computed directly, an odd number giving one code (for example, 1) and an even number giving the other (for example, 0), which directly determines the encoding of each character (string) glyph. Combining the glyph codes of the characters (strings) in the carrier file yields the digital watermark information carried by the whole carrier file. In this technique, the detection of the digital watermark information carried by a character (string) is independent of the semantic information of that character (string).
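The parity example can be sketched end to end. The region counts are taken as already measured for each glyph in reading order; the mapping of odd to 1 and even to 0 follows the example in the text, while the most-significant-bit-first packing into bytes and the function names are assumptions for illustration.

```python
def detect_bits(region_counts):
    """Odd region count -> 1, even -> 0; no character recognition is needed."""
    return [count % 2 for count in region_counts]


def bits_to_bytes(bits):
    """Pack bits into bytes, most significant bit first (an assumed convention)."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


# Hypothetical region counts measured from eight consecutive glyphs.
counts = [2, 1, 2, 2, 2, 2, 2, 1]
bits = detect_bits(counts)
payload = bits_to_bytes(bits)
print(bits, payload)
```

Because the same rule applies to every glyph, the detector never consults a per-character table, which is the simplification this technique aims for.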

Compared with the existing text digital watermarking technology, the main features of the present invention are:

(1) The method for detecting digital watermark information of the present invention is only related to the topology of the character (string) font, and is independent of the size of the character (string) and the tilt angle, and is convenient for detection. The scaling and rotation of the character (string) glyph does not affect the detection of the watermark information, and the anti-noise ability is strong, and the Lu Gang is good.

(2) The change of the character (string) glyph of the present invention is to appropriately change the topological structure of the character (string), and the shape of the character (string) glyph may not be changed, the overall style, the visual influence caused by the watermark information is small, and the embedded Digital watermark information is not easily noticeable.

(3) The character (string) glyph design method of the present invention is flexible. Once a specific encoding rule has been determined, multiple glyphs having the same code but different font styles can be designed for the same character (string) as needed, without changing the detection method or the related programs, so scalability is good.

(4) The "text digital watermarking technique based on separately encoding a plurality of glyphs of a plurality of characters (strings)" of the present invention designs the glyphs of each character (string) specifically for that character (string) and selects a specific encoding method for it, so that the character (string) carries a large amount of watermark information.

(5) The "text digital watermarking technique based on uniformly encoding a plurality of glyphs of a plurality of characters (strings)" of the present invention can directly determine the digital watermark information carried by each character (string) without detecting the semantic information of each character (string), which simplifies the watermark information detection method and reduces errors.

DRAWINGS

Fig. 1 shows, by way of example, a method of designing a plurality of glyphs of the same character by appropriately changing the topological structure of the character, and shows how character glyphs correspond to "graphs" as defined in the mathematical discipline of graph theory.

Fig. 2 shows, by way of example, a glyph design and encoding method based on the number of connected regions and the number of closed regions included in the character glyph.

Figure 3 shows, by way of example, a method of designing multiple glyphs for a string.

Figure 4 shows, by way of example, a glyph design and coding method for each set of characters.

Figure 5 shows, by way of example, various forms of character (string) glyph design methods.

Fig. 6 shows, by way of example, the principle of loading and detecting watermark information in the "digital watermarking technique based on separately encoding a plurality of glyphs of a plurality of characters (strings)".

Fig. 7 shows, by way of example, the principle of loading and detecting watermark information in the "digital watermarking technique based on uniformly encoding a plurality of glyphs of a plurality of characters (strings)".

5) A specific implementation method of encoding based on the remainder after dividing the number of independent connected regions by an integer

If this method is used to encode the four different glyphs (400) to (403) of the character "you" in Fig. 4 with the integer taken as 2, it is equivalent to encoding the parity of the number of independent connected regions included in the glyph. Assuming that the code of a glyph with an odd number of independent connected regions is "1" and the code of a glyph with an even number of independent connected regions is "0", the following results are obtained:

As shown in FIG. 4, the number of independent connected regions included in glyphs (400) and (402) is 4, and the remainder after dividing by 2 is 0; that is, the number is even, so their code is "0". The number of independent connected regions included in glyphs (401) and (403) is 5, and the remainder after dividing by 2 is 1; that is, the number is odd, so their code is "1". Encoded in this way, the four different glyphs (400) to (403) of the character "you" correspond to two different encoding states and are encoded into two different codes.

6) A specific implementation method of encoding based on the remainder after dividing the sum of the number of independent connected regions and the number of independent closed regions by an integer

If this method is used to encode the 15 different glyphs (200) to (280) of the character "in" in Figure 2, assume that the integer is 4 and that the sum of the number of independent connected regions and the number of independent closed regions is divided by 4. A glyph with a remainder of 0 is coded "00", a glyph with a remainder of 1 is coded "01", a glyph with a remainder of 2 is coded "10", and a glyph with a remainder of 3 is coded "11". The following results are then obtained:

As shown in Figure 2, the sum of the number of independent connected regions and the number of independent closed regions included in glyphs (2001), (2002), and (2003) is 1; the remainder after dividing by 4 is 1, and the glyphs are coded "01". For glyphs (210), (2301), (2302), and (2303) the sum is 2; the remainder after dividing by 4 is 2, and the glyphs are coded "10". For glyphs (220), (240), (2601), (2602), and (2603) the sum is 3; the remainder after dividing by 4 is 3, and the glyphs are coded "11". For glyphs (250) and (270) the sum is 4; the remainder after dividing by 4 is 0, and the glyphs are coded "00". For glyph (280) the sum is 5; the remainder after dividing by 4 is 1, and the glyph is coded "01", the same as the code of glyph group (200). Thus the 15 different glyphs in Figure 2, encoded according to this method, correspond to 4 different encoding states and are encoded into 4 different codes.
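The arithmetic in this example can be checked with a short sketch. The sums are the ones stated above for Figure 2; the function name `remainder_code` and the choice to write the remainder as a fixed-width binary string are illustrative assumptions that happen to match the "00" to "11" assignment assumed in the text.

```python
def remainder_code(total, modulus=4):
    """Code a glyph by (connected regions + closed regions) mod `modulus`,
    written as a fixed-width binary string."""
    width = max(1, (modulus - 1).bit_length())
    return format(total % modulus, f"0{width}b")

# Glyph number -> sum of connected and closed regions, as stated for Figure 2.
SUMS = {
    "2001": 1, "2002": 1, "2003": 1,
    "210": 2, "2301": 2, "2302": 2, "2303": 2,
    "220": 3, "240": 3, "2601": 3, "2602": 3, "2603": 3,
    "250": 4, "270": 4,
    "280": 5,
}

codes = {glyph: remainder_code(total) for glyph, total in SUMS.items()}
distinct = sorted(set(codes.values()))
print(len(codes), "glyphs,", len(distinct), "codes:", distinct)
```

With the integer taken as 2 instead of 4, the same function reduces to the one-bit parity code of method 5).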

A specific implementation method of encoding by combining multiple encoding methods:

For example, the 15 glyphs of the character "initial" shown in Fig. 2 are encoded by combining several encoding methods. First, they are encoded using the "encoding method based on the combined set of the number of independent connected regions and the number of independent closed regions". As described above, the glyphs fall into nine classes with nine encoding states, which can be encoded into nine different codes. Glyph group (200) includes glyphs (2001), (2002), and (2003), which have the same code; glyph group (230) includes glyphs (2301), (2302), and (2303), which have the same code; and the three glyphs (2601), (2602), and (2603) included in glyph group (260) likewise have the same code.

Then the glyphs in glyph groups (200), (230), and (260) are encoded a second time using the "encoding method based on the graph structure". If the encoding method based on the undirected graph is adopted, glyphs (2001), (2002), and (2003) in glyph group (200) correspond to "graphs" of three different structures; glyphs (2301), (2302), and (2303) in glyph group (230) correspond to "graphs" of two different structures, where the "graphs" corresponding to glyphs (2302) and (2303) are isomorphic; and glyphs (2601), (2602), and (2603) in glyph group (260) also correspond to "graphs" of three different structures. After the two rounds of encoding, the 15 character glyphs in Figure 2 have 14 different states and can be encoded into 14 different codes.
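The count of 14 states can be reproduced with a small sketch. The first-stage groups are the nine glyph groups named in the text; the second-stage labels "a", "b", "c" are hypothetical placeholders for graph isomorphism classes (only the fact that (2302) and (2303) share a class comes from the text), and the names `FIRST_STAGE`, `GRAPH_CLASS`, and `combined_state` are illustrative.

```python
# Glyph number -> first-stage group (same connected/closed combination).
FIRST_STAGE = {
    "2001": "200", "2002": "200", "2003": "200",
    "210": "210", "220": "220",
    "2301": "230", "2302": "230", "2303": "230",
    "240": "240", "250": "250",
    "2601": "260", "2602": "260", "2603": "260",
    "270": "270", "280": "280",
}

# Glyph number -> placeholder label of its graph isomorphism class.
# Only glyphs in groups (200), (230), (260) get a second-stage code.
GRAPH_CLASS = {
    "2001": "a", "2002": "b", "2003": "c",
    "2301": "a", "2302": "b", "2303": "b",   # (2302) and (2303) are isomorphic
    "2601": "a", "2602": "b", "2603": "c",
}

def combined_state(glyph):
    """State after both encoding rounds: (first-stage group, graph class or None)."""
    return (FIRST_STAGE[glyph], GRAPH_CLASS.get(glyph))

first_round = set(FIRST_STAGE.values())
second_round = {combined_state(g) for g in FIRST_STAGE}
print(len(first_round), "states after round one,", len(second_round), "after round two")
```

Glyphs (2302) and (2303) remain indistinguishable after both rounds, which is why 15 glyphs yield 14 states rather than 15.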

A specific implementation method for uniformly encoding multiple glyphs of multiple characters (strings):

This encoding method is characterized in that, for a collection of glyphs of a plurality of characters (strings) (note: the collection includes multiple glyphs of the same character (string)), the glyphs of the different characters (strings) are uniformly encoded using the same method, and the rule for determining the code value of a character (string) glyph is the same for all of the characters (strings). The specific encoding method can be any of the "several methods for encoding a plurality of glyphs of the same character (string)" of the present invention, with the unified encoding rule extended to the plurality of glyphs of the plurality of characters.

For example, suppose the "method of encoding based on the remainder after dividing the number of independent connected regions by an integer" is used to encode the glyphs of the character set {"you", "good", "!"}, with the integer taken as 2. The mapping rules between glyphs and code values shown in FIG. 4 then satisfy the requirements of this encoding method. Since the integer is 2, this is equivalent to encoding the parity of the number of independent connected regions; assume that a glyph with an odd number of independent connected regions is coded "1" and a glyph with an even number is coded "0". As shown in Figure 4, the number of independent connected regions of glyphs (400) and (402) is 4, that of glyphs (410) and (412) is 2, and that of glyphs (440) and (442) is also 2. Dividing each of these numbers by 2 leaves a remainder of 0; that is, the numbers are even, so these glyphs all have the same code, "0". The number of independent connected regions of glyphs (401) and (403) is 5, that of glyphs (411) and (413) is 1, and that of glyphs (441) and (443) is also 1. Dividing each of these numbers by 2 leaves a remainder of 1; that is, the numbers are odd, so these glyphs all have the same code, "1". Note that in this example the mapping rule between glyphs and code values is the same for every character: the mapping between remainders and glyph code values must not change from one character to another. It can therefore be seen that, under the method of encoding based on the remainder after dividing the number of independent connected regions by an integer, the way the glyph code values of the character set {"you", "good", "!"} are determined in FIG. 4 satisfies the requirements of this encoding method.

Conversely, if the "encoding method based on the number of independent connected regions" is used to encode the glyphs of the character set {"you", "good", "!"} in Fig. 4, the mapping rules between glyphs and code values shown in Fig. 4 do not satisfy the requirements of this encoding method. This is because the number of independent connected regions of glyphs (400) and (402) of the character "you", whose code value is "0", is 4, whereas the number of independent connected regions of glyphs (410) and (412) of the character "good" and of glyphs (440) and (442) of the character "!", whose code value is also "0", is 2; the mapping rule between glyphs and code values for the character "you" is therefore inconsistent with that for the characters "good" and "!". Similarly, for glyphs (401), (403), (411), (413), (441), and (443), whose code value is "1", the mapping rule for the character "you" is also inconsistent with that for the character "good" (or "!"). It can be seen that, under the "encoding method based on the number of independent connected regions", the way the glyph code values of the character set {"you", "good", "!"} are determined in Fig. 4 does not satisfy the requirements of this encoding method.

As can be seen from the above, the glyphs of the character set {"you", "good", "!"} shown in Figure 4 cannot be uniformly encoded by the "encoding method based on the number of independent connected regions", but can be uniformly encoded by the "method of encoding based on the remainder after dividing the number of independent connected regions by 2".
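The comparison above can be expressed as a small consistency check. This sketch interprets "uniform" as a one-to-one correspondence between the detected feature value and the code that holds across all characters, which is one reading of the preceding paragraphs rather than a definition given in the patent; the region counts and codes are those stated for Fig. 4, and `is_uniform` is a hypothetical helper.

```python
# (character, glyph number) -> (independent connected regions, code), per Fig. 4.
FIG4 = {
    ("you", 400): (4, "0"), ("you", 402): (4, "0"),
    ("you", 401): (5, "1"), ("you", 403): (5, "1"),
    ("good", 410): (2, "0"), ("good", 412): (2, "0"),
    ("good", 411): (1, "1"), ("good", 413): (1, "1"),
    ("!", 440): (2, "0"), ("!", 442): (2, "0"),
    ("!", 441): (1, "1"), ("!", 443): (1, "1"),
}

def is_uniform(table, feature):
    """True if feature value <-> code is one-to-one across every character."""
    value_to_code, code_to_value = {}, {}
    for regions, code in table.values():
        value = feature(regions)
        if value_to_code.setdefault(value, code) != code:
            return False
        if code_to_value.setdefault(code, value) != value:
            return False
    return True

by_parity = is_uniform(FIG4, lambda n: n % 2)  # remainder after dividing by 2
by_count = is_uniform(FIG4, lambda n: n)       # raw number of regions
print("parity rule uniform:", by_parity)
print("raw count rule uniform:", by_count)
```

The raw count fails because code "0" corresponds to 4 regions for "you" but to 2 regions for "good" and "!", exactly the inconsistency described in the text.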

A specific implementation of the text digital watermarking technique based on separately encoding a plurality of glyphs of a plurality of characters (strings):

Figure 6 shows, by way of example, the principle of watermark information loading and detection of this digital watermarking technique.

As shown in FIG. 6, the flow from blocks (600) and (601) to (610), then to blocks (620) and (630), and finally to blocks (621) and (631) represents the watermark information loading flow: loading the digital watermark information "0101100" (600) into the text "Hello, Mom!" (601). First, the table shown in Figure 4 is queried according to the semantic information of the characters in the text "Hello, Mom!" (601) to determine the length (number of bits) of watermark information that each character can carry. The characters "you", "good", and "!" each carry one bit of watermark information; the character "mom" carries two bits; and the character "," has no ability to carry watermark information. Then the watermark information is segmented as shown in block (610): the watermark information corresponding to the character "you" is "0"; that corresponding to the character "good" is "1"; the character "," does not correspond to any watermark information (because, as shown in Figure 4, "," cannot carry watermark information); that corresponding to the first character "mom" is "01"; that corresponding to the second character "mom" is "10"; and that corresponding to the character "!" is "0".

Next, the table in Fig. 4 is queried to find, for each character, the glyphs whose code equals the watermark information corresponding to that character. As shown in Fig. 4, among the glyphs of the character "you", (400) and (402) have code 0, corresponding to the watermark information "0"; among the glyphs of the character "good", (411) and (413) have code 1, corresponding to the watermark information "1"; among the glyphs of the character "mom", (431) and (435) have code 01, corresponding to the watermark information "01", and (432) and (436) have code 10, corresponding to the watermark information "10"; among the glyphs of the character "!", (440) and (442) have code 0, corresponding to the watermark information "0". The correspondence between character glyphs and watermark information is shown in blocks (620) and (630). In Figure 6, each character has two glyphs whose code equals the watermark information corresponding to the character, and the two glyphs belong to different font styles: Song and Lishu. Finally, the glyphs of the same font style are combined to obtain the text strings (621) and (631) carrying the watermark information "0101100" (600), where the font style of string (621) is Lishu and the font style of string (631) is Song. Because the font style is uniform among the characters within each of strings (621) and (631), the visual impact of loading the digital watermark information "0101100" (600) is small.
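The loading flow just described (capacity lookup, segmentation, glyph selection) can be sketched as follows. The tables contain only the glyph numbers and codes quoted in the text, so the "mom" table is partial; the names `GLYPHS`, `BITS`, and `embed`, and the use of a `style` index to pick the first or second listed glyph, are illustrative assumptions. The text does not say which listed glyph belongs to which font style.

```python
# character -> code -> glyph numbers with that code (values quoted from Fig. 4).
GLYPHS = {
    "you":  {"0": (400, 402), "1": (401, 403)},
    "good": {"0": (410, 412), "1": (411, 413)},
    "mom":  {"01": (431, 435), "10": (432, 436)},  # other codes not given in the text
    "!":    {"0": (440, 442), "1": (441, 443)},
}
# Bits each character can carry; characters not listed (such as ",") carry none.
BITS = {"you": 1, "good": 1, "mom": 2, "!": 1}

def embed(chars, watermark, style=0):
    """Return (character, glyph number or None) pairs carrying `watermark`."""
    out, pos = [], 0
    for ch in chars:
        width = BITS.get(ch, 0)
        if width == 0:
            out.append((ch, None))          # keep the character unchanged
            continue
        segment = watermark[pos:pos + width]
        if len(segment) < width:
            raise ValueError("watermark is shorter than the carrier capacity")
        pos += width
        out.append((ch, GLYPHS[ch][segment][style]))
    if pos != len(watermark):
        raise ValueError("watermark is longer than the carrier capacity")
    return out

text = ["you", "good", ",", "mom", "mom", "!"]   # "Hello, Mom!"
first_style = embed(text, "0101100", style=0)
second_style = embed(text, "0101100", style=1)
print(first_style)
print(second_style)
```

Keeping one `style` index for the whole string mirrors the point made in the text that all characters of a carrier string share one font style.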

Figure 8 shows the digital watermark information detection process of this technique. The carrier file (800) carrying digital watermark information is processed by the character semantic recognition system (810) to recover the original carrier electronic file (820). On this basis, the character glyph recognition system (830) performs glyph code recognition on the characters (strings) of the carrier file (800), recognizes the code of each character (string) glyph, and then combines the codes of the characters (strings) in the carrier file to obtain the digital watermark information (840). The character glyph recognition system (830) needs the recognition result of the character semantic recognition system (810) because, in this technique, the encoding method of each character (string) in the carrier file may be different, so the detection process must first establish the specific glyph encoding method of each character (string). It is therefore necessary to perform semantic recognition on each character (string) in the carrier file, look up the coding table shown in FIG. 4 using the semantic information of the character (string), determine the specific glyph encoding method of each character (string), detect the glyph feature corresponding to that encoding method, and thereby determine the code of each character (string) glyph. Alternatively, the original carrier electronic file can be used directly as a template for watermark detection: the original carrier electronic file template (850) provides the semantic information of the characters (strings) in the carrier file, after which the character glyph recognition system (830) can perform character (string) feature and code recognition. The watermark information detection process in this case is a non-blind watermark detection process.

For example, if digital watermark information needs to be detected from the character string (621) or (631) shown in Fig. 6, the detection system needs to know the specific glyph encoding method of each character constituting the string. The detection system should therefore first identify the semantic information of each character (manual recognition included), or obtain the original carrier electronic file directly and use it as a template that provides the semantic information of the characters, and then determine the specific encoding method of each character from its semantic information. In this example, the characters "you", "good", and "!" use the "encoding method based on the number of independent connected regions", the character "mom" uses the "encoding method based on the combined set of the number of independent connected regions and the number of independent closed regions", and the character "," does not carry watermark information. The detection system then detects, for each character, the glyph feature corresponding to its specific glyph encoding method: the number of independent connected regions for the glyphs of the characters "you", "good", and "!", and the combined set of the number of independent connected regions and the number of independent closed regions for the glyph of the character "mom". Detecting the glyph features yields the corresponding glyph codes; the correspondence between each character glyph and its code in strings (621) and (631) is shown in blocks (620) and (630). Combining the codes of the character glyphs gives the digital watermark information carried by strings (621) and (631): "0101100" (600).
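The detection example can be sketched under a simplifying assumption: the glyph recognizer is taken to report, for each position, the recognized character (its semantic information) and the number of the designed glyph it matches, so that the measurement of region counts is abstracted away. The table `CODE_OF_GLYPH` holds only the glyph codes quoted in the text, and `detect` is a hypothetical helper name.

```python
# character -> glyph number -> code (values quoted from Fig. 4 in the text).
CODE_OF_GLYPH = {
    "you":  {400: "0", 402: "0", 401: "1", 403: "1"},
    "good": {410: "0", 412: "0", 411: "1", 413: "1"},
    "mom":  {431: "01", 435: "01", 432: "10", 436: "10"},
    "!":    {440: "0", 442: "0", 441: "1", 443: "1"},
}

def detect(recognized):
    """Combine per-character glyph codes into the watermark string.

    `recognized` is a list of (character, glyph number) pairs; the character
    selects which coding table applies, which is why semantic information
    is required in this technique.
    """
    bits = []
    for ch, glyph in recognized:
        table = CODE_OF_GLYPH.get(ch)
        if table is None:        # e.g. ",": carries no watermark information
            continue
        bits.append(table[glyph])
    return "".join(bits)

one_string = [("you", 400), ("good", 411), (",", None),
              ("mom", 431), ("mom", 432), ("!", 440)]
other_string = [("you", 402), ("good", 413), (",", None),
                ("mom", 435), ("mom", 436), ("!", 442)]
print(detect(one_string), detect(other_string))
```

Both glyph selections decode to the same watermark, consistent with strings (621) and (631) carrying identical information in different font styles.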

It should be noted that the methods for detecting the glyph features corresponding to the encoding methods of the present invention are mature prior art. For example, judging the isomorphism of the "graphs" to which character (string) glyphs are mapped, and calculating the number of independent connected regions and the number of independent closed regions included in a character (string) glyph, can all be accomplished with existing mature techniques. These techniques are not included in the scope of the invention.

A specific implementation of the text digital watermarking technique based on uniformly encoding a plurality of glyphs of a plurality of characters (strings):

Figure 7 shows, by way of example, the principle of watermark information loading and detection of this digital watermarking technique.

As shown in FIG. 7, the flow from blocks (700) and (701) to (710), then to blocks (720) and (730), and finally to blocks (721) and (731) represents the watermark information loading flow: loading the digital watermark information "010" (700) into the text "Hello!" (701). As shown in Fig. 4, every character in the text string "Hello!" (701) uses the same encoding method (the method of encoding based on the remainder after dividing the number of independent connected regions by 2), and every character carries the same length (number of bits) of watermark information, so no special segmentation of the watermark information is needed; the watermark information only has to be divided sequentially into segments of equal length (number of bits). As shown in block (710), the characters correspond in order to one bit each of the watermark information "010" (700). The table in Figure 4 is then queried to find, for each character, the glyphs whose code equals the watermark information corresponding to that character. As shown in Figure 4, among the glyphs of the character "you", (400) and (402) have code 0, corresponding to the watermark information "0"; among the glyphs of the character "good", (411) and (413) have code 1, corresponding to the watermark information "1"; among the glyphs of the character "!", (440) and (442) have code 0, corresponding to the watermark information "0". The correspondence between character glyphs and watermark information is shown in blocks (720) and (730). In Figure 7, each character has two glyphs whose code equals the watermark information of the character, and the two glyphs belong to different font styles: Song and Lishu. Finally, the glyphs of the same font style are combined to obtain the text strings (721) and (731) carrying the watermark information "010" (700), where the font style of string (721) is Lishu and the font style of string (731) is Song. Because the font style is uniform among the characters within each of strings (721) and (731), the visual effect of loading the digital watermark information "010" (700) is small.

Suppose the "method of encoding based on the remainder after dividing the sum of the number of independent connected regions and the number of independent closed regions by an integer" is used to uniformly encode the words (strings) in sentence (350) in FIG. 3, with the integer taken as 2. This is equivalent to uniformly encoding the parity of the sum of the number of independent connected regions and the number of independent closed regions. Assume further that a string glyph with an odd sum is coded "1" and a string glyph with an even sum is coded "0". Sentence (350) then carries digital watermark information as follows. The sum of the number of independent connected regions and the number of independent closed regions is 4 for word (3501), 13 for (3502), 11 for (3503), 2 for (3504), 4 for (3505), 2 for (3506), 4 for (3507), and 7 for (3508). It can be seen that the sums for words (3501), (3504), (3505), (3506), and (3507) are even, so these words are coded "0", and the sums for words (3502), (3503), and (3508) are odd, so these words are coded "1". Thus the words (strings) in sentence (350), taken from left to right, correspond to the code "01100001". The binary number "01100001" is the ASCII code of "a", so sentence (350) carries the digital watermark information "a" (or the binary watermark information "01100001").
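The decoding of sentence (350) can be reproduced in a few lines. The sums are those stated above for words (3501) to (3508); the helper names `parity_bits` and `bits_to_ascii` are illustrative, and the sketch assumes the watermark length is a multiple of 8 bits when it is read as ASCII text.

```python
def parity_bits(sums):
    """One bit per word: odd sum -> "1", even sum -> "0"."""
    return "".join(str(total % 2) for total in sums)

def bits_to_ascii(bits):
    """Interpret a bit string as consecutive 8-bit ASCII codes."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

# Connected regions + closed regions for words (3501)..(3508), left to right.
SENTENCE_350 = [4, 13, 11, 2, 4, 2, 4, 7]

bits = parity_bits(SENTENCE_350)
message = bits_to_ascii(bits)
print(bits, message)
```

Each word contributes exactly one bit under this rule, so an eight-word sentence carries one ASCII character.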

Figure 9 shows the digital watermark information detection process of this technique. The character glyph recognition system (910) directly performs glyph code recognition on the characters (strings) of the carrier file (900) carrying digital watermark information. Since every character (string) in the carrier file (900) uses a common glyph encoding method, the glyph feature of each character (string) corresponding to that encoding method can be detected directly, the code of each character (string) glyph determined, and the codes of the characters (strings) in the carrier file combined to obtain the digital watermark information (920). The whole process is a blind watermark detection process: it requires neither a template of the original carrier electronic file nor semantic recognition of the characters (strings).

For example, if digital watermark information needs to be detected from the character string (721) or (731) shown in Fig. 7, the character glyph recognition system (910) performs glyph recognition according to the unified glyph encoding method. The characters "you", "good", and "!" in the example use the "method of encoding based on the remainder after dividing the number of independent connected regions by 2" (equivalent to the "method of encoding based on the parity of the number of independent connected regions"), so the character glyph recognition system (910) can directly determine the parity of the number of independent connected regions included in each character glyph of string (721) or (731): odd is coded "1" and even is coded "0". This rule is the same for every character, so the code of each character glyph can be detected directly without knowing the semantic information of the character. The correspondence between each character glyph and its code in strings (721) and (731) is shown in blocks (720) and (730). Combining the codes of the character glyphs gives the digital watermark information carried by strings (721) and (731): "010" (700).

The essential difference between this digital watermarking technique and the "text digital watermarking technique based on separately encoding a plurality of glyphs of a plurality of characters (strings)" is as follows: in this technique, every character (string) in the carrier file carrying the digital watermark information must use a common glyph encoding method, whereas in the "text digital watermarking technique based on separately encoding a plurality of glyphs of a plurality of characters (strings)", each character (string) in the carrier file may use a different glyph encoding method.

Claims

1. A hidden data communication method, characterized in that glyphs of a character or character string that have different topological structures are used to carry hidden information.

2. A method of designing the same character or character string into a plurality of glyphs, characterized in that the topological structure of the character or character string is changed, thereby obtaining a plurality of glyphs of the same character or character string for carrying hidden information.

3. A method of designing the same character or character string into a plurality of glyphs as claimed in claim 2, characterized in that the topological structure of the character or character string is changed by changing the connection relationship between the strokes constituting the character or character string, thereby obtaining a plurality of glyphs of the same character or character string.

4. A method for encoding a plurality of glyphs of the same character or character string, characterized in that character or string glyphs of the same topological structure have the same encoding, and the encodings of character or string glyphs of different topological structures cannot all be identical, that is, there are at least two different encodings.

5. A method of encoding a plurality of glyphs of the same character or character string, the steps of which are:

(1) mapping the character or string glyphs to "graphs" as defined in graph theory in mathematics;

(2) character or string glyphs corresponding to isomorphic "graphs" have the same encoding, and the encodings of character or string glyphs corresponding to different "graphs" cannot all be identical, that is, there are at least two different encodings.

6. A method for encoding a plurality of glyphs of the same character or character string, characterized in that encoding is based on the number of independent connected regions included in the character or string glyph; character or string glyphs having the same number of independent connected regions have the same encoding, and the encodings of character or string glyphs having different numbers of independent connected regions cannot all be identical, that is, there are at least two different encodings.

7. A method for encoding a plurality of glyphs of the same character or character string, characterized in that encoding is based on the combined set of the number of independent connected regions included in the character or string glyph and the number of independent closed regions included in the character or string glyph; character or string glyphs corresponding to the same combined set have the same encoding, and the encodings of character or string glyphs corresponding to different combined sets cannot all be identical, that is, there are at least two different encodings.

8. A method for encoding a plurality of glyphs of the same character or character string, characterized in that encoding is based on the sum of the number of independent connected regions included in the character or string glyph and the number of independent closed regions included in the character or string glyph; character or string glyphs having the same sum have the same encoding, and the encodings of character or string glyphs having different sums cannot all be identical, that is, there are at least two different encodings.

9. A method for encoding a plurality of glyphs of the same character or character string, characterized in that encoding is based on the remainder after dividing the number of independent connected regions included in the character or string glyph by an integer; character or string glyphs having the same remainder have the same encoding, and character or string glyphs having different remainders have different encodings.

10. A method of encoding a plurality of glyphs of the same character or character string, characterized in that encoding is based on the remainder after dividing, by an integer, the sum of the number of independent connected regions included in the character or string glyph and the number of independent closed regions included in the character or string glyph; character or string glyphs having the same remainder have the same encoding, and character or string glyphs having different remainders have different encodings.

11. A method of encoding a plurality of glyphs of the same character or character string by using in combination the methods of encoding a plurality of glyphs of the same character or character string as claimed in claim 4, 5, 6, 7, 8, 9, or 10.

12. A method of encoding a plurality of glyphs of a plurality of characters or character strings using the method of encoding a plurality of glyphs of the same character or character string as claimed in claim 4, 5, 6, 7, 8, 9, 10, or 11, characterized in that, within a collection of glyphs of a plurality of characters or character strings, the glyphs of the plurality of characters or character strings are uniformly encoded, and the rule for determining the code value of a character or string glyph is consistent across the plurality of characters or character strings.

13. A method of encoding a plurality of glyphs of a character or character string, characterized in that the result of a mathematical operation on the number of independent connected regions and the number of independent closed regions included in the character or string glyph is used as a parameter.

14. A text digital watermark embedding and detecting method, characterized by:

(1) designing suitable character or string glyphs by the method of claim 2 or 3, and separately encoding the plurality of glyphs of each of a plurality of characters or character strings of the carrier file by the methods of claims 4 to 11; the digital watermark information is embedded in the glyphs of the characters or character strings of the carrier file, and the encoding of the character or string glyphs is used to represent the digital watermark information;

(2) for each character or character string in the carrier file, according to the respective encoding method determined in (1), separately detecting the glyph features of that character or character string to determine the encoding of the character or string glyphs of the carrier file, thereby detecting the digital watermark information.

15. A text digital watermark embedding and detecting method, characterized by:

(1) designing suitable character or string glyphs by the method of claim 2 or 3, and uniformly encoding the plurality of glyphs of a plurality of characters or character strings in the carrier file by the method of claim 12; the digital watermark information is embedded in the glyphs of the characters or character strings of the carrier file, and the encoding of the character or string glyphs is used to represent the digital watermark information;

(2) according to the common encoding method determined in (1) for the characters or character strings in the carrier file, uniformly detecting the glyph features of each character or character string in the carrier file to determine the encoding of the character or string glyphs of the carrier file, thereby detecting the digital watermark information.
